The idea behind this script is to take a dataset as input and return a "clean" dataset with no missing values (except possibly in the objective) and only "preferred" fields.
The script "completes" missing fields by using predictive models to impute values where they are missing. The result is a dataset in which each column containing missing values is replaced by a column with those values imputed. In addition, for each completed column, we add a binary column indicating whether or not the value was missing in the original dataset. Finally, we also remove non-preferred columns.
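The completion step can be illustrated with a minimal Python sketch. It is not the script's implementation: the column mean stands in for the predictive models, the function name `complete_dataset` and the `_missing` suffix are hypothetical, and fields are assumed to be numeric with at least one known value each.

```python
from statistics import mean


def complete_dataset(rows, preferred, objective=None):
    """Keep only preferred fields and impute missing values (None).

    Assumption: the column mean replaces the predictive model that the
    real script would train; fields are numeric with >= 1 known value.
    """
    # Dropping non-preferred columns: only copy the preferred fields.
    completed = [{f: row.get(f) for f in preferred} for row in rows]
    for f in preferred:
        if f == objective:
            continue  # the objective may keep its missing values
        known = [r[f] for r in completed if r[f] is not None]
        if len(known) == len(completed):
            continue  # nothing missing, so no indicator column is added
        fill = mean(known)  # stand-in for a model's prediction
        for r in completed:
            # Hypothetical naming for the binary "was missing" column.
            r[f + "_missing"] = r[f] is None
            if r[f] is None:
                r[f] = fill
    return completed


rows = [
    {"id": 1, "age": 30, "income": 10.0},
    {"id": 2, "age": None, "income": 20.0},
    {"id": 3, "age": 50, "income": None},
]
clean = complete_dataset(rows, preferred=["age", "income"], objective="income")
```

In this sketch the `id` column is dropped because it is not preferred, `age` gains an `age_missing` companion column, and the missing `income` value is left alone because `income` is the objective.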
This simple script takes an input dataset, creates an anomaly detector, uses it to identify the top anomalous rows, and then creates a new dataset without those rows using a Flatline filter.
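The score-rank-filter flow can be sketched in plain Python. This is only an illustration under stated assumptions: an absolute z-score on a single numeric field stands in for the anomaly detector's score, and `remove_top_anomalies` is a hypothetical name, not part of the script.

```python
from statistics import mean, pstdev


def remove_top_anomalies(rows, field, top_n):
    """Drop the top_n rows with the highest stand-in anomaly score.

    Assumption: |z-score| of one numeric field replaces the score that
    a real anomaly detector would assign to each row.
    """
    values = [r[field] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    scores = [abs(v - mu) / sigma if sigma else 0.0 for v in values]
    # Rank row indices from most to least anomalous.
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:top_n])
    # The filter step: keep every row that is not among the top anomalies.
    return [r for i, r in enumerate(rows) if i not in drop]


rows = [{"x": v} for v in [10, 11, 9, 10, 100]]
filtered = remove_top_anomalies(rows, "x", top_n=1)
```

Here the row with `x = 100` lies furthest from the mean, so it is the one removed.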
This script applies a simple transformation to a dataset in order to cast time series forecasting as a supervised learning problem.
The input dataset should include one or more time series as numeric fields. For each numeric field in the input, the script generates additional numeric fields containing row-shifted values of the original field. The user may then use the shifted values as predictors in any supervised learning model.
In addition to the input dataset id, the script takes as input two integers defining the limits of the sliding window, i.e., the minimum and maximum row shifts to consider. To implement a forecasting model, the maximum shift should be less than 0, so that only past values are used as inputs.
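A minimal Python sketch of the row-shifting transformation follows. It assumes that a shift of `k` refers to the value `k` rows away (negative meaning earlier rows), that values falling outside the dataset become missing, and that a shift of 0 is skipped because it would duplicate the original field. The function name and the `_shift_` column naming are hypothetical.

```python
def add_shifted_fields(rows, fields, min_shift, max_shift):
    """Add row-shifted copies of each field for shifts in [min_shift, max_shift].

    Assumptions: negative shifts point at earlier rows; out-of-range
    positions yield None; shift 0 is skipped as a duplicate.
    """
    n = len(rows)
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for f in fields:
            for k in range(min_shift, max_shift + 1):
                if k == 0:
                    continue
                j = i + k  # index of the row the shifted value comes from
                new_row[f"{f}_shift_{k}"] = rows[j][f] if 0 <= j < n else None
        out.append(new_row)
    return out


series = [{"t": 1}, {"t": 2}, {"t": 3}]
# Window from -2 to -1: only past values are used as predictors.
shifted = add_shifted_fields(series, ["t"], min_shift=-2, max_shift=-1)
```

With this window, the last row keeps `t = 3` as the value to predict and gains the two previous values as predictors, while the first row has no past values and gets missing entries.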
Check this readme for an example of the application.
This script implements a dataset generator that, given an input dataset and an item field in it, creates a new dataset with a column for each of the items in the field. Each column is an indicator of whether the value of the field in the instance contains the item denoted by the corresponding column.
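The intended result can be illustrated with a small Python sketch. It assumes that items are stored as a comma-separated string, which is an assumption about the data rather than something the script requires, and `expand_items` is a hypothetical name.

```python
def expand_items(rows, field, separator=","):
    """Add one boolean indicator column per distinct item in `field`.

    Assumption: each value of `field` is a separator-delimited string.
    """
    parsed = [
        {item.strip() for item in r[field].split(separator) if item.strip()}
        for r in rows
    ]
    # Every distinct item seen anywhere in the field becomes a column.
    items = sorted(set().union(*parsed))
    return [
        {**r, **{item: item in present for item in items}}
        for r, present in zip(rows, parsed)
    ]


baskets = [{"basket": "milk, bread"}, {"basket": "bread"}]
expanded = expand_items(baskets, "basket")
```

Each output row keeps the original field and gains a `bread` and a `milk` column, set to true only where the row's basket contains that item.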
Generation of the new columns is accomplished via Flatline. This WhizzML script only needs to construct the appropriate Flatline string and send it to BigML's dataset service. The expression is simple: a list of checks, one for each possible item, using the Flatline built-in contains-items?.
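The string construction can be sketched as follows, in Python rather than WhizzML. The exact expression the script emits is not shown here, so the shape below is an assumption: a Flatline `list` form wrapping one `contains-items?` check per item, with the field name as the first argument. The argument order should be checked against the Flatline reference, and `items_flatline` is a hypothetical helper.

```python
import json


def items_flatline(field_name, items):
    """Build a Flatline expression with one contains-items? check per item.

    Assumption: (contains-items? <field> <item>) is the argument order;
    json.dumps is used only to produce double-quoted, escaped strings.
    """
    checks = [
        f"(contains-items? {json.dumps(field_name)} {json.dumps(item)})"
        for item in items
    ]
    return "(list " + " ".join(checks) + ")"


expr = items_flatline("basket", ["milk", "bread"])
print(expr)
```

Building the expression as a single string keeps the script itself small: all per-row work is delegated to the dataset service that evaluates the Flatline expression.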