The idea behind this script is to take a dataset as input and return a "clean" dataset with no missing values (except possibly in the objective) and only "preferred" fields.
The script "completes" missing fields by using predictive models to impute values where they are missing. The result is a dataset in which each column containing missing values is replaced by a column with those values imputed. In addition, for each completed column, we add a binary column indicating whether or not the value was missing in the original dataset. Finally, we remove the non-preferred columns.
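
The steps described above can be illustrated with a minimal Python sketch. This is not the script's actual implementation: the function name `complete_dataset`, the helper names, the `_was_missing` column suffix, and the choice of a small k-nearest-neighbours average as the "predictive model" are all assumptions made for illustration. The sketch also assumes numeric fields, rows stored as dictionaries with `None` marking a missing value, and that the caller passes the list of preferred fields explicitly.

```python
from statistics import mean


def _distance(a, b, fields):
    """Mean squared difference over the fields observed in both rows."""
    shared = [f for f in fields if a[f] is not None and b[f] is not None]
    if not shared:
        return float("inf")  # no common evidence: treat as maximally distant
    return sum((a[f] - b[f]) ** 2 for f in shared) / len(shared)


def _predict(row, train, fields, target, k):
    """Stand-in model: average the target over the k closest training rows."""
    nearest = sorted(train, key=lambda t: _distance(row, t, fields))[:k]
    return mean(t[target] for t in nearest)


def complete_dataset(rows, preferred, objective=None, k=3):
    """Return new rows with missing preferred fields imputed.

    Hypothetical helper. Non-preferred fields are dropped, the objective
    (if given) is copied through unchanged, and each completed field gets
    a companion 0/1 column named "<field>_was_missing".
    """
    kept = [c for c in preferred if c != objective]
    completed = [{c: r[c] for c in kept} for r in rows]
    for target in kept:
        train = [r for r in rows if r[target] is not None]
        if len(train) == len(rows) or not train:
            continue  # nothing to impute, or no observed values to learn from
        predictors = [c for c in kept if c != target]
        for src, out in zip(rows, completed):
            missing = src[target] is None
            if missing:
                out[target] = _predict(src, train, predictors, target, k)
            out[target + "_was_missing"] = int(missing)
    if objective is not None:
        for src, out in zip(rows, completed):
            out[objective] = src[objective]  # may still contain None
    return completed


rows = [
    {"id": 1, "x": 1.0, "y": 10.0, "label": 0},
    {"id": 2, "x": 2.0, "y": 20.0, "label": 1},
    {"id": 3, "x": 3.0, "y": None, "label": None},
    {"id": 4, "x": 10.0, "y": 100.0, "label": 1},
]
clean = complete_dataset(rows, preferred=["x", "y"], objective="label", k=1)
print(clean[2])
```

In this example the non-preferred `id` field is dropped, the missing `y` in the third row is filled from its nearest neighbour on `x`, a `y_was_missing` flag is added, and the missing objective value is left as it was. A column with no observed values at all is left untouched here, since there is nothing to train on.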