This script implements feature selection using a version of the Boruta algorithm to detect important and unimportant fields in your dataset. The algorithm:
Retrieves the dataset information.
Creates a new extended dataset in which each original field has a corresponding shadow field. The shadow field has the same type but contains a random sample of the values in the original field.
Creates a random forest from the extended dataset.
Extracts the maximum of the importances for the shadow fields.
Uses this maximum plus and minus a minimum gain to define an upper and a lower threshold. Original fields scoring below the lower threshold are considered unimportant, and fields scoring above the upper threshold are considered important.
Removes the fields marked as unimportant from the list of input fields used for new datasets.
Repeats the procedure, creating a new extended dataset with the remaining fields. The process stops when it reaches the user-given number of runs or when every original field in the dataset has been marked as important or unimportant.
When iteration stops, creates a new dataset from which the unimportant fields have been removed.
The output of the script is a dataset ID.
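The procedure described above can be illustrated with a small Python sketch. It is not the script's own code: the function names (`boruta_sketch`, `abs_correlation`), the parameters (`max_runs`, `min_gain`, `seed`) and the in-memory column layout are all hypothetical. To keep the example self-contained, a simple absolute correlation with the objective stands in for the random forest importances, and each shadow field is a shuffled copy of its original, which is one possible reading of "random sample".

```python
import random


def abs_correlation(xs, ys):
    """Stand-in importance (assumption): absolute Pearson correlation.

    The script described above uses random forest importances instead.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0


def boruta_sketch(columns, target, importance=abs_correlation,
                  max_runs=10, min_gain=0.01, seed=0):
    """Return (important, unimportant, undecided) lists of field names.

    columns: dict mapping a field name to its list of numeric values.
    target: list of numeric values for the objective field.
    """
    rng = random.Random(seed)
    important, unimportant = set(), set()
    candidates = list(columns)  # fields still used as inputs
    for _ in range(max_runs):
        undecided = [f for f in candidates if f not in important]
        if not undecided:
            break  # every field is marked important or unimportant
        # Shadow fields: same type, values drawn at random from the original
        # (here a shuffled copy of each remaining field).
        shadows = [rng.sample(columns[f], len(columns[f])) for f in candidates]
        # Maximum importance reached by any shadow field.
        shadow_max = max(importance(s, target) for s in shadows)
        upper, lower = shadow_max + min_gain, shadow_max - min_gain
        for f in undecided:
            score = importance(columns[f], target)
            if score > upper:
                important.add(f)
            elif score < lower:
                unimportant.add(f)
        # Unimportant fields are dropped from the inputs of the next run.
        candidates = [f for f in candidates if f not in unimportant]
    undecided = [f for f in candidates if f not in important]
    return sorted(important), sorted(unimportant), undecided


# Toy data: "signal" determines the target, "constant" carries no information.
n = 100
columns = {"signal": list(range(n)), "constant": [5] * n}
target = [2 * x + 1 for x in columns["signal"]]
important, unimportant, undecided = boruta_sketch(columns, target)
print("important:", important)
print("unimportant:", unimportant)
print("undecided:", undecided)
```

In this toy run, `signal` scores far above any shuffled shadow and is marked important, while `constant` always scores zero and can never be marked important. Fields that end between the two thresholds when the runs are exhausted stay undecided, which is why the sketch returns three lists rather than two.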