The objective of this script is to perform a k-fold cross-validation of a
logistic regression built from a dataset. The algorithm:
1. Divides the dataset into k parts
2. Holds out the data in one of the parts and builds a logistic regression
   with the rest of the data
3. Evaluates the logistic regression with the held-out data
4. Repeats the second and third steps with each of the k parts, so that
   k evaluations are generated
5. Finally, averages the evaluation metrics to provide the cross-validation
   metrics.
The output of the script will be an evaluation ID. This evaluation is a
cross-validation, meaning that its metrics are averages of the k evaluations
created in the cross-validation process.
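The steps above can be sketched in plain Python. This is only an illustration of the k-fold procedure, not the script's own code: `k_fold_split`, `cross_validate`, `train_majority` and `evaluate_accuracy` are hypothetical names, and a majority-class predictor stands in for the logistic regression so that the example stays self-contained.

```python
import random
from statistics import mean


def k_fold_split(rows, k, seed=0):
    """Shuffle the rows and divide them into k parts of near-equal size."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def cross_validate(rows, k, train, evaluate, seed=0):
    """Average the metrics of k hold-out evaluations."""
    folds = k_fold_split(rows, k, seed)
    evaluations = []
    for i, held_out in enumerate(folds):
        # Build the model with every part except the held-out one.
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train(training)
        # Evaluate the model with the held-out part.
        evaluations.append(evaluate(model, held_out))
    # Average each metric over the k evaluations.
    return {name: mean(e[name] for e in evaluations) for name in evaluations[0]}


# Hypothetical stand-ins for model building and evaluation.
def train_majority(rows):
    labels = [label for _, label in rows]
    return max(sorted(set(labels)), key=labels.count)


def evaluate_accuracy(model, rows):
    return {"accuracy": mean(1.0 if label == model else 0.0 for _, label in rows)}


data = [(x, "yes" if x % 4 else "no") for x in range(40)]
metrics = cross_validate(data, k=5, train=train_majority, evaluate=evaluate_accuracy)
```

Passing `train` and `evaluate` as functions keeps the fold logic independent of the model type, so the same loop would apply to any other supervised model.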
The objective of this script is to perform a k-fold cross-validation of a
deepnet built from a dataset. The algorithm:
1. Divides the dataset into k parts
2. Holds out the data in one of the parts and builds a deepnet
   with the rest of the data
3. Evaluates the deepnet with the held-out data
4. Repeats the second and third steps with each of the k parts, so that
   k evaluations are generated
5. Finally, averages the evaluation metrics to provide the cross-validation
   metrics.
The output of the script will be an evaluation ID. This evaluation is a
cross-validation, meaning that its metrics are averages of the k evaluations
created in the cross-validation process.
Best-first feature selection with cross-validation (pgonzalezcarrizo)
Finds the best features for modeling using a greedy algorithm. It extends the best-first feature selection script, which only worked with models and used a split evaluation.
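As a rough illustration of greedy feature selection, the sketch below adds one feature at a time and stops as soon as no candidate improves the score. It is a simplified forward selection, not the script's actual search: `greedy_feature_selection` and `toy_score` are hypothetical names, and in practice `score` would be a cross-validation metric computed for each candidate subset.

```python
def greedy_feature_selection(features, score):
    """Greedily add the feature that most improves score(subset).

    Stops at the first round in which no remaining feature improves
    the best score found so far.
    """
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical scoring function: each feature has a fixed gain, and every
# feature in the subset costs a small penalty.
GAINS = {"a": 0.3, "b": 0.2, "c": 0.0}


def toy_score(subset):
    return sum(GAINS[f] for f in subset) - 0.05 * len(subset)


selected, best = greedy_feature_selection(["a", "b", "c"], toy_score)
```

With the toy scores, feature `c` is never added because including it lowers the score.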
Given a dataset, encodes categorical fields using ordinal encoding, which uses a single column of integers to represent field classes (levels). It then creates a new dataset, with additional fields containing ordinal encodings of the categorical fields.
If classes have a known order (such as Like, Somewhat Like, Neutral, Somewhat Dislike, and Dislike), the integer mapping can be supplied; otherwise, integers are assigned by class count, in descending order (in the case of ties, classes are ordered alphabetically).
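The mapping rule can be illustrated with a short Python sketch. The function names are hypothetical, and starting the integers at 0 is an assumption; the description does not say which integer the first class receives.

```python
from collections import Counter


def ordinal_mapping(values, order=None):
    """Map each class to an integer.

    If `order` is supplied, it defines the mapping. Otherwise classes are
    ranked by count, in descending order, with ties broken alphabetically.
    Integers start at 0 here, which is an assumption.
    """
    if order is not None:
        return {cls: i for i, cls in enumerate(order)}
    counts = Counter(values)
    ranked = sorted(counts, key=lambda cls: (-counts[cls], cls))
    return {cls: i for i, cls in enumerate(ranked)}


def encode(values, mapping):
    """Return the ordinal encoding of a column of class values."""
    return [mapping[v] for v in values]


colors = ["red", "blue", "red", "green", "blue", "red"]
by_count = ordinal_mapping(colors)

ratings = ["Neutral", "Like", "Dislike"]
known_order = ["Like", "Somewhat Like", "Neutral", "Somewhat Dislike", "Dislike"]
by_order = ordinal_mapping(ratings, order=known_order)
```

The encoded values would go into a new field next to the original categorical field, as the description states, rather than replacing it.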
The script is meant for datasets that contain images. It transforms the information in the dataset into an editable source, where users can inspect the images in full and see or update their labels and regions. For datasets created by a batch prediction, the prediction fields are also added, but their names might be changed to avoid duplicated field names.
A score_threshold parameter has been added to allow users to filter predicted regions. Regions whose score is below the score_threshold are discarded.
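The filtering rule amounts to a simple comparison, sketched below in Python. The region representation (a dictionary with `label`, `box` and `score` keys) and the `filter_regions` name are assumptions made for illustration; the script's actual data layout may differ.

```python
def filter_regions(regions, score_threshold):
    """Keep only the regions whose score is at least score_threshold."""
    return [r for r in regions if r["score"] >= score_threshold]


# Hypothetical predicted regions for one image.
predicted = [
    {"label": "cat", "box": [10, 20, 110, 220], "score": 0.92},
    {"label": "dog", "box": [40, 60, 90, 150], "score": 0.35},
    {"label": "cat", "box": [5, 5, 50, 50], "score": 0.50},
]

kept = filter_regions(predicted, score_threshold=0.5)
```

A region whose score equals the threshold is kept, since only scores below the threshold are discarded.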