A dataset is a structured version of a source where each field has been processed and serialized according to its type. A field can be numeric, categorical, text, or date-time.
BigML computes a histogram for each numeric or categorical field (and very soon for text fields too). Fields generated automatically from date-time values (day, month, year, etc.) also have associated histograms.
A model represents a set of correlation patterns automatically inferred from the statistical relationships across the fields in your dataset. A model can be used individually, or in groups called ensembles, to solve a variety of classification and regression tasks.
You can explore, filter, and export models or incorporate them into your own smart application using our API.
You can use a model (or ensemble) to make predictions, that is, to find the category or expected value of the objective field for new instances. BigML will automatically generate interactive forms to help you input new data using the fields from a model or ensemble.
Each prediction comes with a confidence value that measures the model's certainty about the prediction.
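As an illustration of how a confidence value can be derived for a classification prediction, the sketch below computes the lower bound of the Wilson score interval from the number of training instances at a node that agree with the predicted class. That this matches BigML's exact computation is an assumption of this sketch, as are the function name and the 95% level (`z = 1.96`).

```python
import math


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion.

    successes: instances that agree with the predicted class.
    total:     all instances considered for the prediction.
    z:         normal quantile; 1.96 gives a 95% interval (assumed here).
    """
    if total == 0:
        return 0.0
    p = successes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)
```

With the same observed proportion, more supporting instances give a higher value: 90 out of 100 scores higher than 9 out of 10, which is the behaviour you would want from a certainty measure.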
An ensemble is composed of multiple models over different subsamples of your dataset.
Ensembles generalize better to new data and can help reduce the error of single machine-learned models that might overfit your data.
You can create ensembles in just one click or select the number of models or the level of parallelism used to create them.
An evaluation provides an easy way to estimate the performance of a predictive model.
You can evaluate both single models and ensembles and also compare evaluations.
Depending on the task solved by the model or ensemble (i.e., regression or classification), different performance measures will be computed.
Set up a Source
To create a new source, just drag and drop your data file onto BigML's interface, or select the file you want to use with the upload icon. Sources can be created from almost any tabular data (CSV or ARFF files). You can compress them with gzip (.gz) or bzip2 (.bz2) to save bandwidth. You can also create sources from remote locations using protocols such as HTTP(S), S3, Azure, or OData. You can upload files of up to 64GB, or up to 5TB if you use remote S3 buckets. Once your source has been created, you can use a configuration panel to update types, names, labels, descriptions, and other parsing preferences.
A source view displays the initial part of your data source as a table with one row for each field (a column in your file) and one column for each of the first 25 instances (rows in your file). That is, we provide a column-wise view of the first 25 rows of your input data to help inspect sources with hundreds of fields. You can set different "parsing options" like locale, separator, missing tokens, or data types in the configuration panel.
- Create a dataset in just one click.
- Use a configuration panel to select specific fields of your source to create a dataset.
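The column-wise source view described above can be approximated locally before uploading a file. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes a comma-separated file with a header row, optionally gzip-compressed.

```python
import csv
import gzip
import itertools


def open_text(path: str):
    """Open a plain or gzip-compressed (.gz) CSV file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="", encoding="utf-8")
    return open(path, "r", newline="", encoding="utf-8")


def column_wise_preview(lines, max_rows: int = 25) -> dict:
    """Return {field name: [values from the first max_rows rows]}.

    `lines` is any iterable of text lines, such as an open file.
    Short rows are padded with empty strings.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows = list(itertools.islice(reader, max_rows))
    return {
        name: [row[i] if i < len(row) else "" for row in rows]
        for i, name in enumerate(header)
    }
```

A call such as `column_wise_preview(open_text("data.csv.gz"))` (with a file name of your own) gives one entry per field, which is easier to scan than a row-wise dump when a file has hundreds of columns.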
Create a Dataset
To create a new dataset, just use the 1-click dataset button from a source view if you want to include all the fields and the complete source, or use the configure dataset panel to select a few specific fields or limit the total size of data to analyze. BigML will start computing the distribution of values for each of the fields in your dataset. This process can take from a few seconds to a few hours depending on the size of your data. As your dataset is being created, BigML shows you a visualization that gives you immediate feedback about your data.
A dataset view will show you a table with the number of instances, missing values, errors, and a histogram for each field in your dataset. You can mouse over histograms to get more specific information. For each numeric field, the minimum, mean, median, maximum, and standard deviation are also computed.
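The per-field summary described above can be sketched with the Python standard library. This is only an illustration: the set of missing tokens, the function names, and the use of the sample (rather than population) standard deviation are assumptions, not a description of how BigML computes these values.

```python
import statistics
from collections import Counter

# Assumed missing tokens; in BigML these are a parsing option of the source.
MISSING_TOKENS = {"", "NA", "N/A", "?"}


def summarize_numeric(values) -> dict:
    """Summary statistics for a numeric field given as raw strings."""
    present = [float(v) for v in values if v not in MISSING_TOKENS]
    if not present:
        raise ValueError("field has no non-missing values")
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "minimum": min(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "maximum": max(present),
        # Sample standard deviation; needs at least two values.
        "standard_deviation": (
            statistics.stdev(present) if len(present) > 1 else 0.0
        ),
    }


def categorical_histogram(values) -> dict:
    """Count of each category, ignoring missing tokens."""
    return dict(Counter(v for v in values if v not in MISSING_TOKENS))
```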
- Split your dataset into training and test sets.
- Create a model in just one click.
- Use a configuration panel to select which field to predict, which input fields, and how many instances to use.
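Splitting a dataset into training and test sets, the first option in the list above, amounts to a shuffled partition of the instances. Here is a minimal sketch; the 80/20 proportion, the fixed seed, and the function name are assumptions chosen for illustration.

```python
import random


def split_dataset(rows, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle rows deterministically and return (training, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Using a fixed seed makes the split reproducible, so a model trained on the training part can always be evaluated against the same held-out test part.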
Create a Model
To create a model, just use the 1-click model button from a dataset view if you want to use all active fields and instances in your dataset to generate a model that predicts (i.e., has as objective field) the last column in the input data. You can also use a configure model panel to select a different objective field, specific fields, or sample your dataset using a multitude of options. Your dataset will be processed to build a predictive model that will not only show you the most relevant patterns in your data but will also allow you to generate predictions for new data instances.
A model view provides a visual tree representation of your data, an easy way of understanding and interacting with it. You can adjust both the level of support and confidence (or expected error in regression models) to discover frequent or rare interesting patterns or filter the patterns that lead to a specific prediction. At every node, the model shows the most likely prediction together with the level of confidence. You can also visualize a model as a sunburst and color it by prediction or confidence to ease the discovery of insights in your data.
- Download your model in a variety of programming languages or as PMML.
- Generate predictions using automatically generated web forms.
- Evaluate your model to get an estimate of its predictive power.
Generate Predictions
To create a prediction, you can use either an automatically generated web form or a question-by-question interface. Just click the predict or predict question buttons and you'll be presented with the corresponding interfaces. When the number of input fields is large or you want to automate the generation of predictions, you can use our API. Stay tuned for our upcoming High Performance Prediction Servers that will allow you to generate thousands of predictions per second.
A prediction view shows you a form to input a value for each field. Once you click the Predict button, you will get a prediction for the model's objective field together with the level of confidence (or expected error for regression models). You can give a name to each prediction and save it for further use.
- Generate question-by-question predictions.
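To make the idea of a tree model producing a prediction with a confidence more concrete, the sketch below walks a small hand-written tree from the root to a leaf. The nested-dictionary layout, the field names, the thresholds, and the confidence values are all made up for illustration and do not reflect BigML's model format. Stopping at the current node when an input value is missing is likewise an assumption of this sketch.

```python
# Hypothetical tree: every node carries a prediction and a confidence;
# internal nodes also carry a split (field, threshold) and two children.
TREE = {
    "prediction": "versicolor", "confidence": 0.30,
    "field": "petal length", "threshold": 2.45,
    "left": {"prediction": "setosa", "confidence": 0.90},
    "right": {
        "prediction": "versicolor", "confidence": 0.45,
        "field": "petal width", "threshold": 1.75,
        "left": {"prediction": "versicolor", "confidence": 0.85},
        "right": {"prediction": "virginica", "confidence": 0.88},
    },
}


def predict(node: dict, instance: dict):
    """Follow the splits that match `instance`; return (prediction, confidence).

    If the instance lacks the field a node splits on, the walk stops and
    that node's own prediction is returned (an assumption of this sketch).
    """
    while "field" in node:
        value = instance.get(node["field"])
        if value is None:
            break
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["prediction"], node["confidence"]
```

A question-by-question interface can be thought of as the same walk done interactively: each answer selects the next branch, and only the fields along the chosen path need to be asked.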
Create Ensembles
To create an ensemble, just use the one-click ensemble button from a dataset view if you want to use all the fields and all the instances in your dataset to predict the last field (objective field). You can also use a configure ensemble panel to change model-specific parameters as well as the total number of models in the ensemble, its type, and the degree of parallelization used to build it. The ensemble will build as many models as you request using the level of parallelism selected or according to your subscription plan.
The ensemble view will display the configuration options for your ensemble and a table with a row per model in the ensemble. You will be able to inspect each model individually and see its data and predicted distributions.
- Generate a prediction using your new ensemble.
- Evaluate your ensemble to estimate how well it performs when generating predictions.
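The idea of an ensemble, several models built over different subsamples whose predictions are combined, can be sketched as bagging with a majority vote. To keep the example short, each model here is a one-split decision stump over a single numeric feature, a stand-in for the full decision trees an ensemble would normally contain. The function names, the sampling with replacement, and the default parameters are assumptions of this sketch, and the models are built sequentially rather than in parallel.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-split classifier on rows of (x, label) pairs.

    Tries every observed x as a threshold and keeps the one that
    misclassifies the fewest rows.
    """
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = Counter(label for x, label in rows if x <= threshold)
        right = Counter(label for x, label in rows if x > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = (sum(left.values()) - left[left_label]) + (
            sum(right.values()) - right[right_label]
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def train_ensemble(rows, n_models: int = 10, sample_rate: float = 1.0, seed: int = 0):
    """Train n_models stumps, each on a bootstrap subsample of rows."""
    rng = random.Random(seed)
    size = max(1, int(len(rows) * sample_rate))
    return [
        train_stump([rng.choice(rows) for _ in range(size)])
        for _ in range(n_models)
    ]


def predict_ensemble(models, x):
    """Combine the individual predictions by majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]
```

Because each model sees a slightly different sample, individual mistakes tend not to coincide, which is why the combined vote can generalize better than any single model.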
Evaluate Models or Ensembles
To evaluate a model (or ensemble), just click on the evaluate menu option from either the model (or ensemble) view or the dataset view. A configuration panel will help you make sure that the fields from the model and dataset match if they weren't created from the same source. You can also select a sample from the dataset. For classification models, several performance measures (accuracy, precision, recall, f-measure, and Phi) will be computed for each of the classes, along with the corresponding average measure for the whole model. In the case of regression models, the mean absolute error, the mean squared error, and the R squared will be computed.
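The classification measures named above can be computed from actual and predicted labels as follows. This is a sketch using the standard one-vs-rest definitions, with Phi taken to be the phi (Matthews correlation) coefficient. The function names and the unweighted averaging across classes are assumptions and may differ from how BigML aggregates its results.

```python
import math


def class_measures(actual, predicted, positive) -> dict:
    """Precision, recall, f-measure, and phi for one class versus the rest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "phi": phi}


def evaluate_classification(actual, predicted) -> dict:
    """Accuracy, per-class measures, and their unweighted average."""
    classes = sorted(set(actual) | set(predicted))
    per_class = {c: class_measures(actual, predicted, c) for c in classes}
    average = {
        name: sum(m[name] for m in per_class.values()) / len(classes)
        for name in ("precision", "recall", "f_measure", "phi")
    }
    accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)
    return {"accuracy": accuracy, "per_class": per_class, "average": average}
```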
An evaluation will show you the details of the model (or ensemble) and the dataset used and will display a number of performance measures computed depending on the type (classification or regression) of the model or ensemble. You will also be able to compare the performance of a model (or ensemble) versus a mode-based (or mean-based) model and a random model. An evaluation can also be compared against a previous evaluation if both have been performed using the same dataset and the same sampling.
- Compare evaluations.
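For regression, the comparison against a mean-based model mentioned above can be sketched by computing the same measures for the model's predictions and for a baseline that always predicts a single mean value. The function names are hypothetical, and the choice of which mean the baseline uses (here, one supplied by the caller, such as the mean of the training objective values) is an assumption.

```python
def regression_measures(actual, predicted) -> dict:
    """Mean absolute error, mean squared error, and R squared."""
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1 - (mse * n) / ss_total if ss_total else 0.0
    return {"mean_absolute_error": mae,
            "mean_squared_error": mse,
            "r_squared": r_squared}


def compare_with_mean_baseline(actual, predicted, baseline_mean) -> dict:
    """Evaluate the model and a constant mean-based model on the same data."""
    baseline = [baseline_mean] * len(actual)
    return {
        "model": regression_measures(actual, predicted),
        "mean_based": regression_measures(actual, baseline),
    }
```

A model is only useful if its errors are clearly lower than those of the baseline; an R squared close to zero means it does little better than always predicting the mean.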