Detecting Unusual Instances
Find the top anomalous instances in your dataset and easily select or filter them. This technique is widely used for fraud detection, data cleansing tasks, predictive maintenance, and intrusion detection, among others.
Anomaly Detection helps identify outliers in your data. The BigML platform provides one of the most effective, state-of-the-art methods to detect unusual patterns that may point to fraud or data quality issues without the need for labeled data. This unsupervised learning technique assigns each instance in your dataset a score between 0% and 100%, where a score of 60% or above usually points to outliers. In some cases, isolating and removing outliers may result in notable improvements in the accuracy and evaluation performance of classification and regression models.
Anomaly detection is extensively used in fraud detection, predictive maintenance, intrusion detection, and any other setting where the practitioner is looking for unusual behavior, the proverbial needle in a haystack. Anomaly detection is also handy for identifying data quality issues so you can clean your data and reduce noise before training any models.
BigML offers an optimized implementation of the Isolation Forest algorithm, a highly scalable method that is competitive with state-of-the-art anomaly detection techniques. Anomaly detection based on this approach has empirically performed significantly better than other state-of-the-art methods. Because this is an unsupervised method, there is no need to label your data. BigML Anomaly Detection is very robust to noise, highly efficient in terms of computational costs, almost parameter-free, and can handle numeric and categorical data with or without missing values.
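To make the idea behind an isolation forest more concrete, here is a minimal sketch in plain Python. It is not BigML's optimized implementation: it handles numeric features only, ignores missing values, and the function names and defaults (`anomaly_scores`, 100 trees, samples of up to 256 rows) are illustrative assumptions. The principle is that anomalies are easier to isolate with random splits, so they end up with shorter average paths in the trees and scores closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Approximate average path length of an unsuccessful search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    low, high = min(values), max(values)
    if low == high:
        # Simplification: stop when the chosen feature is constant here.
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    left = [row for row in rows if row[feature] < split]
    right = [row for row in rows if row[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, adjusted for the leaf's size."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    side = "left" if row[node["feature"]] < node["split"] else "right"
    return path_length(row, node[side], depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return one score in (0, 1] per row; higher means more anomalous."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    scores = []
    for row in rows:
        mean_path = sum(path_length(row, tree) for tree in trees) / n_trees
        scores.append(2.0 ** (-mean_path / norm))
    return scores
```

Calling `anomaly_scores` on a list of numeric tuples returns one score per row. Multiplying a score by 100 gives a percentage on the same 0% to 100% scale described above, although the exact values BigML reports may differ from this simplified version.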
BigML Anomalies not only identify the anomalous instances but also highlight which features contribute the most to the identified anomalies. BigML makes this possible by automatically computing the relative importance of each feature to the anomaly scores. You can conveniently visualize your feature importance rankings by using the histograms in the accompanying data panel, which lets you inspect each feature's impact on any anomalous instance.
You have several options for putting your Anomaly Detector to use when it comes to scoring. You can score individual data points against your model by using the BigML Dashboard. BigML will provide you with an anomaly score as a percentage, where higher scores reflect greater anomalies. You can also score multiple instances through BigML's Batch Anomaly Score feature. The output can be downloaded as a customizable .csv file that contains the scores and, optionally, the field importances. You can also ask BigML to create a new dataset that excludes (or only includes) the anomalous instances for further analyses.
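As a rough local illustration of what these outputs look like, the sketch below takes rows that already have anomaly scores, writes them to CSV text with a score column, and splits them into anomalous and regular subsets. The helper names (`split_by_score`, `scores_to_csv`) and the column layout are hypothetical and do not reflect BigML's actual export format. The default threshold of 0.6 mirrors the 60% rule of thumb, with scores expressed as fractions between 0 and 1.

```python
import csv
import io


def split_by_score(rows, scores, threshold=0.6):
    """Partition rows into (anomalous, regular) using a score threshold."""
    anomalous, regular = [], []
    for row, score in zip(rows, scores):
        (anomalous if score >= threshold else regular).append(row)
    return anomalous, regular


def scores_to_csv(rows, scores, header):
    """Return CSV text with each row followed by its score as a percentage."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(header) + ["score"])
    for row, score in zip(rows, scores):
        writer.writerow(list(row) + [f"{score:.2%}"])
    return buffer.getvalue()
```

Keeping only the `regular` subset corresponds to a dataset that excludes the anomalous instances, while the `anomalous` subset corresponds to one that only includes them.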
In addition to the point-and-click mode on the BigML Dashboard, you can create and manage your Anomaly Detectors programmatically via BigML's REST API and bindings for most popular programming languages, e.g., Python, Node.js, Java, Swift, and C#. Because the BigML Dashboard is built on the same API, any model you train and every prediction you make from those models are also accessible from the Dashboard immediately. Anomaly Detectors are also supported by WhizzML, our domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and sharing them with others.
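For example, a programmatic workflow with the Python bindings might look roughly like the sketch below. It assumes the `bigml` package is installed and that credentials are available in the `BIGML_USERNAME` and `BIGML_API_KEY` environment variables. The wrapper function `train_and_score` is a hypothetical helper, and the call names should be checked against the current bindings documentation before relying on them.

```python
def train_and_score(csv_path, input_data):
    """Train an anomaly detector on a CSV file and score one instance.

    Hypothetical helper: needs the `bigml` package, network access,
    and valid BigML credentials in the environment.
    """
    # Imported lazily so this module loads without the bindings installed.
    from bigml.api import BigML

    # With no arguments, the client reads credentials from the environment.
    api = BigML()

    source = api.create_source(csv_path)
    api.ok(source)  # wait until the source is ready
    dataset = api.create_dataset(source)
    api.ok(dataset)
    anomaly = api.create_anomaly(dataset)
    api.ok(anomaly)

    # Score a single instance given as a {field name: value} dictionary.
    anomaly_score = api.create_anomaly_score(anomaly, input_data)
    api.ok(anomaly_score)
    return anomaly_score
```

A call such as `train_and_score("transactions.csv", {"amount": 9500})` would return the anomaly score resource as a dictionary; the file name and field name here are placeholders.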