Given an input dataset, we use SMACdown to find the best parameters for creating an ensemble from that dataset.
Besides the identifier of the dataset, the script takes as inputs the evaluation metric to maximize (defaulting to average_phi), the objective field, and a string used as a prefix when naming intermediate resources created by the workflow. The metrics you can select for optimization are listed below.
Classification metrics:
average_recall
average_phi
accuracy
average_precision
average_f_measure
Regression metrics:
r_squared
mean_absolute_error
mean_squared_error
This workflow generates a large number of auxiliary resources when executed. To have the script delete all of them before finishing, set the delete-resources execution input parameter to true.
This routine implements K-means clustering using the Pham-Dimov-Nguyen algorithm for choosing the best K.
"Selection of K in K-means clustering", Proc. IMechE, Part C: J. Mechanical Engineering Science, v.219
(best-cluster ...)
(best-cluster dataset cluster-args k)
Inputs:
* dataset: (string) Dataset ID for the dataset to be clustered
* cluster-args: (map) cluster function arguments
* k: (number) number of clusters
Output: (cluster) Created cluster for K
This function is used by (best-k-means ...) to do a K-means clustering of the dataset using the WhizzML cluster function with the specified K.
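As a rough illustration of how the argument map for one fixed-K run could be assembled, the Python sketch below merges the caller's cluster-args with the values the workflow controls itself. The helper name `build_cluster_request`, the resource name format, and the placeholder dataset ID are hypothetical; the script itself is written in WhizzML, so this only mirrors the behaviour described here.

```python
from typing import Any, Dict

# Keys the workflow sets itself; this sketch assumes caller-supplied
# values for them are dropped rather than honoured.
RESERVED_KEYS = ("dataset", "k", "name")


def build_cluster_request(dataset: str, cluster_args: Dict[str, Any], k: int) -> Dict[str, Any]:
    """Return the argument map for one clustering run with a fixed K."""
    request = {key: value for key, value in cluster_args.items()
               if key not in RESERVED_KEYS}
    request["dataset"] = dataset
    request["k"] = k
    request["name"] = f"k-means k={k}"  # hypothetical naming scheme
    return request


# "dataset/abc" is a placeholder, not a real dataset ID.
request = build_cluster_request("dataset/abc", {"sample_rate": 0.5, "k": 99}, 3)
```

Here the stray `k` in the caller's map is ignored and the explicit `k` argument wins, which is one way to honour the rule that dataset, k, and name are not taken from cluster-args.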
This routine implements K-means clustering using the Pham-Dimov-Nguyen algorithm for choosing the best K.
"Selection of K in K-means clustering", Proc. IMechE, Part C: J. Mechanical Engineering Science, v.219
Inputs:
* dataset: (string) Dataset ID for the dataset to be clustered
* cluster-args: (map) cluster arguments for the cluster search operation
* k-min: (number) minimum value of k
* k-max: (number) maximum value of k
* bestcluster-args: (map) cluster arguments for the final best cluster operation
* clean: (boolean) Delete all but the optimal cluster
* logf: (boolean) Enable logging
Output: (batchcentroid) Batchcentroid for best K-means clustering
This routine uses the Pham-Dimov-Nguyen algorithm to create a WhizzML batchcentroid object and a WhizzML dataset annotated with the best K-means clustering of the supplied dataset.
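For readers unfamiliar with the selection criterion, here is a minimal Python sketch of the evaluation function f(K) as defined in the cited paper, assuming the total distortion S_K of each clustering is already known. The function names and the dictionary layout are illustrative assumptions, not the script's own WhizzML code, which may compute these quantities differently.

```python
from typing import Dict


def alpha(k: int, n_dims: int) -> float:
    """Weight factor alpha_K, defined in the paper for K >= 2 and n_dims > 1."""
    if k < 2 or n_dims < 2:
        raise ValueError("alpha is defined for k >= 2 and n_dims > 1")
    a = 1.0 - 3.0 / (4.0 * n_dims)  # alpha_2
    for _ in range(3, k + 1):
        a = a + (1.0 - a) / 6.0  # alpha_K from alpha_{K-1}
    return a


def f_k(distortions: Dict[int, float], k: int, n_dims: int) -> float:
    """Evaluation function f(K); smaller values indicate a better K.

    distortions maps K to S_K and must contain K-1 whenever K > 1.
    """
    if k == 1:
        return 1.0
    s_prev = distortions[k - 1]
    if s_prev == 0:
        return 1.0
    return distortions[k] / (alpha(k, n_dims) * s_prev)


def best_k(distortions: Dict[int, float], k_min: int, k_max: int, n_dims: int) -> int:
    """Return the K in [k_min, k_max] that minimises f(K)."""
    return min(range(k_min, k_max + 1),
               key=lambda k: f_k(distortions, k, n_dims))


# Made-up distortions for a 2-dimensional dataset.
example = {1: 100.0, 2: 40.0, 3: 10.0, 4: 9.0, 5: 8.5}
chosen = best_k(example, 1, 5, 2)
```

With these made-up distortions the large drop from K=2 to K=3 gives f(3) the smallest value, so K=3 is chosen.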
The cluster-args and bestcluster-args parameters are maps that one can use to optionally specify all the parameters for the cluster function except the dataset, k, and name parameters. (See the 'Clusters Arguments' table in the BigML 'Clusters' documentation for details.) cluster-args is used in the search phase for the best K. bestcluster-args allows one to specify different args for the final stage of clustering with the best K. In particular, one might do clustering on samples of the dataset during the search phase to save time and other resources, then do the best clustering on the full dataset.
If bestcluster-args matches cluster-args, the result for the best K generated with cluster-args during the search phase is returned by (best-k-means ...). If bestcluster-args differs from cluster-args, the dataset is re-clustered with the best K and that is returned by (best-k-means ...).
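The choice between reusing the search-phase result and re-clustering can be sketched in Python as follows. The function `final_cluster` and the `recluster` callback are hypothetical stand-ins for the script's WhizzML logic, and the `sample_rate` key in the usage lines is assumed to be the BigML sampling argument.

```python
from typing import Any, Callable, Dict


def final_cluster(
    search_clusters: Dict[int, Any],
    best_k: int,
    cluster_args: Dict[str, Any],
    bestcluster_args: Dict[str, Any],
    recluster: Callable[[int, Dict[str, Any]], Any],
) -> Any:
    """Pick the cluster to return once the best K is known."""
    if bestcluster_args == cluster_args:
        # Same arguments: the search-phase result is already the answer.
        return search_clusters[best_k]
    # Different arguments: cluster again with the best K.
    return recluster(best_k, bestcluster_args)


# Usage: search on a 20% sample, then cluster the full dataset at the end.
def fake_recluster(k: int, args: Dict[str, Any]) -> str:
    return f"cluster with k={k} and args={args}"


result = final_cluster(
    {3: "search-phase cluster for k=3"},
    3,
    {"sample_rate": 0.2},
    {"sample_rate": 1.0},
    fake_recluster,
)
```

Because the two maps differ in this usage example, the callback is invoked and the search-phase cluster is not reused.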