The CLUSTER command groups the rows of the input data table into clusters of similar rows.
CLUSTER takes no inputs, since it uses all of the input data. This can be adjusted with the ignore option, which excludes certain columns from clustering.
The output is a new field, cluster_id, which holds the cluster label for each row in the data. Columns prefixed with probabilities_ are also returned, giving the probability of each data point belonging to a specific cluster.
You can read more about how CLUSTER works and how to get the best out of it in the tutorial.
CLUSTER([, min_cluster_size=<min_cluster_size>, ignore=<column_names>])
ignore can be used to specify columns (as a comma-separated list) that are returned by the SELECT statement but should be ignored when clustering.
min_cluster_size can be used to specify the minimum size of a cluster.
Appends a new column to the input dataset named
cluster_id which has an integer value and describes for each row
what cluster, or grouping, that row belongs to.
Some points may be considered outside any grouping, sometimes called noise. These are given the
One column per cluster is also appended to the input dataset, each prefixed with probabilities_.
Clusters all the rows in the customer table where churn=TRUE:
SELECT * FROM customer WHERE churn=TRUE CLUSTER()
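The optional parameters from the syntax line can be combined with the same query. The following is only a sketch: the customer_id column is a hypothetical identifier column assumed to exist in the customer table, and the minimum cluster size of 50 is an arbitrary illustrative value, not a recommended setting.

```sql
-- Sketch only: customer_id is a hypothetical column, 50 is an arbitrary value.
-- Cluster the churned customers, leaving the identifier column out of the
-- clustering and asking for clusters of at least 50 rows.
SELECT * FROM customer WHERE churn=TRUE CLUSTER(min_cluster_size=50, ignore=customer_id)
```

Ignoring an identifier column is a typical use of ignore, since a unique ID says nothing about how similar two rows are.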