PREDICT
The PREDICT
command lets you run a prediction on a given column using the rest of the column names specified in the SELECT
statement as inputs to the prediction.
This means that what you specify in the column specification part of the SELECT
statement determines the output prediction, so you often have to experiment with what inputs work best for a specific prediction.
Syntax
PREDICT(<column_name> [, options])
column_name
- the column to predict.
Options
model
- Specifies a particular model override. Default is "clf" for classification. Use "reg" for regression.use_automl
- Enables automated machine learning. Default is False.auto_remove
- Automatically removes irrelevant features from the model. Default is True.transform_numeric
- Transforms numeric features for better model performance. Default is True.transform_time
- Transforms time features for better model performance. Default is True.what_if
- Runs a what-if analysis for a given feature. Default is None.ab_test_with
- Allows for A/B testing using a specified ID. Default is None.abc_test_with
- Allows for A/B/C testing using a specified ID. Default is None.max_classes
- Limits the number of classes in classification models. Default is 10.ignore
- Comma-separated list of column names to ignore during prediction. No spaces between names.selector
- A regular expression to subselect columns to be used for prediction.
Returns
What is returned by the model depends on the exact options used.
model='clf'
: In the case of a classification with multiple classes, a column is appended to the input dataset with a column for each class prefixed withprobability_
. For common binary classes (0/1, True/False, Yes/No), we return onlyprobability
, representing the positive class (1, True, Yes).model='reg'
: In the case of a regression, a column is appended to the input dataset with the column nameprediction
.transform_numeric=True
: Additional numeric columns may be added aslog_<column>
if the column spans >2.5 orders of magnitude.transform_time=True
: Additional time columns may be added as<column>_day
,<column>_month
, etc, if the column is a recognised datetime column.what_if=<column>
: Additionalprobability_<column_value>
columns will be added, where every rows'<column>
value has been changed to that value and the probability recalculated. This allows users to dowhat_if
scenarios.ab_test_with=<column>
:probability_ab
andab_group
columns will return, showing the group each row belongs to.A
is the same asprobability
(i.e. the treated group) andB
is a randomly chosen probability from the distribution (i.e. the control group).abc_test_with=<column>
:probability_abc
andabc_group
columns will return, showing the group each row belongs to.A
is the same asprobability
(i.e. the treated group) andB
is a randomly chosen probability from the distribution (i.e. the control group) andC
returnsNone
.
Examples
Basic Prediction
Predict the column name churn
using all other fields in the table customers
.
SELECT * FROM customers PREDICT(churn)
Specify Input and Output
Predict the column name churn
using only the columns country
and age
from the table customers
. Also return the prediction
with these input variables.
SELECT country, age, churn, prediction FROM customers PREDICT(churn)
Ignoring Specific Columns
Predict the column name churn
using the columns country
and age
from the table customers
and ignore the user_id
and name
columns returned by the SELECT
statement. Also return the prediction
with these input variables.
SELECT user_id, name, country, age, churn, prediction FROM customers PREDICT(churn, auto_remove=True)
Using Automated Machine Learning
Enable the automated machine learning feature to predict conversion_rate
in the table marketing_data
.
SELECT * FROM marketing_data PREDICT(conversion_rate, use_automl=True)
What-If Analysis
Perform a what-if analysis for the feature age
while predicting churn
.
SELECT * FROM customers PREDICT(churn, what_if='age')
Running A/B Tests
Run an A/B test using user_id
as the ID while predicting subscription_renewal
in the table subscriptions
.
SELECT * FROM subscriptions PREDICT(subscription_renewal, ab_test_with='user_id')
Running A/B/C Tests
Run an A/B/C test using user_id
as the ID while predicting purchase
in the table e_commerce
.
SELECT * FROM e_commerce PREDICT(purchase, abc_test_with='user_id')
Limiting Number of Classes
Limit the number of classes to 5 while predicting product_category
in the table inventory
.
SELECT * FROM inventory PREDICT(product_category, max_classes=5)
Ignoring Specific Columns
SELECT * FROM customers PREDICT(churn, ignore='user_id,name')
Using Regular Expression for Column Selection
SELECT * FROM customers PREDICT(churn, selector='^age|country$')