Skip to main content

PREDICT

The PREDICT command lets you run a prediction on a given column using the rest of the column names specified in the SELECT statement as inputs to the prediction.

This means that what you specify in the column specification part of the SELECT statement determines the output prediction, so you often have to experiment with what inputs work best for a specific prediction.

Syntax

PREDICT(<column_name> [, options])
  • column_name - the column to predict.

Options

  • model - Specifies a particular model override. Default is "clf" for classification. Use "reg" for regression.
  • use_automl - Enables automated machine learning. Default is False.
  • auto_remove - Automatically removes irrelevant features from the model. Default is True.
  • transform_numeric - Transforms numeric features for better model performance. Default is True.
  • transform_time - Transforms time features for better model performance. Default is True.
  • what_if - Runs a what-if analysis for a given feature. Default is None.
  • ab_test_with - Allows for A/B testing using a specified ID. Default is None.
  • abc_test_with - Allows for A/B/C testing using a specified ID. Default is None.
  • max_classes - Limits the number of classes in classification models. Default is 10.
  • ignore - Comma-separated list of column names to ignore during prediction. No spaces between names.
  • selector - A regular expression to subselect columns to be used for prediction.

Returns

What is returned by the model depends on the exact options used.

  • model='clf': In the case of a classification with multiple classes, a column is appended to the input dataset with a column for each class prefixed with probability_. For common binary classes (0/1, True/False, Yes/No), we return only probability, representing the positive class (1, True, Yes).
  • model='reg': In the case of a regression, a column is appended to the input dataset with the column name prediction.
  • transform_numeric=True: Additional numeric columns may be added as log_<column> if the column spans >2.5 orders of magnitude.
  • transform_time=True: Additional time columns may be added as <column>_day, <column>_month, etc, if the column is a recognised datetime column.
  • what_if=<column>: Additional probability_<column_value> columns will be added, where every rows' <column> value has been changed to that value and the probability recalculated. This allows users to do what_if scenarios.
  • ab_test_with=<column>: probability_ab and ab_group columns will return, showing the group each row belongs to. A is the same as probability (i.e. the treated group) and B is a randomly chosen probability from the distribution (i.e. the control group).
  • abc_test_with=<column>: probability_abc and abc_group columns will return, showing the group each row belongs to. A is the same as probability (i.e. the treated group) and B is a randomly chosen probability from the distribution (i.e. the control group) and C returns None.

Examples

Basic Prediction

Predict the column name churn using all other fields in the table customers.

SELECT * FROM customers PREDICT(churn)

Specify Input and Output

Predict the column name churn using only the columns country and age from the table customers. Also return the prediction with these input variables.

SELECT country, age, churn, prediction FROM customers PREDICT(churn)

Ignoring Specific Columns

Predict the column name churn using the columns country and age from the table customers and ignore the user_id and name columns returned by the SELECT statement. Also return the prediction with these input variables.

SELECT user_id, name, country, age, churn, prediction FROM customers PREDICT(churn, auto_remove=True)

Using Automated Machine Learning

Enable the automated machine learning feature to predict conversion_rate in the table marketing_data.

SELECT * FROM marketing_data PREDICT(conversion_rate, use_automl=True)

What-If Analysis

Perform a what-if analysis for the feature age while predicting churn.

SELECT * FROM customers PREDICT(churn, what_if='age')

Running A/B Tests

Run an A/B test using user_id as the ID while predicting subscription_renewal in the table subscriptions.

SELECT * FROM subscriptions PREDICT(subscription_renewal, ab_test_with='user_id')

Running A/B/C Tests

Run an A/B/C test using user_id as the ID while predicting purchase in the table e_commerce.

SELECT * FROM e_commerce PREDICT(purchase, abc_test_with='user_id')

Limiting Number of Classes

Limit the number of classes to 5 while predicting product_category in the table inventory.

SELECT * FROM inventory PREDICT(product_category, max_classes=5)

Ignoring Specific Columns

SELECT * FROM customers PREDICT(churn, ignore='user_id,name')

Using Regular Expression for Column Selection

SELECT * FROM customers PREDICT(churn, selector='^age|country$')