AB_TEST

The AB_TEST command calculates several statistics for comparing groups, including a p-value and a statistical-significance flag, to determine whether two or more groups differ significantly. It is most commonly applied to A/B testing, but it can be used for other group comparisons as well, including tests with more than two groups (A/B/C testing) and comparisons of numerical or categorical outcomes.

The AB_TEST command takes two inputs: the target column (the column to compare, e.g. conversion, churn, or click-through rate) and the treatment column (the group each record belongs to, e.g. A/B, gender, or location). By default, the treatment column is assumed to be named treatment; if it has a different name, specify it explicitly.

The output contains several new columns, which are explained in the Returns section below.

Technical details:

  • numeric: T-test for the means of two independent samples of scores.
  • categorical: Chi-squared test of independence of variables.
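
To make the two tests concrete, here is a minimal Python sketch of the textbook calculations. It is not the AB_TEST implementation: the function names are hypothetical, the t statistic assumes equal variances (Student's form), and the chi-squared example is limited to a 2x2 table with no continuity correction, which AB_TEST may or may not apply.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    # Pooled sample variance, weighted by each group's degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 contingency table.

    No continuity correction is applied. With one degree of freedom the
    p-value has the closed form erfc(sqrt(chi2 / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Numeric outcome: illustrative values for two groups.
t_stat = pooled_t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Categorical outcome: rows are groups, columns are outcome categories.
chi2, p_value = chi_squared_2x2([[10, 20], [20, 10]])
```

The t statistic still has to be compared against a t distribution to obtain a p-value; that step is omitted here because the Python standard library has no t-distribution CDF.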

Syntax

AB_TEST(<column_name>, [treatment=<column_name>])

Options

  • treatment specifies the column that defines the groups (A/B, gender, etc.). Default: treatment.

Returns

Returns the statistical outputs listed below, which let you quickly determine whether the groups differ significantly from one another:

  • numeric:
    • count - total number of valid (non-NULL) values in column
    • mean - mean value of outcome for each treatment
    • mean_upper - upper bound (95th percentile) of the confidence interval on the mean value of outcome for each treatment
    • mean_lower - lower bound (5th percentile) of the confidence interval on the mean value of outcome for each treatment
    • indexing - relative index of the treatment mean compared to all treatments
    • p_value - the p-value calculated by the statistical test (independent t-test)
    • stat_sig - if statistically significant (p_value < 0.05) then returns 1, else 0.
  • categorical:
    • count - total number of valid (non-NULL) values in column
    • percentage - percentage of outcome for each treatment
    • percentage_upper - upper bound (95th percentile) of the confidence interval on the percentage of outcome for each treatment
    • percentage_lower - lower bound (5th percentile) of the confidence interval on the percentage of outcome for each treatment
    • indexing - relative index of the treatment percentage compared to all treatments
    • p_value - the p-value calculated by the statistical test (chi-squared)
    • stat_sig - if statistically significant (p_value < 0.05) then returns 1, else 0.
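
As a rough illustration of how some of the numeric columns relate to one another, the Python sketch below computes count, mean, and interval bounds for one treatment group, plus the stat_sig flag. It rests on assumptions not confirmed by this page: the bounds are taken as the 5th and 95th percentiles of a normal approximation to the sampling distribution of the mean, and the function names are hypothetical. AB_TEST may compute its intervals differently.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize_group(values, lower_pct=0.05, upper_pct=0.95):
    """Mean of one treatment group with normal-approximation interval bounds."""
    n = len(values)
    centre = mean(values)
    # Standard error of the mean from the sample standard deviation.
    std_err = stdev(values) / math.sqrt(n)
    # Assumption: the sampling distribution of the mean is approximately normal.
    dist = NormalDist(mu=centre, sigma=std_err)
    return {
        "count": n,
        "mean": centre,
        "mean_lower": dist.inv_cdf(lower_pct),
        "mean_upper": dist.inv_cdf(upper_pct),
    }


def stat_sig(p_value, alpha=0.05):
    """Return 1 when p_value is below alpha, else 0 (the rule documented above)."""
    return 1 if p_value < alpha else 0


summary = summarize_group([1, 2, 3, 4, 5])
flag = stat_sig(0.03)
```

Under these assumptions the bounds are symmetric around the mean, and a p_value of exactly 0.05 is not flagged as significant because the comparison is strict.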

Examples

A/B test to see if a new feature increases usage.

SELECT * FROM user AB_TEST(total_hours_active_per_week, treatment=feature_group)

Statistical test to see whether gender affects the types of products users buy.

SELECT * FROM user AB_TEST(product_category, treatment=gender)