# CORRELATION

The `CORRELATION`

command calculates the correlations in your dataset between each input column, including both numeric can categorical columns.

The `CORRELATION`

command takes no inputs, since it uses all input data.
This can be adjusted by using the `ignore`

option to ignore certain columns.

The output will contain a new field `column`

, that shows which correlation is being made to which input column.
Additionally, the output will contain NxN correlation values, i.e. correlations for all other columns and itself.

Note that input columns are exactly correlated with themselves, so the diagonal part of the result will be a set of ones. This is normal!

Technical details:

- numeric-numeric comparisons: Pearson's standard correlation coefficient
- categorical-categorical: Cramer's V statistic
- numeric-categorical: Correlation ratio

# Syntax

`CORRELATION([, ignore=<column_names>])`

**Options**

`ignore`

can be used to specify columns (as a comma separated list) returned by the`SELECT`

statement but which you want the`CORRELATION`

to ignore.

# Returns

Appends a new column to the input dataset named `column`

which describes which column the comparison is being made to for each row.
A column for each input feature that contain N (number of columns) correlation values, i.e. correlations for all other columns and itself.

# Examples

Get the CORRELATION values in a `customer`

table.

`SELECT * FROM customer CORRELATION`

Ignore the `customer_id`

.

`SELECT * FROM customer CORRELATION(ignore='customer_id')`

Show correlations only for gender.

`SELECT * FROM customer CORRELATION WHERE column='gender'`