DESCRIBE

The DESCRIBE command summarises your dataset with a set of standard summary statistics, giving you a quick overview without deeper analysis.

The DESCRIBE command takes no input columns as arguments; by default it works on all of the data returned by the SELECT statement. Use the ignore option to exclude specific columns.

The output contains a new column, description, which names the summary metric reported in each row. The remaining columns hold the summaries of either the numeric columns in your dataset (data_type=numeric, the default) or the categorical columns (data_type=categorical).

Syntax

DESCRIBE([data_type=<data_type>, ignore=<column_names>])

Options

  • ignore specifies columns (as a comma-separated list) that are returned by the SELECT statement but that DESCRIBE should leave out of the summary.
  • data_type specifies which data type to analyse. Must be either categorical or numeric; the default is numeric.

Returns

Adds a new column named description, which names the summary metric reported in each row. The output also contains one column for each input feature of type data_type (numeric or categorical). Each of these columns is named after its input column and holds, in every row, the value of that row's metric for the column.
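The shape of this output can be illustrated with a minimal Python sketch. This is not how DESCRIBE is implemented; the function name describe_layout, the sample customers data, and the rule used to detect numeric columns are all assumptions made for illustration, and only four of the metrics are computed to keep the focus on the layout.

```python
from statistics import mean


def describe_layout(rows, ignore=()):
    """Build a description-style table for the numeric columns of `rows`.

    `rows` is a list of dicts, one per input row; None stands in for NULL.
    Hypothetical helper: only count, mean, min and max are computed.
    """
    # Column names are taken from the first row, minus the ignored ones.
    columns = [c for c in rows[0] if c not in ignore]

    # Keep columns whose non-NULL values are all numbers (bool excluded).
    numeric = {}
    for col in columns:
        values = [r[col] for r in rows if r[col] is not None]
        if values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        ):
            numeric[col] = values

    metrics = {"count": len, "mean": mean, "min": min, "max": max}

    # One output row per metric; the description column names the metric,
    # and every other column is named after its input column.
    return [
        {"description": name, **{col: fn(vals) for col, vals in numeric.items()}}
        for name, fn in metrics.items()
    ]


customers = [
    {"name": "Ada", "age": 36, "spend": 120.0},
    {"name": "Bo", "age": 41, "spend": None},
    {"name": "Cy", "age": 29, "spend": 80.0},
]

# Mimics ignore=spend: only the age column is summarised.
table = describe_layout(customers, ignore=("spend",))
for row in table:
    print(row)
```

Each printed row corresponds to one metric, so the table has as many rows as there are metrics and as many value columns as there are summarised input columns.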

The description metrics are:

  • numeric:
    • count - total number of valid (non-NULL) values in the column
    • mean - arithmetic mean of the values
    • std - standard deviation
    • min - minimum value
    • max - maximum value
    • 10% - 10th percentile value
    • 25% - 25th percentile value
    • 50% - median value
    • 75% - 75th percentile value
    • 90% - 90th percentile value
  • categorical:
    • count - total number of valid (non-NULL) values in the column
    • unique - number of unique values
    • top - the most frequent value
    • freq - frequency of the top value
    • first - (timestamps only) - first timestamp
    • last - (timestamps only) - last timestamp
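To make the numeric metrics concrete, the sketch below computes them for a single column in plain Python. It is an illustration, not the DESCRIBE implementation. Two details are assumptions, because this page does not specify them: std is computed as the sample standard deviation (dividing by n - 1), and percentiles use linear interpolation between neighbouring ranks. The function names and the sample ages data are hypothetical.

```python
from statistics import mean, stdev


def percentile(sorted_values, q):
    """q-th percentile (0-100) with linear interpolation between ranks.

    Assumed method; DESCRIBE may use a different percentile definition.
    """
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def describe_numeric(values):
    """Summarise one numeric column; None stands in for NULL.

    Needs at least two non-NULL values, because stdev() requires them.
    """
    valid = sorted(v for v in values if v is not None)
    summary = {
        "count": len(valid),
        "mean": mean(valid),
        # Sample standard deviation (n - 1); an assumption, see above.
        "std": stdev(valid),
        "min": valid[0],
        "max": valid[-1],
    }
    for q in (10, 25, 50, 75, 90):
        summary[f"{q}%"] = percentile(valid, q)
    return summary


ages = [20, 30, None, 40, 50, 60]
summary = describe_numeric(ages)
for metric, value in summary.items():
    print(metric, value)
```

The NULL entry is skipped, so count is 5 rather than 6, and the 50% row equals the median of the remaining values.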

Examples

DESCRIBE the numeric values in a customer table.

SELECT * FROM customer DESCRIBE

DESCRIBE the numeric values in a customer table, setting data_type explicitly.

SELECT * FROM customer DESCRIBE(data_type='numeric')

DESCRIBE the categorical values in a customer table.

SELECT * FROM customer DESCRIBE(data_type='categorical')
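As a rough guide to what the categorical summary reports for each column, here is a minimal Python sketch of the categorical metrics. It is not the DESCRIBE implementation: the function name and sample data are hypothetical, freq is assumed to be the number of occurrences of the top value, and the way ties for top are broken here may differ from DESCRIBE.

```python
from collections import Counter
from datetime import datetime


def describe_categorical(values):
    """Summarise one categorical column; None stands in for NULL.

    Assumes the column has at least one non-NULL value.
    """
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    # most_common(1) returns the value with the highest count; ties go to
    # the value seen first, which may differ from how DESCRIBE breaks ties.
    top, freq = counts.most_common(1)[0]
    summary = {
        "count": len(valid),
        "unique": len(counts),
        "top": top,
        "freq": freq,
    }
    # first and last are reported for timestamp columns only.
    if all(isinstance(v, datetime) for v in valid):
        summary["first"] = min(valid)
        summary["last"] = max(valid)
    return summary


plans = ["basic", "pro", None, "basic", "team", "basic"]
plan_summary = describe_categorical(plans)
print(plan_summary)

signups = [datetime(2023, 1, 5), datetime(2022, 12, 1), datetime(2023, 1, 5)]
signup_summary = describe_categorical(signups)
print(signup_summary)
```

For the plans column the NULL entry is excluded from count, and the first and last keys are absent because the values are not timestamps.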