
SIMILAR_TO

The SIMILAR_TO command computes a similarity score between one specific row and every other row in the data set.

The command appends a new column, similarity, that holds a score between 0 and 1 for each row. A score of 1 indicates that the two rows are identical.

Syntax

SIMILAR_TO(<column_name>=<value>)
  • column_name is the name of the column used to identify the row to compute similarities for, and value is the value to match in that column. The condition must identify a unique row in the data set.
  • ignore specifies columns (as a comma-separated list) that are returned by the SELECT statement but that SIMILAR_TO should ignore.

Returns

Appends one new column to the input data set: similarity. The column holds a value between 0 and 1 for each row, where 1 is the highest similarity score.
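
The shape of the result can be illustrated with a small Python sketch. This page does not state which metric SIMILAR_TO uses, so the sketch substitutes a toy measure (the fraction of compared columns whose values are equal) purely for illustration. The function append_similarity, its parameters, and the sample rows are hypothetical and are not part of the command.

```python
def append_similarity(rows, column_name, value, ignore=()):
    """Return copies of rows with an added 'similarity' column.

    Toy stand-in for SIMILAR_TO: the score is the fraction of compared
    columns whose values equal those of the reference row. The real
    command's metric is not documented here and may differ.
    """
    # The condition must identify exactly one reference row.
    matches = [row for row in rows if row[column_name] == value]
    if len(matches) != 1:
        raise ValueError("condition must identify a unique row")
    reference = matches[0]

    # Columns listed in `ignore` are returned but not compared.
    compared = [col for col in reference if col not in ignore]
    if not compared:
        raise ValueError("no columns left to compare")

    result = []
    for row in rows:
        equal = sum(1 for col in compared if row[col] == reference[col])
        # Copy the row so the input data is left unchanged.
        result.append({**row, "similarity": equal / len(compared)})
    return result


# Hypothetical sample data.
rows = [
    {"user_id": 50, "country": "DK", "plan": "pro"},
    {"user_id": 51, "country": "DK", "plan": "pro"},
    {"user_id": 52, "country": "SE", "plan": "free"},
]

# Ignore the identifier column so it does not lower every score.
result = append_similarity(rows, "user_id", 50, ignore=("user_id",))
for row in result:
    print(row["user_id"], row["similarity"])
```

In this sketch the reference row always scores 1, every score stays between 0 and 1, and the ignored column is still present in the output.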

Examples

Compute the similarity between every row and the row where the column user_id has the value 50.

SELECT * FROM companies SIMILAR_TO(user_id=50)