Getting Started with Infer for DBT

Generate API key

The first thing to do is to sign in to Infer and generate your API key, which is needed to set up the dbt-infer adapter.

  1. Go to Infer and follow the sign-in process.
  2. Once signed in, go to Account.
  3. Click "Generate" next to the "API Token" field and copy the token to the clipboard.

Install the dbt-infer adapter

The dbt-infer adapter is distributed via PyPI and installed with pip.

To install the latest dbt-infer package, run the following in the same shell environment in which you run dbt:

pip install dbt-infer

Versioning of dbt-infer follows the standard dbt versioning scheme, meaning that if you are using dbt 1.2 the corresponding dbt-infer version will be 1.2.x, where x is the latest patch release.
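For example, to install the dbt-infer release line matching a dbt 1.2 installation, you can pin the version with pip (the version number here is illustrative):

```shell
# Pin dbt-infer to the 1.2.x release line, matching a dbt 1.2 installation
pip install "dbt-infer==1.2.*"
```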

If you haven't already, you need to have installed the adapter package for your underlying data warehouse. For example, if your data warehouse is BigQuery then you need to also have installed the appropriate dbt-bigquery package, e.g.

pip install dbt-bigquery

Configure dbt-infer

In order to set up dbt-infer, you need to add a target to your profiles.yml file. A dbt target contains the information about your connection to the warehouse; see the dbt documentation on connection profiles for more details.

The dbt-infer adapter requires five fields to be filled out in a new target in your profiles.yml file:

  • url: the URL of Infer API
  • username: your Infer API username
  • apikey: your Infer API key
  • data_config: the config for your underlying data warehouse, e.g. BigQuery, Snowflake
  • type: name of the adapter. Must be infer.

The schema to define a target using dbt-infer looks like this:

<profile-name>:
  target: <target-name>
  outputs:
    <target-name>:
      url: <infer-api-endpoint>
      username: <infer-api-username>
      apikey: <infer-apikey>
      type: infer
      data_config:
        <here goes your normal data warehouse config>

For example, if you are using BigQuery as your underlying data warehouse the target might look like this:

my-bigquery-profile:
  target: infer-bigquery
  outputs:
    infer-bigquery:
      apikey: 1234567890abcdef
      username: my_name@example.com
      url: https://app.getinfer.io
      type: infer
      data_config:
        dataset: my_dataset
        job_execution_timeout_seconds: 300
        job_retries: 1
        keyfile: bq-user-creds.json
        location: EU
        method: service-account
        priority: interactive
        project: my-big-query-project
        threads: 1
        type: bigquery

All Done! Time for ML Analytics!

That's it! You can now use SQL-inf in your dbt models.


Using SQL-inf

You do not need to change anything in your existing dbt models when switching to SQL-inf; they will all work the same as before, but you now have the ability to use SQL-inf commands as native SQL functions.

To get you started, we will give a brief example of what such a model might look like. You can find other, more complex examples on the Examples page, including a GitHub repo with all of the examples in one place. Feel free to clone that repo to get started from scratch, or follow our example below.

In our simple example, we will show how to use a previous user_features model to predict churn by predicting the column has_churned in our new model predict_user_churn.

predict_user_churn.sql
{{
  config(
    materialized = "table"
  )
}}

with predict_user_churn_input as (
    select * from {{ ref('user_features') }}
)
SELECT * FROM predict_user_churn_input PREDICT(has_churned, ignore=user_id)

As usual, the result of this is written into a new table, called predict_user_churn.

Note that we ignore user_id in the prediction. This is because the user_id should not influence our prediction of churn, so we remove it. We also follow the convention of pulling together the inputs for our prediction in a CTE, named predict_user_churn_input. This is not required, since you could do this within the final SELECT statement, but using CTEs is good practice as it makes the model easier to understand and edit.
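Once the model file is saved, it runs like any other dbt model; for example (using the model name from this example):

```shell
# Build the churn prediction model, including its upstream dependencies
dbt run --select +predict_user_churn
```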


How it works

The dbt-infer adapter splits the execution of your models into two steps:

  • Any models that do not use SQL-inf commands will be executed as normal within your data warehouse.

  • Models that take advantage of SQL-inf will be executed in 5 steps:

    • The model query will be parsed by Infer. This will tell dbt-infer what data to fetch from your data warehouse.
    • The necessary data will be fetched from your data warehouse.
    • The SQL-inf commands will be executed within Infer.
    • The results will be saved to a temporary table in your data warehouse.
    • The remaining part of the dbt model will be executed as usual within your data warehouse.


Security

The execution model of SQL-inf means that:

  • Infer does not have access to your data warehouse, since it uses dbt and your existing data warehouse connection for that
  • Infer does not store any of your data beyond the lifetime of a SQL-inf transaction

Furthermore, all data handled by Infer is encrypted and secured to the highest standards.

You can read more about the security of Infer here.


Limitations

  • dbt-infer only supports SQL-inf commands in models that are table materialized, i.e. you cannot use SQL-inf commands in a model that is materialized as a view.
  • dbt-infer currently does not work with dbt Cloud.
  • SQL-inf commands cannot be used within CTEs. Instead, materialise their output into a table in one model and then use that table in downstream models.
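As a sketch of that last workaround, put the SQL-inf command in its own table-materialized model and reference it downstream with ref() (the model and column names here are illustrative):

churn_predictions.sql

```sql
{{ config(materialized = "table") }}

SELECT * FROM {{ ref('user_features') }} PREDICT(has_churned, ignore=user_id)
```

churn_summary.sql

```sql
SELECT has_churned, COUNT(*) AS n_users
FROM {{ ref('churn_predictions') }}
GROUP BY has_churned
```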