DBT Package

The Infer dbt package allows you to connect your dbt instance to Infer and use the SQL-inf syntax to perform ML Analytics in your dbt models.

Installation

First, you need to set up your Infer account and generate your API key.

Read about how to do that in our Getting your API Credentials section.

Installing the dbt-infer package

To install the dbt-infer package, simply run the following in the same environment in which you run dbt:

pip install dbt-infer

If you haven't already, you also need to install the adapter package for your underlying data warehouse. For example, if your data warehouse is BigQuery, you need the dbt-bigquery package:

pip install dbt-bigquery

Setting up dbt-infer

To set up dbt-infer, you need to add a new target to your profiles.yml file. A dbt target contains the information about your connection to the warehouse; see the dbt documentation on profiles for more info.

The dbt-infer adapter requires five fields to be filled out in a new target in your profiles.yml file:

  • url: the URL of Infer API
  • username: your Infer API username
  • apikey: your Infer API key
  • data_config: the config for your underlying data warehouse, e.g. BigQuery, Snowflake
  • type: name of the adapter. Must be infer.

The schema to define a target using dbt-infer looks like this:

<profile-name>:
  target: <target-name>
  outputs:
    <target-name>:
      url: <infer-api-endpoint>
      username: <infer-api-username>
      apikey: <infer-apikey>
      type: infer
      data_config:
        <here goes your normal data warehouse config>

For example, if you are using BigQuery as your underlying data warehouse the target might look like this:

my-bigquery-profile:
  target: infer-bigquery
  outputs:
    infer-bigquery:
      url: https://app.getinfer.io
      username: my_name@example.com
      apikey: 1234567890abcdef
      type: infer
      data_config:
        dataset: my_dataset
        job_execution_timeout_seconds: 300
        job_retries: 1
        keyfile: bq-user-creds.json
        location: EU
        method: service-account
        priority: interactive
        project: my-big-query-project
        threads: 1
        type: bigquery
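If you use Snowflake instead, the data_config section holds your usual dbt-snowflake settings. A hypothetical sketch (the account, credential, and database values below are placeholders, not values from these docs):

```yaml
my-snowflake-profile:
  target: infer-snowflake
  outputs:
    infer-snowflake:
      url: https://app.getinfer.io
      username: my_name@example.com
      apikey: 1234567890abcdef
      type: infer
      data_config:
        # Standard dbt-snowflake profile fields; all values are placeholders
        account: my_account.eu-west-1
        user: dbt_user
        password: my_password
        role: TRANSFORMER
        database: ANALYTICS
        warehouse: TRANSFORMING
        schema: my_schema
        threads: 1
        type: snowflake
```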

Usage

You can now use SQL-inf commands within your dbt models!

For example, here is a simple model that uses a previous user_features model to predict churn by predicting the column has_churned. Simply copy this file into your models directory:

predict_user_churn.sql
{{
  config(
    materialized = "table"
  )
}}

with users as (
  select * from {{ ref('user_features') }}
)

SELECT * FROM users PREDICT(has_churned, ignore=user_id)

and run dbt run.

As usual, the result is written into a new table, named predict_user_churn after the model.
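Because the predictions land in an ordinary table, downstream models can consume them like any other dbt model using plain SQL. A hypothetical follow-up model (the has_churned column in the output is an assumption about the prediction schema, not confirmed by these docs):

```sql
-- high_risk_users.sql
-- Plain SQL only: no SQL-inf commands, so this model runs
-- entirely in your data warehouse.
{{
  config(
    materialized = "table"
  )
}}

select *
from {{ ref('predict_user_churn') }}
-- Assumes PREDICT writes its result under the predicted column's
-- name; check the actual output schema of your model.
where has_churned = true
```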

Here is another simple example, where we compute the sentiment of customer feedback based on another model, user_conversations, and write the result into the table customer_sentiment.

customer_sentiment.sql
{{
  config(
    materialized = "table"
  )
}}

SELECT * FROM {{ ref('user_conversations') }} SENTIMENT("user feedback")

How it works

Any models that do not use SQL-inf commands will be executed as normal within your data warehouse.

Models that take advantage of SQL-inf will be executed in 5 steps:

  • The model query will be parsed by Infer. This will tell dbt-infer what data to fetch from your data warehouse.
  • The necessary data will be fetched from your data warehouse.
  • The SQL-inf commands will be executed within Infer.
  • The results will be saved to a temporary table in your data warehouse.
  • The remaining part of the dbt model will be executed as usual within your data warehouse.

This means that:

  • Infer never has direct access to your data warehouse - all reads and writes go through dbt
  • Infer does not store any of your data beyond the transaction of executing the SQL-inf commands, as described above

Furthermore, all data is encrypted and secured to the highest standards. Read more about the security of Infer here.

Limitations

  • dbt-infer only supports SQL-inf commands in models that are materialized as tables.
  • dbt-infer currently does not work with dbt Cloud.
  • SQL-inf commands cannot be used inside CTEs. Instead, materialize the SQL-inf result as the output of its own model and reference that table in downstream models.
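For example, instead of wrapping SENTIMENT in a CTE, split the logic across two models: one that materializes the SQL-inf result as a table, and one that refines it with plain SQL. A sketch (model names are illustrative):

```sql
-- models/feedback_sentiment.sql
-- Materializes the SQL-inf result as a table.
{{
  config(
    materialized = "table"
  )
}}

SELECT * FROM {{ ref('user_conversations') }} SENTIMENT("user feedback")
```

```sql
-- models/feedback_summary.sql
-- Plain SQL over the materialized result; runs entirely in your warehouse.
{{
  config(
    materialized = "table"
  )
}}

select *
from {{ ref('feedback_sentiment') }}
```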