dbt Package
The Infer dbt package allows you to connect your dbt instance to Infer and use the SQL-inf syntax to perform ML Analytics in your dbt models.
Installation
First, you need to set up your Infer account and generate your API key.
Read about how to do that in our Getting your API Credentials section.
Installing the dbt-infer package
To install the dbt-infer package, simply run the following in the same shell where you run dbt:
pip install dbt-infer
If you haven't already, you also need to install the adapter package for your underlying data warehouse.
For example, if your data warehouse is BigQuery, you need the dbt-bigquery package:
pip install dbt-bigquery
Setting up dbt-infer
To set up dbt-infer, we need to add a target to your profiles.yml file. A dbt target contains information about your connection to the warehouse; see here for more info.
The dbt-infer adapter requires five fields to be filled out in a new target in your profiles.yml file:
- url: the URL of the Infer API
- username: your Infer API username
- apikey: your Infer API key
- data_config: the config for your underlying data warehouse, e.g. BigQuery or Snowflake
- type: the name of the adapter. Must be infer.
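As a minimal sketch (not part of dbt-infer), the required-field check can be expressed in Python. The field names are taken from the list above; the helper function and the example dict are our own illustration, not adapter code.

```python
# The five fields a dbt-infer target must define, as listed above.
REQUIRED_FIELDS = {"url", "username", "apikey", "data_config", "type"}

def missing_fields(target: dict) -> set:
    """Return which required fields are absent from a target config."""
    return REQUIRED_FIELDS - target.keys()

# Example target mirroring the schema described below.
target = {
    "url": "https://app.getinfer.io",
    "username": "my_name@example.com",
    "apikey": "1234567890abcdef",
    "type": "infer",
    "data_config": {"type": "bigquery", "dataset": "my_dataset"},
}

print(missing_fields(target))  # -> set()
```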
The schema to define a target using dbt-infer looks like this:
<profile-name>:
  target: <target-name>
  outputs:
    <target-name>:
      url: <infer-api-endpoint>
      username: <infer-api-username>
      apikey: <infer-apikey>
      type: infer
      data_config:
        <here goes your normal data warehouse config>
For example, if you are using BigQuery as your underlying data warehouse, the target might look like this:
my-bigquery-profile:
  target: infer-bigquery
  outputs:
    infer-bigquery:
      apikey: 1234567890abcdef
      username: my_name@example.com
      url: https://app.getinfer.io
      type: infer
      data_config:
        dataset: my_dataset
        job_execution_timeout_seconds: 300
        job_retries: 1
        keyfile: bq-user-creds.json
        location: EU
        method: service-account
        priority: interactive
        project: my-big-query-project
        threads: 1
        type: bigquery
Usage
You can now use SQL-inf commands within your dbt models!
For example, here is a simple model that uses a previous user_features model to predict churn by predicting the column has_churned. Simply copy this file into your models directory:
{{
config(
materialized = "table"
)
}}
with users as (
select * from {{ ref('user_features') }}
)
SELECT * FROM users PREDICT(has_churned, ignore=user_id)
and run dbt run.
As usual, the result is then written into a new table, called predict_user_churn.
Here is another simple example, where we compute the sentiment of customer feedback based on another model, user_conversations, and write the result into the table customer_sentiment.
{{
config(
materialized = "table"
)
}}
SELECT * FROM {{ ref('user_conversations') }} SENTIMENT("user feedback")
How it works
Any models that do not use SQL-inf commands will be executed as normal within your data warehouse.
Models that take advantage of SQL-inf will be executed in five steps:
- The model query will be parsed by Infer. This will tell dbt-infer what data to fetch from your data warehouse.
- The necessary data will be fetched from your data warehouse.
- The SQL-inf commands will be executed within Infer.
- The results will be saved to a temporary table in your data warehouse.
- The remaining part of the dbt model will be executed as usual within your data warehouse.
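The five steps above can be sketched as a toy pipeline. This is purely illustrative: the function name, the stubbed warehouse dict, and the stand-in SQL-inf step are hypothetical and do not correspond to dbt-infer internals.

```python
def run_sql_inf_model(model_sql: str, warehouse: dict) -> list:
    """Toy walk-through of the five execution steps; returns the step order."""
    steps = []
    # 1. Infer parses the model query to learn which tables it needs.
    steps.append("parse")
    # 2. The necessary data is fetched from the warehouse (stubbed as a dict).
    rows = warehouse.get("user_features", [])
    steps.append("fetch")
    # 3. The SQL-inf command runs inside Infer (stubbed: tag every row).
    results = [dict(row, has_churned=False) for row in rows]
    steps.append("execute")
    # 4. Results land in a temporary table in the warehouse.
    warehouse["_tmp_results"] = results
    steps.append("save")
    # 5. The rest of the dbt model runs in the warehouse as usual.
    warehouse["predict_user_churn"] = warehouse.pop("_tmp_results")
    steps.append("finalize")
    return steps
```

The point of the split is visible in steps 2 and 4: Infer only ever sees the rows handed to it for the duration of the SQL-inf command, while all persisted tables stay in your warehouse.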
This means that:
- Infer does not have access to your data warehouse, since it uses dbt for that
- Infer does not store any of your data beyond the transaction of executing SQL-inf, as described above
Furthermore, all data is encrypted and secured to the highest standards. Read more about the security of Infer here.
Limitations
- dbt-infer only supports SQL-inf commands in models that are materialized as tables.
- dbt-infer currently does not work with dbt Cloud.
- SQL-inf cannot be used within CTEs; instead, materialize them into tables as the output of a model and then use that in downstream models.
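The CTE workaround can be sketched as two models (the model and column names here are hypothetical): the first materializes the SQL-inf output as a table, and the second references it downstream instead of wrapping the SQL-inf command in a CTE.

```sql
-- models/user_churn_predictions.sql: materialize the SQL-inf result as a table
{{ config(materialized = "table") }}
SELECT * FROM {{ ref('user_features') }} PREDICT(has_churned, ignore=user_id)
```

```sql
-- models/churned_users.sql: consume the materialized table downstream
SELECT * FROM {{ ref('user_churn_predictions') }} WHERE has_churned
```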