Getting Started with Infer for DBT
Generate API key
The first thing to do is to sign in to Infer in order to generate your API key, which is needed to set up the dbt-infer adapter.
- Go to Infer and follow the sign-in process.
- Once signed in, go to Account.
- Click "Generate" next to the "API Token" field and copy the token to the clipboard.
Install the dbt-infer adapter
The dbt-infer adapter is distributed via PyPI and installed with pip.
To install the latest dbt-infer package, simply run the following within the same shell as you run dbt:
pip install dbt-infer
Versioning of dbt-infer follows the standard dbt versioning scheme, meaning that if you are using dbt 1.2, the corresponding dbt-infer version will be named 1.2.x, where x is the latest minor version number.
If you haven't already, you also need to install the adapter package for your underlying data warehouse. For example, if your data warehouse is BigQuery, then you need the appropriate dbt-bigquery package, e.g.
pip install dbt-bigquery
Configure dbt-infer
In order to set up dbt-infer, we need to add a target to your profiles.yml file. A dbt target contains information about your connection to the warehouse; see here for more info.
The dbt-infer adapter requires five fields to be filled out in a new target in your profiles.yml file:
- url: the URL of the Infer API
- username: your Infer API username
- apikey: your Infer API key
- data_config: the config for your underlying data warehouse, e.g. BigQuery or Snowflake
- type: the name of the adapter. Must be infer.
The schema to define a target using dbt-infer looks like this:
<profile-name>:
  target: <target-name>
  outputs:
    <target-name>:
      url: <infer-api-endpoint>
      username: <infer-api-username>
      apikey: <infer-apikey>
      type: infer
      data_config:
        <here goes your normal data warehouse config>
For example, if you are using BigQuery as your underlying data warehouse the target might look like this:
my-bigquery-profile:
  target: infer-bigquery
  outputs:
    infer-bigquery:
      apikey: 1234567890abcdef
      username: my_name@example.com
      url: https://app.getinfer.io
      type: infer
      data_config:
        dataset: my_dataset
        job_execution_timeout_seconds: 300
        job_retries: 1
        keyfile: bq-user-creds.json
        location: EU
        method: service-account
        priority: interactive
        project: my-big-query-project
        threads: 1
        type: bigquery
All Done! Time for ML Analytics!
That's it! You can now use SQL-inf in your dbt models. You can verify that your new target is set up correctly by running dbt debug.
Using SQL-inf
You do not need to change anything in your existing dbt models when switching to use SQL-inf – they will all work the same as before – but you now have the ability to use SQL-inf commands as native SQL functions.
To get you started we will give a brief example of what such a model might look like. You can find other more complex examples on the Examples page, including a GitHub repo with all of the examples in one place. Feel free to clone this repo to get started from scratch! Otherwise follow our examples below.
In our simple example, we will show how to use a previous user_features model to predict churn by predicting the column has_churned in our new model predict_user_churn.
{{
    config(
        materialized = "table"
    )
}}

with predict_user_churn_input as (
    select * from {{ ref('user_features') }}
)
SELECT * FROM predict_user_churn_input PREDICT(has_churned, ignore=user_id)
As usual, the result of this is written into a new table, called predict_user_churn.
Note that we ignore user_id in the prediction. This is because the user_id should not influence our prediction of churn, so we remove it.
We also use the convention of pulling together the inputs for our prediction in a CTE, named predict_user_churn_input. This is not strictly needed; you could do this within the final SELECT statement, but using CTEs is good practice, as it makes the model easier to understand and edit.
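Since predict_user_churn is materialized as an ordinary table, downstream models can reference it like any other dbt model, with no SQL-inf involved. A minimal sketch of such a downstream model (the model name churn_rate and the assumption that the predicted has_churned column can be treated as a boolean are ours, for illustration only):

```sql
-- models/churn_rate.sql (hypothetical downstream model)
-- Assumes the predicted has_churned column is boolean-like;
-- check the actual output of PREDICT in your warehouse.
select
    avg(case when has_churned then 1 else 0 end) as predicted_churn_rate
from {{ ref('predict_user_churn') }}
```

Because the predictions already live in a table in your warehouse, this model runs entirely in the warehouse as a normal dbt model.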
How it works
The dbt-infer adapter splits the execution of your models into two paths:
- Any models that do not use SQL-inf commands will be executed as normal within your data warehouse.
- Models that take advantage of SQL-inf will be executed in five steps:
  1. The model query is parsed by Infer. This tells dbt-infer what data to fetch from your data warehouse.
  2. The necessary data is fetched from your data warehouse.
  3. The SQL-inf commands are executed within Infer.
  4. The results are saved to a temporary table in your data warehouse.
  5. The remaining part of the dbt model is executed as usual within your data warehouse.
Security
The execution model of SQL-inf means that:
- Infer does not have access to your data warehouse, since it uses dbt and your existing data warehouse connection for that
- Infer does not store any of your data beyond the lifetime of an SQL-inf transaction
Furthermore, all data handled by Infer is encrypted and secured to the highest standards.
You can read more about the security of Infer here.
Limitations
- dbt-infer only supports SQL-inf commands in models that are table materialized, i.e. you cannot use SQL-inf commands in a model that is materialized as a view.
- dbt-infer currently does not work with dbt Cloud.
- SQL-inf cannot be used within CTEs. Instead, materialise the SQL-inf results into a table as the output of one model, and then use that table in downstream models.
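The CTE limitation can be worked around by splitting the SQL-inf step into its own table-materialized model. A sketch, reusing the user_features / has_churned example from above (the model name user_predictions is illustrative):

```sql
-- models/user_predictions.sql: run the SQL-inf command in its own model,
-- materialized as a table (SQL-inf cannot run inside a view or a CTE).
{{
    config(
        materialized = "table"
    )
}}
SELECT * FROM {{ ref('user_features') }} PREDICT(has_churned, ignore=user_id)
```

Any downstream model can then freely use {{ ref('user_predictions') }} inside its own CTEs, because at that point the predictions are plain table data in your warehouse.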