Post-Load (dbt) Transformations
Saddle Data integrates directly with dbt Core to enable powerful post-load transformations. Once your data has been synced to your destination warehouse (Snowflake, Databricks, Postgres, etc.), Saddle Data can automatically trigger a dbt job to model, test, and document your data.
How it Works
When you enable dbt transformations for a Flow, Saddle Data performs the following steps after a successful sync:
- Secure Clone: The worker securely clones your dbt project repository using the credentials provided in your Integration.
- Profile Generation: A `profiles.yml` file is automatically generated using the connection details from your Flow's destination. This ensures dbt runs against the exact same environment where your data was just loaded.
- Execution: The worker executes `dbt build`, which runs your models, tests, and snapshots.
- Logging: The full output of the dbt run is captured and displayed in the Flow Run history logs.
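The steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not Saddle Data's actual worker code: the function names, the shallow clone, and the `repo`/`profiles` directory layout are hypothetical, and credential handling is left out. The `git clone` options (`--depth`, `--branch`) and the `dbt build` options (`--project-dir`, `--profiles-dir`) are standard flags of those tools.

```python
import subprocess
from pathlib import Path


def clone_command(repo_url: str, branch: str, dest: Path) -> list[str]:
    """Build a shallow `git clone` of a single branch (hypothetical helper)."""
    return ["git", "clone", "--depth", "1", "--branch", branch, repo_url, str(dest)]


def build_command(project_dir: Path, profiles_dir: Path) -> list[str]:
    """Build a `dbt build` invocation that reads the generated profiles.yml."""
    return [
        "dbt", "build",
        "--project-dir", str(project_dir),
        "--profiles-dir", str(profiles_dir),
    ]


def run_post_load(repo_url: str, branch: str, workdir: Path, subpath: str = "") -> str:
    """Clone the project, run dbt, and return the combined output for logging."""
    repo_dir = workdir / "repo"
    subprocess.run(clone_command(repo_url, branch, repo_dir), check=True)
    # Assumption: profiles.yml has already been written to workdir / "profiles".
    result = subprocess.run(
        build_command(repo_dir / subpath, workdir / "profiles"),
        capture_output=True,
        text=True,
    )
    # Both streams are kept so that failures show up in the run log too.
    return result.stdout + result.stderr
```

Building the commands as argument lists rather than shell strings avoids quoting problems with repository URLs and paths.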
Configuration
To enable dbt transformations:
- Go to the Flow Editor.
- Click Add Transformation and select Post-Load (dbt).
- Configure the following settings:
  - Repository URL: The HTTPS URL of your Git repository (e.g., `https://github.com/org/dbt_project.git`).
  - Branch: The branch to check out (defaults to `main`).
  - Git Integration: Select a GitHub or GitLab integration to provide the necessary access token for cloning private repositories.
  - Project Subpath: (Optional) If your `dbt_project.yml` is not in the root of the repo, specify the folder path here.
  - dbt Version: Select the version of dbt Core to use (e.g., `1.9.0`). The worker comes pre-installed with common versions for fast execution.
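As a way to picture these settings together, here is a minimal sketch that models them as a Python dataclass. The class name, field names, and validation rules are hypothetical and are not part of any Saddle Data API; only the `main` branch default comes from the description above, and the relative-path check on the subpath is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbtTransformationSettings:
    """Hypothetical container mirroring the fields shown in the Flow Editor."""

    repository_url: str
    git_integration: str                   # e.g. "github" or "gitlab"
    dbt_version: str                       # e.g. "1.9.0"
    branch: str = "main"                   # documented default
    project_subpath: Optional[str] = None  # only needed for nested projects

    def validate(self) -> None:
        """Raise ValueError if a field looks inconsistent with the docs."""
        if not self.repository_url.startswith("https://"):
            raise ValueError("Repository URL must be an HTTPS URL")
        # Assumption: the subpath is given relative to the repository root.
        if self.project_subpath and self.project_subpath.startswith("/"):
            raise ValueError("Project Subpath should be relative to the repo root")


settings = DbtTransformationSettings(
    repository_url="https://github.com/org/dbt_project.git",
    git_integration="github",
    dbt_version="1.9.0",
)
settings.validate()
```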
Supported Adapters
Saddle Data automatically selects the correct dbt adapter based on your Flow's destination connection. Currently supported adapters include:
- dbt-snowflake
- dbt-databricks
- dbt-postgres
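The adapter selection and profile generation can be pictured with a short sketch. The mapping keys, the function names, the `saddle` target name, and the shape of the `conn` dictionary are assumptions made for illustration; the output keys (`type`, `host`, `port`, `user`, `password`, `dbname`, `schema`, `threads`) follow the dbt-postgres profile format. Serializing the dictionary to `profiles.yml` is left out.

```python
# Hypothetical mapping from destination type to the dbt adapter package.
ADAPTERS = {
    "snowflake": "dbt-snowflake",
    "databricks": "dbt-databricks",
    "postgres": "dbt-postgres",
}


def select_adapter(destination_type: str) -> str:
    """Return the adapter package for a destination, or raise if unsupported."""
    key = destination_type.lower()
    if key not in ADAPTERS:
        raise ValueError(f"No dbt adapter available for destination: {destination_type}")
    return ADAPTERS[key]


def postgres_profile(profile_name: str, conn: dict, threads: int = 4) -> dict:
    """Build the profiles.yml content for a Postgres destination as a dict.

    `profile_name` must match the `profile:` key in dbt_project.yml.
    """
    return {
        profile_name: {
            "target": "saddle",  # assumed target name
            "outputs": {
                "saddle": {
                    "type": "postgres",
                    "host": conn["host"],
                    "port": conn["port"],
                    "user": conn["user"],
                    "password": conn["password"],
                    "dbname": conn["database"],
                    "schema": conn["schema"],
                    "threads": threads,
                }
            },
        }
    }
```

Deriving the profile from the same connection details used for the load is what keeps the dbt run pointed at the environment where the data just landed.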
Execution Plane
dbt jobs run on the same execution plane as your Flow:
- Cloud Mode: Runs on Saddle Data's managed infrastructure.
- Remote Agent: Runs securely within your own infrastructure (VPC/On-Prem). The Remote Agent handles the git clone and dbt execution locally, ensuring your code and data never leave your environment.