
Post-Load (dbt) Transformations

Saddle Data integrates directly with dbt Core to run post-load transformations. Once your data has been synced to your destination warehouse (Snowflake, Databricks, Postgres, etc.), Saddle Data can automatically trigger a dbt job to model, test, and document your data.

How it Works

When you enable dbt transformations for a Flow, Saddle Data performs the following steps after a successful sync:

  1. Secure Clone: The worker securely clones your dbt project repository using the credentials provided in your Integration.
  2. Profile Generation: A profiles.yml file is automatically generated using the connection details from your Flow's destination. This ensures dbt runs against the exact same environment where your data was just loaded.
  3. Execution: The worker executes dbt build, which runs your models, tests, snapshots, and seeds in dependency order.
  4. Logging: The full output of the dbt run is captured and displayed in the Flow Run history logs.
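The steps above can be sketched roughly as follows. This is a minimal illustration, not Saddle Data's actual worker code: the function names are hypothetical, credential handling for private repositories is left out, and the shallow clone is an assumption. Only standard git and dbt CLI flags are used (git clone --branch/--depth, dbt build --project-dir/--profiles-dir).

```python
import subprocess
from pathlib import Path


def clone_command(repo_url: str, branch: str, dest: Path) -> list[str]:
    """Build the git command for step 1 (hypothetical helper).

    A shallow, single-branch clone is assumed here; how the access token
    is injected is not covered by this sketch.
    """
    return ["git", "clone", "--depth", "1", "--branch", branch, repo_url, str(dest)]


def build_command(project_dir: Path, profiles_dir: Path) -> list[str]:
    """Build the dbt command for step 3 (hypothetical helper).

    --profiles-dir points dbt at the directory holding the generated
    profiles.yml from step 2.
    """
    return [
        "dbt", "build",
        "--project-dir", str(project_dir),
        "--profiles-dir", str(profiles_dir),
    ]


def run_and_capture(cmd: list[str]) -> tuple[int, str]:
    """Run a command and return (exit code, combined output) for step 4.

    stderr is merged into stdout so the captured log reads in order.
    """
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout
```

A caller would run clone_command first, write profiles.yml into profiles_dir, then pass build_command to run_and_capture and store the returned text alongside the Flow Run.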

Configuration

To enable dbt transformations:

  1. Go to the Flow Editor.
  2. Click Add Transformation and select Post-Load (dbt).
  3. Configure the following settings:
    • Repository URL: The HTTPS URL of your git repository (e.g., https://github.com/org/dbt_project.git).
    • Branch: The branch to check out (defaults to main).
    • Git Integration: Select a GitHub or GitLab integration to provide the necessary access token for cloning private repositories.
    • Project Subpath: (Optional) If your dbt_project.yml is not in the root of the repo, specify the folder path here.
    • dbt Version: Select the version of dbt Core to use (e.g., 1.9.0). The worker comes pre-installed with common versions for fast execution.
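For readers who think in code, the settings above could be modeled as a small record. This is only a sketch: the class name, field names, and the validation rule are hypothetical and do not reflect Saddle Data's internal schema. The only behavior taken from the settings list is the HTTPS URL requirement, the main default for Branch, and the optional Project Subpath.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class DbtTransformConfig:
    """Hypothetical container mirroring the Post-Load (dbt) settings."""

    repository_url: str                    # HTTPS URL of the git repository
    git_integration: str                   # name of the GitHub/GitLab integration (assumed to be a string)
    dbt_version: str                       # dbt Core version, e.g. "1.9.0"
    branch: str = "main"                   # documented default
    project_subpath: Optional[str] = None  # folder containing dbt_project.yml, if not the repo root

    def __post_init__(self) -> None:
        # The settings list asks for an HTTPS URL, so reject anything else early.
        if not self.repository_url.startswith("https://"):
            raise ValueError("Repository URL must be an HTTPS URL")

    def project_dir(self, clone_root: str) -> str:
        """Return the directory where dbt_project.yml is expected after cloning."""
        root = PurePosixPath(clone_root)
        if self.project_subpath:
            return str(root / self.project_subpath.strip("/"))
        return str(root)
```

For example, a repository that keeps its dbt project under analytics/dbt would set project_subpath="analytics/dbt", and project_dir("/tmp/repo") would then resolve to /tmp/repo/analytics/dbt.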

Supported Adapters

Saddle Data automatically selects the correct dbt adapter based on your Flow's destination connection. Currently supported adapters include:

  • dbt-snowflake
  • dbt-databricks
  • dbt-postgres
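Adapter selection and profile generation can be pictured as a lookup followed by a mapping of connection details into the structure dbt expects in profiles.yml. The sketch below is an assumption about how this might look, not Saddle Data's implementation: the destination type keys, the function names, the "prod" target name, and the thread count are all hypothetical. The profile keys shown for Postgres (type, host, port, user, password, dbname, schema, threads) are the standard dbt-postgres ones.

```python
# Hypothetical mapping from a Flow's destination type to the dbt adapter package.
ADAPTERS = {
    "snowflake": "dbt-snowflake",
    "databricks": "dbt-databricks",
    "postgres": "dbt-postgres",
}


def select_adapter(destination_type: str) -> str:
    """Return the adapter package for a destination type, or raise if unsupported."""
    key = destination_type.strip().lower()
    if key not in ADAPTERS:
        raise ValueError(f"No dbt adapter for destination type: {destination_type!r}")
    return ADAPTERS[key]


def postgres_profile(profile_name: str, conn: dict) -> dict:
    """Build the profiles.yml content for a Postgres destination as a plain dict.

    profile_name must match the `profile:` key in dbt_project.yml.
    Serializing the dict to YAML would need a YAML library and is left out.
    """
    return {
        profile_name: {
            "target": "prod",  # assumed target name
            "outputs": {
                "prod": {
                    "type": "postgres",
                    "host": conn["host"],
                    "port": int(conn.get("port", 5432)),  # Postgres default port
                    "user": conn["user"],
                    "password": conn["password"],
                    "dbname": conn["dbname"],
                    "schema": conn["schema"],
                    "threads": 4,  # assumed value
                }
            },
        }
    }
```

Snowflake and Databricks profiles follow the same outer shape (profile name, target, outputs) but use adapter-specific connection keys.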

Execution Plane

dbt jobs run on the same execution plane as your Flow:

  • Cloud Mode: Runs on Saddle Data's managed infrastructure.
  • Remote Agent: Runs securely within your own infrastructure (VPC/On-Prem). The Remote Agent handles the git clone and dbt execution locally, ensuring your code and data never leave your environment.