Infrastructure as Code (IaC)

Saddle Data treats Infrastructure as Code (IaC) as a first-class workflow. The web dashboard is well suited to interactive exploration, while power users and enterprise teams can manage their entire data stack programmatically.

The Declarative Model

Saddle Data uses a declarative YAML specification to define your data environment. Instead of calling multiple API endpoints to create individual resources, you define the desired state of your Integrations, Connections, and Flows in a single file and "apply" it.

Example Configuration

version: "1"

# Define reusable authorizations
integrations:
  - name: "prod-db-auth"
    type: "postgres"

# Define specific data sources and destinations
connections:
  - name: "ecommerce-source"
    integration: "prod-db-auth"
    type: "postgres"
    capability: "source"
    config:
      host: "db.example.com"
      database: "orders"
      user: "saddle_reader"

# Define the data pipeline
flows:
  - name: "Orders to Warehouse"
    source_connection: "ecommerce-source"
    # "bigquery-dest" refers to a destination connection that is not defined in this excerpt
    destination_connection: "bigquery-dest"
    schedule: "0 * * * *" # Hourly
    config:
      mappings:
        - source_entity: "public.orders"
          dest_entity: "raw_orders"
          sync_mode: "incremental_deduped"
          cursor_field: "updated_at"
          primary_key: "id"
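
Because resources refer to each other by name, a typo or a missing definition is easy to introduce. The following is a minimal sketch of a local pre-apply check, not part of Saddle Data itself: the function name `find_undefined_references` is hypothetical, and the dictionary is a hand-written, trimmed mirror of the YAML above rather than the output of a YAML parser. It only inspects the file; whether the service also resolves names against resources created by earlier applies is not stated here, so treat a reported name as a prompt to double-check rather than a definite error.

```python
from typing import Any


def find_undefined_references(spec: dict[str, Any]) -> list[str]:
    """Return messages for names that are referenced but not defined in the same spec."""
    integrations = {item["name"] for item in spec.get("integrations", [])}
    connections = {item["name"] for item in spec.get("connections", [])}
    errors: list[str] = []

    # Each connection may point at an integration by name.
    for conn in spec.get("connections", []):
        ref = conn.get("integration")
        if ref is not None and ref not in integrations:
            errors.append(f"connection {conn['name']!r}: unknown integration {ref!r}")

    # Each flow points at a source and a destination connection by name.
    for flow in spec.get("flows", []):
        for key in ("source_connection", "destination_connection"):
            ref = flow.get(key)
            if ref is not None and ref not in connections:
                errors.append(f"flow {flow['name']!r}: unknown {key} {ref!r}")
    return errors


# Trimmed-down mirror of the example configuration (assumed structure).
example_spec = {
    "version": "1",
    "integrations": [{"name": "prod-db-auth", "type": "postgres"}],
    "connections": [
        {
            "name": "ecommerce-source",
            "integration": "prod-db-auth",
            "type": "postgres",
            "capability": "source",
        }
    ],
    "flows": [
        {
            "name": "Orders to Warehouse",
            "source_connection": "ecommerce-source",
            "destination_connection": "bigquery-dest",
        }
    ],
}

for problem in find_undefined_references(example_spec):
    print(problem)
```

Run against the example, this reports that the flow's destination connection "bigquery-dest" has no definition in the file, which matches the excerpt: only the source connection is declared.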

The "Apply" Workflow

To create or update resources, send the YAML file to the apply endpoint. Saddle Data identifies each resource by its name and performs an idempotent "upsert":

If a resource with the given name does not exist, it is created.
If a resource with the given name already exists, it is updated to match the specification.
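
The two rules above can be illustrated with a small in-memory model. This is a sketch of name-keyed upsert semantics in general, not Saddle Data's implementation: the function `apply_resources`, the "unchanged" outcome, and the choice to leave resources that are absent from the file untouched are all assumptions made for illustration.

```python
def apply_resources(existing: dict[str, dict], desired: list[dict]) -> dict[str, str]:
    """Upsert each desired resource into `existing`, keyed by its name.

    Returns a mapping of resource name to "created", "updated", or "unchanged".
    """
    actions: dict[str, str] = {}
    for resource in desired:
        name = resource["name"]
        if name not in existing:
            actions[name] = "created"
        elif existing[name] != resource:
            actions[name] = "updated"
        else:
            actions[name] = "unchanged"
        # Store a copy so later edits to the input do not leak into the store.
        existing[name] = dict(resource)
    return actions


store: dict[str, dict] = {}
spec = [{"name": "ecommerce-source", "type": "postgres", "capability": "source"}]

first = apply_resources(store, spec)   # the name is new, so it is created
second = apply_resources(store, spec)  # same spec again, so nothing changes
print(first, second, len(store))
```

Applying the same specification twice leaves a single stored resource, which is what makes the operation safe to repeat.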

Applying via CLI

You can use curl to apply a configuration file. Replace :orgId with your organization ID and sd_your_api_key with your API key:

curl -X POST https://api.saddledata.io/v1/organizations/:orgId/iac/apply \
  -H "X-Api-Key: sd_your_api_key" \
  -H "Content-Type: application/x-yaml" \
  --data-binary @saddle-config.yaml
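
For scripts that prefer Python over curl, the same request can be sketched with the standard library. The URL, headers, and content type are taken from the curl command above; the function names `build_apply_request` and `apply_config`, the 30-second timeout, and the decision to return the raw response body are assumptions, since the response format is not described here.

```python
import urllib.request

API_BASE = "https://api.saddledata.io/v1"


def build_apply_request(org_id: str, api_key: str, config_yaml: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request for the apply endpoint."""
    url = f"{API_BASE}/organizations/{org_id}/iac/apply"
    return urllib.request.Request(
        url,
        data=config_yaml,
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Content-Type": "application/x-yaml",
        },
    )


def apply_config(org_id: str, api_key: str, path: str) -> tuple[int, str]:
    """Send the file at `path` to the apply endpoint and return (status, raw body).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with open(path, "rb") as handle:
        request = build_apply_request(org_id, api_key, handle.read())
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status, response.read().decode("utf-8")
```

A call such as `apply_config("your-org-id", "sd_your_api_key", "saddle-config.yaml")` would then mirror the curl invocation. Keeping request construction separate from sending makes the headers easy to inspect in a test without touching the network.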

Benefits of IaC

Version Control: Store your data pipeline definitions in Git alongside your application code.
Environment Parity: Easily replicate your configuration across staging and production environments.
Automation: Integrate data pipeline management into your existing CI/CD pipelines (e.g., GitHub Actions, GitLab CI).
Terraform Integration: The "Apply" model maps cleanly to Terraform resources. An official Terraform Provider for full lifecycle management of Saddle Data is coming soon.

Best Practices

Secrets Management (Remote Agents): If you are using Remote Agents, avoid hardcoding sensitive credentials in your YAML files. Instead, use local secret references that your agent will resolve at runtime:
- Environment Variables: Use the env: prefix (e.g., password: "env:DB_PASSWORD").
- AWS Secrets Manager: Use the aws:secretsmanager: prefix (e.g., password: "aws:secretsmanager:prod/db/creds").
Secrets Management (Cloud Flows): For flows running in the Saddle Data Cloud, credentials are automatically managed and stored securely in our internal Secret Manager. When using IaC for Cloud flows, you can:
- Omit the credential fields in your YAML (if the integration already exists).
- Provide a placeholder (e.g., "********") and update the credentials once via the Saddle Data UI.
Idempotency Keys: Always use the X-Idempotency-Key header when applying configuration from automated scripts to prevent duplicate operations in case of network retries.
Review Before Apply: Use a standard code review process for your YAML configurations before deploying them to production.
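
As an illustration of the idempotency-key practice, the sketch below derives a stable key for an automated run, so that a retried request carries the same key as the original attempt. The header name X-Idempotency-Key comes from the text above; everything else is an assumption: the functions `idempotency_key` and `apply_headers` are hypothetical, and the accepted key format is not documented here, so a hex SHA-256 digest is used only as a plausible choice.

```python
import hashlib


def idempotency_key(run_id: str, config_yaml: bytes) -> str:
    """Derive a key that is stable across retries of one run with one config.

    `run_id` is any identifier for the automated run (for example a CI job ID).
    """
    digest = hashlib.sha256()
    digest.update(run_id.encode("utf-8"))
    digest.update(b"\0")  # separator so run_id and config cannot blur together
    digest.update(config_yaml)
    return digest.hexdigest()


def apply_headers(api_key: str, run_id: str, config_yaml: bytes) -> dict[str, str]:
    """Headers for an apply request, including the idempotency key."""
    return {
        "X-Api-Key": api_key,
        "Content-Type": "application/x-yaml",
        "X-Idempotency-Key": idempotency_key(run_id, config_yaml),
    }
```

Including the run identifier in the key is a deliberate choice: a retry within the same run reuses the key, while a later run that intentionally re-applies the same file gets a fresh one.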
