Infrastructure as Code (IaC)

Saddle Data is built with Infrastructure as Code (IaC) as a first-class citizen. While the web dashboard is excellent for interactive exploration, power users and enterprise teams can manage their entire data stack programmatically.

The Declarative Model

Saddle Data uses a declarative YAML specification to define your data environment. Instead of calling multiple API endpoints to create individual resources, you define the desired state of your Integrations, Connections, and Flows in a single file and "apply" it.

Example Configuration

version: "1"

# Define reusable authorizations
integrations:
  - name: "prod-db-auth"
    type: "postgres"

# Define specific data sources and destinations
connections:
  - name: "ecommerce-source"
    integration: "prod-db-auth"
    type: "postgres"
    capability: "source"
    config:
      host: "db.example.com"
      database: "orders"
      user: "saddle_reader"

# Define the data pipeline
flows:
  - name: "Orders to Warehouse"
    source_connection: "ecommerce-source"
    destination_connection: "bigquery-dest"
    schedule: "0 * * * *" # Hourly
    config:
      mappings:
        - source_entity: "public.orders"
          dest_entity: "raw_orders"
          sync_mode: "incremental_deduped"
          cursor_field: "updated_at"
          primary_key: "id"
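
Because resources refer to each other by name, a quick local check can catch a misspelled reference before anything is sent. The sketch below is not part of Saddle Data; `find_dangling_references` is a hypothetical helper that works on the example configuration above after it has been parsed into a Python dict (for instance with a YAML library of your choice). The example's flow references `bigquery-dest`, which is not defined in the file. It may already exist in your organization, so treat the result as a warning rather than an error.

```python
from typing import Any


def find_dangling_references(spec: dict[str, Any]) -> list[str]:
    """Return messages for names that are referenced but not defined in the file."""
    integrations = {i["name"] for i in spec.get("integrations") or []}
    connections = {c["name"] for c in spec.get("connections") or []}
    problems: list[str] = []

    # Each connection may point at an integration by name.
    for conn in spec.get("connections") or []:
        ref = conn.get("integration")
        if ref is not None and ref not in integrations:
            problems.append(f"connection {conn['name']!r}: unknown integration {ref!r}")

    # Each flow points at a source and a destination connection by name.
    for flow in spec.get("flows") or []:
        for key in ("source_connection", "destination_connection"):
            ref = flow.get(key)
            if ref is not None and ref not in connections:
                problems.append(f"flow {flow['name']!r}: unknown {key} {ref!r}")
    return problems


# The example configuration, already parsed into a dict (config blocks omitted).
example = {
    "version": "1",
    "integrations": [{"name": "prod-db-auth", "type": "postgres"}],
    "connections": [
        {"name": "ecommerce-source", "integration": "prod-db-auth",
         "type": "postgres", "capability": "source"},
    ],
    "flows": [
        {"name": "Orders to Warehouse",
         "source_connection": "ecommerce-source",
         "destination_connection": "bigquery-dest"},
    ],
}

for problem in find_dangling_references(example):
    print(problem)
# prints: flow 'Orders to Warehouse': unknown destination_connection 'bigquery-dest'
```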

The "Apply" Workflow

To update your configuration, send the YAML file to the apply endpoint. Saddle Data identifies each resource by its name and performs an idempotent "upsert":

  1. If a resource with the given name does not exist, it is created.
  2. If a resource with the given name already exists, it is updated to match the specification.
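
The two rules above can be modeled in a few lines. This is an in-memory illustration of upsert-by-name, not Saddle Data's implementation: the `apply` function, the `State` type, and the "unchanged" label are invented for the sketch. The text does not say what happens to resources that are missing from the file, so the sketch assumes they are left alone.

```python
from typing import Any

# In-memory stand-in for server-side state: resource kind -> name -> spec.
State = dict[str, dict[str, dict[str, Any]]]


def apply(state: State, spec: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Upsert every resource in `spec` into `state`, keyed by name.

    Returns (kind, name, action) tuples, where action is
    "created", "updated", or "unchanged".
    """
    actions = []
    for kind in ("integrations", "connections", "flows"):
        existing = state.setdefault(kind, {})
        for resource in spec.get(kind) or []:
            name = resource["name"]
            if name not in existing:
                action = "created"
            elif existing[name] != resource:
                action = "updated"
            else:
                action = "unchanged"
            existing[name] = dict(resource)  # store a copy of the desired state
            actions.append((kind, name, action))
    return actions


state: State = {}
spec = {"flows": [{"name": "Orders to Warehouse", "schedule": "0 * * * *"}]}

first = apply(state, spec)    # name not seen before
second = apply(state, spec)   # identical spec applied again
spec["flows"][0]["schedule"] = "*/15 * * * *"
third = apply(state, spec)    # same name, different schedule
print(first, second, third, sep="\n")
```

Applying the same file twice leaves the state as it was after the first apply, which is what makes the operation safe to repeat from CI.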

Applying via CLI

You can use curl to apply a configuration file:

curl -X POST https://api.saddledata.io/v1/organizations/:orgId/iac/apply \
  -H "X-Api-Key: sd_your_api_key" \
  -H "Content-Type: application/x-yaml" \
  --data-binary @saddle-config.yaml
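
For scripts that would rather not shell out to curl, the same request can be built with Python's standard library. This is a minimal sketch that mirrors the curl command: the URL and the two headers come from that command, while `build_apply_request`, the `my-org-id` value, and the optional `idempotency_key` parameter are illustrative. The `X-Idempotency-Key` header is the one named under Best Practices; the format the API expects for its value is not documented here, so any opaque string is an assumption.

```python
import urllib.request
from typing import Optional

API_BASE = "https://api.saddledata.io/v1"


def build_apply_request(org_id: str, api_key: str, yaml_body: bytes,
                        idempotency_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for the apply endpoint."""
    headers = {
        "X-Api-Key": api_key,
        "Content-Type": "application/x-yaml",
    }
    if idempotency_key is not None:
        # Assumed to accept an opaque string; check the API reference.
        headers["X-Idempotency-Key"] = idempotency_key
    return urllib.request.Request(
        url=f"{API_BASE}/organizations/{org_id}/iac/apply",
        data=yaml_body,
        headers=headers,
        method="POST",
    )


request = build_apply_request("my-org-id", "sd_your_api_key", b'version: "1"\n')
print(request.get_method(), request.full_url)
# prints: POST https://api.saddledata.io/v1/organizations/my-org-id/iac/apply

# To actually send it (requires network access and a valid key):
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status, response.read().decode())
```

Keeping the organization ID as a parameter makes it straightforward to point the same file at a staging or a production organization. If you pass an idempotency key, generate it once per logical apply (for example with `uuid.uuid4()`) and reuse the same value on every retry.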

Benefits of IaC

  • Version Control: Store your data pipeline definitions in Git alongside your application code.
  • Environment Parity: Easily replicate your configuration across staging and production environments.
  • Automation: Integrate data pipeline management into your existing CI/CD pipelines (e.g., GitHub Actions, GitLab CI).
  • Terraform Integration: The "Apply" model maps cleanly to Terraform resources, enabling full lifecycle management of Saddle Data via the official Terraform Provider (coming soon).

Best Practices

  • Secrets Management (Remote Agents): If you are using Remote Agents, avoid hardcoding sensitive credentials in your YAML files. Instead, use local secret references that your agent will resolve at runtime:
    • Environment Variables: Use the env: prefix (e.g., password: "env:DB_PASSWORD").
    • AWS Secrets Manager: Use the aws:secretsmanager: prefix (e.g., password: "aws:secretsmanager:prod/db/creds").
  • Secrets Management (Cloud Flows): For flows running in the Saddle Data Cloud, credentials are automatically managed and stored securely in our internal Secret Manager. When using IaC for Cloud flows, you can:
    • Omit the credential fields in your YAML (if the integration already exists).
    • Provide a placeholder (e.g., "********") and update the credentials once via the Saddle Data UI.
  • Idempotency Keys: Always use the X-Idempotency-Key header when applying configuration from automated scripts to prevent duplicate operations in case of network retries.
  • Review Before Apply: Use a standard code review process for your YAML configurations before deploying them to production.
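
The secret-reference advice for Remote Agents can be enforced before review with a small lint step. The sketch below is a hypothetical pre-commit check, not a Saddle Data tool: it walks a parsed configuration and reports sensitive-looking fields whose value does not start with one of the two prefixes listed above. The `SENSITIVE_KEYS` set is a guess at common field names and should be adjusted to the connectors you use.

```python
from typing import Any, Iterator

# Reference prefixes named in the best practices above.
REFERENCE_PREFIXES = ("env:", "aws:secretsmanager:")
# Hypothetical list of field names to treat as sensitive.
SENSITIVE_KEYS = {"password", "secret", "token", "api_key", "private_key"}


def find_hardcoded_secrets(node: Any, path: str = "") -> Iterator[str]:
    """Yield paths of sensitive string fields that are not secret references."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if str(key).lower() in SENSITIVE_KEYS and isinstance(value, str):
                if not value.startswith(REFERENCE_PREFIXES):
                    yield child
            else:
                yield from find_hardcoded_secrets(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_hardcoded_secrets(item, f"{path}[{index}]")


spec = {
    "connections": [
        {"name": "good", "config": {"password": "env:DB_PASSWORD"}},
        {"name": "also-good", "config": {"password": "aws:secretsmanager:prod/db/creds"}},
        {"name": "bad", "config": {"password": "hunter2"}},
    ]
}
print(list(find_hardcoded_secrets(spec)))
# prints: ['connections[2].config.password']
```

As written, the check would also flag the "********" placeholder suggested for Cloud flows, so allow that value explicitly if you use the same lint for both kinds of flow.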