Infrastructure as Code (IaC)
Saddle Data is built with Infrastructure as Code (IaC) as a first-class citizen. While the web dashboard is excellent for interactive exploration, power users and enterprise teams can manage their entire data stack programmatically.
The Declarative Model
Saddle Data uses a declarative YAML specification to define your data environment. Instead of calling multiple API endpoints to create individual resources, you define the desired state of your Integrations, Connections, and Flows in a single file and "apply" it.
Example Configuration
version: "1"

# Define reusable authorizations
integrations:
  - name: "prod-db-auth"
    type: "postgres"

# Define specific data sources and destinations
connections:
  - name: "ecommerce-source"
    integration: "prod-db-auth"
    type: "postgres"
    capability: "source"
    config:
      host: "db.example.com"
      database: "orders"
      user: "saddle_reader"

# Define the data pipeline
flows:
  - name: "Orders to Warehouse"
    source_connection: "ecommerce-source"
    destination_connection: "bigquery-dest" # Not defined in this file; assumed to already exist
    schedule: "0 * * * *" # Hourly
    config:
      mappings:
        - source_entity: "public.orders"
          dest_entity: "raw_orders"
          sync_mode: "incremental_deduped"
          cursor_field: "updated_at"
          primary_key: "id"
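Because flows point at connections by name, and connections point at integrations by name, a quick local check for names that are referenced but never defined can catch typos before anything is sent. The following is a minimal sketch, not part of Saddle Data: the function name `find_dangling_references` is hypothetical, and the spec is written out as plain Python data rather than parsed from YAML so the example needs no third-party library. Whether the apply endpoint accepts references to resources that exist in the organization but are absent from the file is not stated here, so treat a reported name as something to review rather than a definite error.

```python
from typing import Any


def find_dangling_references(spec: dict[str, Any]) -> list[str]:
    """Return one message per name that is referenced but not defined in the spec."""
    integrations = {item["name"] for item in spec.get("integrations", [])}
    connections = {item["name"] for item in spec.get("connections", [])}
    problems: list[str] = []

    # Each connection may name the integration that authorizes it.
    for conn in spec.get("connections", []):
        ref = conn.get("integration")
        if ref is not None and ref not in integrations:
            problems.append(f"connection {conn['name']!r}: unknown integration {ref!r}")

    # Each flow names a source and a destination connection.
    for flow in spec.get("flows", []):
        for key in ("source_connection", "destination_connection"):
            ref = flow.get(key)
            if ref is not None and ref not in connections:
                problems.append(f"flow {flow['name']!r}: unknown {key} {ref!r}")

    return problems


# The example configuration, reduced to the fields that carry references.
spec = {
    "version": "1",
    "integrations": [{"name": "prod-db-auth", "type": "postgres"}],
    "connections": [
        {"name": "ecommerce-source", "integration": "prod-db-auth", "type": "postgres"}
    ],
    "flows": [
        {
            "name": "Orders to Warehouse",
            "source_connection": "ecommerce-source",
            "destination_connection": "bigquery-dest",
        }
    ],
}

for problem in find_dangling_references(spec):
    print(problem)
```

Run against the example, the check reports the "bigquery-dest" destination, which the file references without defining.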
The "Apply" Workflow
To update your configuration, send the YAML file to the apply endpoint. Saddle Data identifies each resource by its name and performs an idempotent "upsert":
- If a resource with the given name does not exist, it is created.
- If a resource with the given name already exists, it is updated to match the specification.
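The two rules above can be modelled in a few lines. This is only an illustrative in-memory sketch of name-keyed upsert semantics, not Saddle Data's server-side implementation; the function `apply_resources` and the "unchanged" label for a resource that already matches are assumptions added for the example.

```python
def apply_resources(existing: dict[str, dict], desired: list[dict]) -> dict[str, str]:
    """Upsert each desired resource into `existing` by name; return the action per name."""
    actions: dict[str, str] = {}
    for resource in desired:
        name = resource["name"]
        if name not in existing:
            actions[name] = "created"
        elif existing[name] != resource:
            actions[name] = "updated"
        else:
            # Already matches the specification, so applying again changes nothing.
            actions[name] = "unchanged"
        existing[name] = dict(resource)  # store a copy so later edits to the spec do not leak in
    return actions


state: dict[str, dict] = {}
desired = [{"name": "prod-db-auth", "type": "postgres"}]

first = apply_resources(state, desired)   # resource does not exist yet
second = apply_resources(state, desired)  # same spec applied again
print(first, second)
```

Applying the same specification twice leaves the stored state identical, which is what makes the operation safe to repeat.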
Applying via CLI
You can use curl to apply a configuration file:
curl -X POST https://api.saddledata.io/v1/organizations/:orgId/iac/apply \
  -H "X-Api-Key: sd_your_api_key" \
  -H "Content-Type: application/x-yaml" \
  --data-binary @saddle-config.yaml
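For scripts that are not shell-based, the same request can be assembled with Python's standard library. This is a sketch under assumptions: the URL, header names, and content type are taken from the curl command above, while the helper name `build_apply_request` is hypothetical and the response format is not described here, so the example only builds the request and leaves sending it as a comment.

```python
import urllib.request


def build_apply_request(org_id: str, api_key: str, yaml_body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request to the apply endpoint."""
    url = f"https://api.saddledata.io/v1/organizations/{org_id}/iac/apply"
    return urllib.request.Request(
        url,
        data=yaml_body,  # raw file bytes, the equivalent of curl's --data-binary
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Content-Type": "application/x-yaml",
        },
    )


# Placeholder values; in practice read the organization ID and key from your environment.
request = build_apply_request("my-org-id", "sd_your_api_key", b'version: "1"\n')
print(request.get_method(), request.full_url)

# To send it:
#     with urllib.request.urlopen(request) as response:
#         print(response.status)
```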
Benefits of IaC
- Version Control: Store your data pipeline definitions in Git alongside your application code.
- Environment Parity: Easily replicate your configuration across staging and production environments.
- Automation: Integrate data pipeline management into your existing CI/CD pipelines (e.g., GitHub Actions, GitLab CI).
- Terraform Integration: The "Apply" model maps cleanly to Terraform resources, enabling full lifecycle management of Saddle Data via the official Terraform Provider (coming soon).
Best Practices
- Secrets Management (Remote Agents): If you are using Remote Agents, avoid hardcoding sensitive credentials in your YAML files. Instead, use local secret references that your agent will resolve at runtime:
  - Environment Variables: Use the env: prefix (e.g., password: "env:DB_PASSWORD").
  - AWS Secrets Manager: Use the aws:secretsmanager: prefix (e.g., password: "aws:secretsmanager:prod/db/creds").
- Secrets Management (Cloud Flows): For flows running in the Saddle Data Cloud, credentials are automatically managed and stored securely in our internal Secret Manager. When using IaC for Cloud flows, you can:
  - Omit the credential fields in your YAML (if the integration already exists).
  - Provide a placeholder (e.g., "********") and update the credentials once via the Saddle Data UI.
- Idempotency Keys: Always use the X-Idempotency-Key header when applying configuration from automated scripts to prevent duplicate operations in case of network retries.
- Review Before Apply: Use a standard code review process for your YAML configurations before deploying them to production.
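An idempotency key only helps if every retry of the same logical operation carries the same value, so the key should be generated once, outside the retry loop. The sketch below shows that pattern with assumptions stated plainly: the expected key format is not specified here, so a random UUID is used as a stand-in; `apply_with_retries` is a hypothetical helper; and `send` is any callable you supply that performs the HTTP request with the given extra headers and returns a status code.

```python
import uuid
from typing import Callable


def apply_with_retries(send: Callable[[dict[str, str]], int], max_attempts: int = 3) -> int:
    """Call send(headers) until it succeeds, reusing one idempotency key for every attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Generated once per logical apply, so a retried request is recognizable as a repeat.
    headers = {"X-Idempotency-Key": str(uuid.uuid4())}

    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return send(headers)
        except ConnectionError as exc:  # network-level failure: retry with the same key
            last_error = exc
    assert last_error is not None
    raise last_error


# Demonstration with a fake sender that fails twice before succeeding.
seen_keys: list[str] = []


def flaky_send(headers: dict[str, str]) -> int:
    seen_keys.append(headers["X-Idempotency-Key"])
    if len(seen_keys) < 3:
        raise ConnectionError("simulated network drop")
    return 200


status = apply_with_retries(flaky_send)
print(status, len(set(seen_keys)))
```

A production script would normally add a delay between attempts; it is omitted here to keep the example short.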