Remote Agents

Saddle Data's Remote Agent architecture allows you to run data pipelines within your own infrastructure while managing them from the Saddle Data Cloud. This "Hybrid Data Plane" model gives you the ease of use of a SaaS platform with the security and compliance of a self-hosted solution.

Architecture

The Remote Agent is a lightweight Docker container that acts as a worker node. It connects to the Saddle Data Control Plane (SaaS) to fetch jobs, execute them locally, and report the status back.

  • Control Plane (SaaS): Manages configuration, scheduling, user access, and orchestration. It never sees your data rows, only metadata (row counts, success/failure status).
  • Data Plane (Agent): Runs inside your VPC or on an on-premises server. It connects directly to your databases (Source and Destination) and performs the extraction and loading. Data flows directly from Source to Destination and never passes through Saddle Data's servers.
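
The fetch, execute, report cycle described above can be sketched in Python as follows. This is a minimal illustration, not the Agent's actual implementation: fetch_job, execute, and report are hypothetical callables standing in for the outbound HTTPS calls to the Control Plane and the local extract-and-load step, and the job and status field names are assumptions.

```python
from typing import Any, Callable, Dict, Optional

Job = Dict[str, Any]


def run_once(
    fetch_job: Callable[[], Optional[Job]],
    execute: Callable[[Job], int],
    report: Callable[[str, Dict[str, Any]], None],
) -> bool:
    """Process at most one queued job; return True if a job was handled."""
    job = fetch_job()  # hypothetical outbound request to the Control Plane
    if job is None:
        return False  # nothing queued for this agent

    try:
        # Runs locally: rows move from Source to Destination inside this process.
        row_count = execute(job)
        report(job["id"], {"status": "success", "row_count": row_count})
    except Exception as exc:
        # Report only the error type, so no row data embedded in an
        # error message is sent to the Control Plane.
        report(job["id"], {"status": "failure", "error": type(exc).__name__})
    return True
```

Note that the only values passed to report are metadata (a status and a row count), which mirrors the split between the two planes.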

Security

The Remote Agent is designed with security as a primary requirement.

1. Outbound-Only Connection

The Agent makes an outbound HTTPS connection to the Saddle Data API (https://api.saddledata.io). It does not require opening any inbound ports on your firewall. You do not need to expose your database to the public internet or whitelist Saddle Data's cloud IPs.

2. Hybrid Encryption for Credentials

When you run a flow on a Remote Agent, Saddle Data needs to pass the database credentials to the worker. To ensure these credentials are never exposed in transit or accessible to unauthorized parties, we use Hybrid Encryption:

  1. When the Agent starts, it generates a unique RSA key pair.
  2. It sends its Public Key to the Control Plane during the heartbeat process.
  3. When a job is queued for the Agent, the API encrypts the flow configuration (including database passwords) using a random AES key.
  4. This AES key is then encrypted with the Agent's Public Key.
  5. Only the specific Agent holding the matching Private Key can decrypt and execute the job.
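
The five steps above follow the general hybrid (envelope) encryption pattern, which can be sketched in Python with the third-party cryptography package. This is an illustration under stated assumptions rather than Saddle Data's implementation: the documentation does not specify the RSA key size, the padding scheme, or the AES mode, so RSA-2048 with OAEP and AES-256-GCM are choices made for this example, and the function and field names are hypothetical.

```python
import json
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumed padding scheme (not stated in the documentation).
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def generate_agent_keypair():
    """Steps 1-2: the Agent creates a key pair and shares only the public key."""
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    return private_key, private_key.public_key()


def encrypt_job(public_key, flow_config: dict) -> dict:
    """Steps 3-4: encrypt the config with a random AES key, then wrap that key."""
    aes_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce, unique per message
    plaintext = json.dumps(flow_config).encode("utf-8")
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(aes_key, OAEP)
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}


def decrypt_job(private_key, envelope: dict) -> dict:
    """Step 5: only the holder of the private key can unwrap the AES key."""
    aes_key = private_key.decrypt(envelope["wrapped_key"], OAEP)
    plaintext = AESGCM(aes_key).decrypt(
        envelope["nonce"], envelope["ciphertext"], None
    )
    return json.loads(plaintext)
```

RSA can only encrypt payloads smaller than its key size, which is why the flow configuration itself is encrypted with a symmetric key and only that short key is encrypted with RSA.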

3. Local Secret Injection (Bring Your Own Secrets)

For maximum security, you may prefer not to store your database credentials in the Saddle Data Cloud at all. The Remote Agent supports Local Secret Injection:

  • In the Saddle Data UI, instead of entering your actual password (e.g., superSecret123), you can enter a reference to an environment variable prefixed with env: (e.g., env:PG_PASSWORD).
  • When you run the Agent container, you pass this environment variable:
    docker run ... -e PG_PASSWORD=superSecret123 ...
  • When the Agent executes the flow, it will resolve env:PG_PASSWORD to the local value superSecret123.
  • This ensures your actual credentials never leave your infrastructure, even in encrypted form.
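
The resolution step can be pictured with a short Python sketch. It assumes only what is described above (a value prefixed with env: names a local environment variable); the function name and the behavior when the variable is missing are assumptions for illustration, not the Agent's documented behavior.

```python
import os
from typing import Mapping

ENV_PREFIX = "env:"


def resolve_credential(value: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the literal value, or look it up locally if it is an env: reference."""
    if not value.startswith(ENV_PREFIX):
        return value  # a plain value entered directly in the UI

    name = value[len(ENV_PREFIX):]
    if name not in environ:
        # Assumed behavior: fail loudly rather than connect with an empty password.
        raise RuntimeError(f"environment variable {name!r} is not set")
    return environ[name]
```

With the container started using -e PG_PASSWORD=superSecret123, calling resolve_credential("env:PG_PASSWORD") inside it would return the local value, while the Control Plane only ever stores the string env:PG_PASSWORD.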

4. AWS Secrets Manager Integration

For enterprise environments on AWS, the Agent can directly resolve secrets from AWS Secrets Manager.

  • In the UI credential field, enter a reference starting with aws:secretsmanager: followed by the secret name (e.g., aws:secretsmanager:prod/db/password).
  • Ensure your Agent container has access to AWS credentials, either through environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) or through an IAM role attached to the EC2 instance or ECS task.
  • The Agent will fetch the secret value at runtime using the AWS SDK.
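
A lookup of this kind could look like the following Python sketch, which uses boto3 (the AWS SDK for Python) and its get_secret_value call. The language the Agent is written in is not stated here, so this only illustrates the reference format; the function name and the optional client parameter are hypothetical, and the sketch handles text secrets only.

```python
AWS_PREFIX = "aws:secretsmanager:"


def resolve_aws_secret(reference: str, client=None) -> str:
    """Fetch the secret named by an aws:secretsmanager: reference."""
    if not reference.startswith(AWS_PREFIX):
        raise ValueError(f"not a Secrets Manager reference: {reference!r}")
    secret_id = reference[len(AWS_PREFIX):]  # e.g. "prod/db/password"

    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        # Credentials and region come from the environment variables or the
        # IAM role, via boto3's default credential chain.
        client = boto3.client("secretsmanager")

    response = client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]  # binary secrets use "SecretBinary" instead
```

Whichever identity the Agent runs under needs the secretsmanager:GetSecretValue permission on the referenced secret.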

5. Data Residency

Because the data pipeline runs entirely within the Agent container, your data never passes through Saddle Data's servers. This helps you meet strict data residency and privacy requirements (e.g., under GDPR or HIPAA) by keeping data within a specific region or country.

Use Cases

  • Private Databases: Access databases in a private VPC (AWS, GCP, Azure) without setting up complex VPNs or SSH tunnels.
  • On-Premises Data: Sync data from legacy on-premises SQL servers to the cloud.
  • Compliance: Ensure PII or sensitive data never leaves your controlled infrastructure.
  • Reduced Latency: Run transformations closer to where your data lives.