Skip to main content

SFTP Server

Overview

The SFTP (Secure File Transfer Protocol) connector allows you to securely exchange files with remote servers. Saddle Data can use SFTP as both a data source and a destination, making it an ideal choice for legacy system integration and secure partner data exchange.

Prerequisites

Before connecting to an SFTP server, ensure you have:

  • Host and Port: The address and port (default is 22) of your SFTP server.
  • Authentication: Either a username and password or a private SSH key.
  • Access Permissions: The user must have read permissions for source paths and write permissions for destination paths.

Configuration

Saddle Data separates SFTP configuration into Integrations (credentials and server info) and Connections (path and format info). This allows you to reuse the same server credentials across multiple data flows.

1. SFTP Integration

When creating an SFTP Integration, you provide the server and security details:

  • Host: The hostname or IP address of the SFTP server.
  • Port: The port the SFTP server is listening on (default is 22).
  • Username: The username for authentication.
  • Password: The password for the user (optional if using a private key).
  • Private Key: The SSH private key for key-based authentication (optional if using a password).

2. SFTP Connection

Once an Integration is created, you create a Connection to define how data is accessed:

  • Capability: Choose "Source", "Destination", or "Both".
  • Remote Path: The directory path on the SFTP server to read from or write to.
  • File Pattern (Source only): A regular expression to filter files (e.g., orders_.*\.csv).
  • File Format: The format of the files (CSV, TSV, Excel, JSON Array, or JSON Lines).

Sync Modes

SFTP as a Source

When using SFTP as a source, Saddle Data scans the configured path for files matching the pattern.

  • Full Refresh: Processes all matching files in the directory.
  • Incremental: Processes only new or modified files based on their modification timestamps. Saddle Data maintains a cursor of the last processed file's timestamp to ensure no data is missed.

SFTP as a Destination

When using SFTP as a destination, Saddle Data will generate and upload files to the configured remote path.

  • Incremental - Append: Creates a new file for each sync run containing the records processed during that run.
  • Dynamic Naming: The filename is automatically generated using the Destination Table Name from your mapping and a timestamp (e.g., users_export_20240309_120000.csv).
  • Excel Sheets: If using the Excel format, the Destination Table Name is also used as the sheet name within the workbook.

Supported File Formats

Saddle Data supports the following formats for both reading and writing:

  • CSV (Comma Separated): Standard comma-separated values.
  • TSV (Tab Separated): Tab-separated values, useful for data containing commas.
  • Excel (.xlsx): Modern Microsoft Excel workbooks.
  • JSON Array (Standard): A standard JSON file containing an array of objects.
  • JSON Lines (Stream-friendly): A format where each line is a valid JSON object. This is highly recommended for large datasets as it allows for efficient streaming.

Schema Discovery

When used as a source, Saddle Data will "peek" at the most recent file matching your pattern to automatically discover the schema.

  • CSV/TSV: Headers are extracted from the first row using the appropriate delimiter.
  • Excel: Headers are extracted from the first row of the first sheet.
  • JSON/JSONL: Keys are extracted from the first few objects in the file.

All discovered fields are initially treated as String types, which can then be cast to other types using In-Flight Transformations.

Declarative Configuration

apiVersion: v1
kind: Connection
metadata:
name: sftp-connection
spec:
connectorId: sftp
integrationId: sftp-integration-id
configuration:
capability: source
path: /inbox/orders
file_pattern: .*
file_format: csv