S3-Compatible Storage

Overview

The S3-Compatible Storage connector lets you extract data from and load data into Amazon S3 and other S3-compatible storage providers, such as Cloudflare R2, Google Cloud Storage, and DigitalOcean Spaces.

Saddle Data features Intelligent Data Lake Discovery, using AI to automatically group files into logical tables based on naming conventions, making it easy to manage complex buckets.

Prerequisites

Before connecting, ensure you have:

  • Credentials: An Access Key ID and Secret Access Key.
  • Permissions: s3:ListBucket, s3:GetObject (for sources), and s3:PutObject (for destinations).
  • Bucket Info: The name of the bucket and the region.
  • Endpoint URL: Required if using a non-AWS provider (e.g., https://<account-id>.r2.cloudflarestorage.com).
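
On AWS, the permissions listed above could be granted with an IAM policy along the following lines. This is a minimal sketch, not a policy published by Saddle Data: the helper name build_policy and the bucket name my-bucket are hypothetical, and non-AWS providers use their own permission models.

```python
import json


def build_policy(bucket: str, source: bool = True, destination: bool = False) -> dict:
    """Build a least-privilege IAM policy document for one bucket (illustrative)."""
    # s3:ListBucket is a bucket-level action, so its resource is the bucket ARN.
    statements = [{
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket}"],
    }]

    object_actions = []
    if source:
        object_actions.append("s3:GetObject")   # read files when S3 is a source
    if destination:
        object_actions.append("s3:PutObject")   # write files when S3 is a destination

    if object_actions:
        # Object-level actions apply to the keys inside the bucket.
        statements.append({
            "Effect": "Allow",
            "Action": object_actions,
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        })

    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(build_policy("my-bucket", source=True, destination=True), indent=2))
```

Note that s3:ListBucket is attached to the bucket itself, while s3:GetObject and s3:PutObject are attached to the objects under it (the /* resource).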

Configuration

Saddle Data separates S3 configuration into Integrations (credentials) and Connections (bucket/path settings).

1. S3 Integration

Create an integration to store your security credentials:

  • Provider: AWS, Cloudflare R2, GCS, DigitalOcean, or Other.
  • Access Key ID / Secret Access Key: Your API credentials.
  • Region: The storage region (e.g., us-east-1).
  • Endpoint URL: Only required for non-AWS providers.

2. S3 Connection

Create a connection to define the data location:

  • Bucket Name: The name of your S3 bucket.
  • Remote Path: The prefix/folder path within the bucket.
  • File Format: CSV, TSV, Excel, JSON Array, or JSON Lines.

Intelligent Discovery

When you use S3 as a source, Saddle Data automatically analyzes the filenames in your bucket using AI. It groups related files (e.g., orders_2024_01.csv and orders_2024_02.csv) into logical Tables.

In the Flow Editor, you will see these AI-generated tables. Selecting one automatically configures the file pattern for that table, so that both existing and future files are captured.
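
To make the idea of grouping by naming convention concrete, here is a deliberately simple sketch. It is not Saddle Data's discovery logic, which uses AI and is not described here. It assumes a plain regex heuristic that strips trailing numeric segments from each filename; the names group_files and file_pattern are hypothetical.

```python
import re
from collections import defaultdict

# Trailing run of numeric segments such as "_2024_01" or "-20240115".
_SUFFIX = re.compile(r"(?:[_-]\d+)+$")


def group_files(keys):
    """Group object keys into (table, extension) buckets using a naming heuristic."""
    tables = defaultdict(list)
    for key in keys:
        name = key.rsplit("/", 1)[-1]          # drop any prefix/folder
        stem, _, ext = name.rpartition(".")    # split off the extension
        if not stem:                           # no extension (or a dotfile)
            stem, ext = name, ""
        table = _SUFFIX.sub("", stem)          # "orders_2024_01" -> "orders"
        tables[(table, ext)].append(key)
    return dict(tables)


def file_pattern(table, ext):
    """Regex for a table's filenames (without prefix), past and future."""
    suffix = rf"\.{re.escape(ext)}" if ext else ""
    return rf"^{re.escape(table)}(?:[_-]\d+)*{suffix}$"


if __name__ == "__main__":
    groups = group_files(["orders_2024_01.csv", "orders_2024_02.csv", "customers.csv"])
    for (table, ext), files in groups.items():
        print(table, file_pattern(table, ext), files)
```

With this heuristic, orders_2024_01.csv and orders_2024_02.csv land in one orders table, and the generated pattern would also match a later orders_2024_03.csv.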

Sync Modes

S3 as a Source

  • Full Refresh: Processes all files matching the table's pattern.
  • Incremental: Processes only new or modified files based on their S3 LastModified timestamp. Saddle Data maintains a cursor to ensure no data is missed.
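
The incremental mode above can be pictured as a filter on LastModified plus a stored cursor. The sketch below is an assumption about how such a cursor could work, not Saddle Data's implementation; S3Object and select_incremental are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class S3Object:
    key: str
    last_modified: datetime  # the object's LastModified timestamp


def select_incremental(objects, cursor):
    """Return the objects modified after `cursor` and the advanced cursor.

    `cursor` is None on the first run, in which case every object is selected.
    """
    pending = [o for o in objects if cursor is None or o.last_modified > cursor]
    pending.sort(key=lambda o: (o.last_modified, o.key))  # process oldest first
    # Keep the old cursor when nothing new was found.
    new_cursor = max((o.last_modified for o in pending), default=cursor)
    return pending, new_cursor


if __name__ == "__main__":
    jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
    feb = datetime(2024, 2, 1, tzinfo=timezone.utc)
    listing = [S3Object("orders_2024_01.csv", jan), S3Object("orders_2024_02.csv", feb)]
    files, cursor = select_incremental(listing, cursor=jan)
    print([f.key for f in files], cursor)
```

A strict greater-than comparison like this one can skip a file that is written with exactly the same timestamp as the cursor after a run has finished, so a production cursor would likely need extra bookkeeping (for example, remembering the keys already seen at the cursor timestamp).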

S3 as a Destination

  • Incremental - Append: Creates a new file for each sync run. Filenames are generated using the Destination Table Name and a timestamp.
  • Excel Sheets: If you use the Excel format, multiple table mappings in a single flow are written as separate sheets in a single workbook.
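
As an illustration of append-style output, the sketch below builds one new object key per sync run from the destination table name and a timestamp. The exact filename format Saddle Data uses is not documented here, so the table_timestamp.ext layout, the EXTENSIONS mapping, and the destination_key helper are all assumptions.

```python
from datetime import datetime, timezone

# Assumed mapping from file format to extension; not taken from the product.
EXTENSIONS = {
    "csv": "csv",
    "tsv": "tsv",
    "excel": "xlsx",
    "json_array": "json",
    "json_lines": "jsonl",
}


def destination_key(remote_path, table, file_format, run_time=None):
    """Build a unique object key for one sync run (hypothetical naming scheme)."""
    run_time = run_time or datetime.now(timezone.utc)
    # Normalize to UTC so keys sort chronologically regardless of local time zone.
    stamp = run_time.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    filename = f"{table}_{stamp}.{EXTENSIONS[file_format]}"
    prefix = remote_path.strip("/")
    return f"{prefix}/{filename}" if prefix else filename


if __name__ == "__main__":
    print(destination_key("exports/", "orders", "csv"))
```

Because every run produces a new key, earlier files are never overwritten, which is what makes the mode append-only.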

Supported File Formats

  • CSV / TSV: Standard delimited text files.
  • Excel (.xlsx): Modern Microsoft Excel workbooks.
  • JSON Array: A standard JSON file containing an array of objects.
  • JSON Lines: A stream-friendly format where each line is a valid JSON object.
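
The difference between the two JSON formats is easiest to see when parsing them. The following sketch uses only Python's standard json module; the function names and sample records are illustrative.

```python
import json


def parse_json_array(text):
    """Parse a JSON Array file: the whole document is one list of objects."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    return records


def parse_json_lines(text):
    """Parse a JSON Lines file: one JSON value per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    array_text = '[{"id": 1, "status": "paid"}, {"id": 2, "status": "open"}]'
    lines_text = '{"id": 1, "status": "paid"}\n{"id": 2, "status": "open"}\n'
    # Both inputs describe the same two records.
    print(parse_json_array(array_text) == parse_json_lines(lines_text))
```

A JSON Array file must be read in full before it can be parsed, whereas JSON Lines can be processed one line at a time, which is why it suits large or streamed files.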

Declarative Configuration

apiVersion: v1
kind: Connection
metadata:
  name: s3-connection
spec:
  connectorId: s3
  integrationId: aws-integration-id
  configuration:
    capability: source
    bucket: my-bucket
    path: ''
    file_pattern: .*
    file_format: csv