In-Flight Transformations

Saddle Data can transform your data in flight, as it moves from source to destination. This means you can clean, reshape, and filter records without a separate transformation tool.

Transformations are defined within a Flow's configuration and are executed in the order they are listed.
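Because steps run in list order, reordering them can change the result. The sketch below illustrates this in plain Python with a filter step and a hashing step (both transformation types are described below). It is not Saddle Data's configuration format or engine: `run_steps`, `keep_example_domain`, and `hash_email` are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]


def run_steps(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Feed the output of each step into the next one, in list order."""
    for step in steps:
        rows = step(rows)
    return rows


def keep_example_domain(rows: list[Row]) -> list[Row]:
    # Hypothetical filter step: keep rows whose email is at example.com.
    return [r for r in rows if r["email"].endswith("@example.com")]


def hash_email(rows: list[Row]) -> list[Row]:
    # Hypothetical hash step: replace the email with its SHA-256 hex digest.
    return [
        {**r, "email": hashlib.sha256(r["email"].encode("utf-8")).hexdigest()}
        for r in rows
    ]


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@other.org"},
]

# Filter first: the filter still sees the plain-text email.
filtered_then_hashed = run_steps(rows, [keep_example_domain, hash_email])

# Hash first: the filter now sees hex digests, so no row matches.
hashed_then_filtered = run_steps(rows, [hash_email, keep_example_domain])
```

In this sketch, filtering before hashing keeps the row with `id` 1, while hashing before filtering keeps nothing. The general point is that a step which reads a column should be listed before any step that rewrites that column.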

Core Mapping Features

Capabilities such as Selecting Columns, Renaming Fields, and Type Casting are now handled directly in the Schema Mapping tab for a more intuitive and performant experience.

The currently supported transformation types are:

  • Filter Rows: Remove rows that don't match a specified condition. For example, you could filter a users table to only include users where is_active = true.
  • Flatten JSON: Expand nested JSON objects and arrays into separate columns. For example, a metadata object containing { "color": "blue", "size": "large" } can be flattened into metadata_color and metadata_size columns. This also supports flattening arrays by appending index suffixes (e.g., tags_0, tags_1).
  • Hash Column: Replace sensitive values with a SHA-256 cryptographic hash. Because identical inputs always produce the same hash, hashed columns can still be used for joins and analysis, which makes this the preferred method for protecting PII.
  • Mask Column: Partially or fully redact data. For example, you could mask a credit card number to only show the last 4 digits (************1234).
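The four transformation types above can be sketched in a few lines of plain Python. This is an illustration of the described behavior under stated assumptions, not Saddle Data's implementation or API: every function name here (`filter_rows`, `flatten`, `flatten_column`, `hash_value`, `mask_value`) is hypothetical, and details such as the `_` separator for nested keys and UTF-8 encoding before hashing are assumptions inferred from the examples in the list.

```python
import hashlib
from typing import Any, Callable


def filter_rows(rows: list[dict], predicate: Callable[[dict], bool]) -> list[dict]:
    """Filter Rows: keep only the rows for which the condition holds."""
    return [row for row in rows if predicate(row)]


def flatten(value: Any, prefix: str) -> dict[str, Any]:
    """Flatten JSON: expand dicts by key and lists by index under `prefix`."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}  # scalar: nothing left to expand
    flat: dict[str, Any] = {}
    for key, child in items:
        flat.update(flatten(child, f"{prefix}_{key}"))
    return flat


def flatten_column(row: dict, column: str) -> dict:
    """Replace one nested column in a row with its flattened columns."""
    flat = {k: v for k, v in row.items() if k != column}
    flat.update(flatten(row[column], column))
    return flat


def hash_value(value: str) -> str:
    """Hash Column: SHA-256 hex digest (assumes UTF-8 encoding, no salt)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask Column: hide all but the last `visible` characters.

    visible=0 masks the whole value. A value no longer than `visible`
    is returned unmasked in this sketch.
    """
    if visible <= 0:
        return mask_char * len(value)
    return mask_char * max(len(value) - visible, 0) + value[-visible:]


users = [
    {"id": 1, "is_active": True,
     "metadata": {"color": "blue", "size": "large"}, "tags": ["a", "b"]},
    {"id": 2, "is_active": False, "metadata": {}, "tags": []},
]

active = filter_rows(users, lambda row: row["is_active"])
flat = flatten_column(flatten_column(active[0], "metadata"), "tags")
# flat has keys: id, is_active, metadata_color, metadata_size, tags_0, tags_1

card = "4111 1111 1111 1234".replace(" ", "")
masked = mask_value(card)  # 12 mask characters followed by "1234"
```

Hashing and masking serve different purposes: `hash_value` returns the same digest for the same input, so two tables hashed the same way can still be joined on that column, whereas `mask_value` discards the hidden characters and is suited to display.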

Managed Transformations

Transformations that are injected automatically by the Data Governance system are known as Managed Transformations.

  • Lock Icon: These transformations appear in the Flow Editor with a lock icon.
  • Persistence: They cannot be edited or removed by standard users.
  • Centralized Control: To modify a managed transformation, an authorized user must update the underlying policy in the Governance Control Center.
  • The Circuit Breaker: At runtime, if a managed transformation is found to be missing or to have been tampered with, the Saddle Data Worker triggers a Circuit Breaker event and immediately aborts the sync to prevent a data leak.
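The Circuit Breaker idea can be sketched as a pre-sync check that compares the transformations required by policy against those actually configured on the Flow. How the Saddle Data Worker really detects tampering is not documented here, so everything below is an assumption made for illustration: the `CircuitBreakerError` class, the `fingerprint` and `verify_managed` functions, and the dictionary shape used for a transformation are all hypothetical.

```python
import hashlib
import json


class CircuitBreakerError(RuntimeError):
    """Raised to abort a sync before any rows are moved."""


def fingerprint(transformation: dict) -> str:
    # Serialize with sorted keys so the same settings always give the same digest.
    canonical = json.dumps(transformation, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_managed(required: list[dict], configured: list[dict]) -> None:
    """Abort if any policy-required transformation is absent or altered."""
    present = {fingerprint(t) for t in configured}
    for expected in required:
        if fingerprint(expected) not in present:
            raise CircuitBreakerError(
                f"managed transformation missing or altered: {expected}"
            )


# Hypothetical policy: the email column must be hashed.
policy = [{"type": "hash", "column": "email"}]

# A flow that contains the managed step plus a user-defined one passes.
flow_ok = [{"type": "hash", "column": "email"}, {"type": "mask", "column": "card"}]
verify_managed(policy, flow_ok)

# A flow where the managed step was changed to target another column fails.
flow_tampered = [{"type": "hash", "column": "nickname"}]
try:
    verify_managed(policy, flow_tampered)
    aborted = False
except CircuitBreakerError:
    aborted = True  # the sync would stop here
```

Comparing fingerprints rather than only transformation names means that an edited step is treated the same as a missing one, which matches the "missing or tampered with" wording above.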

Transformations of any type can be chained together within a Flow to build a data quality pipeline. See the How-To Guides for a practical example.