In-Flight Transformations
Saddle Data can transform your data in flight, as it moves from source to destination. This means you can clean, reshape, and filter your data without a separate transformation tool.
Transformations are defined within a Flow's configuration and are executed in the order they are listed.
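Because transformations run in the order they are listed, reordering them can change the result. The following is a minimal Python sketch of that idea, not Saddle Data's actual engine or configuration format: `run_flow`, `filter_example_domain`, and `hash_email` are hypothetical names used only for illustration.

```python
import hashlib


def filter_example_domain(rows):
    # Hypothetical filter step: keep rows whose email is on example.com.
    return [r for r in rows if r["email"].endswith("@example.com")]


def hash_email(rows):
    # Hypothetical hash step: replace the email with its SHA-256 hex digest.
    return [
        {**r, "email": hashlib.sha256(r["email"].encode("utf-8")).hexdigest()}
        for r in rows
    ]


def run_flow(rows, transformations):
    # Apply each transformation to the output of the previous one, in list order.
    for transform in transformations:
        rows = transform(rows)
    return rows


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@other.org"},
]

# Filtering first sees the raw email, so row 1 is kept and then hashed.
filtered_then_hashed = run_flow(rows, [filter_example_domain, hash_email])

# Hashing first means the filter only sees hex digests, so nothing matches.
hashed_then_filtered = run_flow(rows, [hash_email, filter_example_domain])
```

In this sketch, a filter that depends on a column's original value has to be listed before any step that hashes or masks that column.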
Capabilities such as Selecting Columns, Renaming Fields, and Type Casting are now handled directly in the Schema Mapping tab for a more intuitive and performant experience.
The currently supported transformation types are:
- Filter Rows: Remove rows that don't match a specified condition. For example, you could filter a users table to only include users where is_active = true.
- Flatten JSON: Expand nested JSON objects and arrays into separate columns. For example, a metadata object containing { "color": "blue", "size": "large" } can be flattened into metadata_color and metadata_size columns. This also supports flattening arrays by appending index suffixes (e.g., tags_0, tags_1).
- Hash Column: Replace sensitive data with a cryptographic hash (SHA-256). This is the preferred method for protecting PII while still allowing for data joining and analysis.
- Mask Column: Partially or fully redact data. For example, you could mask a credit card number to only show the last 4 digits (************1234).
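To make the behavior of these four types concrete, here is a rough Python sketch of what each one does to a value or a set of rows. It is an illustration under assumptions, not Saddle Data's implementation: the function names, parameters, and the underscore-joined naming for nested keys are hypothetical, chosen to match the examples in the list above.

```python
import hashlib


def filter_rows(rows, predicate):
    # Filter Rows: keep only the rows for which the condition holds.
    return [row for row in rows if predicate(row)]


def flatten_json(value, prefix):
    # Flatten JSON: expand nested objects and arrays into one flat mapping.
    # Object keys and array indexes are appended to the prefix with "_".
    # Note: in this sketch, empty objects and arrays produce no columns.
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_json(child, f"{prefix}_{key}"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_json(child, f"{prefix}_{index}"))
    else:
        flat[prefix] = value
    return flat


def hash_value(value):
    # Hash Column: SHA-256 is deterministic, so equal inputs give equal
    # digests and the hashed column can still be used as a join key.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_value(value, visible=4, mask_char="*"):
    # Mask Column: redact everything except the last `visible` characters.
    # visible=0 redacts the value fully.
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[-visible:]


users = [{"name": "Ada", "is_active": True}, {"name": "Bob", "is_active": False}]
active_users = filter_rows(users, lambda row: row["is_active"])

metadata_columns = flatten_json({"color": "blue", "size": "large"}, "metadata")
tag_columns = flatten_json(["new", "sale"], "tags")

masked_card = mask_value("4111111111111234")
hashed_email = hash_value("ada@example.com")
```

Here `metadata_columns` has the keys `metadata_color` and `metadata_size`, `tag_columns` has `tags_0` and `tags_1`, and `masked_card` keeps only the last four digits visible.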
Managed Transformations
Transformations that are injected automatically by the Data Governance system are known as Managed Transformations.
- Lock Icon: These transformations appear in the Flow Editor with a lock icon.
- Persistence: They cannot be edited or removed by standard users.
- Centralized Control: To modify a managed transformation, an authorized user must update the underlying policy in the Governance Control Center.
- The Circuit Breaker: At runtime, if a managed transformation is detected as missing or tampered with, the Saddle Data Worker will trigger a Circuit Breaker event and immediately abort the sync to prevent a data leak.
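The runtime check behind the Circuit Breaker can be pictured as comparing each policy-defined transformation against what is actually present in the Flow. The sketch below is one possible way to do that in Python and makes several assumptions that the documentation does not confirm: that transformations are dictionaries with an `id` field, that tampering is detected by comparing SHA-256 fingerprints of a canonical JSON form, and that the names `CircuitBreakerError` and `verify_managed_transformations` exist only for this example.

```python
import hashlib
import json


class CircuitBreakerError(RuntimeError):
    """Hypothetical error raised to abort a sync before any data moves."""


def fingerprint(transformation):
    # Serialize with sorted keys so that logically equal configs hash equally.
    canonical = json.dumps(transformation, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_managed_transformations(flow_transformations, policy_transformations):
    # Index the Flow's transformations by their (assumed) "id" field.
    by_id = {t["id"]: t for t in flow_transformations if "id" in t}
    for expected in policy_transformations:
        actual = by_id.get(expected["id"])
        if actual is None:
            raise CircuitBreakerError(
                f"managed transformation {expected['id']!r} is missing"
            )
        if fingerprint(actual) != fingerprint(expected):
            raise CircuitBreakerError(
                f"managed transformation {expected['id']!r} was modified"
            )


# Policy-defined transformation (hypothetical shape).
policy = [{"id": "gov-hash-email", "type": "hash_column", "column": "email"}]

# A Flow that still contains the managed transformation unchanged.
flow = [
    {"id": "user-filter", "type": "filter_rows", "condition": "is_active = true"},
    {"id": "gov-hash-email", "type": "hash_column", "column": "email"},
]

verify_managed_transformations(flow, policy)  # No exception: the sync may proceed.
```

Raising before any rows are read mirrors the "immediately abort the sync" behavior described above: in this sketch, a failed check means no data is moved at all.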
Transformations of any of the types above can be chained together in a single Flow to build a data quality pipeline. See the How-To Guides for a practical example.