Row transforms - Planasonix

Row transforms operate on records without changing the fundamental column layout (except where a node explicitly adds helper columns). You use them to cut volume, impose ordering for deterministic processing, probe datasets, and remove duplicates.

Filter

Filter keeps rows that match a boolean expression. Configuration:

Expression: A predicate evaluated per row (for example, order_status = 'paid' and order_total > 0).
Null handling: Decide how NULL comparisons behave; explicit IS NULL checks avoid surprises.

Example: You ingest clickstream events but only want rows where event_name is purchase_completed and currency is USD:

event_name = 'purchase_completed' AND currency = 'USD'

(Exact expression syntax follows your workspace dialect—use the in-app helper and preview.) When to use: Early in the graph to reduce bytes moved through expensive joins.

Sample

Sample takes a random or stratified subset of rows for exploration, testing, or cost control. Configuration:

Sample rate or row limit: Fixed fraction (10%) or max rows (50_000).
Seed (when available): Reproducible samples for tests.
Stratification keys (when available): Preserve rare segment representation.

Example: Before training a fraud model downstream, sample 5% of transactions stratified by country so evaluation metrics are not dominated by one region. When to use: Development pipelines, QA harnesses, or staged rollouts where full scans are unnecessary.

Sort

Sort orders rows by one or more keys. Configuration:

Sort keys: Columns with ascending or descending order.
Nulls first/last: Explicit placement prevents flaky joins or window partitions.
Stability: Pair sort with a tie-breaker column (such as event_id) when duplicates on sort keys exist.

Example: You batch-update a warehouse table keyed by (customer_id, effective_date); sort by those columns so merge operators observe deterministic runs and easier diffing in logs. When to use: Before nodes that assume order (some window setups, certain file writers), or when breaking ties for deduplication.

Unique (deduplicate)

Unique removes duplicate rows—either full-row duplicates or duplicates by a key subset. Configuration:

Key columns: Deduplicate on user_id while keeping the first or last row per key according to sort order.
Keep policy: First vs last requires an upstream Sort when order matters.
Hash vs key: Full-row unique is cheap mentally; key-based unique matches business keys.

Example: CDC delivers occasional at-least-once duplicates on event_id. Sort by ingest_timestamp descending, then Unique on event_id keeping the first row to retain the latest version. When to use: After merges of streams, before cardinality-sensitive aggregations, or prior to loads that enforce primary keys.

If you need “best record wins” logic beyond first/last (for example, prefer non-null email), use a Window or Summarize pattern upstream of Unique, or move the rule to Custom SQL for clarity.

Explode

Explode converts array elements into separate rows, duplicating non-array columns for each element. Configuration:

Explode column: The array column to expand (e.g. tags, line_items).
Keep original: Optionally retain the original array column alongside the exploded rows.

Example: A row with tags: ["a", "b", "c"] produces three output rows — one per tag — with all other columns duplicated. When to use: After parsing JSON arrays from REST APIs or event payloads, before aggregating per-element.

Explode multiplies row count — one input row produces N output rows. Place Filter before Explode when possible to limit the expansion.

Z-Order Sort (professional+)

Z-Order Sort reorders rows using a space-filling curve that preserves multi-dimensional data locality. Query engines that read the output files can skip large ranges when filtering on any combination of the sorted columns. Configuration:

Columns: Two or more columns to include in the z-order sort.

Example: For an orders table filtered by region and order_date, z-ordering on both columns keeps related data co-located so queries skip 70–85% of files. When to use: Before writing to Iceberg or Delta Lake destinations, especially when downstream queries filter on multiple columns simultaneously. See the full Z-Order Sort reference for column selection tips and performance impact.

Patterns on the canvas

Shrink early

Place Filter and column projection (where available) close to the source to save compute on joins.

Sort before deterministic dedupe

When duplicate resolution depends on time or version columns, Sort then Unique.

Validate with preview

After Sample, confirm distributions still reflect production skew if you use samples for QA sign-off.

Column transforms

Change types, derive fields, and apply window logic.

Aggregation

Roll up after you have narrowed and cleaned rows.

Z-Order Sort reference

Full guide to multi-dimensional sort configuration and performance.

Explode reference

Detailed explode configuration and patterns.

​Filter

​Sample

​Sort

​Unique (deduplicate)

​Explode

​Z-Order Sort (professional+)

​Patterns on the canvas

​Related nodes

Column transforms

Aggregation

Z-Order Sort reference

Explode reference

Filter

Sample

Sort

Unique (deduplicate)

Explode

Z-Order Sort (professional+)

Patterns on the canvas

Related nodes