Node categories
Sources
Ingest from connections, tables, CDC, Iceberg, webhooks, and iterators.
Row transforms
Filter, sort, sample, and deduplicate rows.
Column transforms
Dates, windows, schema mapping, and guided data prep.
Joins and unions
Combine datasets with joins, fuzzy matching, and unions.
Aggregation
Summarize, pivot, and unpivot for analytics-ready shapes.
Parsers and builders
Parse CSV, JSON, and XML; build structured payloads for downstream systems.
Control flow
Branching, error handling, loops, splits, and cross-pipeline triggers.
Data quality
Validate, profile, detect PII, generate test rows, and notify owners.
Advanced
SQL, scripts, notebooks, UDFs, streaming, ML, and geospatial operations.
Destinations
Write to warehouses, lakes, cloud storage, and webhooks.
How to choose a node
Start from the data shape
If the data is not yet tabular, use a parser or a source that emits rows. If it is already tabular, go straight to transforms or joins.
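The "shape first" rule above can be sketched in plain Python: before any transform runs, a JSON document or a block of delimited text is reduced to a list of rows. This is a minimal, hypothetical illustration of what a parser node does conceptually, not the product's actual API; the function names and payloads are invented for the example.

```python
import csv
import io
import json

def rows_from_json(payload: str) -> list[dict]:
    """Flatten a JSON document (an object or a list of objects) into rows."""
    data = json.loads(payload)
    return data if isinstance(data, list) else [data]

def rows_from_csv(text: str) -> list[dict]:
    """Parse delimited text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))

# Once both inputs are rows of dicts, downstream joins and transforms
# can treat them uniformly.
json_rows = rows_from_json('[{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]')
csv_rows = rows_from_csv("id,region\n3,APAC\n")
print(json_rows + csv_rows)
```

Note that `csv.DictReader` yields string values; a date or type-casting transform would typically follow the parser.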
Prefer native nodes over scripts
Built-in nodes carry clearer metadata for lineage and optimization. Use Custom SQL or Python when you need logic that no single node expresses cleanly.
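As a sketch of when a custom Python node earns its place: row-level logic that mixes several conditions across columns is often clearer as one small function than as a chain of filter and case nodes. The tiering rule below is entirely hypothetical, invented for illustration; only the pattern (one function per row, returned value becomes a new column) reflects the guidance above.

```python
def tier(row: dict) -> str:
    """Assign a customer tier from two fields; thresholds are illustrative."""
    spend, months = row["spend"], row["tenure_months"]
    if spend > 10_000 and months >= 12:
        return "platinum"
    if spend > 1_000 or months >= 24:
        return "gold"
    return "standard"

rows = [
    {"spend": 15_000, "tenure_months": 18},
    {"spend": 500, "tenure_months": 30},
    {"spend": 200, "tenure_months": 3},
]
print([tier(r) for r in rows])  # → ['platinum', 'gold', 'standard']
```

If the same logic can be written as a single built-in node, prefer that: the function above is opaque to lineage tracking, whereas native nodes expose their column usage to the optimizer.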
Pipeline authoring resources
Pipeline canvas
Learn how to wire nodes, preview data, and debug runs.
Variables
Parameterize node settings for reuse across environments.