Join
Join merges two inputs on key columns using standard relational semantics. Join types:| Type | Result |
|---|---|
| Inner | Only keys present in both inputs. |
| Left | All rows from the left input; matched right columns or nulls. |
| Right | All rows from the right input; matched left columns or nulls. |
| Full | All keys from either side; unmatched sides null-padded. |
| Cross | Cartesian product—use sparingly and filter immediately. |
- Left / right inputs: Wire two upstream edges.
- Key mapping:
customer_id↔cust_idwhen names differ. - Null-safe equality (when offered): Treat
NULL = NULLif your rule requires it—otherwise unexpected drops occur on inner joins. - Select list: Choose columns to carry forward; disambiguate names with aliases.
orders (left) with customers (right) on orders.customer_id = customers.id using a left join so unpaid carts still appear with nullable customer attributes.
Fuzzy Join
Fuzzy Join matches rows when keys are approximately equal—names, addresses, or human-entered text with typos. Configuration:- Similarity metric: Edit distance, token overlap, or phonetic match—per product options.
- Threshold: Minimum score to accept a match; tune on labeled samples.
- Blocking keys (when available): Restrict comparisons to same
postal_codeprefix or first letter buckets to keep runtime feasible.
vendor_name from invoices to a reference suppliers table where strings differ ("Acme LLC" vs "Acme Incorporated"). Start with high thresholds; inspect unmatched rows in preview.
Union
Union stacks two or more inputs vertically when columns correspond. Configuration:- Column alignment: By name or position—prefer by name to reduce breakage when column order changes.
- Type coercion: Harmonize types (
INTvsBIGINT) before union when strict mode is enabled. - Distinct: Optional
UNION DISTINCTbehavior to drop identical rows across branches.
sales_us, sales_eu, and sales_apac tables that share schema into sales_all for downstream aggregation.
Design checklist
Normalize keys first
Trim strings, standardize casing, and parse dates before joins on those fields.
Profile cardinalities
Preview row counts after inner joins; investigate explosion after cross joins or many-to-many fuzzy matches.
Related nodes
Aggregation
Summarize after you establish the correct grain.
Column transforms
Prepare join keys and handle type mismatches.