Validation
Validation applies schema and rule checks to incoming rows. Configuration:
- Schema constraints: Required columns, non-null keys, type compatibility.
- Business rules: Domain checks (amount >= 0, status IN (...)), regex patterns, referential checks against small lookup tables when supported.
- On failure: Fail the run, route bad rows to a reject path, or count violations, depending on node settings.
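The three failure modes can be easier to compare side by side in code. The following is a minimal Python sketch, not the product's API: the column names, the allowed status values, the order_id pattern, and the validate function are all hypothetical, and the "count" mode is assumed to pass rows through while tallying violations.

```python
import re
from dataclasses import dataclass, field

# Hypothetical schema and rules, chosen only for illustration.
REQUIRED_COLUMNS = {"order_id", "amount", "status"}
ALLOWED_STATUS = {"open", "paid", "cancelled"}
ORDER_ID_PATTERN = re.compile(r"^ORD-\d+$")


@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reasons) pairs
    violations: int = 0


def check_row(row):
    """Return a list of violation messages for one row (empty means valid)."""
    missing = REQUIRED_COLUMNS - set(row)
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    reasons = []
    # Schema constraints: non-null key, pattern, type compatibility.
    if row["order_id"] is None:
        reasons.append("order_id is null")
    elif not ORDER_ID_PATTERN.match(str(row["order_id"])):
        reasons.append("order_id does not match pattern")
    amount = row["amount"]
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        reasons.append("amount is not numeric")
    elif amount < 0:  # business rule: amount >= 0
        reasons.append("amount < 0")
    if row["status"] not in ALLOWED_STATUS:  # business rule: status IN (...)
        reasons.append("status not in allowed set")
    return reasons


def validate(rows, on_failure="reject"):
    """on_failure: 'fail' raises, 'reject' routes bad rows, 'count' only tallies."""
    if on_failure not in {"fail", "reject", "count"}:
        raise ValueError(f"unknown on_failure mode: {on_failure}")
    result = ValidationResult()
    for row in rows:
        reasons = check_row(row)
        if not reasons:
            result.valid.append(row)
            continue
        result.violations += len(reasons)
        if on_failure == "fail":
            raise ValueError(f"validation failed for {row}: {reasons}")
        if on_failure == "reject":
            result.rejected.append((row, reasons))
        else:  # "count": keep the row, only tally the violations
            result.valid.append(row)
    return result
```

Returning the reasons alongside each rejected row keeps the reject path useful for debugging, since the downstream consumer can see which rule each row broke.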
Data Profiling
Data Profiling computes statistics on columns: null rates, distinct counts, min/max, histograms, and inferred patterns. Configuration:
- Sample size: Full scan vs sample for large tables.
- Columns: All vs selected sensitive dimensions.
- Output destination: Inline report vs persisted profile table for trend charts.
email week over week to catch silent breakage.
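As a rough illustration of the statistics listed above, the sketch below profiles a single column held in a Python list. The function name profile_column and its parameters are assumptions made for this example, and the sampling is a simple seeded random sample rather than whatever strategy the node actually uses.

```python
import random
from collections import Counter


def profile_column(values, sample_size=None, seed=0):
    """Compute basic statistics for one column (a list of values).

    sample_size=None means a full scan; otherwise a seeded random sample is used.
    """
    if sample_size is not None and sample_size < len(values):
        values = random.Random(seed).sample(values, sample_size)
    total = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "row_count": total,
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        # A tiny stand-in for a histogram: the three most frequent values.
        "top_values": Counter(non_null).most_common(3),
    }
```

Persisting one such dictionary per run is what makes trend comparison possible: a jump in null_rate between two runs is visible even when every individual run completes without errors.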
PII Detection
PII Detection scans text and structured fields for likely personal data using patterns and models (capabilities vary by plan). Configuration:
- Detector packs: Email, phone, government IDs, payment artifacts.
- Masking vs tagging: Flag columns for governance, or mask values before logging.
- Locale: Tune detectors for country-specific ID formats.
notes columns.
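The difference between tagging and masking can be sketched with two small pattern-based functions. The regular expressions here are deliberately simple and hypothetical; they are not the product's detector packs, cover only email addresses and one US-style phone format, and will miss many real-world variants.

```python
import re

# Illustrative patterns only; real detector packs are broader and locale-aware.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def tag_columns(rows, detectors=DETECTORS):
    """Tagging: return {column: set of detector names} for columns with a hit."""
    tags = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # this sketch only scans string values
            for name, pattern in detectors.items():
                if pattern.search(value):
                    tags.setdefault(column, set()).add(name)
    return tags


def mask_value(text, detectors=DETECTORS):
    """Masking: replace every detected span with a placeholder such as <email>."""
    for name, pattern in detectors.items():
        text = pattern.sub(f"<{name}>", text)
    return text
```

Tagging leaves the data untouched and only reports where personal data appears, which suits governance workflows; masking rewrites the value and is the safer choice before anything is written to logs.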
Row Generator
Row Generator produces synthetic or templated rows for tests and demos. Configuration:
- Row count and seed for reproducibility.
- Column generators: Ranges, enumerations, UUIDs, Faker-style patterns when available.
- Schema target: Match production schema to test downstream nodes without copying real customer data.
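To show how a seed makes generated rows reproducible, here is a minimal Python sketch using only the standard library. The generate_rows function and the three-column schema are hypothetical and stand in for whatever schema the downstream nodes expect.

```python
import random
import uuid


def generate_rows(count, seed=42):
    """Generate `count` synthetic rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    rows = []
    for _ in range(count):
        rows.append({
            # UUID generator: derive the UUID from the seeded RNG for reproducibility.
            "order_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            # Range generator: amount in [0, 500], rounded to cents.
            "amount": round(rng.uniform(0, 500), 2),
            # Enumeration generator: pick one of a fixed set of values.
            "status": rng.choice(["open", "paid", "cancelled"]),
        })
    return rows
```

Drawing the UUIDs from the seeded generator, instead of calling uuid.uuid4(), is what keeps test fixtures stable between runs.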
Notification
Notification sends alerts when upstream conditions trigger, and is often paired with validation or error paths. Configuration:
- Channels: Email, Slack, Microsoft Teams, PagerDuty (per integration).
- Message template: Include pipeline name, run URL, row counts, and top error messages.
- Severity: Route SLA breaches to paging; send warnings to a team channel only.
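The template and severity settings can be pictured as a small function that builds the alert payload. This sketch does not send anything and does not reflect a real integration API: build_notification, the routing table, the channel names, and the message wording are all assumptions for illustration.

```python
from string import Template

# Hypothetical message template with the fields suggested above.
MESSAGE_TEMPLATE = Template(
    "[$severity] Pipeline $pipeline: $rejected of $total rows rejected\n"
    "Run: $run_url\n"
    "Top errors: $top_errors"
)

# Hypothetical routing table: SLA breaches page, warnings go to a team channel.
ROUTES = {
    "critical": ["pagerduty"],
    "warning": ["slack"],
}


def build_notification(pipeline, run_url, total, rejected, top_errors, sla_breached):
    """Return the channels to notify and the rendered message text."""
    severity = "critical" if sla_breached else "warning"
    message = MESSAGE_TEMPLATE.substitute(
        severity=severity.upper(),
        pipeline=pipeline,
        rejected=rejected,
        total=total,
        run_url=run_url,
        # Cap the list so a noisy run does not flood the channel.
        top_errors="; ".join(top_errors[:3]) or "none",
    )
    return {"channels": ROUTES[severity], "message": message}
```

Keeping routing in a table separate from the template means severity rules can change without touching the message wording.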
Recommended layout
Profile after structural parsers
Run Data Profiling once files parse successfully; skip expensive profiling on known-bad batches.
Validate before expensive joins
Fail fast on key violations before paying for large shuffle operations.
Related topics
Governance
Contracts, catalog, and policy alignment.
Observability
Dashboards, diagnostics, and DLQ patterns.