Concepts
A backfill is still a pipeline run, but the orchestration supplies bounded partitions instead of only “latest” state.
- Source nodes read each slice (for example one day of events) according to parameters you pass.
- Destination nodes must handle overwrite, merge, or append semantics you designed.
- Downstream schedules may need pausing so backfill and incremental loads do not fight for locks.
Backfill does not magically change retention in object storage or warehouses; ensure upstream data still exists for the range you request.
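To make the idea of bounded partitions concrete, the sketch below enumerates one slice per calendar day for an inclusive date range. It is an illustration only: the function name `daily_slices` is hypothetical, and the parameter keys `start_date` and `end_date` are assumed to match whatever variables your nodes actually reference.

```python
from datetime import date, timedelta


def daily_slices(start: date, end: date):
    """Yield one parameter dict per calendar day; both bounds are inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    current = start
    while current <= end:
        # Each slice covers exactly one day, so both bounds are the same date.
        day = current.isoformat()  # YYYY-MM-DD
        yield {"start_date": day, "end_date": day}
        current += timedelta(days=1)


# Example range that crosses a month boundary.
slices = list(daily_slices(date(2024, 1, 30), date(2024, 2, 2)))
for params in slices:
    print(params)
```

Generating the slices up front also makes it easy to check, before the run starts, that upstream data still exists for every day in the range.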
Configuring date ranges
Choose boundaries
Pick inclusive start and end partition values (often YYYY-MM-DD). Align to how the source is partitioned.
Set run parameters
Map range tokens to pipeline variables (start_date, end_date, hours, etc.) your nodes reference.
- Calendar days
- Hourly slices
- Custom keys
Best for nightly batch warehouses partitioned by dt.
Incremental vs full strategies
| Strategy | When to use | Risk |
|---|---|---|
| Incremental backfill | Reprocess only missing or corrected partitions | Must trust watermark metadata; bugs can skip slices |
| Full table rebuild | Schema overhaul or corrupted dimension | Highest load; requires maintenance window |
| Merge / upsert | Idempotent writes keyed by business id | Depends on warehouse merge performance and locks |
Full backfills often pair with temporary tables and atomic swap patterns to keep production readers consistent mid-run.
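The temporary-table-and-swap pattern can be sketched as follows. This is a minimal illustration using SQLite from the Python standard library, not the product's own mechanism; the helper `rebuild_with_swap`, the `orders` table, and its columns are all hypothetical, and your warehouse may offer a different swap primitive.

```python
import sqlite3


def rebuild_with_swap(conn, table, rows):
    """Load rows into a staging table, then swap it in within one transaction.

    Table names are interpolated into SQL, so they must be trusted identifiers.
    """
    staging = f"{table}__staging"
    old = f"{table}__old"

    # Build the replacement off to the side; readers still see the old table.
    conn.execute(f"DROP TABLE IF EXISTS {staging}")
    conn.execute(f"CREATE TABLE {staging} (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(f"INSERT INTO {staging} (id, amount) VALUES (?, ?)", rows)

    # The renames either all happen or none do.
    conn.execute("BEGIN")
    try:
        conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
        conn.execute(f"DROP TABLE {old}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT above fully controls the transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 9.99)")

rebuild_with_swap(conn, "orders", [(1, 10.50), (2, 20.00)])
rows = conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(rows)
```

The slow work (loading the staging table) happens outside the transaction, so the window in which readers could be blocked is limited to the renames.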
Monitoring backfill progress
During execution, watch:
- Completed vs remaining partitions in the run detail view
- Per-slice duration trends (slowdown hints at skewed keys or hot partitions)
- Warehouse slot usage and retry counts
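If you export per-slice durations from the run detail view, a small script can compute the same signals. The sketch below is an assumption-laden example: `summarize_progress`, its input format (a list of durations in seconds), and the `slow_factor` threshold are all hypothetical choices, not part of the product.

```python
from statistics import median


def summarize_progress(durations_s, total_partitions, slow_factor=2.0):
    """Summarize a backfill from per-slice durations (seconds, completion order)."""
    completed = len(durations_s)
    remaining = total_partitions - completed
    baseline = median(durations_s) if durations_s else 0.0

    # Flag slices that took much longer than the typical slice; these may
    # point at skewed keys or hot partitions.
    slow_slices = [
        i for i, d in enumerate(durations_s)
        if baseline > 0 and d > slow_factor * baseline
    ]
    return {
        "completed": completed,
        "remaining": remaining,
        "median_s": baseline,
        "slow_slices": slow_slices,
        # Rough estimate that assumes remaining slices behave like the median one.
        "eta_s": remaining * baseline,
    }


summary = summarize_progress([10, 12, 11, 40], total_partitions=10)
print(summary)
```

The median is used as the baseline so that a single slow slice does not distort the estimate the way a mean would.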
Chained pipelines
Backfill upstream facts before dimensions when foreign keys must exist, or use DAG ordering in an external orchestrator.
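When you drive the order yourself rather than through an orchestrator, a topological sort over pipeline dependencies gives a safe backfill sequence. The sketch below uses `graphlib` from the Python standard library (3.9+); the pipeline names and the `backfill_order` helper are hypothetical placeholders for your own dependency map.

```python
from graphlib import TopologicalSorter

# Map each pipeline to the set of pipelines that must be backfilled first.
# These names are placeholders, not real pipelines.
dependencies = {
    "raw_events": set(),
    "sessions": {"raw_events"},
    "daily_summary": {"sessions"},
}


def backfill_order(deps):
    """Return pipelines in an order where every upstream pipeline comes first.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = backfill_order(dependencies)
print(order)
```

Running each pipeline's backfill in this order ensures that the rows a downstream table references already exist when it is loaded.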
Streaming + batch
Pause CDC consumers if they compete for the same destination table during full rebuilds.
Related topics
Schedules
Ongoing incremental loads after backfill completes.
Run history
Inspect slice-level status and logs.