Backfill replays your pipeline logic across a date range (or another set of partition keys) that has already passed. Use it after fixing bugs, adding columns, or onboarding a new destination that needs history.

Concepts

A backfill is still a pipeline run, but the orchestration supplies bounded partitions instead of only “latest” state.
  • Source nodes read each slice (for example one day of events) according to parameters you pass.
  • Destination nodes must handle overwrite, merge, or append semantics you designed.
  • Downstream schedules may need pausing so backfill and incremental loads do not fight for locks.
Backfill does not magically change retention in object storage or warehouses; ensure upstream data still exists for the range you request.
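The bounded-partition idea above can be sketched as a generator that enumerates each daily slice the orchestration would hand to source nodes. This is a minimal illustration, not any specific tool's API; `date_partitions` is a hypothetical helper.

```python
from datetime import date, timedelta

def date_partitions(start: date, end: date):
    """Yield each daily partition key in the inclusive backfill range."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)

# Each slice becomes one bounded run handed to the source nodes.
slices = list(date_partitions(date(2024, 1, 1), date(2024, 1, 3)))
# slices == ['2024-01-01', '2024-01-02', '2024-01-03']
```

Before launching, check that every key this generator yields still has upstream data within your retention window.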

Configuring date ranges

1. Choose boundaries. Pick inclusive start and end partition values (often YYYY-MM-DD). Align them to how the source is partitioned.
2. Set run parameters. Map range tokens to the pipeline variables (start_date, end_date, hours, etc.) your nodes reference.
3. Select environment. Run backfills in staging first when volumes are large or logic recently changed.
4. Launch. Start from Orchestration → Backfill (or the pipeline action menu). Confirm the estimated cost if the UI surfaces projections.
Best for nightly batch warehouses partitioned by dt.
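Step 2 above amounts to building a small parameter mapping. The sketch below assumes a hypothetical `build_run_params` helper and variable names (`start_date`, `end_date`); your pipeline's node code decides which names actually matter.

```python
def build_run_params(start_date: str, end_date: str, extra=None):
    """Assemble the variables a backfill run passes to pipeline nodes.

    Nodes reference these names directly, e.g. a source query filtering
    WHERE dt BETWEEN :start_date AND :end_date.
    """
    params = {"start_date": start_date, "end_date": end_date}
    if extra:
        params.update(extra)  # e.g. target environment, hour granularity
    return params

run_params = build_run_params("2024-01-01", "2024-01-31", {"target_env": "staging"})
```

Keeping the mapping in one place makes it easy to rerun the same range against staging and then production with only `extra` changed.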

Incremental vs full strategies

  • Incremental backfill. When to use: reprocess only missing or corrected partitions. Risk: must trust watermark metadata; bugs can silently skip slices.
  • Full table rebuild. When to use: schema overhaul or corrupted dimension. Risk: highest load; requires a maintenance window.
  • Merge / upsert. When to use: idempotent writes keyed by business ID. Risk: depends on warehouse merge performance and locks.
For incremental models, add assertions (row count floors, null rate checks) per slice so a silent skip does not mark success.
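A per-slice assertion can be as simple as the check below. The thresholds and the `validate_slice` helper are illustrative assumptions; wire the equivalent into whatever test hook your pipeline runs after each slice.

```python
def validate_slice(rows, min_rows, max_null_rate, key):
    """Fail loudly when a slice breaches its row-count floor or null-rate ceiling."""
    if len(rows) < min_rows:
        raise ValueError(f"slice has {len(rows)} rows, floor is {min_rows}")
    null_rate = (sum(r.get(key) is None for r in rows) / len(rows)) if rows else 0.0
    if null_rate > max_null_rate:
        raise ValueError(f"null rate {null_rate:.1%} exceeds ceiling {max_null_rate:.1%}")

# A slice that silently skipped its source would fail the floor check
# instead of being marked successful.
validate_slice([{"id": 1}, {"id": 2}], min_rows=1, max_null_rate=0.5, key="id")
```

Raising (rather than logging) matters: the orchestrator should mark the slice failed so a resume can target exactly the bad partitions.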
Full backfills often pair with temporary tables and atomic swap patterns to keep production readers consistent mid-run.
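The temporary-table-plus-swap pattern looks roughly like this. SQLite is used here only so the sketch is self-contained; real warehouses have their own swap primitives (and differ in how transactional their DDL is), and all table names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit; we manage the swap transaction by hand

conn.execute("CREATE TABLE fact_sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO fact_sales VALUES (1, 9.99)")

# 1. Rebuild full history into a staging table; readers keep using fact_sales.
conn.execute("CREATE TABLE fact_sales__rebuild (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO fact_sales__rebuild VALUES (?, ?)",
                 [(1, 10.0), (2, 5.5)])

# 2. Atomic swap: readers never observe a half-built table.
conn.execute("BEGIN")
conn.execute("ALTER TABLE fact_sales RENAME TO fact_sales__old")
conn.execute("ALTER TABLE fact_sales__rebuild RENAME TO fact_sales")
conn.execute("DROP TABLE fact_sales__old")
conn.execute("COMMIT")
```

The key property is that step 1 can run for hours without affecting readers, while step 2 is near-instant because it only touches metadata.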

Monitoring backfill progress

During execution, watch:
  • Completed vs remaining partitions in the run detail view
  • Per-slice duration trends (slowdown hints at skewed keys or hot partitions)
  • Warehouse slot usage and retry counts
Parallelism that works for nightly incremental loads may throttle sources during backfill. Cap concurrency to respect API quotas and DBA limits.
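Capping concurrency can be done with a bounded worker pool. The cap value and the `backfill_slice` stub below are placeholders; size the pool to your source's quotas, not to what the warehouse can absorb.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_SLICES = 4  # hypothetical cap; size to API quotas and DBA limits

def backfill_slice(partition: str) -> str:
    # Placeholder for one bounded pipeline run over a single partition.
    return f"done:{partition}"

partitions = [f"2024-01-{day:02d}" for day in range(1, 11)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_SLICES) as pool:
    # At most MAX_CONCURRENT_SLICES slices are in flight at any moment.
    results = list(pool.map(backfill_slice, partitions))
```

The same cap should apply whether slices run in-process or as separate orchestrated runs; what matters is the limit on simultaneous reads against the source.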
Cancel oversized jobs from the run page; document whether partial partitions were committed so you can resume safely.
Backfill upstream facts before dimensions when foreign keys must exist, or rely on DAG ordering in an external orchestrator.
Pause CDC consumers if they compete for the same destination table during full rebuilds.
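The DAG-ordering tip above is what an external orchestrator computes for you; the standard library can sketch it. Table names here are illustrative, and `graphlib` requires Python 3.9+.

```python
from graphlib import TopologicalSorter

# Each table lists the tables that must finish backfilling before it starts.
deps = {
    "dim_customer": {"raw_events"},
    "fact_orders": {"raw_events", "dim_customer"},
}
order = list(TopologicalSorter(deps).static_order())
# raw_events is guaranteed to appear before both downstream tables
```

Running slices table-by-table in this order ensures every foreign key a downstream table needs already exists for the partition being loaded.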

Schedules

Configure ongoing incremental loads once the backfill completes.

Run history

Inspect slice-level status and logs.