Data diagnostics helps you understand shape, distribution, and freshness problems inside datasets without exporting everything to a notebook first. Use it when dashboards look wrong but pipelines show green checks.

Diagnostic tools

1. Select a dataset or run artifact. Pick a table, stage output, or sample file from a recent run.
2. Launch profiling. Compute column statistics: null rates, distinct counts, min/max for numerics, top values for categoricals.
3. Compare baselines. Contrast the current profile with a saved baseline or a prior run to spot drift.
4. Trace upstream. Jump to the pipeline nodes that produced the artifact and open their recent logs.
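Step 3 above (baseline comparison) can be sketched as a simple diff over saved profiles. The profile shape, the 0.05 null-rate threshold, and the function name are illustrative assumptions, not the product's actual API:

```python
# Sketch of baseline comparison: flag columns whose null rate drifts
# beyond a threshold between two saved profiles. The profile dict shape
# and the 0.05 default threshold are assumptions for illustration.

def drift_columns(baseline, current, max_null_rate_delta=0.05):
    """Return column names whose null rate moved more than the threshold."""
    drifted = []
    for col, stats in current.items():
        base = baseline.get(col)
        if base is None:
            drifted.append(col)  # column absent from baseline: treat as drift
            continue
        if abs(stats["null_rate"] - base["null_rate"]) > max_null_rate_delta:
            drifted.append(col)
    return drifted

baseline = {"user_id": {"null_rate": 0.0}, "email": {"null_rate": 0.02}}
current = {"user_id": {"null_rate": 0.0}, "email": {"null_rate": 0.31}}
print(drift_columns(baseline, current))  # ['email']
```

Running the same comparison against several prior runs, not just one, helps separate one-off glitches from a sustained trend.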

Data profiling

Profiling is read-only against connections your role can access. Large tables use sampling strategies documented in the UI—note sample percentages when interpreting rare-value counts.
Check primary key uniqueness and foreign key presence when diagnostics expose join cardinality hints.
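The statistics listed above, plus the uniqueness check, can be sketched in a few lines. The function names and the sampling mechanism are assumptions for illustration; the product's own sampling strategy is documented in the UI:

```python
import random
from collections import Counter

def profile_column(values, sample_pct=1.0, top_n=3, seed=0):
    """Profile one column: null rate, distinct count, min/max, top values.
    When sample_pct < 1, rare-value counts reflect the sample, not the table."""
    if sample_pct < 1.0:
        rng = random.Random(seed)
        values = [v for v in values if rng.random() < sample_pct]
    nulls = sum(1 for v in values if v is None)
    non_null = [v for v in values if v is not None]
    numeric = all(isinstance(v, (int, float)) for v in non_null)
    return {
        "null_rate": nulls / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "min": min(non_null) if numeric and non_null else None,  # numerics only
        "max": max(non_null) if numeric and non_null else None,
        "top_values": Counter(non_null).most_common(top_n),      # categoricals
    }

def is_unique_key(values):
    """Primary-key check: a key column must have no nulls and no duplicates."""
    return None not in values and len(set(values)) == len(values)

stats = profile_column(["a", "a", "b", None])
print(stats["null_rate"], stats["top_values"])  # 0.25 [('a', 2), ('b', 1)]
```

Note how min/max are reported only for numeric columns, matching the behavior described above.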

Anomaly detection

Anomaly detection models seasonality on volume, null rate, or numeric aggregates. When today’s batch diverges from the forecast band, diagnostics opens an incident card with likely upstream nodes.
Seasonal businesses (retail holidays, fiscal close) need custom calendars; configure blackouts or adjusted baselines before major events.
If legitimate marketing spikes trigger alerts, widen bands temporarily and document the business event in the alert note.
Silent failures can occur when bad data still matches schema—pair profiling with business reconciliation queries monthly.
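A minimal sketch of the forecast-band check with a blackout calendar, assuming a simple symmetric band; the real detector models seasonality, and the function names, the 20% band, and the example dates are illustrative:

```python
from datetime import date

def is_anomalous(day, value, forecast, band_pct=0.2, blackout_days=frozenset()):
    """Flag a batch metric outside the forecast band, unless the day is in a
    configured blackout calendar (e.g. retail holidays, fiscal close)."""
    if day in blackout_days:
        return False  # suppressed: a known business event, not an incident
    lo, hi = forecast * (1 - band_pct), forecast * (1 + band_pct)
    return not (lo <= value <= hi)

# Hypothetical blackout: suppress volume alerts on Black Friday.
blackouts = {date(2024, 11, 29)}
print(is_anomalous(date(2024, 11, 29), 5_000_000, 1_000_000,
                   blackout_days=blackouts))  # False (blackout day)
print(is_anomalous(date(2024, 12, 2), 5_000_000, 1_000_000,
                   blackout_days=blackouts))  # True (outside the band)
```

Widening `band_pct` temporarily plays the same role as the "widen bands" guidance above for legitimate marketing spikes.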

Dead letter queue

Inspect rows that failed validation or load steps, grouped by failure reason so you can fix the dominant cause first.
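A triage pass over dead-letter rows often starts by grouping on the failure reason. The row shape and the `error` field are assumptions for illustration:

```python
from collections import Counter

def summarize_dlq(rows):
    """Group dead-letter rows by failure reason, most frequent first,
    so the dominant root cause is fixed before replaying rows."""
    return Counter(r["error"] for r in rows).most_common()

# Hypothetical dead-letter rows; the 'error' field is an assumed shape.
dlq = [
    {"id": 1, "error": "NOT_NULL violation: email"},
    {"id": 2, "error": "type mismatch: amount"},
    {"id": 3, "error": "NOT_NULL violation: email"},
]
print(summarize_dlq(dlq))
# [('NOT_NULL violation: email', 2), ('type mismatch: amount', 1)]
```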

Data contracts

Encode as enforceable contracts the rules that diagnostics currently surfaces only informally.
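One way to encode such rules is a declarative check set evaluated per batch. The contract format, rule names (`not_null`, `unique`, `min`), and function name are assumptions, not the product's contract syntax:

```python
def check_contract(rows, contract):
    """Evaluate declarative contract rules against rows; return violations."""
    violations = []
    for col, rules in contract.items():
        vals = [r.get(col) for r in rows]
        if rules.get("not_null") and any(v is None for v in vals):
            violations.append(f"{col}: null values present")
        if rules.get("unique") and len(set(vals)) != len(vals):
            violations.append(f"{col}: duplicates present")
        if "min" in rules and any(v is not None and v < rules["min"] for v in vals):
            violations.append(f"{col}: below minimum {rules['min']}")
    return violations

# Hypothetical contract: order_id is a unique key, amount is non-negative.
contract = {"order_id": {"not_null": True, "unique": True},
            "amount": {"min": 0}}
rows = [{"order_id": 1, "amount": 10}, {"order_id": 1, "amount": -5}]
print(check_contract(rows, contract))
# ['order_id: duplicates present', 'amount: below minimum 0']
```

Rules like these formalize the same checks profiling performs ad hoc: null rates, key uniqueness, and numeric ranges.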