Overview
Column statistics are computed automatically on every Parquet write and stored in both Iceberg snapshot properties and Delta Lake `add` action metadata. Query engines use these statistics for predicate pushdown, skipping entire data files that cannot contain matching rows.
What’s Computed
| Statistic | Description | Storage |
|---|---|---|
| Min value | Smallest value in the column | Iceberg manifest + Delta stats JSON |
| Max value | Largest value in the column | Iceberg manifest + Delta stats JSON |
| Null count | Number of null values | Iceberg manifest + Delta stats JSON |
| Distinct estimate | Approximate unique values | Iceberg snapshot properties |
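As a rough illustration of what these four statistics mean for a single column, here is a minimal Python sketch. The function name `column_stats` is hypothetical, and the distinct count is computed exactly for simplicity; the table describes it as an approximate estimate, so a real profiler would presumably use a sketch structure instead.

```python
from typing import Any, Optional, Sequence


def column_stats(values: Sequence[Optional[Any]]) -> dict:
    """Compute min, max, null count, and a distinct count for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        # Assumption: an exact count stands in for the approximate estimate.
        "distinct": len(set(non_null)),
    }


stats = column_stats([3, None, 7, 3, 1, None])
print(stats)
```

Nulls are excluded from min, max, and distinct, and counted separately.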
How It Improves Query Performance
When a query engine reads an Iceberg or Delta table, it checks the per-file column statistics before downloading data files. If a file's max value for `date` is 2025-12-31, any query filtering on `date > 2026-01-01` skips the file entirely, because no row in it can match. This can reduce I/O by 90% or more for selective queries.
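The pruning decision can be sketched in a few lines of Python. This is an illustration of the idea rather than any engine's actual implementation; `FileStats`, `files_to_scan`, and the file paths are hypothetical names.

```python
from datetime import date
from typing import List, NamedTuple


class FileStats(NamedTuple):
    """Per-file min/max for the `date` column (hypothetical structure)."""
    path: str
    min_date: date
    max_date: date


def files_to_scan(files: List[FileStats], lower_bound: date) -> List[FileStats]:
    """Keep only files that may contain rows with date > lower_bound."""
    # A file whose max is <= lower_bound cannot contain a matching row.
    return [f for f in files if f.max_date > lower_bound]


files = [
    FileStats("part-000.parquet", date(2025, 1, 1), date(2025, 12, 31)),
    FileStats("part-001.parquet", date(2025, 12, 15), date(2026, 2, 1)),
]
selected = files_to_scan(files, date(2026, 1, 1))
print([f.path for f in selected])
```

Only `part-001.parquet` survives the check; the first file is skipped without being read.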
Viewing Column Stats
Navigate to Managed Lakehouse → select a table → Column Stats tab. The Column Stats panel displays:
| Column | Type | Min | Max | Nulls | Distinct |
|---|---|---|---|---|---|
| user_id | int64 | 1 | 982,451 | 0 | 245,112 |
| event_type | string | click | view | 12 | 8 |
| created_at | timestamp | 2025-01-01 | 2026-04-09 | 0 | 456,321 |
| amount | float64 | 0.01 | 9,999.99 | 1,203 | 8,742 |
API
Response
Thresholds
Tables with more than 200 columns only compute statistics for primary keys, partition columns, sort columns, and system columns to keep commit latency low.
| Table Width | Columns Profiled |
|---|---|
| ≤ 200 columns | All columns |
| > 200 columns | Keys, partition, sort, and system columns only |
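The width rule can be expressed as a small selection function. This is a sketch under stated assumptions: `columns_to_profile` and `priority_columns` are hypothetical names, and the caller is assumed to already know which columns are primary keys, partition columns, sort columns, or system columns.

```python
from typing import Iterable, List

WIDE_TABLE_THRESHOLD = 200  # column count at which profiling is restricted


def columns_to_profile(all_columns: List[str], priority_columns: Iterable[str]) -> List[str]:
    """Return the columns that receive statistics under the width rule."""
    if len(all_columns) <= WIDE_TABLE_THRESHOLD:
        return list(all_columns)
    # Wide table: keep only key, partition, sort, and system columns.
    priority = set(priority_columns)
    return [c for c in all_columns if c in priority]


narrow = [f"col_{i}" for i in range(50)]
wide = [f"col_{i}" for i in range(250)]

print(len(columns_to_profile(narrow, ["col_0"])))
print(columns_to_profile(wide, ["col_0", "col_7"]))
```

The narrow table profiles all 50 columns; the wide table profiles only `col_0` and `col_7`.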
Storage Details
Iceberg
Statistics are stored as snapshot-level properties with keys:
Delta Lake
Statistics are embedded in the `stats` JSON field of each `add` action in the transaction log:
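For orientation, the Delta Lake protocol defines per-file statistics as a JSON string with the fields `numRecords`, `minValues`, `maxValues`, and `nullCount`. The Python sketch below parses a hypothetical `stats` string of that shape. The column names and values are illustrative only, not output from this product.

```python
import json

# Hypothetical `stats` string from one `add` action; values are illustrative.
stats_json = (
    '{"numRecords": 1000,'
    ' "minValues": {"user_id": 1, "amount": 0.01},'
    ' "maxValues": {"user_id": 982451, "amount": 9999.99},'
    ' "nullCount": {"user_id": 0, "amount": 12}}'
)

stats = json.loads(stats_json)

# Collect (min, max, nulls) per column from the three parallel maps.
per_column = {
    name: (stats["minValues"][name], stats["maxValues"][name], stats["nullCount"][name])
    for name in stats["minValues"]
}
for name, (lo, hi, nulls) in per_column.items():
    print(f"{name}: min={lo} max={hi} nulls={nulls}")
```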
Best Practices
Use with Z-Order
Z-ordered data has tighter min/max ranges per file, making statistics-based pruning even more effective.
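To see why clustering tightens per-file ranges, the following Python sketch compares how many files a `value > 90` filter must scan when the same rows are written scattered versus sorted. It uses a plain single-column sort as a stand-in for Z-ordering, which is a simplification; the helper names are hypothetical.

```python
from typing import List, Tuple


def file_ranges(values: List[int], rows_per_file: int) -> List[Tuple[int, int]]:
    """Split values into consecutive files and return each file's (min, max)."""
    chunks = [values[i:i + rows_per_file] for i in range(0, len(values), rows_per_file)]
    return [(min(c), max(c)) for c in chunks]


def files_scanned(ranges: List[Tuple[int, int]], threshold: int) -> int:
    """Count files whose max exceeds threshold, i.e. that may match value > threshold."""
    return sum(1 for _, hi in ranges if hi > threshold)


# Deterministic permutation of 0..99 (37 is coprime with 100).
scattered = [(i * 37) % 100 for i in range(100)]
clustered = sorted(scattered)

scattered_scans = files_scanned(file_ranges(scattered, 25), 90)
clustered_scans = files_scanned(file_ranges(clustered, 25), 90)
print(scattered_scans, clustered_scans)
```

With scattered data, every file's range spans nearly the whole domain, so all four files must be read. With clustered data, only the last file can match.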
Compact regularly
Compaction recalculates statistics across merged files for up-to-date min/max bounds.
Related
Z-Order Sort
Improve data locality for multi-column queries
Table Maintenance
Compaction refreshes column statistics automatically