Architecture
Every write is an atomic commit. The Delta transaction log records column schemas, file-level statistics (min/max/null counts), and version history, so downstream consumers can perform time travel, predicate pushdown, and file skipping.
Supported cloud providers
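For illustration, each commit under _delta_log/ is a numbered JSON file whose add actions carry per-file statistics, which lets a reader skip files whose min/max range excludes a query predicate. A minimal sketch, using a hypothetical commit entry (column names and values are invented):

```python
import json

# Hypothetical "add" action as it appears in a commit file,
# e.g. _delta_log/00000000000000000003.json (one JSON action per line).
commit_line = json.dumps({
    "add": {
        "path": "part-00000.snappy.parquet",
        "stats": json.dumps({
            "numRecords": 50000,
            "minValues": {"amount": 0.5},
            "maxValues": {"amount": 999.99},
            "nullCount": {"amount": 12},
        }),
    }
})

action = json.loads(commit_line)
stats = json.loads(action["add"]["stats"])  # stats are embedded as a JSON string

# File skipping: a query filtering on amount > 1000 can skip this file entirely,
# because the column maximum recorded in the log is 999.99.
can_skip = stats["maxValues"]["amount"] <= 1000
print(can_skip)  # True
```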
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
| Field | Description |
|---|---|
| Credential | AWS credential with s3:PutObject, s3:GetObject, s3:DeleteObject, and s3:ListBucket on the target bucket |
| Bucket | S3 bucket name (e.g., my-data-lake) |
| Prefix | Optional path prefix (e.g., warehouse/sales/) |
| Region | AWS region (e.g., us-east-1, us-west-2) |
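Put together, an S3 configuration might look like the following sketch (field names and values are illustrative; the exact shape depends on your pipeline tool):

```yaml
# Hypothetical S3 destination configuration
credential: aws-prod        # needs s3:PutObject, s3:GetObject, s3:DeleteObject, s3:ListBucket
bucket: my-data-lake
prefix: warehouse/sales/    # optional
region: us-west-2
```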
Write modes
- Append
- Overwrite
Append adds new Parquet files and a new Delta log version; existing data remains untouched. Each pipeline run creates a new version number. Best for: event streams, logs, incremental loads, and any workload where historical data should not be modified.
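Each run's commit lands as the next zero-padded JSON file under _delta_log/. A small sketch of how version numbers map to commit file names (the 20-digit naming follows the Delta protocol):

```python
def commit_filename(version: int) -> str:
    # Delta log commit files are named with 20-digit, zero-padded version numbers.
    return f"{version:020}.json"

# Three successive append runs produce three successive versions.
print([commit_filename(v) for v in range(3)])
# ['00000000000000000000.json', '00000000000000000001.json', '00000000000000000002.json']
```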
Schema evolution
When schema evolution is enabled, the destination adapts to upstream changes automatically.
First batch — type inference
Column types are inferred from the data: integers map to long, decimals to double, strings to string, booleans to boolean, and ISO-8601 timestamps to timestamp.
Subsequent batches — additive columns
If a new column appears in a later batch, it is appended to the Parquet schema and the Delta log. Existing columns retain their original types.
Supported column types: string, long, double, boolean, timestamp, date, integer, short, byte, float, and binary.
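The inference rules above can be sketched roughly as follows (infer_type is illustrative, not the destination's actual code):

```python
from datetime import datetime

def infer_type(value) -> str:
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, datetime):
        return "timestamp"
    if isinstance(value, str):
        # ISO-8601 strings are promoted to timestamps.
        try:
            datetime.fromisoformat(value)
            return "timestamp"
        except ValueError:
            return "string"
    raise TypeError(f"unsupported value: {value!r}")

print(infer_type(42))                     # long
print(infer_type(3.14))                   # double
print(infer_type("2024-06-01T12:00:00"))  # timestamp
print(infer_type("hello"))                # string
```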
Column statistics
When statistics are enabled, every Parquet file commit includes metadata in the Delta log:
- numRecords — row count in the file
- minValues / maxValues — per-column extremes for numeric, string, date, and timestamp types
- nullCount — null values per column
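Those per-file statistics can be sketched like this (a simplified stand-in for what the writer records; a real implementation also handles dates and truncates long string extremes):

```python
def column_stats(rows: list[dict]) -> dict:
    # Compute Delta-style file statistics for one batch of rows.
    stats = {"numRecords": len(rows), "minValues": {}, "maxValues": {}, "nullCount": {}}
    for col in rows[0]:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats["nullCount"][col] = len(values) - len(non_null)
        if non_null:
            stats["minValues"][col] = min(non_null)
            stats["maxValues"][col] = max(non_null)
    return stats

batch = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 120.0},
]
print(column_stats(batch))
```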
Reading your tables
Once data lands, any Delta-compatible engine can query it immediately.
Performance benchmarks
Benchmarked on a standard Planasonix worker writing to S3 (us-west-2):
| Metric | Value |
|---|---|
| Throughput | ~167,000 rows/sec |
| Compression | Snappy |
| Batch size | 50,000 rows per Parquet file |
| 1M rows (5 columns) | ~6 seconds, 33 MB on S3 |
| Concurrency | Thread-safe with atomic version numbering |
Throughput scales linearly with batch size. Larger batches produce fewer, larger Parquet files — ideal for analytical query patterns. Smaller batches (5,000–10,000 rows) suit near-real-time use cases.
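The batch-size trade-off can be sketched with a simple chunker (illustrative only; the destination handles batching internally):

```python
def chunk(rows: list, batch_size: int = 50_000):
    # Split a row list into fixed-size batches; each batch becomes one Parquet
    # file, so larger batches mean fewer, larger files on S3.
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

files = list(chunk(list(range(1_000_000)), batch_size=50_000))
print(len(files))      # 20 Parquet files for 1M rows
print(len(files[-1]))  # 50000 rows in the last file
```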
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| "Access denied" on upload | Cloud credential lacks write permission | Grant PutObject / storage.objects.create / Blob Data Contributor on the bucket and prefix |
| Version conflict error | Concurrent writers claiming the same version | Retry — the destination uses atomic versioning and will claim the next available number |
| Files exist but no _delta_log/ | First write failed mid-commit | Re-run the pipeline; the next write creates version 0 from scratch |
| Query engine can’t read table | Protocol version mismatch | Ensure your engine supports Delta reader version 1 and writer version 2 |
| Slow writes to S3 | Many small batches | Increase batch size to 50,000+ rows to reduce per-file overhead |
Comparison with other lake destinations
| Feature | Delta Lake | Iceberg | Fabric / OneLake |
|---|---|---|---|
| Cloud support | S3, GCS, Azure | S3, GCS, Azure | OneLake only |
| Table format | Delta Lake | Apache Iceberg | Delta Lake |
| SQL endpoint | Via external engine | Via external catalog | Built-in Fabric SQL |
| Write modes | Append, Overwrite | Append, Overwrite, Merge | Append, Upsert, Replace |
| Column statistics | Yes | Yes | Yes |
| Schema evolution | Yes | Yes | Yes |
| Cost | Storage only | Storage only | Fabric capacity units |
Related topics
Data warehouses
Warehouse connections for Snowflake, BigQuery, Databricks, Fabric, and more.
Destination nodes
Write modes, pre-flight checks, and other destination node types.
Cloud storage
Configure S3, GCS, and Azure Blob connections used by the Delta Lake destination.
Data contracts
Enforce schema and quality rules before data lands in your lake.