The Delta Lake destination writes standards-compliant Delta Lake tables to any supported cloud object storage. Each batch of records is encoded as Parquet, uploaded to your storage path, and committed as a new Delta Lake version — readable by Spark, Databricks, DuckDB, Trino, and every other Delta-compatible engine.

Architecture

Every write is an atomic commit. The Delta transaction log records column schemas, file-level statistics (min/max/null counts), and version history so downstream consumers can perform time travel, predicate pushdown, and file skipping.
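The atomic-commit behavior can be sketched in a few lines: each commit claims the next zero-padded version file in `_delta_log/` with a put-if-absent write, so concurrent writers can never overwrite each other's versions. This is a minimal local-filesystem sketch under assumed names (`commit` and its action payloads are illustrative); real Delta writers use the same `_delta_log/<20-digit version>.json` naming but commit via the object store's conditional-put primitives rather than local files.

```python
import json
import os
import tempfile

def commit(table_path: str, actions: list[dict]) -> int:
    """Claim the next Delta log version with a put-if-absent write (sketch)."""
    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    version = 0
    while True:
        name = os.path.join(log_dir, f"{version:020d}.json")
        try:
            # 'x' mode fails if the file already exists, meaning another
            # writer won this version number; retry with the next one.
            with open(name, "x") as f:
                for action in actions:
                    f.write(json.dumps(action) + "\n")
            return version
        except FileExistsError:
            version += 1

# Two sequential commits claim versions 0 and 1.
table = tempfile.mkdtemp()
v0 = commit(table, [{"add": {"path": "part-0000.parquet", "size": 1024}}])
v1 = commit(table, [{"add": {"path": "part-0001.parquet", "size": 2048}}])
```

Because a failed writer never claims a version file, a half-finished upload leaves the log untouched and the next run simply retries the same version.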

Supported cloud providers

Amazon S3 is configured with the following fields:

| Field | Description |
| --- | --- |
| Credential | AWS credential with `s3:PutObject`, `s3:GetObject`, `s3:DeleteObject`, and `s3:ListBucket` on the target bucket |
| Bucket | S3 bucket name (e.g., `my-data-lake`) |
| Prefix | Optional path prefix (e.g., `warehouse/sales/`) |
| Region | AWS region (e.g., `us-east-1`, `us-west-2`) |
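A minimal IAM policy granting the permissions listed above might look like the following. The `my-data-lake` bucket name is taken from the example; adjust the resource ARNs to your own bucket and prefix. Note that `s3:ListBucket` applies to the bucket itself while the object actions apply to keys under it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-data-lake/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-data-lake"
    }
  ]
}
```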

Write modes

Append — adds new Parquet files and a new Delta log version. Existing data remains untouched, and each pipeline run creates a new version number. Best for: event streams, logs, incremental loads, and any workload where historical data should not be modified.

Schema evolution

When schema evolution is enabled, the destination adapts to upstream changes automatically:
1. First batch — type inference. Column types are inferred from the data: integers map to long, decimals to double, strings to string, booleans to boolean, and ISO-8601 timestamps to timestamp.

2. Subsequent batches — additive columns. If a new column appears in a later batch, it is appended to the Parquet schema and the Delta log. Existing columns retain their original types.

3. Type consistency. Once a column type is established in the first batch, it remains fixed for the life of the table. Mismatched types in later batches are coerced where safe or rejected with an error.

Supported Delta types: string, long, double, boolean, timestamp, date, integer, short, byte, float, binary.
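The three rules above can be sketched with a toy inferencer. This is an illustrative sketch, not the destination's actual code: `infer_delta_type` and `evolve` are hypothetical names, and the real coercion rules may be more permissive than the hard rejection shown here.

```python
from datetime import datetime

def infer_delta_type(value) -> str:
    """Map a Python value to a Delta type per the rules above (sketch)."""
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # ISO-8601 strings become timestamps
            return "timestamp"
        except ValueError:
            return "string"
    raise TypeError(f"unsupported value: {value!r}")

def evolve(schema: dict[str, str], record: dict) -> dict[str, str]:
    """Append new columns; reject type changes to established columns."""
    for col, value in record.items():
        inferred = infer_delta_type(value)
        if col not in schema:
            schema[col] = inferred  # additive: new column is appended
        elif schema[col] != inferred:
            raise ValueError(f"type of {col!r} is fixed as {schema[col]}")
    return schema

# First batch establishes types; a later batch adds a column additively.
schema = evolve({}, {"id": 1, "amount": 9.5, "active": True})
schema = evolve(schema, {"id": 2, "amount": 1.0, "active": False, "note": "new"})
```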

Column statistics

When statistics are enabled, every Parquet file commit includes metadata in the Delta log:
  • numRecords — row count in the file
  • minValues / maxValues — per-column extremes for numeric, string, date, and timestamp types
  • nullCount — null values per column
Query engines use these statistics for predicate pushdown and file skipping, dramatically reducing scan times on large tables.
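The statistics can be computed directly from the rows in a file. The sketch below shows the shape of the resulting JSON (`file_stats` is a hypothetical helper; real writers typically derive these values from the Parquet footer rather than re-scanning the rows):

```python
import json

def file_stats(rows: list[dict]) -> str:
    """Compute Delta-style per-file statistics as a JSON string (sketch)."""
    columns = {k for row in rows for k in row}
    min_values, max_values, null_count = {}, {}, {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        null_count[col] = len(values) - len(non_null)  # nulls per column
        if non_null:
            min_values[col] = min(non_null)  # per-column extremes
            max_values[col] = max(non_null)
    return json.dumps({
        "numRecords": len(rows),
        "minValues": min_values,
        "maxValues": max_values,
        "nullCount": null_count,
    })

stats = file_stats([
    {"id": 1, "region": "us-east-1"},
    {"id": 7, "region": None},
])
```

With `minValues.id = 1` and `maxValues.id = 7` in the log, an engine evaluating `WHERE id > 100` can skip this file without opening it.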

Reading your tables

Once data lands, any Delta-compatible engine can query it immediately, for example with DuckDB's delta extension:

```sql
INSTALL delta;
LOAD delta;
SELECT * FROM delta_scan('s3://my-bucket/warehouse/sales/');
```

Performance benchmarks

Benchmarked on a standard Planasonix worker writing to S3 (us-west-2):
| Metric | Value |
| --- | --- |
| Throughput | ~167,000 rows/sec |
| Compression | Snappy |
| Batch size | 50,000 rows per Parquet file |
| 1M rows (5 columns) | ~6 seconds, 33 MB on S3 |
| Concurrency | Thread-safe with atomic version numbering |
Throughput scales linearly with batch size. Larger batches produce fewer, larger Parquet files — ideal for analytical query patterns. Smaller batches (5,000–10,000) suit near-real-time use cases.
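A quick back-of-envelope check ties the benchmark figures together (numbers taken from the table above; the file count assumes the full 1M-row run uses the default 50,000-row batch size):

```python
import math

rows = 1_000_000
throughput = 167_000   # rows/sec from the benchmark table
batch_size = 50_000    # rows per Parquet file

seconds = rows / throughput              # ~6 seconds, matching the table
files = math.ceil(rows / batch_size)     # 20 Parquet files for the run
```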

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| "Access denied" on upload | Cloud credential lacks write permission | Grant `PutObject` / `storage.objects.create` / Blob Data Contributor on the bucket and prefix |
| Version conflict error | Concurrent writers claiming the same version | Retry — the destination uses atomic versioning and will claim the next available number |
| Files exist but no `_delta_log/` | First write failed mid-commit | Re-run the pipeline; the next write creates version 0 from scratch |
| Query engine can't read table | Protocol version mismatch | Ensure your engine supports Delta reader version 1 and writer version 2 |
| Slow writes to S3 | Many small batches | Increase batch size to 50,000+ rows to reduce per-file overhead |

Comparison with other lake destinations

| Feature | Delta Lake | Iceberg | Fabric / OneLake |
| --- | --- | --- | --- |
| Cloud support | S3, GCS, Azure | S3, GCS, Azure | OneLake only |
| Table format | Delta Lake | Apache Iceberg | Delta Lake |
| SQL endpoint | Via external engine | Via external catalog | Built-in Fabric SQL |
| Write modes | Append, Overwrite | Append, Overwrite, Merge | Append, Upsert, Replace |
| Column statistics | Yes | Yes | Yes |
| Schema evolution | Yes | Yes | Yes |
| Cost | Storage only | Storage only | Fabric capacity units |

Data warehouses

Warehouse connections for Snowflake, BigQuery, Databricks, Fabric, and more.

Destination nodes

Write modes, pre-flight checks, and other destination node types.

Cloud storage

Configure S3, GCS, and Azure Blob connections used by the Delta Lake destination.

Data contracts

Enforce schema and quality rules before data lands in your lake.