The Managed Lakehouse destination writes Parquet data files once and commits metadata to both Apache Iceberg and Delta Lake in a single atomic operation. Every query engine in your stack — whether it speaks Iceberg or Delta — can read the same underlying data without conversion or duplication.
Managed Lakehouse is available on the Professional plan and above.

Architecture

Key design decisions:
  1. Single write, dual commit — Parquet files are uploaded once. Iceberg and Delta metadata are committed separately, eliminating data duplication.
  2. Iceberg-primary — Iceberg is the transactional source of truth. If the Iceberg commit succeeds but Delta fails, the pipeline retries Delta once and logs a warning without failing the run.
  3. Catalog-backed — Iceberg tables are registered in a catalog (AWS Glue Data Catalog or REST Catalog) for schema governance, time travel, and partition pruning.
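The commit ordering described above can be sketched in a few lines. This is an illustrative model, not the destination's actual implementation: the callback names are hypothetical, and it only shows the write-once upload, the Iceberg-primary commit, and the single Delta retry with a logged warning.

```python
import logging

log = logging.getLogger("lakehouse")

def dual_commit(upload_parquet, commit_iceberg, commit_delta):
    """Write-once, dual-commit sketch: Parquet files are uploaded a single
    time, Iceberg metadata is committed first (source of truth), and the
    Delta commit gets one retry before the run continues with a warning."""
    files = upload_parquet()           # Parquet data files written exactly once
    commit_iceberg(files)              # Iceberg commit must succeed
    for attempt in (1, 2):             # initial Delta attempt + one retry
        try:
            commit_delta(files)
            return True
        except Exception as exc:
            log.warning("Delta commit attempt %d failed: %s", attempt, exc)
    return False                       # run still succeeds; warning was logged
```

Note that an Iceberg failure raises and fails the run, while a Delta failure after the retry only downgrades the result, matching the Iceberg-primary design above.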

Supported cloud providers

| Field | Description |
| --- | --- |
| Credential | AWS credential with s3:PutObject, s3:GetObject, s3:DeleteObject, s3:ListBucket on the target bucket |
| Bucket | S3 bucket name (e.g., my-data-lake) |
| Prefix | Path prefix for data files (e.g., lakehouse/) |
| Region | AWS region (e.g., us-west-2) |
For Iceberg via AWS Glue, the credential also needs:
  • glue:GetDatabase, glue:GetDatabases
  • glue:GetTable, glue:GetTables, glue:CreateTable, glue:UpdateTable
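The permissions above can be combined into a single IAM policy. This is an example sketch: the bucket name is a placeholder, and in practice you may want to scope the Glue statement to specific database and table ARNs rather than "*".

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-data-lake", "arn:aws:s3:::my-data-lake/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["glue:GetDatabase", "glue:GetDatabases", "glue:GetTable",
                 "glue:GetTables", "glue:CreateTable", "glue:UpdateTable"],
      "Resource": "*"
    }
  ]
}
```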

Iceberg catalog configuration

| Field | Description |
| --- | --- |
| Catalog Type | glue |
| Region | AWS region of the Glue Data Catalog |
| Namespace | Glue database name (e.g., analytics) |
| Warehouse | S3 path for Iceberg data files (e.g., s3://my-bucket/lakehouse/) |
AWS credentials are shared with the S3 storage credential. Glue permissions must include table create/update access.

Write modes

Append

Adds new Parquet files and commits a new snapshot to both Iceberg and Delta. Existing data is preserved.

Best for: event streams, logs, incremental loads, and any workload where historical data should not be modified.

Table formats

You can enable one or both formats:
| Format | Description | When to use |
| --- | --- | --- |
| Apache Iceberg | Catalog-backed, ACID-transactional table format with time travel, schema evolution, and hidden partitioning | Primary format for analytics engines (Spark, Trino, Athena, Flink) |
| Delta Lake | File-based transaction log with column statistics | When you also need Databricks/DuckDB compatibility or dual-engine access |
By default, both formats are enabled. If you only need one, uncheck the other in the node configuration.

Advanced settings

Partition strategy

Partitioning organizes data files by column values for faster queries. Supported partition transforms:
| Transform | Syntax | Example |
| --- | --- | --- |
| Identity | column_name | region |
| Year | year(column) | year(created_at) |
| Month | month(column) | month(created_at) |
| Day | day(column) | day(created_at) |
| Hour | hour(column) | hour(event_time) |
| Bucket | bucket(n, column) | bucket(16, user_id) |
| Truncate | truncate(w, column) | truncate(4, zip_code) |
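The bucket and truncate transforms are the least obvious of the set, so here is a simplified illustration of what they do to a value. Iceberg's real bucket transform uses a 32-bit Murmur3 hash; CRC32 below is only a stand-in to show the hash-then-modulo idea, so the resulting bucket numbers are stable but not Iceberg-compatible.

```python
import zlib

def bucket(n, value):
    """Simplified bucket transform: hash the value, take it modulo n.
    (Iceberg uses Murmur3; CRC32 here is for illustration only.)"""
    return zlib.crc32(str(value).encode()) % n

def truncate(w, value):
    """Truncate transform for strings: keep the first w characters."""
    return str(value)[:w]

# bucket(16, user_id) assigns each user_id to one of 16 partitions
assert 0 <= bucket(16, "user-42") < 16
# truncate(4, zip_code) groups zip codes by their 4-character prefix
assert truncate(4, "94107-1234") == "9410"
```

Bucketing spreads high-cardinality keys (like user IDs) evenly across a fixed number of partitions; truncation groups values that share a prefix.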

Schema evolution

When enabled (default), the destination automatically adapts to upstream schema changes:
  1. First batch — schema inference: column types are inferred from the data and registered in both the Iceberg catalog and Delta log.
  2. New columns: if a new column appears in a later batch, it is added to the schema. Existing columns retain their original types.
  3. Iceberg schema IDs: Iceberg tracks column identity by field ID, enabling safe renames and reordering without breaking downstream consumers.
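The additive evolution rule described here (new columns are appended, existing columns keep their types) can be modeled with a small merge function. This is a conceptual sketch of the rule, not the destination's internal code; the dict-of-types schema representation is an assumption for illustration.

```python
def evolve_schema(current, batch):
    """Additive schema evolution sketch: columns seen for the first time
    are added; columns already in the schema keep their original types."""
    evolved = dict(current)
    for col, dtype in batch.items():
        if col not in evolved:
            evolved[col] = dtype   # new column appended to the schema
    return evolved

first = {"id": "long", "created_at": "timestamp"}
later = {"id": "string", "created_at": "timestamp", "region": "string"}
merged = evolve_schema(first, later)
# "region" is added; "id" keeps its originally inferred type
assert merged == {"id": "long", "created_at": "timestamp", "region": "string"}
```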

Maintenance settings

| Setting | Default | Description |
| --- | --- | --- |
| Snapshot Retention | 7 days | How long to keep expired Iceberg snapshots and Delta versions before cleanup |
| Compaction Target | 128 MB | Target file size for compaction operations; smaller values produce more files but faster writes |
Maintenance can be triggered manually via the API or scheduled automatically.

Reading your tables

Once a table is committed, any Iceberg-aware engine registered against the Glue catalog (for example, Athena, Trino, or Spark) can query it directly:

SELECT * FROM glue_catalog.analytics.events;

API reference

The Managed Lakehouse API provides endpoints for table management, commit history, and maintenance operations.

List registered tables

GET /api/managed-lakehouse/tables

Register a new table

POST /api/managed-lakehouse/tables
Content-Type: application/json

{
  "connectionId": "uuid",
  "tableName": "events",
  "storageProvider": "s3",
  "storagePath": "s3://my-bucket/lakehouse/events",
  "icebergEnabled": true,
  "deltaEnabled": true,
  "icebergCatalogType": "glue",
  "icebergNamespace": "analytics"
}
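A client-side sketch for building the registration body shown above. The field names come from the documented request; the required-field validation is an illustrative client convenience, not documented server behavior, and sending the request (endpoint URL, auth headers) is left out.

```python
import json

REQUIRED = {"connectionId", "tableName", "storageProvider", "storagePath"}

def build_register_payload(connection_id, table_name, storage_path,
                           iceberg=True, delta=True,
                           catalog_type="glue", namespace="analytics"):
    """Build the JSON body for POST /api/managed-lakehouse/tables."""
    payload = {
        "connectionId": connection_id,
        "tableName": table_name,
        "storageProvider": "s3",
        "storagePath": storage_path,
        "icebergEnabled": iceberg,
        "deltaEnabled": delta,
        "icebergCatalogType": catalog_type,
        "icebergNamespace": namespace,
    }
    missing = REQUIRED - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return json.dumps(payload)

body = build_register_payload("uuid", "events", "s3://my-bucket/lakehouse/events")
assert json.loads(body)["icebergEnabled"] is True
```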

View commit history

GET /api/managed-lakehouse/tables/{tableId}/commits?limit=50

Trigger maintenance

POST /api/managed-lakehouse/tables/{tableId}/maintenance
Content-Type: application/json

{
  "operation": "full_maintenance"
}
Available operations: snapshot_expiry, orphan_cleanup, compaction, metadata_cleanup, delta_checkpoint, full_maintenance.
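Since an unknown operation name would only fail server-side, it can be convenient to validate against the documented list before sending. The operation names below come from this page; the validation helper itself is an illustrative client-side sketch.

```python
import json

# Operations documented for POST /api/managed-lakehouse/tables/{tableId}/maintenance
OPERATIONS = {"snapshot_expiry", "orphan_cleanup", "compaction",
              "metadata_cleanup", "delta_checkpoint", "full_maintenance"}

def maintenance_body(operation):
    """Build the maintenance request body, rejecting undocumented names."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return json.dumps({"operation": operation})

assert json.loads(maintenance_body("compaction")) == {"operation": "compaction"}
```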

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| "Iceberg catalog not found" | Glue database doesn't exist or wrong region | Create the Glue database first; verify the region matches your bucket |
| "io scheme not registered" | Missing S3/GCS I/O driver | This is an internal error; contact support |
| "PermanentRedirect" on S3 | Region mismatch between bucket and Glue | Ensure the bucket region and Glue catalog region are the same |
| Delta commit failed, Iceberg succeeded | Transient storage error | The destination retries Delta once automatically; check S3 permissions |
| "Access denied" on Glue | IAM credential lacks Glue permissions | Add glue:CreateTable, glue:UpdateTable, glue:GetTable to the IAM policy |
| Table visible in Iceberg but not Delta | Only Iceberg format enabled | Check the format selection; enable Delta in the node configuration |
| Slow writes | Many small batches | Increase pipeline batch size to 10,000+ rows |

Comparison with other destinations

| Feature | Managed Lakehouse | Delta Lake | Iceberg | Fabric / OneLake |
| --- | --- | --- | --- | --- |
| Formats | Iceberg + Delta | Delta only | Iceberg only | Delta only |
| Cloud support | S3, GCS, Azure | S3, GCS, Azure | S3, GCS, Azure | OneLake only |
| Catalog | Glue, REST | None (file-based) | Glue, REST | Fabric |
| Write modes | Append, Overwrite, Merge | Append, Overwrite | Append, Overwrite, Merge | Append, Upsert |
| Time travel | Yes (Iceberg snapshots) | Yes (Delta versions) | Yes | Yes |
| Schema evolution | Yes | Yes | Yes | Yes |
| Dual-engine access | Yes | Delta engines only | Iceberg engines only | Fabric only |
| Tier | Professional+ | All tiers | Professional+ | All tiers |

Delta Lake destination

Standalone Delta Lake destination for simpler single-format workflows.

Destination nodes

All destination node types including Write, Cloud Destination, and Iceberg.

Cloud storage

Configure S3, GCS, and Azure Blob connections used by the lakehouse.

Data contracts

Enforce schema and quality rules before data lands in your lakehouse.