Overview
Streaming mode enables continuous data ingestion by committing data at regular intervals instead of waiting for the entire pipeline run to complete.

Configuration
Enable streaming mode in the Managed Lakehouse destination node:

| Setting | Default | Description |
|---|---|---|
| Streaming Mode | Off | Enable micro-batch commits |
| Commit Interval | 60 seconds | Maximum time between commits |
| Row Limit | 100,000 | Maximum rows before forced commit |
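The two commit triggers interact: a micro-batch commit fires when either the commit interval elapses or the row limit is reached, whichever comes first. A minimal sketch of that logic, assuming these illustrative class and method names (not a real API):

```python
import time

class MicroBatchCommitter:
    """Sketch of streaming mode's commit triggers: commit when either
    the commit interval elapses OR the row limit is reached."""

    def __init__(self, interval_s=60, row_limit=100_000):
        self.interval_s = interval_s      # "Commit Interval" setting
        self.row_limit = row_limit        # "Row Limit" setting
        self.rows = 0
        self.last_commit = time.monotonic()

    def add_rows(self, n, now=None):
        """Buffer n rows; return True if a commit should fire now."""
        now = time.monotonic() if now is None else now
        self.rows += n
        if self.rows >= self.row_limit or now - self.last_commit >= self.interval_s:
            # Either trigger resets both counters.
            self.rows = 0
            self.last_commit = now
            return True
        return False
```

With the defaults above, a burst of 100,000 rows commits immediately, while a slow trickle still commits at least once per interval.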
Backpressure
When storage upload latency increases, the system automatically reduces batch sizes to prevent memory buildup:

| Avg Commit Latency | Batch Size | Action |
|---|---|---|
| < 2 seconds | 10,000 (default) | Normal operation |
| 2–5 seconds | 5,000 | Moderate backpressure |
| > 5 seconds | 2,000 | Heavy backpressure, warning logged |
Call `GetBackpressureProfile()` to retrieve the current backpressure tier and adjust `ArrowTuning` accordingly.
Metrics
Monitor streaming throughput in the pipeline run summary.

Tier Limits
| | Professional | Premium | Enterprise |
|---|---|---|---|
| Streaming tables | 2 | 10 | Unlimited |
| Min commit interval | 5 min | 1 min | 10 sec |
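A pipeline's streaming configuration must stay within its tier's limits. A minimal validation sketch, assuming these illustrative names and encoding the limits from the table (minutes converted to seconds):

```python
# Tier limits from the table above; None means unlimited.
TIER_LIMITS = {
    "Professional": {"max_streaming_tables": 2, "min_commit_interval_s": 300},
    "Premium": {"max_streaming_tables": 10, "min_commit_interval_s": 60},
    "Enterprise": {"max_streaming_tables": None, "min_commit_interval_s": 10},
}

def validate_streaming_config(tier, n_streaming_tables, commit_interval_s):
    """Return a list of limit violations (empty if the config is valid)."""
    limits = TIER_LIMITS[tier]
    errors = []
    max_tables = limits["max_streaming_tables"]
    if max_tables is not None and n_streaming_tables > max_tables:
        errors.append(f"{tier} allows at most {max_tables} streaming tables")
    if commit_interval_s < limits["min_commit_interval_s"]:
        errors.append(
            f"{tier} requires a commit interval of at least "
            f"{limits['min_commit_interval_s']}s"
        )
    return errors
```

For example, three streaming tables on the Professional tier would fail validation, while any table count passes on Enterprise.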
Best Practices
- **Start with 60s intervals.** Begin with the default 60-second interval and adjust based on observed latency.
- **Monitor backpressure.** If batch size reductions appear in the logs, your storage may need scaling.
- **Use with compaction.** Streaming creates many small files; enable compaction via table maintenance.
- **Prefer Arrow-native sources.** For maximum throughput, pair streaming mode with Arrow-native sources such as Kafka or Parquet.