# Pipeline troubleshooting

> Diagnose and resolve pipeline execution failures.

When a pipeline fails, you want three answers: **which node**, **why**, and **whether the data is safe to retry**. Planasonix surfaces that through run history, **node status**, and **logs**. This page groups the failures operators see most often and the fastest ways to narrow them down.

## Common errors

A single node turns red while upstream nodes stay green. Open **node details** for the **exception class**, **SQL state**, or **HTTP status**. Typical causes: a **syntax** error after variable substitution, a **missing file**, a **forbidden API** call, or **type coercion** on null-heavy columns. Fix the node config or guard with **null-safe** expressions and **default values**.

Errors mention **heap**, **spill**, **out of memory**, or **container killed**. Reduce **fan-in** (tighten filters, push aggregation down), lower **parallelism**, or move heavy transforms to the **warehouse**. For file scans, tighten **partition pruning** so fewer partitions are read, and avoid reading wide Parquet into the driver.

**Query timeout**, **stage timeout**, or **lease expired** usually means the warehouse or API is slower than your SLA allows. Tune the **statement timeout** per environment, adjust **batch size**, or use **retry with backoff** for throttled APIs. Confirm the slowness is not **lock contention** on the source.

**Column not found**, **cannot cast**, or **extra fields** errors typically appear after source upgrades. Refresh **schema drift** detection, update **contracts**, or add **evolve schema** steps. For semi-structured data, enforce **JSON paths** and reject unknown keys in production.

## Debugging with preview, node logs, and run details

From the pipeline canvas or **Runs** list, select the **attempt** with the error badge. Note the **start time**, **environment**, and **parameter overrides**.

Expand the failed node and load **logs** filtered to **Error** and **Warn**. Follow stack traces to the first **caused by** line; later messages are often cascading.

**Preview** samples rows through the subgraph. Use **limited** row counts and **masked** columns for PII. Preview hits the same connections as production, so respect **rate limits** on external APIs.

Use **diff** on Git commits or **version history** if the pipeline changed between green and red runs. Roll back one change at a time.

Some failures are **transient** (network blips). Use **retry policies** on idempotent branches instead of manual reruns for every flake.

## Custom SQL engine errors

Custom SQL nodes use **Apache DataFusion** (PostgreSQL-compatible) as the local engine. Common issues:

- DataFusion supports standard PostgreSQL functions. DuckDB-specific functions like `MEDIAN`, `LIST`, `EPOCH`, `STRFTIME`, and `REGEXP_MATCHES` are not available. Use standard equivalents: `PERCENTILE_CONT(0.5)` for median, `ARRAY_AGG` for list aggregation, `EXTRACT(EPOCH FROM ...)` for epoch, `TO_CHAR` for date formatting, and the `~` operator for regex.
- DataFusion does not support `PIVOT` or `UNPIVOT` SQL syntax. Use `CASE WHEN` with aggregation for pivoting, or `UNION ALL` for unpivoting. See the [Custom SQL reference](/nodes/custom-sql) for examples.
- Use `TRY_CAST(value AS type)` instead of `CAST` when the input may contain invalid values. `TRY_CAST` returns `NULL` on failure instead of raising an error.
- Large queries may exceed the context deadline. Reduce the dataset with filters, or switch the execution engine to **Warehouse** to leverage distributed compute.

## Performance optimization tips

Filter and project in **source queries** or **warehouse SQL** before you move large datasets through the orchestration tier.

Too many tiny files hurt listing; too few huge files hurt parallelism. Aim for **128–512 MB** compressed objects where the format allows, subject to source constraints.
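
As a rough illustration of the file-sizing guidance above, the sketch below picks an output file count so that objects land near a target size inside the 128–512 MB band. The helper name `plan_file_count`, the 256 MB default target, and the assumption that you know the total compressed size up front are all hypothetical; none of them are Planasonix settings.

```python
import math

# Treating "MB" as MiB here is an assumption; adjust if your storage
# reports decimal megabytes.
MB = 1024 * 1024
MIN_FILE_BYTES = 128 * MB  # lower bound from the guidance above
MAX_FILE_BYTES = 512 * MB  # upper bound from the guidance above


def plan_file_count(total_compressed_bytes: int,
                    target_bytes: int = 256 * MB) -> int:
    """Return a file count that puts the average object near target_bytes.

    Hypothetical helper: assumes the total compressed size is known and
    that data is spread evenly across the output files.
    """
    if total_compressed_bytes <= 0:
        raise ValueError("total_compressed_bytes must be positive")
    if not MIN_FILE_BYTES <= target_bytes <= MAX_FILE_BYTES:
        raise ValueError("target_bytes must fall in the 128-512 MB band")
    # Small datasets still need one file, even if it ends up under 128 MB.
    return max(1, math.ceil(total_compressed_bytes / target_bytes))


if __name__ == "__main__":
    total = 10 * 1024 * MB  # a 10 GB compressed dataset
    count = plan_file_count(total)
    print(count, "files, about", total // count // MB, "MB each")
```

Rounding up keeps each object at or below the target size. Whether the resulting count maps to a writer setting, a repartition step, or a compaction job depends on the source and format.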

Reuse **broadcast** or **cached** small lookups instead of repeating joins on every micro-batch in streaming paths.

Backpressure in **streaming** or **orchestrated** jobs shows up as growing **lag** before hard failures. Alert on lag early.

Capture a **baseline** duration after a healthy run, and alert when p95 **doubles**. That catches regressions before hard timeouts fire.

## Related topics

- Automated anomaly hints across runs.
- Inspect rejected rows and poison messages.