# Spark integration

> Run notebooks with Spark and Databricks compute.

Spark-backed notebooks let you work with data too large for single-node Python loops. Planasonix integrates with **Databricks** and compatible Spark clusters, so you submit jobs using workspace-managed credentials.

## Databricks notebook support

When your admin configures a **Databricks** connection, you can:

* Select a **cluster** or **SQL warehouse** policy you are allowed to use
* Run `%sql`, `%python`, or other notebook cells that compile to Spark jobs
* Browse the Unity Catalog objects your entitlements expose

Use interactive clusters for exploration, and terminate idle clusters to control cost. Prefer job clusters for scheduled notebook nodes so that cluster policies enforce autoscaling and timeouts.

Follow your organization’s **personal access token** or **OAuth** standards; never commit tokens to notebook source.

## Spark execution

**Spark execution** details depend on your deployment:

* **Cluster manager** – Databricks, EMR, Dataproc, or on-premises YARN/Kubernetes
* **Runtime version** – align with production jobs to avoid subtle function differences
* **Libraries** – install required JARs and Python wheels via cluster init scripts or workspace libraries

Match Spark SQL dialect features between notebooks and production transforms to avoid “works in dev, fails in prod” surprises.

## Performance and cost

Filter early on partition columns so Spark can prune partitions it does not need. Avoid `collect()` on large datasets, because it pulls every row into driver memory; write results to staged tables instead.

Cache only when reuse within the session justifies the memory, and release cached data (for example, with `unpersist()`) before long idle periods.

Link long-running notebook jobs to [cost insights](/observability/cost-insights) tags so finance can see Spark spend separately from warehouse SQL.

## Related topics

* Capabilities and enterprise requirements.
* Manage cluster policies and defaults.
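As a hedged illustration of the advice to run scheduled notebook nodes on job clusters and keep tokens out of notebook source, the sketch below builds a request body in the shape of the Databricks Jobs API 2.1 `jobs/create` endpoint. It runs one notebook on a job cluster with autoscaling, a cluster policy, and a task timeout, and it reads the token from the `DATABRICKS_TOKEN` environment variable. This page does not describe how Planasonix submits jobs internally, so treat this only as a Databricks-side picture of those settings. The helper names (`build_notebook_job`, `auth_header`), notebook path, policy ID, runtime string, node type, and worker counts are hypothetical placeholders; confirm field names against your workspace's Jobs API reference.

```python
import json
import os


def build_notebook_job(
    notebook_path: str,
    policy_id: str,
    min_workers: int = 1,
    max_workers: int = 4,
    timeout_seconds: int = 3600,
) -> dict:
    """Return a Jobs API 2.1 style jobs/create body for one notebook task (hypothetical helper)."""
    if not 0 < min_workers <= max_workers:
        raise ValueError("need 0 < min_workers <= max_workers")
    return {
        "name": f"scheduled:{notebook_path}",
        "tasks": [
            {
                "task_key": "notebook",
                "notebook_task": {"notebook_path": notebook_path},
                # Job cluster: created for this run and terminated when the run ends.
                "new_cluster": {
                    "policy_id": policy_id,  # hypothetical ID; the policy constrains cluster settings
                    "spark_version": "13.3.x-scala2.12",  # placeholder; match your production runtime
                    "node_type_id": "i3.xlarge",  # placeholder node type
                    "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
                },
                # Fail the task instead of letting a stuck run keep the cluster alive.
                "timeout_seconds": timeout_seconds,
            }
        ],
    }


def auth_header() -> dict:
    """Build a bearer header from the environment so the token never appears in notebook source."""
    token = os.environ.get("DATABRICKS_TOKEN")
    if not token:
        raise RuntimeError("set DATABRICKS_TOKEN outside the notebook")
    return {"Authorization": f"Bearer {token}"}


# Hypothetical notebook path and policy ID, for illustration only.
payload = build_notebook_job("/Shared/etl/daily_orders", policy_id="ABC123")
print(json.dumps(payload, indent=2))
```

The sketch stops short of sending the request (for example, an HTTP POST to `/api/2.1/jobs/create` with `auth_header()`), so it runs without a workspace. A job cluster starts for each run and shuts down when the run finishes, which is why it suits scheduled notebook nodes better than a shared interactive cluster that can sit idle.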