Databricks notebook support
When your admin configures a Databricks connection, you can:

- Select a cluster or SQL warehouse policy you are allowed to use
- Run `%sql`, `%python`, or notebook cells that compile to Spark jobs
- Browse Unity Catalog objects your entitlement exposes
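As a sketch, a `%sql` cell against a Unity Catalog table might look like the following; the catalog, schema, and table names (`main.sales.orders`) are placeholders, not real objects:

```
%sql
SELECT order_id, amount
FROM main.sales.orders
LIMIT 10
```

The same query could run in a `%python` cell via `spark.sql(...)`; both compile to Spark jobs on the attached cluster.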
Two cluster types are available:

- Interactive clusters – use for exploration; remember to terminate idle clusters to control cost.
- Job clusters – created for a scheduled run and terminated automatically when the run finishes.
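One way admins enforce the idle-termination habit is through a cluster policy that caps auto-termination. The sketch below is illustrative only: the `autotermination_minutes` range rule follows the Databricks cluster-policy convention, but verify the field names against your workspace's policy schema.

```python
# Hypothetical policy fragment: require idle auto-termination between
# 10 and 30 minutes (field names assumed from the cluster-policy convention).
policy = {
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 30},
}

def within_policy(cluster_conf: dict, policy: dict) -> bool:
    """Return True if the cluster's auto-termination setting obeys the policy."""
    rule = policy["autotermination_minutes"]
    minutes = cluster_conf.get("autotermination_minutes", 0)
    return rule["minValue"] <= minutes <= rule["maxValue"]

print(within_policy({"autotermination_minutes": 20}, policy))  # True
print(within_policy({"autotermination_minutes": 0}, policy))   # False: never terminates
```

A policy like this turns "remember to terminate" into a guardrail: clusters that never auto-terminate simply cannot be created.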
Spark execution
Spark execution details depend on your deployment:

- Cluster manager – Databricks, EMR, Dataproc, or on-prem YARN/Kubernetes
- Runtime version – align with production jobs to avoid subtle function differences
- Libraries – install required JARs/Python wheels via cluster init scripts or workspace libraries
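Runtime drift is easy to catch early. A minimal sketch, assuming you can read the current runtime string (on Databricks it is exposed via the `DATABRICKS_RUNTIME_VERSION` environment variable; the pinned production version here is hypothetical):

```python
PRODUCTION_RUNTIME = "14.3"  # hypothetical version pinned for production jobs

def check_runtime(current: str, pinned: str = PRODUCTION_RUNTIME) -> None:
    """Fail fast when the notebook runtime diverges from production."""
    if current.split(".")[:2] != pinned.split(".")[:2]:
        raise RuntimeError(
            f"Notebook runtime {current} != production {pinned}; "
            "function behavior may differ subtly between versions."
        )

check_runtime("14.3.1")  # OK: major.minor match production
```

Running a check like this in the first cell makes "align with production" an enforced invariant rather than a convention.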
Performance and cost
Partitioning
Filter early on partition columns; avoid `collect()` on large datasets; write results to staged tables instead.
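The effect of filtering early can be simulated with plain Python; this is illustrative only, not Spark code. When the filter hits the partition column before any wide operation, the engine can skip whole partitions instead of scanning them:

```python
# Hypothetical table partitioned by "region"; each key is one partition on disk.
partitions = {
    "emea": [{"region": "emea", "amount": 10}],
    "apac": [{"region": "apac", "amount": 20}],
    "amer": [{"region": "amer", "amount": 30}],
}

def read_with_pruning(region: str) -> list:
    """Touch only the partition the filter selects; others are never read."""
    return partitions[region]

rows = read_with_pruning("apac")
print(sum(r["amount"] for r in rows))  # 20, after reading 1 of 3 partitions
```

Contrast this with collecting the full table to the driver and filtering there, which reads every partition and risks exhausting driver memory.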
Caching
Cache only when reuse within the session justifies memory; drop caches before long idle periods.
Observability
Link long-running notebook jobs to cost insights tags so finance sees Spark spend separately from warehouse SQL.
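A minimal sketch of what that linkage looks like when submitting a notebook job: tags travel with the job's cluster so spend rolls up under them. The tag keys here (`cost_center`, `workload`) and the payload shape are illustrative assumptions, not a fixed schema:

```python
def job_payload(notebook_path: str, cost_center: str) -> dict:
    """Build a job submission that carries cost-attribution tags (sketch)."""
    return {
        "notebook_task": {"notebook_path": notebook_path},
        "custom_tags": {
            "cost_center": cost_center,       # hypothetical finance code
            "workload": "notebook-spark",     # separates Spark from warehouse SQL
        },
    }

payload = job_payload("/Repos/team/etl", "fin-1234")
print(payload["custom_tags"]["cost_center"])  # fin-1234
```

With consistent tags, finance can slice billing reports by `workload` and see notebook Spark spend apart from warehouse SQL without tracing individual jobs.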
Related topics

- Notebooks overview – Capabilities and enterprise requirements.
- Compute – Manage cluster policies and defaults.