Spark-backed notebooks let you work on data too large for single-node Python loops. Planasonix integrates with Databricks and compatible Spark clusters, so you can submit jobs using workspace-managed credentials.

Databricks notebook support

When your admin configures a Databricks connection, you can:
  • Select a cluster or SQL warehouse policy you are allowed to use
  • Run %sql, %python, or notebook cells that compile to Spark jobs
  • Browse Unity Catalog objects your entitlement exposes
Use notebooks for exploration, and terminate idle clusters to control cost.
Follow your organization’s personal access token or OAuth standards; never commit tokens into notebook source.
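As a minimal sketch of the token-handling guidance above, a notebook cell can resolve the token at runtime instead of embedding it in source. The environment-variable name and helper are illustrative assumptions, not a Planasonix or Databricks requirement; substitute your workspace's secret mechanism.

```python
import os

def get_databricks_token(env_var: str = "DATABRICKS_TOKEN") -> str:
    """Fetch a personal access token from the environment at runtime
    so it never appears in committed notebook source.

    The variable name DATABRICKS_TOKEN is an assumed convention;
    align it with your organization's secret-management standard.
    """
    token = os.environ.get(env_var)
    if token is None:
        raise RuntimeError(
            f"{env_var} is not set; configure it via your workspace secrets"
        )
    return token
```

Failing loudly when the variable is missing keeps a misconfigured notebook from silently falling back to an unauthenticated or hard-coded credential.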

Spark execution

Spark execution details depend on your deployment:
  • Cluster manager – Databricks, EMR, Dataproc, or on-prem YARN/Kubernetes
  • Runtime version – align with production jobs to avoid subtle function differences
  • Libraries – install required JARs/Python wheels via cluster init scripts or workspace libraries
Match Spark SQL dialect features between notebook and production transforms to avoid “works in dev, fails in prod” surprises.
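The runtime-alignment advice above can be made concrete with a small version check. The helper and its major.minor comparison policy are illustrative assumptions, not a Planasonix API; tighten the policy if your jobs depend on patch-level behavior.

```python
def runtime_matches(notebook_version: str, production_version: str) -> bool:
    """Return True when two Spark version strings agree on major.minor,
    ignoring patch releases (an assumed comparison policy).

    Running this check at the top of a notebook surfaces runtime drift
    before subtle function differences surface as wrong results.
    """
    return notebook_version.split(".")[:2] == production_version.split(".")[:2]
```

For example, 3.5.0 and 3.5.1 are treated as compatible, while 3.3.x against 3.5.x is flagged as drift.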

Performance and cost

Filter early on partition columns; avoid collect() on large datasets—write results to staged tables instead.
Cache only when reuse within the session justifies memory; drop caches before long idle periods.
Link long-running notebook jobs to cost insights tags so finance sees Spark spend separately from warehouse SQL.
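One way to apply the tagging advice above is to attach custom tags to the cluster or job definition at submission time. The tag keys below are hypothetical examples, not names Planasonix or Databricks mandates; align them with your organization's tagging standard.

```python
def cost_tags(team: str, workload: str) -> dict:
    """Build a tag payload for a cluster/job definition so cost insights
    can split Spark notebook spend from warehouse SQL.

    All keys here are illustrative assumptions.
    """
    return {
        "cost-center": team,
        "workload-type": workload,   # e.g. "notebook-spark" vs "warehouse-sql"
        "managed-by": "planasonix",  # assumed provenance marker
    }
```

Keeping the payload in one helper means every long-running notebook job is tagged consistently, so finance sees the split without per-job effort.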
