Stored procedures, macros & UDFs for Databricks → Snowflake
Re-home reusable logic—from Spark SQL UDFs and notebook “macro” utilities to Snowflake UDFs and stored procedures—so behavior stays stable under retries, backfills, and real concurrency.
- Input: Databricks stored procedure / UDF migration logic
- Output: Snowflake equivalent (validated)
- Common pitfalls:
  - Hidden dependencies: notebook macros rely on implicit session configs, catalogs, or temp views.
  - Mixed-type branches: CASE/IF branches return different types; Snowflake needs explicit casts to preserve intent.
  - NULL semantics drift: null-safe equality, joins, and string functions behave differently if not explicit.
Why this breaks
In Databricks estates, “procedural” logic is often distributed: Spark SQL UDFs, notebook helper functions, job parameters, and macro-like snippets embedded across pipelines. During migration, teams focus on SQL translation and MERGE patterns, but these reusable utilities quietly change behavior—causing downstream drift.
Common symptoms after migration:
- UDF outputs drift due to type coercion and NULL behavior differences
- Notebook macros that relied on Spark functions or implicit session settings stop matching
- Error handling changes (fail-fast vs continue), breaking orchestration expectations
- Dynamic SQL and identifier resolution behaves differently in Snowflake
- Side effects (audit/control tables) aren’t modeled, so retries/backfills double-apply or skip
A successful migration turns this scattered logic into explicit Snowflake routines with a test harness that proves parity.
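A concrete case of NULL-semantics drift: Spark SQL's null-safe equality operator (<=>) treats two NULLs as equal, while a plain "=" comparison does not. A minimal sketch using Snowflake's built-in EQUAL_NULL function (table and column names here are illustrative):

```sql
-- Spark SQL:   ... JOIN tgt t ON s.customer_key <=> t.customer_key
-- Snowflake:   EQUAL_NULL returns TRUE when both sides are NULL
SELECT s.*
FROM src s
JOIN tgt t
  ON EQUAL_NULL(s.customer_key, t.customer_key);
-- A plain "=" join silently drops rows where either key is NULL.
```

Snowflake's IS NOT DISTINCT FROM predicate expresses the same intent; the point is that the null-safe behavior must be stated explicitly rather than inherited from the operator.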
How conversion works
- Inventory & classify reusable logic: Spark SQL UDFs, notebook utilities/macros, job libraries, and call sites across pipelines and BI layers.
- Extract the behavior contract: inputs/outputs, typing, null rules, side effects, error semantics, and performance constraints.
- Choose Snowflake target form per asset:
- SQL UDF for pure expressions
- JavaScript UDF for complex logic where JS is the right fit
- Stored procedure for control flow and multi-statement logic
- Set-based refactor where loop/cursor patterns appear
- Translate and normalize: casts, null-safe comparisons, timestamp intent (NTZ/LTZ/TZ), and deterministic ordering where logic depends on ranking/dedup.
- Validate with a harness: golden inputs/outputs, branch coverage, failure-mode tests, and side-effect assertions—then integrate into a representative pipeline slice.
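The harness step above can be sketched as a symmetric-difference query over golden data: any row returned signals a parity break. The qa tables are hypothetical; util.safe_div stands in for the routine under test:

```sql
-- Parity check: actual routine output vs curated expected output.
-- Zero rows returned means the golden set passes.
WITH actual AS (
  SELECT case_id, util.safe_div(n, d) AS result
  FROM qa.golden_inputs
)
(SELECT case_id, result FROM actual
 MINUS
 SELECT case_id, expected_result FROM qa.golden_expected)
UNION ALL
(SELECT case_id, expected_result FROM qa.golden_expected
 MINUS
 SELECT case_id, result FROM actual);
```

Because the check is a plain query, it can be re-run after every change and wired into the cutover gate.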
Supported constructs
Representative Databricks procedural/macro constructs we commonly migrate to Snowflake routines (exact coverage depends on your estate).
| Source | Target | Notes |
|---|---|---|
| Spark SQL UDFs | Snowflake SQL UDFs | Pure expressions mapped with explicit casts and NULL behavior. |
| Notebook macro utilities | Snowflake procedures + views/UDFs | Hidden dependencies made explicit; interfaces stabilized. |
| Dynamic SQL generation | EXECUTE IMMEDIATE patterns with safe parameterization | Identifier resolution and escaping normalized. |
| Control-table driven rerun behavior | Idempotency markers + applied-window tracking | Retries/backfills become safe and auditable. |
| Timestamp/timezone logic | Explicit NTZ/LTZ/TZ handling | Prevents boundary-day drift in reporting and SCD logic. |
| Row-by-row notebook loops | Set-based SQL refactors | Avoids performance and cost cliffs. |
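As a sketch of the safe-parameterization row above, in Snowflake Scripting the identifier is resolved explicitly with IDENTIFIER() and the value arrives as a typed argument instead of concatenated SQL text (procedure, table, and column names are illustrative):

```sql
CREATE OR REPLACE PROCEDURE util.purge_load_date(tbl STRING, d DATE)
RETURNS NUMBER
LANGUAGE SQL
AS
$$
BEGIN
  -- IDENTIFIER() resolves the table name from a bind variable;
  -- the date is a typed parameter, never pasted into SQL text.
  DELETE FROM IDENTIFIER(:tbl) WHERE load_date = :d;
  RETURN SQLROWCOUNT;  -- rows deleted by the statement above
END;
$$;
```

For statements that must be assembled as strings, the same principle applies with EXECUTE IMMEDIATE ... USING, keeping values in binds rather than in the SQL text.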
How workload changes
| Topic | Databricks | Snowflake |
|---|---|---|
| Where logic lives | Notebooks, jobs, Spark UDFs, and scattered utility code | Centralized routines (UDFs/procedures) with explicit contracts |
| Typing and coercion | Spark implicit casts often tolerated | Explicit casts required for stable outputs |
| Operational behavior | Reruns/retries often encoded in job structure | Idempotency and side effects must be explicit |
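The "idempotency must be explicit" point can be sketched with an applied-window control table (control-table and pipeline names are hypothetical): each run records the window it applies, and the data-mutating step runs only when the marker insert succeeds, so retries and backfills neither double-apply nor skip.

```sql
-- Record the window being applied; a retry finds the marker and skips.
MERGE INTO ctl.applied_windows t
USING (SELECT 'orders_scd2'      AS pipeline_name,
              '2024-06-01'::DATE AS window_start,
              '2024-06-02'::DATE AS window_end) s
  ON t.pipeline_name = s.pipeline_name
 AND t.window_start  = s.window_start
WHEN NOT MATCHED THEN
  INSERT (pipeline_name, window_start, window_end, applied_at)
  VALUES (s.pipeline_name, s.window_start, s.window_end, CURRENT_TIMESTAMP());
-- Gate the actual data mutation on whether this MERGE inserted a row.
```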
Examples
Illustrative patterns for moving Databricks UDFs and macro-style utilities into Snowflake routines. Adjust databases/schemas and types to match your environment.
```sql
-- Snowflake SQL UDF example
CREATE OR REPLACE FUNCTION util.safe_div(n NUMBER(38,6), d NUMBER(38,6))
RETURNS NUMBER(38,6)
AS
$$
IFF(d IS NULL OR d = 0, NULL, n / d)
$$;
```

Common pitfalls
- Hidden dependencies: notebook macros rely on implicit session configs, catalogs, or temp views.
- Mixed-type branches: CASE/IF branches return different types; Snowflake needs explicit casts to preserve intent.
- NULL semantics drift: null-safe equality, joins, and string functions behave differently if not explicit.
- Dynamic SQL risks: string concatenation without bindings/escaping causes drift and security risk.
- Side effects ignored: audit/control table updates not recreated, breaking idempotency and restartability.
- Row-by-row patterns: Spark-era loops in notebooks become expensive or fragile if moved verbatim; prefer set-based SQL.
- No harness: without replayable tests, parity becomes a debate at cutover.
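The row-by-row pitfall above usually resolves into one set-based statement. A sketch (schemas and columns are illustrative): a per-customer notebook loop that updated one row at a time becomes a single UPDATE driven by a deterministic window.

```sql
-- Set-based replacement for a per-customer loop:
-- pick each customer's latest status event, then update in one pass.
UPDATE dim.customer c
SET current_status = s.status
FROM (
  SELECT customer_id, status
  FROM stg.status_events
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY event_ts DESC, event_id DESC) = 1   -- tie-breaker keeps ordering deterministic
) s
WHERE c.customer_id = s.customer_id;
```

The explicit tie-breaker in the ORDER BY matters: without it, ranking ties can flip between runs and parity tests become flaky.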
Validation approach
- Compile + interface checks: every UDF/procedure deploys successfully; argument/return types match the contract.
- Golden tests: curated input sets validate outputs for each routine (including NULL-heavy and boundary values).
- Branch + failure-mode coverage: exercise key branches and expected failures.
- Side-effect verification: assert expected writes to control/log/audit tables and idempotency behavior under retries.
- Integration replay: run routines inside representative pipelines and compare downstream KPIs/aggregates.
- Performance gate: confirm no hidden row-by-row scans; refactors validated with runtime/credit baselines.
Migration steps
- 01
Inventory reusable logic and call sites
Collect Spark SQL UDFs, notebook utilities/macros, and shared libraries. Map call sites across pipelines, dbt models, and BI queries.
- 02
Define the behavior contract
Specify inputs/outputs, typing, NULL rules, error semantics, side effects, and performance expectations. Decide target form (SQL UDF, JS UDF, procedure, or refactor).
- 03
Convert with safety patterns
Make casts explicit, normalize timezone intent, implement null-safe comparisons, and migrate dynamic SQL using safe parameterization and explicit identifier rules.
- 04
Build a replayable harness
Create golden input sets, boundary cases, and expected failures. Validate outputs and side effects deterministically so parity isn’t debated at cutover.
- 05
Integrate and cut over behind gates
Run routines in representative pipelines, compare downstream KPIs, validate retries/backfills, and cut over with rollback-ready criteria.
We inventory your UDFs and notebook macros, migrate a representative subset into Snowflake routines, and deliver a harness that proves parity—including side effects and failure behavior.
You get a conversion plan, review markers flagging ambiguous intent, and validation artifacts, so the procedural-logic cutover is gated by evidence and rollback criteria.