Workload

Redshift stored procedures & UDFs to BigQuery

Re-home procedural logic—stored procedures, UDFs, and operational utilities—into BigQuery routines with an explicit behavior contract and validation harness so retries and backfills don’t change outcomes.

At a glance
Input
Redshift stored procedures, UDFs, and related procedural logic
Output
BigQuery equivalent (validated)
Common pitfalls
  • Hidden state coupling: reliance on temp tables, session variables, or search_path-like behavior.
  • Mixed-type branches: CASE/IF returns mixed types; BigQuery requires explicit casts to preserve intent.
  • NULL semantics drift: comparisons and string functions behave differently unless made explicit.
Context

Why this breaks

In Redshift estates, procedural logic often carries the run contract: control tables, watermark advancement, dynamic SQL for partition/window refresh, and UDFs that embed business rules and typing assumptions. When migrated naïvely, routines may compile in BigQuery but drift in behavior—especially under retries, partial failures, and backfills.

Common symptoms after migration:

  • UDF outputs drift due to type coercion and NULL handling differences
  • Dynamic SQL changes behavior (quoting, identifier resolution, parameter binding)
  • Error handling semantics differ; pipelines fail differently or silently continue
  • Side effects (audit/control tables) aren’t recreated, breaking restartability
  • Row-by-row procedural patterns become expensive and fragile in BigQuery

A successful migration turns implicit behavior into a testable contract and validates it with a replayable harness.

Approach

How conversion works

  1. Inventory & classify procedures and UDFs, plus call sites across ETL and BI. Map dependencies (tables, views, control/audit tables).
  2. Extract the behavior contract: inputs/outputs, side effects, error semantics, state assumptions, and performance constraints.
  3. Choose target form per routine:
    • BigQuery SQL UDF for pure expressions
    • BigQuery JavaScript UDF for complex logic where JS is appropriate
    • BigQuery stored procedure (SQL scripting) for multi-statement control flow
    • Set-based refactor where procedural loops can be eliminated
  4. Rewrite dynamic SQL safely using parameter binding and explicit identifier rules.
  5. Validate with a harness: golden inputs/outputs, branch coverage, failure-mode tests, and side-effect assertions (control/audit writes).

Supported constructs

Representative Redshift procedural constructs we commonly migrate to BigQuery routines (exact coverage depends on your estate).

  • Redshift UDFs (SQL) → BigQuery SQL UDFs. Pure expressions mapped with explicit casts and NULL behavior.
  • Redshift stored procedures → BigQuery stored procedures (SQL scripting). Control flow rewritten; state and side effects modeled explicitly.
  • Dynamic SQL patterns → EXECUTE IMMEDIATE with parameter binding. Normalize identifier rules; reduce drift and injection risk.
  • Control tables for restartability → applied-window tracking + idempotency markers. Reruns are safe and auditable under failures/backfills.
  • Utility macros in ETL scripts → reusable views/UDFs/procedures. Consolidate reusable patterns into testable BigQuery assets.
  • Row-by-row transforms → set-based SQL refactors. Avoid cost and reliability cliffs in BigQuery.
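As a sketch of the last mapping, a per-row discount loop can usually collapse into a single joined UPDATE. Table and column names here are illustrative assumptions.

-- Set-based replacement for a row-by-row discount loop (illustrative names)
UPDATE `proj.mart.orders` o
SET total = o.total * (1 - d.pct)
FROM `proj.ref.discounts` d
WHERE o.customer_id = d.customer_id;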

How workload changes

  • Execution model: Redshift procedures often rely on session context and ETL conventions; BigQuery routines should be explicit about state, inputs, and side effects. We convert hidden assumptions into testable contracts.
  • Dynamic SQL: concatenated SQL strings are common in Redshift; in BigQuery, prefer parameter binding and explicit identifier rules. This reduces drift and injection risk.
  • Performance: row-by-row loops are sometimes tolerated in Redshift; set-based refactors are usually required in BigQuery for cost and latency. Procedural loops are a frequent hidden cost after migration.
  • Restartability: Redshift control tables and script chains encode rerun behavior; in BigQuery, idempotency markers and applied-window tracking are enforced, with retries/backfills validated via simulations.
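The restartability point can be made concrete with an applied-window marker. This is one possible sketch; the control table, routine name, and columns are assumptions.

-- Idempotency marker sketch (control table and names are assumptions)
DECLARE run_date DATE DEFAULT DATE '2024-01-15';
MERGE `proj.ops.applied_windows` t
USING (SELECT 'refresh_daily_sales' AS routine, run_date AS window_start) s
ON t.routine = s.routine AND t.window_start = s.window_start
WHEN NOT MATCHED THEN
  INSERT (routine, window_start, applied_at)
  VALUES (s.routine, s.window_start, CURRENT_TIMESTAMP());

A rerun hits the WHEN NOT MATCHED branch zero times, so the work it gates is applied exactly once per window.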

Examples

Illustrative patterns for moving Redshift UDF/procedure logic into BigQuery routines. Adjust datasets, types, and identifiers to match your environment.

-- BigQuery SQL UDF example
CREATE OR REPLACE FUNCTION `proj.util.safe_div`(n NUMERIC, d NUMERIC) AS (
  IF(d IS NULL OR d = 0, NULL, n / d)
);
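A multi-statement routine can follow the same pattern: explicit inputs, set-based work, and side effects written deliberately. The datasets, columns, and audit table below are illustrative assumptions, not a prescribed schema.

-- BigQuery stored procedure sketch (names and control flow are illustrative)
CREATE OR REPLACE PROCEDURE `proj.util.refresh_daily_sales`(run_date DATE)
BEGIN
  -- Set-based refresh replaces a row-by-row loop
  DELETE FROM `proj.mart.daily_sales` WHERE sale_date = run_date;
  INSERT INTO `proj.mart.daily_sales` (sale_date, total)
  SELECT sale_date, SUM(amount)
  FROM `proj.raw.sales`
  WHERE sale_date = run_date
  GROUP BY sale_date;
  -- Side effect made explicit: audit write for restartability checks
  INSERT INTO `proj.ops.audit_log` (routine, run_date, ran_at)
  VALUES ('refresh_daily_sales', run_date, CURRENT_TIMESTAMP());
END;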
Avoid

Common pitfalls

  • Hidden state coupling: reliance on temp tables, session variables, or search_path-like behavior.
  • Mixed-type branches: CASE/IF returns mixed types; BigQuery requires explicit casts to preserve intent.
  • NULL semantics drift: comparisons and string functions behave differently unless made explicit.
  • Dynamic SQL injection risk: string concatenation without bindings/escaping causes drift and security risk.
  • Side effects ignored: control-table and audit updates not modeled; retries/backfills double-apply or skip.
  • Row-by-row loops: procedural loops that should be rewritten as set-based SQL for cost and reliability.
  • No harness: without replayable tests, parity becomes a debate at cutover.
Proof

Validation approach

  • Compile + interface checks: each routine deploys and signatures match the contract (args/return types).
  • Golden tests: curated input sets validate outputs, including NULL-heavy and boundary cases.
  • Branch + failure-mode coverage: expected failures (invalid inputs, missing rows) are tested.
  • Side-effect verification: assert expected writes to control/log/audit tables and idempotency under retries.
  • Integration replay: run routines within representative pipelines and compare downstream KPIs/aggregates.
  • Performance gate: confirm no hidden row-by-row scans; set-based refactors validated with scan bytes/runtime baselines.
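Golden tests can be expressed directly in SQL scripting with ASSERT. The expected values below assume the safe_div UDF shown in the Examples section; the messages are illustrative.

-- Golden-test sketch with ASSERT (expected values assume safe_div above)
ASSERT `proj.util.safe_div`(10, 0) IS NULL AS 'safe_div returns NULL on zero divisor';
ASSERT `proj.util.safe_div`(10, 4) = 2.5 AS 'safe_div divides without truncation';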
Execution

Migration steps

A sequence that keeps behavior explicit, testable, and safe to cut over.
  1. Inventory routines and build the call graph
     Collect Redshift procedures and UDFs, their callers, and dependent objects. Identify side effects: control/audit tables, temp objects, and dynamic SQL paths.

  2. Define the behavior contract
     For each routine, specify inputs/outputs, expected errors, side effects, restart semantics, and performance expectations. Decide the target form (SQL UDF, JS UDF, procedure, or set-based refactor).

  3. Convert logic with safety patterns
     Rewrite casts and NULL behavior explicitly, migrate dynamic SQL using parameter binding, and refactor row-by-row patterns into set-based SQL where feasible.

  4. Build a validation harness
     Create golden inputs, boundary cases, and failure-mode tests. Validate outputs and side effects deterministically so parity isn’t debated during cutover.

  5. Integrate, replay, and cut over
     Run routines within representative pipelines, compare downstream KPIs, validate reruns/backfills, and cut over behind gates with rollback-ready criteria.

Workload Assessment
Migrate procedural logic with a test harness

We inventory your Redshift procedures and UDFs, migrate a representative subset into BigQuery routines, and deliver a harness that proves parity—including side effects and rerun behavior.

Migration Acceleration
Cut over routines with proof-backed sign-off

Get an actionable conversion plan, review markers for ambiguous intent, and validation artifacts so procedural logic cutover is gated by evidence and rollback criteria.

FAQ

Frequently asked questions

Do we have to rewrite everything as BigQuery stored procedures?
Not necessarily. Many utilities belong as SQL UDFs or views, and some procedural logic is best refactored into set-based SQL. We choose the target form per routine to minimize risk and cost.
What about dynamic SQL-heavy procedures?
We migrate dynamic SQL using safe patterns: parameter binding, explicit escaping, and normalized identifier rules. Where dynamic SQL is overused, we refactor into deterministic templates.
How do you prove behavior parity?
We build a replayable harness with golden inputs/outputs, branch and failure-mode coverage, and side-effect checks. Integration replay validates downstream metrics before cutover.
Will performance change after migration?
It can. Row-by-row loops are the most common risk. We refactor to set-based SQL where possible and gate performance with scan bytes/runtime baselines so routines don’t become hidden BigQuery spend.