Workload

Redshift stored procedures & UDFs to BigQuery

Re-home procedural logic—stored procedures, UDFs, and operational utilities—into BigQuery routines with an explicit behavior contract and validation harness so retries and backfills don’t change outcomes.

At a glance
Input
Redshift stored procedures, UDFs, and related procedural logic
Output
BigQuery equivalent (validated)
Common pitfalls
  • Hidden state coupling: reliance on temp tables, session variables, or search_path-like behavior.
  • Mixed-type branches: CASE/IF returns mixed types; BigQuery requires explicit casts to preserve intent.
  • NULL semantics drift: comparisons and string functions behave differently unless made explicit.
Context

Why this breaks

In Redshift estates, procedural logic often carries the run contract: control tables, watermark advancement, dynamic SQL for partition/window refresh, and UDFs that embed business rules and typing assumptions. When migrated naïvely, routines may compile in BigQuery but drift in behavior—especially under retries, partial failures, and backfills.

Common symptoms after migration:

  • UDF outputs drift due to type coercion and NULL handling differences
  • Dynamic SQL changes behavior (quoting, identifier resolution, parameter binding)
  • Error handling semantics differ; pipelines fail differently or silently continue
  • Side effects (audit/control tables) aren’t recreated, breaking restartability
  • Row-by-row procedural patterns become expensive and fragile in BigQuery

A successful migration turns implicit behavior into a testable contract and validates it with a replayable harness.

Approach

How conversion works

  1. Inventory & classify procedures and UDFs, plus call sites across ETL and BI. Map dependencies (tables, views, control/audit tables).
  2. Extract the behavior contract: inputs/outputs, side effects, error semantics, state assumptions, and performance constraints.
  3. Choose target form per routine:
    • BigQuery SQL UDF for pure expressions
    • BigQuery JavaScript UDF for complex logic where JS is appropriate
    • BigQuery stored procedure (SQL scripting) for multi-statement control flow
    • Set-based refactor where procedural loops can be eliminated
  4. Rewrite dynamic SQL safely using parameter binding and explicit identifier rules.
  5. Validate with a harness: golden inputs/outputs, branch coverage, failure-mode tests, and side-effect assertions (control/audit writes).

Supported constructs

Representative Redshift procedural constructs we commonly migrate to BigQuery routines (exact coverage depends on your estate).

  • Redshift UDFs (SQL) → BigQuery SQL UDFs. Pure expressions mapped with explicit casts and NULL behavior.
  • Redshift stored procedures → BigQuery stored procedures (SQL scripting). Control flow rewritten; state and side effects modeled explicitly.
  • Dynamic SQL patterns → EXECUTE IMMEDIATE with parameter binding. Normalize identifier rules; reduce drift and injection risk.
  • Control tables for restartability → applied-window tracking + idempotency markers. Reruns are safe and auditable under failures/backfills.
  • Utility macros in ETL scripts → reusable views/UDFs/procedures. Consolidate reusable patterns into testable BigQuery assets.
  • Row-by-row transforms → set-based SQL refactors. Avoid cost and reliability cliffs in BigQuery.
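As a sketch of the last mapping, a per-row discount loop can usually collapse into a single joined UPDATE. Table and column names here are illustrative assumptions.

-- Set-based replacement for a row-by-row discount loop (illustrative names)
UPDATE `proj.mart.orders` o
SET total = o.total * (1 - d.pct)
FROM `proj.ref.discounts` d
WHERE o.customer_id = d.customer_id;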

How workload changes

  • Execution model: Redshift procedures often rely on session context and ETL conventions; BigQuery routines should be explicit about state, inputs, and side effects. We convert hidden assumptions into testable contracts.
  • Dynamic SQL: concatenated SQL strings are common in Redshift; in BigQuery, prefer parameter binding and explicit identifier rules. This reduces drift and injection risk.
  • Performance: row-by-row loops are sometimes tolerated in Redshift; set-based refactors are usually required in BigQuery for cost and latency. Procedural loops are a frequent hidden cost after migration.
  • Restartability: Redshift control tables and script chains encode rerun behavior; in BigQuery, idempotency markers and applied-window tracking are enforced, with retries/backfills validated via simulations.
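The restartability point can be made concrete with an applied-window marker. This is one possible sketch; the control table, routine name, and columns are assumptions.

-- Idempotency marker sketch (control table and names are assumptions)
DECLARE run_date DATE DEFAULT DATE '2024-01-15';
MERGE `proj.ops.applied_windows` t
USING (SELECT 'refresh_daily_sales' AS routine, run_date AS window_start) s
ON t.routine = s.routine AND t.window_start = s.window_start
WHEN NOT MATCHED THEN
  INSERT (routine, window_start, applied_at)
  VALUES (s.routine, s.window_start, CURRENT_TIMESTAMP());

A rerun hits the WHEN NOT MATCHED branch zero times, so the work it gates is applied exactly once per window.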

Examples

Illustrative patterns for moving Redshift UDF/procedure logic into BigQuery routines. Adjust datasets, types, and identifiers to match your environment.

-- BigQuery SQL UDF example
CREATE OR REPLACE FUNCTION `proj.util.safe_div`(n NUMERIC, d NUMERIC) AS (
  IF(d IS NULL OR d = 0, NULL, n / d)
);
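A multi-statement routine can follow the same pattern: explicit inputs, set-based work, and side effects written deliberately. The datasets, columns, and audit table below are illustrative assumptions, not a prescribed schema.

-- BigQuery stored procedure sketch (names and control flow are illustrative)
CREATE OR REPLACE PROCEDURE `proj.util.refresh_daily_sales`(run_date DATE)
BEGIN
  -- Set-based refresh replaces a row-by-row loop
  DELETE FROM `proj.mart.daily_sales` WHERE sale_date = run_date;
  INSERT INTO `proj.mart.daily_sales` (sale_date, total)
  SELECT sale_date, SUM(amount)
  FROM `proj.raw.sales`
  WHERE sale_date = run_date
  GROUP BY sale_date;
  -- Side effect made explicit: audit write for restartability checks
  INSERT INTO `proj.ops.audit_log` (routine, run_date, ran_at)
  VALUES ('refresh_daily_sales', run_date, CURRENT_TIMESTAMP());
END;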
Avoid

Common pitfalls

  • Hidden state coupling: reliance on temp tables, session variables, or search_path-like behavior.
  • Mixed-type branches: CASE/IF returns mixed types; BigQuery requires explicit casts to preserve intent.
  • NULL semantics drift: comparisons and string functions behave differently unless made explicit.
  • Dynamic SQL injection risk: string concatenation without bindings/escaping causes drift and security risk.
  • Side effects ignored: control-table and audit updates not modeled; retries/backfills double-apply or skip.
  • Row-by-row loops: procedural loops that should be rewritten as set-based SQL for cost and reliability.
  • No harness: without replayable tests, parity becomes a debate at cutover.
Proof

Validation approach

  • Compile + interface checks: each routine deploys and signatures match the contract (args/return types).
  • Golden tests: curated input sets validate outputs, including NULL-heavy and boundary cases.
  • Branch + failure-mode coverage: expected failures (invalid inputs, missing rows) are tested.
  • Side-effect verification: assert expected writes to control/log/audit tables and idempotency under retries.
  • Integration replay: run routines within representative pipelines and compare downstream KPIs/aggregates.
  • Performance gate: confirm no hidden row-by-row scans; set-based refactors validated with scan bytes/runtime baselines.
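Golden tests can be expressed directly in SQL scripting with ASSERT. The expected values below assume the safe_div UDF shown in the Examples section; the messages are illustrative.

-- Golden-test sketch with ASSERT (expected values assume safe_div above)
ASSERT `proj.util.safe_div`(10, 0) IS NULL AS 'safe_div returns NULL on zero divisor';
ASSERT `proj.util.safe_div`(10, 4) = 2.5 AS 'safe_div divides without truncation';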
Execution

Migration steps

A sequence that keeps behavior explicit, testable, and safe to cut over.
  1. Inventory routines and build the call graph
     Collect Redshift procedures and UDFs, their callers, and dependent objects. Identify side effects: control/audit tables, temp objects, and dynamic SQL paths.

  2. Define the behavior contract
     For each routine, specify inputs/outputs, expected errors, side effects, restart semantics, and performance expectations. Decide the target form (SQL UDF, JS UDF, procedure, or set-based refactor).

  3. Convert logic with safety patterns
     Rewrite casts and NULL behavior explicitly, migrate dynamic SQL using parameter binding, and refactor row-by-row patterns into set-based SQL where feasible.

  4. Build a validation harness
     Create golden inputs, boundary cases, and failure-mode tests. Validate outputs and side effects deterministically so parity isn’t debated during cutover.

  5. Integrate, replay, and cut over
     Run routines within representative pipelines, compare downstream KPIs, validate reruns/backfills, and cut over behind gates with rollback-ready criteria.

Workload Assessment
Migrate procedural logic with a test harness

We inventory your Redshift procedures and UDFs, migrate a representative subset into BigQuery routines, and deliver a harness that proves parity—including side effects and rerun behavior.

Migration Acceleration
Cut over routines with proof-backed sign-off

Get an actionable conversion plan, review markers for ambiguous intent, and validation artifacts so procedural logic cutover is gated by evidence and rollback criteria.

FAQ

Frequently asked questions

Do we have to rewrite everything as BigQuery stored procedures?
Not necessarily. Many utilities belong as SQL UDFs or views, and some procedural logic is best refactored into set-based SQL. We choose the target form per routine to minimize risk and cost.
What about dynamic SQL-heavy procedures?
We migrate dynamic SQL using safe patterns: parameter binding, explicit escaping, and normalized identifier rules. Where dynamic SQL is overused, we refactor into deterministic templates.
How do you prove behavior parity?
We build a replayable harness with golden inputs/outputs, branch and failure-mode coverage, and side-effect checks. Integration replay validates downstream metrics before cutover.
Will performance change after migration?
It can. Row-by-row loops are the most common risk. We refactor to set-based SQL where possible and gate performance with scan bytes/runtime baselines so routines don’t become hidden BigQuery spend.