Workload

Snowflake SQL queries to BigQuery

Translate Snowflake-specific constructs—QUALIFY, VARIANT/FLATTEN, IFF/DECODE, time semantics, and MERGE/upsert patterns—into BigQuery Standard SQL with validation gates that catch semantic drift.

At a glance
Input
Snowflake SQL / query migration logic
Output
BigQuery equivalent (validated)
Common pitfalls
  • QUALIFY drift from underspecified ordering (no stable tie-breaker)
  • JSON type surprises: extracted values change type, or go NULL, without explicit casts
  • Timezone assumptions: Snowflake session/NTZ/LTZ behavior must become explicit conversions
Context

Why Snowflake → BigQuery SQL breaks in real migrations

Most Snowflake SQL estates contain implicit correctness rules—NULL handling, casting, timezone assumptions, and semi-structured extraction—that can compile in BigQuery yet drift in meaning. The goal is not just syntactic translation, but a provable parity contract for your highest-impact queries.

Common symptoms after cutover:

  • KPI drift from unstable QUALIFY/window ordering or missing tie-breakers
  • JSON/VARIANT extraction returns different types (string vs numeric) and breaks filters
  • Timestamp logic shifts because session timezone was implicit in Snowflake
  • Full-table scans and cost spikes from missing partition pruning patterns
Approach

How conversion works

We convert Snowflake SQL to BigQuery by combining deterministic rewrite rules with a review-and-validate loop for the constructs that commonly carry hidden business meaning.

  1. Inventory & classify queries/views/models by consumer (BI, ELT, app), complexity, and risk patterns (QUALIFY, JSON, timezone, MERGE).
  2. Rewrite with rule-anchored mappings: function equivalents, identifier/quoting normalization, and explicit cast strategies.
  3. Apply pattern libraries for QUALIFY/window filters, FLATTEN→UNNEST, VARIANT→JSON, and timezone normalization.
  4. Validate & gate with compilation, catalog checks, and golden-query parity + regression tests for edge windows.
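
One construct that commonly carries hidden meaning is MERGE/upsert. BigQuery's MERGE raises a runtime error when a target row matches more than one source row, so converted upserts typically dedupe the source first. A sketch, with illustrative table and column names:

```sql
-- BigQuery: upsert with a deduped source.
-- BigQuery errors at runtime if a target row matches multiple
-- source rows, so keep only the latest row per business_key.
MERGE target_table AS t
USING (
  SELECT *
  FROM staging_table
  WHERE TRUE  -- BigQuery requires WHERE/GROUP BY/HAVING alongside QUALIFY
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY business_key ORDER BY event_ts DESC
  ) = 1
) AS s
ON t.business_key = s.business_key
WHEN MATCHED THEN
  UPDATE SET amount = s.amount, event_ts = s.event_ts
WHEN NOT MATCHED THEN
  INSERT (business_key, amount, event_ts)
  VALUES (s.business_key, s.amount, s.event_ts);
```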

Supported constructs

Representative constructs we commonly handle for Snowflake → BigQuery SQL conversion (exact coverage depends on your estate).

Source → Target (notes):

  • QUALIFY + windowed filters → QUALIFY (BigQuery) with a deterministic ORDER BY. Tie-breakers enforced to avoid nondeterministic drift.
  • VARIANT access (col:path / GET / PARSE_JSON) → JSON_VALUE/JSON_QUERY + explicit casts. The extraction boundary defines types; ambiguous paths are review-marked.
  • FLATTEN + LATERAL joins → UNNEST patterns (INNER or LEFT as required). Empty arrays preserved when needed (LEFT JOIN UNNEST).
  • LISTAGG / string aggregation → STRING_AGG with ORDER BY. Ordering preserved; DISTINCT/NULL handling made explicit.
  • IFF / DECODE / NVL → IF / CASE / COALESCE. Explicit casts added where BigQuery coercion differs.
  • DATEADD / DATEDIFF / DATE_TRUNC → DATE_ADD / DATE_DIFF / DATE_TRUNC. Unit semantics normalized; timestamp vs date handled explicitly.
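
The scalar-function rows above can be sketched in one query (table and column names are illustrative). Note the argument-order flip: Snowflake's DATEADD takes the unit first, while BigQuery's DATE_ADD takes an INTERVAL:

```sql
-- Snowflake:
--   SELECT IFF(status = 'OPEN', 1, 0),
--          NVL(discount, 0),
--          DATEADD('day', 7, order_date)
--   FROM orders;

-- BigQuery: same intent; names and argument order differ
SELECT
  IF(status = 'OPEN', 1, 0) AS is_open,
  COALESCE(discount, 0) AS discount_or_zero,
  DATE_ADD(order_date, INTERVAL 7 DAY) AS due_date
FROM orders;
```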

How the workload changes in BigQuery

  • Dialect + functions. Snowflake: Snowflake SQL with rich VARIANT helpers. BigQuery: Standard SQL with JSON functions. Function mapping is easy; type intent and casting rules are the hard part.
  • Semi-structured. Snowflake: VARIANT + FLATTEN patterns. BigQuery: JSON + UNNEST patterns. Choose scalar vs object extraction and preserve empty-array semantics intentionally.
  • Performance model. Snowflake: micro-partitions + clustering effects. BigQuery: partition pruning + clustering + slot usage. Converted queries often need pruning-aware rewrites to keep cost predictable.
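
The FLATTEN → UNNEST rewrite can be sketched as follows, assuming an orders table with an ARRAY column items (names are illustrative):

```sql
-- Snowflake: LATERAL FLATTEN drops rows with empty arrays by default
--   SELECT o.id, f.value::STRING AS item
--   FROM orders o, LATERAL FLATTEN(input => o.items) f;

-- BigQuery: CROSS JOIN UNNEST matches that behavior;
-- LEFT JOIN UNNEST keeps rows whose array is empty or NULL
SELECT o.id, item
FROM orders AS o
LEFT JOIN UNNEST(o.items) AS item;
```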

Snowflake patterns → BigQuery equivalents

Examples show deterministic windowing, VARIANT/JSON extraction, and FLATTEN→UNNEST rewrites. Adjust paths and types to your schema.

-- Snowflake: QUALIFY + deterministic dedupe
SELECT *
FROM events
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY business_key
  ORDER BY event_ts DESC, src_lsn DESC NULLS LAST, ingested_at DESC
) = 1;
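
The BigQuery side can keep QUALIFY, which GoogleSQL supports directly. One wrinkle: BigQuery requires a WHERE, GROUP BY, or HAVING clause in a query that uses QUALIFY, hence the WHERE TRUE below (same events schema as above):

```sql
-- BigQuery: QUALIFY with the same deterministic tie-breakers
SELECT *
FROM events
WHERE TRUE  -- QUALIFY must appear with WHERE/GROUP BY/HAVING
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY business_key
  ORDER BY event_ts DESC, src_lsn DESC NULLS LAST, ingested_at DESC
) = 1;
```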
Info

Make ordering deterministic

If a query uses QUALIFY or windowed dedupe, enforce explicit tie-breakers (event time + stable offset + ingest time). BigQuery will happily execute an underspecified ORDER BY—until results drift.

Info

Treat JSON types as a contract

Snowflake VARIANT access frequently relies on implicit casts. In BigQuery, choose JSON_VALUE vs JSON_QUERY intentionally and cast to the target scalar type at the extraction boundary.
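
Casting at the extraction boundary can be sketched like this (payload and amount are illustrative names):

```sql
-- Snowflake: implicit cast via VARIANT access
--   SELECT * FROM orders WHERE payload:amount > 100;

-- BigQuery: extract the scalar, then cast explicitly;
-- SAFE_CAST returns NULL instead of erroring on bad values
SELECT *
FROM orders
WHERE SAFE_CAST(JSON_VALUE(payload, '$.amount') AS NUMERIC) > 100;
```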

Avoid

Common pitfalls to avoid

  • QUALIFY drift: ordering is underspecified (no stable tie-breaker), so ‘top-1’ rows change under retries or parallelism.
  • JSON type surprises: values are extracted as strings (or NULL) and silently change filter selectivity unless cast at the boundary.
  • Timezone assumptions: session timezone and NTZ/LTZ behavior are implicit in Snowflake; BigQuery requires explicit timezone conversions when business meaning depends on local time.
  • Cost regression: missing partition filters and join reordering cause unbounded scans; prune-aware rewrites should be part of conversion, not an afterthought.
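
For the timezone pitfall, a sketch of making the business timezone explicit (the America/New_York zone and column names are illustrative):

```sql
-- Snowflake: DATE_TRUNC on TIMESTAMP_LTZ uses the session timezone
--   SELECT DATE_TRUNC('day', created_at) FROM orders;

-- BigQuery: state the business timezone explicitly
SELECT DATE_TRUNC(DATETIME(created_at, 'America/New_York'), DAY) AS local_day
FROM orders;
```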
Proof

Validation and reconciliation gates

We validate SQL conversion as a repeatable system: compile correctness, semantic parity for golden queries, and regression protection for edge cases (ties, late data windows, null-heavy cohorts).

Execution checks

  • Compiles under BigQuery Standard SQL
  • Explicit casts + deterministic ordering enforced

Parity checks

  • Golden-query output parity (agreed windows + parameters)
  • KPI aggregates by key dimensions; checksum aggregates for critical facts
  • Sampling diffs on edge cohorts (ties, null-heavy segments, timezone boundary days)
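
A checksum aggregate can be sketched as follows; run the same shape on both engines and diff the results (key and measure columns are illustrative). BIT_XOR is order-independent, so row order doesn't affect the checksum; on the Snowflake side, BITXOR_AGG over HASH(...) plays the same role.

```sql
-- BigQuery side of a parity check: per-day counts, sums,
-- and an order-independent row checksum
SELECT
  DATE(event_ts) AS event_day,
  COUNT(*) AS row_count,
  SUM(amount) AS amount_sum,
  BIT_XOR(FARM_FINGERPRINT(
    CONCAT(CAST(business_key AS STRING), '|', CAST(amount AS STRING))
  )) AS row_checksum
FROM events_converted
GROUP BY event_day;
```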

Performance checks

  • Partition pruning verified on top queries
  • Cost/latency baselines captured; regression thresholds defined
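
A typical pruning-aware rewrite, assuming events is partitioned on event_ts:

```sql
-- Before: wrapping the partition column in an expression can
-- defeat pruning on some filter shapes
--   WHERE DATE(event_ts) = '2024-01-01'

-- After: a half-open range on the partition column prunes reliably
SELECT business_key, event_ts, amount
FROM events
WHERE event_ts >= TIMESTAMP('2024-01-01')
  AND event_ts <  TIMESTAMP('2024-01-02');
```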
Execution

Migration steps

A typical sequence that keeps correctness measurable and cutover controlled.
  1. Inventory and prioritize the SQL estate

     Collect BI extracts, view DDL, dbt/ELT models, and app-embedded SQL. Rank by business impact, frequency, and risk patterns (QUALIFY, VARIANT, timezones, MERGE).

  2. Define the parity contract

     Lock in semantic expectations: NULL rules, casting strategy, timezone contract, tie-breakers for window logic, and JSON typing. Select a ‘golden query’ set for sign-off.

  3. Convert with rules + review markers

     Apply deterministic mappings for the common cases; flag ambiguous intent (implicit casts, JSON path uncertainty, collation expectations) with inline review markers.

  4. Run validation gates

     Compile in BigQuery, run smoke tests, execute golden queries, and compare aggregates/checksums over agreed windows. Fail fast on drift and fix at the pattern level.

  5. Performance-safe rewrites

     Add pruning-aware patterns (partition filters, join order, pre-aggregation) so converted queries don’t become unbounded scans. Confirm cost/latency with representative volumes.

Workload Assessment
Convert SQL with parity gates, not hope

We inventory your Snowflake SQL estate, convert a representative slice, and produce parity evidence on golden queries—plus a risk register for the constructs that carry business meaning.

Migration Acceleration
Ship BigQuery queries with proof-backed sign-off

Get a conversion plan, review markers, and validation artifacts (compile + parity + regression) so cutover is gated by evidence and rollback criteria.

FAQ

Frequently asked questions

Is Snowflake SQL directly compatible with BigQuery?
Often it compiles after basic rewrites, but semantic parity depends on explicit ordering, casts, timezone rules, and JSON typing. We treat those as contracts and validate them with golden queries.
Can we preserve QUALIFY behavior exactly?
Yes, if we make ordering deterministic. When Snowflake queries rely on implicit ordering or ties, we add tie-breakers and validate with edge-case cohorts so ‘top-1’ selection never drifts.
What about VARIANT and FLATTEN-heavy workloads?
We rewrite to JSON_VALUE/JSON_QUERY and UNNEST patterns, and we cast at extraction boundaries. We also preserve empty-array semantics intentionally (INNER vs LEFT UNNEST).
Do you optimize queries for BigQuery cost?
Yes. We treat pruning as part of the conversion: partition filters, join patterns, and pre-aggregation where needed. Validation includes cost/latency baselines for the top queries.