Workload

Performance tuning & optimization for Snowflake -> BigQuery

Make BigQuery fast and predictable after migration. We tune queries, table layout, and execution strategy so dashboards refresh on time and scan costs stay stable as volume grows.

At a glance
Input
Snowflake Performance tuning & optimization logic
Output
BigQuery equivalent (validated)
Common pitfalls
  • Partitioning after the fact: migrating tables without aligning partitions to query filters.
  • Clustering by intuition: clustering keys chosen without evidence from real predicates and join keys.
  • Unbounded MERGE: applying MERGE without scoping to the affected partition window.
Context

Why this breaks

Snowflake and BigQuery reward different habits. After migration, teams often keep Snowflake-era query shapes and expect the optimizer to rescue performance. The result is predictable: slower dashboards, higher scan bytes, and unstable costs.

Common post-migration symptoms:

  • Queries scan entire tables because partition filters aren’t pushed down
  • Heavy joins reshuffle large datasets; BI queries become expensive and slow
  • MERGE/upsert jobs scan full targets due to missing pruning boundaries
  • Semi-structured transforms (VARIANT->JSON) add expensive casts and repeated extraction
  • Concurrency spikes cause slot contention or unpredictable runtimes

Optimization isn’t “nice to have.” It’s the difference between a successful BigQuery cutover and a permanent cost/perf firefight.

Approach

How conversion works

  1. Baseline the top workloads: identify the most expensive and most business-critical queries/pipelines (dashboards, marts, incremental loads).
  2. Diagnose root causes: scan bytes, join patterns, skew, partition pruning, repeated JSON extraction, and MERGE scopes.
  3. Tune table layout: partitioning, clustering, and staging boundaries aligned to query access paths.
  4. Rewrite for pruning and reuse: predicate pushdown-friendly filters, pre-aggregation, materialized views, and de-duplication of expensive transforms.
  5. Capacity & cost governance: reservations/autoscaling posture, concurrency controls, and cost guardrails.
  6. Regression gates: performance baselines + thresholds so future changes don’t reintroduce scan blowups.
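Step 4's pre-aggregation lever can be sketched in BigQuery SQL. A minimal sketch with illustrative names (`proj.mart.fact_orders` with `event_ts`, `country`, and `revenue` are placeholders, not a real project); materialized views carry query restrictions, so treat this as a starting shape rather than a drop-in:

-- Hypothetical pre-aggregation: a materialized view BigQuery can refresh
-- incrementally, so dashboards read the rollup instead of the raw fact table.
CREATE MATERIALIZED VIEW `proj.mart.daily_revenue_mv` AS
SELECT
  DATE(event_ts) AS event_date,
  country,
  SUM(revenue) AS rev
FROM `proj.mart.fact_orders`
GROUP BY event_date, country;

Pointing BI refreshes at the rollup both shrinks scan bytes and stabilizes runtimes, because the expensive aggregation is computed once instead of per dashboard tile.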

Supported constructs

Representative tuning levers we apply for Snowflake -> BigQuery workloads.

Source -> Target (notes):

  • Snowflake clustering/micro-partition effects -> BigQuery partitioning + clustering. Align layout to filter and join access paths to maximize pruning.
  • BI query patterns (Looker/Tableau/Power BI) -> pre-aggregation + materialized views (where appropriate). Reduce repeated scans and stabilize refresh SLAs.
  • MERGE/upsert workloads -> partition-scoped staging + MERGE boundaries. Avoid full-target scans by scoping apply windows.
  • VARIANT-heavy transforms -> typed extraction tables + reuse. Extract once, cast once, then join/aggregate on typed columns.
  • Warehouse sizing and concurrency -> reservations/slots + workload management. Predictable performance under peak refresh and batch windows.
  • Ad-hoc expensive queries -> governance: guardrails + cost controls. Prevent scan blowups from new patterns and unmanaged access.
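The first two levers above usually combine into table DDL. A hedged sketch with illustrative names; the `require_partition_filter` option is itself a guardrail, rejecting queries that would scan every partition:

-- Hypothetical layout: daily partitions aligned to dashboard date filters,
-- clustering aligned to observed predicates and join keys.
CREATE TABLE `proj.mart.fact_orders`
PARTITION BY DATE(event_ts)
CLUSTER BY country, customer_id
OPTIONS (require_partition_filter = TRUE)
AS
SELECT * FROM `proj.staging.fact_orders_src`;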

How workload changes

  • Primary cost driver: Snowflake bills warehouse credits (compute time); BigQuery bills bytes scanned plus slot time, so pruning and query shape dominate spend.
  • Data layout impact: Snowflake micro-partitions and clustering can hide suboptimal SQL; in BigQuery, partitioning/clustering must match access paths, making layout a first-class performance lever.
  • Concurrency behavior: Snowflake relies on a warehouse scaling model; BigQuery relies on slots/reservations and concurrency policies, so peak BI refresh needs an explicit capacity posture.
  • Optimization style: Snowflake tuning is often query-level tweaks plus warehouse sizing; BigQuery tuning is holistic: pruning-aware rewrites, materialization, layout, capacity, and guardrails.

Examples

Illustrative BigQuery optimization patterns: enforce pruning, extract JSON once, and scope MERGEs. Replace datasets and fields to match your environment.

-- Pruning-friendly pattern: ensure partition filter is present
-- Example: fact table partitioned by DATE(event_ts)
SELECT
  country,
  SUM(revenue) AS rev
FROM `proj.mart.fact_orders`
WHERE DATE(event_ts) BETWEEN @start_date AND @end_date
GROUP BY 1;
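The extract-once pattern mentioned above might look like the following sketch; the `payload` column, JSON paths, and table names are placeholders for your own schema:

-- Extract and cast semi-structured fields once into a typed, partitioned table,
-- then join/aggregate on typed columns instead of re-parsing JSON per query.
CREATE OR REPLACE TABLE `proj.stage.events_typed`
PARTITION BY DATE(event_ts) AS
SELECT
  TIMESTAMP(JSON_VALUE(payload, '$.event_ts')) AS event_ts,
  JSON_VALUE(payload, '$.country') AS country,
  CAST(JSON_VALUE(payload, '$.revenue') AS NUMERIC) AS revenue
FROM `proj.raw.events`;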
Avoid

Common pitfalls

  • Partitioning after the fact: migrating tables without aligning partitions to query filters.
  • Clustering by intuition: clustering keys chosen without evidence from real predicates and join keys.
  • Unbounded MERGE: applying MERGE without scoping to the affected partition window.
  • Repeated JSON extraction: calling JSON_VALUE/JSON_QUERY many times per row instead of extracting once into typed columns.
  • Over-materialization: creating many intermediate tables/views without controlling refresh cost.
  • Ignoring concurrency: BI refresh spikes overwhelm slots/reservations and create tail latency.
  • No regression gates: performance improvements disappear after the next model change.
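The unbounded-MERGE pitfall has a standard remedy: put a constant filter on the target's partition column in the merge condition so BigQuery can prune target partitions. A sketch with illustrative names and a literal date window (pruning is most reliable on constants, less so on parameters):

MERGE `proj.mart.fact_orders` T
USING `proj.stage.orders_delta` S
ON T.order_id = S.order_id
  -- Constant partition filter on the target enables pruning:
  AND DATE(T.event_ts) BETWEEN '2024-01-01' AND '2024-01-07'
WHEN MATCHED THEN
  UPDATE SET revenue = S.revenue
WHEN NOT MATCHED THEN
  INSERT (order_id, event_ts, country, revenue)
  VALUES (S.order_id, S.event_ts, S.country, S.revenue);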
Proof

Validation approach

  • Baseline capture: for each top query/pipeline, record runtime, bytes scanned, slot time, and output row counts.
  • Plan-level checks: confirm partition pruning and predicate pushdown on representative parameters.
  • Before/after evidence: show improvements on runtime + scan bytes; record exceptions and tradeoffs.
  • Correctness guardrails: KPI aggregates and golden queries ensure tuning doesn’t change semantics.
  • Regression thresholds: define alert thresholds (e.g., +25% bytes scanned or +30% runtime) and enforce in CI or scheduled checks.
  • Operational monitors: post-tuning dashboards for scan bytes, slot utilization, failures, and refresh SLA adherence.
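Baseline capture can lean on BigQuery's INFORMATION_SCHEMA jobs views. A sketch of a top-cost query ranking; adjust the region qualifier and lookback window to your environment:

-- Rank the last week's queries by bytes scanned, with slot time and runtime.
SELECT
  query,
  COUNT(*) AS runs,
  ROUND(SUM(total_bytes_processed) / POW(2, 30), 1) AS gib_scanned,
  ROUND(SUM(total_slot_ms) / 1000, 1) AS slot_seconds,
  ROUND(AVG(TIMESTAMP_DIFF(end_time, start_time, SECOND)), 1) AS avg_runtime_s
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY query
ORDER BY gib_scanned DESC
LIMIT 20;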
Execution

Migration steps

A sequence that improves performance while protecting semantics.
  1. Identify top cost and SLA drivers: rank queries and pipelines by bytes scanned, slot time, and business criticality (dashboard SLAs, batch windows). Select a tuning backlog with clear owners.
  2. Create baselines and targets: capture current BigQuery job metrics (runtime, scan bytes, slot time) and define improvement targets. Freeze golden outputs so correctness doesn’t regress.
  3. Tune layout (partitioning and clustering): align partition keys to the most common filters and refresh windows. Choose clustering keys based on observed predicates and join keys, not guesses.
  4. Rewrite for pruning and reuse: apply pruning-aware SQL rewrites, reduce reshuffles, pre-aggregate where needed, and extract semi-structured fields once into typed tables for reuse.
  5. Set capacity posture and governance: choose a reservations/slot strategy (or on-demand posture), tune concurrency, and implement guardrails to prevent scan blowups from new queries.
  6. Add regression gates: codify performance thresholds and alerting so future changes don’t reintroduce high scan bytes or missed SLAs. Monitor post-cutover metrics continuously.

Workload Assessment
Make BigQuery performance predictable after migration

We identify your highest-cost queries and pipelines, tune pruning and layout, and deliver before/after evidence with regression thresholds, so optimization sticks and costs stay stable.

Optimization Program
Prevent scan blowups with regression gates

Get an optimization backlog, tuned table layouts, and performance gates (runtime/bytes/slot thresholds) so future model changes don’t reintroduce slow refreshes or high spend.

FAQ

Frequently asked questions

Why did queries get slower after moving to BigQuery?
Most often because partitioning/clustering wasn’t aligned to filters, and Snowflake-era query shapes don’t maximize pruning in BigQuery. We tune layout and rewrite queries to reduce scan bytes and reshuffles.

How do you keep optimization from changing results?
We gate tuning with correctness checks: golden queries, KPI aggregates, and checksum-style diffs. Optimizations only ship when outputs remain within agreed tolerances.

Can you optimize MERGE/upsert pipelines too?
Yes. We scope MERGEs to affected partitions, use staging boundaries, and design apply windows so you avoid full-target scans and unpredictable runtime.

Do you cover reservations and concurrency planning?
Yes. We recommend a capacity posture (on-demand vs reservations), concurrency controls for BI refresh spikes, and monitoring/guardrails so performance stays stable as usage grows.