Workload

Performance tuning & optimization for Teradata -> BigQuery

Teradata tuning assumptions (PI/AMP distribution, stats, spool) don’t carry over. We tune BigQuery queries, table layout, and capacity so dashboards hit SLAs and scan costs stay predictable as data grows.

At a glance
Input: Teradata performance tuning & optimization logic
Output: BigQuery equivalent (validated)
Common pitfalls
  • PI/AMP thinking carried over: assuming the same distribution/locality behaviors exist in BigQuery.
  • No pruning strategy: partitioning is missing or not aligned to common filters, causing full scans.
  • Clustering by folklore: clustering keys chosen without evidence from predicates and join keys.
Context

Why this breaks

Teradata performance tuning is often encoded in physical design and optimizer expectations: Primary Index choices, AMP-local joins, collected stats, spool behavior, and workload management rules. When you migrate the schema “as-is” to BigQuery, queries may still run, but the execution model is different, so performance can collapse or costs can spike.

Common post-cutover symptoms:

  • Teradata-era “physical tuning” treated as schema; BigQuery queries scan too much
  • Joins that were AMP-local now reshuffle large datasets; runtimes blow up
  • Reused volatile tables / intermediate steps become expensive repeated scans
  • Heavy BI queries miss SLAs due to concurrency spikes and slot contention
  • Batch windows slip because incremental jobs aren’t pruning-aware

Optimization is how you replace Teradata’s PI/AMP playbook with a BigQuery-native, evidence-driven performance posture.

Approach

How conversion works

  1. Baseline the top workloads: identify the most expensive and most business-critical queries (BI dashboards, marts, batch transforms).
  2. Map Teradata tuning assumptions: where PI/AMP locality, stats, spool, and volatile tables were doing the work.
  3. Tune table layout in BigQuery: partitioning + clustering aligned to real access paths (filters + join keys).
  4. Rewrite SQL for pruning and reduced reshuffles: predicate pushdown-friendly filters, join strategy adjustments, pre-aggregation, and incremental scoping.
  5. Materialize strategically: precomputed aggregates / incremental snapshots where BI patterns repeatedly scan large facts.
  6. Capacity & governance: reservations/on-demand posture, concurrency controls for BI refresh spikes, and cost guardrails.
  7. Regression gates: baselines + thresholds so improvements persist and new releases don’t reintroduce scan blowups.
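Steps 3 and 7 above can be sketched as a table definition that bakes pruning and a scan guardrail into the layout. This is a minimal sketch; the dataset, table, and column names are illustrative, and clustering keys should come from your observed predicates and join keys:

-- Hypothetical layout for a migrated fact table (replace names to match your environment)
CREATE TABLE `proj.mart.fact_sales`
(
  txn_id    STRING,
  txn_ts    TIMESTAMP,
  store_id  INT64,
  sku       STRING,
  net_sales NUMERIC
)
PARTITION BY DATE(txn_ts)           -- aligns pruning to the dominant time filter
CLUSTER BY store_id, sku            -- keys chosen from observed predicates and joins
OPTIONS (
  require_partition_filter = TRUE   -- guardrail: rejects queries that would full-scan
);

The require_partition_filter option acts as a standing regression gate at the table level: any new query that omits the partition filter fails fast instead of silently scanning the whole table.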

Supported constructs

Representative tuning levers we apply for Teradata -> BigQuery workloads.

Source | Target | Notes
Teradata PI/AMP locality assumptions | BigQuery partitioning + clustering | Replace physical-locality tuning with pruning and layout aligned to access paths.
Collected stats + optimizer hints | Query rewrites + layout choices validated by job metrics | Use evidence (bytes/slot time) instead of stats-driven expectations.
Volatile/intermediate tables | Strategic materializations + incremental staging | Avoid repeated scans; precompute where BI patterns demand it.
Large fact-table joins | Pruning-aware join patterns + pre-aggregation | Reduce reshuffles and stabilize runtime under concurrency.
Workload management / query classes | Reservations/slots + concurrency policies | Predictable performance for peak BI refresh and batch windows.
Spool-sensitive query patterns | Governance: guardrails + cost controls | Prevent runaway scans and long-tail cost/perf regressions.

How workload changes

Topic | Teradata | BigQuery
Primary tuning lever | PI/AMP distribution + collected stats | Partitioning/clustering + pruning-aware SQL
Cost driver | System resources / workload class limits | Bytes scanned + slot time
Intermediate results | Volatile tables and spool behavior common | Materialize selectively; minimize repeated scans
Concurrency planning | Workload management rules and queues | Reservations/slots + concurrency policies
Primary tuning lever: BigQuery tuning is dominated by scan reduction and layout-to-filter alignment.
Cost driver: Performance work is inseparable from cost governance.
Intermediate results: Strategic materialization often replaces volatile-table patterns.
Concurrency planning: Peak BI refresh requires explicit capacity posture.
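The same pruning-first posture applies to incremental batch jobs, which often replace Teradata volatile-table pipelines. A sketch of a partition-scoped MERGE follows (table and column names are illustrative; note that BigQuery prunes the MERGE target only when the partition predicate is a constant expression, so the date is written as a literal here):

-- Sketch: partition-scoped incremental load (illustrative names)
MERGE `proj.mart.fact_sales` AS t
USING (
  SELECT *
  FROM `proj.stage.sales_delta`
  WHERE DATE(txn_ts) = '2024-06-01'    -- prunes the staging-table scan
) AS s
ON t.txn_id = s.txn_id
   AND DATE(t.txn_ts) = '2024-06-01'   -- constant predicate prunes target partitions
WHEN MATCHED THEN
  UPDATE SET t.net_sales = s.net_sales
WHEN NOT MATCHED THEN
  INSERT (txn_id, txn_ts, store_id, net_sales)
  VALUES (s.txn_id, s.txn_ts, s.store_id, s.net_sales);

Scoping both sides of the MERGE to the load window keeps batch jobs inside their windows as the fact table grows.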

Examples

Illustrative BigQuery optimization patterns after Teradata migration: enforce pruning, pre-aggregate, and set regression gates. Replace datasets and fields to match your environment.

-- Pruning-first query shape (fact table partitioned by DATE(txn_ts))
SELECT
  store_id,
  SUM(net_sales) AS net_sales
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN @start_date AND @end_date
GROUP BY 1;
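Where dashboards repeatedly scan the raw fact for the same rollup, a precomputed daily aggregate often replaces Teradata volatile-table patterns. A sketch, again with illustrative names:

-- Sketch: precomputed daily aggregate that dashboards query instead of the raw fact
CREATE OR REPLACE TABLE `proj.mart.agg_sales_daily`
PARTITION BY sales_date
CLUSTER BY store_id AS
SELECT
  DATE(txn_ts) AS sales_date,
  store_id,
  SUM(net_sales) AS net_sales
FROM `proj.mart.fact_sales`
GROUP BY 1, 2;

Pointing BI queries at the aggregate cuts both bytes scanned and tail latency under concurrency, at the cost of a refresh step that must be scheduled and monitored.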
Avoid

Common pitfalls

  • PI/AMP thinking carried over: assuming the same distribution/locality behaviors exist in BigQuery.
  • No pruning strategy: partitioning is missing or not aligned to common filters, causing full scans.
  • Clustering by folklore: clustering keys chosen without evidence from predicates and join keys.
  • Volatile-table reliance: Teradata intermediate tables become expensive repeated scans without materialization strategy.
  • Skew blindness: joins on highly skewed keys cause disproportionate shuffles and slowdowns.
  • Concurrency surprises: BI refresh peaks overwhelm slots/reservations and create tail latency.
  • No regression gates: performance fixes disappear after the next model change.
Proof

Validation approach

  • Baseline capture: for each top query/pipeline, record runtime, bytes scanned, slot time, and output row counts.
  • Pruning checks: confirm partition pruning and predicate pushdown on representative parameters and common BI filters.
  • Before/after evidence: demonstrate improvements in runtime and scan bytes; document any tradeoffs.
  • Correctness guardrails: golden queries and KPI aggregates ensure tuning doesn’t change semantics.
  • Regression thresholds: define alert thresholds (e.g., +25% bytes scanned or +30% runtime) and enforce via CI or scheduled checks.
  • Operational monitors: post-tuning dashboards for scan bytes, slot utilization, failures, and refresh SLA adherence.
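Baselines and regression checks can be driven from BigQuery's own job metadata. A sketch that surfaces the past week's heaviest scans for comparison against stored baselines (the region qualifier and project are assumptions; adjust to your environment):

-- Sketch: weekly scan-bytes review for regression thresholds (illustrative region/project)
SELECT
  LEFT(query, 80) AS query_sample,
  total_bytes_processed / POW(1024, 3) AS gb_scanned,
  total_slot_ms / 1000 AS slot_seconds
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_bytes_processed DESC
LIMIT 20;

Feeding these figures into a scheduled check against agreed thresholds (e.g., +25% bytes scanned) turns the validation approach above into an enforceable gate.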
Execution

Migration steps

A sequence that improves performance while protecting semantics.
  1. Identify top cost and SLA drivers

    Rank queries and pipelines by bytes scanned, slot time, and business criticality (dashboard SLAs, batch windows). Select a tuning backlog with clear owners.

  2. Translate Teradata tuning assumptions

    Document where PI/AMP locality, stats collection, and volatile tables were doing performance work. Decide the BigQuery-native replacement: pruning, layout, materialization, or rewrites.

  3. Tune layout: partitioning and clustering

    Align partitions to time windows and common BI filters. Choose clustering keys based on observed predicates and join keys to reduce scan and reshuffle.

  4. Rewrite for pruning and reduced reshuffles

    Apply pruning-aware filters, reduce cross joins and broad reshuffles, and pre-aggregate where BI patterns repeatedly scan large facts.

  5. Capacity posture and governance

    Set reservations/on-demand posture, tune concurrency for BI refresh peaks, and implement guardrails to prevent scan blowups from new queries.

  6. Add regression gates

    Codify performance thresholds and alerting so future changes don’t reintroduce high scan bytes or missed SLAs. Monitor post-cutover metrics continuously.

Workload Assessment
Replace PI/AMP tuning with BigQuery-native performance

We identify your highest-cost Teradata-migrated queries, tune pruning and table layout, and deliver before/after evidence with regression thresholds, so performance improves and stays stable.

Optimization Program
Prevent scan blowups with regression gates

Get an optimization backlog, tuned partitioning/clustering, and performance gates (runtime/bytes/slot thresholds) so future releases don’t reintroduce slow dashboards or high spend.

FAQ

Frequently asked questions

Why did performance degrade after migrating Teradata workloads to BigQuery?
Teradata optimizations (PI/AMP locality, stats, spool, volatile tables) don’t translate directly. BigQuery performance depends on partition pruning, clustering, and pruning-aware SQL shapes, so we tune layout and rewrite queries accordingly.
Do we need to redesign tables, or just rewrite queries?
Usually both. Table layout (partitioning/clustering) must align to real filters and join keys, and queries often need pruning-aware rewrites or selective materialization to meet SLAs without cost spikes.
How do you prevent optimization from changing results?
We gate tuning with correctness checks: golden queries, KPI aggregates, and checksum-style diffs. Optimizations only ship when outputs remain within agreed tolerances.
Can you optimize concurrency and batch windows too?
Yes. We plan capacity (on-demand vs reservations), tune concurrency for BI refresh peaks, and add monitoring and guardrails so performance stays stable as usage grows.