Validation & reconciliation for Teradata -> BigQuery
Turn "it compiles" into a measurable parity contract. We prove correctness for batch and incremental systems with golden queries, KPI diffs, and replayable integrity simulations, then gate cutover with rollback-ready criteria.
- Input: Teradata validation & reconciliation logic
- Output: BigQuery equivalent (validated)
Why this breaks
Teradata migrations fail late when teams validate only syntax and a handful of spot checks. Teradata estates encode decades of implicit behavior: PI/AMP-era locality assumptions, volatile-table workflows, and "good enough" ordering in top-N and window logic. BigQuery will execute translated workloads, but drift appears when correctness rules were never written down or stress-tested.
Common drift drivers in Teradata -> BigQuery:
- Windowing and TOP ambiguity: QUALIFY/top-N logic without deterministic tie-breakers
- Type and NULL semantics: CASE/COALESCE branches and join keys cast differently
- Date/time edge cases: boundary days, truncation, and DATE vs TIMESTAMP intent
- Intermediate/staging behavior: volatile-table patterns replaced incorrectly, changing apply boundaries
- Incremental behavior: reruns, restarts, and backfill windows weren’t validated as first-class scenarios
Validation must treat the system as operational and incremental, not a one-time batch.
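A minimal sketch of the date/time drift driver above: comparing a UTC timestamp truncation against a zoned DATE conversion around a boundary day. The table and column follow the `fact_sales` example later in this page; the timezone and the `@boundary_date` parameter are assumptions to adapt.

```sql
-- Hedged sketch: surface DATE vs TIMESTAMP boundary drift around a cutoff day.
-- The 'America/New_York' timezone and @boundary_date parameter are illustrative.
SELECT
  TIMESTAMP_TRUNC(txn_ts, DAY) AS utc_day,
  DATE(txn_ts, 'America/New_York') AS local_day,
  COUNT(*) AS row_count
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN DATE_SUB(@boundary_date, INTERVAL 1 DAY)
                       AND DATE_ADD(@boundary_date, INTERVAL 1 DAY)
GROUP BY utc_day, local_day
HAVING DATE(utc_day) != local_day  -- rows that change daily buckets depending on intent
ORDER BY utc_day;
```

Any rows returned are candidates where Teradata DATE intent and a naive TIMESTAMP translation disagree.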
How conversion works
- Define the parity contract: what must match (tables, KPIs, dashboards), at what granularity, and with what tolerances.
- Build validation datasets: golden inputs, edge cohorts (ties, null-heavy segments), and representative windows (including boundary dates).
- Run readiness + execution gates: schema/type alignment, dependency readiness, compile/run reliability.
- Run layered parity gates: counts/profiles -> KPI diffs -> targeted row-level diffs for flagged cohorts.
- Validate incremental integrity where applicable: reruns, restart simulations, backfills, and late-arrival injections.
- Gate cutover: pass/fail thresholds, canary strategy, rollback triggers, and post-cutover monitors.
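The "counts/profiles" layer of those parity gates can be sketched as a partition-level count diff. This assumes the Teradata extract was landed in a staging table (`proj.staging.fact_sales_td` here is a hypothetical name); only mismatched or missing partitions surface.

```sql
-- Hedged sketch: partition-level row-count diff between the Teradata extract
-- and the BigQuery target. Table names are illustrative assumptions.
WITH td AS (
  SELECT DATE(txn_ts) AS d, COUNT(*) AS n
  FROM `proj.staging.fact_sales_td`
  GROUP BY d
),
bq AS (
  SELECT DATE(txn_ts) AS d, COUNT(*) AS n
  FROM `proj.mart.fact_sales`
  GROUP BY d
)
SELECT
  d,
  td.n AS td_count,
  bq.n AS bq_count
FROM td
FULL OUTER JOIN bq USING (d)
WHERE td.n IS DISTINCT FROM bq.n  -- flags missing partitions as well as count drift
ORDER BY d;
```

An empty result passes the gate; any row is a cheap signal to escalate to KPI or row-level diffs for that partition.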
Supported constructs
Representative validation and reconciliation mechanisms we apply in Teradata -> BigQuery migrations.
| Source | Target | Notes |
|---|---|---|
| Golden queries and reports | Golden query harness + repeatable parameter sets | Codifies business sign-off into runnable tests. |
| Counts and column profiles | Partition-level counts + null/min/max/distinct profiles | Cheap early drift detection before deep diffs. |
| KPI validation | Aggregate diffs by key dimensions + tolerance thresholds | Aligns validation with business meaning. |
| Row-level diffs | Targeted sampling diffs + edge cohort tests | Use deep diffs only where aggregates signal drift. |
| ETL restartability | Reruns, restart simulations, and backfill windows | Validates control-table semantics and idempotency. |
| Operational sign-off | Canary gates + rollback criteria + monitors | Prevents cutover from becoming a long KPI debate. |
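The "counts and column profiles" row above can be made concrete with a single profiling query per table. Column names here are illustrative; substitute your join keys and measures, then run the same profile against the Teradata extract and diff the two result rows.

```sql
-- Hedged sketch: null/min/max/distinct profile for key columns.
-- customer_id and amount are assumed column names for illustration.
SELECT
  COUNT(*) AS row_count,
  COUNTIF(customer_id IS NULL) AS customer_id_nulls,
  APPROX_COUNT_DISTINCT(customer_id) AS customer_id_approx_distinct,
  MIN(txn_ts) AS min_txn_ts,
  MAX(txn_ts) AS max_txn_ts,
  ROUND(SUM(amount), 2) AS amount_sum
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN @start_date AND @end_date;
```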
How workload changes
| Topic | Teradata | BigQuery |
|---|---|---|
| Drift drivers | PI/AMP-era query idioms and volatile-table workflows hide assumptions | Explicit ordering, casting, and apply boundaries required |
| Cost of validation | Resource governance and workload classes | Bytes scanned + slot time |
| Incremental behavior | Restartability conventions and control tables common | Rerun/backfill behavior must be simulated explicitly |
Examples
Illustrative parity and integrity checks in BigQuery. Replace datasets, keys, and KPI definitions to match your migration.
```sql
-- Partition/window row counts (BigQuery)
SELECT
  DATE(txn_ts) AS d,
  COUNT(*) AS row_count  -- `rows` is a reserved keyword in GoogleSQL; avoid it as an alias
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN @start_date AND @end_date
GROUP BY 1
ORDER BY 1;
```
Common pitfalls
- Spot-check validation: a few samples miss drift in ties and edge cohorts.
- No tolerance model: teams argue about “small diffs” because thresholds were never defined.
- Wrong comparison level: comparing raw rows when the business cares about rollups (or vice versa).
- Ignoring incremental behavior: parity looks fine on a static snapshot but fails under reruns/backfills.
- Unstable ordering: top-N/window functions without complete ORDER BY.
- Cost-blind diffs: exhaustive row-level diffs can be expensive; use layered gates (cheap->deep).
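The unstable-ordering pitfall has a direct fix: make every top-N window deterministic with a complete ORDER BY. A hedged sketch, assuming `txn_id` is a unique column in the example table:

```sql
-- Top-N with a deterministic tie-breaker. Without txn_id (an assumed unique
-- column), ties can resolve differently between Teradata and BigQuery runs.
SELECT *
FROM `proj.mart.fact_sales`
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY customer_id
  ORDER BY amount DESC, txn_ts DESC, txn_id  -- full ordering makes ranking reproducible
) <= 3;
```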
Validation approach
Gate set (layered)
Gate 0 - Readiness
- BigQuery datasets, permissions, and target schemas exist
- Dependent assets deployed (UDFs/routines, reference data, control tables)
Gate 1 - Execution
- Converted workloads compile and run reliably
- Deterministic ordering + explicit casts enforced for high-risk patterns
Gate 2 - Structural parity
- Row counts by partitions/windows
- Null/min/max/distinct profiles for key columns
Gate 3 - KPI parity
- KPI aggregates by key dimensions
- Rankings and top-N validated on edge windows (ties, boundary dates)
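A Gate 3 check can be sketched as an aggregate diff against an agreed tolerance. The staging table `proj.staging.kpi_td`, the `region` dimension, and the 0.1% threshold are assumptions; the threshold belongs in the parity contract, not the query.

```sql
-- Hedged sketch: KPI parity by dimension with an explicit relative tolerance.
SELECT *
FROM (
  SELECT
    bq.region, bq.d,
    td.revenue AS td_revenue,
    bq.revenue AS bq_revenue,
    SAFE_DIVIDE(ABS(bq.revenue - td.revenue), NULLIF(ABS(td.revenue), 0)) AS rel_diff
  FROM (
    SELECT region, DATE(txn_ts) AS d, SUM(amount) AS revenue
    FROM `proj.mart.fact_sales`
    GROUP BY region, d
  ) AS bq
  JOIN `proj.staging.kpi_td` AS td
    ON td.region = bq.region AND td.d = bq.d
)
WHERE rel_diff > 0.001  -- 0.1% tolerance; set this in the parity contract
ORDER BY rel_diff DESC;
```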
Gate 4 - Integrity (incremental systems)
- Idempotency: rerun same window -> no net change
- Restart simulation: fail mid-run -> resume -> correct final state
- Backfill: historical windows replay without drift
- Late-arrival: inject late corrections -> only expected rows change
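The idempotency check above reduces to "snapshot, rerun, assert zero net change." A hedged sketch, where `proj.staging.run_snapshot` is an assumed copy of the target taken before the rerun and `txn_id` an assumed unique key:

```sql
-- Hedged sketch: after rerunning the same window, count rows that appeared,
-- disappeared, or changed. A non-zero result fails the idempotency gate.
SELECT COUNT(*) AS changed_rows
FROM `proj.staging.run_snapshot` AS before_run
FULL OUTER JOIN `proj.mart.fact_sales` AS after_run
  ON before_run.txn_id = after_run.txn_id
WHERE before_run.txn_id IS NULL      -- row appeared on rerun
   OR after_run.txn_id IS NULL       -- row disappeared on rerun
   OR before_run.amount IS DISTINCT FROM after_run.amount;  -- row mutated
```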
Gate 5 - Cutover & monitoring
- Canary criteria + rollback triggers
- Post-cutover monitors: latency, scan bytes/slot time, diff sentinels on critical KPIs
Migration steps
- 01
Define the parity contract
Decide what must match (tables, dashboards, KPIs), at what granularity, and with what tolerance thresholds. Identify golden queries and business sign-off owners.
- 02
Create validation datasets and edge cohorts
Select representative time windows and cohorts that trigger edge behavior (ties/top-N, null-heavy segments, boundary dates, skewed keys).
- 03
Implement layered gates
Start with cheap checks (counts/profiles), then KPI diffs, then deep diffs only where needed. Codify gates into runnable jobs so validation is repeatable.
- 04
Validate incremental integrity
Run idempotency reruns, restart simulations, backfill windows, and late-arrival injections. These are the scenarios that usually break if not tested.
- 05
Gate cutover and monitor
Establish canary/rollback criteria and post-cutover monitors for critical KPIs and pipeline health (latency, failures, scan bytes/slot time).
We define your parity contract, build the golden-query set, and implement layered reconciliation gates, including reruns, restarts, and backfill simulations, so KPI drift is caught before production cutover.
Get a validation plan, runnable gates, and sign-off artifacts (diff reports, thresholds, monitors) so Teradata -> BigQuery cutover is controlled and dispute-proof.