Validation & reconciliation for Teradata -> BigQuery
Turn "it compiles" into a measurable parity contract. We prove correctness for batch and incremental systems with golden queries, KPI diffs, and replayable integrity simulations, then gate cutover with rollback-ready criteria.
- Input: Teradata validation & reconciliation logic
- Output: BigQuery equivalent (validated)
Why this breaks
Teradata migrations fail late when teams validate only syntax and a handful of spot checks. Teradata estates encode decades of implicit behavior: PI/AMP-era locality assumptions, volatile-table workflows, and "good enough" ordering in top-N and window logic. BigQuery will execute translated workloads, but drift appears when correctness rules were never written down or stress-tested.
Common drift drivers in Teradata -> BigQuery:
- Windowing and TOP ambiguity: QUALIFY/top-N logic without deterministic tie-breakers
- Type and NULL semantics: CASE/COALESCE branches and join keys cast differently
- Date/time edge cases: boundary days, truncation, and DATE vs TIMESTAMP intent
- Intermediate/staging behavior: volatile-table patterns replaced incorrectly, changing apply boundaries
- Incremental behavior: reruns, restarts, and backfill windows weren’t validated as first-class scenarios
Validation must treat the system as operational and incremental, not a one-time batch.
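A minimal sketch of the date/time drift driver above: comparing a UTC timestamp truncation against a zoned DATE conversion around a boundary day. The table and column follow the `fact_sales` example later in this page; the timezone and the `@boundary_date` parameter are assumptions to adapt.

```sql
-- Hedged sketch: surface DATE vs TIMESTAMP boundary drift around a cutoff day.
-- The 'America/New_York' timezone and @boundary_date parameter are illustrative.
SELECT
  TIMESTAMP_TRUNC(txn_ts, DAY) AS utc_day,
  DATE(txn_ts, 'America/New_York') AS local_day,
  COUNT(*) AS row_count
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN DATE_SUB(@boundary_date, INTERVAL 1 DAY)
                       AND DATE_ADD(@boundary_date, INTERVAL 1 DAY)
GROUP BY utc_day, local_day
HAVING DATE(utc_day) != local_day  -- rows that change daily buckets depending on intent
ORDER BY utc_day;
```

Any rows returned are candidates where Teradata DATE intent and a naive TIMESTAMP translation disagree.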
How conversion works
- Define the parity contract: what must match (tables, KPIs, dashboards), at what granularity, and with what tolerances.
- Build validation datasets: golden inputs, edge cohorts (ties, null-heavy segments), and representative windows (including boundary dates).
- Run readiness + execution gates: schema/type alignment, dependency readiness, compile/run reliability.
- Run layered parity gates: counts/profiles -> KPI diffs -> targeted row-level diffs for flagged cohorts.
- Validate incremental integrity where applicable: reruns, restart simulations, backfills, and late-arrival injections.
- Gate cutover: pass/fail thresholds, canary strategy, rollback triggers, and post-cutover monitors.
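The "counts/profiles" layer of those parity gates can be sketched as a partition-level count diff. This assumes the Teradata extract was landed in a staging table (`proj.staging.fact_sales_td` here is a hypothetical name); only mismatched or missing partitions surface.

```sql
-- Hedged sketch: partition-level row-count diff between the Teradata extract
-- and the BigQuery target. Table names are illustrative assumptions.
WITH td AS (
  SELECT DATE(txn_ts) AS d, COUNT(*) AS n
  FROM `proj.staging.fact_sales_td`
  GROUP BY d
),
bq AS (
  SELECT DATE(txn_ts) AS d, COUNT(*) AS n
  FROM `proj.mart.fact_sales`
  GROUP BY d
)
SELECT
  d,
  td.n AS td_count,
  bq.n AS bq_count
FROM td
FULL OUTER JOIN bq USING (d)
WHERE td.n IS DISTINCT FROM bq.n  -- flags missing partitions as well as count drift
ORDER BY d;
```

An empty result passes the gate; any row is a cheap signal to escalate to KPI or row-level diffs for that partition.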
Supported constructs
Representative validation and reconciliation mechanisms we apply in Teradata -> BigQuery migrations.
| Source | Target | Notes |
|---|---|---|
| Golden queries and reports | Golden query harness + repeatable parameter sets | Codifies business sign-off into runnable tests. |
| Counts and column profiles | Partition-level counts + null/min/max/distinct profiles | Cheap early drift detection before deep diffs. |
| KPI validation | Aggregate diffs by key dimensions + tolerance thresholds | Aligns validation with business meaning. |
| Row-level diffs | Targeted sampling diffs + edge cohort tests | Use deep diffs only where aggregates signal drift. |
| ETL restartability | Reruns, restart simulations, and backfill windows | Validates control-table semantics and idempotency. |
| Operational sign-off | Canary gates + rollback criteria + monitors | Prevents cutover from becoming a long KPI debate. |
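The "counts and column profiles" row above can be made concrete with a single profiling query per table. Column names here are illustrative; substitute your join keys and measures, then run the same profile against the Teradata extract and diff the two result rows.

```sql
-- Hedged sketch: null/min/max/distinct profile for key columns.
-- customer_id and amount are assumed column names for illustration.
SELECT
  COUNT(*) AS row_count,
  COUNTIF(customer_id IS NULL) AS customer_id_nulls,
  APPROX_COUNT_DISTINCT(customer_id) AS customer_id_approx_distinct,
  MIN(txn_ts) AS min_txn_ts,
  MAX(txn_ts) AS max_txn_ts,
  ROUND(SUM(amount), 2) AS amount_sum
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN @start_date AND @end_date;
```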
How workload changes
| Topic | Teradata | BigQuery |
|---|---|---|
| Drift drivers | PI/AMP-era query idioms and volatile-table workflows hide assumptions | Explicit ordering, casting, and apply boundaries required |
| Cost of validation | Resource governance and workload classes | Bytes scanned + slot time |
| Incremental behavior | Restartability conventions and control tables common | Rerun/backfill behavior must be simulated explicitly |
Examples
Illustrative parity and integrity checks in BigQuery. Replace datasets, keys, and KPI definitions to match your migration.
```sql
-- Partition/window row counts (BigQuery)
SELECT
  DATE(txn_ts) AS d,
  COUNT(*) AS row_count  -- `rows` is a reserved keyword in GoogleSQL; avoid it as an alias
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN @start_date AND @end_date
GROUP BY 1
ORDER BY 1;
```
Common pitfalls
- Spot-check validation: a few samples miss drift in ties and edge cohorts.
- No tolerance model: teams argue about “small diffs” because thresholds were never defined.
- Wrong comparison level: comparing raw rows when the business cares about rollups (or vice versa).
- Ignoring incremental behavior: parity looks fine on a static snapshot but fails under reruns/backfills.
- Unstable ordering: top-N/window functions without complete ORDER BY.
- Cost-blind diffs: exhaustive row-level diffs can be expensive; use layered gates (cheap->deep).
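The unstable-ordering pitfall has a direct fix: make every top-N window deterministic with a complete ORDER BY. A hedged sketch, assuming `txn_id` is a unique column in the example table:

```sql
-- Top-N with a deterministic tie-breaker. Without txn_id (an assumed unique
-- column), ties can resolve differently between Teradata and BigQuery runs.
SELECT *
FROM `proj.mart.fact_sales`
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY customer_id
  ORDER BY amount DESC, txn_ts DESC, txn_id  -- full ordering makes ranking reproducible
) <= 3;
```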
Validation approach
Gate set (layered)
Gate 0 - Readiness
- BigQuery datasets, permissions, and target schemas exist
- Dependent assets deployed (UDFs/routines, reference data, control tables)
Gate 1 - Execution
- Converted workloads compile and run reliably
- Deterministic ordering + explicit casts enforced for high-risk patterns
Gate 2 - Structural parity
- Row counts by partitions/windows
- Null/min/max/distinct profiles for key columns
Gate 3 - KPI parity
- KPI aggregates by key dimensions
- Rankings and top-N validated on edge windows (ties, boundary dates)
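A Gate 3 check can be sketched as an aggregate diff against an agreed tolerance. The staging table `proj.staging.kpi_td`, the `region` dimension, and the 0.1% threshold are assumptions; the threshold belongs in the parity contract, not the query.

```sql
-- Hedged sketch: KPI parity by dimension with an explicit relative tolerance.
SELECT *
FROM (
  SELECT
    bq.region, bq.d,
    td.revenue AS td_revenue,
    bq.revenue AS bq_revenue,
    SAFE_DIVIDE(ABS(bq.revenue - td.revenue), NULLIF(ABS(td.revenue), 0)) AS rel_diff
  FROM (
    SELECT region, DATE(txn_ts) AS d, SUM(amount) AS revenue
    FROM `proj.mart.fact_sales`
    GROUP BY region, d
  ) AS bq
  JOIN `proj.staging.kpi_td` AS td
    ON td.region = bq.region AND td.d = bq.d
)
WHERE rel_diff > 0.001  -- 0.1% tolerance; set this in the parity contract
ORDER BY rel_diff DESC;
```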
Gate 4 - Integrity (incremental systems)
- Idempotency: rerun same window -> no net change
- Restart simulation: fail mid-run -> resume -> correct final state
- Backfill: historical windows replay without drift
- Late-arrival: inject late corrections -> only expected rows change
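The idempotency check above reduces to "snapshot, rerun, assert zero net change." A hedged sketch, where `proj.staging.run_snapshot` is an assumed copy of the target taken before the rerun and `txn_id` an assumed unique key:

```sql
-- Hedged sketch: after rerunning the same window, count rows that appeared,
-- disappeared, or changed. A non-zero result fails the idempotency gate.
SELECT COUNT(*) AS changed_rows
FROM `proj.staging.run_snapshot` AS before_run
FULL OUTER JOIN `proj.mart.fact_sales` AS after_run
  ON before_run.txn_id = after_run.txn_id
WHERE before_run.txn_id IS NULL      -- row appeared on rerun
   OR after_run.txn_id IS NULL       -- row disappeared on rerun
   OR before_run.amount IS DISTINCT FROM after_run.amount;  -- row mutated
```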
Gate 5 - Cutover & monitoring
- Canary criteria + rollback triggers
- Post-cutover monitors: latency, scan bytes/slot time, diff sentinels on critical KPIs
Migration steps
- 01
Define the parity contract
Decide what must match (tables, dashboards, KPIs), at what granularity, and with what tolerance thresholds. Identify golden queries and business sign-off owners.
- 02
Create validation datasets and edge cohorts
Select representative time windows and cohorts that trigger edge behavior (ties/top-N, null-heavy segments, boundary dates, skewed keys).
- 03
Implement layered gates
Start with cheap checks (counts/profiles), then KPI diffs, then deep diffs only where needed. Codify gates into runnable jobs so validation is repeatable.
- 04
Validate incremental integrity
Run idempotency reruns, restart simulations, backfill windows, and late-arrival injections. These are the scenarios that usually break if not tested.
- 05
Gate cutover and monitor
Establish canary/rollback criteria and post-cutover monitors for critical KPIs and pipeline health (latency, failures, scan bytes/slot time).
We define your parity contract, build the golden-query set, and implement layered reconciliation gates, including reruns, restarts, and backfill simulations, so KPI drift is caught before production cutover.
Get a validation plan, runnable gates, and sign-off artifacts (diff reports, thresholds, monitors) so Teradata -> BigQuery cutover is controlled and dispute-proof.