Teradata ETL pipelines to BigQuery
Re-home batch and incremental ETL from Teradata into BigQuery, including staging, dedupe, SCD patterns, and orchestration, with an explicit run contract and validation gates that prevent KPI drift.
- Input
- Teradata ETL / pipeline migration logic
- Output
- BigQuery equivalent (validated)
Why this breaks
Teradata ETL systems often carry decades of operational behavior: restartability patterns, control tables, load windows, and incremental rules shaped by Teradata’s PI/AMP execution model and tooling (BTEQ, FastLoad/TPump, MultiLoad, TPT). When migrated naïvely, jobs may “run” in BigQuery but produce drift or miss SLAs because the run contract was never made explicit.
Common symptoms after cutover:
- Duplicates or missing updates because watermarks and tie-breakers were implicit
- SCD dimensions drift during backfills, restarts, or late-arrival corrections
- Control-table driven restart logic breaks; reruns double-apply or silently skip
- Performance collapses because pipelines lose pruning-aware staging boundaries
- Orchestration dependencies and retries change, turning failures into silent data issues
How conversion works
- Inventory & classify ETL jobs, dependencies, schedules, and operational controls (control tables, restart markers, load windows).
- Extract the run contract: business keys, incremental boundaries, dedupe rules, SCD strategy, error handling, and restartability semantics.
- Re-home transformations using BigQuery-native staging (landing -> typed staging -> dedupe -> apply) with partitioning/clustering aligned to load windows.
- Rebuild restartability: idempotency keys, applied-batch tracking, and deterministic ordering so retries are safe.
- Re-home orchestration (Composer/Airflow, dbt, or your runner) with explicit DAG dependencies, retries, and alerts.
- Gate cutover with evidence: golden outputs + incremental integrity simulations (reruns, backfills, late-arrival injections) and rollback-ready criteria.
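The deterministic dedupe step above is the part most often left implicit in Teradata scripts. A minimal sketch of what "deterministic" means in practice is shown below; all table and column names (`proj.stage.orders_typed`, `order_id`, `source_seq`, and so on) are illustrative assumptions, not a prescribed schema.

```sql
-- Hypothetical typed-staging dedupe: one row per business key, chosen by a
-- total ordering so retries and reruns always pick the same survivor.
CREATE OR REPLACE TABLE `proj.stage.orders_dedup` AS
SELECT * EXCEPT (rn)
FROM (
  SELECT
    s.*,
    ROW_NUMBER() OVER (
      PARTITION BY order_id                     -- business key
      ORDER BY updated_at DESC,                 -- primary ordering
               source_seq DESC,                 -- stable tie-breaker
               ingest_file_name, ingest_offset  -- final deterministic tie-break
    ) AS rn
  FROM `proj.stage.orders_typed` s
  WHERE updated_at > @high_water_mark
    AND updated_at <= @window_end
)
WHERE rn = 1;
```

The key design point: the `ORDER BY` must be a total order over duplicates. Without the trailing tie-breakers, two rows with identical `updated_at` can swap ranks between runs, which is exactly the retry drift the pitfalls section warns about.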
Supported constructs
Representative Teradata ETL constructs we commonly migrate to BigQuery (exact coverage depends on your estate).
| Source | Target | Notes |
|---|---|---|
| BTEQ / SQL scripts with control tables | BigQuery SQL + control tables for restartability | Applied-window tracking and idempotent reruns made explicit. |
| FastLoad/TPump/MultiLoad patterns | Landing + typed staging + partition-scoped apply | Replace load-tool semantics with deterministic staging boundaries. |
| Incremental loads (watermarks, load windows) | Explicit high-water mark + late window policy | Auditable and testable under retries/backfills. |
| SCD Type-1 / Type-2 apply logic | MERGE + end-date/current-flag patterns | Backfills and restarts validated as first-class scenarios. |
| Staging/volatile tables | Dataset staging + temp tables / materializations | Concurrency-safe and pruning-aligned. |
| Workload management constraints | Reservations/slots + concurrency controls | Predictable batch windows and BI refresh coexistence. |
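For the SCD Type-2 row in the table above, one common shape is a two-step close-out-then-insert apply. This is a sketch under assumed names (`proj.dw.dim_customer`, `attr_hash`, `effective_ts` are all hypothetical); in BigQuery the two statements can be wrapped in a multi-statement transaction for atomicity.

```sql
-- Hypothetical SCD Type-2 apply in two deterministic, rerunnable steps.

-- 1) Close out current rows whose attributes changed in this window.
UPDATE `proj.dw.dim_customer` d
SET d.valid_to = s.effective_ts,
    d.is_current = FALSE
FROM `proj.stage.customer_dedup` s
WHERE d.customer_id = s.customer_id
  AND d.is_current
  AND d.attr_hash != s.attr_hash;

-- 2) Insert new current versions for changed or brand-new keys.
INSERT INTO `proj.dw.dim_customer`
  (customer_id, customer_name, segment, attr_hash, valid_from, valid_to, is_current)
SELECT s.customer_id, s.customer_name, s.segment, s.attr_hash,
       s.effective_ts, TIMESTAMP '9999-12-31', TRUE
FROM `proj.stage.customer_dedup` s
LEFT JOIN `proj.dw.dim_customer` d
  ON d.customer_id = s.customer_id AND d.is_current
WHERE d.customer_id IS NULL;  -- covers both changed (now closed) and new keys
```

Note the restart property: rerunning step 1 is a no-op (changed rows are already closed), and rerunning step 2 skips keys that already have a current row, so a mid-run failure followed by a resume converges to the same final state.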
How workload changes
| Topic | Teradata | BigQuery |
|---|---|---|
| Incremental correctness | Often encoded in control tables, load windows, and restart conventions | Explicit high-water marks + deterministic staging + integrity gates |
| Staging and apply | Load tools + volatile tables + PI/AMP locality | Landing -> typed staging -> dedupe -> partition-scoped apply |
| Orchestration | Enterprise schedulers + script chains | Composer/dbt/native orchestration with explicit DAG contracts |
| Cost and capacity | Platform resource governance + workload classes | Bytes scanned + slot time + reservation strategy |
Examples
Canonical BigQuery apply pattern for incremental ETL: stage -> dedupe deterministically -> MERGE with scoped partitions + applied-window tracking. Adjust keys, partitions, and casts to your model.
```sql
-- Control table for load windows (restartability)
CREATE TABLE IF NOT EXISTS `proj.control.load_windows` (
  job_name STRING NOT NULL,
  window_start TIMESTAMP NOT NULL,
  window_end TIMESTAMP NOT NULL,
  status STRING NOT NULL,
  applied_at TIMESTAMP
);

-- Example: mark a window as STARTED
INSERT INTO `proj.control.load_windows` (job_name, window_start, window_end, status)
VALUES (@job_name, @window_start, @window_end, 'STARTED');
```

Common pitfalls
- Assuming Teradata load tools map 1:1: FastLoad/TPump patterns need BigQuery-native staging and apply strategy.
- Implicit restart logic: control-table conventions not recreated, causing reruns to double-apply or skip.
- Non-deterministic dedupe: ROW_NUMBER without stable tie-breakers, leading to drift under retries.
- PI/AMP assumptions carried over: join and staging shapes that relied on locality now reshuffle large datasets.
- Unbounded applies: MERGE/upsert jobs scan full targets without partition-scoped boundaries.
- Schema evolution surprises: upstream types widen; downstream typed tables fail or truncate.
- No incremental integrity tests: parity looks fine on a one-time run but breaks under backfills/late data.
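The "unbounded applies" pitfall has a direct remedy: constrain the target side of the MERGE so BigQuery can prune to the load window's partitions. A sketch under assumed names (`proj.dw.fact_orders` partitioned on `event_date`, which here is hypothetical):

```sql
-- Hypothetical partition-scoped MERGE: the filter on the target's partition
-- column in the ON clause bounds the apply to the load window instead of
-- scanning the full table. Note that pruning with query parameters can be
-- inconsistent in some contexts; literal dates are the safest form.
MERGE `proj.dw.fact_orders` t
USING `proj.stage.orders_dedup` s
ON  t.order_id = s.order_id
AND t.event_date BETWEEN DATE(@window_start) AND DATE(@window_end)
WHEN MATCHED THEN UPDATE SET
  t.status = s.status,
  t.amount = s.amount,
  t.updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, event_date, status, amount, updated_at)
VALUES (s.order_id, DATE(s.event_ts), s.status, s.amount, s.updated_at);
```

One caveat worth testing explicitly: rows whose corrected partition key moves them outside the scoped window will not match and will insert as duplicates, so the late-arrival policy must define how partition-moving corrections are handled.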
Validation approach
- Execution checks: pipeline runs reliably under representative volumes and schedules.
- Structural parity: partition-level row counts and column profiles (null/min/max/distinct) for key tables.
- KPI parity: aggregates by key dimensions for critical marts and dashboards.
- Incremental integrity (mandatory):
- Idempotency: rerun same load window -> no net change
- Restart simulation: fail mid-run -> resume -> correct final state
- Backfill: historical windows replay without SCD drift
- Late-arrival: inject late corrections -> only expected rows change
- Cost/performance gates: pruning verified; scan bytes and runtime thresholds set for top jobs.
- Operational readiness: retry/alerting tests and rollback criteria defined before cutover.
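The idempotency gate above can be checked mechanically: snapshot the window, rerun the load, and assert zero symmetric difference. A sketch with hypothetical table names (`proj.audit.fact_orders_snapshot` is an assumed pre-rerun copy):

```sql
-- Hypothetical idempotency gate: after rerunning the same load window,
-- the target partition range must be byte-identical to the snapshot.
SELECT COUNT(*) AS diff_rows
FROM (
  SELECT * FROM `proj.dw.fact_orders`
  WHERE event_date BETWEEN DATE(@window_start) AND DATE(@window_end)
  EXCEPT DISTINCT
  SELECT * FROM `proj.audit.fact_orders_snapshot`
  WHERE event_date BETWEEN DATE(@window_start) AND DATE(@window_end)
);
-- Gate: diff_rows must be 0 for the rerun to count as idempotent.
```

Running the same `EXCEPT DISTINCT` in both directions catches both phantom inserts and lost rows; a single direction only catches one side.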
Migration steps
- 01
Inventory ETL jobs, schedules, and dependencies
Extract job chains, upstream/downstream dependencies, SLAs, retry policies, and control-table conventions. Identify business-critical marts and consumers.
- 02
Formalize the run contract
Define load windows/high-water marks, business keys, deterministic ordering/tie-breakers, dedupe rules, late-arrival policy, and SCD strategy. Make restartability explicit.
- 03
Rebuild transformations on BigQuery-native staging
Implement landing -> typed staging -> dedupe -> apply, with partitioning/clustering aligned to load windows and BI access paths. Define schema evolution policy (widen/quarantine/reject).
- 04
Re-home orchestration and operations
Implement DAGs in Composer/Airflow or your orchestrator: dependencies, retries, alerts, and concurrency. Recreate control-table driven restart logic with idempotent markers.
- 05
Run parity and incremental integrity gates
Golden outputs + KPI aggregates, restart simulations, idempotency reruns, late-data injections, and backfill windows. Cut over only when thresholds pass and rollback criteria are defined.
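The restart logic formalized in step 02 and rebuilt in step 04 can reuse the control table from the earlier example. One possible resume query, sketched against that table (the status values `STARTED`/`APPLIED` and the epoch fallback are illustrative conventions):

```sql
-- Hypothetical restart-aware window selection: replay any STARTED window
-- (a run that failed mid-flight) before advancing past the last APPLIED one.
SELECT
  COALESCE(
    (SELECT MIN(window_start) FROM `proj.control.load_windows`
     WHERE job_name = @job_name AND status = 'STARTED'),
    (SELECT MAX(window_end) FROM `proj.control.load_windows`
     WHERE job_name = @job_name AND status = 'APPLIED'),
    TIMESTAMP '1970-01-01'  -- first-ever run: start from the epoch
  ) AS next_window_start;
```

Because the apply itself is idempotent, replaying a `STARTED` window is safe; the control table only needs to guarantee that no window is silently skipped.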
We inventory your Teradata ETL estate, formalize the run contract (windows, restartability, SCD), migrate a representative pipeline end-to-end, and deliver parity evidence with cutover gates.
Get an actionable migration plan with restart simulations, incremental integrity tests, reconciliation evidence, and cost/performance baselines, so ETL cutover is controlled and dispute-proof.