Workload

Teradata ETL pipelines to BigQuery

Re-home batch and incremental ETL (staging, dedupe, SCD patterns, and orchestration) from Teradata into BigQuery with an explicit run contract and validation gates that prevent KPI drift.

At a glance
Input
Teradata ETL / pipeline migration logic
Output
BigQuery equivalent (validated)
Common pitfalls
  • Assuming Teradata load tools map 1:1: FastLoad/TPump patterns need BigQuery-native staging and apply strategy.
  • Implicit restart logic: control-table conventions not recreated, causing reruns to double-apply or skip.
  • Non-deterministic dedupe: ROW_NUMBER without stable tie-breakers, leading to drift under retries.
Context

Why this breaks

Teradata ETL systems often carry decades of operational behavior: restartability patterns, control tables, load windows, and incremental rules shaped by Teradata’s PI/AMP execution model and tooling (BTEQ, FastLoad/TPump, MultiLoad, TPT). When migrated naïvely, jobs may “run” in BigQuery but produce drift or miss SLAs because the run contract was never made explicit.

Common symptoms after cutover:

  • Duplicates or missing updates because watermarks and tie-breakers were implicit
  • SCD dimensions drift during backfills, restarts, or late-arrival corrections
  • Control-table driven restart logic breaks; reruns double-apply or silently skip
  • Performance collapses because pipelines lose pruning-aware staging boundaries
  • Orchestration dependencies and retries change, turning failures into silent data issues
Approach

How conversion works

  1. Inventory & classify ETL jobs, dependencies, schedules, and operational controls (control tables, restart markers, load windows).
  2. Extract the run contract: business keys, incremental boundaries, dedupe rules, SCD strategy, error handling, and restartability semantics.
  3. Re-home transformations using BigQuery-native staging (landing -> typed staging -> dedupe -> apply) with partitioning/clustering aligned to load windows.
  4. Rebuild restartability: idempotency keys, applied-batch tracking, and deterministic ordering so retries are safe.
  5. Re-home orchestration (Composer/Airflow, dbt, or your runner) with explicit DAG dependencies, retries, and alerts.
  6. Gate cutover with evidence: golden outputs + incremental integrity simulations (reruns, backfills, late-arrival injections) and rollback-ready criteria.
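
The dedupe stage in step 3 can be sketched in BigQuery SQL. This is illustrative only: the project, table, and column names (`orders_typed`, `order_id`, `updated_at`, `src_seq`) are assumptions to adapt to your model.

```sql
-- Deterministic dedupe: ROW_NUMBER with a stable tie-breaker (src_seq)
-- so retries and reruns always keep the same "winning" row.
CREATE OR REPLACE TABLE `proj.staging.orders_deduped` AS
SELECT * EXCEPT (rn)
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (
           PARTITION BY order_id
           ORDER BY updated_at DESC, src_seq DESC  -- total order, not partial
         ) AS rn
  FROM `proj.staging.orders_typed` AS s
)
WHERE rn = 1;
```

Without the second `ORDER BY` column, rows sharing the same `updated_at` can swap ranks between runs, which is exactly the drift the run contract is meant to prevent.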

Supported constructs

Representative Teradata ETL constructs we commonly migrate to BigQuery (exact coverage depends on your estate).

Source -> Target (notes)

  • BTEQ / SQL scripts with control tables -> BigQuery SQL + control tables for restartability. Applied-window tracking and idempotent reruns made explicit.
  • FastLoad/TPump/MultiLoad patterns -> Landing + typed staging + partition-scoped apply. Replace load-tool semantics with deterministic staging boundaries.
  • Incremental loads (watermarks, load windows) -> Explicit high-water mark + late window policy. Auditable and testable under retries/backfills.
  • SCD Type-1 / Type-2 apply logic -> MERGE + end-date/current-flag patterns. Backfills and restarts validated as first-class scenarios.
  • Staging/volatile tables -> Dataset staging + temp tables / materializations. Concurrency-safe and pruning-aligned.
  • Workload management constraints -> Reservations/slots + concurrency controls. Predictable batch windows and BI refresh coexistence.
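
As a concrete sketch of the "explicit high-water mark + late window policy" row (all table and column names here are illustrative assumptions, not a fixed convention):

```sql
-- Assumed control table: proj.control.watermarks(job_name, high_water_mark).
-- Read the last applied watermark, then pull rows newer than it minus a
-- late-arrival allowance (2 days here; tune per source's lateness profile).
DECLARE hwm TIMESTAMP DEFAULT (
  SELECT high_water_mark
  FROM `proj.control.watermarks`
  WHERE job_name = 'orders_incremental'
);

SELECT *
FROM `proj.landing.orders`
WHERE load_ts > TIMESTAMP_SUB(hwm, INTERVAL 2 DAY);
```

Making both the watermark and the late window explicit is what lets the same query be replayed during backfills and audits.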

How workload changes

Topic: Teradata -> BigQuery

  • Incremental correctness: often encoded in control tables, load windows, and restart conventions -> explicit high-water marks + deterministic staging + integrity gates. Correctness becomes auditable and repeatable under retries/backfills.
  • Staging and apply: load tools + volatile tables + PI/AMP locality -> landing -> typed staging -> dedupe -> partition-scoped apply. Pruning and layout drive stable cost and runtime.
  • Orchestration: enterprise schedulers + script chains -> Composer/dbt/native orchestration with explicit DAG contracts. Retries, alerts, and dependencies are modeled and monitored.
  • Cost and capacity: platform resource governance + workload classes -> bytes scanned + slot time + reservation strategy. Batch windows require pruning-aware design and capacity posture.

Examples

Canonical BigQuery apply pattern for incremental ETL: stage -> dedupe deterministically -> MERGE with scoped partitions + applied-window tracking. Adjust keys, partitions, and casts to your model.

-- Control table for load windows (restartability)
CREATE TABLE IF NOT EXISTS `proj.control.load_windows` (
  job_name STRING NOT NULL,
  window_start TIMESTAMP NOT NULL,
  window_end TIMESTAMP NOT NULL,
  status STRING NOT NULL,
  applied_at TIMESTAMP
);

-- Example: mark a window as STARTED
INSERT INTO `proj.control.load_windows` (job_name, window_start, window_end, status)
VALUES (@job_name, @window_start, @window_end, 'STARTED');
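
The STARTED marker above is only half of the restartability contract. One way to close the loop, sketched against the same control table (script variables stand in for the query parameters, since this would run as a multi-statement script):

```sql
DECLARE job STRING DEFAULT 'orders_incremental';      -- assumed job name
DECLARE win_start TIMESTAMP DEFAULT TIMESTAMP '2024-01-01 00:00:00+00';
DECLARE win_end   TIMESTAMP DEFAULT TIMESTAMP '2024-01-02 00:00:00+00';

-- Idempotent rerun guard: a window already marked DONE is a no-op.
IF EXISTS (
  SELECT 1 FROM `proj.control.load_windows`
  WHERE job_name = job AND window_start = win_start
    AND window_end = win_end AND status = 'DONE'
) THEN
  RETURN;
END IF;

-- ... partition-scoped staging and MERGE for this window go here ...

-- Mark the window applied only after the apply succeeds.
UPDATE `proj.control.load_windows`
SET status = 'DONE', applied_at = CURRENT_TIMESTAMP()
WHERE job_name = job AND window_start = win_start AND window_end = win_end;
```

The guard-then-mark sequence is what makes a mid-run failure safe: resuming either skips a completed window or reapplies one that never reached DONE.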
Avoid

Common pitfalls

  • Assuming Teradata load tools map 1:1: FastLoad/TPump patterns need BigQuery-native staging and apply strategy.
  • Implicit restart logic: control-table conventions not recreated, causing reruns to double-apply or skip.
  • Non-deterministic dedupe: ROW_NUMBER without stable tie-breakers, leading to drift under retries.
  • PI/AMP assumptions carried over: join and staging shapes that relied on locality now reshuffle large datasets.
  • Unbounded applies: MERGE/upsert jobs scan full targets without partition-scoped boundaries.
  • Schema evolution surprises: upstream types widen; downstream typed tables fail or truncate.
  • No incremental integrity tests: parity looks fine on a one-time run but breaks under backfills/late data.
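
For the "unbounded applies" pitfall, the usual fix is a partition predicate on the target inside the MERGE condition. A hedged sketch (partition column `load_date` and the parameter names are assumptions):

```sql
-- Scoping the MERGE to the affected target partitions lets BigQuery prune;
-- without the tgt.load_date predicate, every apply scans the full target.
MERGE `proj.dw.orders` AS tgt
USING `proj.staging.orders_deduped` AS src
ON tgt.order_id = src.order_id
   AND tgt.load_date BETWEEN @window_start_date AND @window_end_date
WHEN MATCHED THEN
  UPDATE SET tgt.status = src.status, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, status, updated_at, load_date)
  VALUES (src.order_id, src.status, src.updated_at, src.load_date);
```

The window boundaries come from the same load-window contract that drives staging, so apply cost stays proportional to the window, not to table history.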
Proof

Validation approach

  • Execution checks: pipeline runs reliably under representative volumes and schedules.
  • Structural parity: partition-level row counts and column profiles (null/min/max/distinct) for key tables.
  • KPI parity: aggregates by key dimensions for critical marts and dashboards.
  • Incremental integrity (mandatory):
    • Idempotency: rerun same load window -> no net change
    • Restart simulation: fail mid-run -> resume -> correct final state
    • Backfill: historical windows replay without SCD drift
    • Late-arrival: inject late corrections -> only expected rows change
  • Cost/performance gates: pruning verified; scan bytes and runtime thresholds set for top jobs.
  • Operational readiness: retry/alerting tests and rollback criteria defined before cutover.
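
The idempotency check ("rerun same load window -> no net change") can be mechanized with a snapshot-and-diff; a sketch, with the QA dataset and table names as assumptions:

```sql
-- 1) Before the rerun: snapshot the target.
CREATE OR REPLACE TABLE `proj.qa.orders_before` AS
SELECT * FROM `proj.dw.orders`;

-- 2) Rerun the same load window, then diff in both directions.
-- Both queries must return zero rows for the rerun to count as a no-op.
SELECT * FROM `proj.dw.orders`
EXCEPT DISTINCT
SELECT * FROM `proj.qa.orders_before`;

SELECT * FROM `proj.qa.orders_before`
EXCEPT DISTINCT
SELECT * FROM `proj.dw.orders`;
```

Diffing both ways matters: one direction catches phantom inserts, the other catches lost or mutated rows.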
Execution

Migration steps

A sequence that keeps pipeline correctness measurable and cutover controlled.
  1. Inventory ETL jobs, schedules, and dependencies

    Extract job chains, upstream/downstream dependencies, SLAs, retry policies, and control-table conventions. Identify business-critical marts and consumers.

  2. Formalize the run contract

    Define load windows/high-water marks, business keys, deterministic ordering/tie-breakers, dedupe rules, late-arrival policy, and SCD strategy. Make restartability explicit.

  3. Rebuild transformations on BigQuery-native staging

    Implement landing -> typed staging -> dedupe -> apply, with partitioning/clustering aligned to load windows and BI access paths. Define schema evolution policy (widen/quarantine/reject).

  4. Re-home orchestration and operations

    Implement DAGs in Composer/Airflow or your orchestrator: dependencies, retries, alerts, and concurrency. Recreate control-table driven restart logic with idempotent markers.

  5. Run parity and incremental integrity gates

    Golden outputs + KPI aggregates, restart simulations, idempotency reruns, late-data injections, and backfill windows. Cut over only when thresholds pass and rollback criteria are defined.
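
The widen/quarantine/reject policy from step 3 can be sketched with SAFE_CAST: rows that fail typing are diverted to a quarantine table instead of failing the load. Table and column names (`amount` as the fragile field) are assumptions for illustration:

```sql
-- Typed staging: keep rows that cast cleanly (or are genuinely NULL).
CREATE OR REPLACE TABLE `proj.staging.orders_typed` AS
SELECT order_id, SAFE_CAST(amount AS NUMERIC) AS amount, load_ts
FROM `proj.landing.orders`
WHERE SAFE_CAST(amount AS NUMERIC) IS NOT NULL OR amount IS NULL;

-- Quarantine: non-NULL values that failed the cast, kept raw for triage.
CREATE OR REPLACE TABLE `proj.staging.orders_quarantine` AS
SELECT order_id, amount AS raw_amount, load_ts
FROM `proj.landing.orders`
WHERE amount IS NOT NULL AND SAFE_CAST(amount AS NUMERIC) IS NULL;
```

This keeps upstream type drift from silently truncating values or hard-failing the pipeline; quarantined rows become an alertable signal rather than a data incident.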

Workload Assessment
Migrate Teradata ETL with restartability intact

We inventory your Teradata ETL estate, formalize the run contract (windows, restartability, SCD), migrate a representative pipeline end-to-end, and deliver parity evidence with cutover gates.

Migration Acceleration
Cut over pipelines with proof-backed gates

Get an actionable migration plan with restart simulations, incremental integrity tests, reconciliation evidence, and cost/performance baselines, so ETL cutover is controlled and dispute-proof.

FAQ

Frequently asked questions

Is Teradata ETL migration just rewriting BTEQ scripts?
No. The critical work is preserving the run contract: load windows, restartability, dedupe/tie-break rules, SCD semantics, and operational error handling. Script translation is only one component.
How do you handle restartability and reruns?
We recreate restart semantics explicitly: applied-window tracking, deterministic ordering, idempotent staging/apply, and failure-mode simulations so reruns never double-apply or silently skip.
Can you keep pipelines incremental in BigQuery?
Yes. We implement explicit high-water marks and late-arrival windows, use partition-scoped applies, and validate incremental integrity with reruns/backfills/late injections.
How do you avoid BigQuery cost surprises for ETL?
We design pruning-aware staging boundaries and choose partitioning/clustering aligned to load windows. Validation includes scan bytes/runtime baselines and regression thresholds for your top jobs.