[NL]
Region: europe-west4 [v2.1.0] | status: online!
// projects / retail_platform.tf

Retail Assortment & Commerce Data Platform

type: end-to-end platform · status: production · year: 2024
TL;DR
A retailer needed one reliable foundation for commerce and assortment reporting. Product feeds, GA4 events, ad spend, pricing and SEO data now land in BigQuery — infrastructure declared in Terraform, extraction running serverless on Cloud Functions + Scheduler, and dbt modelling the business logic into tested, reusable, report-ready tables. A scalable base for Power BI and future Marketing Mix Modelling.
Infrastructure
TerraformCloud Functions Gen 2Cloud SchedulerSecret ManagerGCS
Data
BigQuerydbt CloudGA4 ExportGoogle AdsMeta AdsPower BI
Language
Python

From cloud infrastructure to report-ready data models for assortment, e-commerce, advertising and pricing.

Context & problem

The retailer needed one reliable foundation for commerce and assortment reporting. Data was scattered across multiple sources: GA4, advertising platforms, product feeds, pricing tools and SEO sources.

Without central modelling you quickly end up with a landscape of loose exports, dashboards and definitions. Revenue, margin, channel performance, product availability and price position then depend on wherever you happen to look.

So a platform was built in which infrastructure, extraction, transformation and reporting are logically separated.

Architecture

The solution consists of three layers:

Cloud infrastructure
Terraform → Cloud Functions Gen 2 → Cloud Scheduler → Secret Manager → GCS

Data landing
Python extractors/loaders → BigQuery raw tables

Transformation & reporting
dbt staging → business transformations → final models → Power BI

Terraform manages the cloud resources. Python functions fetch data and load it into BigQuery. dbt then takes care of cleaning, business logic, tests and report-ready models.

Infrastructure-as-Code with Terraform

The Google Cloud infrastructure is set up declaratively with Terraform. Key components:

A reusable Terraform module handles zipping the Python code, uploading to GCS, deploying Cloud Functions, wiring secrets and setting invoker permissions. As a result, every new data source could be added following the same pattern.

Data sources and loading

The serverless extraction layer pulls data from multiple sources:

SourceUsed for
Commerce platform product feedProduct information, categories, prices and availability
Pricing APIOwn prices and competitor prices
GA4 BigQuery ExportEvents, sessions, transactions and e-commerce behaviour
Google Ads, Meta Ads, CriteoCampaign costs, clicks, impressions, conversions and revenue
SEO / Search ConsoleOrganic performance and page insights

The Python loaders normalise nested JSON, add metadata such as load_date and load_ts, enforce BigQuery schemas and write to partitioned tables. Where needed, day partitions are overwritten in place, so daily runs are repeatable without losing historical data.

dbt transformation layer

The dbt layer is organised into clear stages:

1_ga4_data_models
  staging and marts for events, sessions, users, items and transactions

2_business_transformations
  pricing, ecommerce, advertising, SEO and billing

3_final_data_models
  report-ready models for Power BI

GA4 data is first cleaned and enriched with session, user and traffic-source logic. Purchase events are deduplicated, transactions are joined to session context and attribution data, and e-commerce performance is modelled on reliable keys.

For advertising data, Google Ads, Meta Ads and Criteo are brought together in one uniform campaign model. Differences in field names, devices, metrics and conversion definitions are resolved centrally in dbt instead of in scattered dashboards.

For pricing, own prices are placed next to competitor prices. This produces models with price difference, price index, availability and labels such as competitor_cheaper, competitor_more_expensive and equal_price.

Many models are built incrementally with BigQuery insert_overwrite on day partitions. Only recent days are reprocessed, which lowers cost and speeds up refreshes.

What does the business get out of it?

The value is not just in "collecting data", but above all in better decision-making.

GA4 data becomes more reliable than the interface.
Instead of relying on the GA4 interface alone, the raw event data from BigQuery is transformed, cleaned, deduplicated and tested. Measurement errors and inconsistencies can be corrected or excluded, making the reporting qualitatively better than standard interface numbers.

Advertising data no longer lives in silos.
Google Ads, Meta Ads and Criteo are brought together in one data model. Performance can be compared across platforms on the same definitions for cost, clicks, impressions, conversions and revenue.

Clear daily insights in Power BI.
Every day the business gets tangible insight into e-commerce performance, channel contribution, pricing, assortment and marketing results. Reporting leans on stable dbt models instead of loose exports or manual calculations.

A foundation for Marketing Mix Modelling and advanced analytics.
With performance data, advertising costs, e-commerce behaviour and product context centrally available, a base emerges for more advanced applications such as Marketing Mix Modelling, attribution analysis, forecasting and commercial optimisation.

Engineering decisions that matter

  1. Everything as code. Cloud resources are managed with Terraform, making infrastructure reproducible and reviewable.
  2. Serverless extraction. Cloud Functions only run when needed. No servers, no idle compute.
  3. Secrets out of the code. Credentials are managed through Secret Manager and only made available at runtime.
  4. dbt as the central business layer. Definitions such as revenue, margin, channel, price position and campaign performance live in SQL models, not scattered across Power BI.
  5. Incremental BigQuery models. Day partitions are reprocessed intelligently, keeping the platform scalable.
  6. From silos to integrated models. Product data, GA4, pricing and advertising are not reported in isolation but connected in one analytical foundation.

Result

The project delivered a production-grade data foundation for retail analytics:

The platform enables daily steering on assortment, marketing performance, e-commerce behaviour and price position from one reliable data layer.

Status

The platform forms the basis for daily reporting and further development of the retail data warehouse. The combination of Terraform, BigQuery, dbt and Power BI makes it scalable, maintainable and ready for future analytics use cases.