From cloud infrastructure to report-ready data models for assortment, e-commerce, advertising and pricing.
Context & problem
The retailer needed one reliable foundation for commerce and assortment reporting. Data was scattered across multiple sources: GA4, advertising platforms, product feeds, pricing tools and SEO sources.
Without central modelling you quickly end up with a landscape of loose exports, dashboards and definitions. Revenue, margin, channel performance, product availability and price position then depend on wherever you happen to look.
So a platform was built in which infrastructure, extraction, transformation and reporting are logically separated.
Architecture
The solution consists of three layers:
Cloud infrastructure
Terraform → Cloud Functions Gen 2 → Cloud Scheduler → Secret Manager → GCS
Data landing
Python extractors/loaders → BigQuery raw tables
Transformation & reporting
dbt staging → business transformations → final models → Power BI
Terraform manages the cloud resources. Python functions fetch data and load it into BigQuery. dbt then takes care of cleaning, business logic, tests and report-ready models.
Infrastructure-as-Code with Terraform
The Google Cloud infrastructure is set up declaratively with Terraform. Key components:
- Cloud Functions Gen 2 for extraction and load processes.
- Cloud Scheduler for daily triggers.
- Secret Manager for API keys and credentials.
- Google Cloud Storage for deployment artifacts.
- IAM for controlled access per workload.
- Remote Terraform state in GCS, so the team can collaborate safely.
A reusable Terraform module handles zipping the Python code, uploading to GCS, deploying Cloud Functions, wiring secrets and setting invoker permissions. As a result, every new data source could be added following the same pattern.
Data sources and loading
The serverless extraction layer pulls data from multiple sources:
| Source | Used for |
|---|---|
| Commerce platform product feed | Product information, categories, prices and availability |
| Pricing API | Own prices and competitor prices |
| GA4 BigQuery Export | Events, sessions, transactions and e-commerce behaviour |
| Google Ads, Meta Ads, Criteo | Campaign costs, clicks, impressions, conversions and revenue |
| SEO / Search Console | Organic performance and page insights |
The Python loaders normalise nested JSON, add metadata such as load_date and load_ts, enforce BigQuery schemas and write to partitioned tables. Where needed, day partitions are overwritten in place, so daily runs are repeatable without losing historical data.
dbt transformation layer
The dbt layer is organised into clear stages:
1_ga4_data_models
staging and marts for events, sessions, users, items and transactions
2_business_transformations
pricing, ecommerce, advertising, SEO and billing
3_final_data_models
report-ready models for Power BI
GA4 data is first cleaned and enriched with session, user and traffic-source logic. Purchase events are deduplicated, transactions are joined to session context and attribution data, and e-commerce performance is modelled on reliable keys.
For advertising data, Google Ads, Meta Ads and Criteo are brought together in one uniform campaign model. Differences in field names, devices, metrics and conversion definitions are resolved centrally in dbt instead of in scattered dashboards.
For pricing, own prices are placed next to competitor prices. This produces models with price difference, price index, availability and labels such as competitor_cheaper, competitor_more_expensive and equal_price.
Many models are built incrementally with BigQuery insert_overwrite on day partitions. Only recent days are reprocessed, which lowers cost and speeds up refreshes.
What does the business get out of it?
The value is not just in "collecting data", but above all in better decision-making.
GA4 data becomes more reliable than the interface.
Instead of relying on the GA4 interface alone, the raw event data from BigQuery is transformed, cleaned, deduplicated and tested. Measurement errors and inconsistencies can be corrected or excluded, making the reporting qualitatively better than standard interface numbers.
Advertising data no longer lives in silos.
Google Ads, Meta Ads and Criteo are brought together in one data model. Performance can be compared across platforms on the same definitions for cost, clicks, impressions, conversions and revenue.
Clear daily insights in Power BI.
Every day the business gets tangible insight into e-commerce performance, channel contribution, pricing, assortment and marketing results. Reporting leans on stable dbt models instead of loose exports or manual calculations.
A foundation for Marketing Mix Modelling and advanced analytics.
With performance data, advertising costs, e-commerce behaviour and product context centrally available, a base emerges for more advanced applications such as Marketing Mix Modelling, attribution analysis, forecasting and commercial optimisation.
Engineering decisions that matter
- Everything as code. Cloud resources are managed with Terraform, making infrastructure reproducible and reviewable.
- Serverless extraction. Cloud Functions only run when needed. No servers, no idle compute.
- Secrets out of the code. Credentials are managed through Secret Manager and only made available at runtime.
- dbt as the central business layer. Definitions such as revenue, margin, channel, price position and campaign performance live in SQL models, not scattered across Power BI.
- Incremental BigQuery models. Day partitions are reprocessed intelligently, keeping the platform scalable.
- From silos to integrated models. Product data, GA4, pricing and advertising are not reported in isolation but connected in one analytical foundation.
Result
The project delivered a production-grade data foundation for retail analytics:
- declarative cloud infrastructure with Terraform;
- daily serverless data loads into BigQuery;
- secure credential handling via Secret Manager;
- cleaned and tested GA4 models;
- integrated advertising models across multiple platforms;
- pricing models for competitive analysis;
- report-ready datasets for Power BI;
- a base for advanced applications such as Marketing Mix Modelling.
The platform enables daily steering on assortment, marketing performance, e-commerce behaviour and price position from one reliable data layer.
Status
The platform forms the basis for daily reporting and further development of the retail data warehouse. The combination of Terraform, BigQuery, dbt and Power BI makes it scalable, maintainable and ready for future analytics use cases.