Why should you add your Google Analytics (GA4) data to a warehouse or data lake? And how?
Why
If you use Google Analytics 4 to collect data from your websites or mobile applications, it’s wise to store a copy of these data in your own systems for a variety of reasons.
- To take control of your data. What would happen if you wanted to leave Google Analytics or if the product disappeared?
- To analyze all of your marketing data in a holistic way, such as when reconciling your efforts with the performance you see in GA.
- To push use cases beyond what the interface suggests.
To this end, there are two main ways to save your GA4 data to a warehouse or data lake: integrating the raw event data or the aggregated data from the API. Let’s take a look at which one fulfills your needs, as well as the methods available to integrate them.
Raw event data
Raw data contain events ingested as-is by GA4, with only the minimal enrichment Google applies within GA4. The model’s level of granularity is one line per event. These are the data to ingest if you’re looking to reconcile a user’s actions with their purchase journey and their relationship with a brand within a 360 context.
Advantages:
- The native connector is easy to configure and use.
- It’s integrated directly into GA4, which means that data are automatically transferred to BigQuery in a few clicks.
- It’s free of charge for GA4 customers.
- It’s the only way to establish complex use cases that require several consecutive actions by the same user within a single period (for example).
Drawbacks:
- The data are stored in BigQuery, which may be expensive if you have a lot of data.
- You can’t customize the data that’s imported. You have to calculate the metrics yourself.
- No data modelling. Data from non-consenting users carry no session or user identifier.
Using the GA4–BigQuery native connector
The simplest and most common way to save Google Analytics 4 data in a data warehouse is to use the native GA4-BigQuery connector. This connector automatically imports GA4 data to BigQuery, where you can then examine and analyze them using SQL.
The connector can be configured in 2 ways:
- Streaming: each event is exported to BigQuery moments after it is ingested into GA4
- Daily batch: once a day, the previous day’s events are exported in full

The connector is configured directly in your GA4 property.
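Once the export is running, the daily batch lands in date-sharded tables named `events_YYYYMMDD`, with event parameters nested in an array. As an illustration, here is a minimal sketch of how you might build a query against one day’s table; the project and dataset names are hypothetical placeholders:

```python
from datetime import date

def daily_events_query(project: str, dataset: str, day: date) -> str:
    """Build a BigQuery SQL query against one day's GA4 export table.

    The export writes one table per day (events_YYYYMMDD), one row per
    event, with parameters nested in the repeated event_params field.
    """
    table = f"`{project}.{dataset}.events_{day:%Y%m%d}`"
    return f"""
        SELECT
          event_name,
          -- Pull a single parameter out of the nested event_params array
          (SELECT value.string_value
             FROM UNNEST(event_params)
            WHERE key = 'page_location') AS page_location,
          COUNT(*) AS events
        FROM {table}
        GROUP BY event_name, page_location
    """

# Hypothetical project and dataset names, for illustration only
sql = daily_events_query("my-project", "analytics_123456789", date(2024, 1, 1))
```

The `UNNEST` pattern is the part analysts tend to stumble on first: because the export has no flat columns for custom parameters, every parameter you care about has to be extracted from the nested array like this.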
Saving the data in another data warehouse
It is often the case that your company already has a data warehouse in which you would like to centralize your GA4 data. The first thing you should know is that you can’t bypass BigQuery entirely: you’ll need it as a data transfer platform. The process for saving this data in another warehouse will therefore differ depending on the situation.
In Snowflake
Snowflake has a native connector for ingesting BigQuery data into its own storage space, which then lightly transforms the data to fit the Snowflake model. For more information, you should read my previous article on creating a 360 view in Snowflake.
In another warehouse (Databricks, Redshift, etc.)
An excellent solution for transferring data into a data lake like Databricks or Redshift is to export the previous day’s table of data into Google Cloud Storage on a daily basis. It’s then fairly simple to copy the data into your respective storage space (GCS to S3 or GCS to Blob Storage). All the major cloud providers integrate services for copying external data to their respective cloud service.

Example of no-code replication of data from GCS to S3 via AWS DataSync.
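The first step of that pipeline, staging the previous day’s table in GCS, can be done with BigQuery’s `EXPORT DATA` statement. Here is a minimal sketch that builds such a statement; the project, dataset, and bucket names are hypothetical:

```python
from datetime import date

def export_statement(project: str, dataset: str, bucket: str, day: date) -> str:
    """Build a BigQuery EXPORT DATA statement that writes one day's GA4
    export table to Google Cloud Storage as Parquet files."""
    return f"""
        EXPORT DATA OPTIONS (
          uri = 'gs://{bucket}/ga4/dt={day.isoformat()}/*.parquet',
          format = 'PARQUET',
          overwrite = true
        ) AS
        SELECT * FROM `{project}.{dataset}.events_{day:%Y%m%d}`
    """

# Hypothetical names for illustration only
stmt = export_statement(
    "my-project", "analytics_123456789", "my-export-bucket", date(2024, 1, 1)
)
```

From there, a copy service on the target cloud (AWS DataSync, Azure Data Factory, and the like) picks up the Parquet files from the bucket on a schedule.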
How do you derive metrics and statistics?
Once the use cases are defined, you can use a tool such as dbt or Dataform to transform data via SQL pipelines to make analysts’ lives easier.
For example, dbt is the most popular tool out there to perform this work, and there are already existing packages to transform GA4 data in BigQuery. You should note that dbt is also available for Snowflake, Databricks, and Postgres, but there is no “premade” code in existence as of the time of writing.
GA4 aggregated data
It is also possible to save “Report”-style aggregated data via the GA4 API. The granularity will be adjustable depending on the use case. This could be seen as a faster, more efficient way to save your simple report data. On the other hand, the use cases will be very limited. You will never know which user did what with this method.
You could, for example, retrieve on a daily basis the number of sessions and users who visited your site by acquisition channel, source, and medium. You could refer to the list of metrics and dimensions to identify possible use cases.
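The daily report described above maps to a single `runReport` call on the GA4 Data API. Here is a minimal sketch of the request body, using real API dimension and metric names; the property ID in the endpoint comment is a placeholder:

```python
def sessions_by_channel_request(start_date: str, end_date: str) -> dict:
    """Build a GA4 Data API runReport request body: sessions and active
    users, broken down by default channel group, source, and medium.

    The API accepts relative dates such as "yesterday" or "7daysAgo".
    """
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [
            {"name": "sessionDefaultChannelGroup"},
            {"name": "sessionSource"},
            {"name": "sessionMedium"},
        ],
        "metrics": [
            {"name": "sessions"},
            {"name": "activeUsers"},
        ],
    }

# POSTed (with OAuth credentials) to:
# https://analyticsdata.googleapis.com/v1beta/properties/<PROPERTY_ID>:runReport
body = sessions_by_channel_request("yesterday", "yesterday")
```

A scheduled job that sends this body once a day and appends the rows to a warehouse table is enough to reproduce the acquisition report outside the interface.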
Advantages:
- You can customize the model of the data that are retrieved.
- The data modelled by Google algorithms are included. If the interface displays 100, the API returns 100. This facilitates QA and the confidence specialized teams have in the data.
- Fairly fast time to value for simple use cases.
Drawbacks:
- Expect certain limitations to use cases. The API has limits on the number of dimensions (amongst other things).
- If your property is highly subject to sampling, the total lines returned by the API may vary from the total displayed in the interface (by up to 5%).
- The API has quota limits that may restrict complex use cases.

To retrieve data from the GA4 API, you have 2 major options:
Use custom scripts
This requires development skills to extract the GA4 data via the API and save them to your data warehouse. You can use the client libraries Google maintains for languages like Python and R, which are fairly simple to use for experienced programmers. However, like any technical project of this kind, be aware that it will require maintenance over the years as the GA4 API evolves.
We don’t recommend that our clients develop their own scripts except in a few very specific cases.
External third-party connectors
There are also external third-party connectors, such as Supermetrics, Airbyte, or Fivetran, which can be used to save GA4 data to a data warehouse. These connectors are generally paid and may require additional configuration.
The advantage is that the time to value is extremely fast if you don’t consider the time it takes to enter into a contract. If you find your company can be a bit slow to integrate new third-party solutions, this could take several weeks or months.

We always recommend using an integrator of this type whenever possible for our clients. Basically, it’s a lot safer for the long-term viability of your project.
Conclusion
Which approach should you choose? Your selection will depend on your specific needs and your development skills. Start by asking yourself about your use cases. In practice, you’ll probably use a bit of both methods. As of the time of writing, “acquisition”-type reports are often easier to integrate via the API, because Google’s attribution rules and modelling are sometimes hard to reconstruct in SQL. On the other hand, raw data are far more pleasant to work with for all customer knowledge cases.
We help our clients define their marketing data integration and centralization needs on a daily basis. Don’t hesitate to get in touch with us if you have any questions.