How to Understand GA4 BigQuery Export: A Guide for Digital Marketers
*(The original article in Japanese was written by Kasumi Yoshinari, the Senior Customer Success Consultant of the Ayudante’s GMP team. This is a translated version.)
Kasumi Yoshinari, the Senior Customer Success Consultant of Ayudante
When you start exploring GA4, you’ll often hear about BigQuery. But why is it necessary to integrate with BigQuery? Is it even possible to use GA4 without linking it to BigQuery? These are common questions, especially for those new to Google’s latest analytics platform.
In this column, we’ll break down the essentials of integrating GA4 with BigQuery (BQ) specifically for digital marketers who want to unlock GA4’s full potential. This guide is specifically designed for marketers and advertising operation managers, focusing on easy-to-understand concepts without delving into technical details. So, feel free to read on, even if you don’t have a technical background!
Here, we’ll keep this column focused on understanding the core concepts rather than providing a detailed explanation or step-by-step instructions on how to use BigQuery.
What Exactly is BigQuery? And What Does BigQuery Export Mean?
BigQuery is a data warehouse service provided by Google—a place where you can organise and store vast amounts of data for aggregation and analysis.
BigQuery itself is a completely separate service from GA4, meaning it’s not limited to handling only GA4 data. You can use BigQuery to store various types of data, such as customer information, sales data, or data from third-party tools like Salesforce.
One of the key benefits of BigQuery is its seamless integration with GA4, thanks to both being part of Google’s ecosystem. This compatibility makes it easy to transfer data between the two services. You can easily set up your GA4 data to be stored in BigQuery, which is where the BigQuery Export feature comes in.
For more details, please visit the official help page:
[GA4] BigQuery Export
What Kind of Data is Stored in BigQuery?
When you set up GA4 to export data, the information is automatically extracted and stored in BigQuery. The data is saved using Google’s predefined structure and format, so you don’t have to decide what information to store. However, this also means you can’t customise the details of the data that gets saved.
The data in BigQuery isn’t pre-aggregated. It consists of raw, unprocessed information, like “X number of sessions” or “Y users came from source Z.
Rather than the summarised figures you see in GA4 reports, some types of data, such as those derived from Google Signals, aren’t exported at all.
You can access and use the exported data directly from the BigQuery interface within the Google Cloud management console.
Before you start exploring the data, it’s useful to understand where it is stored in BigQuery, which organises data into a hierarchy of “Project,” “Dataset,” and “Table.”
To set up BigQuery Export in GA4, you’ll set up a “project” in Google Cloud. This project serves as the billing and user management unit for all Google Cloud services, not just BigQuery.
When configuring data export in GA4, you select which project to use, and that’s where your daily data will be stored.
The actual location where data is stored in BigQuery is called a “table“. This is where GA4 data is kept, and when you want to analyse or aggregate data, you’ll work directly with these tables.
Each day’s data from GA4 is saved as a separate table in BigQuery. However, you won’t see each day listed individually in the menu; instead, these daily tables are grouped under “events_.”
To keep the data organised and manageable, these tables are grouped into a “dataset.” When you specify a project in GA4, BigQuery automatically creates a dataset named “analytics_XXX” (where “XXX” is your property ID). This dataset acts as a container, organising all the tables that store your GA4 data.
Now, let’s take a look at what the contents of a “table” actually look like when data is imported.
Each event is stored as a single row, and within that row, you’ll find details such as the timestamp (the time the data was sent), the event name, and various dimensions, all laid out across the columns.
For the most up-to-date information on the data structure, including the names and types of data included, please refer to the official help page.
For more details, please visit the official help page:
[GA4] BigQuery Export schema
How Do You Use the Stored Data?
You may have noticed that the data stored in BigQuery is in a completely different format from what you typically see on the report screens.
To analyse or utilise this data, you’ll need to process it from its raw state. The approach is similar to how you would handle data in Excel or other spreadsheet software.
GA4 performs this aggregation behind the scenes for you. However, when working with data stored in BigQuery, you need to decide how to aggregate it yourself by writing commands, known as queries, in a language called SQL.
SQL is used to interact with databases. To use it effectively, you’ll not only need to learn the syntax but also understand how GA4 data is structured in BigQuery. The exported data from GA4 often has a “nested structure,” meaning multiple pieces of data are contained within a single cell, which can make handling it a bit tricky.
By using SQL in BigQuery, you can specify things like “aggregate this dimension, this metric, and apply these filters,” and BigQuery will return the results based on your commands.
One of the challenges in using data stored in BigQuery is that you need to have the skills to process this data using SQL.
While this column won’t cover how to write queries or perform specific operations, in practice, it’s common to set up a structure where the digital marketers define what data they need, and the development or data teams handle writing the necessary queries for aggregation.
Why Use BigQuery for Aggregation?
You might wonder, if using BigQuery requires SQL and seems complicated, why not just stick to GA4?
Of course, there’s no need to use BigQuery for all your data aggregation. For data you want to check quickly on a daily basis or explore in more detail yourself, GA4 is the better option.
To help you decide, here’s a comparison of using the GA4 reports versus BigQuery for data aggregation:
Aggregation Model | Aggregation Flexity | Aggregation Method | Cost | |
GA4 Standard Report | Uses Google’s default settings. | Low | Through the graphical user interface (GUI) only. | No additional cost. |
GA4 Explorations | Requires custom design and setup. | Medium | ||
BigQuery | High | Requires writing SQL queries. | Costs apply based on query computing; some queries may be free. |
While GA4 allows for easy access to data, it does have limitations in terms of how freely you can aggregate data.
Exploration reports offer more flexibility than standard reports, but even they require viewing data in predefined combinations depending on the method used, meaning you can’t use them with complete freedom.
In contrast, BigQuery stores data in its raw form, which allows for much greater flexibility in combining dimensions and metrics. You can even regroup or manipulate the data as needed for aggregation, giving you a high level of control over how you analyse your data.
(Of course, you still need a solid understanding of GA4 to know which dimensions and metrics can or cannot be combined.)
Another significant advantage of using BigQuery is that it is not subject to the data retention limits that apply to GA4.
GA4’s free version retains data for up to 14 months, while the paid version extends this to a maximum of 50 months. This retention applies primarily to exploration reports, which are often essential when you need to dive deeper into your data.
However, data exported to BigQuery is not affected by these retention limits, allowing you to store and analyse data for as long as you need.
On the BigQuery side, you have the option to set how long tables are retained—whether for a specific number of days or indefinitely. If you need to store data for extended periods, using the BigQuery export feature is highly recommended.
For more details, please visit the official help page:
[GA4] Data retention
However, it’s important to note that there are costs associated with using BigQuery.
BigQuery charges are based on two factors: the “Storage cost” for keeping data and the “Querying costs” incurred when aggregating and processing data.
Although BigQuery’s fees are relatively low, and there is a free usage tier, it’s important to use it wisely. Not everything should be aggregated in BigQuery, so be mindful of how you use this service to avoid unnecessary costs.
For more details, please visit the official help page:
[GCP] BigQuery pricing
Summary
In this article, I have explained how the BigQuery Export feature works and what happens when you use it.
- The data in these tables can be freely processed and aggregated using a language called SQL.
- The BigQuery Export feature allows you to store your GA4 data in BigQuery.
- When you connect a Google Cloud project with GA4, BigQuery creates a dataset, which contains tables that store each day’s data separately.