Google Analytics Data Retention Example (GA Universal)
In 2018 Google Analytics introduced a new setting to help organizations comply with evolving digital privacy regulations around the world, GDPR in particular: Data Retention.
In this article we illustrate the effects of GA’s Data Retention settings, including sample report screenshots. As with many concepts in this space, wordy explanations can be somewhat confusing.
We’ve noticed that, at one end, many professionals set Data Retention limits after only skimming these explanations, without realizing the impact of the setting on their Google Analytics reports. At the other end, some organizations take no action for fear of what might happen, putting legal compliance at risk.
In the spirit of clarifying what this setting does, we wanted to show some screenshots that illustrate the consequences of Google Analytics Data Retention. In the example shown, the Google Analytics Universal Data Retention setting is set to 26 months. At the time of that screenshot, 26 months back was August 14, 2019.
The screenshots below quickly illustrate the behaviour of GA’s reports for dates older than the data retention date in three scenarios: an aggregate report, a report with a segment, and a report with a secondary dimension.
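To make the date arithmetic concrete, here is a minimal Python sketch that works out a retention cutoff by stepping back a number of calendar months. The screenshot date of October 14, 2021 is our assumption, inferred by working backwards from the August 14, 2019 cutoff, and the helper name subtract_months is hypothetical. The sketch only does calendar arithmetic; it does not model how or when Google actually purges data.

```python
import calendar
from datetime import date


def subtract_months(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    If the target month is shorter (e.g. Feb), the day is clamped
    to the last day of that month.
    """
    # Count months since year 0, step back, then split into year/month again.
    index = day.year * 12 + (day.month - 1) - months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


# Assumed screenshot date (not stated in the article, inferred from the cutoff).
screenshot_day = date(2021, 10, 14)
cutoff = subtract_months(screenshot_day, 26)
print(cutoff)  # 2019-08-14
```

Swapping 26 for 14 in the call gives the cutoff a team would face with the shorter retention setting.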
Not Affected: Data Retention in Aggregate Reports
If you just browse the aggregate reports, you will not see any effects of Data Retention settings. The image below shows the Channels report as if nothing had changed. This is what Google Support means when they say:
“Keep in mind that standard aggregated Google Analytics reporting is not affected.”
https://support.google.com/analytics/answer/7667196?hl=en
Affected: Data Retention with Segments
Now we can see the effects of setting Data Retention. Once a segment is applied, data before the date set in the Data Retention settings is no longer accessible: it is gone. Analysis involving identifiable cohorts will become a challenge in Google Analytics for periods older than the Data Retention date.
Keep in mind that many teams prefer (and sometimes require) the ability to analyze different cohorts over time, e.g. device types, countries or cities, campaigns, etc. This is no longer possible beyond the Data Retention date.
Affected: Data Retention with Secondary Dimensions
Similar to applying a segment, adding a secondary dimension will most likely suppress data from the reports. Some segments will still work, but most will not. Analysis that involves identifiable cohorts will become a challenge in Google Analytics for periods older than the Data Retention date.
Notice that when you add a secondary dimension only the data table (lower half of the report) reflects the changes. The Explorer graph (top half) remains unchanged. This can be confusing!
Best Practices to Avoid Issues with Data Retention
As in any other “real-world” scenario, teams work with what they have, not with what they would like to have. So, what can organizations do? We often advise clients to consider two points:
Whenever possible, keep Data Retention at 26 months.
The reason for trying to keep Data Retention settings at 26 months is simple: year-over-year analysis.
We’ve encountered more than one case where the Data Retention setting was set to 14 months, only for the team to find out they had lost the ability to do year-over-year analysis. After setting Data Retention to 14 months, setting it back to 26 months will not recover the data. The team will have to wait one more year to regain the ability to do year-over-year comparisons.
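A rough way to see why 14 months is too short: to compare a period with the same period one year earlier, the retention window has to reach back to the start of the earlier period. The sketch below checks that in whole calendar months. The function names and the example dates are hypothetical, and the check is an approximation that ignores the day of the month and the exact schedule on which Google deletes data.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def can_compare_year_over_year(
    period_start: date, today: date, retention_months: int
) -> bool:
    """True if the same period one year earlier is still inside retention.

    The prior-year period starts 12 months before `period_start`, so its
    age in months is the age of `period_start` plus 12.
    """
    prior_year_age = months_between(period_start, today) + 12
    return prior_year_age <= retention_months


# Illustrative dates: comparing July 2021 against July 2020 in October 2021.
today = date(2021, 10, 14)
july_2021 = date(2021, 7, 1)

print(can_compare_year_over_year(july_2021, today, 14))  # False
print(can_compare_year_over_year(july_2021, today, 26))  # True
```

In this example the prior-year period is 15 months old, which is just outside a 14-month window but comfortably inside a 26-month one.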
Set up data exports that address common analyses.
The legal requirements that drive Data Retention policies in Google Analytics often do not apply (or apply somewhat differently) to data on your own servers.
Most organizations that need legal compliance also keep databases with customer information on a local server or with one of the cloud providers that offer the services needed to run an online business, such as Google Cloud Platform, AWS, or Microsoft Azure.
It may be worth tapping into these resources to have data from Google Analytics exported to your own database. This way teams can use Google Data Studio (Data Portal in Japan) to create dashboards and other reports that are needed across longer periods of time. Doing this will reduce the likelihood of problems down the road.
For those organizations using Google Analytics 360 or Google Analytics 4, data exports can be automated to BigQuery. These are raw data exports, meaning that with some SQL chops teams will be able to replicate any GA report.
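As an illustration of what replicating a report from exported data can look like, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for your own database. The table ga_sessions, its columns, and the sample rows are all hypothetical and do not mirror the BigQuery export schema; the point is only that a Channels-style report with a secondary dimension becomes a plain GROUP BY once the rows live in a database you control.

```python
import sqlite3

# In-memory database standing in for "your own database".
conn = sqlite3.connect(":memory:")

# Hypothetical export table: one row per session.
conn.execute(
    """
    CREATE TABLE ga_sessions (
        session_date    TEXT,  -- ISO date, e.g. '2019-03-02'
        channel         TEXT,  -- primary dimension
        device_category TEXT   -- secondary dimension
    )
    """
)

# Made-up sample rows, some older than a 26-month retention window.
conn.executemany(
    "INSERT INTO ga_sessions VALUES (?, ?, ?)",
    [
        ("2019-03-02", "Organic Search", "mobile"),
        ("2019-03-15", "Organic Search", "desktop"),
        ("2019-03-20", "Organic Search", "mobile"),
        ("2021-03-05", "Organic Search", "mobile"),
        ("2021-03-09", "Direct", "desktop"),
    ],
)


def sessions_by_channel_and_device(conn, start, end):
    """Sessions per (channel, device) between two ISO dates, inclusive."""
    query = """
        SELECT channel, device_category, COUNT(*) AS sessions
        FROM ga_sessions
        WHERE session_date BETWEEN ? AND ?
        GROUP BY channel, device_category
        ORDER BY sessions DESC, channel, device_category
    """
    return conn.execute(query, (start, end)).fetchall()


# Channels report with Device Category as secondary dimension, March 2019.
report = sessions_by_channel_and_device(conn, "2019-03-01", "2019-03-31")
for channel, device, sessions in report:
    print(channel, device, sessions)
```

Because the rows stay in your own database, the same query keeps working for March 2019 no matter what the Data Retention setting in Google Analytics is.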
We understand that in most organizations legal requirements override the wishes of digital analytics teams, as they should. Nonetheless, understanding the impact of legal requirements will enable an organization to prepare for what is coming. For Google Analytics Data Retention, the impacts are showcased above.
If you are interested in the technical details of this setting, I encourage you to go through Google’s support page on Data Retention.
Let us know if you have any experiences you would like to share around Google Analytics Data Retention settings in the comments.