[GA4] Recreating Landing Page Grouping Reports in GA4 (Google Analytics)

For one of our very special clients, the “Landing Page Content Group” report in Google Analytics Universal has been a regular part of their weekly narratives. Unfortunately, this report is not available in GA4 – collateral damage in this reimagining of Google Analytics.

Yes, GA4 has both “Landing Page” and “Content Group” dimensions. But if you try to cross these two dimensions in the default reports or custom report in Explorer, you get a mix of event- and session-scoped data, which yields unworkable results.

Also, this functionality does not seem to be in the GA4 roadmap as of this writing, so, for those of you looking for a solution to replicate “Landing page content group”, here is one way you can do it with Google BigQuery exports.

Basic Concept and Requirements

First, BigQuery. This workaround will be done using GA4 BigQuery exports. As we mentioned in a previous post, GA4 and BigQuery have a much closer relationship now than in GA Universal. BigQuery integration is free for paid and standard versions and many advanced (and sometimes not so advanced) analysis really needs it. So make sure you have GA4 BigQuery exports enabled. Here to setup the GA4 BigQuery Exports.

Second, understanding the general approach. This approach does not replicate the “landing page” report exactly, but it should give a fair indication of trends. In other words, expect the sessions here not to match the landing page in the reports to the dot. However, the trend should be similar.

This query (basically, the post) is based on a query created by our own Vibhor Jain – thanks Vibhor!

I’m just making a few changes to the query by Vibhor. Hopefully by the end of this post, anyone should be able to fine tune this query to accommodate different parameters, like a second content group.

Lastly, visualization, automation and other considerations. In this post we are just looking at the query. We will follow up with more on automating queries on BigQuery and visualizations in future post.

On with the query…

Breaking Down the Query

I am breaking the query into three working parts. The intention here is not to deep dive on all the BigQuery functions and SQL being used, but understand the general approach so that anyone can make further fine tuning to match their business requirements.

BigQuery integration is free for GA4 paid and standard versions and many advanced (and sometimes not so advanced) analysis really needs it.

Part 1: Getting The Desired Dimensions

First, let’s decide what we want to analyze. The first part of this query creates a temporary table with all the dimensions we will need to create the intended report and analysis. In GA4, these are often pulled from the “event_parameters” we have added in our implementation. In our case, we want to get the query the following:

  • event_date
  • a combination of user_pseudo_id and user_id to get a unique user.
  • event_timestamp
  • page_location
  • page_title
  • content_group
  • source
  • medium
  • campaign
⚠️ Here is an important bit. We are excluding taking these parameters from the following automatic events: “session_start“, “first_visit“, “first_open” events. This is because automatic events do not incorporate custom parameters set in the GA4 configuration tag, like content_group, which is the main value we want to extract. ⚠️

Also, noticed another condition to specify dates. This is fairly straightforward, just add the dates in the YYYYMMDD format, and that should do.

select 
  event_date,
    coalesce(user_id, user_pseudo_id) as user,
    (select value.int_value from unnest(event_params) where key = "ga_session_id") as session_id,
    event_timestamp,
    (select value.string_value from unnest(event_params) where key = "page_location") as page_location,
    (select value.string_value from unnest(event_params) where key = "page_title") as page_title,
    (select value.string_value from unnest(event_params) where key = "content_group") as content_group,
    (select value.string_value from unnest(event_params) where key = "source") as event_source,
    (select value.string_value from unnest(event_params) where key = "medium") as event_medium,
    (select value.string_value from unnest(event_params) where key = "campaign") as event_campaign
  from 
      `[YOUR_GA4_EXPORT_TABLE].events_*` 
  where event_name not in ("session_start", "first_visit", "first_open") 
  and _TABLE_SUFFIX BETWEEN '20230401' AND '20230411'    -- Replace date range.

Part 2: The First of Each Session

I like Vibhor’s approach here because it is so straightforward and easy to understand.

This part of the query creates one more table based on the previous temporary query. While the previous query listed all the parameters we wanted along with their session_id and event_timestamp, by using the functions “any_value” combined with “HAVING MIN event_timestamp” we can essentially look for, well, the value where the event_timesamp is the lowest.

This will most likely be an event named page_view which would represent the landing page. But the beauty here is that it is not restricted to the event name “page_view”. Perhaps it is the case that your implementation has other events firing before the page_view event, or perhaps you want the “screen_view” or something else. It should work!

  select
    event_date,
    user, session_id,
    any_value(page_location HAVING MIN event_timestamp) as landing_page_location,
    any_value(page_title HAVING MIN event_timestamp) as landing_page_title,
    any_value(content_group HAVING MIN event_timestamp) as landing_content_group,
    any_value(event_source HAVING MIN event_timestamp) as event_source,
    any_value(event_medium HAVING MIN event_timestamp) as event_medium,
    any_value(event_campaign HAVING MIN event_timestamp) as event_campaign
  from source
  group by event_date, user, session_id

Part 3: Get The Report!

The previous query created a normalized table for “landing pages”, or more accurately, “first events that have our GA4 configuration applied”. And that is the final table we want to query for our analysis.  Here is the final part of the query.

Notice we are finally using the “count” function to understand sessions and users. (I am commenting out “landing_page_title” just because it is too long for what we need. The original query had it and some people find I useful, so I am leaving it in.)

select
  event_date,
  event_source,
  event_medium,
  event_campaign,
  landing_page_location, 
  --landing_page_title, 
  landing_content_group,
  count(distinct concat(user, "-", session_id)) as sessions,
  count (distinct user) as users
from landing_pages
group by 
  event_date, 
  event_source, 
  event_medium, 
  event_campaign,
  landing_page_location,
  -- landing_page_title,
  landing_content_group
order by sessions desc

The Complete Query

Here is the complete query. Just replace [YOUR_GA4_EXPORT_TABLE] with, your GA4 export table and it should work!

with source as (
  select 
  event_date,
    coalesce(user_id, user_pseudo_id) as user,
    (select value.int_value from unnest(event_params) where key = "ga_session_id") as session_id,
    event_timestamp,
    (select value.string_value from unnest(event_params) where key = "page_location") as page_location,
    (select value.string_value from unnest(event_params) where key = "page_title") as page_title,
    (select value.string_value from unnest(event_params) where key = "content_group") as content_group,
    (select value.string_value from unnest(event_params) where key = "source") as event_source,
    (select value.string_value from unnest(event_params) where key = "medium") as event_medium,
    (select value.string_value from unnest(event_params) where key = "campaign") as event_campaign
  from 
      `[YOUR_GA4_EXPORT_TABLE].events_*` 
  where event_name not in ("session_start", "first_visit", "first_open") 
  and _TABLE_SUFFIX BETWEEN '20230401' AND '20230411'    -- Replace date range.
),

landing_pages as (
  select
    event_date,
    user, session_id,
    any_value(page_location HAVING MIN event_timestamp) as landing_page_location,
    any_value(page_title HAVING MIN event_timestamp) as landing_page_title,
    any_value(content_group HAVING MIN event_timestamp) as landing_content_group,
    any_value(event_source HAVING MIN event_timestamp) as event_source,
    any_value(event_medium HAVING MIN event_timestamp) as event_medium,
    any_value(event_campaign HAVING MIN event_timestamp) as event_campaign
  from source
  group by event_date, user, session_id
)

select
  event_date,
  event_source,
  event_medium,
  event_campaign,
  landing_page_location, 
  --landing_page_title, 
  landing_content_group,
  count(distinct concat(user, "-", session_id)) as sessions,
  count (distinct user) as users
from landing_pages
group by 
  event_date, 
  event_source, 
  event_medium, 
  event_campaign,
  landing_page_location,
  -- landing_page_title,
  landing_content_group
order by sessions desc

Comparing Results with GA4

Even though I am saying things will likely not match with the GA4 reports, I understand it is impossible to resist trying. Also, it is understandable to make sure the numbers are at least close. So, when doing the comparison, create a custom report table in your “Explorer” reports. The standard reports might complain about sampling or aggregation limits.

Jose Uzcategui

Global Lead / Sr. Analytics Consultant

Jose Uzcategui

Mr. Uzcategui joined Ayudante in Japan after working for ASICS and Amazon. His position previous to Ayudante was as a team leader, working on data construction/implementation and performance improvement using the Google Marketing Platform and the cloud. His hobbies are playing squash and cooking. He enjoys studying Japanese, Spanish, and English, and spring in Japan is his favorite season.