Activity Stream Datasets
This article describes the various BigQuery tables Mozilla uses to store Activity Stream data, along with some examples of how to access them.
Table of Contents
- What is Activity Stream?
- Activity Stream Pings
- Accessing Activity Stream Data
- Gotchas and Caveats
- Examples
What is Activity Stream?
Activity Stream is the Firefox module that manages the in-product content pages for Firefox:
- about:home
- about:newtab
- about:welcome
- Snippets
- CFR
- Onboarding
- What's new panel
- Moments pages
The Activity Stream team has implemented data collection in and around these pages. This data has some overlap with the standard Firefox Telemetry system; however, it is a custom system designed and maintained by that team.
For specific questions about this data, reach out to the #fx-messaging-system Slack channel directly.
Activity Stream Pings
This data is measured in various custom pings that are sent via PingCentre (different from Pingsender).
Accessing Activity Stream Data
Activity Stream pings are stored in BigQuery (like other Firefox Telemetry). There are two datasets: activity_stream and messaging_system.
activity_stream
The activity_stream dataset contains the following tables:
- events: stores user interactions with the about:home and about:newtab pages
- sessions: stores sessions of the about:home and about:newtab pages
- impression_stats: stores impression/click/block events for the Pocket recommendations on the about:home and about:newtab pages
- spoc_fills: stores "Pocket Sponsored" recommendation related pings
messaging_system
The messaging_system dataset contains the following tables:
- cfr: stores metrics on user interactions with the CFR (Contextual Feature Recommendation) system
- moments: stores "Moments Pages" related pings
- onboarding: stores metrics on user interactions with onboarding features
- snippets: stores impression/click/dismissal metrics for Firefox Snippets
- whats_new_panel: stores "What's New Panel" related pings
- undesired_events: stores system health related events
Gotchas and Caveats
Since this data isn't collected or maintained through our standard Telemetry API, there are a number of "gotchas" to keep in mind when working with it:
- Ping send conditions: Activity Stream pings have different send conditions, both from Telemetry pings and from each other. For example, AS Session Pings are sent by profiles that entered an Activity Stream session, at the end of that session, regardless of how long the session lasted. Compare this to main pings, which are sent by all Telemetry-enabled profiles upon subsession end (browser shutdown, environment change, or local midnight cutoff).

  Due to these inconsistencies, combining data from different sources can be tricky. For example, if we wanted to know how much of DAU (from main pings) had a custom about:home page (available in AS Health Pings), joining on client_id and a date field would only provide information on profiles that started the session on that same day (active profiles on multi-day sessions would be excluded).

- Population covered: In addition to the usual considerations when looking at a measurement (In what version of Firefox did collection of this measurement start? In what channels is it enabled? etc.), there are additional Activity Stream specific conditions to consider when deciding "who is eligible to send this ping?"

  For example, Pocket recommendations are only enabled in the US, CA, UK, and DE, for profiles that are on the en-US, en-CA, en-GB, and de locales. Furthermore, users can set their about:home and about:newtab pages to non-Activity Stream pages. This information can be important when deciding denominators for certain metrics.

- Different ping types in the same table: The tables in the activity_stream namespace can contain multiple types of pings. For example, the events table contains both AS Page Takeover pings and AS User Event pings.

- Null handling: Some fields in the Activity Stream data encode nulls with a 'N/A' string or a -1 value.

- Changes in ping behaviors: These pings continue to undergo development, and the behavior as well as the possible values for a given ping seem to change over time. For example, older versions of the event pings for clicking on a Topsite do not seem to report card_types and icon_types, while newer versions do. Caution is advised.

- Pocket data: Data related to Pocket interaction and usage in the about:home and about:newtab pages is sent to Pocket via this data collection and pipeline. However, for privacy reasons, the client_id is omitted from the ping whenever Pocket recommendation identifiers are included; such pings instead carry a different unique user identifier, impression_id. Pocket user interactions such as clicks, dismissals, and saves to Pocket are still reported as regular events with the client_id, as long as they don't contain Pocket recommendation identifiers.
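As an illustration of the null handling caveat above, sentinel values can be converted to real NULLs before aggregating, so that they are not counted as genuine values. The following is a minimal sketch that runs on inline sample rows rather than on a real table; the column names some_string_field and some_numeric_field are hypothetical placeholders, and which fields actually use 'N/A' or -1 has to be checked per table.

```sql
-- Sketch only: inline sample rows stand in for an Activity Stream table.
-- some_string_field and some_numeric_field are hypothetical column names.
WITH sample AS (
  SELECT 'client_a' AS client_id, 'N/A' AS some_string_field, -1 AS some_numeric_field
  UNION ALL
  SELECT 'client_b', 'about:home', 3
)
SELECT
  client_id,
  -- NULLIF returns NULL when the two arguments are equal, else the first argument.
  NULLIF(some_string_field, 'N/A') AS some_string_field,
  NULLIF(some_numeric_field, -1) AS some_numeric_field
FROM
  sample
```

Once the sentinels are NULL, aggregate functions such as AVG and COUNT(column) skip them instead of treating -1 or 'N/A' as data.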
Examples
Sessions per client_id
Note: only includes client_ids that completed an Activity Stream session that day.
SELECT
  client_id,
  DATE(submission_timestamp) AS date,
  COUNT(DISTINCT session_id) AS num_sessions
FROM
  `moz-fx-data-shared-prod.activity_stream.sessions`
WHERE
  DATE(submission_timestamp) = '2020-06-01'
GROUP BY
  1, 2
Topsite clicks and Highlights clicks
SELECT
  client_id,
  DATE(submission_timestamp) AS date,
  session_id,
  page,
  source,
  action_position,
  experiments
FROM
  `moz-fx-data-shared-prod.activity_stream.events`
WHERE
  source IN ('TOP_SITES', 'HIGHLIGHTS')
  AND event = 'CLICK'
  AND DATE(submission_timestamp) = '2020-06-01'
Topsite Tile Dismissals: Sponsored and Non-Sponsored
The Topsite Tile Dismiss action corresponds to the BLOCK event, which can be applied to a Sponsored or Non-Sponsored Tile. When applied to a Non-Sponsored Tile, the BLOCK event prevents the Tile from appearing in TopSites but leaves the browsing history as is. The DELETE event is fired when the user selects Delete from History and is only applicable to Non-Sponsored Tiles. This action deletes the URL from the client's complete browser history and prevents the Tile from appearing in their Topsites. DELETE doesn't apply to Sponsored Tiles, as these are not generated from the user's browsing history.
SELECT
  DATE(submission_timestamp) AS date,
  COUNT(*) AS num_sponsored_dismissals
FROM
  `moz-fx-data-shared-prod.activity_stream.events`
WHERE
  source = 'TOP_SITES'
  AND event = 'BLOCK'
  AND DATE(submission_timestamp) = '2022-01-01'
  AND value LIKE '%"card_type":"spoc"%'
GROUP BY 1
ORDER BY 1
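The query above counts only dismissals whose value field carries the spoc card type. To compare the two kinds of Tile side by side, one possible sketch splits the BLOCK events with COUNTIF. It assumes that any BLOCK row without the spoc marker in value is a Non-Sponsored dismissal, which is an assumption to verify against the ping documentation rather than a confirmed property of the data.

```sql
-- Sketch only: assumes BLOCK rows lacking the spoc marker are Non-Sponsored.
SELECT
  DATE(submission_timestamp) AS date,
  COUNTIF(value LIKE '%"card_type":"spoc"%') AS sponsored_dismissals,
  -- The explicit IS NULL check keeps rows with a NULL value in the second bucket;
  -- NOT LIKE alone would evaluate to NULL for them and COUNTIF would skip them.
  COUNTIF(value IS NULL OR value NOT LIKE '%"card_type":"spoc"%') AS other_dismissals
FROM
  `moz-fx-data-shared-prod.activity_stream.events`
WHERE
  source = 'TOP_SITES'
  AND event = 'BLOCK'
  AND DATE(submission_timestamp) = '2022-01-01'
GROUP BY 1
ORDER BY 1
```

Because of the "Changes in ping behaviors" caveat, older rows may not report a card type at all, so the second column can overstate Non-Sponsored dismissals for older Firefox versions.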
Snippet impressions, blocks, clicks, and dismissals
Note: Which snippet message a record corresponds to can be identified by the message_id (check with Marketing for snippet recipes published).
SELECT
  client_id,
  DATE(submission_timestamp) AS date,
  event,
  message_id,
  event_context,
  experiments
FROM
  `moz-fx-data-shared-prod.messaging_system.snippets`
WHERE
  DATE(submission_timestamp) = '2020-06-01'
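The query above returns one row per snippet ping. For a quick overview it is often more convenient to aggregate per message and event type. The following sketch uses only columns that appear in the query above; the output names num_events and num_clients are illustrative choices.

```sql
-- Sketch only: daily event and client counts per snippet message and event type.
SELECT
  DATE(submission_timestamp) AS date,
  message_id,
  event,
  COUNT(*) AS num_events,
  COUNT(DISTINCT client_id) AS num_clients
FROM
  `moz-fx-data-shared-prod.messaging_system.snippets`
WHERE
  DATE(submission_timestamp) = '2020-06-01'
GROUP BY
  1, 2, 3
ORDER BY
  num_events DESC
```

Grouping by event rather than filtering on specific event names avoids assuming which event values the snippets table reports.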