Event Data Pipeline
We collect event-oriented data from several sources. This page describes the path that data takes through our data pipeline, from collection to analysis.
graph TD
  subgraph Products
    fx_code(fa:fa-cog Firefox code) --> firefox(fa:fa-firefox Firefox Telemetry)
    fx_extensions(fa:fa-cog Mozilla extensions) --> firefox
    mobile(fa:fa-cog Mobile products) --> mobile_telemetry(fa:fa-firefox Glean)
  end
  subgraph Data Platform
    firefox -.->|main ping, Firefox <62| pipeline((fa:fa-database Firefox Data Pipeline))
    firefox -->|event ping, Firefox 62+| pipeline
    mobile_telemetry --> |events ping| pipeline
    pipeline -->|Firefox <62 events| main_summary[fa:fa-bars main summary table]
    pipeline -->|Firefox 62+ events| mobile_events_table[fa:fa-bars events table]
    main_summary --> events_table
    pipeline -->|Glean events| events_table[fa:fa-bars events table]
  end
  subgraph Data Tools
    events_table --> looker
    events_table --> looker
    main_summary --> looker(fa:fa-bar-chart Looker)
  end
  style fx_code fill:#f94,stroke-width:0px
  style fx_extensions fill:#f94,stroke-width:0px
  style mobile fill:#f94,stroke-width:0px
  style firefox fill:#f61,stroke-width:0px
  style mobile_telemetry fill:#f61,stroke-width:0px
  style pipeline fill:#79d,stroke-width:0px
  style main_summary fill:lightblue,stroke-width:0px
  style events_table fill:lightblue,stroke-width:0px
  style mobile_events_table fill:lightblue,stroke-width:0px
  style looker fill:salmon,stroke-width:0px
Overview
Across the different Firefox teams there is a common need for a more fine-grained understanding of product usage, such as the order of interactions or how they occur over time. To address that need, our data pipeline has to support working with event-oriented data.
We specify a common event data format, which allows for broader, shared usage of data processing tools. To make working with event data feasible, we provide different mechanisms to get the event data from products to our data pipeline and make the data available in tools for analysis.
The event format
Events are submitted as an array, e.g.:
[
  [2147, "ui", "click", "back_button"],
  [2213, "ui", "search", "search_bar", "google"],
  [
    2892,
    "ui",
    "completion",
    "search_bar",
    "yahoo",
    { querylen: "7", results: "23" },
  ],
  [5434, "dom", "load", "frame", null, { prot: "https", src: "script" }],
  // ...
];
Each event is of the form:
[timestamp, category, method, object, value, extra];
Where the individual fields are:
- timestamp: Number, positive integer. This is the time in ms when the event was recorded, relative to the main process start time.
- category: String, identifier. The category is a group name for events and helps to avoid name conflicts.
- method: String, identifier. This describes the type of event that occurred, e.g. click, keydown or focus.
- object: String, identifier. This is the object the event occurred on, e.g. reload_button or urlbar.
- value: String, optional, may be null. This is a user defined value, providing context for the event.
- extra: Object, optional, may be null. This is an object of the form {"key": "value", ...}; both keys and values need to be strings. This is used for events when additional richer context is needed.
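The field rules above can be checked mechanically. The following is a minimal Python sketch, not part of Firefox Telemetry or the pipeline: the names `Event` and `parse_event` are hypothetical, and accepting a timestamp of 0 is an assumption of this sketch. It turns one raw event array into a named record and rejects arrays that do not match the format.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """One event in the shared format (hypothetical helper type)."""
    timestamp: int
    category: str
    method: str
    object: str
    value: Optional[str] = None
    extra: Optional[dict] = None


def parse_event(raw: list) -> Event:
    """Validate one raw event array and return it as a named record."""
    if not 4 <= len(raw) <= 6:
        raise ValueError(f"expected 4 to 6 fields, got {len(raw)}")
    # Pad the optional trailing fields (value, extra) with None.
    padded = list(raw) + [None] * (6 - len(raw))
    timestamp, category, method, obj, value, extra = padded

    # bool is a subclass of int in Python, so exclude it explicitly.
    # Treating 0 as a valid timestamp is an assumption of this sketch.
    if isinstance(timestamp, bool) or not isinstance(timestamp, int) or timestamp < 0:
        raise ValueError("timestamp must be a non-negative integer")
    for name, field in (("category", category), ("method", method), ("object", obj)):
        if not isinstance(field, str):
            raise ValueError(f"{name} must be a string")
    if value is not None and not isinstance(value, str):
        raise ValueError("value must be a string or null")
    if extra is not None:
        if not isinstance(extra, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in extra.items()
        ):
            raise ValueError("extra must map string keys to string values")
    return Event(timestamp, category, method, obj, value, extra)


events = [
    [2147, "ui", "click", "back_button"],
    [2213, "ui", "search", "search_bar", "google"],
    [5434, "dom", "load", "frame", None, {"prot": "https", "src": "script"}],
]
parsed = [parse_event(e) for e in events]
```

Padding the optional fields with None keeps short and full-length events in the same shape, which simplifies any later processing step.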
See also the Firefox Telemetry documentation.
Event data collection
Firefox event collection
Firefox provides several APIs for collecting this event data, each addressing a different use case:
- The Telemetry event API allows easy recording of events from Firefox code.
- The dynamic event API allows code from Mozilla add-ons to record new events into Telemetry without shipping Firefox code.
- The Telemetry WebExtension API (introduced here) allows Mozilla extensions to record new events into Telemetry.
For all these APIs, events are sent to the pipeline through the event ping. The event ping is sent hourly if any events were recorded, or as often as every 10 minutes whenever 1000 events were recorded. Before Firefox 62, events were sent through the main ping instead, with a hard limit of 500 events per ping. From Firefox 61, all events recorded through these APIs are automatically counted in scalars.
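The sending rule for the event ping can be read as a small decision function. The Python sketch below is a simplified model of the rule as described in this section, not Firefox's actual scheduling code; the function name, the constant names, and the exact boundary handling (for example `>=` rather than `>`) are assumptions.

```python
# Thresholds taken from the description above; the names are hypothetical.
EVENT_COUNT_THRESHOLD = 1000     # events that trigger an early ping
MIN_SEND_INTERVAL_MINUTES = 10   # pings are sent at most this often
MAX_SEND_INTERVAL_MINUTES = 60   # pending events wait at most this long


def should_send_event_ping(pending_events: int, minutes_since_last_ping: float) -> bool:
    """Return True if an event ping would be due under the simplified rule."""
    if pending_events <= 0:
        # Nothing was recorded, so no ping is sent.
        return False
    if minutes_since_last_ping >= MAX_SEND_INTERVAL_MINUTES:
        # Hourly submission when at least one event is pending.
        return True
    # Early submission once enough events have piled up,
    # but never more often than every 10 minutes.
    return (
        pending_events >= EVENT_COUNT_THRESHOLD
        and minutes_since_last_ping >= MIN_SEND_INTERVAL_MINUTES
    )
```

For example, `should_send_event_ping(1000, 12)` is true because the event threshold is reached and more than 10 minutes have passed, while `should_send_event_ping(5, 30)` is false because a handful of events waits for the hourly submission.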
Finally, custom pings can follow the event data format and potentially connect to the existing tooling with some integration work.
Mobile event collection
Mobile products collect data through Glean, which records events for our mobile applications.
Datasets
On the pipeline side, the event data is made available in different datasets:
- main_summary has a row for each main ping and includes its event payload for Firefox versions before 62.
- events contains a row for each event received from main pings and event pings. See STMO#52582.
- For applications that collect events through Glean, each application has a separate events dataset.
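The difference between a dataset with one row per ping and a dataset with one row per event can be illustrated with a short Python sketch. It is illustrative only: the function `explode_events` and the column names `client_id` and `events` are hypothetical and do not describe the actual table schemas or the pipeline's implementation.

```python
def explode_events(ping_rows):
    """Yield one flat record per event from rows that each hold one ping."""
    for row in ping_rows:
        # A ping may carry no events at all; treat None as an empty list.
        for event in row["events"] or []:
            timestamp, category, method, obj, *rest = event
            yield {
                "client_id": row["client_id"],
                "timestamp": timestamp,
                "category": category,
                "method": method,
                "object": obj,
                # value and extra are optional trailing fields.
                "value": rest[0] if len(rest) > 0 else None,
                "extra": rest[1] if len(rest) > 1 else None,
            }


# Two hypothetical ping rows: one with two events, one with none.
ping_rows = [
    {
        "client_id": "a",
        "events": [
            [2147, "ui", "click", "back_button"],
            [2213, "ui", "search", "search_bar", "google"],
        ],
    },
    {"client_id": "b", "events": None},
]
event_rows = list(explode_events(ping_rows))
```

Here `event_rows` holds two records, both for client `a`; the ping without events contributes none.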
Data tooling
The above datasets are all accessible through STMO and Looker.