Event Data Pipeline

We collect event-oriented data from different sources. This data is collected and processed in a specific path through our data pipeline, which we will detail here.

graph TD

subgraph Products
fx_code(fa:fa-cog Firefox code) --> firefox(fa:fa-firefox Firefox Telemetry)
fx_extensions(fa:fa-cog Mozilla extensions) --> firefox
fx_hybrid(fa:fa-cog Hybrid Content) --> firefox
mobile(fa:fa-cog Mobile products) --> mobile_telemetry(fa:fa-firefox Mobile Telemetry)
end

subgraph Data Platform
firefox -.->|main ping, Firefox <62| pipeline((fa:fa-database Firefox Data Pipeline))
firefox -->|event ping, Firefox 62+| pipeline
mobile_telemetry --> |mobile events ping| pipeline
pipeline -->|Firefox <62 events| main_summary[fa:fa-bars main summary table]
pipeline -->|Firefox 62+ events| events_table[fa:fa-bars events table]
main_summary --> events_table
pipeline -->|Mobile events| mobile_events_table[fa:fa-bars mobile events table]
end

subgraph Data Tools
events_table --> redash
mobile_events_table --> redash
main_summary --> redash(fa:fa-bar-chart Redash)
pipeline -->|on request| amplitude(fa:fa-bar-chart Amplitude)
end

style fx_code fill:#f94,stroke-width:0px
style fx_extensions fill:#f94,stroke-width:0px
style fx_hybrid fill:#f94,stroke-width:0px
style mobile fill:#f94,stroke-width:0px
style firefox fill:#f61,stroke-width:0px
style mobile_telemetry fill:#f61,stroke-width:0px
style pipeline fill:#79d,stroke-width:0px
style main_summary fill:lightblue,stroke-width:0px
style events_table fill:lightblue,stroke-width:0px
style mobile_events_table fill:lightblue,stroke-width:0px
style redash fill:salmon,stroke-width:0px
style amplitude fill:salmon,stroke-width:0px

Overview

Across the different Firefox teams there is a common need for a more fine grained understanding of product usage, like understanding the order of interactions or how they occur over time. To address that our data pipeline needs to support working with event-oriented data.

We specify a common event data format, which allows for broader, shared usage of data processing tools. To make working with event data feasible, we provide different mechanisms to get the event data from products to our data pipeline and make the data available in tools for analysis.

The event format

Events are submitted as an array, e.g.:

[
  [2147, "ui", "click", "back_button"],
  [2213, "ui", "search", "search_bar", "google"],
  [2892, "ui", "completion", "search_bar", "yahoo",
    {"querylen": "7", "results": "23"}],
  [5434, "dom", "load", "frame", null,
    {"prot": "https", "src": "script"}],
  // ...
]

Each event is of the form:

[timestamp, category, method, object, value, extra]

Where the individual fields are:

  • timestamp: Number, positive integer. This is the time in ms when the event was recorded, relative to the main process start time.
  • category: String, identifier. The category is a group name for events and helps to avoid name conflicts.
  • method: String, identifier. This describes the type of event that occurred, e.g. click, keydown or focus.
  • object: String, identifier. This is the object the event occurred on, e.g. reload_button or urlbar.
  • value: String, optional, may be null. This is a user defined value, providing context for the event.
  • extra: Object, optional, may be null. This is an object of the form {"key": "value", ...}, both keys and values need to be strings. This is used for events when additional richer context is needed.

See also the Firefox Telemetry documentation.

Event data collection

Firefox event collection

To collect this event data in Firefox there are different APIs in Firefox, all addressing different use cases:

For all these APIs, events will get sent to the pipeline through the event ping, which gets sent hourly, if any pings were recorded, or up to every 10 minutes whenever 1000 events were recorded. Before Firefox 62, events were sent through the main ping instead, with a hard limit of 500 events per ping. From Firefox 61, all events recorded through these APIs are automatically counted in scalars.

Finally, custom pings can follow the event data format and potentially connect to the existing tooling with some integration work.

Mobile event collection

Mobile events data primarily flows through the mobile events ping (ping schema), from e.g. Firefox iOS, Firefox for Fire TV and Rocket.

Currently we also collect event data from Firefox Focus through the focus-events ping, using the telemetry-ios and telemetry-android libraries.

Datasets

On the pipeline side, the event data is made available in different datasets:

  • main_summary has a row for each main ping and includes its event payload for Firefox versions before 62.
  • events contains a row for each event received from main pings and event pings. See this sample query.
  • telemetry_mobile_event_parquet contains a row for each mobile event ping. See this sample query.
  • focus_events_longitudinal currently contains events from Firefox Focus.

Data tooling

The above datasets are all accessible through Re:dash and Spark jobs.

For product analytics based on event data, we have Amplitude (hosted by the IT data team). We can connect our event data sources data to Amplitude. We have an active connector to Amplitude for mobile events, which can push event data over daily. For Firefox Desktop events this will be available soon.