Choosing a Desktop Product Dataset

This document will help you find the best data source for a given analysis. It focuses on descriptive datasets and does not cover anything attempting to explain why something is observed. This guide will help if you need to answer questions like:

  • How many Firefox users are active in Germany?
  • How many crashes occur each day?
  • How many users have installed a specific add-on?

If you want to know whether a causal link occurs between two events, you can learn more at tools for experimentation.

Table of Contents

Raw Pings

We receive data from our users via pings. There are several types of pings, each containing different measurements and sent for different purposes. To review a complete list of ping types and their schemata, see this section of the Mozilla Source Tree Docs.

Pings are also described by a JSONSchema specification which can be found in the mozilla-pipeline-schemas repository.

There are a few pings that are central to delivering our core data collection primitives (Histograms, Events, Scalars) and for keeping an eye on Firefox behaviour (Environment, New Profiles, Updates, Crashes).

For instance, a user's first session in Firefox might have four pings like this:

Flowchart of pings in the user's first session

"main" ping

The "main" ping is the workhorse of the Firefox Telemetry system. It delivers the Telemetry Environment as well as Histograms and Scalars for all process types that collect data in Firefox. It has several variants each with specific delivery characteristics:

ReasonSent whenNotes
shutdownFirefox session ends cleanlyAccounts for about 80% of all "main" pings. Sent by Pingsender immediately after Firefox shuts down, subject to conditions: Firefox 55+, if the OS isn't also shutting down, and if this isn't the client's first session. If Pingsender fails or isn't used, the ping is sent by Firefox at the beginning of the next Firefox session.
dailyIt has been more than 24 hours since the last "main" ping, and it is around local midnightIn long-lived Firefox sessions we might go days without receiving a "shutdown" ping. Thus the "daily" ping is sent to ensure we occasionally hear from long-lived sessions.
environment-changeTelemetry Environment changesIs sent immediately when triggered by Firefox (Installing or removing an addon or changing a monitored user preference are common ways for the Telemetry Environment to change)
aborted-sessionFirefox session doesn't end cleanlySent by Firefox at the beginning of the next Firefox session.

It was introduced in Firefox 38.

"first-shutdown" ping

The "first-shutdown" ping is identical to the "main" ping with reason "shutdown" created at the end of the user's first session, but sent with a different ping type. This was introduced when we started using Pingsender to send shutdown pings as there would be a lot of first-session "shutdown" pings that we'd start receiving all of a sudden.

It is sent using Pingsender.

It was introduced in Firefox 57.

"event" ping

The "event" ping provides low-latency eventing support to Firefox Telemetry. It delivers the Telemetry Environment, Telemetry Events from all Firefox processes, and some diagnostic information about Event Telemetry. It is sent every hour if there have been events recorded, and up to once every 10 minutes (governed by a preference) if the maximum event limit for the ping (default to 1000 per process, governed by a preference) is reached before the hour is up.

It was introduced in Firefox 62.

"update" ping

Firefox Update is the most important means we have of reaching our users with the latest fixes and features. The "update" ping notifies us when an update is downloaded and ready to be applied (reason: "ready") and when the update has been successfully applied (reason: "success"). It contains the Telemetry Environment and information about the update.

It was introduced in Firefox 56.

"new-profile" ping

When a user starts up Firefox for the first time, a profile is created. Telemetry marks the occasion with the "new-profile" ping which sends the Telemetry Environment. It is sent either 30 minutes after Firefox starts running for the first time in this profile (reason: "startup") or at the end of the profile's first session (reason: "shutdown"), whichever comes first. "new-profile" pings are sent immediately when triggered. Those with reason "startup" are sent by Firefox. Those with reason "shutdown" are sent by Pingsender.

It was introduced in Firefox 55.

"crash" ping

The "crash" ping provides diagnostic information whenever a Firefox process exits abnormally. Unlike the "main" ping with reason "aborted-session", this ping does not contain Histograms or Scalars. It contains a Telemetry Environment, Crash Annotations, and Stack Traces.

It was introduced in Firefox 40.

"deletion-request" ping

In the event a user opts out of Telemetry, we send one final "deletion-request" ping to let us know. It contains only the common ping data and an empty payload.

It was introduced in Firefox 72, replacing the "optout" ping (which was in turn introduced in Firefox 63).

"coverage" ping

The coverage ping (announcement) is a periodic census intended to estimate telemetry opt-out rates.

We estimate that 93% of release channel profiles have telemetry enabled (and are therefore included in DAU).

Pingsender

Pingsender is a small application shipped with Firefox which attempts to send pings even if Firefox is not running. If Firefox has crashed or has already shut down we would otherwise have to wait for the next Firefox session to begin to send pings.

Pingsender was introduced in Firefox 54 to send "crash" pings. It was expanded to send "main" pings of reason "shutdown" in Firefox 55 (excepting the first session). It sends the "first-shutdown" ping since its introduction in Firefox 57.

Analysis

The large majority of analyses can be completed using only the main ping. This ping includes histograms, scalars, and other performance and diagnostic data.

Few analyses actually rely directly on any raw ping data. Instead, we provide derived datasets which are processed versions of these data, made to be:

  • Easier and faster to query
  • Organized to make the data easier to analyze
  • Cleaned of erroneous or misleading data

Before analyzing raw ping data, check to make sure there isn't already a derived dataset made for your purpose. If you do need to work with raw ping data, be aware that the volume of data can be high. Try to limit the size of your data by controlling the date range, and start off using a sample.

Accessing the Data

Ping data lives in BigQuery and is accessible in STMO; see the BigQuery cookbook section for more information.

Further Reading

You can find the complete ping documentation. To augment our data collection, see Collecting New Data and the Data Collection Policy.

Main Ping Derived Datasets

The main ping includes most of the measurements that track the performance and health of Firefox in the wild. This ping includes histograms, scalars, and events.

In its raw form, the main ping can be a bit difficult to work with. To make analyzing data easier, some datasets have been provided that simplify and aggregate information provided by the main ping.

Clients Daily

Many questions about Firefox take the form "What did clients with characteristics X, Y, and Z do during the period S to E?" The clients_daily table aims to answer these questions. Each row in the table is a (client_id, submission_date) and contains a number of aggregates about that day's activity.

See the clients_daily reference for more information.

Clients Last Seen

The clients_last_seen dataset is useful for efficiently determining exact user counts such as DAU and MAU. It can also allow efficient calculation of other windowed usage metrics like retention via its bit pattern fields. It includes the most recent values in a 28 day window for all columns in the clients_daily dataset.

See the clients_last_seen reference for more information.