Analysis Gotchas

When performing analysis on any data there are some mistakes that are easy to make and details that are easy to overlook. Do you know exactly what question you hope to answer? Is your sample representative of your population? Is your result "real"? How precisely can you state your conclusion?

This document is not about those traps. Instead, it is about quirks and pitfalls specific to Telemetry.

Intermittent issues

Sometimes, despite our efforts, there are problems with the ingestion of data or the faithful creation of datasets.

Ongoing issues of this kind are marked with the [data-quality] whiteboard tag in Bugzilla. See currently open issues.

Especially severe problems with production data should be announced on our fx-data-dev mailing list (see getting help): please consider subscribing to it if you are a current or aspiring data practitioner.

Notable historic events

When looking at trends, it is helpful to be aware of events from the past that might impact comparisons with history. Here are a few to keep in mind:

  • December 4 2019 - AWS Ingestion Pipeline decommissioned. Specifically, the last ping relayed through the AWS machinery had a timestamp of 2019-12-04 22:04:45.912204 UTC.
  • October 29 2019 - Glean SDK Timing Distribution(s) are reporting buckets 1 nanosecond apart. This is due to a potential rounding bug in Glean SDK versions less than 19.0.0. See Bug 1591938.
  • October 23 2019 - Hot-fix shipped through add-ons that reset the Telemetry endpoint preference back to the default for a large number of users.
  • September 1 - October 18 2019 - BigQuery Ping tables are missing the X-PingSender-Version header information. This data is available before and after this time period.
  • May 4 - May 11 2019 - Telemetry source data deleted. No source data is available for this period and derived tables may have missing days or imputed values. Derived tables that depend on multiple days may have affected dates beyond the deletion region.
  • January 31 2019 - Profile-per-install landed in mozilla-central and affects how new profiles are created. See discussion in bigquery-etl#212.
  • October 25 2018 - Many client_ids on Firefox Android were reset to the same client_id. For more information see the blameless post-mortem document here or Bug 1501329.
  • November 2017 - Quantum Launch. There was a surge in new profiles and usage.
  • June 1 and 5, 2016 - Main Summary v4 data is missing for these two days.
  • March 2016 - Unified Telemetry launched.


Pseudo-replication

Telemetry data is a collection of pings. A single main-ping represents a single subsession. Some clients have more subsessions than others.

So when you say "63% of beta 53 has Firefox set as its default browser", make sure you specify that it is 63% of pings, since it is only around 46% of clients. (Apparently users with Firefox Beta 53 set as their default browser submit more main-pings than users who don't.)
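To make the difference concrete, here is a minimal sketch that computes the same share once per ping and once per client. The ping list, the client ids and the helper names are all invented for illustration; they are not real Telemetry data or part of any Telemetry library.

```python
from collections import defaultdict

# Toy main-pings as (client_id, is_default_browser) pairs. Values are made up.
pings = [
    ("a", True), ("a", True), ("a", True), ("a", True),
    ("b", True),
    ("c", False),
    ("d", False),
]


def share_of_pings(pings):
    """Fraction of pings that report Firefox as the default browser."""
    return sum(1 for _, is_default in pings if is_default) / len(pings)


def share_of_clients(pings):
    """Fraction of distinct clients with at least one 'default' ping."""
    by_client = defaultdict(bool)
    for client_id, is_default in pings:
        by_client[client_id] = by_client[client_id] or is_default
    return sum(by_client.values()) / len(by_client)


ping_share = share_of_pings(pings)      # 5 of 7 pings
client_share = share_of_clients(pings)  # 2 of 4 clients
print(f"{ping_share:.0%} of pings, {client_share:.0%} of clients")
```

With this toy data the script prints "71% of pings, 50% of clients": client "a" sends four pings, so it dominates the per-ping figure.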

Profiles vs Users

In the section above you'll notice I said "clients" not "users." That is because of all the things we're able to count, users isn't one of them.

Users can have multiple Firefox profiles running on the same computer at the same time (like developers).

Users can have the same Firefox profile running on several computers on different days of the week (also developers).

The only things we can count are pings and clients. Clients we can count because we send a client_id with each ping that uniquely identifies the profile from which it came. This is generally close enough to our idea of "user" that we can get away with counting profiles and calling them users, but you may run into some instances where the distinction matters.

When in doubt, be precise. You're counting clients.

This article contains a more thorough treatment of the concept of "profiles".

Opt-in vs Opt-out

We don't collect the same information from everyone.

Every profile that doesn't have Telemetry disabled sends us "opt-out" Telemetry. This includes:

Most probes are "opt-in": we do not get information from them unless the user opts into sending us this information. Users can opt-in in two ways:

  1. Using Firefox's Options UI to tick the box that gives us permission
  2. Installing any pre-release version of Firefox

The nature of selection bias is such that the population in #1 is useless for analysis. If you want to encourage users to collect good information for us, ask them to install Beta: it's only slightly harder than finding and checking the opt-in check-box in Firefox.

Trusting Dates

Don't trust client times.

Any timestamp recorded by the user is subject to "clock skew." The user's clock can be set (purposefully or accidentally) to any time at all. The nature of SSL certificates tends to keep this within a certain relatively-accurate window, because a user whose clock is too far in the past or too far in the future might confuse certain expiration-date-checking code.

Examples of client times: crashDate, crashTime, meta/Date, sessionStartDate, subsessionStartDate, profile/creationDate

Examples of server times you can trust: submission_timestamp, submission_date

Note that submission_date does not appear in the ping documentation because it is added in post-processing. It can be found in the meta field of the ping as in the Databricks Example.
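One way to act on this advice is to use the server-side date as the reference and treat a client-reported date as suspect when it falls too far from it. The sketch below is only an illustration: the function name, the 30-day window and the one-day allowance for time zones are arbitrary assumptions, not values used by the Telemetry pipeline.

```python
from datetime import date, timedelta


def client_date_is_plausible(client_date, submission_date, max_lag_days=30):
    """Return True if a client-reported date is close to the server date.

    max_lag_days is an arbitrary illustrative threshold, not a pipeline value.
    """
    earliest = submission_date - timedelta(days=max_lag_days)
    # Allow one day into the "future" to absorb time zone differences.
    latest = submission_date + timedelta(days=1)
    return earliest <= client_date <= latest


submission = date(2019, 6, 3)  # server time: trusted
print(client_date_is_plausible(date(2019, 5, 31), submission))  # True
print(client_date_is_plausible(date(2031, 1, 1), submission))   # False
print(client_date_is_plausible(date(1970, 1, 1), submission))   # False
```

The right window depends on the question being asked; the point is only that the comparison is anchored on a server time.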

Date Formats

Not all dates and times are created equal. Most of the dates and times in Telemetry pings are ISO 8601. Most are full timestamps, though their resolution may differ from being per-second to being per-day.

Then there's profile/creationDate which is just a number of days since epoch (January 1, 1970). Like 17177 for the date 2017-01-11.

Tip: To convert profile/creationDate to a usable date in SQL: DATE_ADD('day', profile_created, DATE '1970-01-01')
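Outside of SQL, the same conversion is a one-liner. This is a small Python sketch; the function name is made up for the example.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def days_since_epoch_to_date(days):
    """Convert a profile/creationDate-style day count into a calendar date."""
    return EPOCH + timedelta(days=days)


print(days_since_epoch_to_date(17177))  # 2017-01-11
```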

In derived datasets ISO dates are sometimes converted to strings in one of two formats: %Y-%m-%d or %Y%m%d.

The date formats for different columns in main_summary are described on the main_summary reference page.

Build ids look like dates but aren't. If you take the first eight characters you can use that as a proxy for the day the build was released.
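As a sketch of that proxy, the snippet below takes the first eight characters of a build id and reads them as a %Y%m%d date. The build id shown is an invented example and the function name is hypothetical.

```python
from datetime import datetime


def build_id_to_date(build_id):
    """Use the first eight characters of a build id as an approximate build day."""
    return datetime.strptime(build_id[:8], "%Y%m%d").date()


# "20170125094131" is a made-up build id used only for illustration.
print(build_id_to_date("20170125094131"))  # 2017-01-25
```

Remember that the result is only a proxy for the release day, as noted above.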

metadata/Date is an HTTP Date header in a RFC 7231-compatible format.

Tip: To parse metadata/Date to a usable date in SQL: DATE_PARSE(SUBSTR(client_submission_date, 1, 25), '%a, %d %b %Y %H:%i:%s')
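In Python, the standard library can parse this header format directly, so no substring trimming is needed. A minimal sketch, using an invented header value:

```python
from email.utils import parsedate_to_datetime

# Example HTTP Date header value, made up for illustration.
header = "Tue, 15 Nov 1994 08:12:31 GMT"

parsed = parsedate_to_datetime(header)
print(parsed.isoformat())
```

Keep in mind that this header is written by the client, so the parsed value is still a client time and subject to clock skew.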


Delays

Telemetry data takes a while to get into our hands. The largest data mule in Telemetry is the main-ping. It is (pending Bug 1336360) sent at the beginning of a client's next Firefox session. If the user shuts down their Firefox for the weekend, we won't get their Friday data until Monday morning.

As a rule of thumb, data from two days ago is usually fairly representative.
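The effect of delay on completeness can be sketched by asking what fraction of one day's pings has arrived after a given number of days. Everything in this example is hypothetical: the (activity date, submission date) pairs are invented, and the activity dates are assumed to be reliable purely for the sake of illustration.

```python
from datetime import date

# Hypothetical (activity_date, submission_date) pairs for one Friday's sessions.
pings = [
    (date(2017, 3, 3), date(2017, 3, 3)),
    (date(2017, 3, 3), date(2017, 3, 4)),
    (date(2017, 3, 3), date(2017, 3, 4)),
    (date(2017, 3, 3), date(2017, 3, 6)),  # Friday session, sent on Monday
]


def fraction_received_within(pings, max_delay_days):
    """Fraction of pings whose submission lag is at most max_delay_days."""
    on_time = sum(
        1 for activity, submitted in pings
        if (submitted - activity).days <= max_delay_days
    )
    return on_time / len(pings)


for lag in range(4):
    print(lag, fraction_received_within(pings, lag))
```

With this toy data, three quarters of Friday's pings have arrived one day later and the last one only shows up after the weekend.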

If you'd like to read more about this subject and look at pretty graphs, there is a series of blog posts here, here and here.


Pingsender

Pingsender greatly reduces delay in sending pings to Mozilla, but only some types of pings are sent by Pingsender. Bug 1310703 introduced Pingsender for crash pings and was merged in Firefox 54, which hit release on June 13, 2017. Bug 1336360 moved shutdown pings to Pingsender and was merged in Firefox 55, which hit release on August 8, 2017. Bug 1374270 added sending health pings on shutdown via Pingsender and was merged in Firefox 56, which hit release on September 28, 2017. Other types of pings are not sent with Pingsender. This is usually okay because Firefox is expected to continue running long enough to send those pings.

Mobile clients do not have Pingsender, so they suffer delay as given in this query.

Submission Date

submission_date is the server time at which a ping is received from the client. We use it to partition many of our data sets.

In Bug 1422892 we decided to standardize on submission_date, for the following reasons:


  • not subject to client clock skew
  • doesn't require normalization
  • good for backfill
  • good for daily processing
  • and usually good enough