This document will help you find the best data source for a given analysis. It focuses on descriptive datasets and does not cover anything attempting to explain why something is observed. This guide will help if you need to answer questions like:
- How many Firefox users are active in Germany?
- How many crashes occur each day?
- How many users have installed a specific add-on?
If you want to know whether a causal link exists between two events, you can learn more at tools for experimentation.
There are two types of datasets that you might want to use: those based on raw pings and those derived from them.
We receive data from Firefox users via pings: small JSON payloads sent by clients at specified intervals. There are many types of pings, each containing different measurements and sent for different purposes.
These pings are then aggregated into ping-level datasets that can be retrieved using BigQuery. Pings can be difficult to work with and expensive to query: where possible, you should use a derived dataset to answer your question.
For more information on pings and how to use them, see Raw Ping Data.
Derived datasets are built using the raw ping data above with various transformations to make them easier to work with and help you avoid the pitfall of pseudo-replication. You can find a full list of them in the derived datasets section, but two commonly used ones are "Clients Daily" and "Clients Last Seen".
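To illustrate the pseudo-replication pitfall, here is a minimal Python sketch. The ping records and the field names (client_id, submission_date, active_hours) are invented for illustration and do not reflect a real ping schema. Because one client can send several pings in a day, counting pings overstates the number of clients; collapsing to one row per client per day, as a derived dataset would, gives one observation per client.

```python
from collections import defaultdict

# Hypothetical, simplified pings: one client may send several per day.
pings = [
    {"client_id": "a", "submission_date": "2024-01-01", "active_hours": 1.0},
    {"client_id": "a", "submission_date": "2024-01-01", "active_hours": 0.5},
    {"client_id": "a", "submission_date": "2024-01-02", "active_hours": 2.0},
    {"client_id": "b", "submission_date": "2024-01-01", "active_hours": 3.0},
]


def collapse_to_client_days(pings):
    """Aggregate pings into one total per (client_id, submission_date)."""
    totals = defaultdict(float)
    for ping in pings:
        key = (ping["client_id"], ping["submission_date"])
        totals[key] += ping["active_hours"]
    return dict(totals)


client_days = collapse_to_client_days(pings)

# Naive count: treats every ping as an independent observation.
pings_on_day = sum(1 for p in pings if p["submission_date"] == "2024-01-01")

# Derived count: exactly one observation per client for that day.
clients_on_day = sum(1 for (_, day) in client_days if day == "2024-01-01")
```

In this toy data, the ping count for 2024-01-01 is larger than the client count for the same day, which is the kind of inflation a derived dataset is meant to prevent.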
Many questions about Firefox take the form "What did clients with
characteristics X, Y, and Z do during the period S to E?" The
clients_daily table aims to answer these questions. Each row in
the table is a (client_id, submission_date) pair and contains a
number of aggregates about that day's activity.
See the clients_daily reference for more information.
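A question of this form maps onto a filter over client-day rows followed by an aggregation. The following is a rough Python sketch over in-memory rows, not a query against the real table; the column names country and active_hours_sum and the sample values are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical rows shaped like a client-day table (one row per client per day).
rows = [
    {"client_id": "a", "submission_date": date(2024, 1, 1), "country": "DE", "active_hours_sum": 1.5},
    {"client_id": "a", "submission_date": date(2024, 1, 2), "country": "DE", "active_hours_sum": 2.0},
    {"client_id": "b", "submission_date": date(2024, 1, 1), "country": "FR", "active_hours_sum": 3.0},
    {"client_id": "c", "submission_date": date(2024, 1, 9), "country": "DE", "active_hours_sum": 4.0},
]


def summarize(rows, country, start, end):
    """Summarize activity of clients in `country` from start to end, inclusive."""
    selected = [
        r for r in rows
        if r["country"] == country and start <= r["submission_date"] <= end
    ]
    return {
        "clients": len({r["client_id"] for r in selected}),  # distinct clients
        "client_days": len(selected),                        # rows matched
        "active_hours": sum(r["active_hours_sum"] for r in selected),
    }


# "What did clients in Germany do between 2024-01-01 and 2024-01-07?"
summary = summarize(rows, "DE", date(2024, 1, 1), date(2024, 1, 7))
```

Because each row is already one client on one day, distinct clients and client-days can be counted directly without first deduplicating pings.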
The clients_last_seen dataset is useful for efficiently determining exact
user counts such as DAU and MAU.
It also allows efficient calculation of other windowed usage metrics, such as retention, via its
bit pattern fields.
It includes the most recent values in a 28-day window for all columns in the
clients_daily dataset.
See the clients_last_seen reference for more information.
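The idea behind bit pattern fields can be sketched in Python. This sketch assumes each client has a 28-bit integer in which the least significant bit stands for the row's date and bit i stands for i days earlier; that encoding, the helper names, and the sample data are assumptions made for illustration, so check the reference for the actual field definitions.

```python
WINDOW_DAYS = 28


def was_active(bits, days_ago):
    """True if the bit for `days_ago` (0 = the row's own date) is set."""
    return (bits >> days_ago) & 1 == 1


def active_in_last(bits, n_days):
    """True if any of the `n_days` most recent bits is set."""
    mask = (1 << n_days) - 1
    return bits & mask != 0


# Hypothetical rows for a single date: client -> activity bit pattern.
last_seen = {
    "a": 0b1,         # active on the row's date only
    "b": 0b10000000,  # last active 7 days earlier
    "c": 0b10000001,  # active on the row's date and 7 days earlier
}

# DAU and MAU come from the same rows, using different window widths.
dau = sum(active_in_last(bits, 1) for bits in last_seen.values())
mau = sum(active_in_last(bits, WINDOW_DAYS) for bits in last_seen.values())

# One-week retention: of the clients active 7 days earlier, how many
# are also active on the row's date?
cohort = [bits for bits in last_seen.values() if was_active(bits, 7)]
retained = sum(was_active(bits, 0) for bits in cohort)
```

Under this encoding, a single row per client is enough to answer any windowed question within the 28 days, which is why no scan over 28 separate days of data is needed.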