Accessing Desktop Data
This document will help you find the best data source for a given analysis of Desktop Firefox. It focuses on descriptive datasets and does not cover analyses that try to explain why something is observed. This guide will help if you need to answer questions like:
- How many Firefox users are active in Germany?
- How many crashes occur each day?
- How many users have installed a specific add-on?
If you want to know whether a causal link exists between two events, consider running an experiment.
There are two types of datasets that you might want to use: those based on raw pings and those derived from them.
Raw Ping Datasets
We receive data from Firefox users via pings: small JSON payloads sent by clients at specified intervals. There are many types of pings, each containing different measurements and sent for different purposes.
These pings are then collected into ping-level datasets that can be queried using BigQuery. Pings can be difficult to work with and expensive to query: where possible, you should use a derived dataset to answer your question.
For more information on pings and how to use them, see Raw Ping Data.
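To make the idea of a ping concrete, the sketch below parses a heavily simplified JSON payload in Python. The field names and values are hypothetical placeholders chosen for illustration; they are not the real ping schema, which is covered in Raw Ping Data.

```python
import json

# A simplified, hypothetical ping. Real pings carry many more fields,
# and the names used here are illustrative assumptions only.
raw_ping = """
{
  "type": "main",
  "id": "hypothetical-document-id",
  "clientId": "hypothetical-client-id",
  "creationDate": "2024-01-01T12:00:00.000Z",
  "payload": {"example_measurement": 3}
}
"""

# Each ping is a standalone JSON document, so it parses into a nested dict.
ping = json.loads(raw_ping)

ping_type = ping["type"]
measurement = ping["payload"]["example_measurement"]

print(ping_type, measurement)
```

Even in this toy form, answering a question means reaching into nested fields one ping at a time, which is part of why derived datasets are usually the better starting point.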
Derived Datasets
Derived datasets are built from the raw ping data described above, with transformations applied to make them easier to work with and to help you avoid the pitfall of pseudo-replication. You can find a full list of them in the derived datasets section, but two commonly used ones are "Clients Daily" and "Clients Last Seen".
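Pseudo-replication arises when several pings from one client are treated as independent observations. The following Python sketch uses made-up data to show how a ping-level rate can differ from a client-level rate; the client IDs and the crash flag are hypothetical.

```python
from collections import defaultdict

# Hypothetical ping-level rows: (client_id, crashed_in_this_ping).
# Client "a" sends four pings; clients "b" and "c" send one each.
pings = [
    ("a", True), ("a", True), ("a", True), ("a", True),
    ("b", False),
    ("c", False),
]

# Naive approach: treat every ping as an independent observation.
ping_level_rate = sum(crashed for _, crashed in pings) / len(pings)

# Collapse to one row per client first, then compute the rate.
crashed_by_client = defaultdict(bool)
for client_id, crashed in pings:
    crashed_by_client[client_id] |= crashed

client_level_rate = sum(crashed_by_client.values()) / len(crashed_by_client)

print(f"share of pings with a crash:   {ping_level_rate:.2f}")
print(f"share of clients with a crash: {client_level_rate:.2f}")
```

Here one heavy client dominates the ping-level figure (4 of 6 pings) even though only 1 of 3 clients crashed. Working from a dataset with one row per client per day removes much of this risk.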
Clients Daily
Many questions about Firefox take the form "What did clients with characteristics X, Y, and Z do during the period S to E?" The clients_daily table aims to answer these questions. Each row in the table represents a (client_id, submission_date) pair and contains a number of aggregates about that day's activity.
See the clients_daily reference for more information.
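As a rough illustration of the one-row-per-client-per-day shape, the Python sketch below collapses a handful of made-up pings into rows keyed by (client_id, submission_date). The aggregate names ping_count and active_hours_sum, and the build_daily_rows helper, are assumptions for this example and are not taken from the actual table definition.

```python
from collections import defaultdict

# Hypothetical ping-level input; only client_id and submission_date
# are names taken from the text above.
pings = [
    {"client_id": "a", "submission_date": "2024-01-01", "active_hours": 0.5},
    {"client_id": "a", "submission_date": "2024-01-01", "active_hours": 1.0},
    {"client_id": "a", "submission_date": "2024-01-02", "active_hours": 0.25},
    {"client_id": "b", "submission_date": "2024-01-01", "active_hours": 2.0},
]


def build_daily_rows(pings):
    """Aggregate pings into one row per (client_id, submission_date)."""
    rows = defaultdict(lambda: {"ping_count": 0, "active_hours_sum": 0.0})
    for ping in pings:
        key = (ping["client_id"], ping["submission_date"])
        rows[key]["ping_count"] += 1
        rows[key]["active_hours_sum"] += ping["active_hours"]
    return dict(rows)


daily = build_daily_rows(pings)
for key, aggregates in sorted(daily.items()):
    print(key, aggregates)
```

With the data in this shape, a question such as "what did these clients do between S and E?" becomes a filter on submission_date followed by a second aggregation over client_id.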
Clients Last Seen
The clients_last_seen dataset is useful for efficiently determining exact user counts such as DAU and MAU. It can also allow efficient calculation of other windowed usage metrics like retention via its bit pattern fields. It includes the most recent values in a 28-day window for all columns in the clients_daily dataset.
See the clients_last_seen reference for more information.
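The idea behind a bit pattern field can be sketched in Python as follows. This is a minimal illustration that assumes a 28-bit integer whose least significant bit stands for the most recent day; the function names are hypothetical, and the exact encoding used by the dataset should be checked against the clients_last_seen reference.

```python
WINDOW_DAYS = 28
WINDOW_MASK = (1 << WINDOW_DAYS) - 1  # keeps only the 28 most recent days


def update_bits(previous_bits: int, active_today: bool) -> int:
    """Shift history back one day, drop day 29, and record today's activity."""
    shifted = (previous_bits << 1) & WINDOW_MASK
    return shifted | int(active_today)


def days_since_seen(bits: int):
    """Return days since the most recent active day, or None if never seen."""
    if bits == 0:
        return None
    lowest_set_bit = bits & -bits
    return lowest_set_bit.bit_length() - 1


def active_in_last(bits: int, n_days: int) -> bool:
    """True if any of the n_days most recent bits is set."""
    return (bits & ((1 << n_days) - 1)) != 0


# Activity for one hypothetical client, oldest day first.
bits = 0
for active in [True, False, False, True, False]:
    bits = update_bits(bits, active)

print(bin(bits))                  # activity pattern, most recent day on the right
print(days_since_seen(bits))      # days since the client was last active
print(active_in_last(bits, 1))    # would count toward DAU on the latest day
print(active_in_last(bits, 28))   # would count toward MAU on the latest day
```

Under this assumed encoding, a single row per client answers both "active today?" and "active in the last 28 days?" without scanning 28 days of daily rows, which is what makes exact DAU and MAU counts cheap to compute.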