Main Summary (deprecated)
⚠ Since the introduction of BigQuery, we are able to represent the full
main ping structure in a table, available as telemetry.main. As such, main_summary was discontinued as of 2023-10-05.
The main_summary table contains one row for each ping.
Each column represents one field from the main ping payload,
though only a subset of all main ping fields are included.
This dataset does not include most histograms.
This table is massive, and due to its size, it can be difficult to work with.
Instead, we recommend using the clients_daily or clients_last_seen dataset
where possible.
If you do need to query this table, make use of the sample_id field and
limit to a short submission date range.
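As a minimal sketch of that advice, the query below counts pings per day for a single sample_id over a three-day window. It reuses the table and column names from the example query further down this page (telemetry.main_summary, submission_date_s3, sample_id). The dates and the sample value 42 are arbitrary placeholders, and the sketch assumes sample_id spans 0 to 99, so that one value covers roughly 1% of clients.

```sql
-- Hypothetical illustration: restrict both the sample and the date range
-- before doing anything else, so the scan stays small.
SELECT
  submission_date_s3,
  COUNT(*) AS ping_count
FROM
  telemetry.main_summary
WHERE
  -- Short submission date range (placeholder dates)
  submission_date_s3 BETWEEN '2019-11-11' AND '2019-11-13'
  -- Single sample bucket (assumed to be about 1% of clients)
  AND sample_id = 42
GROUP BY
  submission_date_s3
ORDER BY
  submission_date_s3
```

Widening the sample (for example sample_id < 10) trades query cost for precision, so it is usually worth starting with one bucket and expanding only if the result is too noisy.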
Accessing the Data
The main_summary table is accessible through STMO.
See STMO#4201 for an example.
Data Reference
Example Queries
Compare the search volume for different search source values:
WITH search_data AS (
  SELECT
    s.source AS search_source,
    s.count AS search_count
  FROM
    telemetry.main_summary
  CROSS JOIN
    UNNEST(search_counts) AS s
  WHERE
    submission_date_s3 = '2019-11-11'
    AND sample_id = 42
    AND search_counts IS NOT NULL
)
SELECT
  search_source,
  SUM(search_count) AS total_searches
FROM
  search_data
GROUP BY
  search_source
ORDER BY
  SUM(search_count) DESC
Sampling
The main_summary dataset contains one record for each main ping
as long as the record contains a non-null value for
documentId, submissionDate, and Timestamp.
We never expect nulls for these fields.
Scheduling
This dataset is updated daily via the telemetry-airflow infrastructure.
The DAG is defined in
dags/bqetl_main_summary.py
Schema
As of 2019-11-28, the current version of the main_summary dataset is v4.
For more detail on where these fields come from in the
raw data,
please look in the main_summary ETL code.
Most of the fields are simple scalar values, with a few notable exceptions:
- The search_counts field is an array of structs, each item in the array representing a 3-tuple of (engine, source, count). The engine field represents the name of the search engine against which the searches were done. The source field represents the part of the Firefox UI that was used to perform the search. It contains values such as abouthome, urlbar, and searchbar. The count field contains the number of searches performed against this engine+source combination during that subsession. Any of the fields in the struct may be null (for example if the search key did not match the expected pattern, or if the count was non-numeric).
- The loop_activity_counter field is a simple struct containing inner fields for each expected value of the LOOP_ACTIVITY_COUNTER Enumerated Histogram. Each inner field is a count for that histogram bucket.
- The popup_notification_stats field is a map of String keys to struct values, each field in the struct being a count for the expected values of the POPUP_NOTIFICATION_STATS Keyed Enumerated Histogram.
- The places_bookmarks_count and places_pages_count fields contain the mean value of the corresponding Histogram, which can be interpreted as the average number of bookmarks or pages in a given subsession.
- The active_addons field contains an array of structs, one for each entry in the environment.addons.activeAddons section of the payload. More detail in Bug 1290181.
- The disabled_addons_ids field contains an array of strings, one for each entry in payload.addonDetails that is not already reported in the environment.addons.activeAddons section of the payload. More detail in Bug 1390814. Please note that while using this field is generally OK, it was introduced to support the TAAR project and you should not count on it in the future. The field can stay in main_summary, but we might need to slightly change the ping structure to something better than payload.addonDetails.
- The theme field contains a single struct in the same shape as the items in the active_addons array. It contains information about the currently active browser theme.
- The user_prefs field contains a struct with values for preferences of interest.
- The events field contains an array of event structs.
- Dynamically-included histogram fields are present as key->value maps, or key->(key->value) nested maps for keyed histograms.
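To illustrate how the array-of-structs shape of search_counts is typically consumed, here is a hedged sketch that totals searches per engine while tolerating the null struct fields described above. It uses only the struct fields named in the list (engine, source, count) and the filter columns from the example query on this page. The 'unknown' label is a placeholder introduced here, not a value defined by the dataset.

```sql
SELECT
  -- engine may be null when the search key did not match the expected pattern
  COALESCE(s.engine, 'unknown') AS search_engine,
  SUM(s.count) AS total_searches
FROM
  telemetry.main_summary
CROSS JOIN
  -- One output row per (engine, source, count) struct in the array
  UNNEST(search_counts) AS s
WHERE
  submission_date_s3 = '2019-11-11'
  AND sample_id = 42
  -- Skip entries whose count was non-numeric and therefore stored as null
  AND s.count IS NOT NULL
GROUP BY
  search_engine
ORDER BY
  total_searches DESC
```

Because the CROSS JOIN with UNNEST drops rows whose array is null or empty, pings without any searches do not appear in the result at all.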
Time formats
Columns in main_summary may use one of a handful of time formats with different precisions:
| Column Name | Origin | Description | Example | Spark | Presto |
|---|---|---|---|---|---|
| timestamp | stamped at ingestion | nanoseconds since epoch | 1504689165972861952 | from_unixtime(timestamp/1e9) | from_unixtime(timestamp/1e9) |
| submission_date_s3 | derived from timestamp | YYYYMMDD date string of timestamp in UTC | 20170906 | from_unixtime(unix_timestamp(submission_date, 'yyyyMMdd')) | date_parse(submission_date, '%Y%m%d') |
| client_submission_date | derived from HTTP header: Fields[Date] | HTTP date header string sent with the ping | Tue, 27 Sep 2016 16:28:23 GMT | unix_timestamp(client_submission_date, 'EEE, dd MMM yyyy HH:mm:ss zzz') | date_parse(substr(client_submission_date, 1, 25), '%a, %d %b %Y %H:%i:%s') |
| creation_date | creationDate | time of ping creation ISO8601 at UTC+0 | 2017-09-06T08:21:36.002Z | to_timestamp(creation_date, "yyyy-MM-dd'T'HH:mm:ss.SSSXXX") | from_iso8601_timestamp(creation_date) AT TIME ZONE 'GMT' |
| timezone_offset | info.timezoneOffset | timezone offset in minutes | 120 | | |
| subsession_start_date | info.subsessionStartDate | hourly precision, ISO8601 date in local time | 2017-09-06T00:00:00.0+02:00 | | from_iso8601_timestamp(subsession_start_date) AT TIME ZONE 'GMT' |
| subsession_length | info.subsessionLength | subsession length in seconds | 599 | | date_add('second', subsession_length, subsession_start_date) |
| profile_creation_date | environment.profile.creationDate | days since epoch | 15,755 | from_unixtime(profile_creation_date * 86400) | |
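The table above lists only Spark and Presto expressions. Since the data is queried through BigQuery, the following sketch shows how the same conversions might look there, applied to the literal example values from the table so that it runs without touching main_summary. These expressions are not part of the original table, and the output column aliases are made up for illustration.

```sql
SELECT
  -- timestamp: nanoseconds since epoch; BigQuery timestamps hold microseconds,
  -- so integer-divide by 1000 first
  TIMESTAMP_MICROS(DIV(1504689165972861952, 1000)) AS ingestion_time,
  -- submission_date_s3: YYYYMMDD string
  PARSE_DATE('%Y%m%d', '20170906') AS submission_day,
  -- client_submission_date: HTTP date header, assumed to always end in GMT
  PARSE_TIMESTAMP(
    '%a, %d %b %Y %H:%M:%S GMT',
    'Tue, 27 Sep 2016 16:28:23 GMT'
  ) AS client_submission_time,
  -- creation_date: ISO 8601 string in UTC
  TIMESTAMP('2017-09-06T08:21:36.002Z') AS creation_time,
  -- subsession_start_date: ISO 8601 string with a local offset
  TIMESTAMP('2017-09-06T00:00:00.0+02:00') AS subsession_start,
  -- subsession_length: seconds added to the subsession start
  TIMESTAMP_ADD(
    TIMESTAMP('2017-09-06T00:00:00.0+02:00'),
    INTERVAL 599 SECOND
  ) AS subsession_end,
  -- profile_creation_date: days since epoch
  DATE_FROM_UNIX_DATE(15755) AS profile_created
```

In a real query the literals would be replaced by the corresponding column names. Whether each column is stored as a string or already as a native date or timestamp type should be checked against the table schema first.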
User Preferences
These are added in the Main Summary ETL code. They must be available in the ping environment to be included here.
Once added, they appear as top-level fields whose names are prefixed with user_pref_, with dots replaced by underscores and letters lowercased.
For example, dom.ipc.processCount becomes user_pref_dom_ipc_processcount.
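The renaming in that example can be reproduced with a one-line expression. This is only a sketch of the rule as it can be inferred from the single example above (prefix, dots to underscores, lowercase); the ETL code may handle other characters differently.

```sql
SELECT
  -- Prefix with user_pref_, replace dots with underscores, and lowercase
  CONCAT(
    'user_pref_',
    LOWER(REPLACE('dom.ipc.processCount', '.', '_'))
  ) AS column_name
-- For this input the result is user_pref_dom_ipc_processcount
```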
Code Reference
This dataset is generated by bigquery-etl. Refer to this repository for information on how to run or augment the dataset.