Telemetry Aggregates Reference
Introduction
The telemetry_aggregates dataset is a daily aggregation of pings that
combines their histograms across a set of dimensions.
Rows and Columns
There is one column for each dimension and one for the histogram. Each row is a distinct combination of dimension values, along with its associated histogram.
Accessing the Data
This dataset is accessible via STMO by selecting from telemetry_aggregates.
The data is stored as a parquet table in S3 at the following address.
s3://telemetry-parquet/aggregates_poc/v1/
Data Reference
Example Queries
Here's an example query that shows the number of pings received per
submission_date for the dimensions provided.
SELECT
    submission_date,
    SUM(count) AS pings
FROM
    telemetry_aggregates
WHERE
    channel = 'nightly'
    AND metric = 'GC_MS'
    AND aggregate_type = 'build_id'
    AND period = '201901'
GROUP BY
    submission_date
ORDER BY
    submission_date
;
Sampling
Invalid Pings
We ignore invalid pings in our processing. A ping is invalid if any of the following is true:
- The submission date is invalid or missing.
- The build ID is malformed.
- The docType field is missing or unknown.
- The build ID is older than the cutoff, in days, defined for its channel. (See the BUILD_ID_CUTOFFS variable in the code for the maximum number of days per channel.)
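The rules above can be sketched as a single predicate. This is only an illustration, not the job's actual code: the ping layout (a flat dict with submission_date, build_id, and docType keys), the YYYYMMDD submission date format, the KNOWN_DOC_TYPES set, the cutoff values, and the choice to measure build age relative to the submission date are all assumptions made for the example. The real per-channel limits live in the BUILD_ID_CUTOFFS variable in the code.

```python
from datetime import datetime, timedelta

# Hypothetical values for illustration only; the real job defines its own.
KNOWN_DOC_TYPES = {"main", "saved_session"}
BUILD_ID_CUTOFFS = {"nightly": 30, "release": 365}  # max build age in days


def parse_or_none(value, fmt):
    """Return a datetime parsed from value, or None if missing or malformed."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


def is_valid_ping(ping, channel):
    """Apply the four rules listed above to a ping represented as a dict."""
    submitted = parse_or_none(ping.get("submission_date"), "%Y%m%d")
    if submitted is None:
        return False  # submission date invalid or missing
    built = parse_or_none(ping.get("build_id"), "%Y%m%d%H%M%S")
    if built is None:
        return False  # build ID malformed
    if ping.get("docType") not in KNOWN_DOC_TYPES:
        return False  # docType missing or unknown
    cutoff = BUILD_ID_CUTOFFS.get(channel)
    # Assumption: build age is measured relative to the submission date.
    if cutoff is not None and submitted - built > timedelta(days=cutoff):
        return False  # build older than the channel's cutoff
    return True
```

Returning early on the first failed rule keeps each check independent, so a ping with several problems is still rejected without needing the later fields to be well formed.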
Scheduling
The telemetry_aggregates job is run daily, at midnight UTC.
The job is scheduled on Airflow.
The DAG is here
Schema
The telemetry_aggregates table has a set of dimensions and a set of
aggregates for those dimensions.
The following columns are the partitioned dimensions. Filtering on one of these fields limits the number of rows read and can make a query run significantly faster:
- metric is the name of the metric, like "GC_MS".
- aggregate_type is the type of aggregation, either "build_id" or "submission_date", representing how this aggregation was grouped.
- period is a string representing the month, in YYYYMM format, in which a ping was submitted, like '201901'.
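Because period holds the submission month, a date range maps to a short list of period values that can be added to a WHERE clause. The following is a minimal sketch of that mapping; periods_between is a hypothetical helper written for this example, not part of any telemetry library.

```python
from datetime import date


def periods_between(start, end):
    """Return the YYYYMM period strings covering start..end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    periods = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        periods.append(f"{year:04d}{month:02d}")
        # Advance one calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return periods


# Build the partition filter for a query spanning a year boundary.
wanted = periods_between(date(2018, 12, 15), date(2019, 1, 20))
period_filter = "period IN ({})".format(", ".join(f"'{p}'" for p in wanted))
```

The resulting period_filter string can be combined with a submission_date condition, so the partition filter narrows the data first and the date condition trims the partial months.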
The rest of the dimensions are:
- submission_date is the date pings were submitted for a particular aggregate.
- channel is the channel, like release or beta.
- version is the program version, like 46.0a1.
- build_id is the YYYYMMDDhhmmss timestamp the program was built, like 20190123192837.
- application is the program name, like Firefox or Fennec.
- architecture is the architecture that the program was built for (not necessarily the one it is running on).
- os is the name of the OS the program is running on, like Darwin or Windows_NT.
- os_version is the version of the OS the program is running on.
- key is the key of a keyed metric. This will be empty if the underlying metric is not a keyed metric.
- process_type is the process the histogram was recorded in, like content or parent.
The aggregates are:
- count is the total number of pings for the set of dimensions.
- sum is the sum of the histogram values for the set of dimensions.
- histogram is the aggregated histogram for the set of dimensions.
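To illustrate how the three aggregates relate, the sketch below rolls several rows up into one by adding count, sum, and the histogram bucket by bucket. The row layout is an assumption made for the example: each histogram is represented as a dict mapping a bucket to its count, which may not match how the column is actually stored, and merge_rows is a hypothetical helper.

```python
from collections import Counter


def merge_rows(rows):
    """Combine aggregate rows that should collapse into a single row."""
    total_count = 0
    total_sum = 0
    histogram = Counter()
    for row in rows:
        total_count += row["count"]
        total_sum += row["sum"]
        histogram.update(row["histogram"])  # adds counts bucket by bucket
    return {"count": total_count, "sum": total_sum, "histogram": dict(histogram)}


# Two toy rows that differ only in a dimension being dropped from the grouping.
rows = [
    {"count": 2, "sum": 30, "histogram": {0: 1, 10: 1, 20: 1}},
    {"count": 1, "sum": 10, "histogram": {10: 1}},
]
merged = merge_rows(rows)
```

This mirrors what the example query does with SUM(count) when it groups by fewer dimensions than the table stores.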