- Data Reference
The `telemetry_aggregates` dataset is a daily aggregation of the pings,
aggregating the histograms across a set of dimensions.
There is one column for each dimension and one for the histogram. Each row is a distinct combination of dimension values, along with its associated histogram.
This dataset is accessible via STMO by selecting from `telemetry_aggregates`.
The data is stored as a parquet table in S3 at the following address.
Here's an example query that shows the number of pings received per
`submission_date` for the dimensions provided.

    SELECT
        submission_date,
        SUM(count) AS pings
    FROM telemetry_aggregates
    WHERE channel = 'nightly'
        AND metric = 'GC_MS'
        AND aggregate_type = 'build_id'
        AND period = '201901'
    GROUP BY submission_date
    ORDER BY submission_date;
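
To see what the query returns, the sketch below runs it against a small in-memory SQLite table. This is only an illustration: the real dataset is queried through STMO, the table here contains just the columns the query touches, the rows are made up, and storing `submission_date` as a `YYYYMMDD` string is an assumption.

```python
import sqlite3

# Toy stand-in for telemetry_aggregates, limited to the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE telemetry_aggregates (
        submission_date TEXT,  -- assumed YYYYMMDD string
        channel TEXT,
        metric TEXT,
        aggregate_type TEXT,
        period TEXT,
        count INTEGER
    )
    """
)

# Made-up rows for illustration, not real telemetry.
rows = [
    ("20190101", "nightly", "GC_MS", "build_id", "201901", 10),
    ("20190101", "nightly", "GC_MS", "build_id", "201901", 5),
    ("20190102", "nightly", "GC_MS", "build_id", "201901", 7),
    ("20190102", "beta", "GC_MS", "build_id", "201901", 99),  # excluded: other channel
]
conn.executemany(
    "INSERT INTO telemetry_aggregates VALUES (?, ?, ?, ?, ?, ?)", rows
)

QUERY = """
SELECT submission_date, SUM(count) AS pings
FROM telemetry_aggregates
WHERE channel = 'nightly'
  AND metric = 'GC_MS'
  AND aggregate_type = 'build_id'
  AND period = '201901'
GROUP BY submission_date
ORDER BY submission_date
"""

# One output row per submission_date, with the ping counts of matching rows summed.
result = conn.execute(QUERY).fetchall()
print(result)
```

Each matching row contributes its `count` to the total for its `submission_date`, so several rows that differ only in other dimensions collapse into a single output row per day.
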
We ignore invalid pings in our processing. A ping is considered invalid if any of the following is true:
- The submission date is invalid or missing.
- The build ID is malformed.
- The `docType` field is missing or unknown.
- The build ID is older than a defined cutoff, in days. (See the `BUILD_ID_CUTOFFS` variable in the code for the maximum number of days per channel.)
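
The validity rules above can be sketched as a small filter. This is not the aggregation code itself: the function names, the flat dictionary layout of a ping, the `EXAMPLE_CUTOFF_DAYS` and `EXAMPLE_KNOWN_DOC_TYPES` values, and the choice to measure build age relative to the submission date are all assumptions made for illustration. The real per-channel limits are in `BUILD_ID_CUTOFFS`.

```python
from datetime import datetime

# Placeholder limits for illustration only; the real values are defined by
# the BUILD_ID_CUTOFFS variable in the aggregation code.
EXAMPLE_CUTOFF_DAYS = {"nightly": 180, "release": 365}

# Placeholder set of recognized document types (hypothetical).
EXAMPLE_KNOWN_DOC_TYPES = {"main"}


def parse_day(value):
    """Parse a YYYYMMDD string into a date, or return None if it is invalid."""
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except (TypeError, ValueError):
        return None


def parse_build_id(value):
    """Parse a YYYYMMDDhhmmss build ID into its build date, or None if malformed."""
    if not isinstance(value, str) or len(value) != 14:
        return None
    try:
        return datetime.strptime(value, "%Y%m%d%H%M%S").date()
    except ValueError:
        return None


def is_valid_ping(ping):
    """Apply the four rules in order; a ping failing any rule is ignored."""
    submitted = parse_day(ping.get("submission_date"))
    if submitted is None:
        return False  # submission date invalid or missing
    built = parse_build_id(ping.get("build_id"))
    if built is None:
        return False  # build ID malformed
    if ping.get("docType") not in EXAMPLE_KNOWN_DOC_TYPES:
        return False  # docType missing or unknown
    cutoff = EXAMPLE_CUTOFF_DAYS.get(ping.get("channel"))
    if cutoff is not None and (submitted - built).days > cutoff:
        return False  # build older than the channel's cutoff
    return True
```

For example, a ping with `submission_date` `"20190115"`, `build_id` `"20190110123000"`, `docType` `"main"`, and `channel` `"nightly"` passes all four checks under these placeholder settings.
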
The `telemetry_aggregates` table has a set of dimensions and a set of
aggregates for those dimensions.
The partitioned dimensions are the following columns. Filtering on one of these fields limits the number of rows scanned and can make a query run significantly faster:
- `metric` is the name of the metric, like `"GC_MS"`.
- `aggregate_type` is the type of aggregation, either `"build_id"` or `"submission_date"`, representing how this aggregation was grouped.
- `period` is a string representing the month, in `YYYYMM` format, that a ping was submitted, like `"201901"`.
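
Because `period` is a partitioned column, it helps to know which `YYYYMM` values cover the dates of interest before writing a filter. The helpers below are a minimal sketch of that conversion. Their names are hypothetical, and they only restate the `YYYYMM` format described above.

```python
from datetime import date
from typing import List


def period_for(submission_day: date) -> str:
    """Return the YYYYMM period string for the month containing the given day."""
    return submission_day.strftime("%Y%m")


def periods_between(start: date, end: date) -> List[str]:
    """Return every YYYYMM period from start's month through end's month."""
    periods = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        periods.append(f"{year:04d}{month:02d}")
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1
    return periods


print(period_for(date(2019, 1, 15)))
print(periods_between(date(2018, 11, 20), date(2019, 1, 5)))
```

The resulting strings can be used in a `period = '...'` or `period IN (...)` condition so that a query only touches the partitions it needs.
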
The rest of the dimensions are:
- `submission_date` is the date pings were submitted for a particular aggregate.
- `channel` is the channel, like `"nightly"`.
- `version` is the program version.
- The build ID is the `YYYYMMDDhhmmss` timestamp of when the program was built.
- `application` is the program name.
- `architecture` is the architecture that the program was built for (not necessarily the one it is running on).
- `os` is the name of the OS the program is running on.
- `os_version` is the version of the OS the program is running on.
- `key` is the key of a keyed metric. This will be empty if the underlying metric is not a keyed metric.
- `process_type` is the process the histogram was recorded in.
The aggregates are:
- `count` is the aggregate sum of the number of pings for each set of dimensions.
- `sum` is the aggregate sum of the histogram values for each set of dimensions.
- `histogram` is the aggregated histogram for each set of dimensions.
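
To make the three aggregates concrete, the sketch below combines two partial aggregates for the same set of dimensions. It assumes a histogram can be represented as a list of bucket counts with a fixed bucket layout, which is an illustrative simplification and not necessarily how the column is stored. The `Aggregate` class and `merge` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Aggregate:
    count: int            # number of pings
    sum: int              # sum of the histogram values
    histogram: List[int]  # assumed layout: one count per bucket


def merge(a: Aggregate, b: Aggregate) -> Aggregate:
    """Combine two aggregates that share the same dimensions."""
    if len(a.histogram) != len(b.histogram):
        raise ValueError("histograms must use the same bucket layout")
    return Aggregate(
        count=a.count + b.count,
        sum=a.sum + b.sum,
        # Bucket counts are added position by position.
        histogram=[x + y for x, y in zip(a.histogram, b.histogram)],
    )


merged = merge(Aggregate(2, 30, [1, 1, 0]), Aggregate(1, 5, [0, 0, 1]))
print(merged)
```

Under this representation all three aggregates are plain sums, so partial results can be combined in any order.
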