Telemetry Aggregates Reference
Introduction
The telemetry_aggregates dataset is a daily aggregation of pings: it combines the histograms from those pings across a set of dimensions.
Rows and Columns
There is one column for each dimension and one for each aggregate (count, sum and histogram). Each row represents a distinct combination of dimension values, along with the aggregates for that combination.
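The sketch below shows what "one row per distinct set of dimensions" means. It groups a few pings by a tuple of dimension values. The ping dicts and the three-dimension subset are hypothetical, chosen for illustration, and do not reflect the real ping format or the aggregation job's code. Each distinct key corresponds to one row, and its value is that row's ping count.

```python
from collections import Counter

# HYPOTHETICAL pings; only three of the dimensions from the Schema section are used.
pings = [
    {"channel": "nightly", "os": "Linux", "process_type": "parent"},
    {"channel": "nightly", "os": "Linux", "process_type": "parent"},
    {"channel": "nightly", "os": "Darwin", "process_type": "content"},
    {"channel": "beta", "os": "Linux", "process_type": "parent"},
]

DIMENSIONS = ("channel", "os", "process_type")


def rows_by_dimensions(pings):
    """Count pings per distinct tuple of dimension values; each key is one row."""
    return Counter(tuple(p[d] for d in DIMENSIONS) for p in pings)


rows = rows_by_dimensions(pings)
for key, n in sorted(rows.items()):
    print(dict(zip(DIMENSIONS, key)), n)
```

Four pings collapse into three rows here, because the two nightly/Linux/parent pings share every dimension value.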
Accessing the Data
This dataset is accessible via STMO by selecting from telemetry_aggregates.
The data is stored as a Parquet table in S3 at the following address:
s3://telemetry-parquet/aggregates_poc/v1/
Data Reference
Example Queries
Here's an example query that shows the number of pings received per submission_date for the dimensions given in the WHERE clause.
SELECT
    submission_date,
    SUM(count) AS pings
FROM
    telemetry_aggregates
WHERE
    channel = 'nightly'
    AND metric = 'GC_MS'
    AND aggregate_type = 'build_id'
    AND period = '201901'
GROUP BY
    submission_date
ORDER BY
    submission_date
;
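If you want to try the query locally, here is a minimal sketch. It runs the same SELECT against an in-memory SQLite table through Python's built-in sqlite3 module. SQLite is only a stand-in for the engine behind STMO. The table keeps just the columns the query uses. The sample rows, and the assumption that submission_date is a YYYYMMDD string, are hypothetical.

```python
import sqlite3

# In-memory stand-in for telemetry_aggregates with only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry_aggregates ("
    "submission_date TEXT, channel TEXT, metric TEXT, "
    "aggregate_type TEXT, period TEXT, count INTEGER)"
)

# HYPOTHETICAL sample rows (submission_date assumed to be a YYYYMMDD string).
conn.executemany(
    "INSERT INTO telemetry_aggregates VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("20190101", "nightly", "GC_MS", "build_id", "201901", 10),
        ("20190101", "nightly", "GC_MS", "build_id", "201901", 5),
        ("20190102", "nightly", "GC_MS", "build_id", "201901", 7),
        ("20190102", "beta", "GC_MS", "build_id", "201901", 100),  # excluded: other channel
        ("20190102", "nightly", "GC_MS", "submission_date", "201901", 50),  # excluded: other aggregate_type
    ],
)

QUERY = """
SELECT submission_date, SUM(count) AS pings
FROM telemetry_aggregates
WHERE channel = 'nightly'
  AND metric = 'GC_MS'
  AND aggregate_type = 'build_id'
  AND period = '201901'
GROUP BY submission_date
ORDER BY submission_date
"""

pings_per_day = conn.execute(QUERY).fetchall()
print(pings_per_day)  # [('20190101', 15), ('20190102', 7)]
```

The two 20190101 rows are summed into a single result row. The beta row and the submission_date-aggregated row are filtered out by the WHERE clause.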
Sampling
Invalid Pings
We ignore invalid pings in our processing. A ping is considered invalid if:
- its submission date is invalid or missing;
- its build ID is malformed;
- its docType field is missing or unknown; or
- its build ID is older than a defined cutoff, measured in days. (See the BUILD_ID_CUTOFFS variable in the code for the maximum number of days allowed per channel.)
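The four rules above can be expressed as a single predicate. Here is a minimal Python sketch under stated assumptions. The ping dict keys (submission_date, build_id, doc_type, channel) are hypothetical. The BUILD_ID_CUTOFFS values and KNOWN_DOC_TYPES below are placeholders and not the real values from the job's code. Submission dates are assumed to be YYYYMMDD strings. Channels without a cutoff are not age-checked.

```python
from datetime import datetime, timedelta

# HYPOTHETICAL cutoffs in days; the real values live in BUILD_ID_CUTOFFS in the job's code.
BUILD_ID_CUTOFFS = {"nightly": 10, "beta": 30, "release": 90}
# HYPOTHETICAL set of accepted docType values.
KNOWN_DOC_TYPES = {"main", "saved_session"}


def parse_digits(value, fmt, length):
    """Parse a fixed-width, all-digit date string; return None if missing or malformed."""
    if not isinstance(value, str) or len(value) != length or not value.isdigit():
        return None
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


def is_valid_ping(ping):
    """Apply the Invalid Pings rules to a ping dict with hypothetical keys."""
    submission = parse_digits(ping.get("submission_date"), "%Y%m%d", 8)
    if submission is None:
        return False  # submission date invalid or missing
    build = parse_digits(ping.get("build_id"), "%Y%m%d%H%M%S", 14)
    if build is None:
        return False  # malformed build ID
    if ping.get("doc_type") not in KNOWN_DOC_TYPES:
        return False  # docType missing or unknown
    cutoff_days = BUILD_ID_CUTOFFS.get(ping.get("channel"))
    if cutoff_days is not None and submission - build > timedelta(days=cutoff_days):
        return False  # build ID older than the channel's cutoff
    return True


ping = {"submission_date": "20190125", "build_id": "20190123192837",
        "doc_type": "main", "channel": "nightly"}
print(is_valid_ping(ping))
```

The length and isdigit checks are included because strptime alone accepts some single-digit fields, so it would not catch every malformed ID by itself.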
Scheduling
The telemetry_aggregates job runs daily at midnight UTC.
The job is scheduled on Airflow.
The DAG is here.
Schema
The telemetry_aggregates table has a set of dimensions and a set of aggregates for those dimensions.
The partitioned dimensions are the following columns. Filtering on one or more of them limits the amount of data that has to be read, so such queries can run significantly faster:
- metric is the name of the metric, like "GC_MS".
- aggregate_type is the type of aggregation, either "build_id" or "submission_date", representing how this aggregation was grouped.
- period is a string in YYYYMM format representing the month in which a ping was submitted, like '201901'.
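A query covering a date range that crosses months needs to filter on every period it touches to keep the benefit of partitioning. The minimal Python sketch below derives those YYYYMM strings. The helper names period_of and periods_between are hypothetical and not part of any Mozilla tooling.

```python
from datetime import date


def period_of(day):
    """Return the YYYYMM period string for a submission date, e.g. '201901'."""
    return day.strftime("%Y%m")


def periods_between(start, end):
    """Return every YYYYMM period touched by the inclusive date range [start, end]."""
    periods = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        periods.append(f"{year:04d}{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return periods


periods = periods_between(date(2018, 12, 20), date(2019, 2, 3))
print(periods)  # ['201812', '201901', '201902']
```

The resulting values could be placed in a predicate such as period IN ('201812', '201901', '201902'), alongside a submission_date range filter, so that only the relevant partitions should need to be scanned.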
The rest of the dimensions are:
- submission_date is the date pings were submitted for a particular aggregate.
- channel is the channel, like release or beta.
- version is the program version, like 46.0a1.
- build_id is the YYYYMMDDhhmmss timestamp of when the program was built, like 20190123192837.
- application is the program name, like Firefox or Fennec.
- architecture is the architecture that the program was built for (not necessarily the one it is running on).
- os is the name of the OS the program is running on, like Darwin or Windows_NT.
- os_version is the version of the OS the program is running on.
- key is the key of a keyed metric. This will be empty if the underlying metric is not a keyed metric.
- process_type is the process the histogram was recorded in, like content or parent.
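Because build_id is a fixed-width YYYYMMDDhhmmss string, it can be parsed into a timestamp. Comparing two build IDs as strings also gives the same order as comparing their timestamps. The sketch below illustrates both points in Python. parse_build_id is a hypothetical helper.

```python
from datetime import datetime


def parse_build_id(build_id):
    """Convert a YYYYMMDDhhmmss build ID, e.g. '20190123192837', to a datetime."""
    if len(build_id) != 14 or not build_id.isdigit():
        raise ValueError(f"malformed build ID: {build_id!r}")
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


built = parse_build_id("20190123192837")
print(built.isoformat())  # 2019-01-23T19:28:37

# Fixed-width digit strings sort chronologically, so string order
# agrees with the order of the parsed timestamps.
ids = ["20190123192837", "20181231235959", "20190101000000"]
assert sorted(ids) == sorted(ids, key=parse_build_id)
```

This ordering property is why a filter like build_id >= '20190101000000' should select builds from 2019 onward, assuming the column is stored as a string.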
The aggregates are:
- count is the total number of pings for each combination of dimensions.
- sum is the sum of the histogram values for each combination of dimensions.
- histogram is the aggregated histogram for each combination of dimensions.
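All three aggregates are additive. You can therefore combine rows that differ in a dimension you want to drop (for example, summing over os): add the counts, add the sums, and add the histograms bucket by bucket. This is what SUM(count) does in the example query. The minimal Python sketch below assumes that a histogram can be represented as a mapping from bucket to count. That representation, the Aggregate class and the sample values are all hypothetical. The actual column type in the Parquet table may differ.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """HYPOTHETICAL in-memory shape of one row's aggregates."""
    count: int = 0
    sum: int = 0
    histogram: Counter = field(default_factory=Counter)

    def merge(self, other):
        """Combine two rows: counts and sums add, histograms add bucket by bucket."""
        merged_histogram = Counter(self.histogram)
        merged_histogram.update(other.histogram)  # update() adds counts per bucket
        return Aggregate(
            count=self.count + other.count,
            sum=self.sum + other.sum,
            histogram=merged_histogram,
        )


# HYPOTHETICAL rows for the same metric on two operating systems.
linux = Aggregate(count=2, sum=30, histogram=Counter({0: 1, 10: 1}))
darwin = Aggregate(count=1, sum=12, histogram=Counter({10: 1}))
all_os = linux.merge(darwin)
print(all_os.count, all_os.sum, dict(all_os.histogram))
```

Counter.update is used instead of the + operator because + silently drops buckets whose count is zero.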