Events
The telemetry.events and telemetry.events_1pct derived datasets make it easier to analyze the desktop Firefox event ping.
They have the following advantages over accessing the raw ping table (telemetry.event):
- There is no need to UNNEST the events column: this is already done for you.
- You don't have to know which process type emitted your event. If you care, you can query the event_process column.
- The data is clustered on the event_category column, which can dramatically speed up your query.
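As a rough illustration of the second and third points, the sketch below counts events per emitting process while filtering on the clustering column. The 'pwmgr' category, the date, and the mozdata project prefix are borrowed from the example query in this document. The query has not been run against live data, so treat it as a starting point rather than a reference implementation.

```sql
-- Count events by emitting process for one category and one day.
-- Filtering on event_category lets BigQuery take advantage of clustering.
SELECT
  event_process,
  COUNT(*) AS n_events
FROM
  mozdata.telemetry.events
WHERE
  submission_date = '2020-04-20'
  AND sample_id = 0
  AND event_category = 'pwmgr'
GROUP BY
  event_process
ORDER BY
  n_events DESC
```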
Data Reference
The events dataset contains one row for each event submitted in an event ping for that day.
The timestamp, category, method, object, value, and extra fields of the event are mapped to columns named event_timestamp, event_category, event_method, event_object, event_string_value, and event_map_values.
To read a single key from event_map_values, you can use the mozfun.map.get_key UDF, like SELECT mozfun.map.get_key(event_map_values, "branch") AS branch FROM telemetry.events.
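A fuller query built around the same UDF might look like the sketch below. The 'normandy' category and 'enroll' method are assumptions chosen only because such events plausibly carry a branch key; substitute the category and method of the events you actually care about.

```sql
-- Count events per value of the "branch" key in event_map_values.
SELECT
  mozfun.map.get_key(event_map_values, 'branch') AS branch,
  COUNT(*) AS n_events
FROM
  mozdata.telemetry.events
WHERE
  submission_date = '2020-04-20'
  AND sample_id = 0
  AND event_category = 'normandy'  -- assumed category, for illustration only
  AND event_method = 'enroll'      -- assumed method, for illustration only
GROUP BY
  branch
ORDER BY
  n_events DESC
```

Events that lack the key are expected to land in a NULL branch group, which can be a quick way to check how consistently the key is populated.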
Please note that event_timestamp refers to the time in milliseconds when the event was recorded relative to the main process start time (session_start_time), while the timestamp column refers to the time the ping was ingested. event_timestamp is useful for determining relative order of events within a single session. Adding event_timestamp to session_start_time will allow you to approximate the absolute time an event occurred, subject to client clock skew and other factors.
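The addition described above could be sketched as follows. This assumes that session_start_time is stored as a TIMESTAMP, that event_timestamp is an integer number of milliseconds, and that a client_id column is available for grouping events by client; check the table schema before relying on any of these.

```sql
-- Approximate the absolute time of each event and list events in order
-- within each client session.
SELECT
  client_id,  -- assumed column
  session_start_time,
  event_timestamp,
  -- session start plus the millisecond offset; subject to client clock skew
  TIMESTAMP_ADD(session_start_time, INTERVAL event_timestamp MILLISECOND) AS approx_event_time,
  event_method,
  event_object
FROM
  mozdata.telemetry.events
WHERE
  submission_date = '2020-04-20'
  AND sample_id = 0
  AND event_category = 'pwmgr'
ORDER BY
  client_id,
  session_start_time,
  event_timestamp
LIMIT 100
```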
Sample of events: telemetry.events_1pct
The telemetry.events_1pct table is a consistent 1% sample from telemetry.events (sample_id = 0) that includes 6 months of history. Using the sampled table can be faster than hitting telemetry.events, particularly when iterating on a prototype query.
BigQuery is also better able to estimate the amount of data it will scan when querying events_1pct, so queries on events_1pct may be able to succeed where the equivalent query on events with a sample_id = 0 filter would be rejected due to the query appearing to scan many TB of data.
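A prototype query against the sampled table might look like the sketch below. The mozdata project prefix is assumed by analogy with the fully qualified table name used elsewhere in this document, and the date is computed relative to today so that it falls inside the 6 months of retained history.

```sql
-- Count events per method for one category in the 1% sample.
-- No sample_id filter is needed: the table only contains sample_id = 0.
SELECT
  event_method,
  COUNT(*) AS n_events
FROM
  mozdata.telemetry.events_1pct  -- project prefix is an assumption
WHERE
  submission_date = DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
  AND event_category = 'pwmgr'
GROUP BY
  event_method
ORDER BY
  n_events DESC
```

Because the table is a 1% sample, counts need to be multiplied by roughly 100 to estimate totals for the full population.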
Example Query
This query counts the number of times users initiated the dismiss_breach_alert and learn_more_breach actions. Note the filter on event_category, the column the table is clustered on, to optimize the query: for this example, this reduces the amount of data scanned from 450 GB to 52 MB.
SELECT
  COUNTIF(event_method = 'dismiss_breach_alert') AS n_dismissing_breach_alert,
  COUNTIF(event_method = 'learn_more_breach') AS n_learn_more
FROM
  mozdata.telemetry.events
WHERE
  event_category = 'pwmgr'
  AND submission_date = '2020-04-20'
  AND sample_id = 0
Scheduling
The events dataset is updated daily.
The job is scheduled on Airflow.
The DAG is defined in dags/copy_deduplicate.py.
Code Reference
This dataset is generated by BigQuery ETL. The query that generates the dataset is sql/moz-fx-data-shared-prod/telemetry_derived/event_events_v1/query.sql.
More Information
Firefox has an API to record events, which are then submitted through the event ping.
The format and mechanism of event collection in Firefox are documented in the Firefox source documentation.
The full events data pipeline is documented in the event pipeline documentation.