Glean Data
The following describes in detail how we structure Glean data in BigQuery. For information on the actual software which does this, see the Generated Schemas reference. This document is intended as a reference; if you want a tutorial on how best to access Glean data in BigQuery, see Accessing Glean Data.
Tables
Each ping type is recorded in its own table, and these tables are named using {application_id}.{ping_type}. Dots in the application id are written as underscores in the table name.
For example, for Fenix, the application id is org.mozilla.fenix, so its metrics pings are available in the table org_mozilla_fenix.metrics.
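The naming rule above can be sketched as a small helper. This is an illustration only: the function name glean_table_name is hypothetical, and the only normalization it applies is the dot-to-underscore replacement visible in the Fenix example.

```python
def glean_table_name(application_id: str, ping_type: str) -> str:
    """Build the {application_id}.{ping_type} table name for a ping type."""
    # Assumption: the only change to the application id is replacing dots
    # with underscores, as in org.mozilla.fenix -> org_mozilla_fenix.
    dataset = application_id.replace(".", "_")
    return f"{dataset}.{ping_type}"


if __name__ == "__main__":
    print(glean_table_name("org.mozilla.fenix", "metrics"))
```

Application ids containing other characters may need further handling that this sketch does not attempt.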
Columns
Fields are nested inside BigQuery STRUCTs to organize them into groups, and we can use dot notation to specify individual subfields in a query.
For example, columns containing Glean's built-in client information are in the client_info struct, so accessing its columns involves using a client_info. prefix.
The top-level groups are:
- client_info: Client information provided by Glean.
- ping_info: Ping information provided by Glean.
- metrics: Custom metrics defined by the application and its libraries.
- events: Custom events defined by the application and its libraries.
Ping and Client Info sections
Core attributes sent with every ping are mapped to the client_info and ping_info sections.
For example, the client id is mapped to a column called client_info.client_id.
The metrics group
Custom metrics in the metrics section have two additional levels of indirection in their column name: they are organized by the metric type, and then by their category: metrics.{metric_type}.{category}_{name}.
For example, suppose you had the following boolean metric defined in a metrics.yaml file (abridged for clarity):
browser:
  is_default:
    type: boolean
    description: >
      Is this application the default browser?
    send_in_pings:
      - metrics
It would be available in the column metrics.boolean.browser_is_default.
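As a rough sketch of the column naming pattern above, the following helper assembles the column path from a metric's type, category, and name, and then places it in a query string. The function names are hypothetical, and the generated SQL is only an illustration that has not been run against BigQuery; a real query may need a project qualifier or a date filter that this sketch leaves out.

```python
def metric_column(metric_type: str, category: str, name: str) -> str:
    """Build the metrics.{metric_type}.{category}_{name} column path."""
    return f"metrics.{metric_type}.{category}_{name}"


def select_metric_sql(table: str, column: str) -> str:
    """Return an illustrative query selecting the client id and one metric."""
    return (
        "SELECT\n"
        "  client_info.client_id,\n"
        f"  {column}\n"
        "FROM\n"
        f"  {table}\n"
        "LIMIT 10"
    )


if __name__ == "__main__":
    column = metric_column("boolean", "browser", "is_default")
    print(select_metric_sql("org_mozilla_fenix.metrics", column))
```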
The events group
Events are stored as a set of records in a single column called "events": there might be many events sent as part of a single ping. Each record has the following fields which allow you to filter for the specific metrics of interest:
- category (maps to the metric category)
- name (maps to the metric name)
For example, suppose you had the following event metric defined in a metrics.yaml file (again, abridged for clarity):
engine_tab:
  foreground_metrics:
    type: event
    description: |
      Event collecting data about the state of tabs when the app comes back to
      the foreground.
    extra_keys:
      background_active_tabs:
        description: |
          Number of active tabs (with an engine session assigned) when the app
          went to the background.
    ...
In this case the event's category would be engine_tab and its name would be foreground_metrics.
You can use the record's timestamp and extra fields to get the event's timestamp and specifics related to the event.
For a complete example, see "event metrics" under Accessing Glean Data.
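To make the filtering described in this section concrete, here is a minimal sketch that builds a query string which flattens the events column and filters on category and name. Several things here are assumptions rather than part of this reference: the helper name event_filter_sql is hypothetical, the table org_mozilla_fenix.events is assumed to be where the event's ping lands, and the SQL itself has not been run against BigQuery.

```python
def event_filter_sql(table: str, category: str, name: str) -> str:
    """Return an illustrative query for one event type in a Glean ping table."""
    # UNNEST turns the repeated events column into one row per event record,
    # so that category and name can be filtered like ordinary columns.
    return (
        "SELECT\n"
        "  event.timestamp,\n"
        "  event.extra\n"
        "FROM\n"
        f"  {table},\n"
        "  UNNEST(events) AS event\n"
        "WHERE\n"
        f"  event.category = '{category}'\n"
        f"  AND event.name = '{name}'"
    )


if __name__ == "__main__":
    # The table name is an assumption made for illustration.
    print(event_filter_sql("org_mozilla_fenix.events", "engine_tab", "foreground_metrics"))
```

The values are interpolated directly into the string to keep the sketch short; code that accepts untrusted input should use query parameters instead.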