The following describes in detail how we structure Glean data in BigQuery. For information on the actual software which does this, see the Generated Schemas reference.
Each ping type is recorded in its own table, and these tables are named using
For example, for Fenix, the application id is
org.mozilla.fenix, so its
metrics pings are available in the table
Fields are nested inside BigQuery STRUCTs to organize them into groups, and we can use dot notation to specify individual subfields in a query.
For example, columns containing Glean's built-in client information are in the
client_info struct, so accessing its columns involves using a
The top-level groups are:
client_info: Client information provided by Glean.
ping_info: Ping information provided by Glean.
metrics: Custom metrics defined by the application and its libraries.
events: Custom events defined by the application and its libraries.
Custom metrics in the
metrics section have two additional levels of indirection in their column name: they are organized by the metric type, and then by their category:
For example, suppose you had the following
boolean metric defined in a
metrics.yaml file (abridged for clarity):
browser: is_default: type: boolean description: > Is this application the default browser? send_in_pings: - metrics
It would be available in the column
-- Count number of pings where Fenix is the default browser SELECT COUNT(*), COUNTIF(metrics.boolean.browser_is_default) FROM -- We give the table an alias so that the table name `metrics` and field name -- `metrics` don't conflict. org_mozilla_fenix.metrics AS m WHERE date(submission_timestamp) = '2019-11-11'
Custom events in the
events section have a different structure.
Documentation TBD. See bug 1606836