Experiment monitoring datasets

Experiment monitoring datasets are designed to power dashboards, such as the Experiment Enrollment Grafana dashboard, for monitoring experiments in real time. Currently, datasets are available for monitoring the number of enrollments and the number of searches performed by clients enrolled in experiments.

Experiment enrollment data

moz-fx-data-shared-prod.telemetry_derived.experiment_enrollment_aggregates_live provides enrollment, unenrollment, graduate, update and failure aggregates for experiments and branches over 5-minute intervals. This live view is also the basis of several derived tables:

Dataset name | Description
mozdata.telemetry.experiment_unenrollment_overall | Overall number of clients that unenrolled from experiments
mozdata.telemetry.experiment_enrollment_other_events_overall | Number of events other than enroll and unenroll sent by clients
mozdata.telemetry.experiment_enrollment_cumulative_population_estimate | Cumulative number of clients enrolled in experiments
mozdata.telemetry.experiment_enrollment_overall | Overall number of clients enrolled in experiments
mozdata.telemetry.experiment_enrollment_daily_active_population | Number of daily active clients enrolled in experiments
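
The idea of aggregating events over 5-minute intervals can be pictured with the minimal Python sketch below. It is not the bigquery-etl query behind the live view. It assumes events arrive as hypothetical (timestamp, experiment, branch, event_type) tuples with timezone-aware UTC timestamps, floors each timestamp to the start of its 5-minute window, and counts events per window, experiment, branch and event type. The slug my-experiment is made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Width of each aggregation window.
INTERVAL = timedelta(minutes=5)
# Windows are aligned to the Unix epoch, so they start at :00, :05, :10, ...
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 5-minute window."""
    return ts - (ts - EPOCH) % INTERVAL


def aggregate_events(events):
    """Count events per (window_start, experiment, branch, event_type).

    `events` is an iterable of hypothetical
    (timestamp, experiment, branch, event_type) tuples.
    """
    counts = Counter()
    for ts, experiment, branch, event_type in events:
        counts[(window_start(ts), experiment, branch, event_type)] += 1
    return counts


# Usage with made-up events for a hypothetical experiment slug.
t0 = datetime(2021, 1, 1, 12, 3, tzinfo=timezone.utc)
example = aggregate_events([
    (t0, "my-experiment", "control", "enroll"),
    (t0 + timedelta(minutes=1), "my-experiment", "control", "enroll"),
    (t0 + timedelta(minutes=3), "my-experiment", "treatment", "unenroll"),
])
```

Here the two control enrollments at 12:03 and 12:04 fall into the 12:00 window, while the unenrollment at 12:06 falls into the 12:05 window.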

Experiment search metrics data

moz-fx-data-shared-prod.telemetry_derived.experiment_search_aggregates_live_v1 provides aggregated search metrics of clients enrolled in experiments, such as the number of searches performed, the number of searches with ads and the number of ad clicks. This live view is also the basis of several derived tables:

Dataset name | Description
mozdata.telemetry.experiment_cumulative_ad_clicks | Cumulative number of ad clicks by clients enrolled in experiments
mozdata.telemetry.experiment_cumulative_search_count | Cumulative number of searches by clients enrolled in experiments
mozdata.telemetry.experiment_cumulative_search_with_ads_count | Cumulative number of searches with ads by clients enrolled in experiments
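
The cumulative tables report running totals rather than per-interval counts. The following minimal Python sketch illustrates that relationship, assuming hypothetical (time, experiment, branch, value) rows that hold per-interval counts, such as the number of searches in each 5-minute window. The actual views may compute their totals differently; cumulative_by_branch and my-experiment are illustrative names only.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def cumulative_by_branch(rows):
    """Running totals of `value` per (experiment, branch), in time order.

    `rows` are hypothetical (time, experiment, branch, value) tuples holding
    per-interval counts. Returns {(experiment, branch): [(time, total), ...]}.
    """
    grouped = defaultdict(list)
    for time, experiment, branch, value in rows:
        grouped[(experiment, branch)].append((time, value))

    result = {}
    for key, points in grouped.items():
        running = 0
        series = []
        # Sort by time so the running total accumulates chronologically.
        for time, value in sorted(points):
            running += value
            series.append((time, running))
        result[key] = series
    return result


# Usage: searches per 5-minute window for a hypothetical experiment.
start = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
step = timedelta(minutes=5)
example = cumulative_by_branch([
    (start + step, "my-experiment", "control", 3),
    (start, "my-experiment", "control", 4),
    (start + 2 * step, "my-experiment", "control", 0),
    (start, "my-experiment", "treatment", 2),
])
```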

Derived tables

Derived tables all have the same schema:

Column name | Type | Description
time | TIMESTAMP | Timestamp when value was recorded
branch | STRING | Experiment branch
experiment | STRING | Experiment slug
value | INT64 | Aggregated value
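
When rows from these tables are processed outside BigQuery, the shared schema maps naturally onto a small Python dataclass (TIMESTAMP to datetime, STRING to str, INT64 to int). In this sketch, MonitoringRow and parse_row are hypothetical names, and the input dict with an ISO 8601 time string is an assumed layout for illustration, not a documented export format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MonitoringRow:
    """One row of a derived experiment monitoring table."""
    time: datetime   # BigQuery TIMESTAMP
    branch: str      # BigQuery STRING: experiment branch
    experiment: str  # BigQuery STRING: experiment slug
    value: int       # BigQuery INT64: aggregated value


def parse_row(record: dict) -> MonitoringRow:
    """Build a MonitoringRow from a dict (assumed layout) with an ISO 8601 time."""
    ts = datetime.fromisoformat(record["time"])
    if ts.tzinfo is None:
        # TIMESTAMP values are absolute points in time; treat naive input as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return MonitoringRow(
        time=ts,
        branch=str(record["branch"]),
        experiment=str(record["experiment"]),
        value=int(record["value"]),
    )


# Usage with a made-up record.
example = parse_row(
    {"time": "2021-01-01T12:00:00", "branch": "control",
     "experiment": "my-experiment", "value": "5"}
)
```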

As an example of how these derived tables can be used, the following query determines the cumulative number of clients enrolled to date in each branch of the multi-stage-aboutwelcome-set-default-as-first-screen experiment:

SELECT
    branch,
    SUM(value) AS total_enrolled
FROM `mozdata.telemetry.experiment_enrollment_cumulative_population_estimate`
WHERE experiment = 'multi-stage-aboutwelcome-set-default-as-first-screen'
GROUP BY 1
ORDER BY 2

GCS data export

Because some dashboard solutions, such as the Experimenter console, might not have access to BigQuery, data from the derived experiment monitoring tables is also exported as JSON to the monitoring/ directory of the mozanalysis bucket in the moz-fx-data-experiments project. JSON files are named <experiment_slug>_<monitoring_dataset_name>.json, for example: gs://mozanalysis/monitoring/bug-1683348-rollout-tab-modal-print-ui-roll-out-release-84-85_experiment_unenrollment_overall.json

A script for exporting this data is scheduled to run via Airflow every 5 minutes.
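
The file naming convention can be expressed as a small helper. export_uri is a hypothetical function for illustration; it only encodes the bucket, prefix and <experiment_slug>_<monitoring_dataset_name>.json pattern described for the export, and accepts either a bare dataset name or a fully qualified table name, keeping only the part after the last dot.

```python
# Bucket and prefix used by the GCS export.
EXPORT_BUCKET = "mozanalysis"
EXPORT_PREFIX = "monitoring/"


def export_uri(experiment_slug: str, dataset: str) -> str:
    """Build the gs:// URI of the JSON export for one experiment and dataset.

    `dataset` may be a bare name ("experiment_unenrollment_overall") or a
    fully qualified table name; only the part after the last "." is used.
    """
    dataset_name = dataset.rsplit(".", 1)[-1]
    return f"gs://{EXPORT_BUCKET}/{EXPORT_PREFIX}{experiment_slug}_{dataset_name}.json"


# Usage with the example from the text.
example = export_uri(
    "bug-1683348-rollout-tab-modal-print-ui-roll-out-release-84-85",
    "mozdata.telemetry.experiment_unenrollment_overall",
)
```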

Scheduling

To keep the cost of populating the monitoring live tables low, several jobs have been set up for each of the enrollment and search metrics monitoring live tables:

  • Hourly jobs that materialize the past hour of data from the live tables into the hourly-partitioned telemetry_derived.experiment_enrollment_aggregates_hourly_v1 and telemetry_derived.experiment_search_aggregates_hourly_v1 tables. The jobs run with a 30-minute lag to account for BigQuery sink delays.
  • Daily jobs that update telemetry_derived.experiment_enrollment_aggregates_v1 and telemetry_derived.experiment_search_aggregates_v1 to finalize the numbers using the stable tables.
  • Jobs that run every 5 minutes and write the most recent enrollment and search metrics aggregates, which the hourly jobs have not processed yet, into telemetry_derived.experiment_enrollment_aggregates_recents_v1 and telemetry_derived.experiment_search_aggregates_recents_v1.

The tables derived from the experiment monitoring live tables are also updated every 5 minutes, on the same schedule as the data export script.
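
The split of work between the hourly jobs and the 5-minute jobs can be sketched with a simplified model. This is not the Airflow configuration: it assumes the hourly job for the hour starting at H runs at H + 1h 30min and finishes instantly, ignores the daily finalization, and uses the hypothetical function recents_window to compute the time range that, at a given moment, only the recents tables cover.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumed lag between the end of an hour and the job that materializes it.
HOURLY_LAG = timedelta(minutes=30)


def floor_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def recents_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) of the data not yet in the hourly tables at `now`.

    Under the simplified model, an hour ending at E is materialized once
    now >= E + HOURLY_LAG, so every hour ending at or before
    floor_hour(now - HOURLY_LAG) is already in the hourly tables.
    """
    covered_until = floor_hour(now - HOURLY_LAG)
    return covered_until, now


# Usage: at 12:20 UTC the 12:30 run for 11:00-12:00 has not happened yet,
# so the recents window starts at 11:00.
example_now = datetime(2021, 1, 1, 12, 20, tzinfo=timezone.utc)
example = recents_window(example_now)
```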

Code reference

moz-fx-data-shared-prod.telemetry_derived.experiment_enrollment_aggregates_live and its derived datasets are part of bigquery-etl.

moz-fx-data-shared-prod.telemetry_derived.experiment_search_aggregates_live_v1 and its derived datasets are part of bigquery-etl.