Scheduling Queries

Want to refresh a dashboard or other query against our data store automatically? There's a few options available, depending on what you want to do.

sql.telemetry.mozilla.org
bigquery-etl
- Testing changes before deploying
Scheduling Queries using GCP

sql.telemetry.mozilla.org

This is by far the easiest option. Any query on STMO can be made to refresh itself automatically. This can be used to power both dashboards inside sql.telemetry.mozilla.org as well as web sites like the Tab Spinner Dashboard and Numbers that Matter via Redash's CSV and JSON APIs.

However, there are disadvantages to this approach:

Every time it is refreshed, STMO re-runs the query from scratch. This is fine for smaller or less frequently updated dashboards but when querying a large amount of data, this can get expensive.
Aside from hand-rolled JavaScript or Python scripts, there isn't a way to run a query over the results of an STMO query.
Queries on STMO are not peer reviewed and are not supported by Data SRE (read: no one will be notified if your query breaks).

bigquery-etl

Mozilla's Data Engineering maintains a repository called bigquery-etl which can incrementally create datasets based on an SQL query using our Airflow infrastructure. This requires a little bit more work to set up than scheduling a query on STMO, but has the advantage of being more cost effective, reliable, and amenable to further exploration.

Also, since the procedure for scheduling a query in this way is submitting a pull request against the bigquery-etl repository, this is an easy route to getting peer review from the extended Data Team.

For more information on how to use this approach, see A quick guide to creating a derived dataset with bigquery-etl.

Testing changes before deploying

bigquery-etl can run a query against real data in a private development environment before a pull request is opened, so a new or changed query can be validated — confirming it runs without errors and returns sensible numbers — without affecting the production tables that power live dashboards and metrics. See Development workflows.