Accessing Public Data
A public dataset is a BigQuery dataset that is made available to the general public, either in BigQuery or through our public HTTP endpoint.
Table of Contents
- Accessing Public Data in BigQuery
- Accessing Public Data Through the Public HTTP Endpoint
- Let us know!
Accessing Public Data in BigQuery
To access public datasets in BigQuery, a Google Cloud Platform (GCP) account is required. GCP offers a free tier that includes free credits for running queries in BigQuery. The BigQuery sandbox lets users use BigQuery for free without providing payment information.
To get started, log into the BigQuery console or use the
BigQuery command line tools to create a new project.
After selecting the project, Mozilla's public datasets in the mozilla-public-data project can
be accessed and queried. For example:
SELECT *
FROM `mozilla-public-data.telemetry_derived.ssl_ratios_v1`
WHERE submission_date = "2020-04-16"
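The same query can also be run from a script. The following is a minimal sketch, assuming the google-cloud-bigquery Python client library is installed and that you have credentials for a project of your own. The names `build_query`, `run_query` and `billing_project` are hypothetical and used for illustration only.

```python
import datetime

PUBLIC_TABLE = "mozilla-public-data.telemetry_derived.ssl_ratios_v1"


def build_query(table: str = PUBLIC_TABLE) -> str:
    """Return the example query with the date passed as a named parameter."""
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        "WHERE submission_date = @submission_date"
    )


def run_query(billing_project: str, day: datetime.date) -> list:
    """Run the query from your own project (billing_project is an assumption)."""
    # Imported lazily so this module loads even without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("submission_date", "DATE", day)
        ]
    )
    rows = client.query(build_query(), job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

Calling `run_query("my-project", datetime.date(2020, 4, 16))` would run the query from the project you created, where `my-project` is a placeholder for that project's ID.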
Accessing Public Data Through the Public HTTP Endpoint
Some BigQuery datasets are also published as gzipped JSON files through the public HTTP endpoint: https://public-data.telemetry.mozilla.org.
A list of public datasets is available at https://public-data.telemetry.mozilla.org/all-datasets.json. This list contains the names of the available datasets, additional metadata, and links to the storage locations of the files containing the data.
For example:
{
  "telemetry_derived": {
    // ^ dataset name
    "deviations": {
      // ^ table name
      "v1": {
        // ^ table version
        "friendly_name": "Deviations",
        "description": "Deviation of different metrics from forecast.",
        "incremental": true,
        "incremental_export": false,
        "review_link": "https://bugzilla.mozilla.org/show_bug.cgi?id=1624528",
        "files_uri": "https://public-data.telemetry.mozilla.org/api/v1/tables/telemetry_derived/deviations/v1/files",
        "last_updated": "https://public-data.telemetry.mozilla.org/api/v1/tables/telemetry_derived/deviations/v1/last_updated"
      }
    },
    "ssl_ratios": {
      "v1": {
        "friendly_name": "SSL Ratios",
        "description": "Percentages of page loads Firefox users have performed that were  conducted over SSL broken down by country.",
        "incremental": true,
        "incremental_export": false,
        "review_link": "https://bugzilla.mozilla.org/show_bug.cgi?id=1414839",
        "files_uri": "https://public-data.telemetry.mozilla.org/api/v1/tables/telemetry_derived/ssl_ratios/v1/files",
        "last_updated": "https://public-data.telemetry.mozilla.org/api/v1/tables/telemetry_derived/ssl_ratios/v1/last_updated"
      }
    }
    // [...]
  }
}
The keys within each dataset have the following meanings:
- incremental:
  - true: data gets incrementally updated, which means that new data gets added periodically (for most datasets on a daily basis)
  - false: the entire table data gets updated periodically
- incremental_export:
  - true: data for each submission_date gets exported into separate directories (e.g. files/2020-04-15, files/2020-04-16, ...)
  - false: all data gets exported into one files/ directory
- review_link: links to the Bugzilla bug for the data review
- files_uri: lists links to all available data files
- last_updated: link to a last_updated file containing the timestamp for when the data files were last updated
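The nesting of dataset name, table name and table version in the listing can be walked with a few loops. The following is a minimal sketch, assuming the published file is plain JSON without the explanatory comments shown in the example above. The helper names `iter_tables` and `fetch_listing` are hypothetical, and `SAMPLE` is a trimmed copy of that example.

```python
import json
import urllib.request
from typing import Any, Dict, Iterator, Tuple

LISTING_URL = "https://public-data.telemetry.mozilla.org/all-datasets.json"

# Trimmed copy of the example listing, used here instead of a network call.
SAMPLE = json.loads("""
{
  "telemetry_derived": {
    "deviations": {
      "v1": {
        "friendly_name": "Deviations",
        "incremental": true,
        "incremental_export": false
      }
    },
    "ssl_ratios": {
      "v1": {
        "friendly_name": "SSL Ratios",
        "incremental": true,
        "incremental_export": false
      }
    }
  }
}
""")


def iter_tables(
    listing: Dict[str, Any]
) -> Iterator[Tuple[str, str, str, Dict[str, Any]]]:
    """Yield (dataset, table, version, metadata) for every listed table."""
    for dataset, tables in listing.items():
        for table, versions in tables.items():
            for version, metadata in versions.items():
                yield dataset, table, version, metadata


def fetch_listing(url: str = LISTING_URL) -> Dict[str, Any]:
    """Download and parse the listing (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


for dataset, table, version, metadata in iter_tables(SAMPLE):
    print(f"{dataset}.{table}_{version}: {metadata['friendly_name']}")
```

Passing `fetch_listing()` instead of `SAMPLE` would iterate over the live listing.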
Data files are gzipped and up to 1 GB in size. If the data exceeds 1 GB, then it gets split up into multiple
files named 000000000000.json, 000000000001.json, ...
For example: https://public-data.telemetry.mozilla.org/api/v1/tables/telemetry_derived/ssl_ratios/v1/files/000000000000.json
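Reading one of these files from a script could look like the sketch below. It relies on assumptions that are not stated above: the response body may arrive either still gzipped or already decompressed, so the code checks for the gzip magic bytes, and the payload may be either a single JSON array or one JSON object per line, so both are handled. The helper names `file_name`, `decode_export` and `fetch_export` are hypothetical.

```python
import gzip
import json
import urllib.request
from typing import Any, List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def file_name(index: int) -> str:
    """Return the name of the index-th split file, e.g. 000000000001.json."""
    return f"{index:012d}.json"


def decode_export(raw: bytes) -> List[Any]:
    """Decompress (if needed) and parse the bytes of one data file."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: fall back to newline-delimited JSON records.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return parsed if isinstance(parsed, list) else [parsed]


def fetch_export(url: str) -> List[Any]:
    """Download one data file and return its records (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return decode_export(response.read())
```

For a table split into several files, `file_name(0)`, `file_name(1)`, ... give the names to append to the table's files location.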
Let us know!
If this public data has proved useful to your research, or you've built a cool visualization with it, let us know! You can email publicdata@mozilla.com or reach us in the #telemetry:mozilla.org channel on Mozilla's instance of Matrix.