Projects
The tables below are trailheads into the projects and code that make up the Firefox Data Platform.
Telemetry APIs
| Name and repo | Description |
|---|---|
| python_moztelemetry | Python APIs for Mozilla Telemetry |
| moztelemetry | Scala APIs for Mozilla Telemetry |
| spark-hyperloglog | Algebird's HyperLogLog support for Apache Spark |
| mozanalysis | A library for Mozilla experiments analysis |
| glean | A client-side mobile Telemetry SDK for collecting metrics and sending them to Mozilla's Telemetry service |
ETL code and Datasets
| Name and repo | Description |
|---|---|
| bigquery-etl | SQL ETL code for building derived datasets in BigQuery |
| telemetry-batch-view | Scala ETL code for derived datasets |
| python_mozetl | Python ETL code for derived datasets |
| telemetry-airflow | Airflow configuration and DAGs for scheduled jobs |
| python_mozaggregator | Aggregation job for telemetry.mozilla.org aggregates |
| telemetry-streaming | Spark Streaming ETL jobs for Mozilla Telemetry |
See also data-docs for documentation on datasets.
Infrastructure
| Name and repo | Description |
|---|---|
| mozilla-pipeline-schemas | JSON and Parquet Schemas for Mozilla Telemetry and other structured data |
| gcp-ingestion | Documentation and implementation of the Mozilla telemetry ingestion system on Google Cloud Platform |
| jsonschema-transpiler | Convert JSON Schema into BigQuery table definitions |
| mozilla-schema-generator | Incorporate probe metadata to generate BigQuery table schemas |
| hindsight | Real-time data processing |
| lua_sandbox | Generic sandbox for safe data analysis |
| lua_sandbox_extensions | Modules and packages that extend the Lua sandbox |
| nginx_moz_ingest | Nginx module for Telemetry data ingestion |
| puppet-config | Cloud services puppet config for deploying infrastructure |
| parquet2hive | Hive import statement generator for Parquet datasets |
| edge-validator | A service endpoint for validating incoming data |
Data applications
| Name and repo | Description |
|---|---|
| telemetry.mozilla.org | Main entry point for viewing aggregate Telemetry data |
| Glean Aggregate Metrics | Aggregate info about probes and measures |
| Glean Debug View | Tag and view Glean submissions with low latency |
| Redash | Mozilla's fork of the data query / visualization system |
| redash-stmo | Mozilla's extensions to Redash |
| TAAR | Telemetry-aware addon recommender |
| Ensemble | A minimalist platform for publishing data |
| Hardware Report | Firefox Hardware Report |
| St. Mocli | A command-line interface to STMO |
| probe-scraper | Scrape and publish Telemetry probe data from Firefox |
| experimenter | A web application for managing experiments |
| Jetstream | Automated analysis for experiments |
| metric-hub | Semantic layer for metric definitions |
See also "What Data Tool Should I Use?" for more information on data tools and their uses.
Legacy projects
Projects in this section are less active, but may not be officially
deprecated. Please check with the fx-data-dev mailing list before
starting a new project using anything in this section.
| Name and repo | Description |
|---|---|
| telemetry-next-node | A node.js package for accessing Telemetry Aggregates data |
| emr-bootstrap-spark | AWS bootstrap scripts for Spark |
| emr-bootstrap-presto | AWS bootstrap scripts for Presto |
Reference materials
Public
| Name and repo | Description |
|---|---|
| data-docs | All the info you need to answer questions about Firefox users with data |
| Firefox source docs | Mozilla Source Tree Docs - Telemetry section |
| mozilla.report | Knowledge repository for public reports (archived) |
Non-public
| Name and repo | Description |
|---|---|