Projects
The trailheads below lead into the projects and code that make up the Firefox Data Platform.
Telemetry APIs
Name and repo | Description |
---|---|
python_moztelemetry | Python APIs for Mozilla Telemetry |
moztelemetry | Scala APIs for Mozilla Telemetry |
spark-hyperloglog | Algebird's HyperLogLog support for Apache Spark |
mozanalysis | A library for Mozilla experiments analysis |
glean | A client-side mobile Telemetry SDK for collecting metrics and sending them to Mozilla's Telemetry service |
ETL code and Datasets
Name and repo | Description |
---|---|
bigquery-etl | SQL ETL code for building derived datasets in BigQuery |
telemetry-batch-view | Scala ETL code for derived datasets |
python_mozetl | Python ETL code for derived datasets |
telemetry-airflow | Airflow configuration and DAGs for scheduled jobs |
python_mozaggregator | Aggregation job for telemetry.mozilla.org aggregates |
telemetry-streaming | Spark Streaming ETL jobs for Mozilla Telemetry |
See also data-docs for documentation on datasets.
Infrastructure
Name and repo | Description |
---|---|
mozilla-pipeline-schemas | JSON and Parquet Schemas for Mozilla Telemetry and other structured data |
gcp-ingestion | Documentation and implementation of the Mozilla telemetry ingestion system on Google Cloud Platform |
jsonschema-transpiler | Convert JSON Schema into BigQuery table definitions |
mozilla-schema-generator | Incorporate probe metadata to generate BigQuery table schemas |
hindsight | Real-time data processing |
lua_sandbox | Generic sandbox for safe data analysis |
lua_sandbox_extensions | Modules and packages that extend the Lua sandbox |
nginx_moz_ingest | Nginx module for Telemetry data ingestion |
puppet-config | Cloud services Puppet configuration for deploying infrastructure |
parquet2hive | Hive import statement generator for Parquet datasets |
edge-validator | A service endpoint for validating incoming data |
Data applications
Name and repo | Description |
---|---|
telemetry.mozilla.org | Main entry point for viewing aggregate Telemetry data |
Glean Aggregate Metrics | Aggregate info about probes and measures |
Glean Debug View | Tag and view Glean submissions with low latency |
Redash | Mozilla's fork of the data query / visualization system |
redash-stmo | Mozilla's extensions to Redash |
TAAR | Telemetry-aware addon recommender |
Ensemble | A minimalist platform for publishing data |
Hardware Report | Firefox Hardware Report, available here |
St. Mocli | A command-line interface to STMO |
probe-scraper | Scrape and publish Telemetry probe data from Firefox |
test-tube | Compare data across branches in experiments |
experimenter | A web application for managing experiments |
St. Moab | Automatically generate Redash dashboards for A/B experiments |
Legacy projects
Projects in this section are less active, but may not be officially deprecated. Please check with the fx-data-dev mailing list before starting a new project using anything in this section.
Name and repo | Description |
---|---|
telemetry-next-node | A Node.js package for accessing Telemetry Aggregates data |
emr-bootstrap-spark | AWS bootstrap scripts for Spark |
emr-bootstrap-presto | AWS bootstrap scripts for Presto |
Reference materials
Public
Name and repo | Description |
---|---|
data-docs | All the info you need to answer questions about Firefox users with data |
Firefox source docs | Mozilla Source Tree Docs - Telemetry section |
mozilla.report | Knowledge repository for public reports (archived) |
Non-public
Name and repo | Description |
---|---|