Below are a number of trailheads that lead into the projects and code that comprise the Firefox Data Platform.

Telemetry APIs

Name and repo | Description
--- | ---
python_moztelemetry | Python APIs for Mozilla Telemetry
moztelemetry | Scala APIs for Mozilla Telemetry
spark-hyperloglog | Algebird's HyperLogLog support for Apache Spark
mozanalysis | A library for Mozilla experiments analysis
glean | A client-side mobile Telemetry SDK for collecting metrics and sending them to Mozilla's Telemetry service

ETL code and Datasets

Name and repo | Description
--- | ---
bigquery-etl | SQL ETL code for building derived datasets in BigQuery
telemetry-batch-view | Scala ETL code for derived datasets
python_mozetl | Python ETL code for derived datasets
telemetry-airflow | Airflow configuration and DAGs for scheduled jobs
python_mozaggregator | Aggregation job that computes Telemetry aggregates
telemetry-streaming | Spark Streaming ETL jobs for Mozilla Telemetry

See also firefox-data-docs for documentation on datasets.

Infrastructure

Name and repo | Description
--- | ---
mozilla-pipeline-schemas | JSON and Parquet schemas for Mozilla Telemetry and other structured data
gcp-ingestion | Documentation and implementation of the Mozilla telemetry ingestion system on Google Cloud Platform
jsonschema-transpiler | Convert JSON Schema into BigQuery table definitions
mozilla-schema-generator | Incorporate probe metadata to generate BigQuery table schemas
hindsight | Real-time data processing
lua_sandbox | Generic sandbox for safe data analysis
lua_sandbox_extensions | Modules and packages that extend the Lua sandbox
nginx_moz_ingest | Nginx module for Telemetry data ingestion
puppet-config | Cloud services puppet config for deploying infrastructure
parquet2hive | Hive import statement generator for Parquet datasets
edge-validator | A service endpoint for validating incoming data
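The jsonschema-transpiler entry converts JSON Schema into BigQuery table definitions. To make that idea concrete, here is a minimal, hypothetical Python sketch of such a mapping. It is not the transpiler's implementation or output format. The function names, the type mapping (for example `number` to `FLOAT`), and the handling of `["T", "null"]` unions are assumptions chosen for illustration, and the sketch covers only a few primitive types, nested objects, arrays, and `required`.

```python
"""Hypothetical sketch: map a small JSON Schema object to BigQuery schema fields.

Not jsonschema-transpiler's implementation; it handles only primitive types,
nested objects, arrays, and the "required" keyword.
"""
import json

# Assumed mapping from JSON Schema primitive types to BigQuery legacy type names.
PRIMITIVES = {
    "string": "STRING",
    "integer": "INTEGER",
    "number": "FLOAT",
    "boolean": "BOOLEAN",
}


def _base_type(node):
    """Return the single non-null type of a node, e.g. ["string", "null"] -> "string"."""
    t = node.get("type")
    if isinstance(t, list):
        non_null = [x for x in t if x != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union type: {t}")
        return non_null[0]
    return t


def to_bigquery_fields(schema):
    """Convert the properties of an object schema into a list of BigQuery field dicts."""
    required = set(schema.get("required", []))
    fields = []
    for name, node in schema.get("properties", {}).items():
        mode = "REQUIRED" if name in required else "NULLABLE"
        base = _base_type(node)
        if base == "array":
            # Arrays become REPEATED columns of their item type.
            mode = "REPEATED"
            node = node["items"]
            base = _base_type(node)
        field = {"name": name, "mode": mode}
        if base == "object":
            # Nested objects become RECORD columns with their own sub-fields.
            field["type"] = "RECORD"
            field["fields"] = to_bigquery_fields(node)
        elif base in PRIMITIVES:
            field["type"] = PRIMITIVES[base]
        else:
            raise ValueError(f"unsupported type for {name!r}: {base}")
        fields.append(field)
    return fields


if __name__ == "__main__":
    example = {
        "type": "object",
        "required": ["client_id"],
        "properties": {
            "client_id": {"type": "string"},
            "sample_id": {"type": ["integer", "null"]},
            "experiments": {"type": "array", "items": {"type": "string"}},
            "environment": {
                "type": "object",
                "properties": {"locale": {"type": "string"}},
            },
        },
    }
    print(json.dumps(to_bigquery_fields(example), indent=2))
```

Running it prints a JSON list of field definitions in the shape BigQuery accepts for a table schema: `name`, `type`, `mode`, and nested `fields` for `RECORD` columns. Arrays map to `REPEATED` mode because BigQuery models lists as repeated columns rather than as a separate type.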

Data applications

Name and repo | Description
--- | ---
telemetry.mozilla.org | Main entry point for viewing aggregate Telemetry data
Growth & Usage dashboard | Dashboard for questions about product growth and usage
Glean Aggregate Metrics | Aggregate info about probes and measures
Glean Debug View | Tag and view Glean submissions with low latency
Cerberus & Medusa | Automatic alert system for telemetry aggregates
Mission Control | Low latency dashboard for stability and health metrics
Redash | Mozilla's fork of the data query and visualization system
redash-stmo | Mozilla's extensions to Redash
TAAR | Telemetry-aware addon recommender
Ensemble | A minimalist platform for publishing data
Hardware Report | Firefox Hardware Report
python-zeppelin | Convert Zeppelin notebooks to Markdown
St. Mocli | A command-line interface to STMO
probe-scraper | Scrape and publish Telemetry probe data from Firefox
test-tube | Compare data across branches in experiments
experimenter | A web application for managing experiments
St. Moab | Automatically generate Redash dashboards for A/B experiments
Iodide (code) | Literate scientific computing and communication for the web

Legacy projects

Projects in this section are less active, but may not be officially deprecated. Please check with the fx-data-dev mailing list before starting a new project using anything in this section.

Name and repo | Description
--- | ---
telemetry-next-node | A node.js package for accessing Telemetry Aggregates data
emr-bootstrap-spark | AWS bootstrap scripts for Spark
emr-bootstrap-presto | AWS bootstrap scripts for Presto

Reference materials


Name and repo | Description
--- | ---
firefox-data-docs | All the info you need to answer questions about Firefox users with data
Firefox source docs | Mozilla Source Tree Docs - Telemetry section
reports.t.m.o | Knowledge repository for public reports


Name and repo | Description
--- | ---