Projects

Below are a number of trailheads that lead into the projects and code that comprise the Firefox Data Platform.

Telemetry APIs

Name and repo | Description
python_moztelemetry | Python APIs for Mozilla Telemetry
moztelemetry | Scala APIs for Mozilla Telemetry
spark-hyperloglog | Algebird's HyperLogLog support for Apache Spark
mozanalysis | A library for Mozilla experiments analysis
glean | A client-side mobile Telemetry SDK for collecting metrics and sending them to Mozilla's Telemetry service

ETL code and Datasets

Name and repo | Description
bigquery-etl | SQL ETL code for building derived datasets in BigQuery
telemetry-batch-view | Scala ETL code for derived datasets
python_mozetl | Python ETL code for derived datasets
telemetry-airflow | Airflow configuration and DAGs for scheduled jobs
python_mozaggregator | Aggregation job for telemetry.mozilla.org aggregates
telemetry-streaming | Spark Streaming ETL jobs for Mozilla Telemetry

See also data-docs for documentation on datasets.

Infrastructure

Name and repo | Description
mozilla-pipeline-schemas | JSON and Parquet Schemas for Mozilla Telemetry and other structured data
gcp-ingestion | Documentation and implementation of the Mozilla telemetry ingestion system on Google Cloud Platform
jsonschema-transpiler | Convert JSON Schema into BigQuery table definitions
mozilla-schema-generator | Incorporate probe metadata to generate BigQuery table schemas
hindsight | Real-time data processing
lua_sandbox | Generic sandbox for safe data analysis
lua_sandbox_extensions | Modules and packages that extend the Lua sandbox
nginx_moz_ingest | Nginx module for Telemetry data ingestion
puppet-config | Cloud services puppet config for deploying infrastructure
parquet2hive | Hive import statement generator for Parquet datasets
edge-validator | A service endpoint for validating incoming data
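
To give a feel for what "convert JSON Schema into BigQuery table definitions" involves, here is a minimal sketch. It is not the jsonschema-transpiler implementation or its API: the function name `to_bigquery_field`, the `PRIMITIVES` mapping, and the fallback to `STRING` are all assumptions made for illustration.

```python
# Hypothetical mapping from JSON Schema primitive types to BigQuery field types.
# Illustrative only; not taken from jsonschema-transpiler.
PRIMITIVES = {
    "string": "STRING",
    "integer": "INTEGER",
    "number": "FLOAT",
    "boolean": "BOOLEAN",
}


def to_bigquery_field(name, schema, required=False):
    """Convert one JSON Schema node into a BigQuery-style field dict."""
    mode = "REQUIRED" if required else "NULLABLE"
    json_type = schema.get("type")

    if json_type == "array":
        # An array becomes a REPEATED field of its item type.
        field = to_bigquery_field(name, schema.get("items", {}))
        field["mode"] = "REPEATED"
        return field

    if json_type == "object":
        # An object becomes a RECORD whose children are its properties.
        required_names = set(schema.get("required", []))
        children = [
            to_bigquery_field(child, child_schema, child in required_names)
            for child, child_schema in sorted(schema.get("properties", {}).items())
        ]
        return {"name": name, "type": "RECORD", "mode": mode, "fields": children}

    # Assumption: anything unmapped falls back to STRING.
    return {"name": name, "type": PRIMITIVES.get(json_type, "STRING"), "mode": mode}
```

Calling `to_bigquery_field("root", schema, required=True)` on an object schema returns a nested dict with `name`, `type`, `mode`, and `fields` keys. The real tool handles many cases this sketch ignores, such as unions, maps, and nested arrays.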

Data applications

Name and repo | Description
telemetry.mozilla.org | Main entry point for viewing aggregate Telemetry data
Glean Aggregate Metrics | Aggregate info about probes and measures
Glean Debug View | Tag and view Glean submissions with low latency
Mission Control | Low-latency dashboard for stability and health metrics
Redash | Mozilla's fork of the data query / visualization system
redash-stmo | Mozilla's extensions to Redash
TAAR | Telemetry-aware addon recommender
Ensemble | A minimalist platform for publishing data
Hardware Report | Firefox Hardware Report, available here
St. Mocli | A command-line interface to STMO
probe-scraper | Scrape and publish Telemetry probe data from Firefox
test-tube | Compare data across branches in experiments
experimenter | A web application for managing experiments
St. Moab | Automatically generate Redash dashboards for A/B experiments

Legacy projects

Projects in this section are less active, but may not be officially deprecated. Please check with the fx-data-dev mailing list before starting a new project using anything in this section.

Name and repo | Description
telemetry-next-node | A node.js package for accessing Telemetry Aggregates data
emr-bootstrap-spark | AWS bootstrap scripts for Spark
emr-bootstrap-presto | AWS bootstrap scripts for Presto

Reference materials

Public

Name and repo | Description
data-docs | All the info you need to answer questions about Firefox users with data
Firefox source docs | Mozilla Source Tree Docs - Telemetry section
mozilla.report | Knowledge repository for public reports

Non-public

Name and repo | Description