Mozilla Data Documentation
1. Introduction
1.1. What Data does Mozilla Collect?
1.2. Tools for Data Analysis
1.3. Terminology
2. Tutorials & Cookbooks
2.1. Getting Started
2.1.1. Gaining Access
2.1.2. Getting Help
2.1.3. Reporting a problem
2.2. Analysis
2.2.1. Data Discovery Tools
2.2.1.1. Using the Data Catalog
2.2.1.2. Using the Glean Dictionary
2.2.1.3. Using the Probe Dictionary
2.2.2. Data Monitoring - Intro to Bigeye
2.2.2.1. Interface
2.2.2.2. Deploying Metrics
2.2.2.3. Collections
2.2.2.4. Issues Management
2.2.2.5. Cost Considerations
2.2.2.6. bigquery-etl and Bigeye
2.2.2.7. Further Reading
2.2.3. Data Modeling
2.2.3.1. Where to store the data model assets
2.2.3.2. Using aggregates in BigQuery and Looker
2.2.3.3. Shredder mitigation
2.2.4. Working with Looker
2.2.4.1. Introduction to Looker
2.2.4.2. Normalizing Country Data
2.2.4.3. Normalizing Browser Version Data
2.2.4.4. Using Growth and Usage Dashboards
2.2.4.5. Using the Event Counts Explore
2.2.4.6. Using the Funnel Analysis Explore
2.2.4.7. Looker Performance - Caching
2.2.5. Other Data Analysis Tools
2.2.5.1. Introduction to GLAM
2.2.5.2. Introduction to Operational Monitoring
2.2.5.3. Introduction to STMO
2.2.6. Accessing Public Data
2.2.7. Accessing and working with BigQuery
2.2.7.1. Access
2.2.7.2. Writing Queries
2.2.7.3. Optimization
2.2.7.4. Accessing Desktop Data
2.2.7.5. Accessing Glean Data
2.2.7.6. Accessing Additional Properties
2.2.7.7. Custom analysis with Spark
2.2.8. Dataset-Specific
2.2.8.1. Working with Normandy events
2.2.8.2. Working with Crash Pings
2.2.8.3. Working with Bit Patterns in Clients Last Seen
2.2.8.4. Visualizing Percentiles of a Main Ping Exponential Histogram
2.2.9. Real-time
2.2.9.1. Working with Live Data
2.2.9.2. Seeing Your Own Pings
2.2.9.3. See Real-time search metrics
2.2.10. Metrics
2.3. Operational
2.3.1. Creating a Prototype Data Project on Google Cloud Platform
2.3.2. Creating Static Dashboards with Protosaur
2.3.3. Scheduling Queries
2.3.4. Building and Deploying Containers to GCR with CircleCI
2.3.5. Publishing Datasets
2.3.6. Connecting Sheets and External Data to BigQuery
2.4. Sending telemetry
2.4.1. Implementing Experiments
2.4.2. Sending Events
2.4.3. Sending a Custom Ping
3. Data Platform Reference
3.1. Data Stack Overview
3.2. Guiding Principles for Data Infrastructure
3.3. Glean overview
3.4. Overview of Mozilla's Data Pipeline
3.4.1. HTTP Edge Server Specification
3.4.2. Event Pipeline Detail
3.4.3. Schemas
3.4.4. Glean Data
3.4.5. Channel Normalization
3.4.6. Sampling
3.4.7. Filtering
3.4.8. BigQuery Artifact Deployment
3.5. Common Analysis Gotchas
3.6. SQL Style Guide
3.7. Airflow Gotcha's
3.8. Telemetry Behavior Reference
3.8.1. History of Telemetry
3.8.2. Profile Behavior
3.8.2.1. Profile Creation
3.8.2.2. Real World Usage
3.8.2.3. Profile History
3.8.3. Engagement metrics
3.8.4. User states/Segments
3.9. Experimentation
3.10. Metric Hub
3.11. External data integration using Fivetran
3.12. Project Glossary
4. Dataset Reference
4.1. Pings
4.2. Derived Datasets
4.2.1. Active Profiles
4.2.2. Active Users
4.2.3. Addons
4.2.4. Addons Daily
4.2.5. Autonomous System Aggregates
4.2.6. Clients Daily
4.2.7. Clients Last Seen
4.2.8. Events
4.2.9. Events Daily
4.2.10. Firefox Android Clients
4.2.11. Main Ping Tables
4.2.12. Main Summary
4.2.13. Socorro Crash Reports
4.2.14. SSL Ratios (public)
4.2.15. Telemetry Aggregates
4.2.16. GLAM Aggregates
4.3. Experiment Datasets
4.3.1. Jetstream
4.3.2. Accessing experiment data
4.3.3. Accessing Heartbeat data
4.3.4. Dynamic telemetry
4.3.5. Experiment monitoring
4.4. Search Datasets
4.4.1. Search Aggregates
4.4.2. Search Clients Engines Sources Daily
4.4.3. Search Clients Last Seen
4.4.4. Client LTV
4.4.5. Mobile Search Clients Sources Daily
4.4.6. Search Revenue Levers
4.5. Non-Desktop Datasets
4.5.1. Day 2-7 Activation
4.5.2. Google Play Store
4.5.3. Apple App Store
4.6. Other Datasets
4.6.1. hgpush
4.6.2. Stub installer ping
4.6.3. bmobugs
4.6.4. Build metadata
4.6.5. Release information
4.6.6. Suggest
4.6.7. Sponsored Tiles
4.6.8. Newtab_Interactions
4.6.9. Urlbar Events
4.6.10. Urlbar Events Daily
4.6.11. SERP Events
4.7. Mozilla Accounts Datasets
4.7.1. Mozilla Account Attribution
4.7.2. Mozilla Account Funnel Metrics
4.7.3. Mozilla Account Email Metrics
4.8. Static Datasets
4.8.1. Normalized OS Names And Versions
5. Historical Reference
5.1. Previous AWS Pipeline Overview
5.2. In-depth AWS Data Pipeline Detail
5.3. Metrics
5.3.1. Definitions
5.3.1.1. Metrics
5.3.1.2. Usage Criteria
5.3.1.3. Slicing Dimensions
5.3.2. Metrics Standardization and Policy
5.4. Legacy Census Metrics
5.5. Obsolete Datasets
5.5.1. Activity Stream
5.5.2. Attitudes Daily
5.5.3. Churn
5.5.4. Client Count Daily
5.5.5. Client Count
5.5.6. Crash Aggregates
5.5.7. Crash Summary
5.5.8. Error Aggregates
5.5.9. First Shutdown Summary
5.5.10. Heavy Users
5.5.11. Legacy Mobile Datasets
5.5.12. Longitudinal
5.5.13. Retention
5.5.14. Sync Summary
6. Contributing
6.1. Style Guide
6.2. Structure
Real-time