Public crash statistics for Firefox are available through the Data Platform in a
The crash data in Socorro is sanitized and made available to STMO.
A nightly import job converts batches of JSON documents into a columnar format using the associated JSON Schema.
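As a rough illustration of how a JSON Schema can drive a columnar layout, the sketch below walks a schema and emits a nested type description shaped like the JSON form Spark uses for its schemas. This is not the import job's actual code: the type mapping, the `to_column_type` helper, and the example schema fields are all assumptions made for illustration.

```python
# Assumed mapping from JSON Schema primitive types to Spark SQL type names.
PRIMITIVES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
}


def to_column_type(schema):
    """Recursively convert a JSON Schema node into a Spark-style type description."""
    json_type = schema.get("type")
    # Nullable fields are often declared as ["string", "null"];
    # use the first non-null type and fall back to string.
    if isinstance(json_type, list):
        non_null = [t for t in json_type if t != "null"]
        json_type = non_null[0] if non_null else "string"

    if json_type == "object":
        properties = schema.get("properties", {})
        return {
            "type": "struct",
            "fields": [
                {
                    "name": name,
                    "type": to_column_type(sub_schema),
                    "nullable": True,
                    "metadata": {},
                }
                # Sort field names so the column order is deterministic.
                for name, sub_schema in sorted(properties.items())
            ],
        }
    if json_type == "array":
        return {
            "type": "array",
            "elementType": to_column_type(schema.get("items", {})),
            "containsNull": True,
        }
    # Unknown or missing types are stored as strings in this sketch.
    return PRIMITIVES.get(json_type, "string")


# Hypothetical schema fragment; the field names are illustrative only.
example_schema = {
    "type": "object",
    "properties": {
        "uptime": {"type": ["integer", "null"]},
        "reason": {"type": "string"},
        "addons": {"type": "array", "items": {"type": "string"}},
    },
}

columns = to_column_type(example_schema)
```

Marking every field as nullable is a deliberately conservative choice here, since crash reports rarely populate every field.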
The dataset is available in Parquet at
It is also indexed with Athena and Presto with the table name
The dataset can be queried using SQL. For example, the following query counts crashes and summarizes uptime (mean, standard deviation, and quartiles) by date and reason.
SELECT
    crash_date,
    reason,
    count(*) AS n_crashes,
    avg(uptime) AS avg_uptime,
    stddev(uptime) AS stddev_uptime,
    approx_percentile(uptime, ARRAY[0.25, 0.5, 0.75]) AS qntl_uptime
FROM socorro_crash
WHERE crash_date = '20180520'
GROUP BY 1, 2
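To run the same summary for other days, the query text can be generated from a date. The sketch below is a minimal example of that; `crash_summary_query` is a hypothetical helper, and it assumes `crash_date` is always stored as a `YYYYMMDD` string, as in the query above.

```python
from datetime import date


def crash_summary_query(day: date, table: str = "socorro_crash") -> str:
    """Build the per-reason crash summary query for a single day."""
    # Assumption: crash_date is a YYYYMMDD string, e.g. '20180520'.
    crash_date = day.strftime("%Y%m%d")
    return (
        "SELECT crash_date, reason, count(*) AS n_crashes, "
        "avg(uptime) AS avg_uptime, stddev(uptime) AS stddev_uptime, "
        "approx_percentile(uptime, ARRAY[0.25, 0.5, 0.75]) AS qntl_uptime "
        f"FROM {table} "
        f"WHERE crash_date = '{crash_date}' "
        "GROUP BY 1, 2"
    )


query = crash_summary_query(date(2018, 5, 20))
```

The date is formatted by the helper rather than passed in as free text, so only well-formed `YYYYMMDD` values reach the query string.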
The job is scheduled to run nightly on Airflow.
The DAG is available under
The source schema is available on the
mozilla-services/socorro GitHub repository.
This schema is converted into a Spark SQL structure and serialized to Parquet after transforming column names from