Socorro Crash Reports
Introduction
Public crash statistics for Firefox are available through the Data Platform in the socorro_crash dataset.
The crash data in Socorro is sanitized and made available to STMO.
A nightly import job converts batches of JSON documents into a columnar format using the associated JSON Schema.
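To illustrate what a schema-driven conversion to a columnar layout means, here is a minimal Python sketch. It is not the import job itself, which runs on Spark and writes Parquet; the two-field SCHEMA, the sample records, and the to_columns helper are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical, heavily simplified JSON Schema (assumption: the real schema
# has many more fields and nested structures).
SCHEMA = {
    "type": "object",
    "properties": {
        "reason": {"type": "string"},
        "uptime": {"type": "integer"},
    },
}


def to_columns(json_lines, schema):
    """Pivot row-oriented JSON documents into one list per schema field."""
    columns = {name: [] for name in schema["properties"]}
    for line in json_lines:
        doc = json.loads(line)
        for name in columns:
            # A field missing from a document becomes None;
            # a field absent from the schema is dropped.
            columns[name].append(doc.get(name))
    return columns


# Made-up records standing in for a batch of crash documents.
batch = [
    '{"reason": "SIGSEGV", "uptime": 120, "extra": "ignored"}',
    '{"reason": "EXCEPTION_BREAKPOINT"}',
]
columns = to_columns(batch, SCHEMA)
```

In this sketch the schema decides which columns exist, so every document contributes exactly one value (possibly None) to each column regardless of which keys it carries.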
Data Reference
Example
The dataset can be queried using SQL. For example, we can count the crashes and summarize uptime (mean, standard deviation, and quartiles) by date and reason.
SELECT crash_date,
       reason,
       count(*) AS n_crashes,
       avg(uptime) AS avg_uptime,
       stddev(uptime) AS stddev_uptime,
       approx_percentile(uptime, ARRAY [0.25, 0.5, 0.75]) AS qntl_uptime
FROM socorro_crash
WHERE crash_date = '20180520'
GROUP BY 1, 2
Scheduling
The job is scheduled to run nightly on Airflow.
The DAG is available under mozilla/telemetry-airflow:/dags/socorro_import.py.
Schema
The source schema is available on the mozilla-services/socorro GitHub repository.
This schema is transformed into a Spark-SQL structure and serialized to Parquet after transforming column names from camelCase to snake_case.
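The column renaming step can be sketched in a few lines of Python. This is an illustrative assumption about how such a conversion might look, not the code used by the import job; the to_snake_case helper and the sample field names are hypothetical.

```python
import re


def to_snake_case(name):
    """Convert a camelCase identifier to snake_case."""
    # Put an underscore between a lowercase letter or digit and the
    # uppercase letter that follows it, then lowercase everything.
    with_underscores = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return with_underscores.lower()


# Made-up field names for illustration only.
renamed = [to_snake_case(n) for n in ["crashDate", "buildId", "uptime"]]
```

Runs of capitals are not split by this simple rule, so a name such as crashURL becomes crash_url rather than crash_u_r_l.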
Code Reference
The code is a notebook in the mozilla-services/data-pipeline repository.