Addons Daily

Introduction

The addons_daily table is a small and fast dataset with data on specific add-ons, and the users who have them installed. It contains one row per addon_id and submission_date.

Contents

Many questions about add-ons are of the form: "How many users have add-on A installed?" or "Are users with add-on Z more active than users with add-on Y?" This dataset is aimed at answering these type of questions without having to do cumbersome joins or filters.

This dataset also has detailed search aggregates by each add-on, broken out by our major search engines (google, bing, ddg, amazon, yandex, other), along with total aggregates (total). This allows us to identify strange search patterns for add-ons who change a user's search settings on their behalf, often siphoning away SAP searches and Mozilla revenue (see the second example query below).

Accessing the Data

The addons_daily table is accessible through re:dash using the Telemetry (BigQuery) data source.

Here's an example query.

Data Reference

Example Queries

DAU, WAU and MAU for uBlock Origin

SELECT
    submission_date,
    dau,
    wau,
    mau
FROM
    `moz-fx-data-shared-prod.telemetry.addons_daily`
WHERE
    submission_date >= DATE_SUB(CURRENT_DATE, INTERVAL 28 DAY)
    AND addon_id = 'uBlock0@raymondhill.net'

Add-ons with Highest Organic:SAP search ratio

SELECT
    addon_id,
    ANY_VALUE(name) as name,
    AVG(dau) as avg_dau,
    SAFE_DIVIDE(SUM(organic_searches.total), SUM(sap_searches.total)) as organic_sap_ratio
FROM
    telemetry.addons_daily
WHERE
    submission_date >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
    AND is_system = false
GROUP BY
    1
HAVING
    avg(dau) > 1000
ORDER BY
    4 DESC

Scheduling

This dataset is updated daily via the telemetry-airflow infrastructure. The job runs as part of the addons_daily DAG.

Schema

The data is partitioned by submission_date.

As of 2020-04-17, the current version of the addons_daily dataset is v1.

Code Reference

This dataset is generated by bigquery-etl. Refer to this repository for information on how to run or augment the dataset.