As of 2018-05-18, this dataset has been deprecated and is no longer maintained. See Bug 1455314
We've moved to assigning user's an active tag based on
the Active DAU definition.
The activity of a user based on
active_ticks is available in
active_hours_sum field, which has the
sum(active_ticks / 720).
To retrieve a client's 28-day
active_hours, use the following query:
SELECT submission_date_s3, client_id, SUM(active_hours_sum) OVER (PARTITION BY client_id ORDER BY submission_date_s3 ASC ROWS 27 PRECEDING) AS monthly_active_hours FROM clients_daily
heavy_users table provides information about whether a given
considered a "heavy user" on each day (using submission date).
heavy_users table contains one row per client-day, where day is
submission_date. A client has a row for a specific
they were active at all in the 28 day window ending on that
A user is a "heavy user" as of day N if, for the 28 day period ending
on day N, the sum of their
active_ticks is in the 90th percentile (or
above) of all clients during that period. For more analysis on this,
and a discussion of new profiles, see
- Data starts at 20170801. There is technically data in the table before
this, but the
NULLfor those dates because it needed to bootstrap the first 28 day window.
- Because it is top the 10% of clients for each 28 day period, more
than 10% of clients active on a given
submission_datewill be considered heavy users. If you join with another data source (
main_summary, for example), you may see a larger proportion of heavy users than expected.
- Each day has a separate, but related, set of heavy users. Initial investigations show that approximately 97.5% of heavy users as of a certain day are still considered heavy users as of the next day.
- There is no "fixing" or weighting of new profiles - days before the
profile was created are counted as zero
active_ticks. Analyses may need to use the included
profile_creation_datefield to take this into account.
The data is available both via
sql.t.m.o and Spark.
SELECT * FROM heavy_users LIMIT 3
The code responsible for generating this dataset is here
main_summaryto get distribution of
max_concurrent_tab_countfor heavy vs. non-heavy users
longitudinalto get crash rates for heavy vs. non-heavy users
As of 2017-10-05, the current version of the
heavy_users dataset is
v1, and has a schema as follows:
root |-- client_id: string (nullable = true) |-- sample_id: integer (nullable = true) |-- profile_creation_date: long (nullable = true) |-- active_ticks: long (nullable = true) |-- active_ticks_period: long (nullable = true) |-- heavy_user: boolean (nullable = true) |-- prev_year_heavy_user: boolean (nullable = true) |-- submission_date_s3: string (nullable = true)
This dataset is generated by telemetry-batch-view. Refer to this repository for information on how to run or augment the dataset.