The heavy_users table provides information about whether a given client_id is considered a "heavy user" on each day (using submission date).
The heavy_users table contains one row per client-day, where day is submission_date. A client has a row for a specific submission_date if they were active at all in the 28-day window ending on that submission_date.
A user is a "heavy user" as of day N if, for the 28-day period ending on day N, the sum of their active_ticks is at or above the 90th percentile of all clients' sums for that period. For more analysis on this, and a discussion of new profiles, see this link.
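A minimal Python sketch of the definition above, for intuition only. It assumes per-client daily active_ticks are available in a plain dict (a hypothetical layout) and uses a nearest-rank 90th percentile; the production job in telemetry-batch-view may compute the percentile differently (for example, approximately), so treat the threshold logic as an assumption. The function name heavy_users_on is hypothetical.

```python
import math
from datetime import date, timedelta

WINDOW_DAYS = 28  # trailing window length from the definition above


def heavy_users_on(day, daily_ticks, pct=0.90):
    """Return {client_id: is_heavy} for clients active in the 28-day window ending on `day`.

    daily_ticks (hypothetical layout) maps client_id -> {date: active_ticks}, with one
    entry per day the client was active; days without an entry contribute zero ticks.
    """
    start = day - timedelta(days=WINDOW_DAYS - 1)
    totals = {}
    for client, per_day in daily_ticks.items():
        in_window = [ticks for d, ticks in per_day.items() if start <= d <= day]
        if in_window:  # only clients active at some point in the window get a row
            totals[client] = sum(in_window)
    if not totals:
        return {}
    ranked = sorted(totals.values())
    # Nearest-rank percentile: an assumption, not necessarily what the production job does.
    rank = max(1, math.ceil(pct * len(ranked)))
    threshold = ranked[rank - 1]
    return {client: total >= threshold for client, total in totals.items()}


# Ten toy clients active only on 2017-08-28, with 1..10 active_ticks.
d = date(2017, 8, 28)
flags = heavy_users_on(d, {f"c{i}": {d: i} for i in range(1, 11)})
# c9 and c10 are flagged heavy (both are at or above the 90th percentile value, 9).
```

Note that "at or above the 90th percentile" can flag slightly more than 10% of a small population, as it does with the ten toy clients here.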
- Data starts at 20170801. There is technically data in the table before this, but the heavy_user column is NULL for those dates because the job needed to bootstrap the first 28-day window.
- Because heavy users are the top 10% of clients for each 28-day period, and heavy users tend to be active on more days of that period than other clients, more than 10% of clients active on a given submission_date will be considered heavy users. If you join with another data source (main_summary, for example), you may see a larger proportion of heavy users than expected.
- Each day has a separate, but related, set of heavy users. Initial investigations show that approximately 97.5% of heavy users as of a certain day are still considered heavy users as of the next day.
- There is no "fixing" or weighting of new profiles: days before the profile was created are counted as zero active_ticks. Analyses may need to use the included profile_creation_date field to take this into account.
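One way to account for new profiles is to keep only client-days whose profile existed for the entire 28-day window. The sketch below assumes profile_creation_date counts days since the Unix epoch (an assumption; verify it against the data before relying on it) and that submission_date_s3 uses the YYYYMMDD layout of dates such as 20170801. The helper name full_window_profile is hypothetical.

```python
from datetime import date, datetime, timedelta

EPOCH = date(1970, 1, 1)
WINDOW_DAYS = 28


def full_window_profile(profile_creation_date, submission_date_s3):
    """True if the profile existed on every day of the 28-day window ending on submission_date_s3.

    Assumes profile_creation_date is a day count since the Unix epoch (hypothetical
    interpretation) and submission_date_s3 is a 'YYYYMMDD' string.
    """
    if profile_creation_date is None:
        return False  # unknown creation date: cannot show the window is fully covered
    created = EPOCH + timedelta(days=profile_creation_date)
    day = datetime.strptime(submission_date_s3, "%Y%m%d").date()
    window_start = day - timedelta(days=WINDOW_DAYS - 1)
    return created <= window_start
```

For submission_date_s3 "20171001" the window starts on 2017-09-04, so a profile created on that day passes and one created a day later does not. Rows dated before 20170801 would additionally be dropped, since heavy_user is NULL there.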
The data is available both via sql.t.m.o and Spark.
SELECT * FROM heavy_users LIMIT 3
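The same kind of query can be tried locally against a stand-in table. The sketch below uses Python's built-in sqlite3 module with a few toy rows and a subset of the v1 columns; sql.t.m.o runs a different SQL engine, so dialect details may differ. It computes the share of clients flagged as heavy per day and skips the bootstrap dates before 20170801, where heavy_user is NULL.

```python
import sqlite3

# In-memory stand-in for heavy_users with a subset of the v1 columns (toy rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE heavy_users (client_id TEXT, active_ticks INTEGER, "
    "heavy_user BOOLEAN, submission_date_s3 TEXT)"
)
conn.executemany(
    "INSERT INTO heavy_users VALUES (?, ?, ?, ?)",
    [
        ("a", 50, None, "20170731"),  # bootstrap period: heavy_user is NULL
        ("a", 900, 1, "20170801"),
        ("b", 40, 0, "20170801"),
        ("c", 35, 0, "20170801"),
    ],
)

# Fraction of client-day rows flagged heavy, per day, excluding bootstrap dates.
rows = conn.execute(
    """
    SELECT submission_date_s3,
           AVG(CASE WHEN heavy_user THEN 1.0 ELSE 0.0 END) AS heavy_share,
           COUNT(*) AS clients
    FROM heavy_users
    WHERE submission_date_s3 >= '20170801'
    GROUP BY submission_date_s3
    ORDER BY submission_date_s3
    """
).fetchall()
print(rows)
```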
The code responsible for generating this dataset is here
- Join with main_summary to get the distribution of max_concurrent_tab_count for heavy vs. non-heavy users
- Join with longitudinal to get crash rates for heavy vs. non-heavy users
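A sketch of the first example query, again using sqlite3 with small made-up tables in place of the real main_summary and heavy_users. Only the join keys (client_id, submission_date_s3) and the columns the query needs are modeled; whether the real tables expose exactly these column names in your environment is an assumption to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-ins: one heavy client ('a') and two non-heavy clients on one day.
conn.executescript(
    """
    CREATE TABLE heavy_users (client_id TEXT, submission_date_s3 TEXT, heavy_user BOOLEAN);
    CREATE TABLE main_summary (client_id TEXT, submission_date_s3 TEXT,
                               max_concurrent_tab_count INTEGER);
    INSERT INTO heavy_users VALUES ('a', '20170901', 1), ('b', '20170901', 0), ('c', '20170901', 0);
    INSERT INTO main_summary VALUES
        ('a', '20170901', 40), ('a', '20170901', 60),
        ('b', '20170901', 5), ('c', '20170901', 3);
    """
)

# Join ping-level rows to the client-day heavy flag, then summarize per group.
rows = conn.execute(
    """
    SELECT h.heavy_user,
           COUNT(*) AS pings,
           AVG(m.max_concurrent_tab_count) AS mean_max_tabs
    FROM main_summary AS m
    JOIN heavy_users AS h
      ON m.client_id = h.client_id
     AND m.submission_date_s3 = h.submission_date_s3
    GROUP BY h.heavy_user
    ORDER BY h.heavy_user
    """
).fetchall()
print(rows)  # [(0, 2, 4.0), (1, 2, 50.0)]
```

Because main_summary has one row per ping, the toy heavy user is one of three clients but contributes half of the joined rows; counting rows instead of distinct client_ids after such a join is another way heavy users can look more common than expected.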
As of 2017-10-05, the current version of the heavy_users dataset is v1, and it has the following schema:
root
 |-- client_id: string (nullable = true)
 |-- sample_id: integer (nullable = true)
 |-- profile_creation_date: long (nullable = true)
 |-- active_ticks: long (nullable = true)
 |-- active_ticks_period: long (nullable = true)
 |-- heavy_user: boolean (nullable = true)
 |-- prev_year_heavy_user: boolean (nullable = true)
 |-- submission_date_s3: string (nullable = true)
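For typed access outside Spark, the v1 schema can be mirrored as a Python dataclass. This is an illustrative sketch: the class name HeavyUsersRow is hypothetical, every column is Optional because the schema marks all of them nullable, and the date helper assumes submission_date_s3 uses the YYYYMMDD layout of dates such as 20170801.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class HeavyUsersRow:
    """One client-day row of heavy_users v1 (hypothetical Python mirror of the schema)."""
    client_id: Optional[str]
    sample_id: Optional[int]
    profile_creation_date: Optional[int]   # Spark long
    active_ticks: Optional[int]            # Spark long
    active_ticks_period: Optional[int]     # Spark long
    heavy_user: Optional[bool]             # NULL during the bootstrap period
    prev_year_heavy_user: Optional[bool]
    submission_date_s3: Optional[str]

    def submission_date(self) -> Optional[date]:
        # Assumes the 'YYYYMMDD' layout; returns None when the column is NULL.
        if self.submission_date_s3 is None:
            return None
        return datetime.strptime(self.submission_date_s3, "%Y%m%d").date()
```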
This dataset is generated by telemetry-batch-view. Refer to this repository for information on how to run or augment the dataset.