# Heavy Users

As of 2018-05-18, this dataset has been deprecated and is no longer maintained. See Bug 1455314

# Replacement

We've moved to assigning user's an active tag based on total_uri_count, see the Active DAU definition.

The activity of a user based on active_ticks is available in clients_daily in the active_hours_sum field, which has the sum(active_ticks / 720).

To retrieve a client's 28-day active_hours, use the following query:

SELECT submission_date_s3,
client_id,
SUM(active_hours_sum) OVER (PARTITION BY client_id
ORDER BY submission_date_s3 ASC
ROWS 27 PRECEDING) AS monthly_active_hours
FROM
clients_daily


# Introduction

The heavy_users table provides information about whether a given client_id is considered a "heavy user" on each day (using submission date).

#### Contents

The heavy_users table contains one row per client-day, where day is submission_date. A client has a row for a specific submission_date if they were active at all in the 28 day window ending on that submission_date.

A user is a "heavy user" as of day N if, for the 28 day period ending on day N, the sum of their active_ticks is in the 90th percentile (or above) of all clients during that period. For more analysis on this, and a discussion of new profiles, see this link.

#### Background and Caveats

1. Data starts at 20170801. There is technically data in the table before this, but the heavy_user column is NULL for those dates because it needed to bootstrap the first 28 day window.
2. Because it is top the 10% of clients for each 28 day period, more than 10% of clients active on a given submission_date will be considered heavy users. If you join with another data source (main_summary, for example), you may see a larger proportion of heavy users than expected.
3. Each day has a separate, but related, set of heavy users. Initial investigations show that approximately 97.5% of heavy users as of a certain day are still considered heavy users as of the next day.
4. There is no "fixing" or weighting of new profiles - days before the profile was created are counted as zero active_ticks. Analyses may need to use the included profile_creation_date field to take this into account.

#### Accessing the Data

The data is available both via sql.t.m.o and Spark.

In Spark:

spark.read.parquet("s3://telemetry-parquet/heavy_users/v1")


In SQL:

SELECT * FROM heavy_users LIMIT 3


The code responsible for generating this dataset is here

# Data Reference

Example queries:

## Schema

As of 2017-10-05, the current version of the heavy_users dataset is v1, and has a schema as follows:

root
|-- client_id: string (nullable = true)
|-- sample_id: integer (nullable = true)
|-- profile_creation_date: long (nullable = true)
|-- active_ticks: long (nullable = true)
|-- active_ticks_period: long (nullable = true)
|-- heavy_user: boolean (nullable = true)
|-- prev_year_heavy_user: boolean (nullable = true)
|-- submission_date_s3: string (nullable = true)


# Code Reference

This dataset is generated by telemetry-batch-view. Refer to this repository for information on how to run or augment the dataset.