Clients Last Seen Reference
Introduction
The clients_last_seen
dataset is useful for efficiently determining exact
user counts such as DAU and MAU.
It can also allow efficient calculation of other windowed usage metrics like retention via its
bit pattern fields.
It includes the most recent values in a 28 day window for all columns in the
clients_daily
dataset.
Content
For each submission_date
this dataset contains one row per client_id
that appeared in clients_daily
in a 28 day window including
submission_date
and preceding days.
The days_since_seen
column indicates the difference between submission_date
and the most recent submission_date
in clients_daily
where the client_id
appeared. A client observed on the given submission_date
will have days_since_seen = 0
.
Other days_since_
columns use the most recent date in clients_daily
where
a certain condition was met. If the condition was not met for a client_id
in
a 28 day window NULL
is used. For example days_since_visited_5_uri
uses the
condition scalar_parent_browser_engagement_total_uri_count_sum >= 5
. These
columns can be used for user counts where a condition must be met on any day
in a window instead of using the most recent values for each client_id
.
The days_seen_bits
field stores the daily history of a client in the 28 day
window. The daily history is converted into a sequence of bits, with a 1
for
the days a client is in clients_daily
and a 0
otherwise, and this sequence
is converted to an integer. A tutorial on how to use these bit patterns to
create filters in SQL can be found in
this notebook.
The rest of the columns use the most recent value in clients_daily
where
the client_id
appeared.
Background and Caveats
User counts generated using days_since_seen
only reflect the most recent
values from clients_daily
for each client_id
in a 28 day window.
This means Active MAU as defined cannot be efficiently calculated using days_since_seen
because if a given client_id
appeared every day in February and only on February 1st had
scalar_parent_browser_engagement_total_uri_count_sum >= 5
then it would only
be counted on the 1st, and not the 2nd-28th. Active MAU can be efficiently and
correctly calculated using days_since_visited_5_uri
.
MAU can be calculated over a GROUP BY submission_date[, ...]
clause using
COUNT(*)
, because there is exactly one row in the dataset for each
client_id
in the 28 day MAU window for each submission_date
.
User counts generated using days_since_seen
can use SUM
to reduce groups,
because a given client_id
will only be in one group per submission_date
. So
if MAU were calculated by country
and channel
, then the sum of the MAU for
each country
would be the same as if MAU were calculated only by channel
.
Accessing the Data
The data is available in Re:dash and BigQuery. Take a look at this full running
example query in Re:dash (STMO#159510
).
Data Reference
Field Descriptions
The *_bits
fields store the relevant activity of a client in a 28 day
window, as a 28 bit integer.
For each bit, a 1 corresponds to the specific activity occurring on that day.
Activity Segment/User State/Core Active Specific
Please see this section for descriptions regarding user states/segments.
is_core_active_v1
: Boolean indicating if the client satisfies conditions of being core active on that day.activity_segments_v1
: The activity segment applicable to the client that day.is_regular_user_v3
: Boolean indicating if the client satisfies conditions of being a regular user on that day.is_new_or_resurrected_v3
: Boolean indicating if the client satisfies conditions of being a regular user on that day.is_weekday_regular_v1
: Boolean indicating if the client satisfies conditions of being a weekday regular user on that day.is_allweek_regular_v1
: Boolean indicating if the client satisfies conditions of being an all-week regular user on that day.
Usage Specific
days_visited_1_uri_bits
: Each bit field represents if a client browsed at least 1 URI on that day.days_since_visited_1_uri
: Number of days since the client browsed at least 1 URI.days_interacted_bits
: Each bit field represents if a client had at least 1 active tick on that day. This is derived from theactive_hours_sum
inclients_daily
.days_since_interacted
: Number of days since the clients had at least 1 active tick.days_had_8_active_ticks_bits
: Each bit field represents if a client had at least 8 active ticks on that day. This can be used to approximate the threshold of 1 URI, and is useful for determining activity for clients using Private Browsing Mode where URI counts are not recorded.days_since_visited_8_active_ticks
: Number of days since the client had at least 8 active ticks.
New Profile Specific
first_seen_date
: Date the client sent their first main ping.second_seen_date
: Date the client sent their first main ping.days_since_first_seen
: Number of days sincefirst_seen_date
days_since_second_seen
: Number of days sincesecond_seen_date
new_profile_5_day_activated_v1
: Boolean indicating if a new profile has sent a ping 5 out of their first 7 days.new_profile_14_day_activated_v1
: Boolean indicating if a new profile has sent a ping 8 out of their first 14 days.new_profile_21_day_activated_v1
: Boolean indicating if a new profile has sent a ping 12 out of their first 21 days.days_since_created_profile
: Number of days since the profile creation date. This field is only populated when the value is 27 days or less. Otherwise, it is NULL.profile_age_in_days
can be used in the latter cases for all clients who have a profile creation date.
Example Queries
Compute DAU for non-windows clients for the last week
SELECT
submission_date,
os,
COUNT(*) AS count
FROM
mozdata.telemetry.clients_last_seen
WHERE
submission_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 WEEK)
AND days_since_seen = 0
GROUP BY
submission_date,
os
HAVING
count > 10 -- remove outliers
AND lower(os) NOT LIKE '%windows%'
ORDER BY
os,
submission_date DESC
Compute WAU by Channel for the last week
SELECT
submission_date,
normalized_channel,
COUNT(*) AS count
FROM
mozdata.telemetry.clients_last_seen
WHERE
submission_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 WEEK)
AND days_since_seen < 7
GROUP BY
submission_date,
normalized_channel
HAVING
count > 10 -- remove outliers
ORDER BY
normalized_channel,
submission_date DESC
Scheduling
This dataset is updated daily via the
telemetry-airflow
infrastructure. The job runs as part of the
main_summary
DAG.
Schema
The data is partitioned by submission_date
.
As of 2019-03-25, the current version of the clients_last_seen
dataset is
v1
, and the schema is visible in the BigQuery console
here.
Code Reference
This dataset is generated by
bigquery-etl
.
Refer to this repository for information on how to run or augment the dataset.