- Data Reference
- Code Reference
search_clients_last_seen is designed for individual search analysis for clients over the past year.
This can be useful when wondering about client search behavior over the past year.
search_clients_last_seen is currently a 1% sample, and the first day with a full year of
search activity is 2020-01-01.
search_clients_last_seen has just one row per-client, for any client
who was active over the past year. Here we define active as "sent a main
ping"; so if they are present that does not mean they searched.
NOTE: Always choose a single
submission_date when querying
The key pieces of this dataset are byte arrays that contain daily information about client activity over the past year. We have these for a variety of activity types:
days_seen_bytes: Days when we received a main ping from the client
days_searched_bytes: Days that the client searched in any form
days_tagged_searched_bytes: Days that the client performed a tagged search
days_searched_with_ads_bytes: Days that the client performed a search that contained ads
days_clicked_ads_bytes: Days that the client clicked an ad post-search See "Utilizing BYTE columns" for how to use these fields.
There are some convenience columns around these, that give the number of days since the client was last active for that usage criteria:
We also include a variety of dimension information (e.g. os, country, channel, default_search) to aggregate on. The query itself lays out all of the available dimensional fields.
There are, finally, a few fields with daily activity data.
organic for organic searches,
total_searches. Please note that these are just for the current day,
and not over the entire year of history contained in the
These are stored as BigQuery
BYTE type, so they can be a bit confusing
to use. We have a few convenience functions for using them. For these functions,
anytime we say "was active", we mean "within the usage criteria defined by that
column"; for example, it could be days that clients searched with ads.:
udf.bits_to_days_seen - The number of days the user was active.
udf.bits_to_days_since_seen - The number of days since the user was last active.
udf.bits_to_days_since_first_seen - The number of days since the user's first
active day. Note that this will be at most 365 days, since that is the beginning
of history for this dataset.
udf.bits_to_active_n_weeks_ago - Returns whether or not the user was active n weeks
ago for the given activity type.
Warning: This column was designed specifically for use with the revenue data, and probably isn't good for other kinds of analysis.
For each search engine, we store an array that contains the number of searches that user completed each month for the past 12 months. This is across calendar months, so the number of days are not directly comparable.
We have the same data for tagged searches, search with ads, and ad clicks.
search_clients_last_seen does includes
even if we did not receive a ping for that
Any client who was active over the past year will be included.
Access the data at
gives the WoW retention of users in different segments. Similar to GUD, a user is
considered retained if they do they same activity the next week. Note here the
outage from Armagaddon, and the outage for ad click telemetry in October.
shows the amount of different search activity taken by clients. We can use it
to determine the % of clients who partake in each activity, regardless of their
baseline amount of activity.
This dataset is scheduled on Airflow (source).
As of 2020-04-22, the current version of
and has a schema as follows.
It's backfilled through 2020-01-01
root |- submission_date: date |- client_id: string |- sample_id: integer |- country: string |- app_version: string |- distribution_id: string |- locale: string |- search_cohort: string |- addon_version: string |- os: string |- channel: string |- profile_creation_date: integer |- default_search_engine: string |- default_search_engine_data_load_path: string |- default_search_engine_data_submission_url: string |- profile_age_in_days: integer |- active_addons_count_mean: float |- user_pref_browser_search_region: string |- os_version: string |- max_concurrent_tab_count_max: integer |- tab_open_event_count_sum: integer |- active_hours_sum: float |- subsession_hours_sum: float |- sessions_started_on_this_day: integer |- organic: integer |- sap: integer |- unknown: integer |- tagged_sap: integer |- tagged_follow_on: integer |- ad_click: integer |- search_with_ads: integer |- total_searches: integer |- tagged_searches: integer +- engine_searches: record (repeated) | |- key: string | +- value: record | | |- total_searches: integer (repeated) | | |- tagged_searches: integer (repeated) | | |- search_with_ads: integer (repeated) | | |- ad_click: integer (repeated) |- days_seen_bytes: bytes |- days_searched_bytes: bytes |- days_tagged_searched_bytes: bytes |- days_searched_with_ads_bytes: bytes |- days_clicked_ads_bytes: bytes |- days_created_profile_bytes: bytes
search_clients_last_seen job is