Search Clients Engines Sources Daily

Introduction

search_clients_engines_sources_daily is designed to enable client-level search analyses. Querying this dataset can be slow; consider using search_aggregates for coarse analyses.

Contents

search_clients_engines_sources_daily has one row for each unique combination of: (client_id, submission_date, engine, source).

In addition to the standard search count aggregations, this dataset includes some descriptive data for each client. For example, we include country and channel for each row of data. In the event that a client sends multiple pings on a given submission_date we choose an arbitrary value from the pings for that (client_id, submission_date), unless otherwise noted.

There were originally five standard search count aggregation columns: sap, tagged-sap, and tagged-follow-on, organic and unknown. Over time, more search count aggregation columns were added, including ad_click and search_with_ads in late 2018 bug; and ad_click_organic and search_with_ads_organic in late 2021 bug.

Note that, if there were no such searches in a row's segment (i.e. the count would be 0), the column value is null. Each of these columns represent different types of searches. For more details, see the search data documentation

Background and Caveats

search_clients_engines_sources_daily does not include (client_id submission_date) pairs if we did not receive a ping for that submission_date.

We impute a NULL engine and source for pings with no search counts. This ensures users who never search are included in this dataset.

This dataset is large. If you're querying this dataset from STMO, heavily limit the data you read using submission_date or sample_id.

The has_adblocker_addon field is True if the client had an active addon that blocks Mozilla's ability to monetize the searches via a search engine partnership. The logic for identifying ad-blocking addons is here (private notebook).

Data Reference

Example Queries

STMO#51141 calculates searches per normalized_channel for US clients on an arbitrary day. If you have trouble viewing this query, it's likely you don't have the proper permissions. For more details see the search data documentation.

Scheduling

This dataset is scheduled on Airflow (source).

Schema

As of 2022-03-25, the current version of the underlying search_clients_daily is v8, with schema as follows. Generally, see Data Catalog in GCP for the most up-to-date schema. It's backfilled through 2016-03-12.

root
 |-- client_id: string (nullable = true)
 |-- submission_date: date (nullable = true)
 |-- submission_date_s3: date (nullable = true)
 |-- engine: string (nullable = true)
 |-- source: string (nullable = true)
 |-- country: string (nullable = true)
 |-- app_version: string (nullable = true)
 |-- distribution_id: string (nullable = true)
 |-- locale: string (nullable = true)
 |-- search_cohort: string (nullable = true)
 |-- addon_version: string (nullable = true)
 |-- os: string (nullable = true)
 |-- os_version: string (nullable = true)
 |-- channel: string (nullable = true)
 |-- profile_creation_date: long (nullable = true)
 |-- default_search_engine: string (nullable = true)
 |-- default_search_engine_data_load_path: string (nullable = true)
 |-- default_search_engine_data_submission_url: string (nullable = true)
 |-- default_private_search_engine: string (nullable = true)
 |-- default_private_search_engine_data_load_path: string (nullable = true)
 |-- default_private_search_engine_data_submission_url: string (nullable = true)
 |-- sample_id: long (nullable = true)
 |-- sessions_started_on_this_day: long (nullable = true)
 |-- profile_age_in_days: integer (nullable = true)
 |-- subsession_hours_sum: double (nullable = true)
 |-- active_addons_count_mean: double (nullable = true)
 |-- max_concurrent_tab_count_max: integer (nullable = true)
 |-- tab_open_event_count_sum: long (nullable = true)
 |-- active_hours_sum: double (nullable = true)
 |-- total_uri_count: long (nullable = true)
 |-- tagged_sap: long (nullable = true)
 |-- tagged_follow_on: long (nullable = true)
 |-- sap: long (nullable = true)
 |-- organic: long (nullable = true)
 |-- search_with_ads: long (nullable = true)
 |-- search_with_ads_organic: long (nullable = true)
 |-- ad_click: long (nullable = true)
 |-- ad_click_organic: long (nullable = true)
 |-- unknown: long (nullable = true)
 |-- normalized_engine: string (nullable = true)
 |-- user_pref_browser_search_region: string (nullable = true)
 |-- is_default_browser: boolean (nullable = true)
 |-- experiments: map (nullable = true)
 |    |-- key: string
 |    |-- value: string
 |-- scalar_parent_urlbar_searchmode_bookmarkmenu_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_handoff_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_keywordoffer_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_oneoff_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_other_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_shortcut_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_tabmenu_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_tabtosearch_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_tabtosearch_onboard_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_topsites_newtab_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_topsites_urlbar_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_touchbar_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64
 |-- scalar_parent_urlbar_searchmode_typed_sum: map (nullable = true)
 |    |-- key: string
 |    |-- value: int64

Code Reference

The search_clients_engines_sources_daily job is defined in bigquery-etl