Search Aggregates
Introduction
search_aggregates
is designed to power high level search dashboards.
It's quick and easy to query, but the data are coarse.
In particular, this dataset allows you to segment
by a limited number of client characteristics which are relevant to search markets.
However, it is not possible to normalize by client count.
If you need fine-grained data, consider using search_clients_engines_sources_daily
which breaks down search counts by client, engine, and source.
Contents
Each row of search_aggregates
contains
the standard search count aggregations
for each unique combination of the following columns.
Unless otherwise noted, these columns are taken directly from main_summary
.
submission_date
-yyyymmdd
engine
- e.g.google
,bing
,yahoo
source
- The UI component used to issue a search - e.g.urlbar
,abouthome
country
locale
addon_version
- The installed version of the [followonsearch
addon] (before version 61)app_version
distribution_id
-NULL
means the standard Firefox buildsearch_cohort
-NULL
except for small segments relating to search experimentationdefault_search_engine
default_private_search_engine
os
- e.g.Linux
,Windows_NT
,Darwin
...os_version
is_default_browser
There are ten aggregation columns:
sap
, tagged-sap
, tagged-follow-on
,organic
, unknown
, ad_click
, ad_click_organic
, search_with_ads
, search_with_ads_organic
, and client_count
.
Each of these columns represent different types of searches.
For more details, see the search data documentation.
Gotcha
Although search_aggregates
table is created on top of search_clients_engines_sources_daily
, you may expect the total search metrics reported to match exactly. It's actually not the case. In case you notice the total number reported in search_aggregates
higher than search_clients_engines_sources_daily
, it's most likely due to Shredder. search_aggregates
table is aggregated beyond the client level, so shredder doesn’t have to touch it. But search_clients_engines_sources_daily
contains client_id
and is subject to shredder. It's expected to lose up to 1% of rows every month as Firefox responds to clients' deletion requests, which would reduce count in search_clients_engines_soruces
but not in search_aggregates
. An example query to show such a difference can be found in STMO#84302
.
Data Reference
Example Queries
Daily US sap searches
SELECT
submission_date,
SUM(SAP) AS search_counts
FROM search.search_aggregates
WHERE
country = 'US'
AND submission_date BETWEEN '2019-01-01' AND '2019-01-07'
GROUP BY submission_date
ORDER BY submission_date
Scheduling
This job is scheduled on airflow to run daily.
Schema
As of 2021-04-29,
the current version of search_aggregates
is v8
,
and has a schema as follows.
The dataset is backfilled through 2016-03-11.
root
|-- submission_date: date (nullable = true)
|-- submission_date_s3: date (nullable = true)
|-- country: string (nullable = true)
|-- engine: string (nullable = true)
|-- normalized_engine: string (nullable = true)
|-- source: string (nullable = true)
|-- app_version: string (nullable = true)
|-- distribution_id: string (nullable = true)
|-- locale: string (nullable = true)
|-- search_cohort: string (nullable = true)
|-- addon_version: string (nullable = true)
|-- tagged_sap: long (nullable = true)
|-- tagged_follow_on: long (nullable = true)
|-- sap: long (nullable = true)
|-- organic: long (nullable = true)
|-- search_with_ads: long (nullable = true)
|-- search_with_ads_organic: long (nullable = true)
|-- ad_click: long (nullable = true)
|-- ad_click_organic: long (nullable = true)
|-- unknown: long (nullable = true)
|-- client_count: long (nullable = true)
|-- default_search_engine: string (nullable = true)
|-- default_private_search_engine: string (nullable = true)
|-- os: string (nullable = true)
|-- os_version: string (nullable = true)
|-- is_default_browser: boolean (nullable = true)
Code Reference
The search_aggregates
job is
defined in bigquery-etl