This article introduces the datasets we maintain for search analyses:
search_clients_daily. After reading this article,
you should understand the search datasets well enough to produce moderately
Table of Contents
- Standard Search Aggregates
- In Content Telemetry Issues
Access to both
is heavily restricted in re:dash.
We also maintain a restricted group for search on GitHub and Bugzilla.
If you reach a 404 on GitHub, or don't have access to a re:dash query or bug,
missing permissions are the likely cause.
To get access permissions, file a bug using the search permissions template.
Once you have proper permissions,
you'll have access to a new source in re:dash called
You will not be able to access any of the search datasets
via the standard
Presto data source, even with proper permissions.
Direct vs Follow-on Search
Searches can be split into three major classes: sap, follow-on, and organic.
Direct searches result from a direct interaction with a
search access point (SAP),
which is part of the Firefox UI.
These searches are often called SAP searches.
There are currently 7 SAPs:
- urlbar: entering a search query in the Awesomebar
- searchbar: the main search bar; not present by default for new profiles on Firefox 57+
- newtab: the search bar on the about:newtab page
- abouthome: the search bar on the about:home page
- contextmenu: selecting text and clicking "Search" from the context menu
- system: starting Firefox from the command line with an option that immediately makes a search
- webextension: initiated from a web extension (added as of Firefox 63)
Users will often interact with the Search Engine Results Page (SERP)
to create "downstream" queries.
These queries are called follow-on queries.
These are sometimes also referred to as in-content queries
since they are initiated from the content of the page itself
and not from the Firefox UI.
For example, follow-on queries can be caused by:
- Revising a query (for example, to restaurants near me)
- Clicking on the "next" button
- Accepting spelling suggestions
Finally, we track the number of organic searches. These are searches initiated directly from a search engine provider, not through a search access point.
Tagged vs Untagged Searches
Our partners (search engines) attribute queries to Mozilla using partner codes. When a user issues a query through one of our SAPs, we include our partner code in the URL of the resulting search.
Tagged queries are queries that include one of our partner codes.
Untagged queries are queries that do not include one of our partner codes. If a query is untagged, it's usually because we do not have a partner deal for that search engine and region (or it is an organic search that did not start from an SAP).
If an SAP query is tagged, any follow-on query should also be tagged.
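To make the tagged/untagged distinction concrete, here is a minimal sketch of checking a search URL for a partner code. The host `search.example`, the query parameter name `client`, and the code `example-partner-code` are hypothetical placeholders chosen for illustration; real partner codes and URL formats vary by search engine and are not reproduced here.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical values for illustration only: real partner codes and
# parameter names differ per search engine.
PARTNER_CODES = {"example-partner-code"}
CODE_PARAM = "client"


def is_tagged(url: str) -> bool:
    """Return True if the URL's query string carries a known partner code."""
    params = parse_qs(urlparse(url).query)
    return any(value in PARTNER_CODES for value in params.get(CODE_PARAM, []))


# A query issued through an SAP would carry the partner code in its URL.
tagged = is_tagged(
    "https://search.example/search?q=restaurants&client=example-partner-code"
)
# A query without the code (for example, an organic search) is untagged.
untagged = is_tagged("https://search.example/search?q=restaurants")
```

Under these assumptions, a follow-on query keeps its tagged status only as long as the partner code remains in the URL of the follow-on request.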
Standard Search Aggregates
We report five types of searches in our search datasets:
These aggregates show up as columns in the
Our search datasets are all derived from
The aggregate columns are derived from the
sap column counts all SAP (or direct) searches.
sap search counts are collected via
within the Firefox UI
These counts are very reliable, but do not count follow-on queries.
In 2017-06 we deployed the
followonsearch addon, which adds probes for
These columns attempt to count all tagged searches
by looking for Mozilla partner codes in the URL of requests to partner search engines.
These search counts are critical to understanding revenue
since they exclude untagged searches and include follow-on searches.
However, these search counts have important caveats affecting their reliability.
See In Content Telemetry Issues for more information.
In 2018, we incorporated this code into the product (as of version 61) and also started tracking so-called "organic" searches that weren't initiated through a search access point (SAP). This data has the same caveats as those for follow-on searches, above.
We also started tracking "unknown" searches, which generally correspond to clients submitting random/unknown search data to our servers as part of their telemetry payload. This category can generally safely be ignored, unless its value is extremely high (which indicates a bug in either Firefox or the aggregation code which creates our datasets).
In main_summary, all of these searches are stored in
which makes it easy to overcount searches.
However, in general, please avoid using
main_summary for search analyses --
it's slow and you will need to duplicate much of the work done to make
analyses of our search datasets tractable.
We remove search count observations representing more than 10,000 searches for a single search engine in a single ping.
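The outlier rule above can be sketched as a simple filter. This is an illustration only, not the actual aggregation code: the representation of a ping as a mapping from engine name to search count, and the engine names themselves, are assumptions made for the example.

```python
from typing import Dict

# Threshold taken from the rule described above.
MAX_SEARCHES_PER_ENGINE = 10_000


def drop_outliers(ping_counts: Dict[str, int]) -> Dict[str, int]:
    """Drop per-engine observations of more than 10,000 searches in one ping.

    `ping_counts` maps a search engine name to the number of searches
    reported for it in a single ping (a hypothetical representation).
    """
    return {
        engine: count
        for engine, count in ping_counts.items()
        if count <= MAX_SEARCHES_PER_ENGINE
    }


# Hypothetical ping: one engine reports an implausibly high count.
cleaned = drop_outliers({"engine-a": 12, "engine-b": 250_000, "engine-c": 10_000})
```

Note that the filter applies per engine within a ping, so one implausible observation does not discard the other engines' counts from the same ping.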
In Content Telemetry Issues
The search code module inside Firefox (implemented
as an addon until version 60) implements the probe used to measure
tagged-follow-on searches and also tracks organic searches. This probe is critical
to understanding our revenue. It's the only tool that gives us a view of follow-on searches
and differentiates between tagged and untagged queries.
However, it comes with some notable caveats.
Relies on whitelists
Firefox's search module attempts to count all tagged searches by looking for Mozilla partner codes in the URL of requests to partner search engines. To do this, it relies on a whitelist of partner codes and URL formats. The list of partner codes is incomplete and only covers a few top partners. These codes also occasionally change so there will be gaps in the data.
Additionally, changes to search engine URL formats can cause problems with our data collection. See this query for a notable example.
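The following sketch shows why whitelist-based matching produces gaps. It is not the Firefox implementation: the whitelist structure, the hosts, the parameter names, and the codes are all hypothetical. It only illustrates how a new partner code, a changed URL format, or an engine missing from the whitelist each cause a search to go uncounted as tagged.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical whitelist: host -> (query parameter name, known partner codes).
WHITELIST = {
    "search.example": ("client", {"code-a"}),
}


def classify(url: str) -> str:
    """Classify a search URL against the hypothetical whitelist."""
    parsed = urlparse(url)
    entry = WHITELIST.get(parsed.hostname)
    if entry is None:
        return "not-counted"  # engine is not on the whitelist at all
    param, codes = entry
    values = parse_qs(parsed.query).get(param, [])
    return "tagged" if any(v in codes for v in values) else "untagged"


urls = [
    "https://search.example/search?q=a&client=code-a",  # matches the whitelist
    "https://search.example/search?q=a&client=code-b",  # partner code changed
    "https://search.example/search?q=a&pc=code-a",      # URL format changed
    "https://other.example/search?q=a&client=code-a",   # engine not covered
]
counts = Counter(classify(u) for u in urls)
```

In this toy example only one of the four searches is recognized as tagged, even though all four carry something that looks like a partner code.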
Limited historical data
The followonsearch addon was first deployed in 2017-06.
There is no tagged-* search data available before this.