Normalizing Country Data
This how-to guide is about getting standard country data in your Looker Explores, Looks and Dashboards:
This guide has only two steps: Normalizing Aliases and Accessing Standard Country Data.
⚠️ Some steps in this guide require knowledge of SQL and LookML - ask in #data-help for assistance if needed.
We get country data from many sources: partners, telemetry, third-party tools etc. In order to analyze these in a standard way, i.e. make different analyses comparable, we can conform these sources to a set of standard country codes, names, regions, sub-regions, etc.
Step One - Normalizing Aliases
⚠️ If your country data already consists of two-character ISO3166 codes, you can skip to Step Two!
We refer to a different input name for the same country as "alias". For example, your data might contain the country value "US", another might contain "USA" and yet another might contain "United States", etc. This can be confusing when read in a table or seen on a graph.
To normalize this, we maintain a mapping of aliases from each country to its two-character ISO3166 code This includes misspellings and alternate language encoding that we encounter in various datasets. For example:
CI:
- "Ivory Coast"
- "Côte d’Ivoire"
- "Côte d'Ivoire"
- "Cote d'Ivoire"
- "Côte d’Ivoire"
- "The Republic of Côte d'Ivoire"
To map (normalize) your input alias to its country code, add a LEFT join from your table or view to the alias country_lookup
table: mozdata.static.country_names_v1
. For example:
SELECT
...
your_table.country_field,
COALESCE(country_lookup.code, your_table.country, '??') as country_code
...
FROM
your_table
LEFT JOIN mozdata.static.country_names_v1 country_lookup ON your_table.country_field = country_lookup.name
Note: we use ??
as a country-code for empty country data from data sources. This will map to "Unknown Country",
"Unknown Region", etc.
At this point, you should check for cases where the resulting country_code
matches your_table.country
but does
not match any values in the country_lookup
table - you may have discovered a new alias, in which case please add it to the list!
You can do this via a bigquery-etl pull request for example: https://github.com/mozilla/bigquery-etl/pull/2858.
⚠️ This list of aliases is public. If you are working with sensitive data, please do not add to the public list of aliases, you should handle it in custom logic in code that interfaces with your sensitive data for example in private-bigquery-etl or the private Looker spoke.
If you are satisfied that the country_code
field is appropriately normalized, move on to Step Two!
Step Two - Accessing Standard Country Data
Standard country data is contained in the mozdata.static.country_codes_v1
table and by extension the
shared/views/countries
Looker View.
Add the following join to your Explore (either in the .explore.lkml
or .model.lkml
file):
include: "/shared/views/*"
...
join: countries {
type: left_outer
relationship: one_to_one
sql_on: ${your_table.country_code} = ${countries.code} ;;
}
Now, you should be able to see the Countries View in your Explore 🎉