Client Count Reference

This document is a work in progress. The work is being tracked here.

Introduction

The client_count dataset is useful for estimating user counts over a few pre-defined dimensions.

Content

This dataset includes columns for a dozen factors and an HLL variable. The hll column contains a HyperLogLog variable, which approximates the exact count of distinct users. The factor columns include activity date and the dimensions listed here. Each row represents one combination of the factor columns.

Background and Caveats

It's important to understand that the hll column is not a standard count, so it cannot simply be summed. Instead, the hll variable avoids double-counting users when aggregating over multiple days. HyperLogLog is a far more efficient way to count the distinct elements of a set than an exact count, but it comes with some complexity. To find the cardinality of an HLL, use cardinality(cast(hll AS HLL)). To find the union of two HLLs over different dates, use merge(cast(hll AS HLL)); wrapping that in cardinality(...) gives the estimated count for the union. The Firefox ER Reporting Query is a good example to review. Finally, Roberto has a relevant writeup here.
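To make the merge and cardinality operations above more concrete, here is a minimal sketch of a HyperLogLog in Python. It is only an illustration of the idea, not the implementation behind the hll column: the register count, the SHA-1 hash, and the helper names (new_sketch, add, sketch_of) are all assumptions made for this example, and the client ids are synthetic.

```python
import hashlib
import math

P = 12          # index bits; an assumption for this sketch, not the dataset's setting
M = 1 << P      # number of registers (4096)


def new_sketch():
    """Return an empty sketch: one small counter per register."""
    return [0] * M


def add(sketch, client_id):
    """Record one client id in the sketch."""
    digest = hashlib.sha1(client_id.encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
    index = h >> (64 - P)                        # top P bits pick a register
    rest = h & ((1 << (64 - P)) - 1)             # remaining 52 bits
    rank = (64 - P) - rest.bit_length() + 1      # position of the first 1-bit
    sketch[index] = max(sketch[index], rank)


def merge(a, b):
    """Union of two sketches: register-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def cardinality(sketch):
    """Estimate the number of distinct ids recorded in the sketch."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)           # small-range correction
    return raw


def sketch_of(ids):
    """Build a sketch from an iterable of client ids."""
    s = new_sketch()
    for client_id in ids:
        add(s, client_id)
    return s


# Two days of synthetic activity; 6000 clients are active on both days.
day1 = sketch_of(f"client-{i}" for i in range(0, 8000))
day2 = sketch_of(f"client-{i}" for i in range(2000, 10000))

summed = cardinality(day1) + cardinality(day2)   # double-counts the overlap
merged = cardinality(merge(day1, day2))          # counts each client once
```

Here summed lands near 16000 while merged lands near the true 10000 distinct clients, each only up to the sketch's approximation error. This is why daily cardinalities should not be added together when aggregating over multiple days.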

Accessing the Data

The data is available in re:dash. Take a look at this example query.

I don't recommend accessing this data from ATMO.

Further Reading

Data Reference

Example Queries

This document is a work in progress. The work is being tracked here.

Sampling

This document is a work in progress. The work is being tracked here.

Scheduling

This document is a work in progress. The work is being tracked here.

Schema

This document is a work in progress. The work is being tracked here.
