Client Count Daily Reference

This document is a work in progress. The work is being tracked here.

Introduction

The client_count_daily dataset is useful for estimating user counts over a few pre-defined dimensions.

The client_count_daily dataset is similar to the deprecated client_count dataset, except that it is aggregated by submission date rather than activity date.

Content

This dataset includes columns for a dozen factors and an hll column. The hll column contains a HyperLogLog sketch, from which an approximate count of distinct clients can be computed. The factor columns include submission date and the dimensions listed here. Each row represents one combination of the factor columns.

Background and Caveats

It's important to understand that the hll column is not a standard count. The hll variable avoids double-counting users when aggregating over multiple days. A HyperLogLog is a far more efficient way to count the distinct elements of a set than storing the set itself, but it comes with some complexity. To find the cardinality of an HLL, use cardinality(cast(hll AS HLL)). To find the union of HLLs over different dates, use the merge aggregate, merge(cast(hll AS HLL)), and wrap it as cardinality(merge(cast(hll AS HLL))) to get the distinct count across those dates. The Firefox ER Reporting Query is a good example to review. Finally, Roberto has a relevant write-up here.
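The following toy HyperLogLog shows why merging sketches avoids the double-counting that summing daily counts would cause. It is a minimal illustrative sketch that assumes the textbook HLL algorithm, where the union of two sketches is the register-wise maximum. It is not the Presto/Re:dash implementation. The class name ToyHLL and the client IDs are hypothetical.

```python
import hashlib
import math


class ToyHLL:
    """Minimal textbook HyperLogLog (illustrative only, not Presto's implementation)."""

    def __init__(self, p=10):
        self.p = p                 # number of bits used to pick a register
        self.m = 1 << p            # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item, taken from the first 8 bytes of SHA-1
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union of two sketches: keep the larger value in each register
        out = ToyHLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def cardinality(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            estimate = self.m * math.log(self.m / zeros)
        return estimate


# Two hypothetical days of clients; clients 2500..4999 appear on both days.
day1, day2 = ToyHLL(), ToyHLL()
for i in range(5000):
    day1.add(f"client-{i}")
for i in range(2500, 7500):
    day2.add(f"client-{i}")

naive_sum = day1.cardinality() + day2.cardinality()  # counts the overlap twice (about 10000)
merged = day1.merge(day2)
union_estimate = merged.cardinality()                # about 7500 distinct clients

# Building one sketch from every client gives exactly the merged registers.
everyone = ToyHLL()
for i in range(7500):
    everyone.add(f"client-{i}")
```

The merged sketch is identical to a sketch built directly from the union of both days' clients. This is the same property that lets merge(cast(hll AS HLL)) combine several submission dates without double-counting. Adding the two daily cardinalities instead counts the 2,500 overlapping clients twice.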

Accessing the Data

The data is available in Re:dash. Take a look at this example query.

I don't recommend accessing this data from ATMO.

Further Reading

Data Reference

Example Queries

This document is a work in progress. The work is being tracked here.

Sampling

This document is a work in progress. The work is being tracked here.

Scheduling

This document is a work in progress. The work is being tracked here.

Schema

submission_date is formatted as %Y%m%d, like 20180130.
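The helpers below convert between submission_date values and calendar dates. They are a small sketch that assumes submission_date is stored as a string such as "20180130", and the function names are hypothetical. Because the format is zero-padded year, month, day, plain string comparison orders these values chronologically. This is why a range filter on the raw strings behaves as expected.

```python
from datetime import date, datetime, timedelta


def parse_submission_date(value: str) -> date:
    # Assumes submission_date is a %Y%m%d string, e.g. "20180130"
    return datetime.strptime(value, "%Y%m%d").date()


def format_submission_date(d: date) -> str:
    # Inverse of parse_submission_date
    return d.strftime("%Y%m%d")


def last_n_days(end: str, n: int) -> list:
    # Hypothetical helper: the n submission_date strings ending at `end`, oldest first
    end_date = parse_submission_date(end)
    return [format_submission_date(end_date - timedelta(days=k)) for k in range(n - 1, -1, -1)]
```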

This document is a work in progress. The work is being tracked here.
