DAU and MAU
For the purposes of DAU, a profile is considered active if it sends any main ping.
- Dates are defined by
DAU is the number of clients sending a main ping on a given day.
MAU is the number of unique clients who have been a DAU on any day in the last 28 days. In other words, any client that contributes to DAU in the last 28 days would also contribute to MAU for that day. Note that this is not simply the sum of DAU over 28 days, since any particular client could be active on many days.
For quick analysis, using
clients_daily_v6 is recommended. Below is an example query for getting DAU using
SELECT submission_date_s3, count(*) AS total_clients_cdv6 FROM clients_daily_v6 GROUP BY 1 ORDER BY 1 ASC
main_summary can also be used for getting DAU. Below is an example query using a 1% sample over March 2018 using
SELECT submission_date_s3, count(DISTINCT client_id) * 100 as DAU FROM main_summary WHERE sample_id = '51' AND submission_date_s3 >= '20180301' AND submission_date_s3 < '20180401' GROUP BY 1 ORDER BY 1 ASC
client_count_daily can be used to get approximate DAU. This dataset uses HyperLogLog to estimate unique counts. For example:
SELECT submission_date AS day, cardinality(merge(cast(hll AS HLL))) AS dau FROM client_count_daily WHERE -- Limit to 7 days of history submission_date >= date_format(CURRENT_DATE - INTERVAL '7' DAY, '%Y%m%d') GROUP BY 1 ORDER BY 1
Calculating MAU for a single date is simple. For example, the following query calculates MAU for 2018-12-16:
SELECT COUNT(DISTINCT client_id) FROM clients_daily_v6 WHERE submission_date_s3 <= '20181216' AND submission_date_s3 >= '20181119'
Generating a table of MAU over time is more complex. Here's one possible way, which is conceptually simple, but not very computationally efficient:
SELECT dates_table.submission_date_s3, COUNT(DISTINCT clients_table.client_id) AS mau FROM ( SELECT TO_DATE(submission_date_s3, 'yyyyMMdd') AS submission_date_s3 FROM clients_daily_v6 WHERE submission_date_s3 >= '20181216' GROUP BY submission_date_s3 ) AS dates_table JOIN ( SELECT client_id, TO_DATE(submission_date_s3, 'yyyyMMdd') AS submission_date_s3 FROM clients_daily_v6 WHERE submission_date_s3 >= '20181119' ) AS clients_table ON clients_table.submission_date_s3 between dates_table.submission_date_s3 - interval 27 day and dates_table.submission_date_s3 GROUP BY dates_table.submission_date_s3
Note that the query above will not run in STMO, but will in Databricks. In the near future, we expect to offer a better option and will thoroughly update this documentation.