Stats Daemon
Build database statistics on Hive using Presto
• Monitor the Hive metastore’s partition table for last-updated timestamps
• For each recently modified partition, generate a single-scan query that computes
many metrics
* for numeric values, compute MIN, MAX, AVG, SUM, NULL_COUNT,
COUNT_DISTINCT, …
* for strings, compute the character count, COUNT_DISTINCT, NULL_COUNT,
…
* based on naming conventions, add more specific rules
* whitelist / blacklist namespaces, regexes, …
• Load statistics into MySQL
• Used for capacity planning, data quality monitoring, debugging,
anomaly detection, alerting, …
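A minimal sketch of the single-scan query generation described above, assuming Presto's COUNT_IF and LENGTH functions. The function names, the `column__stat` alias convention, the type lists, and the table and column names are illustrative assumptions, not the daemon's actual code.

```python
import re

# Assumed type groupings; the real daemon's rules are not given in the slide.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "integer", "bigint",
                 "float", "double", "decimal"}
STRING_TYPES = {"string", "varchar", "char"}


def stat_expressions(column, col_type):
    """Return (alias, SQL expression) pairs for one column."""
    base = col_type.lower().split("(")[0]  # "decimal(10,2)" -> "decimal"
    if base in NUMERIC_TYPES:
        exprs = [(f"{column}__{fn.lower()}", f"{fn}({column})")
                 for fn in ("MIN", "MAX", "AVG", "SUM")]
    elif base in STRING_TYPES:
        exprs = [(f"{column}__char_count", f"SUM(LENGTH({column}))")]
    else:
        return []  # other types are skipped in this sketch
    exprs.append((f"{column}__null_count", f"COUNT_IF({column} IS NULL)"))
    exprs.append((f"{column}__count_distinct", f"COUNT(DISTINCT {column})"))
    return exprs


def build_stats_query(table, partition_filter, columns, exclude=None):
    """Build one SELECT that computes every metric in a single scan.

    columns maps column name -> Hive type; exclude is an optional regex
    acting as a column blacklist.
    """
    selected = []
    for name, col_type in columns.items():
        if exclude and re.search(exclude, name):
            continue
        selected.extend(f"{expr} AS {alias}"
                        for alias, expr in stat_expressions(name, col_type))
    if not selected:
        raise ValueError("no columns left to profile")
    return ("SELECT\n  " + ",\n  ".join(selected)
            + f"\nFROM {table}\nWHERE {partition_filter}")


query = build_stats_query(
    "core.bookings",                      # hypothetical table
    "ds = '2015-01-01'",                  # hypothetical partition filter
    {"price": "double", "city": "string", "payload": "map<string,string>"},
)
print(query)
```

Because every aggregate sits in one SELECT, Presto reads the partition once no matter how many metrics are requested.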
partition_stats
  cluster    STRING
  database   STRING
  table      BIGINT
  partition  STRING
  stat_expr  STRING
  value      NUMBER
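As a rough illustration of how one query result could be stored in this layout, the sketch below turns a single result row (one value per metric) into one partition_stats row per metric. The function name, the sample values, and the insert statement are assumptions. The `%s` placeholders follow the style used by common Python MySQL drivers, and the backticks are there because `database`, `table`, and `partition` are reserved words in MySQL.

```python
# Hypothetical insert statement for the partition_stats table.
INSERT_SQL = (
    "INSERT INTO partition_stats "
    "(`cluster`, `database`, `table`, `partition`, stat_expr, value) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def to_stat_rows(cluster, database, table, partition, result_row):
    """Unpivot {stat_expr: value} into one tuple per metric.

    The identifying values are passed through unchanged; the slide types
    `table` as BIGINT, so it is treated here as an opaque identifier.
    """
    return [
        (cluster, database, table, partition, stat_expr, value)
        for stat_expr, value in result_row.items()
    ]


# All values below are made up for illustration.
rows = to_stat_rows(
    "silver", "core", 1234, "ds=2015-01-01",
    {"price__min": 10.0, "price__max": 950.0, "city__null_count": 3},
)
for row in rows:
    print(row)
# With a DB-API cursor: cursor.executemany(INSERT_SQL, rows)
```

Storing one row per metric keeps the table schema fixed as new metrics are added, since a new metric is just a new stat_expr value.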