Monitor the Hive metastore’s partition table for last updated time stamp • for each recently modified partition, generate a single scan query that computes loads of metrics * for numeric value, compute MIN, MAX, AVG, SUM, NULL_COUNT, COUNT DISTINCT, … * for strings, count the number of characters, COUNT_DISTINCT, NULL_COUNT, … * based on naming conventions, add more specific rules * whitelist / blacklist namespaces, regexes, … • Load statistics into MySQL • Used for capacity planning, data quality monitoring, debugging, anomaly detection, alerting, … cluster STRING database STRING table BIGINT partition STRING stat_expr STRING value NUMBER partition_stats