Hivemall UDFs
  1. Evaluation of ranking problems
  2. Anomaly detection
Datadog anomaly detection (thanks @nahi!)
  Very difficult…
Customer churn prediction on TD
  Random Forest on Hivemall & td-pandas
Sales/consulting MTGs
  Attended 2 MTGs with @myui and other members
Item recommendation = item ranking problem based on a scoring function f
  [Figure: for one user, f scores items 1, 6, 2, 3, 4, 5 as 10, 8, 6, 2, 1, 0.5, and the top-ranked items are recommended]
  How can we evaluate the recommendations? Which f is better?
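A minimal Python sketch of the framing above, assuming one user's scores are held in a plain dict. `rank_items` and `recommend_top_k` are hypothetical helpers, not Hivemall UDFs. They turn the scores from the figure into a ranking and a top-k recommendation list:

```python
from typing import Dict, List


def rank_items(scores: Dict[int, float]) -> List[int]:
    """Sort item IDs by descending score; ties are broken by the smaller item ID."""
    return sorted(scores, key=lambda item: (-scores[item], item))


def recommend_top_k(scores: Dict[int, float], k: int) -> List[int]:
    """Recommend the k highest-scoring items."""
    return rank_items(scores)[:k]


# Scores that a scoring function f assigns to one user's items (values from the figure)
scores_f = {1: 10.0, 6: 8.0, 2: 6.0, 3: 2.0, 4: 1.0, 5: 0.5}
print(rank_items(scores_f))          # [1, 6, 2, 3, 4, 5]
print(recommend_top_k(scores_f, 3))  # [1, 6, 2]
```

Two scoring functions can then be compared by checking their top-k lists against the items the user actually interacted with.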
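Evaluation metrics (X: truth, Y: recommended, using the top-k items)
  1. Precision@k: fraction of true positives in Y: |X ∩ Y| / |Y|
  2. Recall@k: fraction of true positives in X: |X ∩ Y| / |X|
  3. MAP (Mean Average Precision): AP@k accumulates Precision@i over the ranks i ≤ k that hold a true item and normalizes the sum (commonly by min(|X|, k)); MAP is the mean of AP@k over all users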
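A plain-Python sketch of these three metrics. It is not Hivemall's UDF implementation, and normalizing AP@k by min(|X|, k) is one common convention assumed here. The truth sets below are hypothetical:

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(recommended: Sequence[int], truth: Set[int], k: int) -> float:
    """|X ∩ Y| / |Y|, where Y is the top-k recommended items and X is the truth set."""
    top = recommended[:k]
    if not top:
        return 0.0
    return sum(1 for item in top if item in truth) / len(top)


def recall_at_k(recommended: Sequence[int], truth: Set[int], k: int) -> float:
    """|X ∩ Y| / |X|."""
    if not truth:
        return 0.0
    return sum(1 for item in recommended[:k] if item in truth) / len(truth)


def average_precision_at_k(recommended: Sequence[int], truth: Set[int], k: int) -> float:
    """Sum of Precision@i over ranks i <= k holding a true item, divided by min(|X|, k).
    Assumes `recommended` contains no duplicate items."""
    hits = 0
    total = 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in truth:
            hits += 1
            total += hits / i  # Precision@i at a rank that holds a true item
    denom = min(len(truth), k)
    return total / denom if denom else 0.0


def mean_average_precision(users: List[Tuple[Sequence[int], Set[int]]], k: int) -> float:
    """Mean of AP@k over users, each given as (recommended list, truth set)."""
    if not users:
        return 0.0
    return sum(average_precision_at_k(rec, truth, k) for rec, truth in users) / len(users)


# Hypothetical data: the ranking from the figure with an assumed truth set, plus a second user
user1 = ([1, 6, 2, 3, 4, 5], {1, 2, 4})
user2 = ([6, 1, 2], {6})
print(round(precision_at_k(*user1, k=3), 4))                   # 0.6667
print(round(recall_at_k(*user1, k=3), 4))                      # 0.6667
print(round(average_precision_at_k(*user1, k=3), 4))           # 0.5556
print(round(mean_average_precision([user1, user2], k=3), 4))   # 0.7778
```

AP@k rewards placing true items near the top. Here user 1's hits at ranks 1 and 3 give (1/1 + 2/3) / 3.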
Concept behind anomaly detectors
  Find patterns in past data points
  Score each new point by "how far it is from the past pattern"
  Data source: http://cl-www.msi.co.jp/reports/changefinder.html
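To illustrate the "learn from the past, score the distance" idea, here is a deliberately simple stand-in: a sliding-window z-score. It is not ChangeFinder or SST, and `sliding_zscore` and its `window` parameter are hypothetical:

```python
import math
from collections import deque
from typing import List, Sequence


def sliding_zscore(xs: Sequence[float], window: int) -> List[float]:
    """Score each point by how many standard deviations it lies from the mean
    of the previous `window` points (0.0 until at least two past points exist)."""
    past = deque(maxlen=window)  # the "pattern" learned from past points
    scores = []
    for x in xs:
        if len(past) < 2:
            scores.append(0.0)
        else:
            mean = sum(past) / len(past)
            var = sum((p - mean) ** 2 for p in past) / (len(past) - 1)
            std = math.sqrt(var)
            scores.append(abs(x - mean) / std if std > 0 else 0.0)
        past.append(x)  # the point becomes part of the past only after it is scored
    return scores


series = [10.0, 10.5, 9.5, 10.0, 10.2, 9.8, 30.0]
scores = sliding_zscore(series, window=5)
print(max(range(len(scores)), key=scores.__getitem__))  # 6: the jump to 30.0
```

ChangeFinder and SST follow the same pattern but with richer models of the past than a mean and a standard deviation.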
SST (Singular Spectrum Transformation) is much simpler than CF (ChangeFinder)
  The naive computation is heavy (SVDs at every time step), but an efficient numerical approximation exists
  Easy-to-use, robust method
  Single intuitive hyperparameter: window size w (int); the others can be chosen implicitly
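A naive SST sketch in NumPy, showing why the naive version is heavy: it runs full SVDs at every time step. It follows one common formulation. Past and test trajectory matrices are built around each time t, and the score is 1 minus the largest singular value of U_rᵀQ_m. The parameter names n, r, m and the toy data are assumptions, and this is not Hivemall's `sst` implementation:

```python
from typing import Optional

import numpy as np


def hankel(x: np.ndarray, start: int, w: int, n: int) -> np.ndarray:
    """w x n trajectory matrix whose j-th column is x[start + j : start + j + w]."""
    return np.column_stack([x[start + j : start + j + w] for j in range(n)])


def sst_scores(x, w: int, n: Optional[int] = None, r: int = 3, m: int = 1) -> np.ndarray:
    """Naive SST with full SVDs at every time step.

    Compares the top-r left singular vectors of the past trajectory matrix with the
    top-m left singular vectors of the test trajectory matrix. The score is
    1 - (largest singular value of U_r^T Q_m), i.e. 1 - cos(smallest principal angle).
    Points without enough data on either side keep a score of 0.
    """
    x = np.asarray(x, dtype=float)
    n = w if n is None else n             # "other" parameters derived from w
    span = w + n - 1                      # number of points one trajectory matrix covers
    scores = np.zeros(len(x))
    for t in range(span, len(x) - span + 1):
        past = hankel(x, t - span, w, n)  # built from x[t - span : t]
        test = hankel(x, t, w, n)         # built from x[t : t + span]
        U, _, _ = np.linalg.svd(past, full_matrices=False)
        Q, _, _ = np.linalg.svd(test, full_matrices=False)
        overlap = np.linalg.svd(U[:, :r].T @ Q[:, :m], compute_uv=False)
        scores[t] = 1.0 - overlap[0]
    return scores


# Toy series: a sine wave whose level jumps by 3 at index 100
t_idx = np.arange(200)
x = np.sin(t_idx) + 3.0 * (t_idx >= 100)
scores = sst_scores(x, w=10, r=2)
print(int(np.argmax(scores)))  # lands near the jump
```

Only w is required here, and n defaults to w, which mirrors the single-hyperparameter point above.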
Change-point detection on Hivemall
  Input table timeseries:
    time  x
    1     182.478
    2     176.231
    3     183.917
    4     177.798
    5     165.469
    …     …
  ChangeFinder:
    SELECT time, changefinder(x, "-changepoint_threshold 0.005") FROM timeseries ORDER BY time ASC
  SST:
    SELECT time, sst(x, "-threshold 0.005") FROM timeseries ORDER BY time ASC
DD (Datadog) supports (simple) outlier detection
  Alerts are set by just thresholding outlier scores
We need to detect anomalies under more complex conditions
  to reduce false positives (e.g. check whether metric-A AND metric-B both show high outlier scores)
https://www.datadoghq.com/blog/introducing-outlier-detection-in-datadog/
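A small sketch contrasting single-metric thresholding with the AND condition above. It assumes both metrics' outlier scores are already aligned by time index. The function names, thresholds and scores are hypothetical:

```python
from typing import List, Sequence


def threshold_alerts(scores: Sequence[float], threshold: float) -> List[int]:
    """Single-metric rule: alert at every index whose outlier score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def and_alerts(scores_a: Sequence[float], scores_b: Sequence[float],
               threshold_a: float, threshold_b: float) -> List[int]:
    """Composite rule: alert only when metric-A AND metric-B are both anomalous
    at the same (aligned) index."""
    return [i for i, (a, b) in enumerate(zip(scores_a, scores_b))
            if a >= threshold_a and b >= threshold_b]


# Hypothetical outlier scores for two metrics, aligned by time index
scores_a = [0.1, 0.9, 0.8, 0.2]
scores_b = [0.7, 0.95, 0.1, 0.3]
print(threshold_alerts(scores_a, 0.5))           # [1, 2]
print(and_alerts(scores_a, scores_b, 0.5, 0.5))  # [1]
```

Index 2 is flagged by metric-A alone but suppressed by the AND rule. This is how a composite condition can cut false positives, at the cost of missing anomalies that show up in only one metric.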
Internship Day 2-5: Construct a DD anomaly detection system
  Get data points via the API (query, stream fetch)
  ChangeFinder daemon / CLI tool for replay
  Send records with anomaly scores as a new metric
  Notify detected anomalies
  Notify errors
  Yay! My internship has finished! (?)
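A mock sketch of the fetch → score → send → notify loop described above. Every callable is a hypothetical stand-in: no real Datadog API is called, and the scorer is a placeholder where ChangeFinder would go. The same loop serves a live daemon (a streaming `fetch`) and a replay tool (stored points):

```python
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[int, Optional[float]]  # (timestamp, value) as fetched from the metric source


def run_detector(fetch: Callable[[], Iterable[Point]],
                 score: Callable[[Optional[float]], float],
                 send_score: Callable[[int, float], None],
                 notify_anomaly: Callable[[int, Optional[float], float], None],
                 notify_error: Callable[[int, Exception], None],
                 threshold: float) -> None:
    """Fetch points, score them, publish the scores as a new metric, and notify."""
    for ts, value in fetch():
        try:
            s = score(value)
        except Exception as exc:  # report scoring failures instead of stopping the loop
            notify_error(ts, exc)
            continue
        send_score(ts, s)
        if s >= threshold:
            notify_anomaly(ts, value, s)


# Replay with in-memory stand-ins
stored = [(1, 100.0), (2, 101.0), (3, 150.0), (4, None)]
sent, anomalies, errors = [], [], []
run_detector(
    fetch=lambda: iter(stored),
    score=lambda v: abs(v - 100.0) / 10.0,  # hypothetical scorer; fails on None
    send_score=lambda ts, s: sent.append((ts, s)),
    notify_anomaly=lambda ts, v, s: anomalies.append(ts),
    notify_error=lambda ts, exc: errors.append(ts),
    threshold=3.0,
)
print(sent)       # [(1, 0.0), (2, 0.1), (3, 5.0)]
print(anomalies)  # [3]
print(errors)     # [4]
```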
Feedback from @nahi
  Usability-related requests for the daemon's behavior
    Supported as soon as possible
  Feasibility of ChangeFinder for DD metrics
    CF works as expected on some metrics
    Hard to figure out which metrics are useful due to CF's instability
    Lack of Norikra-side evaluation
The 2-month internship was:
  enough to implement the algorithms & a mock system
  too short to build a useful anomaly detector with sufficient evaluation
Future directions
  Continuous discussions with metric observers
  More static analysis with different methods and options
  re:dash integration
  + can be a research project
Backbone of real-life machine learning
  Engineering
    Wide variety of programming skills
    Integrating numerous middleware
  Science
    Understanding the concepts behind equations
    Having a practical point of view (e.g. complexity, usability)
  Human factor
    Experience with various data to incorporate heuristics
    Communication skills to capture customers' requirements