Treasure Data Summer Internship 2016

Slide 1

Slide 1 text

Treasure Data Summer Internship 2016
Real-world Machine Learning

Slide 2

Slide 2 text

$ whoami
Takuya Kitazawa
github.com/takuti
twitter.com/takuti

Slide 3

Slide 3 text

$ curl takuti.me

Slide 4

Slide 4 text

Congrats!

Slide 5

Slide 5 text

Lesson from internship: Machine Learning is difficult…

Slide 6

Slide 6 text

Hivemall UDFs
  1. Evaluation of ranking problems
  2. Anomaly detection
Datadog anomaly detection (thanks @nahi!)
  Very difficult…
Customer churn prediction on TD
  Random Forest on Hivemall & td-pandas
Sales/consulting MTGs
  Attend 2 MTGs w/ @myui and other members

Slide 7

Slide 7 text

Slide 8

Slide 8 text

Item recommendation = Item ranking problem based on scoring function f
[Figure: for a user, f ranks the items as 1, 6, 2, 3, 4, 5 (scores: 10, 8, 6, 2, 1, 0.5) and the top items are recommended; the truth items are 1, 2, 4]
How can we evaluate? Which f is better?

Slide 9

Slide 9 text

Implement 6 ranking measures [B. McFee and G. R. Lanckriet. Metric Learning to Rank. ICML’10]

Slide 10

Slide 10 text

X = truth, Y = recommend (use top-k items)
1. Precision@k
   Portion of true positives in Y: |X ∩ Y| / |Y|
2. Recall@k
   Portion of true positives in X: |X ∩ Y| / |X|
3. MAP (Mean Average Precision)
   Average from Precision@1 to Precision@k
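As a rough illustration of the three measures above, the following Python sketch computes them for one ranked list. The function names are hypothetical, and dividing average precision by the number of truth items is an assumption made here (it happens to reproduce the 0.333 reported on slide 13); this is not Hivemall's implementation.

```python
def precision_at_k(rec, truth, k):
    """|X and Y| / |Y|, where Y is the top-k recommended items."""
    top_k = rec[:k]
    if not top_k:
        return 0.0
    truth_set = set(truth)
    hits = sum(1 for item in top_k if item in truth_set)
    return hits / len(top_k)


def recall_at_k(rec, truth, k):
    """|X and Y| / |X|, where X is the set of truth items."""
    truth_set = set(truth)
    if not truth_set:
        return 0.0
    hits = sum(1 for item in rec[:k] if item in truth_set)
    return hits / len(truth_set)


def average_precision(rec, truth, k):
    """Sum of Precision@i at each hit position i <= k.

    Assumption: normalized by the number of truth items.
    """
    truth_set = set(truth)
    if not truth_set:
        return 0.0
    hits = 0
    total = 0.0
    for i, item in enumerate(rec[:k], start=1):
        if item in truth_set:
            hits += 1
            total += hits / i
    return total / len(truth_set)


# Ranking and truth items taken from the slides.
rec = [1, 6, 2, 3, 4, 5]
truth = [1, 2, 4]
print(round(precision_at_k(rec, truth, 2), 3))     # 0.5
print(round(recall_at_k(rec, truth, 2), 3))        # 0.333
print(round(average_precision(rec, truth, 2), 3))  # 0.333
```

MAP is then the mean of `average_precision` over all users.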

Slide 11

Slide 11 text

4. AUC (Area Under the ROC Curve)
   Scores for truth items must be greater than others
   Portion of “correct” pairs
   Expected (truth: 1 2 4, others: 6 3 5):
     1 > 6, 1 > 3, 1 > 5
     2 > 6, 2 > 3, 2 > 5
     4 > 6, 4 > 3, 4 > 5
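The "portion of correct pairs" idea can be sketched in Python as below. The function name is hypothetical, and two choices are assumptions made here rather than anything stated on the slide: only items inside the top-k list are paired, and 0.5 is returned when the list has no truth item or no other item to compare against.

```python
def auc_at_k(rec, truth, k):
    """Fraction of (truth item, other item) pairs in the top-k list
    where the truth item is ranked above the other item."""
    truth_set = set(truth)
    top_k = rec[:k]
    n_pos = sum(1 for item in top_k if item in truth_set)
    n_neg = len(top_k) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.5  # assumption: no pair to compare, treat as chance level

    correct = 0
    negatives_seen = 0
    for item in top_k:
        if item in truth_set:
            # Every negative not seen yet is ranked below this truth item.
            correct += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return correct / (n_pos * n_neg)


rec = [1, 6, 2, 3, 4, 5]
truth = [1, 2, 4]
print(auc_at_k(rec, truth, 2))            # 1.0
print(round(auc_at_k(rec, truth, 6), 3))  # 0.667
```

With the full list, 6 of the 9 expected pairs hold (item 6 outranks 2 and 4, item 3 outranks 4), which gives 0.667.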

Slide 12

Slide 12 text

5. MRR (Mean Reciprocal Rank)
   Rank of first true positive
   Best: “First true positive is ranked #1”
6. nDCG (normalized Discounted Cumulated Gain)
   Where is each true positive ranked?
   Best: truth items first
[Figure: ranking 1 6 2 3 4 5 (#1 to #6) compared with the best ranking 1 2 4 6 3 5 (#1 to #6), where the truth items 1 2 4 come before the others 6 3 5]
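A minimal Python sketch of these two measures follows, assuming binary relevance (an item is either a truth item or not) and a log2 discount. The function names are hypothetical and this is not Hivemall's code; the settings were chosen because they reproduce the 1.000 and 0.613 reported on slide 13.

```python
import math


def reciprocal_rank(rec, truth, k):
    """1 / rank of the first truth item within the top-k, else 0."""
    truth_set = set(truth)
    for rank, item in enumerate(rec[:k], start=1):
        if item in truth_set:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(rec, truth, k):
    """DCG of the top-k list divided by the DCG of an ideal list.

    Assumption: binary gain (1 for a truth item, 0 otherwise).
    """
    truth_set = set(truth)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(rec[:k], start=1)
        if item in truth_set
    )
    # The ideal list places as many truth items as possible at the top.
    ideal_hits = min(k, len(truth_set))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


rec = [1, 6, 2, 3, 4, 5]
truth = [1, 2, 4]
print(reciprocal_rank(rec, truth, 2))      # 1.0
print(round(ndcg_at_k(rec, truth, 2), 3))  # 0.613
```

MRR is the mean of `reciprocal_rank` over all users.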

Slide 13

Slide 13 text

Evaluate top-2 rec. on Hivemall

… -- aggregation
SELECT
  precision(t1.rec, t2.truth, 2),          -- => 0.500
  recall(t1.rec, t2.truth, 2),             -- => 0.333
  average_precision(t1.rec, t2.truth, 2),  -- => 0.333
  auc(t1.rec, t2.truth, 2),                -- => 1.000
  mrr(t1.rec, t2.truth, 2),                -- => 1.000
  ndcg(t1.rec, t2.truth, 2)                -- => 0.613
… -- join

“higher is better” in [0, 1] range

Slide 14

Slide 14 text

Q. Which one should I use?

Slide 15

Slide 15 text

Q. Which one should I use?
A. It depends on your problem
   You can try all of them!

Slide 16

Slide 16 text

Slide 17

Slide 17 text

Concept behind anomaly detectors
  Find patterns from past points
  score “how far from past pattern”
Data source: http://cl-www.msi.co.jp/reports/changefinder.html

Slide 18

Slide 18 text

ChangeFinder (CF) by Spring intern
  Outlier score & Change-point score

Slide 19

Slide 19 text

Implement additional options for CF
Parameter estimation logic
  1. Solving Yule-Walker equation
  2. Burg’s method (new!)
Scoring function
  1. Logloss
  2. Hellinger distance (new!)
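To make the two scoring functions concrete, here is a small Python sketch for the univariate Gaussian case. The function names and the restriction to one-dimensional Gaussians are assumptions for illustration; how ChangeFinder on Hivemall wires these scores into the model update is not shown on the slide and is not reproduced here.

```python
import math


def gaussian_logloss(x, mean, var):
    """Negative log-likelihood of x under N(mean, var).

    A larger value means x is less expected by the model.
    """
    return 0.5 * math.log(2.0 * math.pi * var) + (x - mean) ** 2 / (2.0 * var)


def gaussian_hellinger(mean_p, var_p, mean_q, var_q):
    """Hellinger distance between N(mean_p, var_p) and N(mean_q, var_q).

    Uses the closed form for univariate Gaussians; the result is in [0, 1].
    """
    sigma_p = math.sqrt(var_p)
    sigma_q = math.sqrt(var_q)
    # Bhattacharyya coefficient: 1 for identical distributions, near 0 for distant ones.
    coefficient = math.sqrt(2.0 * sigma_p * sigma_q / (var_p + var_q)) * math.exp(
        -((mean_p - mean_q) ** 2) / (4.0 * (var_p + var_q))
    )
    return math.sqrt(max(0.0, 1.0 - coefficient))


print(round(gaussian_logloss(0.0, 0.0, 1.0), 4))  # 0.9189
print(gaussian_hellinger(0.0, 1.0, 0.0, 1.0))     # 0.0
```

One practical difference: logloss is unbounded, whereas the Hellinger distance always stays in [0, 1], which can make thresholds easier to compare across metrics.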

Slide 20

Slide 20 text

Try different combinations (1/2)
[Figure: Outlier and Change-point scores]

Slide 21

Slide 21 text

Try different combinations (2/2)
[Figure: Outlier and Change-point scores]

Slide 22

Slide 22 text

CF also has 4 hyperparameters
  r (float; [0, 1]): discounting rate
  k (int): order (i.e. complexity) of model
  T1 (int): window size for outliers
  T2 (int): window size for change-points
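To give a feel for what the discounting rate and the window sizes control, here is a heavily simplified Python sketch. It is not ChangeFinder: it assumes a model of order k = 0 (a single Gaussian with no autoregressive terms), scores each point with logloss, and smooths scores with a moving average. The class and function names are hypothetical.

```python
import math
from collections import deque


class DiscountedGaussian:
    """Gaussian whose mean and variance forget old points at rate r."""

    def __init__(self, r, init_mean=0.0, init_var=1.0):
        self.r = r
        self.mean = init_mean
        self.var = init_var

    def score_and_update(self, x):
        # Score x under the current model first (logloss), then learn from it.
        score = 0.5 * math.log(2.0 * math.pi * self.var) + (x - self.mean) ** 2 / (
            2.0 * self.var
        )
        # Larger r: recent points weigh more and old points are forgotten faster.
        self.mean = (1.0 - self.r) * self.mean + self.r * x
        self.var = (1.0 - self.r) * self.var + self.r * (x - self.mean) ** 2
        self.var = max(self.var, 1e-12)  # keep the logarithm well defined
        return score


def smooth(scores, window):
    """Moving average over the last `window` scores (the role of T1 / T2)."""
    buffer = deque(maxlen=window)
    smoothed = []
    for score in scores:
        buffer.append(score)
        smoothed.append(sum(buffer) / len(buffer))
    return smoothed


# A stable series around 10 followed by a sudden jump to 20.
series = [10 + 0.1 * ((i % 3) - 1) for i in range(50)] + [20.0]
model = DiscountedGaussian(r=0.1)
scores = [model.score_and_update(x) for x in series]
print(scores[-1] > max(scores[40:50]))  # True: the jump scores far higher
```

In the full method, the smoothed outlier scores are fed into a second model of the same kind and smoothed again with T2 to obtain change-point scores; the sketch stops after the first stage.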

Slide 23

Slide 23 text

Alternative change-point detector: Implement Singular Spectrum Transform (SST)

Slide 24

Slide 24 text

SST is much simpler than CF
  Naive: computationally heavy
  Efficient numerical approximation
  easy-to-use, robust method
  single intuitive hyperparameter: window size w (int)
  (others can be chosen implicitly)

Slide 25

Slide 25 text

Change-point detection on Hivemall

time  x
1     182.478
2     176.231
3     183.917
4     177.798
5     165.469
…     …

SELECT
  time, changefinder(x, "-changepoint_threshold 0.005")
FROM
  timeseries
ORDER BY time ASC

SELECT
  time, sst(x, "-threshold 0.005")
FROM
  timeseries
ORDER BY time ASC

Slide 26

Slide 26 text

Q. Which method, option and hyperparameter should I choose?

Slide 27

Slide 27 text

Q. Which method, option and hyperparameter should I choose?
A. It depends on data and your preference

Slide 28

Slide 28 text

Slide 29

Slide 29 text

https://github.com/takuti/datadog-anomaly-detector

Slide 30

Slide 30 text

DD supports (simple) outlier detection
  Set alert by just thresholding outlier scores
We need to detect from more complex conditions
  reduce false positives (e.g. check if metric-A AND metric-B show high outlier scores)
https://www.datadoghq.com/blog/introducing-outlier-detection-in-datadog/
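The AND condition mentioned above can be sketched in a few lines of Python. The function name and the threshold values are hypothetical and only illustrate the idea; this is neither Datadog's API nor the detector built during the internship.

```python
def combined_alert(scores, thresholds):
    """Alert only if every monitored metric exceeds its own threshold."""
    return all(scores[name] > limit for name, limit in thresholds.items())


# Hypothetical thresholds for the two metrics named on the slide.
thresholds = {"metric-A": 3.0, "metric-B": 3.0}

# Only metric-A is high: thresholding metric-A alone would fire, the AND rule does not.
print(combined_alert({"metric-A": 5.2, "metric-B": 0.8}, thresholds))  # False

# Both metrics are high at the same time: alert.
print(combined_alert({"metric-A": 5.2, "metric-B": 4.1}, thresholds))  # True
```

Requiring agreement between metrics trades some sensitivity for fewer false positives.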

Slide 31

Slide 31 text

Internship Day1: Apply ChangeFinder (Python) for DD metrics
  Aggregate 1 month points from system.load.norm.5
  successfully detected
[Figure: original points and change-point score]

Slide 32

Slide 32 text

Internship Day2-5: Construct DD anomaly detection system
[Diagram labels: get data points via API; new metric for anomaly scores; ChangeFinder daemon / CLI tool for replay; send record with anomaly scores; notify detected anomalies; notify errors; stream; fetch; Query]

Slide 33

Slide 33 text

Slide 34

Slide 34 text

EPL: Esper’s fancy query language
Aggregate metrics (LOOPBACK on Norikra):
Detection query:
https://github.com/takuti/norikra-udf-dateformat

Slide 35

Slide 35 text

Feedback from @nahi
  Usability-related requests for daemon’s behavior
    Supported as soon as possible
  Feasibility of ChangeFinder for DD metrics
    CF works as expected on some metrics
    Hard to figure out useful metrics due to CF’s instability
    Lack of Norikra-side evaluation

Slide 36

Slide 36 text

Far from conclusion
[Figure annotations: incident? no problem?]

Slide 37

Slide 37 text

Q. Which method, option and hyperparameter should I choose?
A. It depends on data and your preference
Remark:

Slide 38

Slide 38 text

2-month intern was:
  enough to implement algorithms & mock system
  too short to build useful anomaly detector w/ sufficient evaluation
Future directions
  Continuous discussions w/ metric observers
  More static analysis w/ different methods and options
  re:dash integration
+ can be research project

Slide 39

Slide 39 text

Slide 40

Slide 40 text

Write tutorial article

Slide 41

Slide 41 text

Churn prediction
  Focus on subscriber-based services (e.g. mobile telephone network)
Churn rate
  Percentage of individuals who cancelled contract
…

Slide 42

Slide 42 text

Step-by-step tutorial on TD w/ td-pandas
  Preprocessing
  Model training (80% samples)
  Prediction (20% samples)
  Evaluation
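The 80% / 20% split and the final evaluation step can be sketched in plain Python as follows. This is only an illustration of the workflow: the function names, the fixed seed, and the use of accuracy as the measure are assumptions, and none of it is the td-pandas or Hivemall API used in the actual tutorial.

```python
import random


def train_test_split(samples, train_ratio=0.8, seed=42):
    """Shuffle a copy of the samples and split it into train / test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true:
        return 0.0
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# 100 hypothetical customer ids: 80 go to model training, 20 to prediction.
train, test = train_test_split(range(100))
print(len(train), len(test))  # 80 20

# Evaluation compares predicted labels on the 20% with the true ones.
print(accuracy([1, 0, 0, 1], [1, 0, 1, 1]))  # 0.75
```

The held-out 20% must not be used during training, otherwise the evaluation overestimates how well the model predicts churn for unseen customers.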

Slide 43

Slide 43 text

http://tinyurl.com/td-hivemall-churn-draft

Slide 44

Slide 44 text

Slide 45

Slide 45 text

Q. Which method, option and hyperparameter should I choose?
A. It depends on data and your preference
Customers’ requirements are MOST important

Slide 46

Slide 46 text

Slide 47

Slide 47 text

Backbone of real-life machine learning
Engineering
  Wide variety of programming skills
  Integrating numerous middleware
Science
  Understanding concepts behind equations
  Having practical point of view (e.g. complexity, usability)
Human factor
  Experience on various data to incorporate heuristics
  Communication skills to get customers’ requirements