Slide 1

Slide 1 text

Zachary Tong [email protected]
Building a Statistical Anomaly Detector

Slide 2

Slide 2 text

The Problem
• 45m data points
• 75,000 time-series
• 8 large-scale, simulated “disruptions”

Slide 3

Slide 3 text

Some Random Disruptions

Slide 4

Slide 4 text

eBay’s Atlas Monitoring System
• Atlas was designed to monitor eBay search results in real-time
• Built in-house, but they published a paper
• I wanted to re-implement it in Elasticsearch
Goldberg, David, and Yinan Shan. "The importance of features for statistical anomaly detection." 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15). 2015.

Slide 5

Slide 5 text

Queries and Metrics
Query: “thinkpad laptop”
Metrics:
• Number of results
• Average price
• Min price
• Max price
• Average age
• Distinct sellers
• etc

Slide 9

Slide 9 text

Can be any type of metric!
• Netflow
• Click traffic
• Server stats
• Cohort analysis
• IoT Sensor readings
• Marketing campaigns
• …
Just needs timestamp + numeric value

Slide 10

Slide 10 text

Finding “surprising” series

Slide 11

Slide 11 text

Finding “surprising” series
Calculate series average

Slide 12

Slide 12 text

Finding “surprising” series
Find largest “surprise”

Slide 13

Slide 13 text

Finding “surprising” series
Repeat for all series
Largest surprise per series: 25, 74, 15, 3, 19, 82

Slide 14

Slide 14 text

Finding “surprising” series
Sort the surprise
Largest surprise per series: 25, 74, 15, 3, 19, 82
Sorted: 82, 74, 25, 19, 15, 3

Slide 15

Slide 15 text

Finding “surprising” series
Calculate 95th Percentile
Largest surprise per series: 25, 74, 15, 3, 19, 82
Sorted: 82, 74, 25, 19, 15, 3
95th percentile: 68

Slide 16

Slide 16 text

Finding “surprising” series
Plot value, wait n minutes
Largest surprise per series: 25, 74, 15, 3, 19, 82
Sorted: 82, 74, 25, 19, 15, 3
95th percentile: 68

Slide 17

Slide 17 text

Finding “surprising” series
Repeat entire procedure

Slide 18

Slide 18 text

Finding “surprising” series
Repeat entire procedure
Largest surprise per series: 28, 62, 23, 25, 4, 19

Slide 19

Slide 19 text

Finding “surprising” series
Repeat entire procedure
Largest surprise per series: 28, 62, 23, 25, 4, 19
Sorted: 62, 28, 25, 23, 19, 4

Slide 20

Slide 20 text

Finding “surprising” series
Repeat entire procedure
Largest surprise per series: 28, 62, 23, 25, 4, 19
Sorted: 62, 28, 25, 23, 19, 4
95th percentile: 61

Slide 21

Slide 21 text

Finding “surprising” series
Repeat entire procedure
Largest surprise per series: 28, 62, 23, 25, 4, 19
Sorted: 62, 28, 25, 23, 19, 4
95th percentile: 61
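
The procedure walked through on the preceding slides could be sketched roughly as follows. This is a minimal illustration, not the talk's implementation: it assumes "surprise" means the absolute deviation of a point from its series average, uses a linearly interpolated percentile, and the series names and values in `window` are made up. The function names are hypothetical.

```python
from statistics import fmean


def largest_surprise(values):
    """Largest absolute deviation of any point from the series average."""
    avg = fmean(values)
    return max(abs(v - avg) for v in values)


def percentile(data, pct):
    """Percentile of data using linear interpolation between ranks."""
    ordered = sorted(data)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def top_surprise(series_by_name, pct=95.0):
    """One pass: largest surprise per series, then the pct-th percentile."""
    surprises = [largest_surprise(v) for v in series_by_name.values()]
    return percentile(surprises, pct)


# Hypothetical window of data: one list of recent values per (query, metric).
window = {
    "thinkpad laptop / average price": [10.0, 10.0, 10.0, 14.0],
    "thinkpad laptop / number of results": [5.0, 5.0, 5.0, 5.0],
    "thinkpad laptop / distinct sellers": [1.0, 2.0, 3.0, 4.0, 10.0],
}

if __name__ == "__main__":
    print(top_surprise(window))
```

Calling `top_surprise` once per collection interval ("wait n minutes") yields one point per pass, which together form the time series of top surprise. Because the slides show only six illustrative values per pass, a percentile computed this way will not necessarily match the 68 and 61 shown there.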

Slide 22

Slide 22 text

Flagging Anomalies
[Chart: top 95th percentile surprise over time; axes: Time, Surprise]

Slide 23

Slide 23 text

Flagging Anomalies
[Chart: top 95th percentile surprise over time, with a 3 standard deviation threshold; axes: Time, Surprise]

Slide 24

Slide 24 text

Flagging Anomalies
[Chart: top 95th percentile surprise over time, with a 3 standard deviation threshold and a point above it marked “Anomaly!”; axes: Time, Surprise]
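
A possible sketch of the flagging step is shown here. The slide names only a "3 standard deviation threshold"; computing that threshold from the mean and standard deviation of a trailing window of earlier points is an assumption, as are the function name `flag_anomalies`, the window length, and the sample values.

```python
from statistics import fmean, pstdev


def flag_anomalies(surprise_series, window=24, n_sigmas=3.0):
    """Return indices whose value exceeds mean + n_sigmas * stddev
    of the preceding `window` points (assumed baseline)."""
    flagged = []
    for i in range(window, len(surprise_series)):
        history = surprise_series[i - window:i]
        threshold = fmean(history) + n_sigmas * pstdev(history)
        if surprise_series[i] > threshold:
            flagged.append(i)
    return flagged


# Made-up 95th percentile surprise values, one per collection interval.
top_surprise_over_time = [10, 12, 11, 9, 10, 11, 10, 12, 50, 11]

if __name__ == "__main__":
    print(flag_anomalies(top_surprise_over_time, window=8))
```

One consequence of this choice: a spike inflates the trailing standard deviation, so the points right after an anomaly face a higher threshold until the spike leaves the window. A perfectly flat history gives a standard deviation of zero, in which case any increase at all is flagged.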

Slide 25

Slide 25 text

Turns meaningless data … into discrete alerts

Slide 26

Slide 26 text

Elasticsearch Pipeline Aggregations
Generates the raw data
Aggregations shown: Terms, Terms, Date_histo, Avg, Moving Avg, Bucket Script, Max Bucket, Percentiles Bucket
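
One way the aggregations named on this slide might nest is sketched below as a Python function that builds a search request body. The slide does not show the actual request, so the nesting, the field names (`metric`, `query`, `timestamp`, `value`), the aggregation names, the window size, and the Painless script are all assumptions. The `moving_avg` aggregation and the `interval` parameter belong to older Elasticsearch releases; later versions replace them with `moving_fn` and `fixed_interval`.

```python
import json


def build_surprise_request(metric_field="metric", query_field="query",
                           time_field="timestamp", value_field="value",
                           interval="1h", window=24, percent=95.0):
    """Build a search body nesting the aggregations named on the slide.
    All field and aggregation names here are hypothetical."""
    # Per time bucket: average, moving average, and their absolute difference.
    per_time_bucket = {
        "avg": {"avg": {"field": value_field}},
        "movavg": {"moving_avg": {
            "buckets_path": "avg", "window": window, "model": "simple"}},
        "surprise": {"bucket_script": {
            "buckets_path": {"avg": "avg", "movavg": "movavg"},
            "script": "Math.abs(params.avg - params.movavg)",
        }},
    }
    # Per query: the time series plus its single largest surprise.
    per_query = {
        "series": {
            "date_histogram": {"field": time_field, "interval": interval},
            "aggs": per_time_bucket,
        },
        "largest_surprise": {"max_bucket": {"buckets_path": "series>surprise"}},
    }
    # Per metric: all queries plus the percentile of their largest surprises.
    per_metric = {
        "queries": {"terms": {"field": query_field}, "aggs": per_query},
        "top_surprise": {"percentiles_bucket": {
            "buckets_path": "queries>largest_surprise",
            "percents": [percent],
        }},
    }
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "aggs": {"metrics": {"terms": {"field": metric_field},
                             "aggs": per_metric}},
    }


if __name__ == "__main__":
    print(json.dumps(build_surprise_request(), indent=2))
```

The function only builds the dictionary; sending it to a cluster is left out so that the sketch does not depend on a particular client library or server version.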

Slide 27

Slide 27 text

Watcher
Executes data collection & anomaly detector aggs

Slide 28

Slide 28 text

Kibana’s Timelion
Flexible ad-hoc charting

Slide 29

Slide 29 text

Resources
• eBay’s original article
  http://www.ebaytechblog.com/2015/08/19/statistical-anomaly-detection/
• “Implementing a Statistical Anomaly Detector in Elasticsearch”
  https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-1
  https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-2
  https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-3

Slide 30

Slide 30 text

Questions?

Slide 31

Slide 31 text

Please attribute Elastic with a link to elastic.co
Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/
Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders.