Building a Statistical Anomaly Detector in Elasticsearch

Zachary Tong

April 19, 2016

Transcript

  1. Zachary Tong (zach@elastic.co): Building a Statistical Anomaly Detector

  2. The Problem • 45 million data points • 75,000 time series
    • 8 large-scale, simulated “disruptions”
  3. Some Random Disruptions

  4. eBay’s Atlas Monitoring System • Atlas was designed to monitor eBay search results in real time
    • Built in-house, but eBay published a paper about it • I wanted to re-implement it in Elasticsearch • Goldberg, David, and Yinan Shan. “The importance of features for statistical anomaly detection.” 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud ’15), 2015.
  5. Queries and Metrics • Query: “thinkpad laptop”
    • Metrics: number of results, average price, min price, max price, average age, distinct sellers, etc.
  9. Can be any type of metric!
    • Netflow • Click traffic • Server stats • Cohort analysis • IoT sensor readings • Marketing campaigns • … Just needs a timestamp + numeric value
  10. Finding “surprising” series

  11. Finding “surprising” series: calculate the series average

  12. Finding “surprising” series: find the largest “surprise”

  13. Finding “surprising” series: repeat for all series (largest surprises: 25, 74, 15, 3, 19, 82)
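Slides 11 to 13 describe the per-series step. As a minimal Python sketch, assuming the “series average” is a trailing moving average (slide 26 lists a Moving Avg aggregation) over a hypothetical `window` of preceding points, and that a point's surprise is its absolute distance from that average, the step might look like this:

```python
def largest_surprise(values, window=5):
    """Return the largest |value - trailing moving average| in one series.

    Assumption: the moving average at position i covers the `window`
    points before i, so the first `window` points get no surprise score.
    """
    surprises = []
    for i in range(window, len(values)):
        trailing_avg = sum(values[i - window:i]) / window
        surprises.append(abs(values[i] - trailing_avg))
    # A series too short to score contributes a surprise of zero.
    return max(surprises, default=0.0)


def largest_surprises(all_series, window=5):
    """Repeat the per-series step for every series (slide 13)."""
    return [largest_surprise(series, window) for series in all_series]


if __name__ == "__main__":
    series = [
        [10, 10, 10, 10, 10, 30],  # hypothetical series with a spike at the end
        [5, 5, 5, 5, 5, 6],        # hypothetical quiet series
    ]
    print(largest_surprises(series))  # [20.0, 1.0]
```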
  14. Finding “surprising” series: sort the surprises (25, 74, 15, 3, 19, 82 → 82, 74, 25, 19, 15, 3)
  15. Finding “surprising” series: calculate the 95th percentile of the sorted surprises (82, 74, 25, 19, 15, 3; 95th percentile shown: 68)
  16. Finding “surprising” series: plot the 95th-percentile value (68), wait n minutes
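Slides 14 to 16 reduce the per-series largest surprises to one number per time step. The sketch below uses the nearest-rank percentile definition, which is an assumption; other definitions, such as linear interpolation between ranks, can return different values for the same input. `surprise_percentile` is a hypothetical helper name.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the nearest-rank percentile: the smallest value such that at
    least pct percent of the values are less than or equal to it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)  # slide 14: sort the surprises
    # Multiply before dividing to keep the rank free of float rounding.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def surprise_percentile(largest_surprises, pct=95):
    """Hypothetical helper: the single value plotted for this time step."""
    return nearest_rank_percentile(largest_surprises, pct)


if __name__ == "__main__":
    # Hypothetical largest surprises from 100 series: 1.0, 2.0, ..., 100.0
    surprises = [float(x) for x in range(1, 101)]
    print(surprise_percentile(surprises))  # 95.0
```

Running this once per interval and plotting each result yields the surprise-over-time chart that the later slides threshold.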
  17. Finding “surprising” series: repeat the entire procedure

  18. Finding “surprising” series: repeat the entire procedure (new largest surprises: 28, 62, 23, 25, 4, 19)
  19. Finding “surprising” series: repeat the entire procedure (sorted: 62, 28, 25, 23, 19, 4)
  20. Finding “surprising” series: repeat the entire procedure (95th percentile shown: 61)
  22. Flagging Anomalies: top 95th-percentile surprise plotted over time (axes: Surprise, Time)

  23. Flagging Anomalies: add a 3-standard-deviation threshold to the 95th-percentile surprise chart
  24. Flagging Anomalies: a 95th-percentile surprise above the 3-standard-deviation threshold is an anomaly!
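Slides 22 to 24 flag a time step when the plotted percentile rises above a 3-standard-deviation threshold. The slides do not say what the mean and standard deviation are computed over, so this sketch assumes a trailing window of previous percentile values (the `window` length is hypothetical), uses the sample standard deviation, and flags only upward deviations.

```python
from statistics import mean, stdev


def is_anomaly(history, latest, k=3.0):
    """True if `latest` is more than k sample standard deviations above
    the mean of `history`."""
    if len(history) < 2:
        return False  # stdev needs at least two points
    return latest > mean(history) + k * stdev(history)


def flag_anomalies(percentile_series, window=10, k=3.0):
    """Scan the plotted percentile-of-surprise values over time and return
    the indices that exceed a threshold built from the preceding values."""
    flagged = []
    for t, value in enumerate(percentile_series):
        history = percentile_series[max(0, t - window):t]
        if is_anomaly(history, value, k):
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    # Hypothetical percentile-of-surprise values, one per time step.
    observed = [10.0, 12.0, 11.0, 13.0, 12.0, 30.0]
    print(flag_anomalies(observed))  # [5]
```

Each flagged index is one discrete alert; everything else stays quiet, which is the point slide 25 makes.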
  25. Turns meaningless data… into discrete alerts

  26. Elasticsearch Pipeline Aggregations: generates the raw data
    • Terms • Terms • Date_histo • Avg • Moving Avg • Bucket Script • Max Bucket • Percentiles Bucket
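Slide 26 lists the aggregation chain. One way those aggregations might nest is sketched below as a Python dict for a search request body. The field names (`metric`, `query`, `timestamp`, `value`), the aggregation names, the 24-bucket window, and the bucket sizes are all hypothetical, and the nesting (metric, then query, then time) is an assumption chosen so that `percentiles_bucket` yields one percentile per metric across all queries. The syntax targets roughly Elasticsearch 5.x and 6.x (Painless `params` in `bucket_script`); `moving_avg` was later deprecated and removed in favor of `moving_fn`.

```python
import json

# Hypothetical index layout: one document per (metric, query, timestamp)
# carrying a numeric "value" field.
request_body = {
    "size": 0,
    "aggs": {
        "metrics": {
            "terms": {"field": "metric", "size": 10},
            "aggs": {
                "queries": {
                    "terms": {"field": "query", "size": 1000},
                    "aggs": {
                        "histo": {
                            "date_histogram": {
                                "field": "timestamp",
                                "interval": "1h",
                                "min_doc_count": 0,
                            },
                            "aggs": {
                                # Raw data: average value per hour.
                                "hourly_avg": {"avg": {"field": "value"}},
                                # Expected value: moving average of the hourly averages.
                                "movavg": {
                                    "moving_avg": {
                                        "buckets_path": "hourly_avg",
                                        "window": 24,
                                        "model": "simple",
                                    }
                                },
                                # Surprise: absolute deviation from the moving average.
                                "surprise": {
                                    "bucket_script": {
                                        "buckets_path": {"avg": "hourly_avg", "movavg": "movavg"},
                                        "script": "Math.abs(params.avg - params.movavg)",
                                    }
                                },
                            },
                        },
                        # Largest surprise in this query's series.
                        "largest_surprise": {"max_bucket": {"buckets_path": "histo>surprise"}},
                    },
                },
                # 95th percentile of the largest surprises across queries.
                "surprise_percentile": {
                    "percentiles_bucket": {
                        "buckets_path": "queries>largest_surprise",
                        "percents": [95.0],
                    }
                },
            },
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(request_body, indent=2))
```

Sent as the body of a search request, each `metrics` bucket would carry one `surprise_percentile` value for that run; in the talk, Watcher (slide 27) is what executes the aggregations.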
  27. Watcher: executes the data collection & anomaly detector aggregations

  28. Kibana’s Timelion: flexible ad-hoc charting

  29. Resources
    • eBay’s original article: http://www.ebaytechblog.com/2015/08/19/statistical-anomaly-detection/
    • “Implementing a Statistical Anomaly Detector in Elasticsearch”, parts 1 to 3: https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-1 https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-2 https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-3
  30. Questions?

  31. Please attribute Elastic with a link to elastic.co
    Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/. Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders.