Building a Statistical Anomaly Detector in Elasticsearch

Zachary Tong

April 19, 2016

Transcript

  1. The Problem • 45m data points • 75,000 time-series • 8 large-scale, simulated “disruptions”
  2. eBay’s Atlas Monitoring System • Atlas was designed to monitor eBay search results in real-time • Built in-house, but they published a paper • I wanted to re-implement it in Elasticsearch • Goldberg, David, and Yinan Shan. "The importance of features for statistical anomaly detection." 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15). 2015.
  3. Queries and Metrics • Query: “thinkpad laptop” • Metrics: • Number of results • Average price • Min price • Max price • Average age • Distinct sellers • etc
  7. Can be any type of metric! • Netflow • Click traffic • Server stats • Cohort analysis • IoT • Sensor readings • Marketing campaigns • … • Just needs timestamp + numeric value
  8. Finding “surprising” series • Repeat for all series • Surprise values: 25, 74, 15, 3, 19, 82
  9. Finding “surprising” series • Sort the surprise • Surprise values: 25, 74, 15, 3, 19, 82 • Sorted: 82, 74, 25, 19, 15, 3
  10. Finding “surprising” series • Calculate 95th Percentile • Surprise values: 25, 74, 15, 3, 19, 82 • Sorted: 82, 74, 25, 19, 15, 3 • 95th percentile: 68
  11. Finding “surprising” series • Plot value, wait n minutes • Surprise values: 25, 74, 15, 3, 19, 82 • Sorted: 82, 74, 25, 19, 15, 3 • Plotted value: 68
  12. Finding “surprising” series • Repeat entire procedure • Surprise values: 28, 62, 23, 25, 4, 19
  13. Finding “surprising” series • Repeat entire procedure • Surprise values: 28, 62, 23, 25, 4, 19 • Sorted: 62, 28, 25, 23, 19, 4
  14. Finding “surprising” series • Repeat entire procedure • Surprise values: 28, 62, 23, 25, 4, 19 • Sorted: 62, 28, 25, 23, 19, 4 • 95th percentile: 61
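
The "Finding surprising series" slides walk through a loop: collect one surprise value per series, sort them, take the 95th percentile, plot it, wait, and repeat. A minimal Python sketch of that loop follows. The function names (`percentile`, `surprise_timeline`) are hypothetical, and the linear-interpolation percentile is an assumption: the slides do not say how the percentile is computed, so this sketch is not expected to reproduce the slide values of 68 and 61.

```python
import math


def percentile(values, pct):
    """Percentile by linear interpolation between closest ranks.

    The interpolation method is an assumption; the deck does not specify one.
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = (pct / 100.0) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def surprise_timeline(rounds, pct=95.0):
    """Reduce each round of per-series surprise values to one plotted point."""
    return [percentile(surprises, pct) for surprises in rounds]


# The two rounds of surprise values shown on the slides.
rounds = [
    [25, 74, 15, 3, 19, 82],
    [28, 62, 23, 25, 4, 19],
]

# "Sort the surprise": largest first, as drawn on the slide.
ranked_first = sorted(rounds[0], reverse=True)

# One point per round; in a live system each round would run every n minutes.
timeline = surprise_timeline(rounds)
print(ranked_first)
print(timeline)
```

Each round collapses many series into a single number, so the plotted timeline tracks how surprising the most unusual series are as a group rather than following any one series.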
  15. Elasticsearch Pipeline Aggregations (diagram; one label reads “Generates the raw data”) • Terms • Terms • Date_histo • Avg • Moving Avg • Bucket Script • Max Bucket • Percentiles Bucket
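
The aggregation names on the pipeline slide can be arranged into a single request body. The sketch below builds one as a Python dictionary. The nesting order is a guess at how the diagram connects, the field names (`metric`, `query`, `timestamp`, `value`) and aggregation labels are hypothetical, and the `bucket_script` expression is only a placeholder for the surprise formula, which this transcript does not include. The `moving_avg` aggregation and the `interval` parameter match Elasticsearch releases from around the time of the talk; later versions changed or removed them, and script syntax also varies by version.

```python
import json


def build_surprise_query(interval="hour", window=24, percent=95.0):
    """Assemble a nested aggregation body (a sketch, not the talk's exact query)."""
    series_aggs = {
        # Raw value per time bucket.
        "avg": {"avg": {"field": "value"}},
        # Smoothed baseline over the previous buckets.
        "movavg": {"moving_avg": {"buckets_path": "avg", "window": window}},
        # Placeholder surprise: distance from the baseline (assumed formula).
        "surprise": {
            "bucket_script": {
                "buckets_path": {"avg": "avg", "movavg": "movavg"},
                "script": "Math.abs(avg - movavg)",
            }
        },
    }
    query_aggs = {
        "series": {
            "date_histogram": {"field": "timestamp", "interval": interval},
            "aggs": series_aggs,
        },
        # Largest surprise seen in this query's time series.
        "largest_surprise": {"max_bucket": {"buckets_path": "series>surprise"}},
    }
    metric_aggs = {
        "queries": {"terms": {"field": "query"}, "aggs": query_aggs},
        # Percentile of the per-query largest surprises, per metric.
        "surprise_percentile": {
            "percentiles_bucket": {
                "buckets_path": "queries>largest_surprise",
                "percents": [percent],
            }
        },
    }
    return {
        "size": 0,
        "aggs": {"metrics": {"terms": {"field": "metric"}, "aggs": metric_aggs}},
    }


query = build_surprise_query()
body = json.dumps(query, indent=2)
print(body)
```

The resulting JSON could be sent as the body of a search request. Building it in small pieces keeps each level of nesting readable and makes it easy to swap in the real surprise formula.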
  16. Resources • eBay’s original article: http://www.ebaytechblog.com/2015/08/19/statistical-anomaly-detection/ • “Implementing a Statistical Anomaly Detector in Elasticsearch”: https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-1 https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-2 https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-3
  17. Please attribute Elastic with a link to elastic.co. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/. Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders.