Building a Statistical Anomaly Detector in Elasticsearch

Zachary Tong

April 19, 2016

Transcript

  1. Building a Statistical Anomaly Detector
    Zachary Tong
    [email protected]

  2. The Problem
    • 45m data points
    • 75,000 time-series
    • 8 large-scale, simulated “disruptions”

  3. Some Random Disruptions

  4. eBay’s Atlas Monitoring System
    • Atlas was designed to monitor eBay search results in real time
    • Built in-house, but eBay published a paper on it
    • I wanted to re-implement it in Elasticsearch
    • Goldberg, David, and Yinan Shan. "The importance of features for statistical anomaly detection." 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 15). 2015.

  5. Queries and Metrics
    Query: “thinkpad laptop”
    Metrics:
    • Number of results
    • Average price
    • Min price
    • Max price
    • Average age
    • Distinct sellers
    • etc.

  9. Can be any type of metric!
    Netflow
    Click traffic
    Server stats
    Cohort analysis
    IoT Sensor readings
    Marketing campaigns

    Just needs timestamp + numeric value
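
As a minimal sketch of what "timestamp + numeric value" could look like in practice, the record below carries those two required fields plus two labels that group points into a series. The class name `DataPoint`, the helper `to_document`, and the field names `query`, `metric`, `timestamp`, and `value` are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class DataPoint:
    """One observation of one time series (hypothetical field names)."""
    query: str           # which series the point belongs to, e.g. a search query
    metric: str          # which measurement, e.g. "avg_price"
    timestamp: datetime  # when the value was observed
    value: float         # the numeric value itself


def to_document(point: DataPoint) -> dict:
    """Convert a DataPoint into a JSON-friendly dict, e.g. for indexing."""
    doc = asdict(point)
    doc["timestamp"] = point.timestamp.isoformat()
    return doc


point = DataPoint(
    query="thinkpad laptop",
    metric="avg_price",
    timestamp=datetime(2016, 4, 19, 12, 0, tzinfo=timezone.utc),
    value=431.50,
)
document = to_document(point)
```

Any of the metric types listed on the slide would fit this shape as long as each reading can be reduced to one number at one point in time.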

  10. Finding “surprising” series

  11. Finding “surprising” series
    Calculate series average

  12. Finding “surprising” series
    Find largest “surprise”

  13. Finding “surprising” series
    Repeat for all series
    Values: 25, 74, 15, 3, 19, 82

  14. Finding “surprising” series
    Sort the surprise
    Unsorted: 25, 74, 15, 3, 19, 82
    Sorted: 82, 74, 25, 19, 15, 3

  15. Finding “surprising” series
    Calculate 95th percentile
    Unsorted: 25, 74, 15, 3, 19, 82
    Sorted: 82, 74, 25, 19, 15, 3
    95th percentile: 68
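
The steps on slides 11 to 15 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the talk's implementation: "surprise" is taken to be the absolute deviation from the series average, the percentile uses linear interpolation between ranks (Elasticsearch may compute it differently), and the function names are hypothetical.

```python
from statistics import mean


def largest_surprise(series):
    """Return the largest absolute deviation of a series from its own average."""
    average = mean(series)
    return max(abs(value - average) for value in series)


def percentile(values, percent):
    """Percentile by linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(ordered) - 1) * percent / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def top_surprise(all_series, percent=95.0):
    """Largest surprise per series, reduced to a single percentile value."""
    surprises = [largest_surprise(series) for series in all_series]
    return percentile(surprises, percent)


example_series = [
    [10, 10, 10, 10, 20],  # average 12, largest surprise 8
    [5, 5, 5, 5, 5],       # flat series, largest surprise 0
    [1, 2, 3, 4, 5],       # average 3, largest surprise 2
]
value_to_plot = top_surprise(example_series)
```

Running `top_surprise` once per time window yields one value per window, which is the single line plotted in the slides that follow.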

  16. Finding “surprising” series
    Plot value, wait n minutes
    Unsorted: 25, 74, 15, 3, 19, 82
    Sorted: 82, 74, 25, 19, 15, 3
    95th percentile: 68

  17. Finding “surprising” series
    Repeat entire procedure

  18. Finding “surprising” series
    Repeat entire procedure
    Values: 28, 62, 23, 25, 4, 19

  19. Finding “surprising” series
    Repeat entire procedure
    Unsorted: 28, 62, 23, 25, 4, 19
    Sorted: 62, 28, 25, 23, 19, 4

  20. Finding “surprising” series
    Repeat entire procedure
    Unsorted: 28, 62, 23, 25, 4, 19
    Sorted: 62, 28, 25, 23, 19, 4
    95th percentile: 61

  22. Flagging Anomalies
    [Chart: top 95th percentile surprise; x-axis: Time, y-axis: Surprise]

  23. Flagging Anomalies
    [Chart: top 95th percentile surprise; x-axis: Time, y-axis: Surprise]
    Annotation: 3 standard deviation threshold

  24. Flagging Anomalies
    [Chart: top 95th percentile surprise; x-axis: Time, y-axis: Surprise]
    Annotations: 3 standard deviation threshold, “Anomaly!”
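
A hedged sketch of the flagging step: a point in the plotted surprise line counts as an anomaly when it rises more than three standard deviations above the recent mean. The slides only name a "3 standard deviation threshold"; the trailing window, its default size, the use of the population standard deviation, and the function name `flag_anomalies` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_anomalies(values, window=24, sigmas=3.0):
    """Return indices whose value exceeds mean + sigmas * stddev of the
    preceding `window` values. Points without a full window are skipped."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        threshold = mean(history) + sigmas * pstdev(history)
        if values[i] > threshold:
            flagged.append(i)
    return flagged


# One top-percentile surprise value per time step; the spike sits at index 9.
surprise_line = [10, 11, 9, 10, 12, 10, 9, 11, 10, 40, 10, 11]
anomalies = flag_anomalies(surprise_line, window=8)
```

Note that a flagged spike enters the history of the points after it and raises their threshold for one full window, so a real deployment might want to exclude flagged points from the baseline.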

  25. Turns meaningless data … into discrete alerts

  26. Elasticsearch Pipeline Aggregations
    Generates the raw data
    Terms
    Terms
    Date histogram
    Avg
    Moving Avg
    Bucket Script
    Max Bucket
    Percentiles Bucket
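
One way the aggregations listed on this slide might nest into a single search request is sketched below as a Python dict. Everything specific here is an assumption: the field names (`metric`, `query`, `hour`, `value`), the aggregation names, the window size, and the script text are hypothetical. The request shape follows Elasticsearch releases from around the time of the talk; later releases deprecated `moving_avg` in favor of `moving_fn` and changed the date histogram interval and script syntax, so check the documentation for your version.

```python
import json


def build_surprise_request(window=24, percent=95.0,
                           script="(avg - movavg).abs()"):
    """Assemble a search body that nests the aggregations named on the slide.

    All field and aggregation names are hypothetical placeholders.
    """
    # Innermost level: one bucket per time step within a single series.
    per_time_bucket = {
        "avg": {"avg": {"field": "value"}},
        "movavg": {
            "moving_avg": {"buckets_path": "avg", "window": window, "model": "simple"}
        },
        "surprise": {
            "bucket_script": {
                "buckets_path": {"avg": "avg", "movavg": "movavg"},
                "script": script,  # script language and syntax vary by version
            }
        },
    }
    # Per series: the time buckets plus the single largest surprise.
    per_query = {
        "series": {
            "date_histogram": {"field": "hour", "interval": "hour"},
            "aggs": per_time_bucket,
        },
        "largest_surprise": {"max_bucket": {"buckets_path": "series>surprise"}},
    }
    # Per metric: all series plus a percentile over their largest surprises.
    per_metric = {
        "queries": {"terms": {"field": "query"}, "aggs": per_query},
        "top_surprise": {
            "percentiles_bucket": {
                "buckets_path": "queries>largest_surprise",
                "percents": [percent],
            }
        },
    }
    return {
        "size": 0,
        "aggs": {"metrics": {"terms": {"field": "metric"}, "aggs": per_metric}},
    }


request_body = build_surprise_request()
request_json = json.dumps(request_body, indent=2)
```

The two `terms` levels correspond to the two "Terms" entries on the slide: the outer one splits by metric, the inner one by series.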

  27. Watcher
    Executes data collection & anomaly detector aggs

  28. Kibana’s Timelion
    Flexible ad-hoc charting

  29. Resources
    • eBay’s original article
    http://www.ebaytechblog.com/2015/08/19/statistical-anomaly-detection/

    • “Implementing a Statistical Anomaly Detector in Elasticsearch”
    https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-1
    https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-2
    https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-3

  30. Questions?

  31. Please attribute Elastic with a link to elastic.co
    Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/
    Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries.
    Third party marks and brands are the property of their respective holders.