Elastic User Group NL - Feb 2017

stevedodson
February 03, 2017

Anomaly Detection and Machine Learning with Elasticsearch

Transcript

  1. Dr. Stephen Dodson, Tech Lead Machine Learning, Elastic

    Machine Learning and the Elastic Stack
    © Elasticsearch BV
  2. Overview

    •  Background
    •  Machine Learning and the Elastic Stack
    •  Demo
    •  Technical Deep Dive!
    •  Architecture
  3. Background

    •  Me
       –  Currently, Tech Lead, Machine Learning @ Elastic
       –  Formerly, Founder and CTO of Prelert (acquired by Elastic September 2016)
          ‒  Presented overview of Prelert at Elastic London User Group in May 2016
    •  Prelert
       –  VC-backed software company, founded 2009
       –  Behavioural analytics for machine data based (mainly) on unsupervised machine learning
       –  100+ customers + OEMs with CA, Bluecoat, NetApp + others
          ‒  IT Operations, IT Security, Retail analytics, IoT etc.
  4. Machine Learning

    •  Algorithms and methods for data-driven prediction, decision making, and modelling [1]
       ‒  Learn models from past behaviour (training, modelling)
       ‒  Use models to predict future behaviour (prediction)
       ‒  Use predictions to make decisions
    •  Examples
       ‒  Image Recognition
       ‒  Language Translation
       ‒  Anomaly Detection

    [1] Machine Learning Overview, Tommi Jaakkola, MIT
  5. How is this relevant to the Elastic Stack?

    •  Extracting useful, valuable information is hard
    (Diagram labels: Search, Aggregations, Visualization, Machine Learning)
  6. How is this relevant to the Elastic Stack?

    •  What if we want to search for:
       ‒  Has my order rate dropped significantly?
       ‒  Do my application logs contain unusual messages?
       ‒  Are any users behaving unusually?
       ‒  What transactions are fraudulent?
    •  Goal of ML at Elastic: Extend the Elastic Stack to allow the user to ask these types of questions and get understandable answers
    •  Constraints:
       ‒  Data may be limited: no markup may be available or relevant
       ‒  Compute resource dedicated to machine learning may be limited
       ‒  User should not need to be a machine learning expert or data scientist
  7. Has my order rate dropped significantly?

    •  Learn models from past behaviour (training, modelling)
    •  Use models to predict future behaviour (prediction)
    •  Use predictions to make decisions

    Expected value @ 15:05 = 1859
    Actual value @ 15:05 = 280
    Probability = 0.0000174025
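The learn, predict, decide loop on this slide can be sketched in a few lines. This is only an illustration, not the model used by Elastic: it assumes a plain normal distribution fitted to past values, and the `history` data and the `two_sided_probability` helper are hypothetical (the history is chosen so that its mean is 1859, matching the slide's expected value). The probability it yields will not match the slide's figure, which comes from the product's own model.

```python
import math
from statistics import mean, stdev

def two_sided_probability(history, actual):
    """Probability of seeing a value at least as far from the mean as `actual`,
    assuming (for illustration only) that history is normally distributed."""
    expected = mean(history)
    spread = stdev(history)
    z = abs(actual - expected) / spread
    # For a standard normal, 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# Hypothetical order counts observed at 15:05 on previous days.
history = [1850, 1900, 1790, 1875, 1820, 1910, 1868]

print("expected:", mean(history))
print("p(actual=280): ", two_sided_probability(history, 280))
print("p(actual=1860):", two_sided_probability(history, 1860))
```

A decision rule would then compare the probability against a threshold, flagging the 280 reading as anomalous and the 1860 reading as normal.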
  8. Do my application logs contain unusual messages?

    Classify unstructured log messages by clustering similar messages
    (Figure labels: Normal log messages, Unusual log messages)
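One simple way to picture "clustering similar messages" is to reduce each message to a coarse template and count how often each template occurs. The sketch below is an assumption made for illustration and is not the clustering method used in the product; the `template` and `rare_templates` helpers, the masking rule, and the sample log lines are all hypothetical.

```python
import re
from collections import Counter

def template(message):
    """Reduce a log line to a coarse template by masking variable-looking tokens."""
    tokens = []
    for token in message.split():
        # Treat any token containing a digit as variable (ids, counts, addresses).
        tokens.append("<*>" if re.search(r"\d", token) else token.lower())
    return " ".join(tokens)

def rare_templates(messages, max_count=1):
    """Return messages whose template occurs at most `max_count` times."""
    counts = Counter(template(m) for m in messages)
    return [m for m in messages if counts[template(m)] <= max_count]

logs = [
    "Connection from 10.0.0.1 accepted",
    "Connection from 10.0.0.7 accepted",
    "Connection from 10.0.0.9 accepted",
    "Disk /dev/sda1 failure detected",
]

print(rare_templates(logs))
```

Messages that share a template fall into the same group, so a message from a rarely seen group stands out as unusual.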
  9. Problem Overview

    •  Overview of Problem:
       ‒  Obtain actionable, defensible insight into useful anomalous behaviours in multi-dimensional time series in real-time
       ‒  Not all anomalous behaviour is operationally useful
       ‒  Not just making the best prediction about what the system will do, but understanding the probability of the data being normal or a useful anomaly
       ‒  Implementation must be able to:
          ‒  Scale to huge volumes of data on reasonable hardware
          ‒  Operate online as close to real-time as possible
          ‒  Adapt to change in the underlying data
          ‒  Be robust to arbitrary data (general purpose)
    •  Overview of Solution:
       ‒  Use unsupervised machine learning to create a predictive model for the distribution of feature values at a given time, as a function of time, based on the historical values we have seen to date.
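The requirements to operate online and adapt to change can be illustrated with a model that is updated one value at a time and gradually forgets old data. The sketch below uses an exponentially weighted mean and variance; this is a swapped-in, generic technique chosen for illustration, not the modelling described in the talk, and the `DecayingGaussian` class and its `decay` parameter are hypothetical names.

```python
class DecayingGaussian:
    """Online mean/variance estimate that down-weights older observations."""

    def __init__(self, decay=0.05):
        self.decay = decay   # fraction of weight given to each new value
        self.mean = None     # no estimate until the first value arrives
        self.var = 0.0

    def update(self, x):
        if self.mean is None:
            self.mean = float(x)
            return
        delta = x - self.mean
        # Standard exponentially weighted updates for mean and variance.
        self.mean += self.decay * delta
        self.var = (1.0 - self.decay) * (self.var + self.decay * delta * delta)

model = DecayingGaussian(decay=0.05)
for value in [10.0] * 200:      # stable behaviour
    model.update(value)
before_shift = model.mean
for value in [50.0] * 200:      # the underlying level changes
    model.update(value)

print(before_shift, model.mean)
```

Each update costs constant time and memory, and after the level shift the estimate converges on the new behaviour without retraining from scratch.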
  10. Univariate Metric Example

    one-of-n:
    weight 0.25 gamma mean = 18.38113 std = 14.32349
    weight 0.25 log-normal mean = 18.30681 std = 10.18887
    weight 0.25 normal mean = 18.36997 std = 13.22049
    weight 0.25 multimodal:
    weight 0.25 # samples 3.664231
    weight 0.25 weight 1 one-of-n:
    weight 0.25 weight 1 weight 0.3333333 gamma mean = 18.38113 std = 14.32349
    weight 0.25 weight 1 weight 0.3333333 log-normal mean = 18.31183 std = 10.18451
    weight 0.25 weight 1 weight 0.3333333 normal mean = 18.38113 std = 13.18854

    one-of-n:
    weight 0.9872225 multimodal:
    weight 0.9872225 # samples 114.2597
    weight 0.9872225 weight 0.4498603 one-of-n:
    weight 0.9872225 weight 0.4498603 weight 0.2418992 gamma mean = 10.52094 std = 1.728486
    weight 0.9872225 weight 0.4498603 weight 0.2711693 log-normal mean = 10.52424 std = 1.745909
    weight 0.9872225 weight 0.4498603 weight 0.4869315 normal mean = 10.52094 std = 1.722452
    weight 0.9872225 weight 0.5501397 one-of-n:
    weight 0.9872225 weight 0.5501397 weight 0.3222726 gamma mean = 26.6806 std = 8.068258
    weight 0.9872225 weight 0.5501397 weight 0.5707544 log-normal mean = 26.69812 std = 8.290151
    weight 0.9872225 weight 0.5501397 weight 0.1069731 normal mean = 26.6806 std = 8.097122
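To make the second model dump more concrete, the sketch below evaluates a weighted mixture density from the two modes and their gamma, log-normal and normal components, using the weights, means and standard deviations printed on the slide. Several things are assumptions: that the printed mean and std are the moments of each component (so parameters are recovered by moment matching), that weights combine multiplicatively down the tree, and that the outer weight 0.9872225 can be ignored. The function names are hypothetical and this is not the product's implementation.

```python
import math

def gamma_pdf(x, mean, std):
    if x <= 0:
        return 0.0
    shape = (mean / std) ** 2          # moment matching (assumption)
    scale = std ** 2 / mean
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def lognormal_pdf(x, mean, std):
    if x <= 0:
        return 0.0
    sigma2 = math.log(1 + (std / mean) ** 2)   # moment matching (assumption)
    mu = math.log(mean) - sigma2 / 2
    return (math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma2))
            / (x * math.sqrt(2 * math.pi * sigma2)))

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# (mode weight, [(component weight, pdf, mean, std), ...]) taken from the slide.
MODES = [
    (0.4498603, [(0.2418992, gamma_pdf, 10.52094, 1.728486),
                 (0.2711693, lognormal_pdf, 10.52424, 1.745909),
                 (0.4869315, normal_pdf, 10.52094, 1.722452)]),
    (0.5501397, [(0.3222726, gamma_pdf, 26.6806, 8.068258),
                 (0.5707544, lognormal_pdf, 26.69812, 8.290151),
                 (0.1069731, normal_pdf, 26.6806, 8.097122)]),
]

def mixture_pdf(x):
    """Density of the two-mode mixture at x."""
    return sum(mode_w * sum(w * pdf(x, m, s) for w, pdf, m, s in comps)
               for mode_w, comps in MODES)

for x in (10.5, 18.0, 26.7):
    print(x, mixture_pdf(x))
```

Under these assumptions the density has one narrow peak near 10.5 and a broader one near 26.7, with a valley in between, which is the kind of shape a single normal distribution could not represent.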
  11. Analytics Outside of Elastic Architecture

    (Diagram labels: Beats, Logstash, Kibana, X-Pack, Elasticsearch, Prelert analysis node, Data, Prelert UI)
    •  Issues
       –  Data Gravity: data from Elasticsearch needs to be sent to the Prelert analytics node
       –  Context: anomalies and data are stored in different data stores and viewed in different UIs
       –  Scale: Prelert analysis was not easily distributable across nodes
       –  Resilience: Prelert analysis needed to be restored manually on failover
  12. Architecture

    •  Machine Learning will be part of X-Pack
    •  Machine Learning jobs will be automatically distributed across the Elasticsearch cluster
    •  Machine Learning jobs will be resilient to failover
    •  Machine Learning results and data can be in the same cluster
    (Diagram labels: Beats, Logstash, Kibana, Elasticsearch, X-Pack: Security, Alerting, Monitoring, Reporting, Graph, Machine Learning "ICON TBD!!")
  13. Status

    •  Demo on Elastic 5.4 available at Elastic{ON} (March 7th 2017)
    •  GA shortly after… (ask Sophie!)
    •  Focus of initial ML product is time series analysis in real-time
       ‒  Metric anomaly detection
       ‒  Log message classification and anomaly detection
       ‒  Population analysis (entity profiling)
    •  Shrink-wrapped configurations on Beats data - full Elastic Stack experience!
    (Diagram labels: Beats, Elasticsearch, Kibana, X-Pack: Alerting, Machine Learning "ICON TBD!!")