Elasticsearch: you know, for more than search

Russ Cam
August 18, 2017

Talk from NDC Sydney on capabilities in the Elastic Stack that can be used for a variety of use cases, including recommendations, ad-hoc analysis, reactive monitoring, and anomaly detection.

Transcript

  1. 2 The Elastic Stack

    Kibana (User Interface); Elasticsearch (Store, Index, & Analyze); Ingest (Logstash, Beats); X-Pack (Security, Alerting, Monitoring, Graph, ML); Elastic Cloud
  2. 11 Documents in Elasticsearch

    { "id": 1, "likes": ["Pi", "Dark City", "Requiem for a Dream"] }
    { "id": 2, "likes": ["Pi", "Mulholland Drive"] }
    { "id": 3, "likes": ["Pi", "The Shawshank Redemption", "Fight Club"] }
    { "id": 4, "likes": ["Pi", "Pulp Fiction"] }
  3. 12 Graphing Relationships: Wisdom of Crowds

    Vertices: Pi, Requiem for a Dream, Dark City, Mulholland Drive, The Shawshank Redemption, Fight Club, Pulp Fiction
  4. 14 Graphing Relationships: Super Connected Entities

    Vertices: Pi, Requiem for a Dream, Dark City, Mulholland Drive, The Shawshank Redemption, Fight Club, Pulp Fiction
  5. 15 Graphing Relationships: Super Connected Entities

    | Data | Document type | Vertices | Super connected entities |
    | --- | --- | --- | --- |
    | Twitter | Tweet | Hashtags | #YOLO, #MAGA |
    | Movies | User | Favourite Movie | The Shawshank Redemption, Fight Club |
    | Music | User | Favourite Band | The Beatles, Coldplay, Radiohead |
    | Wikipedia | Article | Linked article | United States, Living people |
    | Phone Statements | Call | Phone number | Taxi firms, Centrelink |
    | Bank Statements | Transaction | Paid to account | Amazon, Paypal, Energy Company |
  6. 17 How Graph Databases typically work: Limiting the effect of super connected entities

    • Specify connections between vertices
    • Apply a weight to each connection
    • Calculate a threshold for connections
    • Use threshold to determine interesting connections
  7. 18 How Elasticsearch Graph API works: Combining graph algorithms with search

    1. Calculate term frequencies when indexing documents
    2. Use terms co-occurrence across documents to infer relationships
    3. Calculate statistical significance of inferred relationships
    4. Only show significant (interesting) relationships
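
    The first two steps above can be sketched in plain Python over the four example documents from the earlier slide. This is only an illustration of counting term frequencies and co-occurrences; the function names `term_frequencies` and `co_occurrences` are hypothetical, and the sketch makes no claim about how Elasticsearch implements the Graph API internally.

```python
from collections import Counter
from itertools import combinations

# The four example documents from the "Documents in Elasticsearch" slide.
docs = [
    {"id": 1, "likes": ["Pi", "Dark City", "Requiem for a Dream"]},
    {"id": 2, "likes": ["Pi", "Mulholland Drive"]},
    {"id": 3, "likes": ["Pi", "The Shawshank Redemption", "Fight Club"]},
    {"id": 4, "likes": ["Pi", "Pulp Fiction"]},
]


def term_frequencies(documents):
    """Count how many documents contain each term (hypothetical helper)."""
    counts = Counter()
    for doc in documents:
        counts.update(set(doc["likes"]))
    return counts


def co_occurrences(documents):
    """Count how often each unordered pair of terms shares a document."""
    pairs = Counter()
    for doc in documents:
        for a, b in combinations(sorted(set(doc["likes"])), 2):
            pairs[(a, b)] += 1
    return pairs


freqs = term_frequencies(docs)
pairs = co_occurrences(docs)

# "Pi" appears in every document, so it co-occurs with every other movie:
# a super connected entity whose links carry little information.
pi_links = [pair for pair in pairs if "Pi" in pair]
```

    Because "Pi" is liked in all four documents, raw co-occurrence alone connects it to everything, which is why the later steps weigh relationships by statistical significance.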
  8. 19 Significant Terms Aggregation: Interesting or unusual occurrences of terms in a set

    All users' favourite movies: Requiem for a Dream. Users whose favourite movies include the movie "Pi": Requiem for a Dream.
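
    The idea of comparing a term's share in a subset against its share in the whole set can be sketched as follows. The scoring function is modelled on the JLH heuristic (absolute change multiplied by relative change); treat the exact formula as an assumption of this sketch. The user counts are made up for illustration and the name `significance` is hypothetical.

```python
def significance(fg_count, fg_total, bg_count, bg_total):
    """Score how unusually common a term is in the foreground set.

    Assumption: a JLH-style score, (fg% - bg%) * (fg% / bg%).
    Terms that are not more common in the foreground score 0.
    """
    fg_pct = fg_count / fg_total
    bg_pct = bg_count / bg_total
    if bg_pct == 0 or fg_pct <= bg_pct:
        return 0.0
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


# Hypothetical numbers: 1000 users overall, 50 of whom like "Pi".
ALL_USERS = 1000
PI_FANS = 50

# "Requiem for a Dream": rare overall, common among "Pi" fans.
requiem = significance(fg_count=30, fg_total=PI_FANS,
                       bg_count=60, bg_total=ALL_USERS)

# "The Shawshank Redemption": popular with everyone, so barely significant.
shawshank = significance(fg_count=32, fg_total=PI_FANS,
                         bg_count=600, bg_total=ALL_USERS)
```

    With these made-up counts, a universally popular movie scores far lower than one that is disproportionately liked by "Pi" fans, even though slightly more "Pi" fans like the popular one.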
  9. 21 Etiology of events: Using graph for ad-hoc exploration

    Diagram nodes: User A, Port 22, 10.0.0.4, 10.1.1.53, Port 80, User B, Virus
  10. 22 Finding Patient Zero: Identifying the source

    • Every connection could be potentially significant
    • Certainty in implied relationships
    • Traverse relationships and understand linkages
  11. 26 Typical search queries: Query in, documents out

    Diagram: Query JSON goes into the Cluster (Node 1, Node 2, Node 3); Matching Documents come out
  12. 27 Percolate queries: "Reverse search"

    Diagram: JSON Document goes into the Cluster (Node 1, Node 2, Node 3); Matching Queries come out
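
    The "reverse search" idea can be illustrated without a cluster: queries are stored ahead of time, and each incoming document is checked against them. The sketch below is a conceptual simulation in plain Python, not the Elasticsearch percolator; the query names, document fields, and the `percolate` function are all hypothetical.

```python
# Stored queries, registered before any document arrives.
# Each one is a predicate over a document (a plain dict here).
stored_queries = {
    "cheap-laptops": lambda doc: (
        doc.get("category") == "laptop" and doc.get("price", float("inf")) < 1000
    ),
    "sydney-storms": lambda doc: (
        "sydney" in doc.get("text", "").lower()
        and "storm" in doc.get("text", "").lower()
    ),
}


def percolate(document, queries):
    """Document in, names of matching stored queries out."""
    return sorted(name for name, matches in queries.items() if matches(document))


laptop_listing = {"category": "laptop", "price": 899, "text": "Light 13-inch laptop"}
weather_report = {"category": "news", "text": "Severe storm warning for Sydney"}

laptop_matches = percolate(laptop_listing, stored_queries)
weather_matches = percolate(weather_report, stored_queries)
```

    Each match tells you which registered interest a new document satisfies, which is the basis for the alerting and classification uses on the next slide.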
  13. 28 Percolate use cases

    Simple Alerting Query Targeting Feedback Classification

    • Price Monitoring
    • News Alerts
    • Stock price Alerts
    • Weather Alerts
    • Advertisements
    • Marketplaces
    • Automatic tagging
    • Geo tagging
    • Language Detection
  14. 29 Alerting: Context is key

    • Temporal patterns are important
    • Avoiding alert fatigue

    [Action] when [input] is [condition]. Check it [trigger].

    Diagram: Cluster (Node 1, Node 2, Node 3); Alerting (Trigger & Input, Condition & Action); HTTP, INDEX, LOG
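
    The sentence "[Action] when [input] is [condition]. Check it [trigger]." maps naturally onto four parts. The sketch below models them in plain Python to show how they fit together; the `Watch` class and its field names are hypothetical and are not the X-Pack alerting API. The trigger is represented only as an interval, and a scheduler that calls `check()` is assumed to exist elsewhere.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Watch:
    """Hypothetical model: [action] when [input] is [condition]; check it [trigger]."""
    trigger_interval_seconds: int        # how often a scheduler would call check()
    read_input: Callable[[], Any]        # loads the data to evaluate
    condition: Callable[[Any], bool]     # decides whether the data is noteworthy
    action: Callable[[Any], None]        # what to do when the condition holds

    def check(self) -> bool:
        """Run one evaluation cycle; return True if the action fired."""
        payload = self.read_input()
        if self.condition(payload):
            self.action(payload)
            return True
        return False


# Made-up example: log a message when more than 5 errors were seen.
recent_error_count = {"value": 2}
log_lines = []

watch = Watch(
    trigger_interval_seconds=60,
    read_input=lambda: recent_error_count["value"],
    condition=lambda errors: errors > 5,
    action=lambda errors: log_lines.append(f"{errors} errors in the last minute"),
)

fired_when_quiet = watch.check()   # 2 errors: condition not met
recent_error_count["value"] = 9
fired_when_noisy = watch.check()   # 9 errors: action runs
```

    Keeping the condition separate from the input is what lets the same data source back several alerts with different thresholds, which helps when tuning against alert fatigue.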
  15. 32 Anomaly detection: For time series data

    • Sometimes, rule-based alerts are insufficient
    • Defining normal static thresholds is hard
  16. 33 Anomaly detection: For time series data

    • What is normal / abnormal?
    • Learning what is normal / abnormal

    Chart: Probability against E-mails per day
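
    Learning what is normal, rather than hard-coding a threshold, can be sketched with a very simple model: fit a mean and standard deviation to past e-mails per day, then ask how probable a new value is. This assumes the data is roughly normally distributed, which is a simplification chosen for the sketch and not a description of the model Elastic's machine learning uses. The history values and function names are made up.

```python
import math
from statistics import mean, stdev


def learn_normal(history):
    """Fit a (mean, standard deviation) model to past observations."""
    return mean(history), stdev(history)


def probability_of(value, model):
    """Two-sided probability of a value at least this far from the mean,
    assuming a normal distribution (a simplifying assumption)."""
    mu, sigma = model
    z = abs(value - mu) / sigma
    return math.erfc(z / math.sqrt(2))


# Hypothetical e-mails-per-day counts for one mailbox.
emails_per_day = [98, 102, 110, 95, 105, 99, 101, 104, 97, 103]
model = learn_normal(emails_per_day)

p_typical = probability_of(101, model)   # close to the learned mean
p_unusual = probability_of(160, model)   # far outside anything seen so far
```

    A low probability marks a day as anomalous without anyone having chosen a fixed limit in advance; the limit follows from the data.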
  17. 34 Types of anomalies: Univariate and multivariate time series

    Examples: unusual traffic on website; unusual traffic on website by country (China, France, UK, USA)