Detecting Insider Threats with Elastic

Elastic Co
October 19, 2016

RedOwl combines sentiment analysis with user behavioral analytics to detect unwanted behavior within the enterprise, and Elasticsearch is at the heart of all of it. We'll talk architecture, data modeling, and aggregations in a brief overview of some of our application's core functions.
Russel Snyder | Principal Engineer | RedOwl Analytics
Adam Reeve | Principal Architect | RedOwl Analytics


Transcript

  1. “Treating insiders as a technology problem ignores the human aspects of motivation and behavior.”
    Joseph Blankenship, Forrester Research
  2. The Insider Threat Kill Chain
    ▸ Actor: Employee, Contractor, Manager
    ▸ Motivation: Money, Power, Revenge, Blackmail
    ▸ Tool: Account Privilege, Physical Access, Admin
    ▸ Vulnerability: Culture, Policy, Flaws
    ▸ Target: PII, IP, Keys, Money, Plans
    ▸ Recon: Scan, Research, Probe
    ▸ Action: Read, Copy, Destroy, Modify, Steal
    ▸ Result: Reputational Damage, Financial Loss, Outage
  3. Insiders aren’t Metadata, they’re People
    ▸ Entitled Insider | IP Thief: “Recruited by a competitor. Took client lists, product ideas, internal working documents - everything he’d ever been a part of.”
    ▸ Blackmailed Developer | PII Thief: “Social media posts about financial troubles led a ‘recruiter’ to contact her. Simple requests quickly escalated into blackmail.”
    ▸ Disgruntled Employee | Saboteur: “Huge fight with boss. Quit and deployed time-bomb corrupting our HR system, inserted false transactions in a client back-end system.”
  4. This is a Hard Problem
    Unstructured data
    ▸ Many formats, many sources
    ▸ Some common, some unique
    ▸ 100,000’s of events per day, inconsistent ordering
    Complex analytics
    ▸ Create valuable, actionable insight
    Diverse deployment model
    ▸ Public Cloud
    ▸ Client Private Cloud
    ▸ Client Bare Metal
  5. Early Days of RedOwl
    Reveal Analytics Service (c. 2012)
    ▸ Custom Hadoop workflow
    ▸ Heavy-weight inferential statistics engine
    ▸ Such cool, many novel
    So many issues…
    ▸ Resource intensive
    ▸ Difficult to configure/manage
    ▸ Interpretability of results
  6. Enter Elastic
    What questions are we really trying to answer?
    ▸ What is “risky” behavior?
    ▸ Which entities are being “risky”?
    Example: IP Theft
    ▸ What are the behaviors one might exhibit?
      • Late night activity
      • Use of internal keywords
      • Use of attachments
      • Data export
    ▸ How do we model these?
  7. Anomaly Detection with Elastic
    Features
    ▸ Scoring events on ingest
    ▸ E.g. - “late night”, “project keywords”, etc.
    Models
    ▸ Define what “risk” means
    ▸ Aggregate features by entity at runtime
    ▸ Pre-canned models capture most common definitions of “risk”
    Key Differentiators
    ▸ Lighter, faster, more configurable
    ▸ Makes customization possible
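The idea of scoring features on ingest can be sketched as follows. This is a minimal illustration, not RedOwl's implementation: the function names, the 22:00 to 05:00 window, and the empirical percentile rank are all assumptions made for the example. Only the output shape (`features.<name>.score` and `features.<name>.percentile`) follows the event document shown in the deck.

```python
from bisect import bisect_right
from datetime import datetime


def late_night_score(timestamp, start_hour=22, end_hour=5):
    """Return 1.0 if the event falls in the (assumed) late-night window, else 0.0."""
    hour = timestamp.hour
    return 1.0 if (hour >= start_hour or hour < end_hour) else 0.0


def percentile_rank(score, reference_scores):
    """Fraction of reference scores that are <= score (assumed percentile definition)."""
    ordered = sorted(reference_scores)
    if not ordered:
        return 0.0
    return bisect_right(ordered, score) / len(ordered)


def score_event(event, timestamp, reference_scores):
    """Return a copy of the event with a 'late_night' feature attached before indexing."""
    raw = late_night_score(timestamp)
    enriched = dict(event)
    features = dict(enriched.get("features", {}))
    features["late_night"] = {
        "score": raw,
        "percentile": percentile_rank(raw, reference_scores),
    }
    enriched["features"] = features
    return enriched


# Hypothetical usage: one print event scored against four earlier raw scores.
event = score_event(
    {"type": "printlog"},
    datetime(2016, 10, 19, 23, 30),
    reference_scores=[0.0, 0.0, 0.0, 1.0],
)
```

Because the scores are stored on the event at ingest, a model defined later only needs to read and combine them at query time.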
  8. Printed Data Breach: Event Document
    {
      "type": "printlog",
      "entities": [
        {"roleId": "user", "entity": "John Doe"},
        {"roleId": "device", "entity": "MX-1234K"}
      ],
      "attachments": [
        {"name": "client-list.xslx", "content": "Client,Addr,..."}
      ],
      "features": {
        "late_night": {"score": 1, "percentile": 0.999},
        "client_names": {"score": 4.782, "percentile": 0.971}
      }
    }
  9. Printed Data Breach: Query
    Function Score Query over Features
    ▸ Each feature gets its own function
    ▸ Function scores combine to make model score
    ▸ Feeds into aggregation
    Filtered Query for Subsetting
    ▸ Since corpus represented by query, filter to score different groups of events
      • e.g. - organizational unit, country, etc.
    ▸ Can subset by any event-level metadata
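The query described on this slide can be sketched as a request body built in Python. This is a hedged illustration under a few assumptions: `build_model_query` is a hypothetical helper, the `org_unit` field is an invented example of event-level metadata, and `"missing": 0` and `"boost_mode": "replace"` are choices made for the sketch. Only `function_score`, `field_value_factor`, score mode `sum`, and the two feature percentile fields come from the deck.

```python
def build_model_query(feature_fields, subset_filter=None):
    """Build a function_score query with one field_value_factor function per feature."""
    functions = [
        # "missing": 0 (an assumption) lets events lacking a feature contribute nothing.
        {"field_value_factor": {"field": field, "missing": 0}}
        for field in feature_fields
    ]
    inner_query = {"match_all": {}}
    if subset_filter is not None:
        # Restrict the scored corpus to a subset of events without affecting scores.
        inner_query = {"bool": {"filter": [subset_filter]}}
    return {
        "function_score": {
            "query": inner_query,
            "functions": functions,
            "score_mode": "sum",      # function scores are summed into the model score
            "boost_mode": "replace",  # assumption: ignore the inner query's own score
        }
    }


# Hypothetical usage: an "IP Theft" style model restricted to one organizational unit.
ip_theft_query = build_model_query(
    ["features.late_night.percentile", "features.client_names.percentile"],
    subset_filter={"term": {"org_unit": "finance"}},
)
```

Using `boost_mode: replace` keeps the model score equal to the summed feature values regardless of how the subsetting query itself scores; with the default `multiply`, a filter-only inner query would change the result.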
  10. Printed Data Breach: Aggregation
    Aggregate model score by user...
    ▸ Bucket by user
      • Nested (roles)
      • Filter (role = user)
      • Terms (role.entity)
      • Reverse Nested
    ▸ Stats across “_score”
      • We use sum or average
    ...or bucket by literally anything else.
    ▸ Create buckets across any event-level field
      • e.g. - datacenter, domain, portfolio, etc.
    ▸ Great degree of flexibility
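The aggregation chain on this slide can be sketched as nested Python dictionaries. Treat this as an illustration under stated assumptions: the helper names, the aggregation names (`by_role`, `only_role`, `by_entity`, `events`, `model_score`), and the `size` parameter are hypothetical. The `roles` path and the `roles.roleId` and `roles.entity` fields follow the deck's query slides. The short `"script": "_score"` form mirrors the slide; the exact script syntax depends on the Elasticsearch version and scripting settings.

```python
def score_stats():
    """Stats over the per-event model score produced by the function_score query."""
    return {"model_score": {"stats": {"script": "_score"}}}


def build_entity_aggregation(role_id="user", size=10):
    """Nested -> filter -> terms -> reverse_nested -> stats, one bucket per entity."""
    return {
        "by_role": {
            "nested": {"path": "roles"},
            "aggs": {
                "only_role": {
                    "filter": {"term": {"roles.roleId": role_id}},
                    "aggs": {
                        "by_entity": {
                            "terms": {"field": "roles.entity", "size": size},
                            "aggs": {
                                # Step back out to the parent event to read its score.
                                "events": {
                                    "reverse_nested": {},
                                    "aggs": score_stats(),
                                }
                            },
                        }
                    },
                }
            },
        }
    }


def build_field_aggregation(field, size=10):
    """Bucket by any event-level (non-nested) field instead of by entity."""
    return {"by_field": {"terms": {"field": field, "size": size}, "aggs": score_stats()}}


# Hypothetical usage: score users, or score a made-up "datacenter" field.
user_aggs = build_entity_aggregation(role_id="user")
datacenter_aggs = build_field_aggregation("datacenter")
```

Either dictionary would go under the `aggs` key of a search body next to the model query; the `sum` or `avg` value of the resulting stats is then the entity's aggregated score.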
  11. Printed Data Breach: Feature Model Query
    IP Theft
    Query: function_score( ), Score Mode: sum
      Function: field_value_factor( ), Field: features.late_night.percentile
      Function: field_value_factor( ), Field: features.client_names.percentile
    Aggregation: nested( ), Path: roles
      Aggregation: filter( ), Query: roles.roleId:user
        Aggregation: terms( ), Field: roles.entity
          Aggregation: reverse_nested()
            Aggregation: stats( ), Script: _score
  12. Oh the Possibilities...
    Models can be defined after ingest
    ▸ Allows “risk” to mean more/less anything
    ▸ Limited to features we can score
    Elasticsearch powers exploration of context
    ▸ E.g. - “Show me all of the riskiest entity’s activity”
    ▸ Denormalization makes this possible
    How do you have that much memory?
    ▸ We don’t, we use UX/UI/workflow
    ▸ Precludes users from doing dumb things
  13. RedOwl in Practice
    2016 Case Study on RedOwl
    ▸ Global private equity firm
    ▸ Identified multiple cases of undesirable behavior
      • Negligent sharing of information internally
      • Malicious theft of private information by departing employees
    ▸ Differentiators
      • Unstructured data analysis
      • High-fidelity reporting
      • Faster, comprehensive investigation
  14. RedOwl + Elastic OEM = <3
    Elastic Product Suite
    ▸ Marvel
      • A true blessing for performance testing
    ▸ Shield
      • User based “entitlements”
    ▸ Watcher
      • Alerts on cluster stability and/or risk scores
    Elastic Support
    ▸ Quick Start w/ Architectural review
    ▸ Client confidence w/ Level 1
    ▸ Engineering confidence w/ Level 3
  15. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/
    Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders.