Under the Surface of Optum's Security Big Data Lake

Elastic Co
March 09, 2017

Optum’s Cyber Defense organization utilizes the Elastic Stack within its Security Big Data Lake (SBDL) to search and pivot between cyber threats. The Hadoop and Elastic architecture of the data lake allows correlation and enrichment of logs prior to Elastic ingestion, accelerating investigation timelines. The SBDL can replace and improve on many cyber products offered by third parties at significantly lower cost and risk.

William Casey | Director, Data Analytics & Security Innovation | Optum
Johanna Favole | Data Scientist | Optum

Transcript

  1. Slide 2
    • Why we built it
    • How we built it
    • How we use it
    • Where we’re going
  2. Slide 4: Everyone has a cybersecurity problem
    • More than 50M personal records reported stolen.*
    • 55% increase in spear-phishing campaigns since 2014.*
    • 35% increase in crypto-ransomware since 2014.*
    • 25% increase in total breaches since 2013.*
    • 125% increase in zero-day vulnerabilities identified since 2014.*
    • 1.7 times more mobile malware detected in Q2 than Q1.**
    • Increasing volume of new, unpatchable IP-enabled devices.
    • Healthcare-focused ransomware proven effective with Hollywood Presbyterian Medical Center ransomware payment.
    • Every year will be a record-breaking year for cybersecurity threats and attacks.
    *Symantec Internet Security Threat Report, Vol. 21, April 2016.
    **IT Threat Evolution in Q2 2016, Kaspersky Labs
  3. Slide 5: Our cybersecurity problem
    A noisy, diverse, fast-changing information environment
    • 140M customers
    • Thousands of employees with multiple devices
    • Operations in 130 countries
    • 250k+ endpoints
    • 8TB+ raw logs daily
    • 160 different log device categories
    • 7BN events daily
    • Overlapping regulatory requirements
    • Time-sensitive breach notification requirements
    • Non-standard application logging
    • PII leakage
    • Security tools != troubleshooting tools
    • Unusable SIEM data
    • Data silos
    • Inadequate data enrichment
  4. Slide 6: The way forward
    • Custom development of utilities, sources of truth, and non-rules-based reporting
    • Reduced reliance on third party vendors, who offer data lake-like enrichment on a smaller scale
    • Unprecedented speed in investigative search
    • Ownership of organizational data
    • A scalable, highly-available solution ready for investigators when they need it
  5. Slide 7: Our operational priorities
    • Ease of use to enable new and non-technical employees to become effective quickly and reduce preventable error
    • Segmented, protected access to data based on need-to-know
    • Open-source focus to minimize cost to deploy
    • Multiple search and analysis interfaces to enable analysts to be flexible in response to situational demands
    • Tools to enable non-specialists to conduct pattern recognition, relationship, and predictive analysis
    • Support for aggressive, proactive threat hunting
  6. Slide 8: Why Elastic?
    • Open-source
    • Flexible
    • Scalable
    • Easy to use
    • Securable
    • Functional
  7. Slide 10: Project management challenges
    Take 1: Try the enterprise data lake
    • January-June 2015
    • One of many tenants
    • Inflexible services
    • Support challenges
    • Unclear requirements
    • Result was closer to cold storage
    • Difficult internal cost structure for scale
    Take 2: Roll your own cluster
    • June-December 2015
    • Underpowered VM architecture
    • Difficulty in communicating requirements and results
    • Support and scale challenges
    Take 3: Get lots of help
    • December 2015-May 2016
    • Dedicated enterprise support and environment
    • Great third party help
    • Expectations managed
    • Progress shown in iterations
  8. Slide 12: The security structure
    Hortonworks HDFS
    • Group-based access to source data streams
    • Project-based access to analytic projects and aggregated data
    • Integration with Livy to proxy user jobs
    Elasticsearch
    • Document-level or index-level security for each user and role
    • User-level query and access
    • Deployment of Elastic Security plugin
    All SBDL Access
    • Active Directory integration to manage permissions by global group membership
    • Apache Shiro for applications without native Active Directory capability
    • Access restricted to pre-identified hosts and IP addresses with iptables
    • Encryption zones for each team and data source, with all user-writable areas encrypted
    • All user access logged
  9. Slide 13: The data
    7 billion events per day from 160+ sources
    Transactional
    • SIEM loggers
    • Firewalls
    • Email security and web proxy appliances
    • Database activity monitors
    • Endpoint sensors
    • Vulnerability scans
    • Security ticketing system
    • Incident response data collectors
    Enriching and Referential
    • IP reputation
    • Threat feeds
    • External vulnerabilities
    • External geolocation
    • Contextual transaction data
    • Analyst feedback
    • Human capital management data
    • System configuration management data
    • Enterprise technology management
    • Acquired entity (AE) references
    • Application configuration management data
    • Internal geolocation
    Reactive
    • Forensic data collection
    • Forensic data analysis
    • Vulnerability scan data correlation
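The enriching sources listed under "The data" imply a join step that attaches context to each transactional event before it is indexed. The following is a minimal sketch of that idea, not Optum's pipeline: the field names `src_ip` and `src_ip_reputation`, the `enrich` function, and the in-memory `IP_REPUTATION` table are all hypothetical stand-ins for a real reputation feed.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reputation table; a real deployment would load this from a feed.
IP_REPUTATION: Dict[str, str] = {
    "203.0.113.7": "known-scanner",
    "198.51.100.23": "malware-c2",
}


def enrich(events: Iterable[dict], reputation: Dict[str, str]) -> Iterator[dict]:
    """Yield a copy of each event with a reputation label for its source IP."""
    for event in events:
        enriched = dict(event)  # copy so the caller's event is not mutated
        # Events with a missing or unlisted source IP are labeled "unknown".
        enriched["src_ip_reputation"] = reputation.get(event.get("src_ip"), "unknown")
        yield enriched
```

Doing the lookup before indexing means an investigator can filter on the reputation label directly instead of joining at search time.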
  10. Slide 14: Checking the big data boxes
    Volume
    • 10+ TB of transactional, referential, and enriching logs daily
    • Myriad collectors, sensors, aggregators, parsers, filters
    • Investigative and analytic needs for 1+ year of data
    Velocity
    • Streaming, near-real-time, some daily and weekly batches
    • Increased velocity reduces time-to-detect
    • Ability to quickly call back cold storage data for priority investigations
    Veracity and Variety
    • Analyst-derived knowledge
    • Network and endpoint collectors, agents, and appliances
    • Organizational enriching and referential data
    • Combinations of unstructured, semi-structured, and structured data
    • Difficult to validate completeness from many sources
    • Volumes vary considerably by time-of-day and day-of-week
  11. Slide 16: Asking bigger questions
    Question: Which user profiles have accessed files at strange hours that are exclusively accessed by another business unit?
    Answer: Cluster file access by file and user, then determine movement between clusters by a file or user.

    Question: How do I detect port enumeration if it’s done slowly and inconspicuously?
    Answer: Query the unique number of destination ports by UHG source IP/host over 48-72 hours.

    Question: Which users have downloaded executables or zipped files and have subsequently shown unusual traffic patterns?
    Answer: Query web/proxy logs for file extension types and for all IPs with hits, compare to new user agent strings, Kansa processes, Sinkhole, A/V, and Damballa alerts.

    Question: Are uncommon user agent strings indicative of malicious activity?
    Answer: Gather data from all investigated and rebuilt machines and compare incidence of uncommon user agent strings in those machines compared to others.

    Question: Is this system vulnerable and is the vulnerability being actively exploited?
    Answer: Tying vulnerability scan data to actual security events within the enterprise.
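The slow port enumeration answer (count unique destination ports per source over 48-72 hours) can be sketched in plain Python. This is an illustration under stated assumptions rather than the query the team runs: the event fields `timestamp`, `src_ip`, and `dst_port`, both function names, and the threshold of 100 ports are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def ports_per_source(events: Iterable[dict], window_end: datetime,
                     window: timedelta = timedelta(hours=72)) -> Dict[str, int]:
    """Count distinct destination ports contacted by each source IP in the window."""
    window_start = window_end - window
    seen = defaultdict(set)
    for event in events:
        if window_start <= event["timestamp"] <= window_end:
            seen[event["src_ip"]].add(event["dst_port"])
    return {src: len(ports) for src, ports in seen.items()}


def suspected_scanners(counts: Dict[str, int], threshold: int = 100) -> List[str]:
    """Return sources whose distinct-port count meets the (hypothetical) threshold."""
    return sorted(src for src, n in counts.items() if n >= threshold)
```

A long window is the point of the design: a scanner that probes one port every twenty minutes stays under any per-minute rate limit but still accumulates a large distinct-port count over two or three days.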
  12. Slide 17: Workstreams
    The goal: Situational awareness
    • The means: Dashboards; Ad hoc reports
    • The outcomes: High-level network awareness; Topic-specific awareness; Infrastructure monitoring
    The goal: Investigation
    • The means: Ad hoc search; Saved searches
    • The outcomes: Discovery of IOCs; IOC linkage; More thorough investigations
    The goal: Analytics
    • The means: Ad hoc search; Saved searches; Prelert queries; Dashboards; Reports
    • The outcomes: Established baselines to highlight anomalies; Historical context; Predictive models
    The goal: Ingestion
    • The means: Enriching data streams in additional indices
    • The outcomes: Organizational context; Enhanced partnerships; Comprehensive analysis
  13. Slide 18: Searching
    • Quick, ad hoc searches during investigations
    • Hundreds of saved searches, on topics including
      – Endpoint malware detection
      – Database access monitoring alerts
      – Suspicious email attachments
      – Spam and spoofed email
      – Ransomware victims
  14. Slide 21: The system
    Capacity expansion
    • Currently under par with the SIEM on retention, with goal to surpass it
    • Data volumes increasing 20% annually, requiring expanded hardware to maintain same retention
    • Starting by adding hard drives to maximize capacity in existing machines
    Platform upgrades
    • Upgraded to Elastic 5.x.x with X-Pack
    • Added separate monitoring cluster with Monitoring (formerly Marvel) and Alerting (formerly Watcher)
    • Added NiFi and Kafka to manage heterogeneous and streaming data
    • UBA enhancements
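The capacity point (20% annual growth at constant retention) is compound-growth arithmetic. The sketch below is illustrative only: the function name is hypothetical, the inputs of 10 TB per day and 365 days echo the "10+ TB" and "1+ year" figures from the deck, and compression, replication, and index overhead are ignored.

```python
def projected_storage_tb(daily_tb: float, retention_days: int,
                         annual_growth: float, years: int) -> float:
    """Raw storage needed to retain `retention_days` of logs after `years` of growth.

    Simplification: the whole retention window is priced at the grown daily
    rate, so the result is an upper bound for that year.
    """
    grown_daily_tb = daily_tb * (1 + annual_growth) ** years
    return grown_daily_tb * retention_days


# Illustrative inputs only; not measured figures.
for year in range(4):
    print(year, round(projected_storage_tb(10, 365, 0.20, year)))
```

Under these assumptions the raw footprint grows from about 3,650 TB to about 6,307 TB in three years, which is why retention cannot be held steady without adding hardware.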
  15. Slide 22: Data and analytics
    • Expanding scope beyond Enterprise Information Security to other operations and technology groups
    • Adding new sources of data, including
      – Threat and intelligence feeds
      – Vulnerability scan data
      – Network topology data
      – Employee hierarchy data
    • Breaking data out of the ArcSight CEF format when native sources enable greater detail
    • Learning to use Graph, Prelert/ML, Timelion, and ES-Hadoop
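For context on the ArcSight CEF point: a CEF record is a pipe-delimited header of seven fields followed by a free-form extension of `key=value` pairs. The parser below is a simplified sketch, not a complete implementation: the dictionary key names and the `parse_cef` function are my own labels, and escaped `=` characters inside extension values are not handled.

```python
import re
from typing import Dict

# Labels for the seven CEF header fields (names chosen for this sketch).
HEADER_FIELDS = ["version", "device_vendor", "device_product", "device_version",
                 "signature_id", "name", "severity"]

_PIPE = re.compile(r"(?<!\\)\|")                  # a pipe not escaped by a backslash
_EXT = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")    # value runs to the next " key=" or end


def parse_cef(line: str) -> Dict[str, str]:
    """Split one CEF record into header fields plus extension key/value pairs."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF record")
    parts = _PIPE.split(line[4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    record = dict(zip(HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    for key, value in _EXT.findall(extension):
        record[key] = value
    return record
```

Because the header is fixed at seven fields, any detail a device emits beyond them has to fit into the extension, which is one reason native formats can carry more detail than the normalized CEF record.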
  16. Slide 23: The Great Lakes projects
    • Superior
      – Integration of new tools to automate forensic data collection
      – Ingestion of collected data into the SBDL
      – Faster, more accurate investigations
    • Michigan
      – Creation of a workflow management and ticketing system
      – Will integrate with SBDL and Superior workflows
      – Will require metadata capture and management
    • Huron
      – Platform enhancements (e.g., Prelert)
      – Ingestion enhancements
    • Erie
      – Parsing of paid, open-source, and analyst-derived internal and external data
      – Automated lookup of threat data against security data
      – Integration with Threat Connect for IOCs and Knowledge Management
    • Ontario
      – Ingestion of internal vulnerability IP scan data, penetration testing, and external vulnerability data
      – Automated lookup of vulnerability data against security data
      – Application for vulnerability prioritization and remediation
  17. Slide 24: 2017 and beyond
    • Expanding use of platform as a SIEM
    • Automation and orchestration woven into security fabric
    • Acquired entity collection strategy
    • Cloud collection