Slide 1

Slide 1 text

Detecting and Preventing Cybersecurity Threats
Stefano Meschiari

Slide 2

Slide 2 text

What you’ll take away
1. What Threat Detection is
2. Why it can be a hard problem
3. Lessons learned @ Duo
4. Summary
(Spoiler alert: It’s Still Hard)

Slide 3

Slide 3 text

Stefano Meschiari
Senior Data Scientist, Duo Security
labs.duo.com

Slide 4

Slide 4 text

What is Threat Detection? 

Slide 5

Slide 5 text

DISCLAIMER: 
 I am not a security expert by any means.

Slide 6

Slide 6 text

Credential Theft and Account Takeover
● Vectors:
○ Phishing (e.g. spoofed websites, social engineering, etc.)
○ Data breaches
○ Malware (e.g. keyloggers)
○ Breaking and leveraging weak credentials
○ ...
● ~2 billion usernames and passwords were exposed via breaches between March 2016 and March 2017 (Thomas, Li, et al., 2017).
● ~63% of all breaches in the 2016 Verizon Data Breach Investigations Report were associated with credential theft.

Slide 7

Slide 7 text

“Classic” Threat Detection
● Objective: recognize activity patterns within an access log that are indicative of policy violations or outright breaches.
● Security analysts examine data sources (e.g. access logs) and use domain expertise to manually pull out interesting events.
● Most automated threat detection systems are based around static rules and signatures (a toy rule engine is sketched below).
○ Need to build extensive domain knowledge (patterns known to be malicious)
○ Organizations and their data tend to evolve quickly
○ Tend to be reactive rather than proactive
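To make the contrast with ML concrete: a static rule is essentially a hand-written predicate over log fields. A minimal sketch in Python; the rule names, field names, and thresholds are all illustrative, not Duo's:

```python
# Minimal sketch of a signature/static-rule detector: hand-written predicates
# over log fields. Field names and thresholds are illustrative.
RULES = [
    ("too_many_failures", lambda e: e["failed_attempts_last_hour"] >= 10),
    ("off_hours_admin",   lambda e: e["is_admin"] and not 7 <= e["hour"] <= 19),
    ("known_bad_network", lambda e: e["origin_network"] in {"198.51.100.0/24"}),
]

def match_rules(event):
    """Return the names of all rules an event trips. Reactive by design:
    a novel attack that matches no known pattern sails through."""
    return [name for name, predicate in RULES if predicate(event)]

print(match_rules({"failed_attempts_last_hour": 12, "is_admin": True,
                   "hour": 3, "origin_network": "203.0.113.0/24"}))
```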

Slide 9

Slide 9 text

Sounds like a perfect application for ML, right? ...Right?

Slide 10

Slide 10 text

Why do we care?
It is a problem that people care about and that is very resource-intensive, so it is worth optimizing. Cybersecurity attacks are frequent, on the rise, underreported, and costly. Base rates are hard to quantify.

Slide 11

Slide 11 text

Automated Threat Detection = Anomaly Detection
● We’re trying to recognize patterns outside the norm.
● These patterns might not look like any other attack we’ve ever seen (novel attacks).
(Diagram: regions of normal behavior, known intrusion types 1 and 2, and a novel attack.)

Slide 12

Slide 12 text

Successful applications of Anomaly Detection
● Academic work has shown that many algorithms can successfully detect attacks when tested on common synthetic datasets (see, e.g., Gilmore & Hayadaman, 2016).
TAKE YOUR PICK! (one of these is sketched below)
○ Simple density-based / statistical detectors
○ Clustering
○ Bayesian Networks
○ Isolation Forests
○ One-class SVMs
○ Inductive Association Rules, Frequent Sequence Mining
○ NLP-inspired DL models
○ …
That success doesn’t always translate to an industrial setting...
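As a minimal illustration of one pick from the list, here is a toy Isolation Forest detector built with scikit-learn. The features and synthetic data are invented for the example and do not come from any real dataset:

```python
# Minimal sketch: unsupervised anomaly scoring with an Isolation Forest.
# Feature names and synthetic data are illustrative, not Duo's pipeline.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Pretend features per authentication: hour of day, distance (km) from the
# user's usual location, and count of recent failed attempts.
normal = np.column_stack([
    rng.normal(10, 2, 1000),   # logins cluster around business hours
    rng.exponential(5, 1000),  # usually close to home
    rng.poisson(0.2, 1000),    # few failures
])
suspicious = np.array([[3.0, 8000.0, 6.0]])  # 3 AM, far away, many failures

model = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
model.fit(normal)

# score_samples: higher means more normal; negate to get an "anomaly score".
print("normal-ish:", -model.score_samples(normal[:3]))
print("suspicious:", -model.score_samples(suspicious))
```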

Slide 13

Slide 13 text

...because Threat Detection with ML is hard.* 
* Well, everything is hard.

Slide 14

Slide 14 text

A convergence of Awful
A non-exhaustive list. See, e.g., Sommer & Paxson, 2010; Axelsson, 1999.
● Annoying data
● Lack of Labels
● Diversity of benign behavior
● Anomaly ≠ Threat
● High cost of mispredictions
● Usability > Metrics

Slide 15

Slide 15 text

Data types that can be annoying to work with
● We like numbers (e.g. 42.42).
● We like categoricals less (e.g. 70.114.110.15, Austin, TX, Google Chrome).
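One common workaround for high-cardinality categoricals like IPs, cities, and browsers is feature hashing, which maps them into a fixed-width numeric vector without maintaining a vocabulary. A minimal sketch with scikit-learn's FeatureHasher; the field names are illustrative:

```python
# Minimal sketch: hashing high-cardinality categoricals (IPs, cities,
# browsers) into a fixed-width numeric vector a detector can consume.
# Field names are illustrative, not Duo's schema.
from sklearn.feature_extraction import FeatureHasher

events = [
    {"ip": "70.114.110.15", "city": "Austin, TX", "browser": "Google Chrome"},
    {"ip": "203.0.113.7",   "city": "Kyiv",       "browser": "curl/7.68"},
]

# String values are hashed as "key=value" pairs; collisions are the price
# paid for bounding dimensionality without keeping a vocabulary around.
hasher = FeatureHasher(n_features=256, input_type="dict")
X = hasher.transform(events)  # scipy sparse matrix, shape (2, 256)
print(X.shape, X.nnz)
```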

Slide 16

Slide 16 text

Lack of Labels
● Very small fraction of labels, or no labels at all (attacks are rare).
● Forces unsupervised approaches.
● Strong assumptions on “normality” vs. “attack”.
● Without ground truth, you can’t (easily) verify your assumptions.

Slide 17

Slide 17 text

Diversity of benign behavior
● We assume normal behavior is uniform and frequent.
● In reality, benign behavior is diverse across multiple users (see the sketch below).
(Diagram: clusters labeled “Sales”, “Engineering”, “HR/Recruiting”, and “Anomalies”.)
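One way to respect that diversity is to fit a separate detector per benign sub-population rather than one global model, so "Sales" behavior isn't flagged merely for differing from "Engineering". A toy sketch, assuming invented group labels and two illustrative features (hour of access, km traveled):

```python
# Minimal sketch: one detector per benign sub-population instead of a single
# global model. Group labels and features are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
groups = {
    # columns: hour of access, km traveled since last auth
    "sales":       rng.normal([9, 50], [2, 30], size=(500, 2)),  # travels a lot
    "engineering": rng.normal([11, 2], [3, 1],  size=(500, 2)),  # office-bound
}

models = {
    name: IsolationForest(random_state=0).fit(X) for name, X in groups.items()
}

event = np.array([[10.0, 45.0]])  # routine for sales, odd for engineering
for name, m in models.items():
    print(name, -m.score_samples(event))
```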

Slide 18

Slide 18 text

High anomaly score ≠ Threat

Slide 19

Slide 19 text

High costs of mispredictions
● False positives: loss of time and trust.
● False negatives: (catastrophic) data breaches.
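Asymmetric costs argue for choosing the alert threshold explicitly rather than accepting a model default. A toy sketch of cost-sensitive threshold selection; the scores, labels, and cost values are invented for illustration:

```python
# Minimal sketch: pick an alert threshold by minimizing expected cost when a
# false negative (missed breach) is far costlier than a false positive
# (analyst time). Scores, labels, and costs are illustrative.
import numpy as np

def best_threshold(scores, labels, cost_fp=1.0, cost_fn=500.0):
    """Scan candidate thresholds; alert whenever score >= threshold."""
    best_t, best_cost = None, float("inf")
    for t in np.unique(scores):
        alerts = scores >= t
        fp = np.sum(alerts & (labels == 0))   # benign events flagged
        fn = np.sum(~alerts & (labels == 1))  # attacks missed
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(0, 1, 990), rng.normal(3, 1, 10)])
labels = np.concatenate([np.zeros(990, dtype=int), np.ones(10, dtype=int)])
print(best_threshold(scores, labels))
```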

Slide 20

Slide 20 text

Components of a Threat Detection system
● Risk Score: prioritization of events.

Slide 21

Slide 21 text

Components of a Usable Threat Detection system
● Risk Score: prioritization of events.
● Context: providing contextual behavior information.
● Explanation: justifying model decisions.
● Feedback: human-in-the-loop knowledge transfer.
● Action: suggesting next steps and mitigations.
A great model produces useful, understandable, actionable detections. Model accuracy is not sufficient.
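As a sketch of what pairing a risk score with an explanation could look like (not Duo's implementation): report which features deviate most from the user's own history, in plain language. The feature names and the simple z-score heuristic are assumptions:

```python
# Minimal sketch: a risk score plus a human-readable explanation, built from
# per-feature deviations against the user's history. Illustrative only.
import numpy as np

FEATURES = ["hour_of_access", "km_from_usual_city", "new_device"]

def score_and_explain(event, history):
    mu = history.mean(axis=0)
    sigma = history.std(axis=0) + 1e-9  # avoid division by zero
    z = np.abs(event - mu) / sigma
    order = np.argsort(z)[::-1]         # most deviant features first
    reasons = [f"{FEATURES[i]} is {z[i]:.1f} sd from typical" for i in order[:2]]
    return float(z.max()), reasons

rng = np.random.default_rng(2)
history = np.array([[10, 3, 0]] * 50, dtype=float)
history += rng.normal(0, [1.5, 2, 0.1], (50, 3))

score, reasons = score_and_explain(np.array([3.0, 900.0, 1.0]), history)
print(round(score, 1), reasons)
```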

Slide 22

Slide 22 text

Lessons Learned Building a Threat Detection system
@ Duo

Slide 23

Slide 23 text

What does Duo do?

Slide 24

Slide 24 text

Duo’s User and Entity Behavior Analytics (UEBA)
● Currently in beta
● Tested with a core group of Duo customers

Slide 25

Slide 25 text

Lesson 1: Keep your scope small (but relevant).

Slide 26

Slide 26 text

Authentication Log
Data captured as part of the authentication flow:
● Company
● User
● Application
● Time of access
● Outcome of authentication
● Properties of the access device
● Properties of the second factor
● Network of origin
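For concreteness, one possible typed representation of a single log record, mirroring the fields above; the exact schema, field names, and types are assumptions, not Duo's actual format:

```python
# Minimal sketch of one authentication-log record as a typed structure.
# Field names mirror the slide; the concrete schema is an assumption.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AuthEvent:
    company: str
    user: str
    application: str
    timestamp: datetime
    outcome: str          # e.g. "success" / "denied"
    access_device: dict   # OS, browser, device health properties
    second_factor: dict   # push, phone call, hardware token, ...
    origin_network: str   # source IP / network of origin

event = AuthEvent(
    company="acme-corp", user="alice", application="vpn",
    timestamp=datetime(2018, 5, 1, 9, 30), outcome="success",
    access_device={"browser": "Google Chrome", "os": "macOS"},
    second_factor={"method": "push"}, origin_network="70.114.110.15",
)
```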

Slide 27

Slide 27 text

Production pipeline architecture
● Training path (Spark on EMR): normalize data sources → extract features → train models → persist artifacts and models to a database.
● Scoring path (Spark on EMR): normalize data sources → extract features → score and explain → queue or database → presentation of potential security events.
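A minimal sketch of what the "normalize → extract features" stage could look like as a PySpark job, since the diagram shows both paths running Spark on EMR. The S3 paths, column names, and aggregations are all hypothetical:

```python
# Minimal sketch of a "normalize -> extract features" Spark stage, matching
# the diagram's shape. Paths, columns, and aggregations are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("auth-features").getOrCreate()

# Hypothetical location of already-normalized authentication logs.
auths = spark.read.json("s3://bucket/normalized-auth-logs/")

features = (
    auths
    .withColumn("hour", F.hour("timestamp"))
    .groupBy("company", "user")
    .agg(
        F.count("*").alias("n_auths"),
        F.countDistinct("origin_network").alias("n_networks"),
        F.avg("hour").alias("mean_hour"),
    )
)

features.write.mode("overwrite").parquet("s3://bucket/features/")  # hypothetical
```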

Slide 28

Slide 28 text

Current state
● The pipeline builds customer- and user-level models that consider:
○ Distribution of the geographical origin, timestamp, and essential attributes of the authentications;
○ Surrounding context at a session level;
○ Calibration information derived from recent historical data.
● We return an anomaly score paired with an explanation (“authenticated from a novel device, ...”) and context (surrounding auths for the user).
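One simple way to use "calibration information derived from recent historical data" is to report a score's empirical percentile among the user's own recent scores, so a fixed threshold means the same thing across users. A toy sketch; this particular approach is an illustration, not necessarily Duo's method:

```python
# Minimal sketch: calibrate raw anomaly scores against recent history by
# reporting the empirical percentile of today's score. Illustrative only.
import numpy as np

def calibrated_score(raw_score, recent_scores):
    """Fraction of recent scores that today's score exceeds (0..1)."""
    recent = np.asarray(recent_scores)
    return float((recent < raw_score).mean())

recent = np.random.default_rng(3).normal(0, 1, 500)  # e.g. last 30 days
print(calibrated_score(2.5, recent))   # ~0.99: unusually high for this user
print(calibrated_score(-0.1, recent))  # ~0.46: unremarkable
```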

Slide 29

Slide 29 text

Lesson 2: Production ≠ Research.* * especially if there’s a lot of research left to be done.

Slide 30

Slide 30 text

Lesson 3: Figure out the Monitoring Story Early.* * especially in an adversarial environment.

Slide 31

Slide 31 text

Lesson 4: Don’t Trust your Labels Implicitly.

Slide 32

Slide 32 text

First pass: we have labels...?

Slide 33

Slide 33 text

Lesson 4: Don’t Trust your Labels Implicitly.* * because non-expert users are not good labelers.

Slide 34

Slide 34 text

Lesson 5: Ask your Operators.* * you don’t have to wait for your models to be perfect.

Slide 35

Slide 35 text

Get feedback as early as possible
● We sought feedback from alpha clients very early on, and kept iterating.
● Gives iterative insights into:
○ Threat models: what do they consider suspicious vs. merely anomalous? What is “useful”?
○ Semantic gap: can we explain our outputs? Can they explain their decisions?
○ User populations: “Dave does X all the time when Y, and it’s not suspicious.”
○ Use cases that might be rare in the customer base at large.

Slide 36

Slide 36 text

Conclusions

Slide 37

Slide 37 text

Summary
● Machine Learning can be a powerful tool for Threat Detection, but using it in production requires a lot of care.
● The risk score is only a small fraction of the work.
● The business metric is whether your end user finds predictions useful and usable.
● We have a lot of ongoing research, both on improving ML models and on usability.
THANK YOU! www.stefanom.io @RhymeWithEskimo

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Outstanding Issues
● Large variance in the amount and variety of data produced by users and companies.
● Unsupervised models are still awkward to work with, and the temporal component is hard to handle.
Research Threads
● Learn or impose a data hierarchy.
● Categorize blocks of authentications into “tasks” to characterize normal and fraudulent behavior.
● Point process models can describe temporal patterns.
○ Intrigued? Attend Bronwyn Woods’ talk at CAMLIS 2018.
● Association rules can help surface frequent actions shared by groups of users.

Slide 40

Slide 40 text

What makes it hard?
● Labels, Labels, Labels
○ For known attacks: if any labels at all, extreme class imbalance.
○ For novel attacks: nah.
● High cost of both false positives and false negatives.
● Dearth of realistic public datasets.
● Assumes some notion of normal behavior, but there are multiple agents with potentially very different behaviors.

Slide 41

Slide 41 text

What makes it hard? There’s more!
● Interested in sequential and temporal anomalies, not just point anomalies.
● Mixed data types with varying proportions of categorical, boolean, and numerical data.
● Context is important.
● Hard to distinguish between noise, anomalies, and actual security events. High anomaly score ≠ Threat.
See, e.g., Sommer & Paxson, 2010.

Slide 42

Slide 42 text

Pain points
● Production ≠ Research
○ Tension between what’s needed for fast, iterative research and more heavyweight processes for production.
○ Tooling mismatch.
● We don’t have a good story for:
○ Monitoring model performance
○ Dynamically scheduling retraining
● These are both very important components, especially in an adversarial environment.

Slide 43

Slide 43 text

Example Vector: Email Phishing