
Detecting and Preventing Cybersecurity Threats



Stefano Meschiari

March 11, 2022


Transcript

  1. Detecting and Preventing Cybersecurity Threats Stefano Meschiari

  2. What you’ll take away 1. What Threat Detection is 2.

    Why it can be a hard problem 3. Lessons learned @ Duo 4. Summary 
 (Spoiler alert: It’s Still Hard)
  3. Stefano Meschiari Senior Data Scientist Duo Security labs.duo.com

  4. What is Threat Detection? 
 THREAT DETECTION • THE AWFULNESS

    • ML AT DUO • CONCLUSIONS
  5. DISCLAIMER: 
 I am not a security expert by any

    means.
  6. • Vectors: ◦ Phishing (e.g. spoofed websites, social engineering, etc.)

    ◦ Data breaches ◦ Malware (e.g. keyloggers) ◦ Breaking and leveraging weak credentials ◦ ... • ~ 2 billion usernames and passwords exposed via breaches between March 2016 and March 2017 (Thomas, Li, et al., 2017). • ~ 63% of all breaches in the 2016 Verizon Data Breach Investigations Report were associated with credential theft. Credential Theft and Account Takeover
  7. “Classic” Threat Detection • Objective: recognize activity patterns within an

    access log that are indicative of policy violations or breaches. • Security analysts analyze data sources (e.g. access logs) and use domain expertise to manually pull out interesting events. • Many automated threat detection systems are based around signatures and static rules. ◦ Need to build extensive domain knowledge (patterns known to be malicious) ◦ Organizations and their data tend to evolve quickly ◦ Tend to be reactive rather than proactive
  8. “Classic” Threat Detection • Objective: recognize activity patterns within an

    access log that are indicative of policy violations or outright breaches. • Security analysts analyze data sources (e.g. access logs) and use domain expertise to manually pull out interesting events. • Most automated threat detection systems are based around static rules and signatures. ◦ Need to build extensive domain knowledge ◦ Organizations and their data tend to evolve quickly ◦ Tend to be reactive rather than proactive
  9. Sounds like a perfect application for ML, right? ...Right?

  10. Why do we care? It is a problem that people

    care about and that is very resource-intensive, so it is worth optimizing. Cybersecurity attacks are frequent, on the rise, underreported, and costly. Base rates are hard to quantify.
  11. • We’re trying to recognize patterns outside the norm. •

    These patterns might not look like any other attack we’ve ever seen (novel attacks). NORMAL BEHAVIOR INTRUSION TYPE 2 INTRUSION TYPE 1 NOVEL ATTACK Automated Threat Detection = Anomaly Detection
  12. Successful applications of Anomaly Detection • Academic work has shown

    that many algorithms can successfully detect attacks when tested on common synthetic datasets. see, e.g., Gilmore & Hayadaman, 2016 TAKE YOUR PICK! ◦ Simple density-based / statistical detectors ◦ Clustering ◦ Bayesian Networks ◦ Isolation Forests ◦ One-class SVMs ◦ Inductive Association Rules, Frequent Sequence Mining ◦ NLP-inspired DL models ◦ … That success doesn’t always translate to an industrial setting...
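As a minimal sketch of the "take your pick" list above (not code from the talk), one of the listed detectors, scikit-learn's Isolation Forest, run on toy data where a few injected outliers stand in for attacks:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 3))   # benign activity
attacks = rng.normal(loc=6.0, scale=1.0, size=(5, 3))    # injected anomalies
X = np.vstack([normal, attacks])

model = IsolationForest(contamination=0.01, random_state=42).fit(X)
scores = -model.score_samples(X)   # negate so higher = more anomalous
print(scores[:500].mean(), scores[-5:].mean())  # injected points score higher
```

On clean synthetic data like this, the separation is easy; the rest of the deck is about why real access logs are not this cooperative.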
  13. ...because Threat Detection with ML is hard.* 
 * Well,

    everything is hard. THREAT DETECTION • THE AWFULNESS • ML AT DUO • CONCLUSIONS
  14. A convergence of Awful A non-exhaustive list. See, e.g., Sommer

    & Paxson, 2010; Axelsson, 1999 • Annoying data • Lack of Labels • Diversity of benign behavior • Anomaly ≠ Threat • High cost of mispredictions • Usability > Metrics
  15. Data types that can be annoying to work with • 42.42: we like numbers • 70.114.110.15, Austin, TX, Google Chrome: we like categoricals less
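One common workaround for categorical fields like city and browser, sketched here by hand with invented values, is one-hot encoding into numeric indicator vectors:

```python
# Sketch: one-hot encode categorical auth-log fields so mixed-type rows
# become numeric vectors a detector can consume. Values are illustrative.
def one_hot(rows):
    # Build a sorted vocabulary per column, then emit 0/1 indicators.
    vocab = [sorted({row[i] for row in rows}) for i in range(len(rows[0]))]
    return [
        [1.0 if row[i] == v else 0.0 for i in range(len(row)) for v in vocab[i]]
        for row in rows
    ]

rows = [
    ("Austin, TX", "Chrome"),
    ("Austin, TX", "Firefox"),
    ("Ann Arbor, MI", "Chrome"),
]
encoded = one_hot(rows)
print(encoded[0])  # → [0.0, 1.0, 1.0, 0.0]
```

High-cardinality fields (IP addresses, device identifiers) blow up this representation quickly, which is part of why these data types are "annoying."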
  16. Lack of Labels • Very small fraction, or no labels

    at all (Attacks are rare). • Unsupervised. • Strong assumptions on “normality” vs. “attack”. • Without ground truth, you can’t verify your assumptions (easily)
  17. Diversity of benign behavior • We assume normal behavior is uniform and frequent. • In reality, benign behavior is diverse across multiple users (clusters such as “Sales”, “Engineering”, “HR/Recruiting”, plus true “Anomalies”).
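A toy illustration of that point (all numbers invented): a login that is perfectly ordinary for a small department looks anomalous when scored against a single global baseline, but not against its own group:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy population: a large "Sales" cluster and a small "HR" cluster whose
# normal logins live far from the global center.
groups = {
    "sales": rng.normal(0.0, 1.0, size=(500, 2)),
    "hr": rng.normal(10.0, 1.0, size=(20, 2)),
}

def zscore(point, population):
    """Largest per-feature |z| of `point` against `population`."""
    mu, sigma = population.mean(axis=0), population.std(axis=0)
    return float(np.abs((point - mu) / sigma).max())

hr_login = np.array([10.0, 10.0])          # an ordinary HR login
global_pop = np.vstack(list(groups.values()))

print(zscore(hr_login, global_pop))        # large: flagged by a global model
print(zscore(hr_login, groups["hr"]))      # small: normal for its own group
```

This is one reason per-customer and per-user models (discussed later in the deck) matter.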
  18. High anomaly score ≠ Threat

  19. High costs of mispredictions • FP: loss of time and trust • FN: (catastrophic) data breaches
  20. Risk Score Prioritization of events Components of a Threat Detection

    system
  21. Risk Score Prioritization of events Components of a Usable Threat

    Detection system Context
 Providing contextual behavior information Explanation
 Justifying model decisions Feedback
 Human-in-the-loop knowledge transfer Action
 Suggesting next steps and mitigations A great model produces useful, understandable, actionable detections. Model accuracy is not sufficient.
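One way to make those components concrete (field names are illustrative, not Duo's API) is a record type that carries explanation, context, and suggested actions alongside the score:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Detection:
    risk_score: float                       # prioritization of events
    explanation: List[str]                  # justifying model decisions
    context: List[str]                      # surrounding behavior information
    suggested_actions: List[str] = field(default_factory=list)  # next steps

d = Detection(
    risk_score=0.93,
    explanation=["authenticated from a novel device"],
    context=["3 successful auths from Austin, TX earlier today"],
    suggested_actions=["require re-enrollment of the device"],
)
print(d)
```

A bare `risk_score` float is the "components of a Threat Detection system" slide; the extra fields are what makes it the "usable" version.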
  22. Lessons Learned Building a Threat Detection system
 @ Duo THREAT

    DETECTION • THE AWFULNESS • ML AT DUO • CONCLUSIONS
  23. What does Duo do?

  24. Duo’s User and Entity Behavior Analytics (UEBA) • Currently in

    beta • Tested with a core group of Duo customers
  25. Lesson 1: Keep your scope small (but relevant).

  26. Data captured as part of the authentication flow Authentication Log

    • Company • User • Application • Time of access • Outcome of authentication • Properties of the access device • Properties of the second factor • Network of origin
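A sketch of deriving model features from one such log record (the record schema mirrors the fields listed above, but the values and feature choices are invented, not Duo's actual feature set):

```python
from datetime import datetime

# One record shaped after the slide's field list; values are illustrative.
record = {
    "user": "alice",
    "application": "vpn",
    "timestamp": "2018-06-01T03:12:00",
    "outcome": "success",
    "device_os": "Windows 10",
    "factor": "push",
    "origin_network": "70.114.110.0/24",
}

def extract_features(rec):
    """Derive simple numeric/boolean features from one auth-log record."""
    ts = datetime.fromisoformat(rec["timestamp"])
    return {
        "hour_of_day": ts.hour,             # 3 a.m. auths stand out
        "is_weekend": ts.weekday() >= 5,
        "succeeded": rec["outcome"] == "success",
        "used_push": rec["factor"] == "push",
    }

print(extract_features(record))
```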
  27. Production pipeline architecture (both stages run as Spark on EMR) • Training: normalize data sources → extract features → train models → model artifacts. • Scoring: normalize data sources → extract features → score and explain → presentation of potential security events. • Stages exchange data through a models database and a queue or database.
  28. Current state • The pipeline builds customer- and user-level

    models that consider: ◦ Distribution of the geographical origin, timestamp, and essential attributes of the authentications; ◦ Surrounding context at a session level; ◦ Calibration information derived from recent historical data. • We return an anomaly score paired with an explanation (“authenticated from a novel device, ...”) and context (surrounding auths for the user).
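A toy version of the "novel device" style of explanation (logic and field names invented for illustration): compare each attribute of a new authentication against the user's recent history and report what has never been seen before:

```python
def explain(new_auth, history):
    """Return human-readable reasons for attributes unseen in history."""
    reasons = []
    for key, value in new_auth.items():
        seen = {h[key] for h in history}
        if value not in seen:
            reasons.append(f"authenticated from a novel {key}: {value!r}")
    return reasons

history = [
    {"device": "laptop-1", "city": "Austin, TX"},
    {"device": "phone-1", "city": "Austin, TX"},
]
new_auth = {"device": "laptop-9", "city": "Austin, TX"}
print(explain(new_auth, history))
# → ["authenticated from a novel device: 'laptop-9'"]
```

Pairing the score with strings like these, plus the surrounding auths as context, is what the slide describes.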
  29. Lesson 2: Production ≠ Research.* * especially if there’s a

    lot of research left to be done.
  30. Lesson 3: Figure out the Monitoring Story Early.* * especially

    in an adversarial environment.
  31. Lesson 4: Don’t Trust your Labels Implicitly.

  32. First pass: we have labels...?

  33. Lesson 4: Don’t Trust your Labels Implicitly.* * because non-expert

    users are not good labelers.
  34. Lesson 5: Ask your Operators.* * you don’t have to

    wait for your models to be perfect.
  35. Get feedback as early as possible • We sought feedback

    from alpha clients very early on, and kept iterating. • Gives iterative insights into: ◦ Threat models 
 What do they consider suspicious vs. merely anomalous? What is “useful”? ◦ Semantic gap 
 Can we explain our outputs? Can they explain their decisions? ◦ User populations 
 Dave does X all the time when Y, and it’s not suspicious ◦ Use cases that might be rare in the customer base at large.
  36. Conclusions THREAT DETECTION • THE AWFULNESS • ML AT DUO

    • CONCLUSIONS
  37. Summary • Machine Learning can be a powerful tool for

    Threat Detection, but using it in production requires a lot of care. • The risk score is only a small fraction of the work. • The business metric is whether your end user finds predictions useful and usable. • We have a lot of ongoing research - both on improving ML models, and on usability. THANK YOU! www.stefanom.io @RhymeWithEskimo
  38. None
  39. Outstanding Issues • Large variance in the amount and variety

    of data produced by users and companies. • Unsupervised models are still annoying, and the temporal component is hard to deal with. Research Threads • Categorize blocks of authentications into “tasks” to characterize normal and fraudulent behavior. • Learn or impose a data hierarchy. • Point process models can describe temporal patterns. ◦ Intrigued? Attend Bronwyn Woods’ talk at CAMLIS 2018. • Association rules can help surface frequent actions shared by groups of users.
  40. What makes it hard? • Labels, Labels, Labels ◦ For

    known attacks: if any labels at all, extreme class imbalance. ◦ For novel attacks: nah. • High cost of both false positives and false negatives. • Dearth of realistic public datasets. • Assumes some notion of normal behavior, but there are multiple agents with potentially very different behaviors.
  41. What makes it hard? There’s more! • Interested in sequential

    and temporal anomalies, not just point anomalies. • Mixed data types with varying proportions of categorical, boolean, and numerical data. • Context is important. • Hard to distinguish between noise, anomalies, and actual security events. 
 High anomaly score ≠ Threat. see, e.g., Sommer & Paxson, 2010
  42. Pain points • Production ≠ Research ◦ Tension between what’s

    needed for fast, iterative research and more heavyweight processes for production. ◦ Tooling mismatch. • We don’t have a good story for: ◦ Monitoring model performance ◦ Dynamically scheduling retraining • These are both very important components, especially in an adversarial environment.
  43. Example Vector: Email Phishing