
Detecting and Preventing Cybersecurity Threats



Stefano Meschiari

March 11, 2022


Transcript

  1. Detecting and Preventing Cybersecurity Threats Stefano Meschiari

  2. What you’ll take away 1. What Threat Detection is 2.

    Why it can be a hard problem 3. Lessons learned @ Duo 4. Summary 
 (Spoiler alert: It’s Still Hard)
  3. Stefano Meschiari Senior Data Scientist Duo Security labs.duo.com

  4. What is Threat Detection? 
 THREAT DETECTION • THE AWFULNESS

    • ML AT DUO • CONCLUSIONS
  5. DISCLAIMER: 
 I am not a security expert by any

    means.
  6. • Vectors: ◦ Phishing (e.g. spoofed websites, social engineering, etc.)

    ◦ Data breaches ◦ Malware (e.g. keyloggers) ◦ Breaking and leveraging weak credentials ◦ ... • ~ 2 billion usernames and passwords exposed via breaches between March 2016 and March 2017 (Thomas, Li, et al., 2017). • ~ 63% of all breaches in the 2016 Verizon Data Breach Investigations Report were associated with credential theft. Credential Theft and Account Takeover
  7. “Classic” Threat Detection • Objective: recognize activity patterns within an

    access log that are indicative of policy violations or breaches. • Security analysts analyze data sources (e.g. access logs) and use domain expertise to manually pull out interesting events. • Many automated threat detection systems are based around signatures and static rules. ◦ Need to build extensive domain knowledge (patterns known to be malicious) ◦ Organizations and their data tend to evolve quickly ◦ Tend to be reactive rather than proactive
  8. “Classic” Threat Detection • Objective: recognize activity patterns within an

    access log that are indicative of policy violations or outright breaches. • Security analysts analyze data sources (e.g. access logs) and use domain expertise to manually pull out interesting events. • Most automated threat detection systems are based around static rules and signatures. ◦ Need to build extensive domain knowledge ◦ Organizations and their data tend to evolve quickly ◦ Tend to be reactive rather than proactive
  9. Sounds like a perfect application for ML, right? ...Right?

  10. Why do we care? It is a problem that people

    care about and that is very resource-intensive, so it is worth optimizing. Cybersecurity attacks are frequent, on the rise, underreported, and costly. Base rates are hard to quantify.
  11. • We’re trying to recognize patterns outside the norm. •

    These patterns might not look like any other attack we’ve ever seen (novel attacks). NORMAL BEHAVIOR INTRUSION TYPE 2 INTRUSION TYPE 1 NOVEL ATTACK Automated Threat Detection = Anomaly Detection
  12. Successful applications of Anomaly Detection • Academic work has shown

    that many algorithms can successfully detect attacks when tested on common synthetic datasets. see, e.g., Gilmore & Hayadaman, 2016 TAKE YOUR PICK! ◦ Simple density-based / statistical detectors ◦ Clustering ◦ Bayesian Networks ◦ Isolation Forests ◦ One-class SVMs ◦ Inductive Association Rules, Frequent Sequence Mining ◦ NLP-inspired DL models ◦ … That success doesn’t always translate to an industrial setting...
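As a minimal sketch of the "take your pick" list above (not code from the talk), one of the listed detectors, scikit-learn's Isolation Forest, run on toy data where a few injected outliers stand in for attacks:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 3))   # benign activity
attacks = rng.normal(loc=6.0, scale=1.0, size=(5, 3))    # injected anomalies
X = np.vstack([normal, attacks])

model = IsolationForest(contamination=0.01, random_state=42).fit(X)
scores = -model.score_samples(X)   # negate so higher = more anomalous
print(scores[:500].mean(), scores[-5:].mean())  # injected points score higher
```

On clean synthetic data like this, the separation is easy; the rest of the deck is about why real access logs are not this cooperative.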
  13. ...because Threat Detection with ML is hard.* 
 * Well,

    everything is hard. THREAT DETECTION • THE AWFULNESS • ML AT DUO • CONCLUSIONS
  14. A convergence of Awful A non-exhaustive list. See, e.g., Sommer

    & Paxson, 2010; Axelsson, 1999 • Annoying data • Lack of Labels • Diversity of benign behavior • Anomaly ≠ Threat • High cost of mispredictions • Usability > Metrics
  15. Data types that can be annoying to work with • 42.42: we like numbers • 70.114.110.15, Austin, TX, Google Chrome: we like categoricals less
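One common workaround for categorical fields like city and browser, sketched here by hand with invented values, is one-hot encoding into numeric indicator vectors:

```python
# Sketch: one-hot encode categorical auth-log fields so mixed-type rows
# become numeric vectors a detector can consume. Values are illustrative.
def one_hot(rows):
    # Build a sorted vocabulary per column, then emit 0/1 indicators.
    vocab = [sorted({row[i] for row in rows}) for i in range(len(rows[0]))]
    return [
        [1.0 if row[i] == v else 0.0 for i in range(len(row)) for v in vocab[i]]
        for row in rows
    ]

rows = [
    ("Austin, TX", "Chrome"),
    ("Austin, TX", "Firefox"),
    ("Ann Arbor, MI", "Chrome"),
]
encoded = one_hot(rows)
print(encoded[0])  # → [0.0, 1.0, 1.0, 0.0]
```

High-cardinality fields (IP addresses, device identifiers) blow up this representation quickly, which is part of why these data types are "annoying."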
  16. Lack of Labels • Very small fraction, or no labels

    at all (Attacks are rare). • Unsupervised. • Strong assumptions on “normality” vs. “attack”. • Without ground truth, you can’t verify your assumptions (easily)
  17. Diversity of benign behavior • We assume normal behavior is uniform and frequent. • In reality, benign behavior is diverse across multiple users (clusters such as “Sales”, “Engineering”, “HR/Recruiting”, plus true “Anomalies”).
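A toy illustration of that point (all numbers invented): a login that is perfectly ordinary for a small department looks anomalous when scored against a single global baseline, but not against its own group:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy population: a large "Sales" cluster and a small "HR" cluster whose
# normal logins live far from the global center.
groups = {
    "sales": rng.normal(0.0, 1.0, size=(500, 2)),
    "hr": rng.normal(10.0, 1.0, size=(20, 2)),
}

def zscore(point, population):
    """Largest per-feature |z| of `point` against `population`."""
    mu, sigma = population.mean(axis=0), population.std(axis=0)
    return float(np.abs((point - mu) / sigma).max())

hr_login = np.array([10.0, 10.0])          # an ordinary HR login
global_pop = np.vstack(list(groups.values()))

print(zscore(hr_login, global_pop))        # large: flagged by a global model
print(zscore(hr_login, groups["hr"]))      # small: normal for its own group
```

This is one reason per-customer and per-user models (discussed later in the deck) matter.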
  18. High anomaly score ≠ Threat

  19. High costs of mispredictions • FP: loss of time and trust • FN: (catastrophic) data breaches
  20. Risk Score Prioritization of events Components of a Threat Detection

    system
  21. Risk Score Prioritization of events Components of a Usable Threat

    Detection system Context
 Providing contextual behavior information Explanation
 Justifying model decisions Feedback
 Human-in-the-loop knowledge transfer Action
 Suggesting next steps and mitigations A great model produces useful, understandable, actionable detections. Model accuracy is not sufficient.
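One way to make those components concrete (field names are illustrative, not Duo's API) is a record type that carries explanation, context, and suggested actions alongside the score:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Detection:
    risk_score: float                       # prioritization of events
    explanation: List[str]                  # justifying model decisions
    context: List[str]                      # surrounding behavior information
    suggested_actions: List[str] = field(default_factory=list)  # next steps

d = Detection(
    risk_score=0.93,
    explanation=["authenticated from a novel device"],
    context=["3 successful auths from Austin, TX earlier today"],
    suggested_actions=["require re-enrollment of the device"],
)
print(d)
```

A bare `risk_score` float is the "components of a Threat Detection system" slide; the extra fields are what makes it the "usable" version.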
  22. Lessons Learned Building a Threat Detection system
 @ Duo THREAT

    DETECTION • THE AWFULNESS • ML AT DUO • CONCLUSIONS
  23. What does Duo do?

  24. Duo’s User and Entity Behavior Analytics (UEBA) • Currently in

    beta • Tested with a core group of Duo customers
  25. Lesson 1: Keep your scope small (but relevant).

  26. Data captured as part of the authentication flow Authentication Log

    • Company • User • Application • Time of access • Outcome of authentication • Properties of the access device • Properties of the second factor • Network of origin
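A sketch of deriving model features from one such log record (the record schema mirrors the fields listed above, but the values and feature choices are invented, not Duo's actual feature set):

```python
from datetime import datetime

# One record shaped after the slide's field list; values are illustrative.
record = {
    "user": "alice",
    "application": "vpn",
    "timestamp": "2018-06-01T03:12:00",
    "outcome": "success",
    "device_os": "Windows 10",
    "factor": "push",
    "origin_network": "70.114.110.0/24",
}

def extract_features(rec):
    """Derive simple numeric/boolean features from one auth-log record."""
    ts = datetime.fromisoformat(rec["timestamp"])
    return {
        "hour_of_day": ts.hour,             # 3 a.m. auths stand out
        "is_weekend": ts.weekday() >= 5,
        "succeeded": rec["outcome"] == "success",
        "used_push": rec["factor"] == "push",
    }

print(extract_features(record))
```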
  27. Production pipeline architecture (both stages run as Spark on EMR) • Training: normalize data sources → extract features → train models → model artifacts. • Scoring: normalize data sources → extract features → score and explain → presentation of potential security events. • Stages exchange data through a models database and a queue or database.
  28. Current state • The pipeline builds customer- and user-level

    models that consider: ◦ Distribution of the geographical origin, timestamp, and essential attributes of the authentications; ◦ Surrounding context at a session level; ◦ Calibration information derived from recent historical data. • We return an anomaly score paired with an explanation (“authenticated from a novel device, ...”) and context (surrounding auths for the user).
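A toy version of the "novel device" style of explanation (logic and field names invented for illustration): compare each attribute of a new authentication against the user's recent history and report what has never been seen before:

```python
def explain(new_auth, history):
    """Return human-readable reasons for attributes unseen in history."""
    reasons = []
    for key, value in new_auth.items():
        seen = {h[key] for h in history}
        if value not in seen:
            reasons.append(f"authenticated from a novel {key}: {value!r}")
    return reasons

history = [
    {"device": "laptop-1", "city": "Austin, TX"},
    {"device": "phone-1", "city": "Austin, TX"},
]
new_auth = {"device": "laptop-9", "city": "Austin, TX"}
print(explain(new_auth, history))
# → ["authenticated from a novel device: 'laptop-9'"]
```

Pairing the score with strings like these, plus the surrounding auths as context, is what the slide describes.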
  29. Lesson 2: Production ≠ Research.* * especially if there’s a

    lot of research left to be done.
  30. Lesson 3: Figure out the Monitoring Story Early.* * especially

    in an adversarial environment.
  31. Lesson 4: Don’t Trust your Labels Implicitly.

  32. First pass: we have labels...?

  33. Lesson 4: Don’t Trust your Labels Implicitly.* * because non-expert

    users are not good labelers.
  34. Lesson 5: Ask your Operators.* * you don’t have to

    wait for your models to be perfect.
  35. Get feedback as early as possible • We sought feedback

    from alpha clients very early on, and kept iterating. • Gives iterative insights into: ◦ Threat models 
 What do they consider suspicious vs. merely anomalous? What is “useful”? ◦ Semantic gap 
 Can we explain our outputs? Can they explain their decisions? ◦ User populations 
 Dave does X all the time when Y, and it’s not suspicious ◦ Use cases that might be rare in the customer base at large.
  36. Conclusions THREAT DETECTION • THE AWFULNESS • ML AT DUO

    • CONCLUSIONS
  37. Summary • Machine Learning can be a powerful tool for

    Threat Detection, but using it in production requires a lot of care. • The risk score is only a small fraction of the work. • The business metric is whether your end user finds predictions useful and usable. • We have a lot of ongoing research - both on improving ML models, and on usability. THANK YOU! www.stefanom.io @RhymeWithEskimo
  38. None
  39. Outstanding Issues • Large variance in the amount and variety

    of data produced by users and companies. • Unsupervised models are still annoying, and the temporal component is hard to deal with. Research Threads • Categorize blocks of authentications into “tasks” to characterize normal and fraudulent behavior. • Learn or impose a data hierarchy. • Point process models can describe temporal patterns. ◦ Intrigued? Attend Bronwyn Woods’ talk at CAMLIS 2018. • Association rules can help surface frequent actions shared by groups of users.
  40. What makes it hard? • Labels, Labels, Labels ◦ For

    known attacks: if any labels at all, extreme class imbalance. ◦ For novel attacks: nah. • High cost of both false positives and false negatives. • Dearth of realistic public datasets. • Assumes some notion of normal behavior, but there are multiple agents with potentially very different behaviors.
  41. What makes it hard? There’s more! • Interested in sequential

    and temporal anomalies, not just point anomalies. • Mixed data types with varying proportions of categorical, boolean, and numerical data. • Context is important. • Hard to distinguish between noise, anomalies, and actual security events. 
 High anomaly score ≠ Threat. see, e.g., Sommer & Paxson, 2010
  42. Pain points • Production ≠ Research ◦ Tension between what’s

    needed for fast, iterative research and more heavyweight processes for production. ◦ Tooling mismatch. • We don’t have a good story for: ◦ Monitoring model performance ◦ Dynamically scheduling retraining • These are both very important components, especially in an adversarial environment.
  43. Example Vector: Email Phishing