
Detecting and Preventing Cybersecurity Threats

Stefano Meschiari

March 11, 2022



Transcript

  1. Detecting and
    Preventing
    Cybersecurity Threats
    Stefano Meschiari


  2. What you’ll take away
    1. What Threat Detection is


    2. Why it can be a hard problem


    3. Lessons learned @ Duo


    4. Summary

    (Spoiler alert: It’s Still Hard)


  3. Stefano Meschiari
    Senior Data Scientist


    Duo Security


    labs.duo.com


  4. What is Threat
    Detection?

    THREAT DETECTION • THE AWFULNESS • ML AT DUO • CONCLUSIONS


  5. DISCLAIMER:

    I am not a security expert
    by any means.


  6. Credential Theft and Account Takeover
    ● Vectors:
    ○ Phishing (e.g. spoofed websites, social engineering, etc.)
    ○ Data breaches
    ○ Malware (e.g. keyloggers)
    ○ Breaking and leveraging weak credentials
    ○ ...
    ● ~ 2 billion usernames and passwords were exposed via breaches between March 2016 and March 2017 (Thomas, Li, et al., 2017).
    ● ~ 63% of all breaches in the 2016 Verizon Data Breach Investigations Report were associated with credential theft.


  7. “Classic” Threat Detection
    ● Objective: recognize activity patterns within an access log that are indicative of policy violations or outright breaches.
    ● Security analysts examine data sources (e.g. access logs) and use domain expertise to manually pull out interesting events.
    ● Most automated threat detection systems are built around signatures and static rules:
    ○ They require extensive domain knowledge (patterns known to be malicious);
    ○ Organizations and their data tend to evolve quickly;
    ○ They tend to be reactive rather than proactive.


  9. Sounds like a perfect application for ML, right?
    ...Right?


  10. Why do we care?
    Cybersecurity attacks are frequent, on the rise, underreported, and costly.
    It is a problem that people care about and that is very resource-intensive, so it is worth optimizing.
    Base rates are hard to quantify.


  11. Automated Threat Detection = Anomaly Detection
    ● We’re trying to recognize patterns outside the norm.
    ● These patterns might not look like any other attack we’ve ever seen (novel attacks).
    [Diagram: a cluster of normal behavior, two known intrusion types, and a novel attack outside all of them]


  12. Successful applications of Anomaly Detection
    ● Academic work has shown that many algorithms can successfully detect attacks when tested on common synthetic datasets (see, e.g., Gilmore & Hayadaman, 2016). Take your pick:
    ○ Simple density-based / statistical detectors
    ○ Clustering
    ○ Bayesian Networks
    ○ Isolation Forests
    ○ One-class SVMs
    ○ Inductive Association Rules, Frequent Sequence Mining
    ○ NLP-inspired DL models
    ○ …
    That success doesn’t always translate to an industrial setting...
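One of the detectors above can be sketched minimally with scikit-learn's IsolationForest on synthetic data; the two-dimensional features and the far-away "attack" cluster are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # benign activity
attacks = rng.normal(loc=6.0, scale=0.5, size=(5, 2))   # far-away cluster

# Fit on (assumed) benign traffic only, then score a mix of both.
clf = IsolationForest(random_state=0).fit(normal)
scores = clf.decision_function(np.vstack([normal[:3], attacks]))
# Lower decision_function values mean "more anomalous"; here the attack
# points score well below the benign ones.
```

On clean synthetic benchmarks like this the separation is easy; the rest of the deck is about why production data rarely cooperates.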


  13. ...because Threat
    Detection with ML is hard.*

    * Well, everything is hard.
    THREAT DETECTION • THE AWFULNESS • ML AT DUO • CONCLUSIONS


  14. A convergence of Awful
    A non-exhaustive list. See, e.g., Sommer & Paxson, 2010; Axelsson, 1999
    ● Annoying data


    ● Lack of Labels


    ● Diversity of benign
    behavior


    ● Anomaly ≠ Threat


    ● High cost of
    mispredictions


    ● Usability > Metrics



  15. Data types that can be annoying to work with
    ● Numerical data (e.g. 42.42): we like numbers.
    ● Categorical data (e.g. 70.114.110.15, Austin, TX, Google Chrome): we like categoricals less.
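The pain with categoricals can be sketched with one-hot encoding: the vocabulary of IPs, locations, and browsers is open-ended, so the encoder must be told what to do with values it has never seen. The example values come from the slide; the tiny training set is invented:

```python
from sklearn.preprocessing import OneHotEncoder

train = [["70.114.110.15", "Austin, TX", "Google Chrome"],
         ["10.0.0.7", "Ann Arbor, MI", "Firefox"]]
# handle_unknown="ignore" keeps scoring from crashing on unseen values.
enc = OneHotEncoder(handle_unknown="ignore").fit(train)

# A never-before-seen browser encodes to all zeros in its column block,
# silently losing the information that the value was novel.
row = enc.transform([["70.114.110.15", "Austin, TX", "Safari"]]).toarray()
```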


  16. Lack of Labels
    ● A very small fraction of labels, or no labels at all (attacks are rare).
    ● Mostly unsupervised learning.
    ● Strong assumptions on “normality” vs. “attack”.
    ● Without ground truth, you can’t (easily) verify your assumptions.


  17. Diversity of benign behavior
    ● We assume normal behavior is uniform and frequent.
    ● In reality, benign behavior is diverse across multiple users.
    [Diagram: behavior clusters labeled “Sales”, “Engineering”, “HR/Recruiting”, plus scattered “Anomalies”]

  18. High anomaly score ≠ Threat


  19. High costs of mispredictions
    ● False positives: loss of time and trust.
    ● False negatives: (catastrophic) data breaches.


  20. Components of a Threat Detection system
    ● Risk Score: prioritization of events.


  21. Components of a Usable Threat Detection system
    ● Risk Score: prioritization of events.
    ● Context: providing contextual behavior information.
    ● Explanation: justifying model decisions.
    ● Feedback: human-in-the-loop knowledge transfer.
    ● Action: suggesting next steps and mitigations.
    A great model produces useful, understandable, actionable detections. Model accuracy is not sufficient.


  22. Lessons Learned
    Building a Threat Detection system

    @ Duo
    THREAT DETECTION • THE AWFULNESS • ML AT DUO • CONCLUSIONS


  23. What does Duo do?


  24. Duo’s User and Entity Behavior Analytics (UEBA)
    ● Currently in beta


    ● Tested with a core
    group of Duo
    customers


  25. Lesson 1:
    Keep your scope small
    (but relevant).


  26. Authentication Log
    Data captured as part of the authentication flow:
    ● Company
    ● User
    ● Application
    ● Time of access
    ● Outcome of authentication
    ● Properties of the access device
    ● Properties of the second factor
    ● Network of origin
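The fields above can be pictured as one normalized log record. This is a hypothetical, simplified shape; the field names and types are assumptions for illustration, not Duo's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AuthEvent:
    company: str
    user: str
    application: str
    timestamp: str        # time of access
    outcome: str          # outcome of authentication
    device: dict          # properties of the access device
    second_factor: dict   # properties of the second factor
    network: str          # network of origin

# An example record (all values invented).
event = AuthEvent(company="acme", user="dave", application="vpn",
                  timestamp="2018-06-01T12:00:00Z", outcome="success",
                  device={"browser": "Google Chrome"},
                  second_factor={"type": "push"},
                  network="70.114.110.15")
```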


  27. Production pipeline architecture
    ● Training path (Spark on EMR): normalize data sources → extract features → train models → persist artifacts and models to a database.
    ● Scoring path (Spark on EMR): read from a queue or database → normalize data sources → extract features → score and explain → presentation of potential security events.

  28. Current state
    ● The pipeline builds customer- and user-level models that consider:


    ○ Distribution of the geographical origin, timestamp, and essential
    attributes of the authentications;


    ○ Surrounding context at a session level;


    ○ Calibration information derived from recent historical data.


    ● We return an anomaly score paired with an explanation (“authenticated from
    a novel device, ...”) and context (surrounding auths for the user).
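That output shape can be sketched as a score paired with an explanation and context. The "novel device" rule and the 0.6 weight below are invented for illustration, not Duo's actual logic:

```python
def score_auth(event, known_devices, recent_auths):
    """Return an anomaly score with a human-readable explanation and context."""
    score, reasons = 0.0, []
    if event["device"] not in known_devices:
        score += 0.6  # invented weight for an unseen device
        reasons.append("authenticated from a novel device")
    return {"score": score,
            "explanation": reasons,
            "context": recent_auths[-3:]}  # surrounding auths for the user

result = score_auth({"user": "dave", "device": "iPhone"},
                    known_devices={"MacBook"},
                    recent_auths=[{"app": "vpn"}, {"app": "mail"}])
```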


  29. Lesson 2:
    Production ≠ Research.*
    * especially if there’s a lot of research left to be done.


  30. Lesson 3:
    Figure out the
    Monitoring Story Early.*
    * especially in an adversarial environment.


  31. Lesson 4:
    Don’t Trust your
    Labels Implicitly.


  32. First pass: we have labels...?


  33. Lesson 4:
    Don’t Trust your
    Labels Implicitly.*
    * because non-expert users are not good labelers.


  34. Lesson 5:
    Ask your Operators.*
    * you don’t have to wait for your models to be perfect.


  35. Get feedback as early as possible
    ● We sought feedback from alpha clients very early on, and kept iterating.


    ● Gives iterative insights into:


    ○ Threat models

    What do they consider suspicious vs. merely anomalous? What is “useful”?


    ○ Semantic gap

    Can we explain our outputs? Can they explain their decisions?


    ○ User populations

    Dave does X all the time when Y, and it’s not suspicious


    ○ Use cases that might be rare in the customer base at large.


  36. Conclusions
    THREAT DETECTION • THE AWFULNESS • ML AT DUO • CONCLUSIONS


  37. Summary
    ● Machine Learning can be a powerful tool for Threat
    Detection, but using it in production requires a lot
    of care.


    ● The risk score is only a small fraction of the work.


    ● The business metric is whether your end user finds
    predictions useful and usable.


    ● We have a lot of ongoing research - both on
    improving ML models, and on usability.
    THANK YOU!
    www.stefanom.io
    @RhymeWithEskimo



  39. Outstanding Issues
    ● Large variance in the amount and variety of data produced by users and companies.
    ● Unsupervised models are still annoying, and the temporal component is hard to deal with.

    Research Threads
    ● Categorize blocks of authentications into “tasks” to characterize normal and fraudulent behavior.
    ● Learn or impose a data hierarchy.
    ● Point process models can describe temporal patterns.
    ○ Intrigued? Attend Bronwyn Woods’ talk at CAMLIS 2018.
    ● Association rules can help surface frequent actions shared by groups of users.


  40. What makes it hard?
    ● Labels, Labels, Labels
    ○ For known attacks: if any labels at all, extreme class imbalance.


    ○ For novel attacks: nah.


    ● High cost of both false positives and false negatives.


    ● Dearth of realistic public datasets.


    ● Assumes some notion of normal behavior, but there are multiple agents with
    potentially very different behaviors.


  41. What makes it hard? There’s more!
    ● Interested in sequential and temporal anomalies, not just point anomalies.


    ● Mixed data types with varying proportions of categorical, boolean, and
    numerical data.


    ● Context is important.


    ● Hard to distinguish between noise, anomalies, and actual security events. 

    High anomaly score ≠ Threat.
    see, e.g., Sommer & Paxson, 2010


  42. Pain points
    ● Production ≠ Research
    ○ Tension between what’s needed for fast, iterative research and more heavyweight processes for
    production.


    ○ Tooling mismatch.


    ● We don’t have a good story for:


    ○ Monitoring model performance


    ○ Dynamically scheduling retraining


    ● These are both very important components, especially in an adversarial environment.


  43. Example Vector: Email Phishing
