Where We’re Going, We Don’t Need Labels: Anomaly Detection for 2FA

Stefano Meschiari
March 11, 2022

Authors: Becca Lynch and Stefano Meschiari. Presented at the DEF CON 29 AI Village.

Typical machine learning models in the security space use labels (annotations that describe whether a certain action is benign or malicious) to learn how to discriminate between threats and normal activity.

In practice, however, many security systems that would benefit from machine learning models are critically hampered by a scarcity of labels. Contributing factors include low coverage of the collected labels, long latency between a threat event and receipt of the corresponding label, and noise in the feedback from domain experts and the system's users. New systems may have to be bootstrapped in the complete absence of established historical data (cold starts). Human behavior is intrinsically difficult to predict quantitatively, which often leaves us with benign activity that is constantly shifting and attack techniques that are constantly improving.

In this talk, we discuss how we addressed the issues stemming from this complex ecosystem when detecting two-factor authentication anomalies. We describe some of the algorithms, heuristics, and systems we developed to both understand user behavior and detect attack vectors, and we discuss the many ways in which we fail miserably (and sometimes enjoy small successes).

Transcript

  1. © 2020 Cisco Systems, Inc. and/or its affiliates. All rights reserved.
    Where We’re Going, We Don’t Need Labels*: Anomaly Detection for 2FA
    Becca Lynch & Stefano Meschiari

  2. Hello!
    Stefano Meschiari, Data Scientist
    Becca Lynch, Data Scientist

  3. Agenda
    ● Problem Space: attacks, data issues, and a world without labels
    ● Our Approach: self-supervised models and simple(ish) heuristics
    ● Lessons Learned: where we fail and fall short in the “real world”
    ● Conclusions & What’s Next: future research directions and improvements
    ● Questions

  4. Problem Space
    2FA attacks, lack of labels, data dimensionality

  5. Authenticates to access applications

  6. Typically credentials, something you know

  7. Using an additional “factor”, something you have/are

  8. The factor is the method used for secondary authentication

  9. [Slide contains no extractable text]

  10. Goals
    Identify successful attacks or abnormal access
    ● Compromised primary credentials
    ● Phishing
    ● Access with no 2FA or low-trust factor
    ● Abnormal based on corporate policy

  11. Goals
    Identify successful attacks or abnormal access
    ● Compromised primary credentials
    ● Phishing
    ● Access with no 2FA or low-trust factor
    ● Abnormal based on corporate policy
    Provide accessible, timely, and actionable visibility
    ● Clear and concise info on suspicious access
    ● Enable analysts to make informed decisions on user security and policies

  12. Framework

  13. Framework
    A security analyst who configures Duo for the users and wants to know if auths are anomalous

  14. Framework
    ML models trained on historical data used to detect whether new authentications are suspicious

  15. Framework
    ML models trained on historical data used to detect whether new authentications are suspicious
    Does the work of detecting and ranking “events”, and is available to the analyst without any setup or extra installation

  16. Framework
    ML models trained on historical data used to detect whether new authentications are suspicious
    Does the work of detecting and ranking “events”, and is available to the analyst without any setup or extra installation
    For a model to be trained, it needs some kind of label in order to differentiate between suspicious and benign behavior

  17. Constraints
    ● Avoiding alarm fatigue for analysts
    ● Detections need an associated explanation
      ○ A black box solution won’t cut it!
    ● Threat models and out-of-band information need to be inferred
      ○ Not all detections have equal value

  18. Data
    ● Primarily categorical with many possible levels for each attribute
    ● Makes some algorithms very hard to implement
    ● Certain components of data may be highly censored
    ● Every customer has their own setup for authentication
    {
    "customer": "Hollywood Studios",
    "user": "kfrog",
    "timestamp": "2020-06-09 12:36:00",
    "app": "Studio Creator Portal",
    "factor": "Duo Push",
    "access_ip": "123.45.678.XXX",
    "access_device": "Mac OS X, Chrome",
    "country_code": "US",
    "result": "FAILURE",
    "reason": "User not in allowed group",
    ...
    }
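
To make the "many possible levels" point concrete, here is a minimal sketch that counts the distinct levels of each attribute and the width a one-hot encoding would need. The sample records and the `attribute_levels` helper are hypothetical, loosely modeled on the example record in the slide; they are not the authors' data or code.

```python
from collections import defaultdict

# Hypothetical auth records, loosely modeled on the slide's example record.
auths = [
    {"user": "kfrog", "factor": "Duo Push", "country_code": "US", "access_device": "Mac OS X, Chrome"},
    {"user": "kfrog", "factor": "Duo Push", "country_code": "US", "access_device": "iOS, Safari"},
    {"user": "piggy", "factor": "SMS", "country_code": "FR", "access_device": "Windows, Edge"},
    {"user": "gonzo", "factor": "Phone Call", "country_code": "US", "access_device": "Mac OS X, Chrome"},
]


def attribute_levels(records):
    """Map each attribute to the set of distinct values (levels) it takes."""
    levels = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            levels[attribute].add(value)
    return dict(levels)


levels = attribute_levels(auths)
cardinality = {attribute: len(values) for attribute, values in levels.items()}

# A one-hot encoding needs one column per (attribute, level) pair, so its
# width grows with every new user, device, or location that shows up.
one_hot_width = sum(cardinality.values())
print(cardinality, one_hot_width)
```

Even in this toy sample the encoded width is several times the number of attributes, which hints at why distance-based anomaly detectors can be awkward on this kind of data.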

  19. Labels: User-Generated
    ● Users can approve or deny an authentication
    ● If denied, it can be marked as fraudulent
    ● Our first label is whether or not the user chose to mark an auth as fraud
    ● Problems:
      ○ Users are not security experts
      ○ The authentication may time out
      ○ Generally unreliable

  20. Labels: Analyst-Generated
    ● Analysts review surfaced authentications
    ● Problems:
      ○ Analysts lack bandwidth
      ○ “Suspicious” is conditional on expertise and providing the right context
      ○ Providing feedback is optional

  21. Approach
    Self-supervised anomaly detection, OCCAM, detection pipeline

  22. Approach
    Our approach includes a number of interacting components that make up a detection pipeline

  23. Approach
    We will later see how the components of our pipeline work together...

  24. Approach
    We will later see how the components of our pipeline work together...
    ...but first we will zoom in on one of the more interesting algorithms

  25. A Different Spin on Anomaly Detection
    Sparse, noisy, long-latency labels
    Data unsuitable for most common anomaly detection algorithms
    Idea: use an ensemble of supervised models as an “avatar” for the unsupervised problem.
    ● Leverages off-the-shelf base learning algorithms
    ● Models structure in the data
    ● Applies more naturally to mixed-type data
    ● Reserves labels for careful evaluation, rather than supervision

  26. OCCAM: Self-Supervision on Authentication Data
    Outlier Classification with Categorical Attribute Models
    ● Algorithm we developed specifically for this type of data.
    ● OCCAM recasts an unsupervised problem (no labels) into a self-supervised problem (the data itself provides the supervision).
    ● Intuition: define anomalies as authentications with attributes that are predicted to be unlikely, even given all the contextual information.

  27. Ingredients
    ● Self-supervised Submodels
    ● Anomaly Ratios
    ● Submodel Weights
    ● Anomaly Score

  28. Ingredient 1: Self-supervised Submodel Ensemble
    ● Goal: recover responses that have been hidden from the model.
    ● Each submodel is trained on historical auth data to recover one relevant attribute of the authentication at a time.
    ● Base learners (random forests) for each submodel return probabilistic predictions for all alternatives.
    factor ~ S_1(user, location, device, …)
    device ~ S_2(user, location, factor, …)
    location ~ S_3(user, device, factor, …)
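
The submodel idea can be sketched in a few lines. The slide names random forests as the base learner; to stay dependency-free, the sketch below swaps in a small categorical naive Bayes classifier, which is an assumption of this example and not the authors' implementation. The class name `NaiveBayesSubmodel`, the smoothing default, and the toy history are all hypothetical.

```python
from collections import Counter, defaultdict


class NaiveBayesSubmodel:
    """Predicts one hidden categorical attribute from the remaining ones.

    Stand-in for the random-forest base learner named in the slide.
    """

    def __init__(self, target, alpha=1.0):
        self.target = target
        self.alpha = alpha  # Laplace smoothing strength (assumed default)

    def fit(self, records):
        self.class_counts = Counter(r[self.target] for r in records)
        self.features = [k for k in records[0] if k != self.target]
        self.levels = {f: {r[f] for r in records} for f in self.features}
        # counts[feature][target value][feature value] -> occurrences
        self.counts = {f: defaultdict(Counter) for f in self.features}
        for r in records:
            for f in self.features:
                self.counts[f][r[self.target]][r[f]] += 1
        return self

    def predict_proba(self, record):
        """Return a probability for every target value seen in training."""
        total = sum(self.class_counts.values())
        scores = {}
        for cls, n_cls in self.class_counts.items():
            score = n_cls / total
            for f in self.features:
                n_levels = len(self.levels[f]) + 1  # one extra slot for unseen values
                seen = self.counts[f][cls][record[f]]
                score *= (seen + self.alpha) / (n_cls + self.alpha * n_levels)
            scores[cls] = score
        norm = sum(scores.values())
        return {cls: s / norm for cls, s in scores.items()}


# Toy historical data (hypothetical).
history = [
    {"user": "lee", "location": "US", "device": "work laptop", "factor": "push"},
    {"user": "lee", "location": "US", "device": "work laptop", "factor": "push"},
    {"user": "lee", "location": "IT", "device": "personal phone", "factor": "sms"},
    {"user": "kim", "location": "IT", "device": "work laptop", "factor": "push"},
    {"user": "kim", "location": "IT", "device": "work laptop", "factor": "push"},
]

# One submodel per attribute: each hides its target and learns to recover it.
submodels = {t: NaiveBayesSubmodel(t).fit(history) for t in ("factor", "device", "location")}

probs = submodels["location"].predict_proba(
    {"user": "lee", "device": "work laptop", "factor": "push"}
)
print(probs)
```

Any classifier that returns a probability for every alternative could fill the same slot, which is the sense in which the ensemble uses off-the-shelf base learners.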

  29. Ingredient 2: Anomaly Ratios
    Anomaly ratios r
    For each auth, the ratio of the predicted probability of the most probable attribute value to the probability of the observed attribute value.
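
A minimal sketch of the anomaly ratio for a single submodel follows. It assumes the "most probable" value is taken among the alternatives to the observed value, which is how a later slide can describe a ratio below 1; the `floor` safeguard, the function name, and the probabilities are hypothetical.

```python
def anomaly_ratio(predicted, observed, floor=1e-6):
    """Probability of the most likely alternative divided by that of the observed value.

    `predicted` maps each attribute value to its predicted probability.
    `floor` is an assumed safeguard against division by zero for unseen values.
    """
    p_observed = max(predicted.get(observed, 0.0), floor)
    alternatives = [p for value, p in predicted.items() if value != observed]
    p_alternative = max(alternatives, default=floor)
    return p_alternative / p_observed


# Hypothetical output of a location submodel for one authentication.
location_probs = {"United States": 0.90, "Italy": 0.02, "Canada": 0.08}

expected = anomaly_ratio(location_probs, "United States")  # observed value is the likely one
surprising = anomaly_ratio(location_probs, "Italy")        # observed value is unlikely
print(expected, surprising)
```

With these numbers the expected location yields a ratio well below 1, while the unlikely one yields a ratio far above 1.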

  30. Ingredient 3: Submodel Weight
    Anomaly ratios r
    For each auth, the ratio of the predicted probability of the most probable attribute value to the probability of the observed attribute value.
    Submodel weight w (Brier score)
    A proxy for the accuracy of the probabilistic predictions of a given submodel, measured on calibration data.
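
The Brier score itself is standard: the mean squared difference between predicted probability vectors and one-hot outcomes. How the score is turned into a weight is not stated in the slides, so the mapping `1 - score / 2` below is purely an assumption of this sketch, as are the function names and calibration data.

```python
def brier_score(predictions, outcomes):
    """Multiclass Brier score (0 is perfect, 2 is the worst possible)."""
    total = 0.0
    for predicted, observed in zip(predictions, outcomes):
        classes = set(predicted) | {observed}
        total += sum(
            (predicted.get(c, 0.0) - (1.0 if c == observed else 0.0)) ** 2
            for c in classes
        )
    return total / len(outcomes)


def submodel_weight(predictions, outcomes):
    """Assumed mapping from Brier score to a weight in [0, 1]; higher is better."""
    return 1.0 - brier_score(predictions, outcomes) / 2.0


# Hypothetical calibration set for a factor submodel.
calibration_predictions = [
    {"push": 0.8, "sms": 0.2},
    {"push": 0.6, "sms": 0.4},
]
calibration_outcomes = ["push", "sms"]

score = brier_score(calibration_predictions, calibration_outcomes)
weight = submodel_weight(calibration_predictions, calibration_outcomes)
print(score, weight)
```

Measuring on held-out calibration data matters here: a submodel scored on its own training data would look better calibrated than it is.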

  31. OCCAM Anomaly Score
    Anomaly ratio r
    For each auth, the ratio of the predicted probability of the most probable attribute value to the probability of the observed attribute value.
    Submodel weight w (Brier score)
    A proxy for the accuracy of the probabilistic predictions of a given submodel, measured on calibration data.
    Anomaly score A
    Average of how “surprised” each submodel is to see the auth’s observed attributes, weighted by the quality of each submodel.
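
One plausible reading of the score is a weighted average of the per-submodel anomaly ratios. The exact functional form (for example, whether ratios are log-transformed first) is not given in the slides, so the plain weighted mean below is an assumption, and the names and numbers are hypothetical.

```python
def anomaly_score(ratios, weights):
    """Weighted average of per-submodel anomaly ratios.

    `ratios` and `weights` are keyed by the attribute each submodel predicts.
    """
    total_weight = sum(weights[name] for name in ratios)
    if total_weight == 0:
        raise ValueError("at least one submodel needs a non-zero weight")
    return sum(weights[name] * ratios[name] for name in ratios) / total_weight


# Hypothetical ratios and weights for one authentication.
ratios = {"location": 45.0, "device": 2.0, "factor": 0.5}
weights = {"location": 0.5, "device": 0.25, "factor": 0.25}

score = anomaly_score(ratios, weights)
print(score)
```

Because the weights reflect submodel quality, a surprised but poorly calibrated submodel moves the score less than a surprised and reliable one.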

  32. [Figure: anomaly ratio example for the location submodel]
    LOW ANOMALY SCORE (ratio < 1): Observed vs. Alternate
    HIGH ANOMALY SCORE (ratio >> 1): Alternate vs. Observed
    p(Italy | Lee, Personal Device, Saturday, …)
    p(United States | Lee, Work Device, Saturday, …)
    w_location
    Figure labels: most likely alternative location, actual location, model weight

  33. What does this buy us?
    ● Self-supervision for categorical data.
    ● Models latent structure.
      Even if an attribute is rare, it might be expected given other properties of the auth.
    ● Provides a backbone for iteration.
      OCCAM provides a framework for evaluating and incorporating new features and base learners.

  34. Detection Pipeline
    ● OCCAM
    ● Detectors that focus on specific aspects of risk or trust
      Example: 2FA executed via SMS or by higher-risk user groups; office locations
    ● Expert rules and heuristics
      Example: administrator action enabling 2FA bypass
    ● Meta-models
      Example: data drift detector
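
As a rough illustration of how such layers might sit side by side, the sketch below runs three small detectors over each authentication and keeps a human-readable reason for every hit. The detector functions, field names, threshold, and ranking rule are all hypothetical; the slides do not describe how the real pipeline combines or ranks its components.

```python
def occam_detector(auth, threshold=10.0):
    # Assumes an anomaly score was computed upstream; the threshold is illustrative.
    if auth.get("anomaly_score", 0.0) >= threshold:
        return "unusual attribute combination for this user"
    return None


def sms_factor_detector(auth):
    # Focused detector for a lower-trust factor.
    if auth.get("factor") == "SMS":
        return "2FA executed via SMS"
    return None


def bypass_rule(auth):
    # Expert rule on an administrative event.
    if auth.get("event") == "admin_enabled_bypass":
        return "administrator action enabling 2FA bypass"
    return None


DETECTORS = [occam_detector, sms_factor_detector, bypass_rule]


def run_pipeline(auths, detectors=DETECTORS):
    """Return flagged auths, each with the reasons it was flagged."""
    detections = []
    for auth in auths:
        reasons = [r for r in (detect(auth) for detect in detectors) if r is not None]
        if reasons:
            detections.append({"auth": auth, "reasons": reasons})
    # Simple assumed ranking: auths flagged by more detectors come first.
    detections.sort(key=lambda d: len(d["reasons"]), reverse=True)
    return detections


auths = [
    {"user": "kfrog", "factor": "Duo Push", "anomaly_score": 0.4},
    {"user": "gonzo", "factor": "SMS", "anomaly_score": 45.0},
    {"user": "admin", "event": "admin_enabled_bypass"},
]
detections = run_pipeline(auths)
for detection in detections:
    print(detection["auth"]["user"], detection["reasons"])
```

Attaching a reason to every detection is one way to respect the earlier constraint that a black-box result is not enough for an analyst.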

  35. [Slide contains no extractable text]

  36. [Slide contains no extractable text]

  37. [Slide contains no extractable text]

  38. [Slide contains no extractable text]

  39. Lessons learned
    Successes and failures

  40. Measuring Success
    ● We observed initial customers using the system and finding relevant events; reports of true positives
    ● Our pipeline surfaces a variable number of daily detections for analysts to triage
    ● Can we use these feedback labels to measure metrics like precision, recall, etc. and evaluate models at scale?
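
A small sketch shows why that question is hard to answer from feedback alone. It computes label coverage and precision over only the detections that received feedback; the field name `feedback`, the helper, and the sample are hypothetical. Recall is deliberately absent, since nothing here labels the authentications that were never surfaced.

```python
def feedback_metrics(detections):
    """Summarize optional analyst feedback on surfaced detections.

    `feedback` is True (confirmed), False (dismissed), or None (no feedback).
    """
    labeled = [d for d in detections if d["feedback"] is not None]
    coverage = len(labeled) / len(detections) if detections else 0.0
    precision = (
        sum(1 for d in labeled if d["feedback"]) / len(labeled) if labeled else None
    )
    return {"coverage": coverage, "precision_on_labeled": precision}


# Hypothetical week of surfaced detections; most received no feedback.
surfaced = [
    {"id": 1, "feedback": True},
    {"id": 2, "feedback": None},
    {"id": 3, "feedback": False},
    {"id": 4, "feedback": None},
    {"id": 5, "feedback": None},
]
metrics = feedback_metrics(surfaced)
print(metrics)
```

With two labels out of five detections, the precision estimate rests on a tiny and possibly biased subset, so it is best read alongside the coverage figure.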

  41. Measuring Success

  42. Interpreting feedback is hard
    Prioritization
    Low volume of triageable detections, from multiple layers of models and rules
    Presentation
    Semantic gap will influence feedback if not enough explanation and context are provided
    Human factors
    UX, actionability, experience, time constraints, out-of-band knowledge are an implicit part of feedback

  43. Practical Consequences
    Prioritization, Presentation, Human factor
    ● Low volume of labels
    ● Determining ground truth and getting a handle on FN and TN is difficult and sensitive
    ● Disentangling model performance from human-in-the-loop considerations is hard
    ● Introspecting decisions is complex
    Build mechanisms for observability and introspection

  44. Iteration & Collaboration in ML development
    ● Collaboration with Product & Design to help define qualitative and quantitative metrics
    ● Regular internal “dog-fooding”
    ● Before A/B testing, recruit customers for interviews to test hypotheses and watch out for semantic gaps

  45. Watching out for silent failure
    ● A lack of labels exacerbates any robustness and data quality issues
      ○ Data and concept drift
      ○ Authentication semantics
      ○ Lack of data
    ● Build meta-models that can monitor and correct for data issues
    ● When all else fails, reduce scope
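
One simple form a drift-monitoring meta-model could take is a comparison of an attribute's categorical distribution between a reference window and a recent window. The sketch below uses total variation distance with an arbitrary threshold; the metric, the threshold, and the function names are assumptions chosen for illustration, not the detector the authors built.

```python
from collections import Counter


def categorical_distribution(values):
    """Relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def total_variation_distance(reference, current):
    """Half the L1 distance between two categorical distributions (0 to 1)."""
    p = categorical_distribution(reference)
    q = categorical_distribution(current)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


def has_drifted(reference, current, threshold=0.3):
    # The threshold is illustrative and would need tuning per attribute.
    return total_variation_distance(reference, current) > threshold


# Hypothetical factor usage: a new factor appears in the recent window.
last_month = ["push"] * 8 + ["sms"] * 2
this_week = ["push"] * 3 + ["sms"] * 2 + ["phone call"] * 5

distance = total_variation_distance(last_month, this_week)
print(distance, has_drifted(last_month, this_week))
```

A check like this needs no labels, which makes it a natural guard against the silent failures listed above.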

  46. Conclusions & Future Directions

  47. Tying it all together...
    ● Combination of probabilistic sub-models, detectors based on understanding of universally risky/trusted behavior, and rule-based heuristics

  48. Tying it all together...
    ● Combination of probabilistic sub-models, detectors based on understanding of universally risky/trusted behavior, and rule-based heuristics
    ● Feedback from customers is interpreted from both feedback data (labels from experts) and customer engagement

  49. Tying it all together...
    In reality…
    ● Feedback is optional, sparse, and highly subjective
    ● Straightforward performance evaluation is nearly impossible
    ● Progress can be made by solving proxy problems and by carefully monitoring and interpreting what feedback we do have

  50. Future Work
    User Clustering
    + Cluster users based on authentication data
    + Understand patterns and outliers
    Geolocation
    + Location as a density function for each customer
    + Improve precision by reducing false positives occurring in likely locations, and identify unknown anomalies in unusual locations within a typical country
    Weak Supervision
    + Combine disparate sources of supervision in a more principled manner (anomaly scores, focused detectors, threat feeds, expert labels, ...)
    Ask us more in Discord!

  51. Where We’re Going We Don’t Need Labels

  52. If You Don’t Have Labels... Be Careful Where You’re Going
    Thanks! See you in Discord