
Where We’re Going, We Don’t Need Labels: Anomaly Detection for 2FA

Stefano Meschiari
March 11, 2022

Authors: Becca Lynch and Stefano Meschiari, presented at the AI Village at DEF CON 29

Typical machine learning models in the security space use labels (annotations that describe whether a certain action is benign or malicious) in order to learn how to discriminate between threats and normal activity.

In practice, however, many systems in the security space that would benefit from machine learning models are critically hampered by a scarcity of labels. This may be due to many factors, such as low coverage of the collected labels, long latency between threat events and receiving the corresponding label, and noise in the feedback from domain experts and the system's users. New systems may have to be bootstrapped in the complete absence of established historical data (cold starts). And because human behavior is intrinsically difficult to predict, we are often left with benign activity that shifts constantly and attack techniques that keep improving.

In this talk, we discuss how we addressed the issues stemming from this complex ecosystem in the detection of two-factor authentication anomalies. We describe some of the algorithms, heuristics, and systems we developed to both understand user behavior and detect attack vectors, and discuss the many ways in which we fail miserably (and -- sometimes -- enjoy small successes).

Transcript

  1. © 2020 Cisco Systems, Inc. and/or its affiliates. All rights reserved. Where We’re Going, We Don’t Need Labels*: Anomaly Detection for 2FA. Becca Lynch & Stefano Meschiari
  2. Hello! Becca Lynch, Data Scientist; Stefano Meschiari, Data Scientist
  3. Agenda: Problem Space (attacks, data issues, and a world without labels); Our Approach (self-supervised models and simple(ish) heuristics); Lessons Learned (where we fail and fall short in the “real world”); Conclusions & What’s Next (future research directions and improvements); Questions
  4. Problem Space: 2FA attacks, lack of labels, data dimensionality
  5. Authenticates to access applications
  6. Typically credentials, something you know
  7. Using an additional “factor”, something you have/are
  8. The factor is the method used for secondary authentication
  9. (no transcript text)
  10. Goals: identify successful attacks or abnormal access (compromised primary credentials; phishing; access with no 2FA or low-trust factor; abnormal based on corporate policy)
  11. Goals: identify successful attacks or abnormal access (compromised primary credentials; phishing; access with no 2FA or low-trust factor; abnormal based on corporate policy); provide accessible, timely, and actionable visibility (clear and concise info on suspicious access; enable analysts to make informed decisions on user security and policies)
  12. Framework
  13. Framework: a security analyst who configures Duo for the users and wants to know if auths are anomalous
  14. Framework: ML models trained on historical data used to detect whether new authentications are suspicious
  15. Framework: ML models trained on historical data used to detect whether new authentications are suspicious. Does the work of detecting and ranking “events”, and is available to the analyst without any setup or extra installation
  16. Framework: ML models trained on historical data used to detect whether new authentications are suspicious. Does the work of detecting and ranking “events”, and is available to the analyst without any setup or extra installation. For a model to be trained, it needs some kind of label in order to differentiate between suspicious and benign behavior
  17. Constraints: avoiding alarm fatigue for analysts; detections need an associated explanation (a black box solution won’t cut it!); threat models and out-of-band information need to be inferred (not all detections have equal value)
  18. Data: primarily categorical with many possible levels for each attribute; makes some algorithms very hard to implement; certain components of the data may be highly censored; every customer has their own setup for authentication. Example auth record: { "customer": "Hollywood Studios", "user": "kfrog", "timestamp": "2020-06-09 12:36:00", "app": "Studio Creator Portal", "factor": "Duo Push", "access_ip": "123.45.678.XXX", "access_device": "Mac OS X, Chrome", "country_code": "US", "result": "FAILURE", "reason": "User not in allowed group", ... }
  19. Labels, user-generated: users can approve or deny an authentication; if denied, it can be marked as fraudulent; our first label is whether or not the user chose to mark an auth as fraud. Problems: users are not security experts; the authentication may time out; generally unreliable
  20. Labels, analyst-generated: analysts review surfaced authentications. Problems: analysts lack bandwidth; “suspicious” is conditional on expertise and providing the right context; providing feedback is optional
  21. Approach: self-supervised anomaly detection, OCCAM, detection pipeline
  22. Approach: our approach includes a number of interacting components that make up a detection pipeline
  23. Approach: we will later see how the components of our pipeline work together...
  24. Approach: ...but first we will zoom in on one of the more interesting algorithms
  25. A Different Spin on Anomaly Detection. Challenges: sparse, noisy, long-latency labels; data unsuitable for most common anomaly detection algorithms. Idea: use an ensemble of supervised models as an “avatar” for the unsupervised problem. This leverages off-the-shelf base learning algorithms; models structure in the data; applies more naturally to mixed-type data; and reserves labels for careful evaluation, rather than supervision
  26. OCCAM: Self-Supervision on Authentication Data (Outlier Classification with Categorical Attribute Models). An algorithm we developed specifically for this type of data. OCCAM recasts an unsupervised problem (no labels) into a self-supervised problem (the data itself provides the supervision). Intuition: define anomalies as authentications with attributes that are predicted to be unlikely, even given all the contextual information
  27. Ingredients: self-supervised submodels, anomaly ratios, submodel weights, anomaly score
  28. Ingredient 1: Self-supervised Submodel Ensemble. Goal: recover responses that have been hidden from the model. Each submodel is trained on historical auth data to recover one relevant attribute of the authentication at a time. Base learners (random forests) for each submodel return probabilistic predictions for all alternatives. factor ~ S1(user, location, device, …); device ~ S2(user, location, factor, …); location ~ S3(user, device, factor, …)
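The submodel idea is easy to sketch. The talk uses random forests as base learners; the toy below substitutes a plain conditional-frequency table so the idea fits in a few lines, and every attribute name and record here is made up for illustration:

```python
from collections import Counter, defaultdict

def train_submodel(auths, target):
    """Learn P(target | all other attributes) from historical auths.

    A conditional-frequency table stands in for the random-forest
    base learners described in the talk; it returns probabilistic
    predictions for all alternative values of `target`.
    """
    table = defaultdict(Counter)
    for auth in auths:
        context = tuple(sorted((k, v) for k, v in auth.items() if k != target))
        table[context][auth[target]] += 1

    def predict_proba(auth):
        context = tuple(sorted((k, v) for k, v in auth.items() if k != target))
        counts = table[context]
        total = sum(counts.values())
        return {value: n / total for value, n in counts.items()} if total else {}

    return predict_proba

# One submodel per attribute: factor ~ S1(user, location), location ~ S2(user, factor), ...
# Illustrative history: "lee" usually confirms with a push, occasionally via SMS.
history = [
    {"user": "lee", "location": "US", "factor": "push"},
    {"user": "lee", "location": "US", "factor": "push"},
    {"user": "lee", "location": "US", "factor": "sms"},
]
factor_model = train_submodel(history, "factor")
probs = factor_model({"user": "lee", "location": "US", "factor": "push"})
# probs -> {"push": 2/3, "sms": 1/3}
```

The "hidden response" framing is what makes this self-supervised: the label for each submodel is just one of the auth's own attributes, withheld during prediction.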
  29. Ingredient 2: Anomaly Ratios. Anomaly ratio r: for each auth, the ratio of the predicted probability of the most probable attribute value to the probability of the observed attribute value.
  30. Ingredient 3: Submodel Weights. Submodel weight w (Brier score): a proxy for the accuracy of the probabilistic predictions of a given submodel, measured on calibration data. (Anomaly ratio r as defined on the previous slide.)
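A minimal multiclass Brier score, the proxy for submodel quality named above. The talk does not specify how the score is mapped into a weight w, and the probabilities below are illustrative:

```python
def brier_score(predictions, observed):
    """Mean squared difference between predicted probabilities and
    one-hot outcomes over a calibration set; 0 is perfect, higher is
    worse. `predictions` is a list of {value: probability} dicts and
    `observed` the list of values that actually occurred."""
    total = 0.0
    for proba, actual in zip(predictions, observed):
        for value, p in proba.items():
            y = 1.0 if value == actual else 0.0
            total += (p - y) ** 2
    return total / len(observed)

# A well-calibrated location submodel earns a lower (better) score:
good = brier_score([{"US": 0.9, "IT": 0.1}], ["US"])  # (0.9-1)^2 + (0.1-0)^2 = 0.02
bad = brier_score([{"US": 0.9, "IT": 0.1}], ["IT"])   # (0.9-0)^2 + (0.1-1)^2 = 1.62
```

One plausible (but hypothetical) mapping is w = 1 / (score + ε), so that better-calibrated submodels contribute more to the final anomaly score.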
  31. OCCAM Anomaly Score. Anomaly score A: average of how “surprised” each submodel is to see the auth’s observed attributes, weighted by the quality of each submodel. (Anomaly ratio r and submodel weight w as defined on the previous slides.)
  32. Diagram: low anomaly score (ratio < 1) when the observed attribute value is more likely than any alternative; high anomaly score (ratio >> 1) when an alternative is far more likely than the observed value. For the location submodel, the contribution is w_location × p(most likely alternative location) / p(actual location), e.g. comparing p(Italy | Lee, Personal Device, Saturday, …) with p(United States | Lee, Work Device, Saturday, …)
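Putting the ingredients together, a minimal sketch of the ratio and score computation, following the diagram's "most likely alternative / actual location" reading. The probabilities, weights, and the second ratio below are illustrative, not from the talk:

```python
def anomaly_ratio(proba, observed):
    """Slide notation r: probability of the most likely *alternative*
    value divided by the probability of the observed value.
    r < 1 when the observed value is the most probable one,
    r >> 1 when a different value was much more likely."""
    alternatives = [p for value, p in proba.items() if value != observed]
    p_obs = proba.get(observed, 1e-6)  # floor for never-before-seen values
    return max(alternatives) / p_obs

def occam_score(ratios, weights):
    """Anomaly score A: weighted average of the per-submodel ratios.
    In the talk the weights come from Brier scores on calibration
    data; the constants used below are illustrative."""
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

# Lee usually authenticates from the United States:
location_proba = {"United States": 0.9, "Italy": 0.1}
usual = anomaly_ratio(location_proba, "United States")  # 0.1 / 0.9, unsurprising
unusual = anomaly_ratio(location_proba, "Italy")        # 0.9 / 0.1, surprising
# Combine the location ratio with a second (hypothetical) submodel's ratio:
score = occam_score([unusual, 2.0], weights=[0.6, 0.4])
```

An auth whose attributes all match the submodels' top predictions keeps every ratio below 1 and scores low; a single wildly unlikely attribute can dominate the weighted average.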
  33. What does this buy us? Self-supervision for categorical data. Models latent structure: even if an attribute is rare, it might be expected given other properties of the auth. Provides a backbone for iteration: OCCAM provides a framework for evaluating and incorporating new features and base learners
  34. Detection Pipeline: OCCAM; detectors that focus on specific aspects of risk or trust (example: 2FA executed via SMS or by higher-risk user groups; office locations); expert rules and heuristics (example: administrator action enabling 2FA bypass); meta-models (example: data drift detector)
  35.-38. (no transcript text)
  39. Lessons Learned: successes and failures
  40. Measuring Success: we observed initial customers using the system and finding relevant events, with reports of true positives; our pipeline surfaces a variable number of daily detections for analysts to triage; can we use these feedback labels to measure metrics like precision and recall and evaluate models at scale?
  41. Measuring Success
  42. Interpreting feedback is hard. Prioritization: low volume of triageable detections, from multiple layers of models and rules. Presentation: the semantic gap will influence feedback if not enough explanation and context is provided. Human factors: UX, actionability, experience, time constraints, and out-of-band knowledge are an implicit part of feedback
  43. Practical Consequences: low volume of labels; determining ground truth and getting a handle on FN and TN is difficult and sensitive; disentangling model performance from human-in-the-loop considerations is hard; introspecting decisions is complex. Takeaway: build mechanisms for observability and introspection
  44. Iteration & Collaboration in ML Development: collaboration with Product & Design to help define qualitative and quantitative metrics; regular internal “dog-fooding”; before A/B testing, recruit customers for interviews to test hypotheses and watch out for semantic gaps
  45. Watching out for silent failure: a lack of labels exacerbates any robustness and data quality issues (data and concept drift; authentication semantics; lack of data); build meta-models that can monitor and correct for data issues; when all else fails, reduce scope
  46. Conclusions & Future Directions
  47. Tying it all together... a combination of probabilistic sub-models, detectors based on an understanding of universally risky/trusted behavior, and rule-based heuristics
  48. Tying it all together... a combination of probabilistic sub-models, detectors based on an understanding of universally risky/trusted behavior, and rule-based heuristics; feedback from customers is interpreted from both feedback data (labels from experts) and customer engagement
  49. Tying it all together... in reality: feedback is optional, sparse, and highly subjective; straightforward performance evaluation is nearly impossible; progress can be made by solving proxy problems and by careful monitoring and interpretation of what feedback we do have
  50. Future Work (ask us more in Discord!). User Clustering: cluster users based on authentication data; understand patterns and outliers. Geolocation: model location as a density function for each customer; improve precision by reducing false positives in likely locations, and identify unknown anomalies in unusual locations within a typical country. Weak Supervision: combine disparate sources of supervision in a more principled manner (anomaly scores, focused detectors, threat feeds, expert labels, ...)
  51. Where We’re Going We Don’t Need Labels
  52. If You Don’t Have Labels... Be Careful Where You’re Going. Thanks! See you on Discord