
Defending deep learning from adversarial attacks - presented at Cream City Code 2019

Deep learning is widely used in critical applications. Can it be trusted? In this talk we describe adversarial attacks and defenses found by researchers over the last several years, along with the open source Python library Adversarial Robustness 360 Toolbox (ART). We walk through Jupyter notebooks illustrating possible uses of ART. In addition, we briefly discuss AI Fairness 360 and AI Explainability 360, two other open source libraries that have now been accepted into the Linux Foundation Trusted AI Committee.

Svetlana Levitan

October 05, 2019

Transcript

  1. Defending deep learning from adversarial attacks — Svetlana Levitan, PhD

    Developer Advocate in Chicago Center for Open-source Data and AI Technologies IBM Cloud and Cognitive Software October 5, 2019 @SvetaLevitan @ibmcodait
  2. 2 Who is Svetlana Levitan? Originally from Moscow, Russia, now

    in Chicago PhD in Applied Mathematics and MS in Computer Science from University of Maryland, College Park Software Engineer for SPSS Analytic components (2000-2018) Working on PMML since 2001, ONNX recently IBM acquired SPSS in 2009 Developer Advocate with IBM Center for Open Data and AI Technologies (since June 2018) Meetup organizer: Big Data Developers in Chicago, Open Source Analytics, working with Chicago ML Two daughters love programming: IIT and Niles North
  3. IBM and open source Intro to neural networks and deep

    learning Intro to adversarial attacks and defenses Adversarial Robustness Toolbox (ART) AI Fairness 360 and AI Explainability 360 Links and resources Cloud and Cognitive Applications/ October 5, 2019 / © 2019 IBM Corporation 3 Contents
  4. IBM Cloud and Cognitive Software/October 5, 2019 / © 2019

    IBM Corporation 4 and open standards
  5. Center for Open Source Data and AI Technologies (CODAIT) Code

    – Build and improve practical frameworks to enable more developers to realize immediate value. Content – Showcase solutions for complex and real-world AI problems. Community – Bring developers and data scientists to engage with IBM Improving Enterprise AI lifecycle in Open Source • Team contributes to over 10 open source projects • 17 committers and many contributors in Apache projects • Over 1100 JIRAs and 66,000 lines of code committed to Apache Spark itself; over 65,000 LoC into SystemML • Over 25 product lines within IBM leveraging Apache Spark • Speakers at over 100 conferences, meetups, unconferences and more CODAIT codait.org
  6. Deep Learning and AI history: 1997 IBM Deep Blue chess •

    2011 IBM Watson Jeopardy; Apple releases Siri • 2012 AlexNet introduces deep learning with GPUs • 2015 Facebook's face recognition • 2016 Siri gets deep learning • 2017 AlphaGo. IBM Cognitive Applications / © 2019 IBM Corporation 6
  7. Deep Learning = Training Artificial Neural Networks A human brain has:

    • 200 billion neurons • 32 trillion connections between them. By comparison, a large artificial neural network has: • 25 million “neurons” • 100 million connections (parameters) IBM Cognitive Applications / © 2019 IBM Corporation 7
  8. IBM Cognitive Applications / © 2019 IBM Corporation Some history

    8 Elementary Perceptron 1957 Frank Rosenblatt Multilayer Perceptron
  9. Backpropagation Labeled Training Data Coat Sneaker T-shirt Sneaker Pullover Output

    Errors Pullover Coat Coat Sneaker T-shirt ❌ ❌ ❌ Fashion-MNIST dataset by Zalando Research, on GitHub <https://github.com/zalandoresearch/fashion-mnist> (MIT License). Slide created by Bradley Holt
  10. Input Output Sneaker 98% Neural Network Inferencing Fashion-MNIST dataset by

    Zalando Research, on GitHub <https://github.com/zalandoresearch/fashion-mnist> (MIT License). Slide created by Bradley Holt
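To make slides 9–10 concrete, here is a minimal training-and-inference sketch on Fashion-MNIST using tf.keras. It is illustrative only and not taken from the deck's notebooks; the architecture and hyperparameters are arbitrary choices. The later sketches in this transcript reuse `model`, `x_train`/`y_train` and `x_test`/`y_test` from this example.

```python
# Minimal sketch (not from the deck's notebooks): train a small classifier on
# Fashion-MNIST via backpropagation, then run inference, as on slides 9-10.
import numpy as np
import tensorflow as tf

# Fashion-MNIST ships with tf.keras: 28x28 grayscale images, 10 clothing classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = (x_train / 255.0).astype("float32")
x_test = (x_test / 255.0).astype("float32")
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Backpropagation: the optimizer adjusts the weights to reduce the output errors
# (e.g. "Pullover" predicted where the label was "Coat").
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)

# Inferencing: the trained network assigns class probabilities to a new image,
# e.g. "Sneaker 98%".
probs = model.predict(x_test[:1])
print("predicted class:", np.argmax(probs), "confidence:", float(probs.max()))
```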
  11. Convolutional Neural Networks IBM Cloud and Cognitive Software/September 27, 2019

    / © 2019 IBM Corporation 11 https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
  12. Convolutional layer in greater detail IBM Cloud and Cognitive Software/October

    5, 2019 / © 2019 IBM Corporation 12 http://cs231n.github.io/convolutional-networks/
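As a companion to the CNN slides, here is a minimal tf.keras sketch of the convolution and pooling building blocks described above; the layer sizes are illustrative and not taken from the deck.

```python
# Minimal sketch of the convolution + pooling building blocks from slides 11-12
# (illustrative layer sizes, not from the deck).
import tensorflow as tf

cnn = tf.keras.Sequential([
    # A convolutional layer slides 32 learned 3x3 filters over the image;
    # each filter produces one feature map ("same" padding keeps the spatial size).
    tf.keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu",
                           input_shape=(28, 28, 1)),
    # Pooling downsamples each feature map, keeping the strongest responses.
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    # Dense layers turn the extracted features into class scores.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
cnn.summary()
```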
  13. AIOps Prepared and Analyzed Data Trained Model Deployed Model Many

    tools to train machine learning and deep learning models Prepared Data Initial Model Deployed Model
  14. AIOps Trained Model Deployed Model And there are platforms to

    serve your models, create model catalogues etc. Prepared Data Initial Model Deployed Model FfDL kube-batch Jupyter Enterprise Gateway MAX Istio OpenWhisk
  15. AIOps Prepared and Analyzed Data Trained Model Deployed Model But

    what about trust in AI? Prepared Data Initial Model Deployed Model Can the trained model be trusted? Can the dataset be trusted? Is the deployed model robust enough? Is the model vulnerable to adversarial attacks?
  16. What does it take to trust a decision made by a machine

    (other than that it is 99% accurate)? Is it fair? Is it easy to understand? Did anyone tamper with it? Is it accountable? #21, #32, #93
  17. FAIRNESS EXPLAINABILITY ROBUSTNESS ASSURANCE Our vision for Trusted AI Pillars

    of trust, woven into the lifecycle of an AI application
  18. AIOps Prepared and Analyzed Data Trained Model Deployed Model Let's

    talk about Robustness Prepared Data Initial Model Deployed Model Is the model vulnerable to adversarial attacks? Is the dataset poisoned?
  19. Adversarial Threats to AI [Diagram: the attacker poisons the training data or

    perturbs the neural network's input; the altered output yields a $$$ benefit.] Evasion attacks ▪ Performed at test time ▪ Perturb inputs with crafted noise ▪ Model fails to predict correctly ▪ Undetectable by humans Poisoning attacks ▪ Performed at training time ▪ Insert poisoned sample in training data ▪ Use backdoor later
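To illustrate the evasion case, here is a hand-rolled Fast Gradient Sign Method (FGSM) sketch: it nudges each pixel in the direction that increases the model's loss, producing a perturbation that is hard for a human to notice but changes the prediction. It assumes the trained `model` and one-hot test labels from the Fashion-MNIST sketch above; `eps` is an arbitrary illustrative value.

```python
# Hand-rolled FGSM sketch of an evasion attack (assumes `model`, `x_test`,
# `y_test` from the earlier Fashion-MNIST sketch; eps chosen for illustration).
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()

def fgsm(model, x, y, eps=0.1):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    # Step in the direction that increases the loss; keep pixels in [0, 1].
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)

x_adv = fgsm(model, x_test[:100], y_test[:100])
```

Comparing `model.predict(x_adv)` against the true labels typically shows a sharp accuracy drop even though the perturbed images look unchanged to a human.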
  20. Exposure to poisoning • Could the attacker have created backdoors

    via poisoning of training data? Plausible deniability • How important is it for the adversary to use adversarial samples with strong resemblance to the original inputs? Type I vs type II errors • Is the attacker trying to bypass safeguards or aiming to cause false alarms? • What are the costs associated with such errors? Black vs white box • What knowledge does the attacker have about the AI model? • How does the attacker access the AI model? • Limitations to the number of queries? 25 Threat Models
  21. Evasion attacks – an analysis 26 Why do adversarial examples

    exist? • Unless test error is 0%, there is always room for adversarial samples. • Attacks push inputs across the decision boundary. • Surprising: proximity of the nearest decision boundary! [Gilmer et al., 2018. Adversarial Spheres. https://arxiv.org/abs/1801.02774]
  22. Evasion attacks – an analysis 28 Why do adversarial examples

    exist? Fooling images: • DNNs don't actually learn to recognize, e.g., a school bus; they learn to discriminate it from the other objects in the training set. [Nguyen et al., 2014. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. https://arxiv.org/abs/1412.1897]
  23. How to defend? Taxonomy of defenses • Model hardening: adversarial

    training (static or dynamic; attack independent or attack specific), data preprocessing (Gaussian data augmentation, feature squeezing, label smoothing, dimensionality reduction), model design (shattered gradients, stochastic gradients, saddlepoint optimization, BReLUs) • Detection: statistical tests (MMD, kernel density estimates, local intrinsic dimensionality), detector networks (MagNet, detectors on inputs, detectors on internal representations), Bayesian uncertainty (dropout uncertainty, Bayesian SVMs) • Robustness metrics: CLEVER, global Lipschitz bound, loss sensitivity, minimal perturbation, adversarial success rates
  24. How to defend? 30 Adversarial training • Train DNNs solely

    on adversarial samples • Increase DNN capacity to maintain accuracy on clean data • Use specific algorithm for crafting the adversarial samples [Madry et al., 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. https://arxiv.org/abs/1706.06083] Performance on CIFAR-10 data: original data, model A: 87.3% accuracy; PGD-20 attack, model A: 45.8%; PGD-7 attack, model A': 64.2%; FGSM attack, model Anat: 85.6%
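A minimal sketch of this adversarial-training recipe using ART's AdversarialTrainer with a PGD attack. Module paths are for ART 1.x and may differ in other releases; with tf.keras on TensorFlow 2 you may need tf.compat.v1.disable_eager_execution() before building the model, or a different ART wrapper. `model`, `x_train` and `y_train` are assumed from the earlier Fashion-MNIST sketch, and the attack parameters and epoch count are illustrative.

```python
# Sketch of PGD adversarial training with ART (ART 1.x paths, hedged above).
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import ProjectedGradientDescent
from art.defences.trainer import AdversarialTrainer

# Wrap the trained Keras model so ART can query its loss gradients.
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

# PGD attack used to craft the adversarial samples during training.
pgd = ProjectedGradientDescent(classifier, eps=0.1, eps_step=0.01, max_iter=7)

# ratio=1.0 trains on adversarial samples only, as in Madry et al.;
# a smaller ratio mixes in clean data to help preserve clean accuracy.
trainer = AdversarialTrainer(classifier, attacks=pgd, ratio=1.0)
trainer.fit(x_train, y_train, batch_size=128, nb_epochs=10)
```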
  25. How to defend? 31 Preprocessing data • Process samples in

    order to remove adversarial noise • Input the cleaned samples to the classifier • Somewhat effective; however, it can easily be defeated by an adaptive adversary. Feature squeezing [W. Xu, D. Evans, and Y. Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. CoRR, abs/1704.01155, 2017.]
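Feature squeezing is available in ART as a preprocessing defence; a minimal sketch follows, with ART 1.x module paths and an illustrative bit depth. `classifier` is the ART wrapper from the adversarial-training sketch above and `x_adv` the adversarial batch from the FGSM sketch.

```python
# Sketch of the feature-squeezing defence with ART's preprocessor (ART 1.x).
import numpy as np
from art.defences.preprocessor import FeatureSqueezing

# Reduce inputs to 4 bits per pixel, discarding much of the fine-grained
# adversarial noise before the samples reach the classifier.
squeezer = FeatureSqueezing(clip_values=(0.0, 1.0), bit_depth=4)
x_squeezed, _ = squeezer(np.asarray(x_adv))

preds_raw = np.argmax(classifier.predict(np.asarray(x_adv)), axis=1)
preds_squeezed = np.argmax(classifier.predict(x_squeezed), axis=1)
print("predictions changed by squeezing:", int(np.sum(preds_raw != preds_squeezed)))
```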
  26. How to defend? 32 Poisoning detection Poisoned MNIST sample (will

    be classified as ‘1’ by poisoned model with high probability). Unsupervised clustering of training data based on DNN internal activations: Discovers partition of poisonous vs normal training samples.
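ART implements this activation-clustering defence as ActivationDefence. A minimal sketch, assuming the `classifier` wrapper and (possibly poisoned) `x_train`/`y_train` from the earlier sketches; module paths and parameters are for ART 1.x and may differ in other versions.

```python
# Sketch of poisoning detection via clustering of internal activations (ART 1.x).
from art.defences.detector.poison import ActivationDefence

defence = ActivationDefence(classifier, x_train, y_train)

# Cluster each class's activations into two groups; the smaller, anomalous
# cluster is reported as likely poisonous.
report, is_clean = defence.detect_poison(nb_clusters=2, nb_dims=10, reduce="PCA")
print("suspected poisoned training samples:", sum(1 for c in is_clean if c == 0))
```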
  27. The Adversarial Robustness 360 Toolbox (ART) 34 • Library for

    adversarial machine learning • Baseline implementations of attacks and defenses for classifiers • Dedicated to images • Python 2 & 3 • MIT license • Supported frameworks: Keras, TensorFlow, PyTorch, etc. Typical workflow: load ART modules → load classifier model (Keras, TF, PyTorch, etc.) → perform attack → evaluate robustness
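The four workflow steps on this slide map onto a few lines of ART code. A minimal sketch, assuming ART 1.x module paths, the trained Keras `model` and Fashion-MNIST test data from the earlier sketches, and an illustrative eps value:

```python
# Load ART modules.
import numpy as np
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import FastGradientMethod

# Load classifier model: wrap the trained Keras network for ART.
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

# Perform attack: craft adversarial test images with FGSM.
attack = FastGradientMethod(classifier, eps=0.1)
x_test_adv = attack.generate(x=x_test)

# Evaluate robustness: compare accuracy on clean vs adversarial inputs.
def accuracy(x):
    preds = np.argmax(classifier.predict(x), axis=1)
    return float(np.mean(preds == np.argmax(y_test, axis=1)))

print("clean accuracy:      ", accuracy(x_test))
print("adversarial accuracy:", accuracy(x_test_adv))
```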
  28. Adversarial Robustness 360 Toolbox (ART) Poisoning detection • Detection based

    on clustering activations • Proof of attack strategy Evasion detection • Detector based on inputs • Detector based on activations Robustness metrics • CLEVER • Empirical robustness • Loss sensitivity Unified model API • Training • Prediction • Access to loss and prediction gradients Evasion defenses • Feature squeezing • Spatial smoothing • Label smoothing • Adversarial training • Virtual adversarial training • Thermometer encoding • Gaussian augmentation • Total variance minimization Evasion attacks • FGSM • JSMA • BIM • PGD • Carlini & Wagner • DeepFool • NewtonFool • Elastic net attack • Universal perturbation • Spatial transformations 36
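As one example of the robustness metrics listed above, ART exposes an empirical-robustness measure: the average minimal perturbation a given attack needs in order to change the classifier's predictions. A minimal sketch, assuming this metric helper is available in your ART version and reusing `classifier` and `x_test` from the earlier sketches; parameters are illustrative.

```python
# Sketch of ART's empirical robustness metric with an FGSM attacker (ART 1.x).
from art.metrics import empirical_robustness

score = empirical_robustness(classifier, x_test[:100], attack_name="fgsm",
                             attack_params={"eps": 0.1})
print("empirical robustness (FGSM):", score)
```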
  29. Jupyter notebooks with examples of ART use: an attack and

    a simple defense on a model for clothing; attack and defensive model building for digit data; building a detector for adversarial inputs. See e.g. https://developer.ibm.com/patterns/integrate-adversarial-attacks-model-training-pipeline/ 39
  30.–39. (Image-only slides, pages 47–59; no transcript text.)
  40. Using the detector model Works well even on low-strength

    adversarial attacks. Apply the detector on new inputs; if an attack is detected, trace it back to its source. The original model does not need changes. 61
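The detector described above can be approximated with a second, binary classifier trained to separate clean from adversarial inputs; ART also ships detector classes for this, but the plain tf.keras sketch below shows the idea. It assumes `x_test` from the Fashion-MNIST sketch and `x_adv` from the FGSM sketch; the detector architecture is an arbitrary illustration.

```python
# Sketch of an adversarial-input detector: a binary classifier over clean (0)
# vs adversarial (1) samples; the original model stays unchanged.
import numpy as np
import tensorflow as tf

x_adv_np = np.asarray(x_adv)                 # adversarial batch from the FGSM sketch
x_clean = x_test[: len(x_adv_np)]            # matching number of clean samples
x_det = np.concatenate([x_clean, x_adv_np])
y_det = np.concatenate([np.zeros(len(x_clean)), np.ones(len(x_adv_np))])

detector = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
detector.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
detector.fit(x_det, y_det, epochs=3, batch_size=64)

# Apply the detector to new inputs; flagged samples can be traced back to
# their source instead of being fed to the original model.
x_new = x_adv_np[:16]                        # stand-in for newly arriving inputs
flagged = detector.predict(x_new)[:, 0] > 0.5
print("inputs flagged as adversarial:", int(flagged.sum()))
```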
  41. Conclusions Adversarial attacks pose a threat to the deployment of

    AI in security-critical applications. There is ongoing work on practical defenses with strong guarantees. Future work: analyzing the adversarial threat on other types of data (text, speech, video, time series…). Bigger picture: Trusted AI. Security ↔ Fairness ↔ Explainability ↔ Privacy https://www.research.ibm.com/artificial-intelligence/trusted-ai/ 62
  42. AIOps Prepared and Analyzed Data Trained Model Deployed Model Now

    how do we check for bias throughout AI lifecycle? Prepared Data Initial Model Deployed Model Are model weights biased? Are predictions biased? Is the dataset biased?
  43. © 2018 IBM Corporation IBM Confidential Demo Application: AI Fairness

    360 Web Application http://aif360.mybluemix.net/
  44. 67 © 2019 IBM Corporation AIX360: DIFFERENT WAYS TO EXPLAIN

    One explanation does not fit all. Different stakeholders require explanations for different purposes and with different objectives, and explanations will have to be tailored to their needs. End users/customers (trust): Doctors: Why did you recommend this treatment? Customers: Why was my loan denied? Teachers: Why was my teaching evaluated in this way? Gov’t/regulators (compliance, safety): Prove to me that you didn't discriminate. Developers (quality, “debuggability”): Is our system performing well? How can we improve it?
  45. [AIX360 algorithm-selection decision tree] Guiding questions: Understand data or

    model? Explanations as samples, distributions or features? Explanations for individual samples (local) or overall behavior (global)? A directly interpretable model or posthoc explanations? A surrogate model or visualize behavior? Explanations based on samples, features, or elicited explanations? One-shot static or interactive explanations? Data modalities: tabular, image, text. Algorithms: ProtoDash (case-based reasoning), DIP-VAE (learning meaningful features), BRCG or GLRM (easy to understand rules), ProfWeight (learning an accurate interpretable model), CEM or CEM-MAF (feature-based explanations), TED (persona-specific explanations).
  46. AIOps Trained Model Deployed Model AI Pipeline Prepared Data Initial

    Model Deployed Model FfDL kube-batch Jupyter Enterprise Gateway MAX AIF360 AIF360 Istio OpenWhisk ART AIX360 AIX360 PMML, PFA , ONNX
  47. Thank you! @SvetaLevitan CODAIT.org @ibmcodait Developer.ibm.com @IBMDeveloper ART Demo: https://art-demo.mybluemix.net/

    ART: https://github.com/IBM/adversarial-robustness-toolbox Sign up for free IBM Cloud account: https://ibm.biz/BdzA6i If you are in or near Chicago, join Meetup groups: Big Data Developers in Chicago, Chicago ML. Come to Chicago ML workshop at IBM office on October 28! 70
  48. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter:

    AI Fairness 360 AIF360 Prepared and Analyzed Data Initial Model Deployed Model