Is "adversarial examples" an Adversarial Example?

Keynote talk at 1st Deep Learning and Security Workshop
May 24, 2018
co-located with the
39th IEEE Symposium on Security and Privacy
San Francisco, California

Abstract:
Over the past few years, there has been an explosion of research in
security of machine learning and on adversarial examples in
particular. Although this is in many ways a new and immature research
area, the general problem of adversarial examples has been a core
problem in information security for thousands of years. In this talk,
I'll look at some of the long-forgotten lessons from that quest and
attempt to understand what, if anything, has changed now that we are in the
era of deep learning classifiers. I will survey the prevailing
definitions for "adversarial examples", argue that those definitions
are unlikely to be the right ones, and raise questions about whether
those definitions are leading us astray.

Bio:
David Evans (https://www.cs.virginia.edu/evans/) is a Professor of
Computer Science at the University of Virginia where he leads the
Security Research Group (https://www.jeffersonswheel.org). He is the author of an open computer science textbook
(http://www.computingbook.org) and a children's book on combinatorics and computability (http://www.dori-mic.org). He won the Outstanding Faculty Award from the State Council of Higher Education for Virginia, and was Program Co-Chair for the 24th ACM Conference on Computer and Communications Security (CCS 2017) and the 30th (2009) and 31st (2010) IEEE Symposia on Security and Privacy. He has SB, SM and PhD degrees in Computer Science from MIT and has been a faculty member at the University of Virginia since 1999.

Transcript

  1. Is "adversarial examples" an Adversarial Example?

    David Evans, University of Virginia, evadeML.org. Deep Learning and Security Workshop, 24 May 2018, San Francisco, CA
  2. GDPR in effect May 25 (tomorrow)!

  3. GDPR in effect now!

  4. GDPR in Effect

    [Live counter: time since GDPR took effect (00:37:34 and counting), response due (71:22:26 and falling), maximum fine for Google ($2,120,889,281 and rising)] “Manager’s nightmare, but a researcher’s paradise!” – David Basin
  5. Article 22

  6. Is “adversarial examples” an Adversarial Example?

  7. Papers on “Adversarial Examples” (Google Scholar)

    [Bar chart: papers per year, 2013 to 2018 (2018 counted through 5/22); one bar labeled 675] 1241.5 papers expected in 2018!
  8. Adversarial Examples before Deep Learning 7

  9. Adversarial Examples “before ML” Péter Ször (1970-2013)

  10. Adversarial Examples before “Oakland” 9

  11. Adversarial Examples before “Oakland”

    The crowd, uncertain, was split by opposing opinions. Then Laocoön rushes down eagerly from the heights of the citadel, to confront them all, a large crowd with him, and shouts from far off: ‘O unhappy citizens, what madness? ... Do you think the enemy’s sailed away? Or do you think any Greek gift’s free of treachery? Is that Ulysses’s reputation? Either there are Greeks in hiding, concealed by the wood, or it’s been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don’t trust this horse. Whatever it is, I’m afraid of Greeks even those bearing gifts.’ Virgil, The Aeneid (Book II)
  12. 11 How should we define “adversarial example”?

  13. How should we define “adversarial example”? 12 “Adversarial examples are

    inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake.” Ian Goodfellow, earlier today
  14. Adversarial Examples across Domains

    Domain | Classifier Space | “Reality” Space
    Trojan Wars | Judgment of Trojans: $f(x)$ = “gift” | Physical Reality: $f^*(x)$ = invading army
    Malware | Malware Detector: $f(x)$ = “benign” | Victim’s Execution: $f^*(x)$ = malicious behavior
    Image Classification | DNN Classifier: $f(x) = t$ | Human Perception: $f^*(x) = y$
    Slide annotations: Later, Next, Not DL
  15. Malware Adversarial Examples 14 Classifier Space Oracle Space actual program

    execution https://github.com/cuckoosandbox Cuckoo
  16. “Oracle” Definition

    Given seed sample $x$, $x'$ is an adversarial example iff: $f(x') = t$ (class is $t$; for malware, $t$ = “benign”) and $\mathcal{B}(x') = \mathcal{B}(x)$ (behavior we care about is the same). Malware: evasive variant preserves malicious behavior of seed, but is classified as benign. No requirement that $x \sim x'$ except through $\mathcal{B}$.
  17. Definitions suggest Attacks

    Given seed sample $x$, $x'$ is an adversarial example iff: $f(x') = t$ (class is $t$; for malware, $t$ = “benign”) and $\mathcal{B}(x') = \mathcal{B}(x)$ (behavior we care about is the same). Generic attack: heuristically explore input space for $x'$ that satisfies definition.
  18. Evolutionary Search

    [Diagram: clone the malicious PDF, mutate the clones to generate variants, select variants by fitness using the oracle, and repeat until an evasive variant is found. Inputs: Malicious PDF, Benign PDFs. Stages: Mutant Generation, Fitness Selection, Found Evasive?] Weilin Xu, Yanjun Qi
  19. Generating Variants

    [Evolutionary search diagram, highlighting Mutant Generation]
  20. Generating Variants

    [Evolutionary search diagram with a PDF object tree: /Root, /Catalog, /Pages, /JavaScript eval(‘…’)] Select random node. Randomly transform: delete, insert, replace.
  21. Generating Variants

    [Evolutionary search diagram with the PDF object tree] Select random node. Randomly transform: delete, insert, replace. Nodes from Benign PDFs.
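The mutation step on the "Generating Variants" slides (select a random node, then delete, insert, or replace, drawing nodes from benign PDFs) can be sketched roughly as follows. This is a toy illustration only: the nested-dict tree, the helper names, and the `/Inserted…` key are assumptions made for the example, not the representation used by the actual tool, and no real PDF parsing is shown.

```python
import copy
import random

def list_paths(tree, prefix=()):
    """Return the path (tuple of keys) of every node below the root."""
    paths = []
    for key, child in tree.items():
        path = prefix + (key,)
        paths.append(path)
        if isinstance(child, dict):
            paths.extend(list_paths(child, path))
    return paths

def get_parent(tree, path):
    """Walk down to the dict that directly contains the node at `path`."""
    node = tree
    for key in path[:-1]:
        node = node[key]
    return node

def mutate(tree, benign_nodes, rng):
    """Pick a random node, then delete it, replace it, or insert a sibling."""
    variant = copy.deepcopy(tree)          # never modify the seed itself
    paths = list_paths(variant)
    if not paths:
        return variant
    path = rng.choice(paths)
    parent = get_parent(variant, path)
    op = rng.choice(["delete", "insert", "replace"])
    if op == "delete":
        del parent[path[-1]]
    elif op == "replace":
        parent[path[-1]] = copy.deepcopy(rng.choice(benign_nodes))
    else:  # insert a benign node next to the chosen node (hypothetical key name)
        parent[f"/Inserted{rng.randrange(10**6)}"] = copy.deepcopy(rng.choice(benign_nodes))
    return variant

# Hypothetical toy data standing in for a malicious seed and benign PDF nodes.
seed = {"/Root": {"/Catalog": {}, "/Pages": {"/Kids": {}}}, "/JavaScript": "eval('...')"}
benign = [{"/Type": "/Page"}, "benign-string"]
rng = random.Random(0)
variants = [mutate(seed, benign, rng) for _ in range(50)]
```

Each call returns a fresh variant and leaves the seed untouched, so a population can be generated from one seed and filtered afterwards.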
  22. Selecting Promising Variants

    [Evolutionary search diagram, highlighting Fitness Selection]
  23. Selecting Promising Variants

    [Evolutionary search diagram, detailing the fitness function] Each candidate variant is run through the oracle and the target classifier, and the fitness function $f(s_{oracle}, s_{class})$ combines the two results into a score.
  24. Oracle: $\mathcal{B}(x') = \mathcal{B}(x)$?

    Execute candidate in vulnerable Adobe Reader in virtual environment (Cuckoo, https://github.com/cuckoosandbox). Simulated network: INetSim. Behavioral signature: HTTP_URL + HOST extracted from API traces; malicious if signature matches.
  25. Fitness Function

    Assumes lost malicious behavior will not be recovered.

    $$\mathrm{fitness}(x') = \begin{cases} 1 - \mathrm{classifier\_score}(x') & \text{if } \mathcal{B}(x') = \mathcal{B}(x) \\ -\infty & \text{otherwise} \end{cases}$$
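As a rough sketch, the fitness function on this slide can be written as a small Python function. The `classifier_score` and `behavior` callables are placeholders for the target classifier and the oracle; the toy stand-ins below are hypothetical and exist only to make the example runnable.

```python
import math

def fitness(variant, seed, classifier_score, behavior):
    """1 - classifier_score(x') if B(x') == B(x), otherwise -infinity."""
    if behavior(variant) != behavior(seed):
        return -math.inf               # malicious behavior was lost: discard
    return 1.0 - classifier_score(variant)

# Hypothetical stand-ins: samples are strings, "payload" marks the behavior,
# and the toy classifier keys on the word "suspicious".
def toy_behavior(sample):
    return "payload" in sample

def toy_classifier_score(sample):
    return 0.9 if "suspicious" in sample else 0.1

seed = "suspicious payload"
candidates = ["suspicious payload", "payload", "suspicious"]
best = max(candidates,
           key=lambda c: fitness(c, seed, toy_classifier_score, toy_behavior))
```

Here the variant that keeps the payload but drops the classifier's trigger scores highest, while the variant that lost the payload is ruled out regardless of its classifier score.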
  26. [Plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]

  27. Simple transformations often worked

    [Plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
  28. (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) Works on 162/500 seeds

    [Plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
  29. Some seeds required complex transformations

    [Plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
  30. Attacks suggest Defenses* 29 Definitions suggest Attacks

  31. Attacks suggest Defenses* 30 * That only work against a

    very particular instantiation of that attack. Definitions suggest Attacks Maginot Line Enigma Plugboard
  32. Malicious Label Threshold Original Malicious Seeds Evading PDFrate Classification Score

    Malware Seed (sorted by original score) Discovered Evasive Variants
  33. Discovered Evasive Variants Malicious Label Threshold Original Malicious Seeds Adjust

    threshold? Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016. Classification Score Malware Seed (sorted by original score)
  34. Variants found with threshold = 0.25 Variants found with threshold

    = 0.50 Adjust threshold? Classification Score Malware Seed (sorted by original score)
  35. Hide the Classifier Score?

    [Evolutionary search diagram: the fitness function $f(s_{oracle}, s_{class})$ scores each candidate variant from the oracle result and the target classifier’s score]
  36. Binary Classifier Output is Enough

    [Evolutionary search diagram: the fitness function $f(s_{oracle}, s_{class})$ scores each candidate variant from the oracle result and the target classifier’s output] ACM CCS 2017
  37. Defenses should be designed around clear definitions of adversary goals and capabilities, not around thwarting particular attacks.

    (The second oldest principle in security.)
  38. Adversarial Examples across Domains

    Domain | Classifier Space | “Reality” Space
    Trojan Wars | Judgment of Trojans: $f(x)$ = “gift” | Physical Reality: $f^*(x)$ = invading army
    Malware | Malware Detector: $f(x)$ = “benign” | Victim’s Execution: $f^*(x)$ = malicious behavior
    Image Classification | DNN Classifier: $f(x) = t$ | Human Perception: $f^*(x) = y$
    Slide annotations: Next, Done, Not DL
  39. Adversarial Examples across Domains

    Domain | Classifier Space | “Reality” Space
    Trojan Wars | Judgment of Trojans: $f(x)$ = “gift” | Physical Reality: $f^*(x)$ = invading army
    Malware | Malware Detector: $f(x)$ = “benign” | Victim’s Execution: $f^*(x)$ = malicious behavior
    Image Classification | DNN Classifier: $f(x) = t$ | Human Perception: $f^*(x) = y$
    Slide annotation: Fixing (Breaking?) the Definition
  40. Fixing (Breaking?) the Definition

    Image Classification | DNN Classifier: $f(x) = t$ | Human Perception: $f^*(x) = y$
  41. Well-Trained Classifier 40 Model and visualization based on work by

    Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop) Classifier Space (DNN Model) “Oracle” Space (human perception)
  42. Adversarial Examples 41 Model and visualization based on work by

    Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop) Classifier Space (DNN Model) “Oracle” Space (human perception)
  43. Misleading Visualization

    Cartoon | Reality
    2 dimensions | thousands of dimensions
    few samples near boundaries | all samples near boundaries
    every sample near 1-3 classes | every sample near all classes
    (Classifier Space, DNN Model)
  44. Adversarial Examples 43 Adversary’s goal: find a small perturbation that

    changes class for classifier, but imperceptible to oracle. Classifier Space (DNN Model) “Oracle” Space (human perception)
  45. 44 Battista Biggio, et al. ECML-KDD 2013

  46. “Biggio” Definition

    Assumption (to map to earlier definition): small perturbation does not change class in “Reality Space”. Given seed sample $x$, $x'$ is an adversarial example iff: $f(x') = t$ (class is $t$, targeted) and $\Delta(x, x') \le \epsilon$ (difference below threshold). $\Delta(x, x')$ is defined in some (simple!) metric space: $L_0$ norm (# different), $L_1$, $L_2$ norm (“Euclidean distance”), $L_\infty$.
  47. “Biggio” Definition

    Given seed sample $x$, $x'$ is an adversarial example iff: $f(x') = t$ (class is $t$, targeted) and $\Delta(x, x') \le \epsilon$ (difference below threshold). $\Delta(x, x')$ is defined in some (simple!) metric space: $L_0$ norm (# different), $L_1$, $L_2$ norm (“Euclidean distance”), $L_\infty$. Problem #1: Every model with boundaries has adversarial examples. Problem #2: Very unnatural limit on adversary strength. Problem #3: Values all adversarial examples equally.
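To make the distance-limited definition concrete, the sketch below computes the four norms named on the slide for a perturbation and checks both conditions. Inputs are flat lists of numbers; `classify`, `target`, and `epsilon` are placeholders, and the toy classifier is hypothetical.

```python
import math

def perturbation_norms(x, x_adv):
    """L0, L1, L2 and L-infinity size of the perturbation x_adv - x."""
    deltas = [b - a for a, b in zip(x, x_adv)]
    return {
        "L0": sum(1 for d in deltas if d != 0),      # number of changed features
        "L1": sum(abs(d) for d in deltas),
        "L2": math.sqrt(sum(d * d for d in deltas)),  # Euclidean distance
        "Linf": max(abs(d) for d in deltas),          # largest single change
    }

def satisfies_definition(x, x_adv, classify, target, epsilon, norm="Linf"):
    """True iff classify(x_adv) == target and the chosen distance is <= epsilon."""
    return classify(x_adv) == target and perturbation_norms(x, x_adv)[norm] <= epsilon

# Hypothetical toy classifier that looks only at the last feature.
def toy_classify(v):
    return "benign" if v[2] < 0.6 else "malicious"

x = [0.0, 0.5, 1.0]
x_adv = [0.0, 0.75, 0.5]
norms = perturbation_norms(x, x_adv)
```

The same pair can pass under one norm and budget and fail under another, which is one way to see how much the definition depends on the choice of metric and threshold.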
  48. DSML Papers 47 Biggio Definition (6) No Version On-Line (5)

    Oracle Definition (3) KFS, YKLALYP, RG AHHO, CW, GLSQ, HD, MW, SBC Building Classifiers (5) AMNKV, CSS, DAF, SHWS, ZCPS, Software (2) BGS, XLZX
  49. Impact of Adversarial Perturbations

    [Plot: distance between each layer’s output and its output for the original seed, 5th to 95th percentile. FGSM, $\epsilon = 0.0245$, CIFAR-10, DenseNet]
  50. Impact of Adversarial Perturbations

    [Plot: distance between each layer’s output and its output for the original seed, 5th to 95th percentile. FGSM, $\epsilon = 0.0245$, CIFAR-10, DenseNet] Mainuddin Jonas
  51. Impact of Adversarial Perturbations

    [Plot: distance between each layer’s output and its output for the original seed, comparing FGSM ($\epsilon = 0.0245$) with random noise (same amount). CIFAR-10, DenseNet]
  52. Impact of Adversarial Perturbations

    [Plot: distance between each layer’s output and its output for the original seed, comparing Carlini-Wagner L2 with random noise (same amount). CIFAR-10, DenseNet]
  53. Definitions Suggest Defenses

    Given seed sample $x$, $x'$ is an adversarial example iff: $f(x') = t$ (class is $t$, targeted) and $\Delta(x, x') \le \epsilon$ (difference below threshold). $\Delta(x, x')$ is defined in some (simple!) metric space: $L_0$ norm (# different), $L_1$, $L_2$ norm (“Euclidean distance”), $L_\infty$. Suggested Defense: given an input $x^*$, see how the model behaves on $g(x^*)$ where $g(\cdot)$ reverses transformations in $\Delta$-space.
  54. Feature Squeezing Detection Framework

    [Diagram: the input is fed to the model directly (Prediction0) and through Squeezer 1 … Squeezer k (Prediction1 … Predictionk); $d(pred_0, pred_1, \ldots, pred_k)$ decides Adversarial or Legitimate] Weilin Xu, Yanjun Qi
  55. Feature Squeezing Detection Framework

    [Diagram: the input is fed to the model directly (Prediction0) and through Squeezer 1 … Squeezer k (Prediction1 … Predictionk); $d(pred_0, pred_1, \ldots, pred_k)$ decides Adversarial or Legitimate] Feature Squeezer coalesces similar inputs into one point: • Barely change legitimate inputs. • Destruct adversarial perturbations.
  56. Coalescing by Feature Squeezing 55 Metric Space 1: Target Classifier

    Metric Space 2: “Oracle” Before: find a small perturbation that changes class for classifier, but imperceptible to oracle. Now: change class for both original and squeezed classifier, but imperceptible to oracle.
  57. Example Squeezer: Bit Depth Reduction

    [Plot: signal quantization, output vs. input on the range 0 to 1, for 8-bit, 3-bit, and 1-bit depth]
  58. Example Squeezer: Bit Depth Reduction

    [Plot: signal quantization, output vs. input on the range 0 to 1, for 8-bit, 3-bit, and 1-bit depth] [Image grid: a seed and its CW2, CW∞, BIM, and FGSM adversarial versions, with predicted labels 1 1 4 2 2 1 1 1 1 1]
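A bit depth squeezer of the kind plotted here can be sketched in a few lines. The rounding rule below (scale to `2**bits - 1` levels, round, scale back) is one common way to quantize values in [0, 1] and is an assumption for illustration, not necessarily the exact rule behind the slide.

```python
def reduce_bit_depth(values, bits):
    """Quantize values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round(v * levels) / levels for v in values]

pixels = [0.0, 0.2, 0.49, 0.51, 1.0]
one_bit = reduce_bit_depth(pixels, 1)     # every value snaps to 0.0 or 1.0
three_bit = reduce_bit_depth(pixels, 3)   # eight levels: 0, 1/7, ..., 1

# Two nearby inputs coalesce into the same squeezed point.
same = reduce_bit_depth([0.30], 1) == reduce_bit_depth([0.32], 1)
```

With 1-bit depth a small perturbation only survives squeezing if it pushes a value across the single midpoint, which is the intuition behind coalescing similar inputs.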
  59. Other Potential Squeezers 58 C Xie, et al. Mitigating Adversarial

    Effects Through Randomization, ICLR 2018. J Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial Examples, ICLR 2018. D Meng and H Chen, MagNet: a Two-Pronged Defense against Adversarial Examples, CCS 2017; A Prakash, et al., Deflecting Adversarial Attacks with Pixel Deflection, CVPR 2018;... Thermometer Encoding (learnable bit depth reduction) Image denoising using autoencoder, wavelet, JPEG, etc. Image resizing ... Spatial Smoothers: median filter, non-local means
  60. “Feature Squeezing” (Vacuous) Conjecture For any distance-limited adversarial method, there

    exists some feature squeezer that accurately detects its adversarial examples. 59 Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces original and adversarial example into same sample.
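  61. Feature Squeezing Detection

    Model: 7-layer CNN. Squeezers: Bit Depth-1 and Median 2×2. The input is flagged as adversarial if $\max\big(L_1(p_0, p_1),\ L_1(p_0, p_2)\big) > T$ and treated as legitimate otherwise, where $p_0$ is the prediction on the original input (Prediction0) and $p_1$, $p_2$ are the predictions on the squeezed inputs (Prediction1, Prediction2).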
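The detection rule on this slide compares the model's prediction on the original input with its predictions on squeezed copies and flags the input when the largest L1 distance exceeds a threshold. A minimal sketch follows, with a hypothetical two-class toy model and a single 1-bit squeezer standing in for the real CNN and squeezers.

```python
def l1_distance(p, q):
    """L1 distance between two prediction vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))

def detect(x, model, squeezers, threshold):
    """Return (flagged, score): score is the max L1 distance between the
    prediction on x and the prediction on each squeezed version of x."""
    base = model(x)
    score = max(l1_distance(base, model(squeeze(x))) for squeeze in squeezers)
    return score > threshold, score

# Hypothetical stand-ins for illustration only.
def toy_model(x):
    return [x[0], 1.0 - x[0]]            # "probabilities" for two classes

def one_bit_squeezer(x):
    return [float(round(v)) for v in x]  # snap each feature to 0.0 or 1.0

flagged_a, score_a = detect([0.49], toy_model, [one_bit_squeezer], threshold=0.5)
flagged_b, score_b = detect([0.02], toy_model, [one_bit_squeezer], threshold=0.5)
```

An input whose prediction barely moves under squeezing gets a low score, while one whose prediction changes sharply is flagged.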
  62. Detecting Adversarial Examples Distance between original input and its squeezed

    version Adversarial inputs (CW attack) Legitimate inputs
  63. Training a detector (MNIST)

    [Histogram: number of examples vs. maximum $L_1$ distance between original and squeezed input, for legitimate and adversarial inputs] Set the detection threshold to keep false positive rate below target. threshold = 0.0029, detection: 98.2%, FP < 4%
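Choosing the threshold from legitimate training inputs can be sketched as picking a high quantile of their detection scores. The function names and the sample scores below are hypothetical, and the rule keeps the false positive rate at or below the target on the training scores only.

```python
import math

def pick_threshold(legitimate_scores, max_false_positive_rate):
    """Smallest observed score t such that the fraction of legitimate
    scores strictly above t is at most max_false_positive_rate."""
    ordered = sorted(legitimate_scores)
    n = len(ordered)
    allowed = min(math.floor(max_false_positive_rate * n), n - 1)
    return ordered[n - allowed - 1]

def rate_above(scores, threshold):
    """Fraction of scores that would be flagged (score > threshold)."""
    return sum(1 for s in scores if s > threshold) / len(scores)

# Hypothetical detection scores (max L1 distance) for illustration.
legit = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
adversarial = [0.65, 0.9, 1.5, 0.3]

threshold = pick_threshold(legit, 0.25)
false_positive_rate = rate_above(legit, threshold)
detection_rate = rate_above(adversarial, threshold)
```

Lowering the false positive target raises the threshold, which in turn lowers the detection rate; the slide's reported numbers reflect one such trade-off point.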
  64. ImageNet Configuration

    Model: MobileNet. Squeezers: Bit Depth-5, Median 2×2, and Non-local Mean. The input is flagged as adversarial if $\max\big(L_1(p_0, \{p_1, p_2, p_3\})\big) > T$ and treated as legitimate otherwise, where $p_0$ is Prediction0 and $p_1$, $p_2$, $p_3$ are the predictions on the squeezed inputs.
  65. Training a detector (ImageNet)

    [Histogram: number of examples vs. maximum $L_1$ distance between original and squeezed input, for legitimate and adversarial inputs] threshold = 1.24, detection: 85%, FP < 5%
  66. How should we evaluate defenses? 65

  67. Threat Models Oblivious attack: The adversary has full knowledge of

    the target model, but is not aware of the detector. Adaptive attack: The adversary has full knowledge of the target model and the detector. 66
  68. (Generic) Adaptive Adversary

    Adaptive CW2 attack, unbounded adversary: Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song, Adversarial Example Defense: Ensembles of Weak Defenses are not Strong, USENIX WOOT’17.

    $$\text{minimize} \quad \underbrace{\lVert f(x') - t \rVert}_{\text{misclassification term}} + \underbrace{c \cdot \Delta(x, x')}_{\text{distance term}} + \underbrace{k \cdot L_1\mathrm{score}(x')}_{\text{detection term}}$$
  69. Adaptive Adversarial Examples 68 No successful adversarial examples were found

    for images originally labeled as 3 or 8. Mean L2 2.80 4.14 4.67 Attack Untargeted Targeted (next) Targeted (least likely)
  70. Adaptive Adversary Success Rates

    [Plot: adversary’s success rate vs. clipped $\epsilon$ for Untargeted, Targeted (Next), and Targeted (LL) attacks; labeled success rates 0.68, 0.06, 0.01, 0.44, 0.01, 0.24; markers for “Typical $\epsilon$” and “Unbounded”]
  71. Revisiting Attacker’s Goal Find one adversarial example Find many adversarial

    examples Suya Yuan Tian
  72. Attacker Visibility “White-box attacker” Knows model architecture and all parameters

    “Black-box attacker” Interacts with model through API Limited number of interactions Output is <class, confidence> vector decision-based: output is just class “bird”, 0.09 “horse”, 0.84 ...
  73. Black-Box Cost Variance

    Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models. AISec 2017. (1000 queries per iteration, 256 max iters)

    [Plots for MNIST and CIFAR-10: number of queries vs. number of adversarial examples. Target: least-likely class, max $L_2$ = 3. Labeled values: 256,000; fails for 14; 117,820 average for MNIST; 50,240 average for lowest 20; 60,378 CIFAR-10 overall average; 15,795 (26%) CIFAR-10 lowest-cost 20 average]
  74. Easy and Hard Examples “Easy” “Hard” “Easy” images: 5 with

    fewest number of queries needed to find adversarial example “Hard” images: 5 with highest number of queries (failed) 256,000 query attempts without success 14,592 0 → (least likely) 1 43,008 43,776 49,152 49,920
  75. Easy and Hard Examples “Easy” “Hard” “Easy” images: 5 with

    fewest number of queries needed to find adversarial example “Hard” images: 5 with highest number of queries (failed) 256,000 query attempts without success 14,592 0 → (least likely) 1 “airplane” → “frog” 43,008 43,776 49,152 49,920 9,728 10,496 10,752 12,288 13,824 256,000 query attempts without success
  76. White-Box Cost Variance

    [Plots for MNIST and CIFAR-10: number of iterations vs. number of adversarial examples. Carlini-Wagner L2 attack, target: least-likely class, MNIST: max $L_2$ = 3.0, CIFAR-10: max $L_2$ = 1.0. Labeled values: 2000; 566 average for MNIST; 174 average for lowest 20; 82 CIFAR-10 average]
  77. White-Box Cost Variance

    [Plots for MNIST and CIFAR-10: number of iterations vs. number of adversarial examples. Carlini-Wagner L2 attack, target: least-likely class, MNIST: max $L_2$ = 3.0, CIFAR-10: max $L_2$ = 1.0. Labeled values: 2000; 566; 174 average for lowest 20; 82 CIFAR-10 average; CIFAR-10 lowest 20 (average: 3.6)]
  78. How does cost-variance impact attack cost? 77

  79. Simple Greedy Search Works Well

    [Plots for MNIST and CIFAR-10: average queries per AE found (× 10$) vs. number of adversarial examples, for random target selection, greedy heuristic, and oracle optimal. ZOO Black-Box Attack]

    Cost ratio | MNIST | CIFAR
    Target: 20, Greedy/Optimal | 1.50 | 1.30
    Target: 20, Random/Optimal | 2.37 | 3.86
    Target: 50, Greedy/Optimal | 1.46 | 1.21
    Target: 50, Random/Optimal | 1.96 | 2.45
  80. White-Box Batch Attack Cost

    [Plots for MNIST and CIFAR-10: average iterations per AE found vs. number of adversarial examples, for random target selection, greedy heuristic, and oracle optimal. CW L2 Attack]

    Cost ratio | MNIST | CIFAR
    Target: 20, Greedy/Optimal | 2.01 | 1.22
    Target: 20, Random/Optimal | 3.20 | 20.05
    Target: 50, Greedy/Optimal | 1.76 | 1.50
    Target: 50, Random/Optimal | 2.45 | 15.11
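To see why cost variance matters for an attacker who only needs some number of adversarial examples, the sketch below compares two of the selection strategies named on these slides under simplifying assumptions: every seed has a known, finite attack cost, "oracle optimal" attacks the cheapest seeds, and "random" pays the mean cost per example. The cost values are hypothetical, and the greedy heuristic from the slides is not modeled here.

```python
def average_cost_optimal(costs, n_targets):
    """Average cost per adversarial example when an oracle picks the
    n_targets cheapest seeds."""
    cheapest = sorted(costs)[:n_targets]
    return sum(cheapest) / n_targets

def average_cost_random(costs):
    """Expected cost per adversarial example when seeds are picked
    uniformly at random (assuming every attack eventually succeeds)."""
    return sum(costs) / len(costs)

# Hypothetical per-seed attack costs (queries or iterations), highly skewed.
costs = [10, 20, 30, 100, 200, 1000]
optimal = average_cost_optimal(costs, n_targets=2)
random_pick = average_cost_random(costs)
ratio = random_pick / optimal
```

The more skewed the cost distribution, the larger the Random/Optimal ratio becomes, which is the pattern the ratio tables above report.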
  81. Madry Defense

    [Plots: accuracy by batch (10 samples, sorted by initial distance); MNIST classes “9”, “7”, “0”; CIFAR-10 classes airplane, cars, deer] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. https://github.com/MadryLab/mnist_challenge
  82. History of the destruction of Troy, 1498 Conclusion

  83. Security State-of-the-Art Attack success probability Threat models Proofs Cryptography !"#!$

    information theoretic, resource bounded required System Security !"%! capabilities, motivations, rationality common Adversarial Machine Learning !&; !"#* white-box, black-box making progress? 82
  84. Ali Rahimi, NIPS Test-of-Time Award Speech (Dec 2017)

    “If you're building photo-sharing systems alchemy is okay but we're beyond that; now we're building systems that govern healthcare and mediate our civic dialogue”
  86. Alchemy (~700 − 1660) Well-defined, testable goal (turn lead into

    gold) Established theory (four elements: earth, fire, water, air) Methodical experiments and lab techniques (Jabir ibn Hayyan in 8th century) Wrong and ultimately unsuccessful, but led to modern chemistry.
  87. Adversarial Examples across Domains

    Domain | Classifier Space | “Reality” Space
    Trojan Wars | Judgment of Trojans: $f(x)$ = “gift” | Physical Reality: $f^*(x)$ = invading army
    Malware | Malware Detector: $f(x)$ = “benign” | Victim’s Execution: $f^*(x)$ = malicious behavior
    Image Classification | DNN Classifier: $f(x) = t$ | Human Perception: $f^*(x) = y$
  88. 87 Domain Classifier Space “Reality” Space Trojan Wars Judgment of

    Trojans !(#) = “gift” Physical Reality !∗(#) = invading army Malware Malware Detector !(#) = “benign” Victim’s Execution !∗(#) = malicious behavior Image Classification DNN Classifier !(#) = ) Human Perception !∗(#) = * Academic Research Conferences, Fun !(+,s) = “awesome” Systems, Society, Ideas !∗ +,s = ?
  89. David Evans University of Virginia evans@virginia.edu EvadeML.org Weilin Xu Yanjun

    Qi Fnu Suya Yuan Tian Mainuddin Jonas Funding: NSF, Intel
  90. 89

  92. 91 @_youhadonejob1

  93. 92