
Is "adversarial examples" an Adversarial Example?

Keynote talk at the 1st Deep Learning and Security Workshop
May 24, 2018
co-located with the
39th IEEE Symposium on Security and Privacy
San Francisco, California

Abstract:
Over the past few years, there has been an explosion of research on the
security of machine learning, and on adversarial examples in
particular. Although this is in many ways a new and immature research
area, the general problem of adversarial examples has been a core
problem in information security for thousands of years. In this talk,
I'll look at some of the long-forgotten lessons from that quest and
attempt to understand what, if anything, has changed now that we are in
the era of deep learning classifiers. I will survey the prevailing
definitions for "adversarial examples", argue that those definitions
are unlikely to be the right ones, and raise questions about whether
those definitions are leading us astray.

Bio:
David Evans (https://www.cs.virginia.edu/evans/) is a Professor of
Computer Science at the University of Virginia where he leads the
Security Research Group (https://www.jeffersonswheel.org). He is the author of an open computer science textbook
(http://www.computingbook.org) and a children's book on combinatorics and computability (http://www.dori-mic.org). He won the Outstanding Faculty Award from the State Council of Higher Education for Virginia, and was Program Co-Chair for the 24th ACM Conference on Computer and Communications Security (CCS 2017) and the 30th (2009) and 31st (2010) IEEE Symposia on Security and Privacy. He has SB, SM and PhD degrees in Computer Science from MIT and has been a faculty member at the University of Virginia since 1999.

David Evans

May 24, 2018

Transcript

  1. Is "adversarial
    examples" an
    Adversarial
    Example?
    David Evans
    University of Virginia
    evadeML.org
    Deep Learning and
    Security Workshop
    24 May 2018
    San Francisco, CA


  2. GDPR in
    effect May 25
    (tomorrow)!


  3. GDPR in
    effect now!


  4. [Animated counters, updating every second: “GDPR in Effect” (time elapsed),
    “Response Due” (time remaining), and “Maximum Fine (Google)” ticking up past
    $2.1 billion.]
    “Manager’s nightmare, but a researcher’s paradise!”
    – David Basin
    GDPR in effect now!


  5. Article 22


  6. Is “adversarial
    examples” an
    Adversarial
    Example?


  7. Papers on “Adversarial Examples” (Google Scholar)
    [Bar chart: papers per year, 2013–2018; 675 so far in 2018 (as of 5/22).]
    1241.5 papers expected in 2018!


  8. Adversarial Examples before Deep Learning
    7


  9. Adversarial Examples “before ML”
Péter Szőr (1970-2013)


  10. Adversarial Examples before “Oakland”
    9


  11. Adversarial Examples before “Oakland”
    10
    The crowd, uncertain, was split by opposing opinions.
    Then Laocoön rushes down eagerly from the heights of
    the citadel, to confront them all, a large crowd with
    him, and shouts from far off: ‘O unhappy citizens, what
    madness? ... Do you think the enemy’s sailed away? Or
    do you think any Greek gift’s free of treachery? Is that
    Ulysses’s reputation? Either there are Greeks in hiding,
    concealed by the wood, or it’s been built as a machine
    to use against our walls, or spy on our homes, or fall
    on the city from above, or it hides some other trick:
    Trojans, don’t trust this horse. Whatever it is, I’m afraid
    of Greeks even those bearing gifts.’
Virgil, The Aeneid (Book II)


  12. 11
    How should we define
    “adversarial example”?


  13. How should we define
    “adversarial example”?
    12
    “Adversarial examples are inputs to
    machine learning models that an
    attacker has intentionally designed to
    cause the model to make a mistake.”
    Ian Goodfellow, earlier today


  14. Adversarial Examples across Domains
    Domain               | Classifier Space                    | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”  | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”   | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y            | Human Perception: f*(x) = z


  15. Malware Adversarial Examples
    Classifier Space vs. Oracle Space (actual program execution, checked with
    the Cuckoo sandbox: https://github.com/cuckoosandbox)


  16. “Oracle” Definition
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t        Class is t (for malware, t = “benign”)
      ℬ(x′) = ℬ(x)     Behavior we care about is the same
    Malware: evasive variant preserves malicious behavior of seed, but is
    classified as benign.
    No requirement that x ~ x′ except through ℬ.
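
    In code, this definition is just a conjunction of two checks. A minimal
    sketch (not from the talk), where classifier and behavior_oracle are
    hypothetical stand-ins for the target model and the Cuckoo-style behavioral
    oracle:

    def is_adversarial(x_prime, x_seed, classifier, behavior_oracle, target="benign"):
        """Oracle definition: x' evades iff the classifier outputs the target
        class while the behavioral oracle reports unchanged behavior."""
        evades_classifier = classifier(x_prime) == target                    # f(x') = t
        same_behavior = behavior_oracle(x_prime) == behavior_oracle(x_seed)  # B(x') = B(x)
        return evades_classifier and same_behavior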


  17. Definitions suggest Attacks
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t        Class is t (for malware, t = “benign”)
      ℬ(x′) = ℬ(x)     Behavior we care about is the same
    Generic attack: heuristically explore the input space for an x′ that
    satisfies the definition.


  18. Evolutionary Search
    [Diagram: clone the malicious PDF seed, mutate the clones (borrowing content
    from benign PDFs), score each variant against the oracle and the target
    classifier (fitness selection), and repeat mutant generation until an
    evasive variant is found. Weilin Xu, Yanjun Qi.]
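
    As a rough sketch (not the actual EvadeML implementation), the search loop
    looks something like this; mutate, fitness, classifier_label, and
    oracle_same_behavior are hypothetical helpers standing in for the PDF tree
    transformations, the target classifier, and the Cuckoo oracle:

    def evolutionary_search(seed, benign_nodes, pop_size=48, max_generations=100):
        """Sketch of the evolutionary search for evasive PDF variants."""
        population = [seed] * pop_size                  # clone the malicious seed
        for _ in range(max_generations):
            # Mutation: delete/insert/replace a random node, borrowing
            # nodes harvested from benign PDFs.
            variants = [mutate(v, benign_nodes) for v in population]
            for v in variants:
                if oracle_same_behavior(seed, v) and classifier_label(v) == "benign":
                    return v                            # found an evasive variant
            # Fitness selection: keep the most promising variants.
            variants.sort(key=lambda v: fitness(seed, v), reverse=True)
            population = variants[:pop_size]
        return None                                     # no evasive variant found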


  19. Generating Variants
    [Same pipeline diagram, highlighting the mutant generation step.]


  20. Generating Variants
    [Mutation detail: the PDF is a tree of objects (/Root, /Catalog, /Pages,
    /JavaScript containing eval(‘…’)); select a random node, then randomly
    transform it: delete, insert, or replace.]


  21. Generating Variants
    [Mutation detail, continued: inserted and replacement nodes are drawn from a
    pool of nodes harvested from benign PDFs.]


  22. Selecting Promising Variants
    [Same pipeline diagram, highlighting the fitness selection step.]


  23. Selecting Promising Variants
    [Fitness function detail: each candidate variant is sent to both the oracle
    and the target classifier; the fitness score combines the oracle’s verdict
    (still malicious?) with the classifier’s score.]


  24. Oracle: ℬ(x′) = ℬ(x)?
    Execute candidate in
    vulnerable Adobe Reader in
    virtual environment
    Behavioral signature:
    malicious if signature matches
    https://github.com/cuckoosandbox
    Simulated network: INetSim
    Cuckoo
    HTTP_URL + HOST
    extracted from API traces


  25. Fitness Function
    Assumes lost malicious behavior will not be recovered.
    fitness(x′) =  1 − classifier_score(x′)   if ℬ(x′) = ℬ(x)
                   −∞                         otherwise
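
    A direct transcription into Python (a sketch; classifier_score and behavior
    are hypothetical stand-ins for the target classifier’s maliciousness score
    and the Cuckoo behavioral signature):

    def fitness(seed, variant):
        """Prefer variants that keep the seed's malicious behavior and get a
        low maliciousness score from the target classifier."""
        if behavior(variant) != behavior(seed):
            return float("-inf")                # lost the behavior we care about
        return 1.0 - classifier_score(variant)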


  26. [Plot: seeds evaded (out of 500) vs. number of mutations, for the PDFRate
    and Hidost classifiers.]


  27. [Same plot.] Simple transformations often worked.


  28. [Same plot.] One transformation,
    (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/),
    works on 162/500 seeds.


  29. [Same plot.] Some seeds required complex transformations.


  30. Attacks suggest Defenses*
    29
    Definitions suggest Attacks


  31. Attacks suggest Defenses*
    30
    * That only work against a very particular instantiation of that attack.
    Definitions suggest Attacks
    Maginot Line
    Enigma Plugboard


  32. Evading PDFrate
    [Plot: classification score for each malware seed (sorted by original score)
    and its discovered evasive variants, with the malicious label threshold.]


  33. Adjust threshold?
    [Same plot: original malicious seeds, discovered evasive variants, and the
    malicious label threshold.]
    Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in
    Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.


  34. Adjust threshold?
    [Same plot: variants found with threshold = 0.25 and with threshold = 0.50.]


  35. Hide the Classifier Score?
    [Same pipeline diagram: the fitness function still queries the oracle and
    the target classifier’s score.]


  36. Binary Classifier Output is Enough
    [Same pipeline diagram, with the target classifier reduced to a binary
    malicious/benign output.]
    ACM CCS 2017


  37. 36
    Defenses should be designed around clear
    definitions of adversary goals and capabilities,
    not around thwarting particular attacks.
    (The second oldest principle in security.)


  38. Adversarial Examples across Domains
    Domain               | Classifier Space                    | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”  | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”   | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y            | Human Perception: f*(x) = z


  39. Adversarial Examples across Domains
    Domain               | Classifier Space                    | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”  | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”   | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y            | Human Perception: f*(x) = z
    Fixing (Breaking?) the Definition


  40. Fixing (Breaking?) the Definition
    Image Classification: DNN Classifier, f(x) = y; Human Perception, f*(x) = z


  41. Well-Trained Classifier
    40
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  42. Adversarial Examples
    41
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  43. Misleading Visualization
    Classifier Space (DNN Model): cartoon vs. reality
    Cartoon: 2 dimensions; few samples near boundaries; every sample near 1-3
    classes.
    Reality: thousands of dimensions; all samples near boundaries; every sample
    near all classes.


  44. Adversarial Examples
    43
    Adversary’s goal: find a small perturbation that changes class for classifier, but imperceptible to oracle.
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  45. 44
Battista Biggio, et al. ECML-PKDD 2013


  46. “Biggio” Definition
    Assumption (to map to earlier definition): small perturbation does not
    change class in “Reality Space”.
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t          Class is t (targeted)
      ∆(x, x′) ≤ δ       Difference below threshold
    ∆(x, x′) is defined in some (simple!) metric space:
      L0 norm (# different), L1, L2 norm (“Euclidean distance”), L∞
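
    A minimal NumPy sketch (not from the talk) of checking this condition for a
    given perturbation, norm, and threshold; the classifier f and target class t
    are assumptions:

    import numpy as np

    def lp_distance(x, x_prime, p):
        """Perturbation size in one of the simple metric spaces above."""
        d = (x_prime - x).ravel()
        if p == 0:
            return np.count_nonzero(d)       # L0: number of changed components
        if p == np.inf:
            return np.abs(d).max()           # L-infinity: largest single change
        return np.linalg.norm(d, ord=p)      # L1, L2 ("Euclidean distance"), ...

    def is_biggio_adversarial(f, x, x_prime, t, p=2, delta=0.5):
        return f(x_prime) == t and lp_distance(x, x_prime, p) <= delta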


  47. “Biggio” Definition
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t          Class is t (targeted)
      ∆(x, x′) ≤ δ       Difference below threshold
    ∆(x, x′) is defined in some (simple!) metric space:
      L0 norm (# different), L1, L2 norm (“Euclidean distance”), L∞
    Problem #1: Every model with boundaries has adversarial examples.
    Problem #2: Very unnatural limit on adversary strength.
    Problem #3: Values all adversarial examples equally.


  48. DSML Papers
    Biggio Definition (6): AHHO, CW, GLSQ, HD, MW, SBC
    Oracle Definition (3): KFS, YKLALYP, RG
    No Version On-Line (5)
    Building Classifiers (5): AMNKV, CSS, DAF, SHWS, ZCPS
    Software (2): BGS, XLZX


  49. Impact of Adversarial Perturbations
    [Plot: distance between each layer’s output and its output for the original
    seed; FGSM, ε = 0.0245; CIFAR-10, DenseNet; 5th and 95th percentile bands.]


  50. Impact of Adversarial Perturbations
    [Same plot.] Mainuddin Jonas


  51. Impact of Adversarial Perturbations
    [Same plot, adding random noise of the same amount for comparison with FGSM,
    ε = 0.0245; CIFAR-10, DenseNet.]


  52. Impact of Adversarial Perturbations
    [Same plot for the Carlini-Wagner L2 attack vs. random noise of the same
    amount; CIFAR-10, DenseNet.]
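
    Plots like these can be approximated by comparing intermediate activations
    for the original and perturbed inputs. A hedged PyTorch sketch (not the code
    behind the slides), assuming model is a trained torch.nn.Module and x, x_adv
    are batched input tensors:

    import torch

    def layer_distances(model, x, x_adv):
        """L2 distance between each leaf layer's output on x_adv and on x."""
        acts, hooks = {}, []

        def save(name):
            def hook(module, inputs, output):
                acts[name].append(output.detach().flatten(1))
            return hook

        for name, module in model.named_modules():
            if len(list(module.children())) == 0:        # leaf layers only
                acts[name] = []
                hooks.append(module.register_forward_hook(save(name)))

        with torch.no_grad():
            model(x)        # records activations for the original input
            model(x_adv)    # records activations for the perturbed input

        for h in hooks:
            h.remove()
        return {name: torch.norm(a[1] - a[0], dim=1)
                for name, a in acts.items() if len(a) == 2}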


  53. Definitions Suggest Defenses
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t          Class is t (targeted)
      ∆(x, x′) ≤ δ       Difference below threshold
    ∆(x, x′) is defined in some (simple!) metric space:
      L0 norm (# different), L1, L2 norm (“Euclidean distance”), L∞
    Suggested Defense: given an input x*, see how the model behaves on T(x*),
    where T(·) reverses transformations in ∆-space.


  54. Feature Squeezing Detection Framework
    [Diagram: the input goes to the original model (Prediction0) and, through
    Squeezer 1 … Squeezer k, to squeezed models (Prediction1 … Predictionk); a
    distance d(pred0, pred1, …, predk) over the predictions labels the input
    Adversarial or Legitimate.]
    Weilin Xu, Yanjun Qi


  55. Feature Squeezing Detection Framework
    [Same diagram.]
    A feature squeezer coalesces similar inputs into one point:
    • Barely change legitimate inputs.
    • Destroy adversarial perturbations.


  56. Coalescing by Feature Squeezing
    55
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Before: find a small perturbation that changes class for classifier, but imperceptible to oracle.
    Now: change class for both original and squeezed classifier, but imperceptible to oracle.


  57. Example Squeezer: Bit Depth Reduction
    [Plot: signal quantization curves mapping inputs in [0, 1] to outputs at
    8-bit, 3-bit, and 1-bit depth.]
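
    A minimal NumPy version of this squeezer (plus the median filter squeezer
    used later), assuming inputs are float images scaled to [0, 1]; scipy’s
    median_filter stands in for the 2×2 smoothing:

    import numpy as np
    from scipy.ndimage import median_filter

    def reduce_bit_depth(x, bits):
        """Quantize [0, 1] pixel values to 2**bits levels (the step curves above)."""
        levels = 2 ** bits - 1
        return np.round(x * levels) / levels

    def median_smooth(x, size=2):
        """Spatial median filter, another simple feature squeezer."""
        return median_filter(x, size=size)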


  58. Example Squeezer: Bit Depth Reduction
    [Same quantization plot, with examples: a seed digit and adversarial
    versions from CW, BIM, and FGSM attacks, shown before and after bit depth
    reduction.]


  59. Other Potential Squeezers
    Spatial smoothers: median filter, non-local means
    Thermometer encoding (learnable bit depth reduction)
    Image denoising using autoencoder, wavelet, JPEG, etc.
    Image resizing
    ...
    C Xie, et al. Mitigating Adversarial Effects Through Randomization, ICLR 2018.
    J Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial
    Examples, ICLR 2018.
    D Meng and H Chen, MagNet: a Two-Pronged Defense against Adversarial
    Examples, CCS 2017; A Prakash, et al., Deflecting Adversarial Attacks with
    Pixel Deflection, CVPR 2018; ...


  60. “Feature Squeezing” (Vacuous) Conjecture
    For any distance-limited adversarial method,
    there exists some feature squeezer that
    accurately detects its adversarial examples.
    59
    Intuition: if the perturbation is small (in some simple
    metric space), there is some squeezer that coalesces
    original and adversarial example into same sample.


  61. Feature Squeezing Detection
    [Diagram (MNIST): the input goes to the model (a 7-layer CNN) directly
    (Prediction0), after Bit Depth-1 squeezing (Prediction1), and after a
    Median 2×2 filter (Prediction2).]
    Flag as adversarial if max( ‖p0 − p1‖1, ‖p0 − p2‖1 ) > T; otherwise
    legitimate.
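
    A sketch of this detection rule in NumPy (not the reference implementation);
    model_probs is a hypothetical function returning the classifier’s softmax
    vector, and the squeezers are the ones sketched earlier:

    import numpy as np

    def squeezing_score(x, model_probs, squeezers):
        """Max L1 difference between predictions on x and on each squeezed x."""
        p0 = model_probs(x)
        return max(np.abs(p0 - model_probs(squeeze(x))).sum() for squeeze in squeezers)

    def is_adversarial_input(x, model_probs, squeezers, threshold):
        return squeezing_score(x, model_probs, squeezers) > threshold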


  62. Detecting Adversarial Examples
    Distance between original input and its squeezed version
    Adversarial
    inputs
    (CW attack)
    Legitimate
    inputs


  63. Training a detector (MNIST)
    [Histogram: number of examples vs. maximum L1 distance between original and
    squeezed input, for legitimate and adversarial examples.]
    Set the detection threshold to keep the false positive rate below a target:
    threshold = 0.0029 gives detection 98.2%, FP < 4%.
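
    One way to pick that threshold (a sketch, reusing squeezing_score from
    above): compute scores on held-out legitimate inputs and take the percentile
    matching the target false-positive rate.

    import numpy as np

    def train_threshold(legit_inputs, model_probs, squeezers, target_fp_rate=0.05):
        """Threshold such that at most target_fp_rate of legitimate inputs exceed it."""
        scores = [squeezing_score(x, model_probs, squeezers) for x in legit_inputs]
        return float(np.percentile(scores, 100 * (1 - target_fp_rate)))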


  64. ImageNet Configuration
    [Diagram: the input goes to the model (MobileNet) directly (Prediction0),
    after Bit Depth-5 squeezing (Prediction1), after a Median 2×2 filter
    (Prediction2), and after non-local means denoising (Prediction3).]
    Flag as adversarial if max( ‖p0 − p1‖1, ‖p0 − p2‖1, ‖p0 − p3‖1 ) > T;
    otherwise legitimate.


  65. Training a detector (ImageNet)
    [Histogram: number of examples vs. maximum L1 distance between original and
    squeezed input, for legitimate and adversarial examples.]
    threshold = 1.24 gives detection 85%, FP < 5%.


  66. How should we
    evaluate defenses?
    65


  67. Threat Models
    Oblivious attack: The adversary has full knowledge of
    the target model, but is not aware of the detector.
    Adaptive attack: The adversary has full knowledge of
    the target model and the detector.
    66


  68. (Generic) Adaptive Adversary
    Adaptive CW2 attack, unbounded adversary:
    Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song. Adversarial
    Example Defense: Ensembles of Weak Defenses are not Strong. USENIX WOOT ’17.
    minimize  loss(x′, t)  +  c · ∆(x, x′)  +  k · L2score(x′)
    (misclassification term, distance term, detection term)


  69. Adaptive Adversarial Examples
    No successful adversarial examples were found for images originally labeled
    as 3 or 8.
    Attack                    Mean L2
    Untargeted                2.80
    Targeted (next)           4.14
    Targeted (least likely)   4.67


  70. Adaptive Adversary Success Rates
    [Plot: adversary’s success rate vs. clipped ε for Untargeted, Targeted
    (Next), and Targeted (LL) adaptive attacks, comparing a typical ε with the
    unbounded adversary; plotted rates range from 0.01 to 0.68.]


  71. Revisiting Attacker’s Goal
    Find one adversarial example Find many adversarial examples
    Suya Yuan Tian


  72. Attacker Visibility
    “White-box attacker”: knows model architecture and all parameters.
    “Black-box attacker”: interacts with model through an API, with a limited
    number of interactions; output is a vector of scores (e.g., “bird”: 0.09,
    “horse”: 0.84, …), or decision-based: output is just the class.
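
    A black-box attacker’s constraints are easy to express as a wrapper around
    the prediction API; a small sketch (the wrapped model_probs function is an
    assumption):

    class BlackBoxModel:
        """Query-limited view of a classifier: score vector or decision only."""
        def __init__(self, model_probs, max_queries, decision_only=False):
            self.model_probs = model_probs
            self.max_queries = max_queries
            self.decision_only = decision_only
            self.queries = 0

        def query(self, x):
            if self.queries >= self.max_queries:
                raise RuntimeError("query budget exhausted")
            self.queries += 1
            probs = self.model_probs(x)
            return probs.argmax() if self.decision_only else probs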


  73. Black-Box Cost Variance
    [Plot: number of queries per adversarial example found, MNIST and CIFAR-10;
    target: least-likely class, max L2 = 3; 1000 queries per iteration, 256 max
    iterations (256,000 queries).]
    MNIST average: 117,820 queries (fails for 14 seeds); average for the lowest
    20: 50,240.
    CIFAR-10 overall average: 60,378; lowest-cost 20 average: 15,795 (26%).
    Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh. ZOO: Zeroth
    Order Optimization Based Black-box Attacks to Deep Neural Networks without
    Training Substitute Models. AISec 2017.


  74. Easy and Hard Examples
    “Easy” images: the 5 needing the fewest queries to find an adversarial
    example. “Hard” images: the 5 with the highest number of queries (failed
    after 256,000 query attempts without success).
    MNIST, 0 → (least likely) 1: the easy examples took 14,592; 43,008; 43,776;
    49,152; and 49,920 queries.


  75. Easy and Hard Examples
    Same layout, adding CIFAR-10, “airplane” → “frog”: the easy examples took
    9,728; 10,496; 10,752; 12,288; and 13,824 queries; the hard examples failed
    after 256,000 query attempts without success.


  76. White-Box Cost Variance
    [Plot: number of iterations (up to 2000) per adversarial example found,
    MNIST and CIFAR-10; Carlini-Wagner L2 attack; target: least-likely class;
    MNIST: max L2 = 3.0, CIFAR-10: max L2 = 1.0.]
    MNIST average: 566 iterations; average for the lowest 20: 174.
    CIFAR-10 average: 82 iterations.


  77. White-Box Cost Variance
    [Same plot.] CIFAR-10 lowest 20 (average: 3.6); CIFAR-10 average: 82;
    MNIST average: 566; MNIST average for the lowest 20: 174.


  78. How does
    cost-variance
    impact attack
    cost?
    77


  79. Simple Greedy Search Works Well
    [Plot: average queries per adversarial example found (× 10^4) vs. number of
    adversarial examples, MNIST and CIFAR-10, comparing random target selection,
    a greedy heuristic, and the oracle optimal ordering; ZOO black-box attack.]
    Target: 20       MNIST   CIFAR
    Greedy/Optimal   1.50    1.30
    Random/Optimal   2.37    3.86
    Target: 50       MNIST   CIFAR
    Greedy/Optimal   1.46    1.21
    Random/Optimal   1.96    2.45
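
    The greedy heuristic can be sketched as a scheduler that always spends the
    next slice of the query budget on whichever unfinished seed currently looks
    cheapest. This is only one plausible instantiation (attack_step and
    estimated_cost are hypothetical helpers), not the exact heuristic behind the
    numbers above:

    import heapq

    def greedy_batch_attack(seeds, budget, step_queries=1000):
        """Spend the query budget on whichever unfinished seed looks cheapest."""
        found = []
        frontier = [(estimated_cost(s, None), i, s, None) for i, s in enumerate(seeds)]
        heapq.heapify(frontier)
        while budget >= step_queries and frontier:
            cost, i, seed, state = heapq.heappop(frontier)
            state, success = attack_step(seed, state, step_queries)  # run a few iterations
            budget -= step_queries
            if success:
                found.append((seed, state))
            else:
                heapq.heappush(frontier, (estimated_cost(seed, state), i, seed, state))
        return found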


  80. White-Box Batch Attack Cost
    [Plot: average iterations per adversarial example found vs. number of
    adversarial examples, MNIST and CIFAR-10, comparing random target selection,
    a greedy heuristic, and the oracle optimal ordering; CW L2 attack.]
    Target: 20       MNIST   CIFAR
    Greedy/Optimal   2.01    1.22
    Random/Optimal   3.20    20.05
    Target: 50       MNIST   CIFAR
    Greedy/Optimal   1.76    1.50
    Random/Optimal   2.45    15.11


  81. Madry Defense
    [Plots: accuracy per batch (10 samples, sorted by initial distance) for
    MNIST classes “9”, “7”, “0” and CIFAR-10 classes airplane, cars, deer.]
    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras,
    Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks.
    https://github.com/MadryLab/mnist_challenge


  82. History of the
    destruction
    of Troy, 1498
    Conclusion


  83. Security State-of-the-Art
    Field                         | Attack success probability | Threat models                           | Proofs
    Cryptography                  | !"#!$                      | information theoretic, resource bounded | required
    System Security               | !"%!                       | capabilities, motivations, rationality  | common
    Adversarial Machine Learning  | !&; !"#*                   | white-box, black-box                    | making progress?


  84. 83
    Ali Rahimi
    NIPS Test-of-Time Award Speech
    (Dec 2017)
    ”If you're building photo-
    sharing systems alchemy
    is okay but we're beyond
    that; now we're building
    systems that govern
    healthcare and mediate
    our civic dialogue”


  85. 84
    Ali Rahimi
    NIPS Test-of-Time Award Speech
    (Dec 2017)
    ”If you're building photo-
    sharing systems alchemy
    is okay but we're beyond
    that; now we're building
    systems that govern
    healthcare and mediate
    our civic dialogue”


  86. Alchemy (~700 − 1660)
    Well-defined, testable goal (turn
    lead into gold)
    Established theory (four elements:
    earth, fire, water, air)
    Methodical experiments and lab
    techniques (Jabir ibn Hayyan in
    8th century)
    Wrong and ultimately unsuccessful,
    but led to modern chemistry.


  87. Domain               | Classifier Space                    | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”  | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”   | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y            | Human Perception: f*(x) = z


  88. Domain               | Classifier Space                      | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”    | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”     | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y              | Human Perception: f*(x) = z
    Academic Research    | Conferences, Fun: f(AEs) = “awesome”  | Systems, Society, Ideas: f*(AEs) = ?


  89. David Evans
    University of Virginia
    [email protected]
    EvadeML.org
    Weilin Xu Yanjun Qi Fnu Suya Yuan Tian Mainuddin Jonas
    Funding: NSF, Intel


  90. 89


  91. David Evans
    University of Virginia
    [email protected]
    EvadeML.org
    Weilin Xu Yanjun Qi Fnu Suya Yuan Tian Mainuddin Jonas
    Funding: NSF, Intel


  92. 91
    @_youhadonejob1


  93. 92
