model which imitates the target model -> needs real training data (very difficult to obtain in real problems!)
• Score-based, Decision-based (e.g. ZOO[2])
  ✓ Need many queries at test time
  ✓ No need for a substitute model
[1] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations (ICLR), 2015.
[2] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
• DaST (the proposed method): not itself an attack method
• It trains a substitute model without real training data -> useful whenever a substitute model is needed, as in gradient-based attack methods
Give a Solution
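As a concrete illustration of why gradient-based attacks need a white-box substitute, here is a minimal FGSM-style [1] perturbation sketch. The logistic-regression "substitute" and all numeric values are toy assumptions for illustration, not models or numbers from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y, eps):
    """FGSM on a logistic-regression 'substitute': x_adv = x + eps * sign(grad_x loss).

    Uses the binary cross-entropy loss L = -[y log p + (1-y) log(1-p)],
    whose input gradient for this model is (p - y) * w.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w          # dL/dx for logistic regression
    return x + eps * np.sign(grad_x)

# Hypothetical substitute weights and input (not from the paper)
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.2, 0.1, -0.3])
x_adv = fgsm_perturb(x, w, b, y=1.0, eps=0.1)
```

The attack only needs the substitute's gradient, which is exactly what a black-box target withholds; this is why DaST's data-free substitute is useful.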
・Gradient-based Attack with DaST
[Diagram: the generator G feeds generated samples to both the target model T and the local substitute model D; D and G are each trained by backpropagation on a loss function comparing D's and T's outputs]
It is difficult for an attacker to collect the same training dataset the target model used. With DaST there is no need to collect a training dataset, because DaST generates one.
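The data-free idea above can be sketched in miniature: train a substitute D using only synthetic inputs and the black box's hard-label answers to queries (no real training data). The linear teacher/student, learning rate, and iteration count below are all illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Black-box "attacked model" T: only its hard labels may be queried.
W_t = rng.normal(size=(3, 4))          # hidden true weights, never exposed
def query_T(x):
    return int(np.argmax(W_t @ x))     # hard label only

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Substitute D: a linear softmax classifier trained by backprop on T's answers.
W_d = np.zeros((3, 4))
lr = 0.2
for _ in range(2000):
    x = rng.normal(size=4)             # synthetic input (stand-in for G's samples)
    n = query_T(x)                     # label obtained by querying the black box
    p = softmax(W_d @ x)
    grad = np.outer(p - np.eye(3)[n], x)   # gradient of CE(D(x), n) w.r.t. W_d
    W_d -= lr * grad

# After training, D should agree with T on most fresh synthetic inputs.
test_x = rng.normal(size=(200, 4))
agreement = np.mean([query_T(xi) == int(np.argmax(W_d @ xi)) for xi in test_x])
```

Here random noise stands in for the generator G; DaST instead trains G adversarially so the synthetic samples probe D and T where they disagree.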
• Objective of D
  • Imitate the attacked model (T): L_D = d(T(X̂), D(X̂)), where X̂ = G(z) are generated samples (d: distance measure; CE: cross entropy)
• Objective of G
  • Generate new samples with the given label n: L_c = CE(D(X̂_n), n), to increase the diversity of generated samples
  • Generate new samples that maximize the distance between D and T: L_G = e^(−L_D) (e^(−L_D) is more stable during training than −L_D)
• Notes on the generator objective
  • The attacker can't access T's gradients, so the distance between D and T is optimized through D
  • As training progresses, D ≒ T, so D becomes a good proxy for T
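A numeric sketch of the two losses under the probability-only setting, taking cross entropy as the distance d; the probability vectors below are made-up values for illustration:

```python
import numpy as np

def cross_entropy(p_target, p_model, eps=1e-12):
    """CE(p_target, p_model) = -sum p_target * log p_model."""
    return float(-np.sum(p_target * np.log(p_model + eps)))

# Hypothetical outputs of T and D on one generated sample.
t_out = np.array([0.05, 0.85, 0.10])   # attacked model T's probabilities
d_out = np.array([0.20, 0.60, 0.20])   # substitute D's probabilities

L_D = cross_entropy(t_out, d_out)      # D minimizes this (imitate T)
L_G = np.exp(-L_D)                     # G minimizes exp(-L_D): small when D and T disagree
```

Minimizing exp(−L_D) pushes L_D up (more disagreement, hence more informative samples for training D), while staying bounded in (0, 1]; the naive alternative −L_D is unbounded below, which the slide notes is less stable.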
hard label of the attacked model (e.g. [0, 0, …, 0, 1, 0, …, 0])
• Probability-only
  • Attackers can probe the output probability of the attacked model (e.g. [0.03, 0.1, …, 0.05, 0.7, 0.01, …, 0.04])
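The two kinds of black-box feedback can be illustrated directly; the logits below are arbitrary example values:

```python
import numpy as np

def probability_feedback(logits):
    """Probability-only setting: the full softmax output is visible to the attacker."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def label_feedback(logits):
    """Label-only setting: only a one-hot hard label is visible to the attacker."""
    onehot = np.zeros_like(logits)
    onehot[np.argmax(logits)] = 1.0
    return onehot

logits = np.array([0.2, 2.0, -1.0, 0.5])   # hypothetical model outputs
probs = probability_feedback(logits)       # e.g. soft vector summing to 1
hard = label_feedback(logits)              # e.g. one-hot vector
```

The label-only setting gives DaST strictly less information per query, which is why the slides distinguish the DaST-L and DaST-P variants.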
DaST-L > Pretrained
• Attacked model: 4-conv net
• Substitute model: 5-conv net
Surprisingly, the attack success rate of DaST is higher than that of the pretrained substitute.
Pretrained
• Attacked model: unknown
• Substitute model: 5-conv net
The low attack success rate of 'pretrained' implies that the unknown model is very different from the substitute model.