DeepXplore: Automated Whitebox Testing of Deep Learning Systems

Liang Gong

March 14, 2018

Transcript

  1. DeepXplore: Automated Whitebox Testing of Deep Learning Systems
    Authors: Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana
    Presented by Liang Gong, Electrical Engineering & Computer Science, University of California, Berkeley.
  2. Motivation
    Background:
    • Deep learning systems are increasingly used:
      • Safety-critical: self-driving cars
      • Security-critical: malware detection
    Problem:
    • How to test DL systems to expose erroneous behaviors for corner cases?
  4. Research Goal
    • How to test traditional software to expose erroneous behaviors for corner cases?
    Diagram: Test Input → Software System → Test Output
    • concolic execution
    • random testing
    • coverage-guided fuzz testing
    • …
  5. Research Goal
    • How to test DL systems to expose erroneous behaviors for corner cases?
    Diagram: Test Input → DL system → Test Output
    • concolic execution?
    • random testing?
    • coverage-guided fuzz testing?
    • …
  6. Research Goal
    • How to test DL systems to expose erroneous behaviors for corner cases?
    Diagram: Test Input → DL system → Test Output
    Their solution:
    • coverage-guided & differential-guided testing
  7. Their Solution: coverage-guided & differential-guided fuzz testing
    Research Questions:
    • How to define coverage for a DL system?
    • What is differential-guided?
    • How to fuzz test input based on those metrics?
    • How to get a test oracle?
  8. Research Questions: How to define coverage for a DL system?
    • LOC coverage? ~100%
    • Coverage based on the # of neurons processed? 100%
    • Coverage based on the # of neurons activated? Interesting…
  9. Research Questions: How to define coverage for a DL system?
    • Coverage based on the # of activated neurons
    Neurons often correspond to self-extracted features at different levels.
    My comment: activating neurons is analogous to triggering conditionals in programs.
  11. Research Questions: How to define coverage for a DL system?
    • Coverage based on the # of activated neurons
    • Activating neurons is analogous to triggering conditionals in programs
    The definition uses the set of all neurons N, the set of all inputs, the output of neuron n given input x, and the threshold for activation t.
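One way to read the ingredients listed on this slide is sketched below in Python. It assumes a neuron counts as covered when its output exceeds the threshold t for at least one test input, and that coverage is the covered fraction of all neurons. The function name `neuron_coverage` and the sample numbers are illustrative, not taken from the slides.

```python
def neuron_coverage(outputs, threshold):
    """Fraction of neurons whose output exceeds `threshold` for some input.

    outputs[i][j] is the output of neuron j given test input i.
    """
    num_neurons = len(outputs[0])
    # A neuron is covered if at least one input activates it (assumption).
    covered = [
        any(row[j] > threshold for row in outputs)
        for j in range(num_neurons)
    ]
    return sum(covered) / num_neurons


# Made-up outputs of 4 neurons for 2 test inputs.
outputs = [
    [0.9, 0.1, 0.0, 0.3],  # neuron outputs given input x1
    [0.2, 0.1, 0.7, 0.4],  # neuron outputs given input x2
]
coverage = neuron_coverage(outputs, threshold=0.5)
```

Here neurons 0 and 2 cross the threshold, so `coverage` is 0.5. A higher threshold makes coverage harder to reach.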
  12. Research Questions
    Think of it as feedback-directed fuzz testing.
    • How to detect erroneous output?
    • How to generate input based on feedback?
  17. Research Questions
    Think of it as feedback-directed fuzz testing.
    • How to detect erroneous output? (oracle problem)
    Neural networks rarely crash…
    Key idea: differential testing → max the diff!
    Feed the same test input to several NNs and compare the test outputs. If different, one NN might be wrong.
  18. Research Questions
    Think of it as feedback-directed fuzz testing.
    Now, given different DNNs, we want to generate the next input that is:
    • Coverage-guided: maximize the neuron activation
    • Differential-guided: maximize the diff of NN outputs
    Research question: how to guide the input generation based on those metrics?
    • An optimization problem
  21. Guided Testing as Optimization Problem
    NN Test Generation:
    • Fix NN parameters
    • Adjust input
    • Maximize coverage + diff
    NN Training:
    • Fix input
    • Adjust NN parameters
    • Minimize diff between output and label
    Very similar problem! So we can reuse gradient descent and backpropagation with a few modifications.
  23. Guided Testing as Optimization Problem
    NN Test Generation:
    • Fix NN parameters
    • Adjust input
    • Maximize coverage + diff
    • Loss based on: | y1 − y2 | + output of inactivated neurons
    • Maximize the loss
    NN Training:
    • Fix input
    • Adjust NN parameters
    • Minimize diff between output and label
    • Loss based on: | output − label |
    • Minimize the loss
    Modification: change the objective (loss function) and use gradient ascent.
  25. Guided Testing as Optimization Problem
    NN Test Generation:
    • Fix NN parameters
    • Adjust input
    • Maximize coverage + diff
    • Differentiate w.r.t. input
    • Add delta to input
    NN Training:
    • Fix input
    • Adjust NN parameters
    • Minimize diff between output and label
    • Differentiate w.r.t. weights
    • Add delta to weights
    Modification: change the gradient (differentiation equation).
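The test-generation column can be sketched as a small gradient-ascent loop in Python. Everything concrete here is an assumption made for illustration: the two "networks" are single sigmoid units with made-up fixed weights, the coverage term is the output of one made-up inactive neuron, and the gradient is estimated by finite differences, where a real system would use backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical frozen parameters: they are never updated during generation.
W1, W2 = [0.8, -0.4], [0.5, 0.3]   # two toy "networks"
W_NEURON = [-0.2, 0.6]             # one currently inactive neuron


def objective(x):
    diff = abs(sigmoid(dot(W1, x)) - sigmoid(dot(W2, x)))  # | y1 - y2 |
    cov = sigmoid(dot(W_NEURON, x))    # output of the inactive neuron
    return diff + cov


def numeric_gradient(f, x, eps=1e-6):
    """Central finite differences; stands in for backpropagation."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def gradient_ascent(x, steps=20, step_size=0.1):
    for _ in range(steps):
        grad = numeric_gradient(objective, x)   # differentiate w.r.t. input
        x = [xi + step_size * gi for xi, gi in zip(x, grad)]  # add delta
    return x


seed = [0.5, 0.5]
generated = gradient_ascent(seed)
```

The only differences from a training loop are the ones the slide names: the weights are held fixed, the input is what changes, and the step follows the gradient upward.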
  27. Walk through the Algorithm
    Input x
    First NN output: C1 → 0.1, C2 → 0.1, C → 0.8
    Second NN output: C1 → 0.15, C2 → 0.1, C → 0.75
    Objective function:
    • Maximize the diff
    • Maximize the cov: sum of outputs of inactive neurons
  30. Walk through the Algorithm
    Input x
    First NN output: C1 → 0.1, C2 → 0.1, C → 0.8
    Second NN output: C1 → 0.15, C2 → 0.1, C → 0.75
    Objective function: let's diff this NN from the others.
    grad = gradient of the objective function with respect to input x
  31. Walk through the Algorithm
    Input x
    First NN output: C1 → 0.3, C2 → 0.6, C → 0.1
    Second NN output: C1 → 0.05, C2 → 0.05, C → 0.9
    Objective function: let's diff this NN from the others.
    grad = gradient of the objective function with respect to input x
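The numbers in this walk-through can be checked against a simple differential oracle, sketched below in Python. It assumes each triple holds the class probabilities (C1, C2, C) reported by one of two networks, and that an input is difference-inducing when the networks predict different classes. The helper names are illustrative.

```python
def predicted_class(probabilities):
    """Index of the class with the highest probability."""
    return max(range(len(probabilities)), key=lambda k: probabilities[k])


def is_difference_inducing(outputs_per_model):
    """True if at least two models disagree on the predicted class."""
    labels = {predicted_class(p) for p in outputs_per_model}
    return len(labels) > 1


# Class probabilities in the order (C1, C2, C), copied from the slides.
before = [[0.1, 0.1, 0.8], [0.15, 0.1, 0.75]]   # earlier slides
after = [[0.3, 0.6, 0.1], [0.05, 0.05, 0.9]]    # last slide

found_before = is_difference_inducing(before)
found_after = is_difference_inducing(after)
```

With the earlier outputs both networks pick C, so no disagreement is reported. With the later outputs the first network picks C2 while the second still picks C, which is the kind of input the search is looking for.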
  32. Dataset: Contagio/VirusTotal
    Dataset of benign and malicious PDF documents
    • 5,000 benign PDF files
    • 12,205 malicious PDF files
    Extract 135 static features as DNN input
    DNN Model Variations:
    • 1 input layer
    • 2-4 fully connected layers
    • 1 softmax output layer (benign or malicious)
  33. Dataset: Drebin
    Dataset of benign and malicious Android apps
    • 123,453 benign apps
    • 5,560 malicious apps
    Extract 545,333 binary features as DNN input
    DNN Model Variations:
    • …
  34. Domain-specific Constraints
    Goal: generate more realistic input
  35. Domain-specific Constraints
    Input x
    grad = constraint(grad)
    Goal: generate more realistic input
  36. #1 Simulate Different Lighting Conditions
    grad = constraint(grad)
  37. #1 Simulate Different Lighting Conditions
    grad = mean(grad)
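A minimal Python sketch of the lighting constraint follows. It assumes the gradient is a 2-D grid with one value per pixel and that `mean` replaces every entry with the average gradient, so the whole image gets brighter or darker by the same amount. The function name `lighting_constraint` is illustrative.

```python
def lighting_constraint(grad):
    """Replace every gradient entry with the mean over all pixels."""
    values = [g for row in grad for g in row]
    mean = sum(values) / len(values)
    # Same shape as the input, with a uniform value everywhere.
    return [[mean for _ in row] for row in grad]


# Made-up 2 x 2 gradient.
grad = [[0.5, -0.25], [0.75, 1.0]]
constrained = lighting_constraint(grad)
```

Because every pixel receives the same delta, the update cannot change the content of the image, only its overall brightness.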
  39. #2 Simulate Attacks by Masking
    Reference: Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, Dawn Song: Robust Physical-World Attacks on Machine Learning Models. CoRR abs/1707.08945 (2017)
  40. #2 Simulate Attacks by Masking Cameras
    Pick an m × n rectangle at a random position and patch it.
  41. #2 Simulate Attacks by Masking Cameras
    grad = constraint(grad)
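The masking constraint might look like the Python sketch below. It assumes "patch" means that only the pixels inside the randomly placed m × n rectangle are allowed to change, so the gradient is zeroed everywhere else. The function name and the `rng` parameter are illustrative.

```python
import random


def occlusion_constraint(grad, m, n, rng):
    """Keep the gradient inside one random m x n rectangle, zero elsewhere."""
    rows, cols = len(grad), len(grad[0])
    top = rng.randrange(rows - m + 1)    # random top-left corner that
    left = rng.randrange(cols - n + 1)   # keeps the rectangle in bounds
    return [
        [
            grad[r][c] if top <= r < top + m and left <= c < left + n else 0.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Made-up 4 x 5 gradient of ones, so the surviving patch is easy to see.
grad = [[1.0] * 5 for _ in range(4)]
masked = occlusion_constraint(grad, m=2, n=3, rng=random.Random(0))
```

Passing in a seeded `random.Random` keeps the rectangle position reproducible between runs.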
  42. #3 Simulate Dirt on Cameras
    Pick an m × m rectangle at a random position and patch it if the mean of the gradient > 0.
  43. #3 Simulate Dirt on Cameras
    grad = constraint(grad)
  44. #4 Simulate Relaxing Permissions
    grad = constraint(grad)
    For the Android/PDF malware datasets:
    • Turn binary features from 0 to 1
    • Add features (add permissions in the manifest files)
    • Deleting features (1 to 0) is not allowed
    • Ensure no functionality changes due to insufficient permissions
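The add-only rule can be sketched in Python as follows. It assumes one feature is changed per step, and that the feature chosen is the absent one (value 0) with the largest positive gradient. That selection rule and the name `add_only_constraint` are assumptions made for this example.

```python
def add_only_constraint(features, grad):
    """Turn on at most one absent feature; never turn a feature off."""
    # Only features that are 0 and have a positive gradient may change.
    candidates = [
        i for i, f in enumerate(features) if f == 0 and grad[i] > 0
    ]
    if not candidates:
        return list(features)
    best = max(candidates, key=lambda i: grad[i])
    updated = list(features)
    updated[best] = 1   # 0 -> 1 only, for example adding a permission
    return updated


# Made-up binary feature vector and gradient.
features = [1, 0, 0, 1, 0]
grad = [0.9, 0.2, -0.5, -0.3, 0.7]
updated = add_only_constraint(features, grad)
```

Features 1 and 4 are the candidates here, and feature 4 wins with gradient 0.7. Features that are already 1 are never touched, which reflects the rule that deleting features is not allowed.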
  45. #4 Simulate Relaxing Permissions
    Malware incorrectly classified as benign after adding permissions in the input.
  46. Effects of Neuron Coverage (NC)
    [Table comparing "Optimize without NC" and "Optimize with NC" on: L1 distance between generated inputs, neuron coverage, and # of difference-inducing inputs. Higher numbers are better.]
  47. Time to Get the 1st Counter Example
    [Figure; label: # of seed inputs. Lower numbers are better.]
  48. Counter Examples for Improving DNN
    [Figure. Higher is better.]
  49. More Diff in Models → Harder to Find Fault
    Control group: MNIST training set (60,000 samples) and LeNet-1 trained with 10 epochs.