Compiler Fuzzing through Deep Learning (ISSTA'18)

Paper: https://chriscummins.cc/issta18

Random program generation - fuzzing - is an effective technique for discovering bugs in compilers, but successful fuzzers require extensive development effort for every language supported by the compiler, and often leave parts of the language space untested.

We introduce DeepSmith, a novel machine learning approach to accelerating compiler validation through the inference of generative models for compiler inputs. Our approach infers a learned model of the structure of real-world code based on a large corpus of open source code. Then, it uses the model to automatically generate tens of thousands of realistic programs. Finally, we apply established differential testing methodologies on them to expose bugs in compilers. We apply our approach to the OpenCL programming language, automatically exposing bugs with little effort on our side. In 1,000 hours of automated testing of commercial and open source compilers, we discover bugs in all of them, submitting 67 bug reports. Our test cases are on average two orders of magnitude smaller than the state-of-the-art, require 3.03x less time to generate and evaluate, and expose bugs which the state-of-the-art cannot. Our random program generator, comprising only 500 lines of code, took 12 hours to train for OpenCL versus the state-of-the-art taking 9 man-months to port from a generator for C and 50,000 lines of code. With 18 lines of code we extended our program generator to a second language, uncovering crashes in Solidity compilers in 12 hours of automated testing.

Chris Cummins

July 16, 2018
Transcript

  1. Compiler Fuzzing through Deep Learning https://chriscummins.cc/issta18

  2. Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Alastair Murray (Codeplay Software), Hugh Leather (University of Edinburgh)
  3-5. fuzzing a compiler, circa [McKeeman98]
    Generated programs (int main(int argc, char** argv) { ... }) are compiled with clang6.0.
    Outcomes shown: a.out, error: use of undeclared …, Timeout, Bug??, Bug!, Crash
  6-8. differential testing compilers, circa [McKeeman98]
    The same generated programs (int main(int argc, char** argv) { ... }) are compiled with gcc5.5, clang6.0 and clang3.6, each producing an a.out.
    Running the three binaries: $ ./a.out 42, $ ./a.out 42, $ ./a.out -14522312
    Majority rules
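
The "majority rules" step can be sketched in a few lines of Python. This is only an illustration: the function name `majority_vote` is hypothetical, and the assignment of the three outputs to particular compilers is made up, since the slide does not say which compiler printed which value.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (majority_output, dissenting_testbeds).

    `outputs` maps a testbed name to the output it produced.
    Returns (None, []) when there is no strict majority to compare against.
    """
    if not outputs:
        return None, []
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes * 2 <= len(outputs):
        return None, []
    dissenters = sorted(name for name, out in outputs.items() if out != value)
    return value, dissenters


# Illustrative data only: which compiler disagrees is an assumption.
results = {"gcc5.5": "42", "clang6.0": "42", "clang3.6": "-14522312"}
majority, suspects = majority_vote(results)
print(majority, suspects)
```

A testbed that disagrees with a strict majority is flagged as a suspect; when no strict majority exists, the sketch declines to judge rather than guess.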
  9-11. differential testing compilers, circa [McKeeman98]
    The same generated programs are compiled with gcc5.5, clang6.0 and clang3.6.
    Also works for build failures: error: use of undeclared …, error: use of undeclared …, a.out
    hard to generate!
  12. an ideal fuzzer
    1. Cheap: Easy to implement and extend (Languages and features grow quickly)
    2. Interpretable Testcases: Necessary for triage (i.e. 45 lines or less [Sun2016])
    3. Plausible Output: Representative of handwritten code (So that bugs get fixed)
  13. state-of-the-art: CLSmith https://github.com/ChrisLidbury/CLSmith
    Random grammar enumeration. Extensive static analyses support subset of OpenCL features. Targets compiler middle ends. Incredibly effective! 100s of bugs to date.
      #include "CLSmith.h"
      struct S0 { int32_t g_4[4][10]; ... };
      kernel void A(global ulong *r) {
        int i, j, k;
        struct S0 c_1856;
        struct S0* p_1855 = &c_1856;
        c_1856 = c_1857;
        func_1(p_1855);
        barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
        for (i = 0; i < 4; i++)
          for (j = 0; j < 10; j++)
            …>g_4[i][j], "p_1855->g_4[i][j]", print_hash_value);
        result[get_linear_global_id()] = crc64_context ^ 0xFFFFFFFFFFFFFFFFUL;
      }
  14. state-of-the-art: CLSmith https://github.com/ChrisLidbury/CLSmith
    1. Cheap ✖ nope! Years to develop! 50k lines of C++. Each PL feature engineered by hand.
    2. Interpretable Testcases ✖ nope! Avg. 1200 lines (excluding headers). Requires reduction: ~4 hours / test.
    3. Plausible Output ✖ nope! Unusual and restricted combinations of PL features. 87 dials control “shape” of output - hand tuned.
  15. contributions: Automatic inference of fuzzers from examples. 102x less code than state-of-art. Similar bug finding power, simpler test cases.
  16-18. our approach: Training Corpus, Vocabulary Encoding, Neural Network, Synthesizer, Harness, Generated Test Cases, Voting Heuristics, Testing Campaign
    Training Corpus: Mined from 1k repos, 10k files, 2.0M LOC. Filtered by oracle compiler.
  19. vocabulary encoding: 1. Preprocess. 2. Enforce code style. 3. Rename variables. 4. Rename functions.
    Before:
      #define MY_CONST 3.14
      // A very simple kernel.
      kernel void Foo(global float* input, const float x) { input[get_global_id(0)] *= MY_CONST + x; }
    After:
      kernel void A(global float* a, const float b) { a[get_global_id(0)] *= 3.14 + b; }
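
The renaming steps can be approximated with regular expressions, as in the rough sketch below. It is not how the slides say the rewriting is done: macro expansion (step 1) is assumed to have been handled already by a real C preprocessor, and the `RESERVED` set, `strip_comments`, and `rename_identifiers` are hypothetical names introduced here. A production rewriter would work on a parsed syntax tree rather than on raw text.

```python
import re
from string import ascii_lowercase, ascii_uppercase

# Hypothetical and deliberately incomplete list of names that are never renamed.
RESERVED = {"kernel", "void", "global", "local", "const", "float", "int",
            "get_global_id"}


def strip_comments(src):
    """Drop /* block */ and // line comments (ignores string literals)."""
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", src)


def rename_identifiers(src):
    """Rename functions to A, B, ... and variables to a, b, ... in order of
    first appearance. Supports at most 26 names of each kind."""
    functions, variables = {}, {}

    def replace(match):
        name = match.group(0)
        if name in RESERVED:
            return name
        # Heuristic: an identifier directly followed by "(" is a function.
        is_function = match.string[match.end():].lstrip().startswith("(")
        table, alphabet = ((functions, ascii_uppercase) if is_function
                           else (variables, ascii_lowercase))
        if name not in table:
            table[name] = alphabet[len(table)]
        return table[name]

    return re.sub(r"\b[A-Za-z_]\w*\b", replace, src)


source = (
    "// A very simple kernel.\n"
    "kernel void Foo(global float* input, const float x) {\n"
    "  input[get_global_id(0)] *= 3.14 + x;\n"
    "}"
)
normalized = rename_identifiers(strip_comments(source)).strip()
print(normalized)
```

Normalizing names this way shrinks the vocabulary the model has to learn, since every kernel uses the same small set of identifiers.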
  20-33. vocabulary encoding (built up one token at a time)
    kernel void A(global float* a, const float b) { a[get_global_id(0)] *= 3.14 + b; }
    Vocab:
      Token    Index    Token          Index    Token  Index
      kernel   0        ,              9        ]      18
      [space]  1        const          10       =      19
      void     2        b              11       3      20
      A        3        )              12       .      21
      (        4        {              13       1      22
      global   5        \n             14       4      23
      float    6        [              15       +      24
      *        7        get_global_id  16       ;      25
      a        8        0              17
    Encoded: 0 1 2 1 3 4 5 1 6 7 1 8 ...
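
The encoding shown on these slides mixes whole-word tokens (kernel, void, get_global_id) with single characters (3, ., 1, 4). A minimal sketch of such an encoder follows, assuming a greedy longest-match against a fixed list of multi-character tokens. The list `MULTICHAR_TOKENS` and the functions `encode` and `decode` are hypothetical and cover only this one kernel.

```python
# Hypothetical multi-character tokens, ordered longest first so that
# "get_global_id" is tried before "global".
MULTICHAR_TOKENS = ["get_global_id", "kernel", "global", "const", "float", "void"]


def encode(source, vocab=None):
    """Greedily tokenize `source`, assigning indices in order of first use.

    Note: this naive matcher would also split an identifier such as
    "floating" into "float" + "i" + "n" + "g".
    """
    vocab = {} if vocab is None else vocab
    encoded = []
    i = 0
    while i < len(source):
        token = next((t for t in MULTICHAR_TOKENS if source.startswith(t, i)),
                     source[i])
        encoded.append(vocab.setdefault(token, len(vocab)))
        i += len(token)
    return vocab, encoded


def decode(vocab, encoded):
    """Invert `encode` by looking indices up in the vocabulary."""
    by_index = {index: token for token, index in vocab.items()}
    return "".join(by_index[index] for index in encoded)


kernel_src = ("kernel void A(global float* a, const float b) {\n"
              "a[get_global_id(0)] *= 3.14 + b;\n"
              "}")
vocab, encoded = encode(kernel_src)
print(encoded[:12])
```

On this input the first twelve indices match the slide's "Encoded" row, and decoding the full sequence reproduces the source text exactly.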
  34-35. our approach: Training Corpus, Vocabulary Encoding, Neural Network, Synthesizer, Harness, Generated Test Cases, Voting Heuristics, Testing Campaign
  36. neural network: 512x2 LSTM (181 tokens, 181 tokens; 0 1 2 ...). Input: 30M token corpus. Learns probability distribution over corpus. < 500 lines of code, 12 hours training on GPU.
  37-38. our approach: Training Corpus, Vocabulary Encoding, Neural Network, Synthesizer, Harness, Generated Test Cases, Voting Heuristics, Testing Campaign
  39-56. synthesizer + harness (built up one token at a time)
    1. Seed the model with the start of a program.
    2. Predict tokens until { } brackets balance.
    Input: 0 1 2 1 3 4 5 1 …
    Output: a distribution over tokens (kernel, ' ', void, A, (, global, int, double, float, …)
    Decoded: kernel void A(global int* a) { /* snip */ }
  57-58. synthesizer + harness
    1. Seed the model with the start of a program.
    2. Predict tokens until { } brackets balance.
    3. Can we parse signature? Yes: Generate input data, compile and run it. No: Compile it but don’t run it.
    Decoded: kernel void A(global int* a) { /* snip */ }
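
Steps 1 and 2 amount to a sampling loop that stops once the braces balance. The sketch below assumes a callable that returns a next-token probability distribution. `toy_model` is a hypothetical stand-in that replays one fixed kernel and is not the trained LSTM; the seed `kernel void A(` and the `max_tokens` cap are illustrative choices.

```python
import random


def synthesize(next_token_distribution, seed_tokens, max_tokens=5000, rng=None):
    """Sample tokens until the outermost { } block closes.

    `next_token_distribution(tokens)` must return a dict mapping each
    candidate token to its probability. Returns the program text, or None
    if the braces never balance within `max_tokens`.
    """
    rng = rng or random.Random()
    tokens = list(seed_tokens)
    depth = tokens.count("{") - tokens.count("}")
    opened = "{" in tokens
    while len(tokens) < max_tokens:
        candidates, weights = zip(*next_token_distribution(tokens).items())
        token = rng.choices(candidates, weights=weights, k=1)[0]
        tokens.append(token)
        if token == "{":
            depth += 1
            opened = True
        elif token == "}":
            depth -= 1
            if depth < 0:
                return None  # closing brace with nothing open: malformed
            if opened and depth == 0:
                return "".join(tokens)
    return None


def toy_model(tokens):
    """Hypothetical stand-in for a trained model: always predicts the next
    token of one fixed kernel with probability 1."""
    script = ["kernel", " ", "void", " ", "A", "(", "global", " ", "int",
              "*", " ", "a", ")", " ", "{", "}"]
    return {script[len(tokens)]: 1.0}


seed = ["kernel", " ", "void", " ", "A", "("]
program = synthesize(toy_model, seed)
print(program)
```

A real model would return many weighted candidates at each step, so repeated calls would yield different programs; the cap on length guards against samples that never close their braces.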
  59-61. our approach: Training Corpus, Vocabulary Encoding, Neural Network, Synthesizer, Harness, Generated Test Cases, Voting Heuristics, Testing Campaign
    Voting Heuristics: Standard majority voting. False-positive filtering of runtime behavior: Combination of off-the-shelf tools and ad-hoc filters. Took ~1 dev-day to develop.
  62. how well does it work?

  63. testing campaign: 48 hours per testbed. 10 OpenCL compilers: 3 GPUs, 5 CPUs, Xeon Phi, Emulator. Test with optimizations on / off. Treat as separate testbeds.
  64. results overview: Errors in every compiler!
    [Bar chart: Num results (log scale, 1 to 10,000) by category: Compiler crash, Build Timeout, Build Failure, Program Crash, Wrong Output. Values shown: 69, 252, 51, 860, 7,040]
  65. 67 bug reports to date… … crashes during parsing / compilation
      void A() {void* a; uint4 b=0; b=(b>b)?a:a }
    Affects: Intel OpenCL SDK 1.2.0.25
      kernel void A(global int* a) { int b = get_global_id(0); a[b] = (6 * 32) + 4 * (32 / 32) + a; }
    Affects: Beignet 1.3
    “Bad code” finds bugs in error handling
  66. 67 bug reports to date… … crashes during type checking
      kernel void A() { __builtin_astype(d, uint4); }
    Affects: 6 / 10 compilers we tested
    Unexpected outcome: Learning from handwritten code leads to bugs found in compiler builtins!
  67. 67 bug reports to date… … errors in optimizers
      kernel void A(global double* a, global double* b, global double* c, int d, int e) {
        double f;
        int g = get_global_id(0);
        if (g < e - d - 1) c[g] = (((e) / d) % 5) % (e + d);
      }
    Affects: Intel OpenCL SDK 1.2.0.25
    CLSmith doesn’t allow thread-dependent control flow.
  68. try it for yourself! runs in docker: $ docker run chriscummins/opencl_fuzz
    code and paper on GitHub: https://chriscummins.cc/issta18
  69. read the paper for … comparison to CLSmith, LLVM robustness experiments, extending to a second language, + more!
    code and paper on GitHub: https://chriscummins.cc/issta18
  70. Compiler Fuzzing through Deep Learning https://chriscummins.cc/issta18
    Problem: Compiler inputs are hard to generate. Use DL to infer PL generator from examples. 102x less code than state-of-art, 3.03x faster. Lots of bugs!