Synthesizing Benchmarks for Predictive Modelling (CGO'17)

Chris Cummins
February 06, 2017

Paper: https://github.com/ChrisCummins/paper-synthesizing-benchmarks

Predictive modelling using machine learning is an effective method for building compiler heuristics, but there is a shortage of benchmarks. Typical machine learning experiments outside of the compilation field train over thousands or millions of examples. In machine learning for compilers, however, there are typically only a few dozen common benchmarks available. This limits the quality of learned models, as they have very sparse training data for what are often high-dimensional feature spaces. What is needed is a way to generate an unbounded number of training programs that finely cover the feature space. At the same time, the generated programs must be similar to the types of programs that human developers actually write; otherwise the learning will target the wrong parts of the feature space.

We mine open source repositories for program fragments and apply deep learning techniques to automatically construct models for how humans write programs. We sample these models to generate an unbounded number of runnable training programs. The quality of the programs is such that even human developers struggle to distinguish our generated programs from hand-written code.

We use our generator for OpenCL programs, CLgen, to automatically synthesize thousands of programs and show that learning over these improves the performance of a state-of-the-art predictive model by 1.27×. In addition, the fine covering of the feature space automatically exposes weaknesses in the feature design which are invisible with the sparse training examples from existing benchmark suites. Correcting these weaknesses further increases performance by 4.30×.

Transcript

  1. 2.

    Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh)
  2. 3.

    machine learning for compilers: y = f(x)
    Features (x): # instructions, arithmetic density, dataset size
    Optimisations (y): cflags, workgroup size, CPU or GPU
  3. 4.

    machine learning for compilers: the idea.
    [Figure labels: Use a GPU, Use a CPU, Target decision boundary.]
  4. 5.

    machine learning for compilers: the reality.
    [Figure labels: Use a GPU, Use a CPU, Learned decision boundary, Target decision boundary.]
  5. 6.

    machine learning for compilers: sparse data leads to inaccurate models!
    [Figure labels: Use a GPU, Use a CPU, Target decision boundary, Learned decision boundary.]
  6. 7.

    problem statement
    1. there aren't enough benchmarks

    Dataset               Size          Relative size
    avg compiler paper    17            (baseline)
    Iris dataset          150           10×
    MNIST dataset         60,000        10³×
    ImageNet dataset      10,000,000    10⁶×
  7. 13.

    contributions
    1. a deep learning approach to modeling PL semantics & usage
    2. first solution for general purpose benchmark synthesis
    3. automatic 1.27× speedup
    4. improved model designs for further 4.30× speedup
  8. 14.

    [Figure: CLgen system overview. Labels: CLgen, Host Driver, Language Corpus, GitHub Software Repositories, Content Files, Rejection Filter, Search engine, Code Rewriter, Model parameters, Rejection Filter, LSTM network, Synthesizer, Synthesis parameters, Argument Extractor, Benchmark parameters, Synthesized Benchmarks, Benchmark Driver, Synthesized Payloads, Performance Results, Dynamic Checker.]
  9. 15.

    [Figure: CLgen system overview, repeated.]
    8078 files, 2.8M lines
  11. 17.

    /* Copyright (C) 2014, Joe Blogs. */
    #define CLAMPING
    #define THRESHOLD_MAX 1.0f

    float myclamp(float in) {
    #ifdef CLAMPING
      return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < 0.0f ? 0.0f : in;
    #else
      return in;
    #endif // CLAMPING
    }

    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
      // Do something really flipping cool
      int id = get_global_id(0);
      if (id < num_elems) {
        out[id] = myclamp(in[id]);
      }
    }
  12. 18.

    Is this real, valid OpenCL? Can we minimise non-functional variance?
    (Code unchanged from the previous slide.)
  13. 19.

    Strip comments.
    (Code unchanged from the previous slide.)
  14. 20.

    #define CLAMPING
    #define THRESHOLD_MAX 1.0f

    float myclamp(float in) {
    #ifdef CLAMPING
      return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < 0.0f ? 0.0f : in;
    #else
      return in;
    #endif
    }

    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
      int id = get_global_id(0);
      if (id < num_elems) {
        out[id] = myclamp(in[id]);
      }
    }

    Strip comments. Preprocess.
  15. 21.

    float myclamp(float in) {
      return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
    }

    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
      int id = get_global_id(0);
      if (id < num_elems) {
        out[id] = myclamp(in[id]);
      }
    }

    Strip comments. Preprocess. Does it compile? Does it contain instructions?
  16. 22.

    Strip comments. Preprocess. Does it compile? Does it contain instructions? (33% discard rate)
    (Code unchanged from the previous slide.)
  17. 23.

    Strip comments. Preprocess. Does it compile? Does it contain instructions? (33% discard rate) Rewrite function names.
    (Code unchanged from the previous slide.)
  18. 24.

    float A(float in) {
      return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
    }

    __kernel void B(__global float* in, __global float* out, int num_elems) {
      int id = get_global_id(0);
      if (id < num_elems) {
        out[id] = A(in[id]);
      }
    }

    Strip comments. Preprocess. Does it compile? Does it contain instructions? (33% discard rate) Rewrite function names. Rewrite variable names.
  19. 25.

    float A(float a) {
      return a > 1.0f ? 1.0f : a < 0.0f ? 0.0f : a;
    }

    __kernel void B(__global float* b, __global float* c, int d) {
      int e = get_global_id(0);
      if (e < d) {
        c[e] = A(b[e]);
      }
    }

    Strip comments. Preprocess. Does it compile? Does it contain instructions? (33% discard rate) Rewrite function names. Rewrite variable names. Enforce code style.
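The comment-stripping and renaming steps on these slides can be illustrated with a small Python sketch. This is a toy, regex-based approximation and not how CLgen itself rewrites code: it ignores string literals and scoping, and the function names and the explicit `mapping` argument are hypothetical. A real rewriter would need a proper parser to decide which identifiers are user-defined.

```python
import re

# Matches /* block */ and // line comments. Assumption: no comment-like
# text appears inside string literals.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

# Matches whole identifiers only, so the "f" suffix of 1.0f is left alone.
IDENT_RE = re.compile(r"\b[A-Za-z_]\w*\b")


def strip_comments(source):
    """Remove C-style comments from the source text."""
    return COMMENT_RE.sub("", source)


def rename_identifiers(source, mapping):
    """Rename identifiers in a single pass, so new names never collide
    with old names that are still waiting to be replaced."""
    return IDENT_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), source)


SOURCE = """\
// Do something really flipping cool
__kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
  int id = get_global_id(0);  /* thread index */
  if (id < num_elems) { out[id] = in[id]; }
}
"""

# Hypothetical mapping, chosen by hand for this example.
MAPPING = {
    "findAllNodesMergedAabb": "A",
    "in": "a",
    "out": "b",
    "num_elems": "c",
    "id": "d",
}

rewritten = rename_identifiers(strip_comments(SOURCE), MAPPING)
print(rewritten)
```

Built-in names such as `get_global_id` and qualifiers such as `__global` are untouched because they are absent from the mapping, which mirrors the slides' goal of reducing non-functional variance without changing behaviour.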
  23. 29.

    character level language model
    y = f(x): x is a char array, y is the next char.
    Model: 2048 node, 3 layer LSTM, 96 char.
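To make the "char array in, next char out" framing concrete, here is a minimal Python sketch of how training pairs for a character-level model could be built from a corpus string. The helper names and the `seq_len` parameter are hypothetical and are not taken from CLgen; the sketch only prepares data and does not train an LSTM.

```python
def build_vocab(text):
    """Map each distinct character to an integer index."""
    return {c: i for i, c in enumerate(sorted(set(text)))}


def training_pairs(text, vocab, seq_len):
    """Return (context, next_char) pairs: each context is seq_len encoded
    characters and the target is the character that follows them."""
    encoded = [vocab[c] for c in text]
    return [
        (encoded[i:i + seq_len], encoded[i + seq_len])
        for i in range(len(encoded) - seq_len)
    ]


corpus = "__kernel void A(__global float* a) { a[0] = 0; }"
vocab = build_vocab(corpus)
pairs = training_pairs(corpus, vocab, seq_len=8)
print(len(vocab), "distinct characters,", len(pairs), "training pairs")
```

Each pair corresponds to one evaluation of y = f(x): the model sees the context and is trained to predict the character that follows.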
  26. 32.

    kernel synthesis

    string S = '__kernel void A(__global float* a) {'
    int depth = 1
    while depth > 0:
      char c = predict_next_character(S)
      if c == '{': depth += 1
      if c == '}': depth -= 1
      S += c
    return S
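The pseudocode on this slide translates almost directly into runnable Python. The sketch below assumes the model is exposed as a function from the text so far to the next character. The `make_replay_model` stand-in and the `max_chars` cut-off are hypothetical additions so that the example runs without a trained network and cannot loop forever.

```python
def synthesize_kernel(predict_next_character, seed, max_chars=10000):
    """Sample characters until the brace opened by the seed is closed.

    Returns the kernel text, or None if max_chars are sampled without
    the braces balancing."""
    text = seed
    depth = seed.count("{") - seed.count("}")
    while depth > 0:
        if len(text) - len(seed) >= max_chars:
            return None  # give up: the model never closed the kernel
        c = predict_next_character(text)
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        text += c
    return text


def make_replay_model(target):
    """Hypothetical stand-in for a trained model: replays a fixed string."""
    def predict_next_character(prefix):
        return target[len(prefix)]
    return predict_next_character


SEED = "__kernel void A(__global float* a) {"
BODY = "\n  if (a[0] < 0) { a[0] = 0; }\n}"
TARGET = SEED + BODY + "\n// never sampled"

kernel = synthesize_kernel(make_replay_model(TARGET), SEED)
print(kernel)
```

Sampling stops at the brace that closes the kernel body, so the trailing text in `TARGET` is never requested from the model.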
  27. 33.

    Deep Learning Program Generator.

    $ clgen model.json sampler.json

    model.json:
    {
      "corpus": {
        "path": "~/kernels"
      },
      "model_type": "lstm",
      "rnn_size": 2048,
      "num_layers": 3,
      "max_epochs": 50
    }

    sampler.json:
    {
      "kernels": {
        "args": [
          "__global float*",
          "__global float*",
          "const int"
        ]
      },
      "sampler": {
        "max_kernels": 1000
      }
    }
  28. 35.

    __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
      int e = get_global_id(0);
      float f = 0.0;
      for (int g = 0; g < d; g++) {
        c[g] = 0.0f;
      }
      barrier(1);
      a[get_global_id(0)] = 2 * b[get_global_id(0)];
    }
  29. 36.

    __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
      int e = get_global_id(0);
      if (e >= d) {
        return;
      }
      c[e] = a[e] + b[e] + 2 * a[e] + b[e] + 4;
    }
  30. 37.

    __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
      unsigned int e = get_global_id(0);
      float16 f = (float16)(0.0);
      for (unsigned int g = 0; g < d; g++) {
        float16 h = a[g];
        f.s0 += h.s0;
        f.s1 += h.s1;
        /* snip ... */
        f.sE += h.sE;
        f.sF += h.sF;
      }
      b[e] = f.s0 + f.s1 + f.s2 + f.s3 + f.s4 + f.s5 + f.s6 + f.s7 + f.s8 + f.s9 + f.sA + f.sB + f.sC + f.sD + f.sE + f.sF;
    }
  34. 41.

    __kernel void A(__global float* a, __global float* b, __global float* c, const float d, const int e) {
      int f = get_global_id(0);
      if (f >= e) {
        return;
      }
      c[f] = a[f] + b[f] + 2 * c[f] + d + 4;
    }

    Payload for size S:

    Argument             Payload
    __global float* a    rand() * [S]
    __global float* b    rand() * [S]
    __global float* c    rand() * [S]
    const float d        rand()
    const int e          S
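The payload rule on this slide (random arrays of length S for pointer arguments, a random value for scalar floats, and S itself for scalar integers) can be sketched in Python as follows. The function name and the string-based type matching are hypothetical simplifications that only cover the three argument kinds shown on the slide.

```python
import random


def generate_payload(arg_types, size, rng):
    """Build one host-side value per kernel argument for dataset size `size`."""
    payload = []
    for arg in arg_types:
        if arg.endswith("*"):
            # Pointer argument: an array of `size` random floats.
            payload.append([rng.random() for _ in range(size)])
        elif "int" in arg.split():
            # Scalar integer: assumed to carry the dataset size.
            payload.append(size)
        else:
            # Scalar float: a single random value.
            payload.append(rng.random())
    return payload


ARGS = [
    "__global float*",
    "__global float*",
    "__global float*",
    "const float",
    "const int",
]
payload = generate_payload(ARGS, size=8, rng=random.Random(0))
print([len(p) if isinstance(p, list) else p for p in payload])
```

Passing an explicit `random.Random` instance keeps payloads reproducible, which makes it easier to rerun a kernel on identical inputs.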
  37. 44.

    [Figure: dynamic checking. Generate Inputs (A, B, C, D), apply f(x) to each, Compute Outputs (A, B, C, D), then Verify. A kernel that fails a check is labelled: no outputs, non deterministic, or input insensitive.]
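The three verdicts named on this slide can be illustrated with a small Python sketch. The exact pairing of runs is not recoverable from the transcript, so the scheme below is an assumption: two different payloads are each executed twice, and the outputs are compared against the inputs and against each other. The `run` callable is a hypothetical stand-in for executing a kernel on a device.

```python
def check_kernel(run, input_a, input_b):
    """Classify a kernel by running it on two distinct payloads, twice each.

    `run` takes a list of floats and returns the output buffer as a list."""
    assert input_a != input_b, "the two payloads must differ"
    out_a1, out_a2 = run(list(input_a)), run(list(input_a))
    out_b1, out_b2 = run(list(input_b)), run(list(input_b))

    if out_a1 == input_a or out_b1 == input_b:
        return "no outputs"          # buffers came back unchanged
    if out_a1 != out_a2 or out_b1 != out_b2:
        return "non deterministic"   # same input, different output
    if out_a1 == out_b1:
        return "input insensitive"   # different input, same output
    return "ok"


A = [1.0, 2.0]
B = [3.0, 4.0]
print(check_kernel(lambda xs: [2 * x for x in xs], A, B))
print(check_kernel(lambda xs: xs, A, B))
print(check_kernel(lambda xs: [0.0] * len(xs), A, B))
```

Only kernels classified as "ok" would be worth timing, since the other three categories do no useful, repeatable work on their inputs.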
  38. 45.

    2000 benchmarks per machine per day
    [Figure: CLgen system overview, repeated.]
  40. 48.

    7 benchmarks, 1,000 synthetic benchmarks. 1.27x faster.
    [Bar chart: Speedup (axis 1 to 3.4) on AMD and NVIDIA. Series: Grewe et al., w. CLgen. Bar labels: 3.26, 1.57, 2.5, 1.26.]
  41. 49.

    71 benchmarks, 1,000 synthetic benchmarks. 4.30x faster.
    [Bar chart: Speedup (axis 1 to 5) on AMD and NVIDIA. Series: Grewe et al., w. CLgen, w. Better Features. Bar labels: 5.04, 3.56, 3.26, 1.57, 2.5, 1.26.]
  42. 50.

    concluding remarks
    problem: insufficient benchmarks
    use DL to learn PL semantics and usage
    turing tested! ;-)
    improved model performance and design
  43. 51.

    Synthesizing Benchmarks for Predictive Modeling
    For the paper, code and data: http://chriscummins.cc/cgo17
    Deep Learning Program Generator.
    For the TensorFlow neural network: http://chriscummins.cc/clgen
    [CGO Artifact Evaluation (AEC) badge: Consistent, Complete, Well Documented, Easy to Reuse, Evaluated.]