Synthesizing Benchmarks for Predictive Modelling (CGO'17)

Chris Cummins
February 06, 2017

Paper: https://github.com/ChrisCummins/paper-synthesizing-benchmarks

Predictive modelling using machine learning is an effective method for building compiler heuristics, but there is a shortage of benchmarks. Typical machine learning experiments outside of the compilation field train over thousands or millions of examples. In machine learning for compilers, however, there are typically only a few dozen common benchmarks available. This limits the quality of learned models, as they have very sparse training data for what are often high-dimensional feature spaces. What is needed is a way to generate an unbounded number of training programs that finely cover the feature space. At the same time the generated programs must be similar to the types of programs that human developers actually write, otherwise the learning will target the wrong parts of the feature space.

We mine open source repositories for program fragments and apply deep learning techniques to automatically construct models for how humans write programs. We sample these models to generate an unbounded number of runnable training programs. The quality of the programs is such that even human developers struggle to distinguish our generated programs from hand-written code.

We use our generator for OpenCL programs, CLgen, to automatically synthesize thousands of programs and show that learning over these improves the performance of a state of the art predictive model by 1.27×. In addition, the fine covering of the feature space automatically exposes weaknesses in the feature design which are invisible with the sparse training examples from existing benchmark suites. Correcting these weaknesses further increases performance by 4.30×.


Transcript

  1. Synthesizing Benchmarks for Predictive Modeling http://chriscummins.cc/cgo17

  2. Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh)
  3. machine learning for compilers: learn y = f(x), mapping program features x (# instructions, arithmetic density, dataset size) to optimisation decisions y (Cflags, workgroup size, CPU or GPU).
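
A toy instance of y = f(x) for the CPU-or-GPU decision above, sketched in Python. The feature values and labels are invented for illustration, and the decision-tree model is only an assumption in the spirit of the Grewe et al. heuristic used later in the talk, not the paper's actual model.

      # Toy y = f(x): predict the CPU/GPU mapping from static code features.
      from sklearn.tree import DecisionTreeClassifier

      # features x: [# instructions, arithmetic density, dataset size] (invented values)
      X = [[120, 0.30, 1 << 10],
           [900, 0.75, 1 << 20],
           [250, 0.40, 1 << 12],
           [700, 0.80, 1 << 22]]
      y = ["CPU", "GPU", "CPU", "GPU"]   # the optimisation decision to learn

      f = DecisionTreeClassifier().fit(X, y)
      print(f.predict([[500, 0.60, 1 << 18]]))   # predicted device for an unseen kernel
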
  4. machine learning for compilers, the idea: [scatter plot of training programs with a target decision boundary separating 'use a GPU' from 'use a CPU' regions.]
  5. machine learning for compilers, the reality: [the learned decision boundary only approximates the target decision boundary.]
  6. machine learning for compilers: [with few training points the learned decision boundary misses the target boundary.] Sparse data leads to inaccurate models!
  7. problem statement, 1. there aren't enough benchmarks: the average compiler paper uses 17 benchmarks, versus 150 examples in the Iris dataset (10x), 60,000 in MNIST (10^3x), and 10,000,000 in ImageNet (10^6x).
  8. problem statement: 1. there aren't enough benchmarks; 2. more benchmarks = better models.
  9. what we need: to go from this (sparse benchmark coverage) to this (a fine covering of the feature space).

  10. our approach: mine code from the web.

  11. our approach: model the source distribution.

  12. our approach: sample the language model.

  13. contributions: 1. a deep learning approach to modeling PL semantics & usage; 2. the first solution for general-purpose benchmark synthesis; 3. an automatic 1.27x speedup; 4. improved model designs for a further 4.30x speedup.
  14. [CLgen / Host Driver pipeline diagram: GitHub Software Repositories, Search engine, Content Files, Rejection Filter, Code Rewriter, Language Corpus, LSTM network (Model parameters), Synthesizer (Synthesis parameters), Synthesized Benchmarks, Argument Extractor (Benchmark parameters), Benchmark Driver, Synthesized Payloads, Dynamic Checker, Performance Results.]
  15. [CLgen / Host Driver pipeline diagram, as on slide 14.] 8078 files, 2.8M lines.
  16. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  17. /* Copyright (C) 2014, Joe Blogs. */
      #define CLAMPING
      #define THRESHOLD_MAX 1.0f
      float myclamp(float in) {
      #ifdef CLAMPING
        return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < 0.0f ? 0.0f : in;
      #else
        return in;
      #endif // CLAMPING
      }
      __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
        // Do something really flipping cool
        int id = get_global_id(0);
        if (id < num_elems) {
          out[id] = myclamp(in[id]);
        }
      }
  18. [Same kernel as slide 17.] Is this real, valid OpenCL? Can we minimise non-functional variance?
  19. [Same kernel as slide 17.] Strip comments.
  20. #define CLAMPING
      #define THRESHOLD_MAX 1.0f
      float myclamp(float in) {
      #ifdef CLAMPING
        return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < 0.0f ? 0.0f : in;
      #else
        return in;
      #endif
      }
      __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
        int id = get_global_id(0);
        if (id < num_elems) {
          out[id] = myclamp(in[id]);
        }
      }
      Strip comments. Preprocess.
  21. float myclamp(float in) {
        return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
      }
      __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
        int id = get_global_id(0);
        if (id < num_elems) {
          out[id] = myclamp(in[id]);
        }
      }
      Strip comments. Preprocess. Does it compile? Does it contain instructions?
  22. [Same code as slide 21.] Strip comments. Preprocess. Does it compile? Does it contain instructions? (33% discard rate)
  23. [Same code as slide 21.] Strip comments. Preprocess. Does it compile? Does it contain instructions? (33% discard rate) Rewrite function names.
  24. float A(float in) {
        return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
      }
      __kernel void B(__global float* in, __global float* out, int num_elems) {
        int id = get_global_id(0);
        if (id < num_elems) {
          out[id] = A(in[id]);
        }
      }
      Strip comments. Preprocess. Does it compile? Does it contain instructions? (33% discard rate) Rewrite function names. Rewrite variable names.
  25. float A(float a) {
        return a > 1.0f ? 1.0f : a < 0.0f ? 0.0f : a;
      }
      __kernel void B(__global float* b, __global float* c, int d) {
        int e = get_global_id(0);
        if (e < d) {
          c[e] = A(b[e]);
        }
      }
      Strip comments. Preprocess. Does it compile? Does it contain instructions? (33% discard rate) Rewrite function names. Rewrite variable names. Enforce code style.
  26. [Same code as slide 25, after code style is enforced.]
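
The rejection and rewriting steps on slides 19 to 26 (strip comments, check that the code compiles and contains instructions, rename identifiers, enforce style) can be approximated in a few lines of Python. This is only a naive, regex-based sketch with hand-supplied identifier lists; a robust rewriter, like the Code Rewriter stage in the pipeline, works on the parsed source rather than on raw text.

      import re

      def strip_comments(src):
          # Naively drop /* ... */ block comments and // line comments
          # (ignores the rare case of comment markers inside string literals).
          src = re.sub(r"/\*.*?\*/", "", src, flags=re.DOTALL)
          return re.sub(r"//[^\n]*", "", src)

      def rename_identifiers(src, names):
          # Rename each user-defined identifier to 'A', 'B', ... in order,
          # using whole-word substitution.
          for i, name in enumerate(names):
              src = re.sub(r"\b%s\b" % re.escape(name), chr(ord("A") + i), src)
          return src

      # e.g.: rename_identifiers(strip_comments(src), ["myclamp", "findAllNodesMergedAabb"])
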
  27. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  28. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  29. character-level language model: a 2048-node, 3-layer LSTM; y = f(x), where x is the array of characters generated so far (96-character vocabulary) and y is the next character.
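
A minimal sketch of a character-level LSTM with the shape stated above (2048 hidden units, 3 layers, 96-character vocabulary), written here in PyTorch purely for illustration; the network linked from the final slide is a TensorFlow implementation, and details such as the embedding layer are assumptions.

      import torch
      import torch.nn as nn

      class CharLSTM(nn.Module):
          # Character-level language model: given the characters so far,
          # produce a distribution over the next character.
          def __init__(self, vocab_size=96, hidden_size=2048, num_layers=3):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, hidden_size)
              self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
              self.out = nn.Linear(hidden_size, vocab_size)

          def forward(self, x, state=None):
              h, state = self.lstm(self.embed(x), state)
              return self.out(h), state   # logits over the 96-character vocabulary
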
  30. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  31. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  32. kernel synthesis:
      S = '__kernel void A(__global float* a) {'
      depth = 1
      while depth > 0:
          c = predict_next_character(S)
          if c == '{': depth += 1
          if c == '}': depth -= 1
          S += c
      return S
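
The predict_next_character step in the loop above could be implemented as below, assuming the CharLSTM sketch given after slide 29 plus char2idx/idx2char vocabulary mappings; the temperature value is an invented illustration, not a CLgen parameter.

      import torch

      def predict_next_character(model, text, char2idx, idx2char, temperature=0.75):
          # Encode the text generated so far, run the LSTM, and sample one character
          # from the (temperature-scaled) distribution over the next character.
          x = torch.tensor([[char2idx[ch] for ch in text]])
          with torch.no_grad():
              logits, _ = model(x)
          probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
          return idx2char[torch.multinomial(probs, 1).item()]
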
  33. $ clgen model.json sampler.json
      model.json:   { "corpus": { "path": "~/kernels" }, "model_type": "lstm", "rnn_size": 2048, "num_layers": 3, "max_epochs": 50 }
      sampler.json: { "kernels": { "args": [ "__global float*", "__global float*", "const int" ] }, "sampler": { "max_kernels": 1000 } }
      Deep Learning Program Generator.
  34. DEMO (you had to be there)

  35. __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
        int e = get_global_id(0);
        float f = 0.0;
        for (int g = 0; g < d; g++) {
          c[g] = 0.0f;
        }
        barrier(1);
        a[get_global_id(0)] = 2 * b[get_global_id(0)];
      }
  36. __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
        int e = get_global_id(0);
        if (e >= d) {
          return;
        }
        c[e] = a[e] + b[e] + 2 * a[e] + b[e] + 4;
      }
  37. __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
        unsigned int e = get_global_id(0);
        float16 f = (float16)(0.0);
        for (unsigned int g = 0; g < d; g++) {
          float16 h = a[g];
          f.s0 += h.s0;
          f.s1 += h.s1;
          /* snip ... */
          f.sE += h.sE;
          f.sF += h.sF;
        }
        b[e] = f.s0 + f.s1 + f.s2 + f.s3 + f.s4 + f.s5 + f.s6 + f.s7 +
               f.s8 + f.s9 + f.sA + f.sB + f.sC + f.sD + f.sE + f.sF;
      }
  38. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  39. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  40. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  41. __kernel void A(__global float* a, __global float* b, __global float* c,
                      const float d, const int e) {
        int f = get_global_id(0);
        if (f >= e) {
          return;
        }
        c[f] = a[f] + b[f] + 2 * c[f] + d + 4;
      }
      Payload for size S:
        __global float* a : rand() * [S]
        __global float* b : rand() * [S]
        __global float* c : rand() * [S]
        const float d     : rand()
        const int e       : S
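
A sketch of how the payload described above could drive this kernel, assuming pyopencl and numpy are available; this is an illustration of the idea, not the Benchmark Driver from the pipeline, and the payload size S is an arbitrary choice.

      import numpy as np
      import pyopencl as cl

      S = 4096                                   # payload size (arbitrary for this sketch)
      kernel_src = """
      __kernel void A(__global float* a, __global float* b, __global float* c,
                      const float d, const int e) {
        int f = get_global_id(0);
        if (f >= e) { return; }
        c[f] = a[f] + b[f] + 2 * c[f] + d + 4;
      }
      """

      ctx = cl.create_some_context()
      queue = cl.CommandQueue(ctx)
      prg = cl.Program(ctx, kernel_src).build()

      mf = cl.mem_flags
      host = [np.random.rand(S).astype(np.float32) for _ in range(3)]   # buffers a, b, c
      bufs = [cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=h) for h in host]
      d = np.float32(np.random.rand())           # const float d: rand()
      e = np.int32(S)                            # const int e: S

      prg.A(queue, (S,), None, *bufs, d, e)      # launch the synthesized kernel
      out = np.empty(S, dtype=np.float32)
      cl.enqueue_copy(queue, out, bufs[2])       # read back output buffer c
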
  42. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  43. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  44. Dynamic checker: generate inputs A, B, C, D; compute outputs f(A), f(B), f(C), f(D); verify, discarding kernels that produce no outputs (output equals input), are non-deterministic (same input, different outputs), or are input insensitive (different inputs, same output).
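
A sketch of the three checks described above; run() and equal() are placeholder hooks for executing a synthesized kernel on a payload and comparing output buffers, not the actual Dynamic Checker.

      def verify(run, inputs, equal):
          # run(x): execute the synthesized kernel on payload x, returning its output buffers.
          # equal(x, y): approximate element-wise comparison of two sets of buffers.
          outputs = [run(x) for x in inputs]
          if all(equal(x, y) for x, y in zip(inputs, outputs)):
              return "discard: no outputs"            # kernel never writes to its buffers
          if any(not equal(run(x), y) for x, y in zip(inputs, outputs)):
              return "discard: non-deterministic"     # same input, different outputs
          if all(equal(outputs[0], y) for y in outputs[1:]):
              return "discard: input insensitive"     # different inputs, same output
          return "keep"
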
  45. 2000 benchmarks per machine per day. [CLgen / Host Driver pipeline diagram, as on slide 14.]
  46. results

  47. blind test: 52%, barely better than chance at telling generated code from hand-written code (http://humanorrobot.uk).

  48. 7 benchmarks plus 1,000 synthetic benchmarks: 1.27x faster. Speedup, Grewe et al. vs. w. CLgen: AMD 1.26x vs. 1.57x; NVIDIA 2.50x vs. 3.26x.
  49. 71 benchmarks plus 1,000 synthetic benchmarks: 4.30x faster. Speedup, Grewe et al. vs. w. CLgen vs. w. better features: AMD 1.26x vs. 1.57x vs. 3.56x; NVIDIA 2.50x vs. 3.26x vs. 5.04x.
  50. concluding remarks: the problem is insufficient benchmarks; we use DL to learn PL semantics and usage (Turing tested! ;-)); the result is improved model performance and improved model design.
  51. Synthesizing Benchmarks for Predictive Modeling. For the paper, code and data: http://chriscummins.cc/cgo17. For the TensorFlow neural network (Deep Learning Program Generator): http://chriscummins.cc/clgen. [CGO Artifact Evaluation badges: Consistent, Complete, Well Documented, Easy to Reuse, Evaluated.]