Synthesizing Benchmarks for Predictive Modeling (CGO'17)

Slide 1

Slide 1 text

Synthesizing Benchmarks for Predictive Modeling http://chriscummins.cc/cgo17

Slide 2

Slide 2 text

Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh)

Slide 3

Slide 3 text

Machine learning for compilers: y = f(x)
x = features: # instructions, arithmetic density, dataset size
y = optimisations: Cflags, workgroup size, CPU or GPU

Slide 4

Slide 4 text

Machine learning for compilers: the idea. A target decision boundary separates "use a GPU" from "use a CPU".

Slide 5

Slide 5 text

Machine learning for compilers: the reality. The learned decision boundary between "use a GPU" and "use a CPU" does not match the target decision boundary.

Slide 6

Slide 6 text

Machine learning for compilers: "use a GPU" vs. "use a CPU", target vs. learned decision boundary. Sparse data leads to inaccurate models!

Slide 7

Slide 7 text

Problem statement:
1. there aren’t enough benchmarks
   avg compiler paper: 17
   Iris dataset: 150 (10×)
   MNIST dataset: 60,000 (10³×)
   ImageNet dataset: 10,000,000 (10⁶×)

Slide 8

Slide 8 text

Problem statement:
1. there aren’t enough benchmarks
2. more benchmarks = better models

Slide 9

Slide 9 text

What we need: from this, to this.

Slide 10

Slide 10 text

Our approach: mine code from the web.

Slide 11

Slide 11 text

Our approach: model the source distribution.

Slide 12

Slide 12 text

Our approach: sample the language model.

Slide 13

Slide 13 text

Contributions:
1. a deep learning approach to modeling PL semantics & usage
2. first solution for general-purpose benchmark synthesis
3. automatic 1.27× speedup
4. improved model designs for a further 4.30× speedup

Slide 14

Slide 14 text

CLgen system overview (diagram):
Language corpus: GitHub software repositories → search engine → content files → rejection filter → code rewriter
Synthesizer: LSTM network (model parameters) → synthesizer (synthesis parameters) → rejection filter → synthesized benchmarks
Host driver: argument extractor (benchmark parameters) → synthesized payloads → benchmark driver → dynamic checker → performance results

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Slide 17

Slide 17 text

/* Copyright (C) 2014, Joe Blogs. */
#define CLAMPING
#define THRESHOLD_MAX 1.0f

float myclamp(float in) {
#ifdef CLAMPING
  return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < 0.0f ? 0.0f : in;
#else
  return in;
#endif // CLAMPING
}

__kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
  // Do something really flipping cool
  int id = get_global_id(0);
  if (id < num_elems) {
    out[id] = myclamp(in[id]);
  }
}

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Slide 20

Slide 20 text

#define CLAMPING
#define THRESHOLD_MAX 1.0f

float myclamp(float in) {
#ifdef CLAMPING
  return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < 0.0f ? 0.0f : in;
#else
  return in;
#endif
}

__kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
  int id = get_global_id(0);
  if (id < num_elems) {
    out[id] = myclamp(in[id]);
  }
}

Pipeline: Strip comments → Preprocess
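The "strip comments" step can be approximated with one regular expression. This is a minimal Python sketch, assuming a regex pass is good enough for illustration; it is not CLgen's implementation, and a robust tool would rely on a real C preprocessor or compiler front end instead. String and character literals are matched first, so a `//` inside a string literal survives.

```python
import re

# At each position the alternatives are tried left to right, so literals are
# consumed whole before anything inside them can be mistaken for a comment.
_TOKENS = re.compile(
    r'"(?:\\.|[^"\\])*"'    # string literal
    r"|'(?:\\.|[^'\\])*'"   # character literal
    r"|/\*.*?\*/"           # block comment (non-greedy, may span lines)
    r"|//[^\n]*",           # line comment (the newline itself is kept)
    re.DOTALL,
)


def strip_comments(src: str) -> str:
    """Remove C/OpenCL comments, leaving string and char literals intact."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        if token.startswith("/*"):
            return " "  # as in C, a block comment behaves like one space
        if token.startswith("//"):
            return ""
        return token  # a literal: keep it verbatim
    return _TOKENS.sub(replace, src)
```

For example, `strip_comments("a /* x */ b // y\nc")` keeps `a`, `b` and `c` and drops both comments. Note that the `#define`/`#ifdef` lines are left for the separate preprocessing step.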

Slide 21

Slide 21 text

float myclamp(float in) {
  return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
}

__kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
  int id = get_global_id(0);
  if (id < num_elems) {
    out[id] = myclamp(in[id]);
  }
}

Pipeline: Strip comments → Preprocess → Does it compile? → Does it contain instructions?

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Slide 24

Slide 24 text

float A(float in) {
  return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
}

__kernel void B(__global float* in, __global float* out, int num_elems) {
  int id = get_global_id(0);
  if (id < num_elems) {
    out[id] = A(in[id]);
  }
}

Pipeline: Strip comments → Preprocess → Does it compile? → Does it contain instructions? (33% discard rate) → Rewrite function names → Rewrite variable names

Slide 25

Slide 25 text

float A(float a) {
  return a > 1.0f ? 1.0f : a < 0.0f ? 0.0f : a;
}

__kernel void B(__global float* b, __global float* c, int d) {
  int e = get_global_id(0);
  if (e < d) {
    c[e] = A(b[e]);
  }
}

Pipeline: Strip comments → Preprocess → Does it compile? → Does it contain instructions? (33% discard rate) → Rewrite function names → Rewrite variable names → Enforce code style
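The two rewrite steps can be approximated by a small token-level pass. The Python sketch below is a simplified stand-in, not CLgen's rewriter, and rests on stated assumptions: an identifier followed by `(` at top level is a function definition; every other unknown identifier is a variable, so types, keywords and builtins must be listed in the hypothetical `KEEP` set; and function bodies are the only scopes. Functions become `A`, `B` and so on, variables become `a`, `b` and so on, with one variable counter shared across the file, which reproduces the rewritten kernel on the slide.

```python
import re
from itertools import count, product
from string import ascii_lowercase, ascii_uppercase
from typing import Iterator

# Hypothetical allow-list: identifiers that must never be renamed.
KEEP = {
    "__kernel", "__global", "__local", "const", "void", "float", "int",
    "unsigned", "return", "if", "else", "for", "while", "get_global_id",
}

# Numbers are matched first so that the "f" in "1.0f" is not an identifier.
_TOKEN = re.compile(r"\d[\w.]*|[A-Za-z_]\w*|[{}]")
_CALL = re.compile(r"\s*\(")


def names(alphabet: str) -> Iterator[str]:
    """Yield A, B, ..., Z, AA, AB, ... over the given alphabet."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def rename_identifiers(src: str) -> str:
    """Rename functions to A, B, ... and variables to a, b, ... (toy heuristic)."""
    fn_names = names(ascii_uppercase)
    var_names = names(ascii_lowercase)
    functions = {}  # file-wide: original function name -> new name
    local = {}      # current function only: original variable -> new name
    depth = 0       # brace nesting; 0 means top level
    out, pos = [], 0
    for m in _TOKEN.finditer(src):
        token = new = m.group(0)
        if token == "{":
            depth += 1
        elif token == "}":
            depth -= 1
            if depth == 0:
                local.clear()  # leaving a function body ends its scope
        elif token[0].isdigit() or token in KEEP:
            pass  # numeric literal, keyword, type or builtin
        elif _CALL.match(src, m.end()):
            if token in functions:
                new = functions[token]  # call to an already-renamed function
            elif depth == 0:
                new = functions[token] = next(fn_names)  # new definition
        else:
            if token not in local:
                local[token] = next(var_names)  # first sighting = declaration
            new = local[token]
        out.append(src[pos:m.start()])
        out.append(new)
        pos = m.end()
    out.append(src[pos:])
    return "".join(out)
```

Because `myclamp`'s parameter `in` and the kernel's `in` live in different function scopes, they receive different names (`a` and `b`), as on the slide. A real rewriter would need a proper parser to handle structs, macros, typedefs and nested scopes.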

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Slide 29

Slide 29 text

Character-level language model: y = f(x), where x is a char array and y is the next char.
Model: 2048-node, 3-layer LSTM; 96 chars.
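To make the y = f(x) framing concrete: a character-level model is trained on (x, y) pairs, where x is a window of characters from the corpus and y is the character that follows it. The Python sketch below builds such pairs and a character vocabulary with one-hot encoding; the window length is an arbitrary choice, the function names are hypothetical, and the LSTM itself is not shown.

```python
from typing import Dict, List, Tuple


def build_vocab(corpus: str) -> Dict[str, int]:
    """Map each distinct character to an index (sorted for stable output)."""
    return {ch: i for i, ch in enumerate(sorted(set(corpus)))}


def training_pairs(corpus: str, seq_len: int) -> List[Tuple[str, str]]:
    """Slide a window over the corpus: x = seq_len chars, y = the next char."""
    return [(corpus[i:i + seq_len], corpus[i + seq_len])
            for i in range(len(corpus) - seq_len)]


def one_hot(ch: str, vocab: Dict[str, int]) -> List[int]:
    """Encode one character as a one-hot vector of length len(vocab)."""
    vec = [0] * len(vocab)
    vec[vocab[ch]] = 1
    return vec
```

For example, `training_pairs("abcd", 2)` yields `("ab", "c")` and `("bc", "d")`. If the 96 on the slide is the size of the character vocabulary, each encoded character would be a vector of length 96.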

Slide 30

Slide 30 text

Slide 31

Slide 31 text

Slide 32

Slide 32 text

Kernel synthesis:

string S = '__kernel void A(__global float* a) {'
int depth = 1
while depth > 0:
    char c = predict_next_character(S)
    if c == '{': depth += 1
    if c == '}': depth -= 1
    S += c
return S
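The synthesis loop can be made runnable by passing the model in as a function. In this Python sketch, `predict_next_character` is any callable that maps the text so far to one character; the `replay` helper is a hypothetical toy predictor that just emits a fixed script, standing in for the trained LSTM. The `max_length` guard is an addition, not part of the slide, so that a model which never closes its braces cannot loop forever.

```python
from typing import Callable

# Hypothetical stand-in for the trained model: text so far -> next character.
Predictor = Callable[[str], str]


def synthesize_kernel(predict_next_character: Predictor,
                      seed: str = "__kernel void A(__global float* a) {",
                      max_length: int = 10_000) -> str:
    """Sample characters until the kernel's braces balance."""
    text = seed
    depth = seed.count("{") - seed.count("}")  # 1 for the default seed
    while depth > 0:
        if len(text) >= max_length:  # guard against a model that never closes
            raise RuntimeError("kernel did not terminate")
        c = predict_next_character(text)
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        text += c
    return text


def replay(script: str) -> Predictor:
    """Toy predictor that ignores its input and emits a fixed script."""
    chars = iter(script)
    return lambda _text: next(chars)
```

Sampling stops at the brace that closes the kernel body, so anything the predictor would have emitted afterwards is never requested.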

Slide 33

Slide 33 text

$ clgen model.json sampler.json

model.json:
{
  "corpus": { "path": "~/kernels" },
  "model_type": "lstm",
  "rnn_size": 2048,
  "num_layers": 3,
  "max_epochs": 50
}

sampler.json:
{
  "kernels": {
    "args": ["__global float*", "__global float*", "const int"]
  },
  "sampler": { "max_kernels": 1000 }
}

Deep Learning Program Generator.
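The `"args"` list in `sampler.json` describes the signature of the kernels to generate. One plausible use, which is an assumption here rather than documented CLgen behaviour, is to turn it into the opening kernel prototype that seeds the synthesis loop, in the same shape as the seed string in the kernel synthesis pseudocode on Slide 32, with arguments named `a`, `b`, `c` in the rewritten code style. The `kernel_prototype` helper is hypothetical.

```python
import json
from string import ascii_lowercase
from typing import List

# sampler.json as shown on the slide.
SAMPLER_JSON = """
{
  "kernels": {
    "args": ["__global float*", "__global float*", "const int"]
  },
  "sampler": { "max_kernels": 1000 }
}
"""


def kernel_prototype(arg_types: List[str]) -> str:
    """Build a seed such as '__kernel void A(__global float* a, ...) {'."""
    if len(arg_types) > len(ascii_lowercase):
        raise ValueError("this sketch only names up to 26 arguments")
    params = ", ".join(f"{t} {name}" for t, name in zip(arg_types, ascii_lowercase))
    return f"__kernel void A({params}) {{"


config = json.loads(SAMPLER_JSON)
seed = kernel_prototype(config["kernels"]["args"])
```

With the configuration above, `seed` is `__kernel void A(__global float* a, __global float* b, const int c) {`.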

Slide 34

Slide 34 text

DEMO (you had to be there)

Slide 35

Slide 35 text

__kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
  int e = get_global_id(0);
  float f = 0.0;
  for (int g = 0; g < d; g++) {
    c[g] = 0.0f;
  }
  barrier(1);
  a[get_global_id(0)] = 2 * b[get_global_id(0)];
}

Slide 36

Slide 36 text

__kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
  int e = get_global_id(0);
  if (e >= d) {
    return;
  }
  c[e] = a[e] + b[e] + 2 * a[e] + b[e] + 4;
}

Slide 37

Slide 37 text

__kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
  unsigned int e = get_global_id(0);
  float16 f = (float16)(0.0);
  for (unsigned int g = 0; g < d; g++) {
    float16 h = a[g];
    f.s0 += h.s0;
    f.s1 += h.s1;
    /* snip ... */
    f.sE += h.sE;
    f.sF += h.sF;
  }
  b[e] = f.s0 + f.s1 + f.s2 + f.s3 + f.s4 + f.s5 + f.s6 + f.s7 +
         f.s8 + f.s9 + f.sA + f.sB + f.sC + f.sD + f.sE + f.sF;
}

Slide 38

Slide 38 text

Slide 39

Slide 39 text

Slide 40

Slide 40 text

Slide 41

Slide 41 text

__kernel void A(__global float* a, __global float* b, __global float* c, const float d, const int e) {
  int f = get_global_id(0);
  if (f >= e) {
    return;
  }
  c[f] = a[f] + b[f] + 2 * c[f] + d + 4;
}

Payload for size S:
  __global float* a: rand() * [S]
  __global float* b: rand() * [S]
  __global float* c: rand() * [S]
  const float d: rand()
  const int e: S
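The payload rule above can be written as a small function over the kernel's argument types. This Python sketch assumes that `rand()` means a uniform value in [0, 1), here Python's `random.random`, and handles only the three argument types on the slide; `make_payload` is a hypothetical name, and a real argument extractor would parse the kernel signature rather than receive a list of type strings.

```python
import random
from typing import List, Optional, Union

Arg = Union[List[float], float, int]


def make_payload(arg_types: List[str], size: int,
                 rng: Optional[random.Random] = None) -> List[Arg]:
    """Build one input per kernel argument for a dataset of `size` elements."""
    if rng is None:
        rng = random.Random()
    payload: List[Arg] = []
    for t in arg_types:
        if t.startswith("__global") and t.endswith("*"):
            payload.append([rng.random() for _ in range(size)])  # rand() * [S]
        elif t == "const float":
            payload.append(rng.random())  # rand()
        elif t == "const int":
            payload.append(size)  # S
        else:
            raise ValueError(f"no payload rule for argument type {t!r}")
    return payload
```

Passing a seeded `random.Random` makes a payload reproducible, which matters when the same inputs must be fed to a kernel more than once.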

Slide 42

Slide 42 text

Slide 43

Slide 43 text

Slide 44

Slide 44 text

Generate inputs: A, B, C, D
Compute outputs: f(A), f(B), f(C), f(D)
Verify:
  f(A) ≠ A, else no outputs
  f(B) = f(A), else non-deterministic
  f(C) ≠ f(A), else input insensitive
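The three checks can be sketched as follows. The slide lists four inputs but its text does not say how D is used, so this Python sketch covers only A, B and C, under two labelled assumptions: B is an identical copy of A (needed for the determinism check to make sense), and C is a different random input. `run` is a hypothetical wrapper that executes the kernel on one input buffer and returns its output buffer.

```python
import random
from typing import Callable, List

# Hypothetical wrapper: runs the kernel on one input buffer, returns the output.
Run = Callable[[List[float]], List[float]]


def dynamic_check(run: Run, size: int = 16, seed: int = 0) -> str:
    """Classify a kernel as ok, or by the first check it fails."""
    rng = random.Random(seed)
    a = [rng.random() for _ in range(size)]
    b = list(a)                              # assumed: B is a copy of A
    c = [rng.random() for _ in range(size)]  # assumed: C differs from A
    # Pass copies so a kernel that writes in place cannot alter a, b or c.
    out_a, out_b, out_c = run(list(a)), run(list(b)), run(list(c))
    if out_a == a:
        return "no outputs"
    if out_b != out_a:
        return "non deterministic"
    if out_c == out_a:
        return "input insensitive"
    return "ok"
```

For example, a kernel that doubles its input passes, an identity kernel is flagged as producing no outputs, and one that always writes zeros is flagged as input insensitive.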

Slide 45

Slide 45 text

2000 benchmarks per machine per day (CLgen system overview as on Slide 14)

Slide 46

Slide 46 text

results

Slide 47

Slide 47 text

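Blind test (http://humanorrobot.uk): participants telling human-written kernels from CLgen-synthesized ones were correct 52% of the time.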

Slide 48

Slide 48 text

71 benchmarks, 1,000 synthetic benchmarks: 1.27× faster.
Speedup: Grewe et al.: AMD 2.50×, NVIDIA 1.26×; w. CLgen: AMD 3.26×, NVIDIA 1.57×.

Slide 49

Slide 49 text

71 benchmarks, 1,000 synthetic benchmarks: 4.30× faster.
Speedup: Grewe et al.: AMD 2.50×, NVIDIA 1.26×; w. CLgen: AMD 3.26×, NVIDIA 1.57×; w. Better Features: AMD 5.04×, NVIDIA 3.56×.

Slide 50

Slide 50 text

Concluding remarks:
1. problem: insufficient benchmarks
2. use DL to learn PL semantics and usage
3. Turing tested! ;-)
4. improved model performance and design

Slide 51

Slide 51 text

Synthesizing Benchmarks for Predictive Modeling
For the paper, code and data: http://chriscummins.cc/cgo17
Deep Learning Program Generator. For the TensorFlow neural network: http://chriscummins.cc/clgen
CGO Artifact Evaluated (AEC): Consistent, Complete, Well Documented, Easy to Reuse