Slide 1

Slide 1 text

Machine Learning & Compilers

Slide 2

Slide 2 text

Optimisation heuristics are too coarse. (a.k.a. it doesn’t take much to do better than -O3)

Slide 3

Slide 3 text

Better than -O3

    #!/bin/sh
    while true; do
        sort --random-sort < "cflags.txt" | head -n 20 | xargs gcc -O1 app.c
        time ./a.out
    done

After 200 attempts, ~5% improvement
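The random search on this slide can also be written as a small function that remembers the best result. This is a minimal sketch, not the script used in the talk: `random_search` and `evaluate` are hypothetical names, and `evaluate` is assumed to be a caller-supplied function that compiles with the given flags and returns a runtime in seconds.

```python
import random


def random_search(flags, evaluate, attempts=200, flags_per_attempt=20, seed=0):
    """Try random flag subsets and keep the fastest one.

    `evaluate` maps a list of flags to a runtime in seconds, for example by
    compiling with gcc and timing ./a.out (that part is left to the caller).
    """
    rng = random.Random(seed)
    best_flags, best_time = None, float("inf")
    for _ in range(attempts):
        # Pick a random subset of flags, like `sort --random-sort | head -n 20`.
        candidate = rng.sample(flags, min(flags_per_attempt, len(flags)))
        runtime = evaluate(candidate)
        if runtime < best_time:
            best_flags, best_time = candidate, runtime
    return best_flags, best_time
```

Passing the timing step in as a function keeps the search loop separate from the compiler invocation, so the loop can be tried out with a fake `evaluate` first.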

Slide 4

Slide 4 text

*actual homework

Slide 5

Slide 5 text

200 attempts sounds like a lot… (200 x 10 x 30 s ≈ 16 hrs)

Slide 6

Slide 6 text

Drop in the ocean
250 GCC flags ≥ 2^250 options ≈ 10^75
Atoms in the universe ≈ 10^80
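The two back-of-envelope figures can be checked in a few lines. This sketch assumes each of the 250 flags is a simple on/off switch, and reads "200 x 10 x 30 s" as 200 attempts of 10 runs at 30 s each; that reading is an assumption.

```python
import math

n_flags = 250                          # on/off GCC flags, from the slide
combinations = 2 ** n_flags            # every flag is either on or off
exponent = math.log10(combinations)    # about 75.3, i.e. roughly 10^75

seconds_per_attempt = 10 * 30          # assumed: 10 runs of 30 s per attempt
hours_for_200 = 200 * seconds_per_attempt / 3600   # about 16.7 hours
```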

Slide 7

Slide 7 text

Exhaustive search is not really* practical. *in this universe

Slide 8

Slide 8 text

Machine Learning: estimate y = f(x)

Slide 9

Slide 9 text

Machine Learning: estimate y = f(x)

Slide 10

Slide 10 text

Machine Learning: estimate y = f(x)

    y (Optimisations)    x (Features)
    Cflags               # instructions
    Workgroup size       Arithmetic density
    CPU or GPU           Dataset size
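To make y = f(x) concrete, here is a minimal sketch that predicts "CPU or GPU" from three of the features named on the slide. It uses a 1-nearest-neighbour rule purely for illustration; the talk does not say which model is used, and the training numbers below are invented, not measurements.

```python
import math

# Hypothetical training data: (features, best device).
# Features are (# instructions, arithmetic density, dataset size).
# The numbers are invented for illustration only.
TRAINING = [
    ((1_000, 0.1, 1e3), "CPU"),
    ((5_000, 0.2, 1e4), "CPU"),
    ((2_000, 2.0, 1e7), "GPU"),
    ((8_000, 4.0, 1e8), "GPU"),
]


def scale(x):
    """Put the features on comparable scales (log for the two counts)."""
    instructions, density, dataset_size = x
    return (math.log10(instructions), density, math.log10(dataset_size))


def predict(x):
    """Return the label of the nearest training example: y = f(x)."""
    query = scale(x)
    nearest = min(TRAINING, key=lambda ex: math.dist(query, scale(ex[0])))
    return nearest[1]
```

With so few examples the prediction is only as good as the training set covers the feature space, which is the problem the rest of the talk addresses.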

Slide 11

Slide 11 text

Machine Learning: estimate y = f(x)
The idea.
[Figure labels: "Use a GPU", "Use a CPU"]

Slide 12

Slide 12 text

Machine Learning: estimate y = f(x)
The reality. (same data, wrong conclusions)
[Figure labels: "Use a GPU", "Use a CPU"]

Slide 13

Slide 13 text

Synthesizing Benchmarks for Predictive Modeling

Slide 14

Slide 14 text

Why?
 There aren’t enough benchmarks

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Why?
 More benchmarks = better models

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Why?
 No adequate solution

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

How? Teach an AI to program from GitHub

Slide 21

Slide 21 text

Implementation

Slide 22

Slide 22 text

[Figure: CLgen and CLdrive pipeline diagram. Labels: Language Corpus, GitHub Software Repositories, Content Files, Rejection Filter, Search engine, Source Normalizer, Training parameters, Rejection Filter, LSTM network, Synthesizer, Synthesis parameters, Argument Extractor, Benchmark parameters, Synthesized Benchmarks, Benchmark Driver, Synthesized Payloads, Performance Results, Dynamic Checker.]

Slide 23

Slide 23 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 24

Slide 24 text

We teach an AI to code by showing it lots of code.
Huge repository of public knowledge: GitHub.
And they have an API :-)

    $ curl https://api.github.com/search/repositories\?q\=opencl\&sort\=stars\&order\=desc
    {
      "total_count": 3155,
      "incomplete_results": false,
      "items": [
        {
          "id": 7296244,
          "name": "lwjgl3",
          "full_name": "LWJGL/lwjgl3",
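The same query can be issued from a script. This is a minimal sketch using only the standard library; `build_search_url` and `search_repositories` are hypothetical helper names, and the sketch ignores pagination, authentication and rate limits, which a real crawler would need to handle.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com/search/repositories"


def build_search_url(keyword, sort="stars", order="desc"):
    """Build the repository-search URL used in the curl example."""
    query = urllib.parse.urlencode({"q": keyword, "sort": sort, "order": order})
    return f"{API}?{query}"


def search_repositories(keyword):
    """Fetch one page of results and return the repositories' full names."""
    with urllib.request.urlopen(build_search_url(keyword)) as response:
        payload = json.load(response)
    return [item["full_name"] for item in payload["items"]]
```

Calling `search_repositories("opencl")` performs a network request, so it is kept separate from the URL construction.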

Slide 25

Slide 25 text

OpenCL is not a first-class language.
Search repositories using loose keyword terms, e.g. opencl, nvidia, gpu, cl, amd.
Recursively iterate over git trees to get .cl files.
[Figure: repository Foo:MyOpenCLRepo, where /src/guassian.cl has #include lines pointing at /include/common.h and "detail/math.cl"]
(0.6% miss rate)
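Picking the .cl files out of a repository tree can be sketched as a filter over a recursive tree listing. The input shape below is an assumption modelled on the JSON that GitHub's "get a tree" endpoint returns with `recursive=1` (a `tree` list of entries with `path` and `type`); `opencl_paths` is a hypothetical name.

```python
def opencl_paths(tree_response):
    """Return the paths of .cl files from a recursive git-tree listing.

    `tree_response` is assumed to look like:
    {"tree": [{"path": "src/foo.cl", "type": "blob"}, ...]}
    """
    return [
        entry["path"]
        for entry in tree_response.get("tree", [])
        # "blob" entries are files; "tree" entries are directories.
        if entry.get("type") == "blob" and entry["path"].endswith(".cl")
    ]
```

Headers pulled in through #include do not end in .cl, so they would need a separate lookup against the same listing.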

Slide 26

Slide 26 text

    /* Copyright (C) 2004 Joe Bloggs */
    //
    // DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
    // TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
    //
    // 0. You just DO WHAT THE FUCK YOU WANT TO.

    #define CLAMPING
    #define THRESHOLD_MIN 0.0f
    #define THRESHOLD_MAX 1.0f

    float myclamp(float in) {
    #ifdef CLAMPING
      return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < THRESHOLD_MIN ? THRESHOLD_MIN : in;
    #else
      return in;
    #endif // CLAMPING
    }

    // Do something really flipping cool
    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
      //
      //
      int id = get_global_id(0);
      if (id < num_elems) {
        out[id] = myclamp(in[id]);
      }
    }

x 8078 files, 2.8 million lines

Slide 27

Slide 27 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 28

Slide 28 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 29

Slide 29 text

[Code: same kernel source as Slide 26]

Slide 30

Slide 30 text

[Code: same kernel source as Slide 26]
Is this real, valid OpenCL?
Can we minimise non-functional variance?

Slide 31

Slide 31 text

[Code: same kernel source as Slide 26]
Is this real, valid OpenCL?
Can we minimise non-functional variance?
Strip comments

Slide 32

Slide 32 text

[Code: same kernel source as Slide 26]
Is this real, valid OpenCL?
Can we minimise non-functional variance?
Strip comments

Slide 33

Slide 33 text

Strip comments

    #define CLAMPING
    #define THRESHOLD_MIN 0.0f
    #define THRESHOLD_MAX 1.0f

    float myclamp(float in) {
    #ifdef CLAMPING
      return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < THRESHOLD_MIN ? THRESHOLD_MIN : in;
    #else
      return in;
    #endif
    }

    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
      int id = get_global_id(0);
      if (id < num_elems) {
        out[id] = myclamp(in[id]);
      }
    }
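The "Strip comments" step can be approximated with a regular expression. This is a sketch only, not the tool used in the talk: it handles `//` and `/* */` comments and leaves string and character literals alone, but it does not handle line comments continued with a trailing backslash.

```python
import re

# Match string/char literals first so comment markers inside them survive.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"     # character literal
    r"|//[^\n]*"              # line comment
    r"|/\*.*?\*/",            # block comment
    re.DOTALL,
)


def strip_comments(source):
    """Replace each C-style comment with a single space."""
    def replace(match):
        text = match.group(0)
        # Literals start with a quote and are kept verbatim.
        return text if text[0] in "\"'" else " "
    return _TOKEN.sub(replace, source)
```

Replacing a comment with a space rather than nothing avoids accidentally gluing two tokens together.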

Slide 34

Slide 34 text

[Code: same comment-free source as Slide 33]
Strip comments
Preprocess

Slide 35

Slide 35 text

[Code: same comment-free source as Slide 33]
Strip comments
Preprocess

Slide 36

Slide 36 text

Strip comments
Preprocess

    float myclamp(float in) {
      return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
    }

    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out, int num_elems) {
      int id = get_global_id(0);
      if (id < num_elems) {
        out[id] = myclamp(in[id]);
      }
    }

Slide 37

Slide 37 text

[Code: same preprocessed source as Slide 36]
Strip comments
Preprocess
Does it compile? Does it contain instructions?

Slide 38

Slide 38 text

[Code: same preprocessed source as Slide 36]
Strip comments
Preprocess
Does it compile? Does it contain instructions?

Slide 39

Slide 39 text

[Code: same preprocessed source as Slide 36]
Strip comments
Preprocess
Does it compile? Does it contain instructions?
Rewrite function names

Slide 40

Slide 40 text

[Code: same preprocessed source as Slide 36]
Strip comments
Preprocess
Does it compile? Does it contain instructions?
Rewrite function names

Slide 41

Slide 41 text

Strip comments
Preprocess
Does it compile? Does it contain instructions?
Rewrite function names

    float A(float in) {
      return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
    }

    __kernel void B(__global float* in, __global float* out, int num_elems) {
      int id = get_global_id(0);
      if (id < num_elems) {
        out[id] = A(in[id]);
      }
    }

Slide 42

Slide 42 text

[Code: same source as Slide 41]
Strip comments
Preprocess
Does it compile? Does it contain instructions?
Rewrite function names
Rewrite variable names

Slide 43

Slide 43 text

[Code: same source as Slide 41]
Strip comments
Preprocess
Does it compile? Does it contain instructions?
Rewrite function names
Rewrite variable names

Slide 44

Slide 44 text

Strip comments
Preprocess
Does it compile? Does it contain instructions?
Rewrite function names
Rewrite variable names

    float A(float a) {
      return a > 1.0f ? 1.0f : a < 0.0f ? 0.0f : a;
    }

    __kernel void B(__global float* a, __global float* b, int c) {
      int d = get_global_id(0);
      if (d < c) {
        b[d] = A(a[d]);
      }
    }

Slide 45

Slide 45 text

[Code: same source as Slide 44]
Strip comments
Preprocess
Does it compile? Does it contain instructions?
Rewrite function names
Rewrite variable names
Enforce code style

Slide 46

Slide 46 text

[Code: same source as Slide 44]
Strip comments
Preprocess
Does it compile? Does it contain instructions?
Rewrite function names
Rewrite variable names
Enforce code style

Slide 47

Slide 47 text

Strip comments
Preprocess
Rewrite function names
Rewrite variable names
Enforce code style
Does it compile? Does it contain instructions?

    float A(float a) {
      return a > 1.0f ? 1.0f : a < 0.0f ? 0.0f : a;
    }

    __kernel void B(__global float* a, __global float* b, int c) {
      int d = get_global_id(0);
      if (d < c) {
        b[d] = A(a[d]);
      }
    }

Slide 48

Slide 48 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 49

Slide 49 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 50

Slide 50 text

Forward Pass, Backward Pass, Update Weights
Network Design: 2048 nodes, 3 layers
Stochastic Gradient Descent
Initial learning rate 0.005
Train for 50 epochs
Learning rate decay every 5 epochs
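The learning-rate schedule described here can be sketched as a step decay. The initial rate, epoch count and decay interval come from the slide; the decay factor is not stated, so `decay=0.5` below is a placeholder assumption, and `learning_rate` is a hypothetical helper name.

```python
def learning_rate(epoch, initial=0.005, decay=0.5, decay_every=5):
    """Step decay: multiply the rate by `decay` once every `decay_every` epochs.

    `decay=0.5` is a placeholder; the slide does not give the decay factor.
    """
    return initial * decay ** (epoch // decay_every)


# One learning rate per training epoch, for the 50 epochs on the slide.
schedule = [learning_rate(epoch) for epoch in range(50)]
```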

Slide 51

Slide 51 text

[Same content as Slide 50]

Slide 52

Slide 52 text

Machine Learning: estimate y = f(x)
x: 1,278,595 lines of OpenCL
y: Distribution of characters

Slide 53

Slide 53 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 54

Slide 54 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 55

Slide 55 text

Kernel Synthesis

    def synthesize_kernel():
        S = '__kernel void A(__global float* a) {'
        depth = 1
        while depth > 0:
            c = predict_next_character(S)
            if c == '{':
                depth += 1
            if c == '}':
                depth -= 1
            S += c
        return S
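The synthesis loop can be exercised without a trained network by passing the predictor in as a function. In this sketch `scripted_model` is a hypothetical stand-in that replays a fixed string, and `max_length` is an added safety limit that is not on the slide.

```python
def synthesize(predict_next_character,
               seed="__kernel void A(__global float* a) {",
               max_length=5000):
    """Extend `seed` one character at a time until the opening brace closes."""
    text = seed
    depth = 1  # the seed ends just inside the kernel's opening brace
    while depth > 0 and len(text) < max_length:
        c = predict_next_character(text)
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        text += c
    return text


def scripted_model(script):
    """Stand-in for the trained network: replays a fixed string."""
    chars = iter(script)
    return lambda text: next(chars)
```

For example, `synthesize(scripted_model(" a[0] = 1; } ignored"))` stops at the first brace that brings the depth back to zero, so the trailing text is never consumed.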

Slide 56

Slide 56 text

Demo (you had to be there)

Slide 57

Slide 57 text

    __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
      int e = get_global_id(0);
      float f = 0.0;
      for (int g = 0; g < d; g++) {
        c[g] = 0.0f;
      }
      barrier(1);
      a[get_global_id(0)] = 2*b[get_global_id(0)];
    }

Slide 58

Slide 58 text

    __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
      int e = get_global_id(0);
      if (e >= d) {
        return;
      }
      c[e] = a[e] + b[e] + 2 * a[e] + b[e] + 4;
    }

Slide 59

Slide 59 text

    __kernel void A(__global float* a, __global float* b, __global float* c, const int d) {
      unsigned int e = get_global_id(0);
      float16 f = (float16)(0.0);
      for (unsigned int g = 0; g < d; g++) {
        float16 h = a[g];
        f.s0 += h.s0;
        f.s1 += h.s1;
        /* snip ... */
        f.sE += h.sE;
        f.sF += h.sF;
      }
      b[e] = f.s0 + f.s1 + f.s2 + f.s3 + f.s4 + f.s5 + f.s6 + f.s7 + f.s8 + f.s9 + f.sA + f.sB + f.sC + f.sD + f.sE + f.sF;
    }

Slide 60

Slide 60 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 61

Slide 61 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 62

Slide 62 text

[Code: same synthesized kernel as Slide 58]

Slide 63

Slide 63 text

[Code: same synthesized kernel as Slide 58]

Slide 64

Slide 64 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 65

Slide 65 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 66

Slide 66 text

    _A = random_payload(_A)   # generate inputs
    _B = random_payload(_B)
    _C = copy(_A)
    _D = copy(_B)

    A_ = k(_A)                # compute outputs
    B_ = k(_B)
    C_ = k(_C)
    D_ = k(_D)

    # differential test
    assert (A_ != _A || B_ != _B) else NO_OUTPUTS
    assert (A_ != B_ || C_ != D_) else INPUT_INSENSITIVE
    assert (A_ == C_ && B_ == D_) else NON_DETERMINISTIC
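The three checks above can be written as a runnable function. This is a sketch under simplifying assumptions: the "kernel" is a plain Python function from a list of floats to a list of floats rather than compiled OpenCL, `classify` is a hypothetical name, and the "USEFUL" label for kernels that pass all three checks is invented here.

```python
import random


def classify(kernel, payload_size=16, seed=0):
    """Run `kernel` on four payloads and label its behaviour."""
    rng = random.Random(seed)
    in_a = [rng.random() for _ in range(payload_size)]   # random input A
    in_b = [rng.random() for _ in range(payload_size)]   # random input B
    in_c, in_d = list(in_a), list(in_b)                  # copies of A and B

    # Pass fresh copies so the kernel cannot modify our saved inputs.
    out_a, out_b = kernel(list(in_a)), kernel(list(in_b))
    out_c, out_d = kernel(list(in_c)), kernel(list(in_d))

    if out_a == in_a and out_b == in_b:
        return "NO_OUTPUTS"          # nothing was written
    if out_a == out_b and out_c == out_d:
        return "INPUT_INSENSITIVE"   # different inputs, same outputs
    if out_a != out_c or out_b != out_d:
        return "NON_DETERMINISTIC"   # same inputs, different outputs
    return "USEFUL"
```

For example, `classify(lambda xs: [2 * x for x in xs])` passes all three checks, while `classify(lambda xs: xs)` is rejected for producing no output.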

Slide 67

Slide 67 text

[Figure: CLgen and CLdrive pipeline diagram, repeated.]

Slide 68

Slide 68 text

Does it work? (yes)

Slide 69

Slide 69 text

52%

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

7 programs, 1,000 synthetic benchmarks. 1.27x faster

Slide 72

Slide 72 text

71 programs, 1,000 synthetic benchmarks. 2.66x faster

Slide 73

Slide 73 text

Good Things
Basically* language agnostic.
35 million repos on GitHub. We’re using 0.00004%.
Generates 2000 OpenCL benchmarks per machine per day.

Bad Things
No support for things declared outside of kernel scope.
Undirected almost to a fault.
AMD rage.

Slide 74

Slide 74 text

Thanks!
More benchmarks = better models
No general way of creating benchmarks
I taught a neural network to program from GitHub
Improves state of the art by 3.38x