
Synthesizing Benchmarks for Predictive Modelling (CGO'17)

Chris Cummins
February 06, 2017

Paper: https://github.com/ChrisCummins/paper-synthesizing-benchmarks

Predictive modelling using machine learning is an effective method for building compiler heuristics, but there is a shortage of benchmarks. Typical machine learning experiments outside of the compilation field train over thousands or millions of examples. In machine learning for compilers, however, there are typically only a few dozen common benchmarks available. This limits the quality of learned models, as they have very sparse training data for what are often high-dimensional feature spaces. What is needed is a way to generate an unbounded number of training programs that finely cover the feature space. At the same time, the generated programs must be similar to the types of programs that human developers actually write; otherwise, the learning will target the wrong parts of the feature space.

We mine open source repositories for program fragments and apply deep learning techniques to automatically construct models for how humans write programs. We sample these models to generate an unbounded number of runnable training programs. The quality of the programs is such that even human developers struggle to distinguish our generated programs from hand-written code.

We use our generator for OpenCL programs, CLgen, to automatically synthesize thousands of programs and show that learning over these improves the performance of a state-of-the-art predictive model by 1.27×. In addition, the fine covering of the feature space automatically exposes weaknesses in the feature design which are invisible with the sparse training examples from existing benchmark suites. Correcting these weaknesses further increases performance by 4.30×.

Transcript

  1. Synthesizing Benchmarks for Predictive Modeling
    http://chriscummins.cc/cgo17

  2. Chris Cummins (University of Edinburgh)
    Pavlos Petoumenos (University of Edinburgh)
    Zheng Wang (Lancaster University)
    Hugh Leather (University of Edinburgh)

  3. machine learning for compilers
    y = f(x)
    Optimisations (y): Cflags, Workgroup size, CPU or GPU
    Features (x): # instructions, Arithmetic density, Dataset size

  4. machine learning for compilers
    The idea.
    [Plot labels: Use a GPU, Use a CPU, Target decision boundary]

  5. machine learning for compilers
    The reality.
    [Plot labels: Use a GPU, Use a CPU, Learned decision boundary, Target decision boundary]

  6. machine learning for compilers
    [Plot labels: Use a GPU, Use a CPU, Target decision boundary, Learned decision boundary]
    sparse data leads to inaccurate models!
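
A toy illustration of the point on this slide, not the authors' experiment: a 1-nearest-neighbour classifier learns a CPU/GPU decision boundary from labelled points in a two-dimensional feature space. The target rule, the feature ranges and the training-set sizes are all made up for this sketch.

```python
import random

def target(x, y):
    """Made-up ground truth: the GPU wins above the curve y = x^2."""
    return "GPU" if y > x * x else "CPU"

def nearest_label(train, point):
    """1-nearest-neighbour prediction from (point, label) training pairs."""
    px, py = point
    _, label = min(train, key=lambda t: (t[0][0] - px) ** 2 + (t[0][1] - py) ** 2)
    return label

def accuracy(n_train, n_test=1000, seed=0):
    """Fraction of random test points that the learned model labels correctly."""
    rng = random.Random(seed)

    def sample():
        return (rng.random(), rng.random())

    train = [(p, target(*p)) for p in [sample() for _ in range(n_train)]]
    test = [sample() for _ in range(n_test)]
    hits = sum(nearest_label(train, p) == target(*p) for p in test)
    return hits / n_test

if __name__ == "__main__":
    # 17 mirrors the "avg compiler paper" figure quoted later in the deck.
    for n in (17, 150, 1000):
        print(n, "training points:", accuracy(n))
```

With few training points the learned boundary can only be a coarse approximation of the target; adding points tightens it, which is the motivation for synthesizing more benchmarks.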

  7. 1. there aren’t enough benchmarks
    problem statement
    Source               Size          Relative size
    avg compiler paper   17
    Iris dataset         150           10×
    MNIST dataset        60,000        10^3×
    ImageNet dataset     10,000,000    10^6×

  8. 1. there aren’t enough benchmarks
    2. more benchmarks = better models
    problem statement

  9. from this to this
    what we need

  10. mine code from web
    our approach

  11. model source distr.
    our approach

  12. sample lang. model
    our approach

  13. contributions
    1. a deep learning approach to modeling PL semantics & usage
    2. first solution for general purpose benchmark synthesis
    3. automatic 1.27× speedup
    4. improved model designs for further 4.30× speedup

  14. CLgen
    [Pipeline diagram. Labels: CLgen, Host Driver. Components: Language Corpus, GitHub Software Repositories, Content Files, Rejection Filter, Search engine, Code Rewriter, Model parameters, Rejection Filter, LSTM network, Synthesizer, Synthesis parameters, Argument Extractor, Benchmark parameters, Synthesized Benchmarks, Benchmark Driver, Synthesized Payloads, Performance Results, Dynamic Checker]

  15. CLgen
    [Pipeline diagram, as on slide 14]
    8078 files, 2.8M lines

  16. CLgen
    [Pipeline diagram, as on slide 14]

  17. /* Copyright (C) 2014, Joe Blogs. */
    #define CLAMPING
    #define THRESHOLD_MAX 1.0f
    float myclamp(float in) {
    #ifdef CLAMPING
      return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < 0.0f ? 0.0f : in;
    #else
      return in;
    #endif // CLAMPING
    }
    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out,
        int num_elems)
    {
      // Do something really flipping cool
      int id = get_global_id(0);
      if (id < num_elems)
      {
        out[id] = myclamp(in[id]);
      }
    }

  18. [Same source as slide 17]
    Is this real, valid OpenCL?
    Can we minimise non-functional variance?

  19. [Same source as slide 17]
    Strip comments

  20. #define CLAMPING
    #define THRESHOLD_MAX 1.0f
    float myclamp(float in) {
    #ifdef CLAMPING
      return in > THRESHOLD_MAX ? THRESHOLD_MAX : in < 0.0f ? 0.0f : in;
    #else
      return in;
    #endif
    }
    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out,
        int num_elems)
    {
      int id = get_global_id(0);
      if (id < num_elems)
      {
        out[id] = myclamp(in[id]);
      }
    }
    Strip comments
    Preprocess

  21. float myclamp(float in) {
      return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
    }
    __kernel void findAllNodesMergedAabb(__global float* in, __global float* out,
        int num_elems)
    {
      int id = get_global_id(0);
      if (id < num_elems)
      {
        out[id] = myclamp(in[id]);
      }
    }
    Strip comments
    Preprocess
    Does it compile?
    Does it contain instructions?

  22. [Same source as slide 21]
    Strip comments
    Preprocess
    Does it compile?
    Does it contain instructions?
    (33% discard rate)

  23. [Same source as slide 21]
    Strip comments
    Preprocess
    Does it compile?
    Does it contain instructions?
    (33% discard rate)
    Rewrite function names

  24. float A(float in) {
      return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
    }
    __kernel void B(__global float* in, __global float* out,
        int num_elems)
    {
      int id = get_global_id(0);
      if (id < num_elems)
      {
        out[id] = A(in[id]);
      }
    }
    Strip comments
    Preprocess
    Does it compile?
    Does it contain instructions?
    (33% discard rate)
    Rewrite function names
    Rewrite variable names

  25. float A(float a) {
      return a > 1.0f ? 1.0f : a < 0.0f ? 0.0f : a;
    }
    __kernel void B(__global float* b, __global float* c,
        int d)
    {
      int e = get_global_id(0);
      if (e < d)
      {
        c[e] = A(b[e]);
      }
    }
    Strip comments
    Preprocess
    Does it compile?
    Does it contain instructions?
    (33% discard rate)
    Rewrite function names
    Rewrite variable names
    Enforce code style

  26. float A(float a) {
      return a > 1.0f ? 1.0f : a < 0.0f ? 0.0f : a;
    }
    __kernel void B(__global float* b, __global float* c, int d) {
      int e = get_global_id(0);
      if (e < d) {
        c[e] = A(b[e]);
      }
    }
    Strip comments
    Preprocess
    Does it compile?
    Does it contain instructions?
    (33% discard rate)
    Rewrite function names
    Rewrite variable names
    Enforce code style
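
Two of the steps listed above, stripping comments and renaming identifiers, can be approximated in a few lines of Python. The slides do not say how the rewriter is implemented, and a real tool would need a proper compiler front end; the regular expressions below are a stand-in that only copes with simple, already preprocessed kernels like the example. The names `normalize`, `TYPES` and `RESERVED` are hypothetical, and the preprocessing, compile check and code style steps are not covered.

```python
import re
import string

COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# Identifiers, but not literal suffixes (1.0f) or vector members (f.s0).
IDENT = re.compile(r"(?<![\w.])[A-Za-z_]\w*")
TYPES = ("void", "float", "int")  # assumption: only these types are recognised
RESERVED = set(TYPES) | {"const", "restrict", "volatile"}
# True when the text before an identifier ends in a type, i.e. a declaration.
DECL = re.compile(r"\b(?:%s)\s*\*?\s*$" % "|".join(TYPES))

def normalize(src):
    """Strip comments, then rename functions to A, B, ... and variables to a, b, ..."""
    src = COMMENT.sub("", src)
    names = {}  # original identifier -> current replacement
    fn_names = iter(string.ascii_uppercase)   # at most 26 functions
    var_names = iter(string.ascii_lowercase)  # at most 26 variables

    def rename(match):
        word = match.group(0)
        if word in RESERVED:
            return word
        if DECL.search(src[:match.start()]):
            # A new declaration shadows any earlier use of the same name.
            is_function = src[match.end():].lstrip().startswith("(")
            names[word] = next(fn_names if is_function else var_names)
        return names.get(word, word)

    # re.sub visits matches left to right, so `names` follows declaration order.
    return IDENT.sub(rename, src)

EXAMPLE = """/* Copyright (C) 2014, Joe Blogs. */
float myclamp(float in) {
  return in > 1.0f ? 1.0f : in < 0.0f ? 0.0f : in;
}
__kernel void findAllNodesMergedAabb(__global float* in, __global float* out,
                                     int num_elems)
{
  // Do something really flipping cool
  int id = get_global_id(0);
  if (id < num_elems)
  {
    out[id] = myclamp(in[id]);
  }
}
"""

if __name__ == "__main__":
    print(normalize(EXAMPLE))
```

Renaming in declaration order means two kernels that differ only in identifier names end up textually identical, which is one way to reduce the non-functional variance the slides mention.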

  27. CLgen
    [Pipeline diagram, as on slide 14]

  28. CLgen
    [Pipeline diagram, as on slide 14]

  29. 2048 node, 3 layer LSTM
    96 char
    y = f(x): x = char array, y = next char
    character level language model
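
As a minimal sketch of what "char array in, next char out" means in terms of training data, the snippet below turns a corpus string into (context, next character) pairs using an integer encoding. The window length, the tiny corpus and the function names are assumptions for illustration; the network itself is not shown.

```python
def build_vocab(corpus):
    """Map each distinct character in the corpus to an integer index."""
    return {ch: i for i, ch in enumerate(sorted(set(corpus)))}

def training_pairs(corpus, vocab, seq_len):
    """Yield (x, y): x is seq_len encoded characters, y is the encoded next one."""
    encoded = [vocab[ch] for ch in corpus]
    for i in range(len(encoded) - seq_len):
        yield encoded[i:i + seq_len], encoded[i + seq_len]

# Hypothetical one-kernel corpus; a real corpus would hold many kernels.
corpus = "__kernel void A(__global float* a) {\n  a[get_global_id(0)] = 0.0f;\n}\n"
vocab = build_vocab(corpus)
pairs = list(training_pairs(corpus, vocab, seq_len=8))

if __name__ == "__main__":
    decode = {i: ch for ch, i in vocab.items()}
    x, y = pairs[0]
    print(repr("".join(decode[i] for i in x)), "->", repr(decode[y]))
```

Each pair is one training example for the model: given the preceding characters, predict the character that follows.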

  30. CLgen
    [Pipeline diagram, as on slide 14]

  31. CLgen
    [Pipeline diagram, as on slide 14]

  32. def synthesize_kernel(predict_next_character):
        S = '__kernel void A(__global float* a) {'
        depth = 1
        while depth > 0:
          c = predict_next_character(S)
          if c == '{':
            depth += 1
          if c == '}':
            depth -= 1
          S += c
        return S
    kernel synthesis
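
The loop above can be exercised without a trained network. In this sketch `predict_next_character` is replaced by a hypothetical `scripted_model` that simply replays a fixed string, which is enough to show how the brace depth decides when sampling stops. The `max_len` guard is an addition that is not on the slide.

```python
def synthesize_kernel(predict_next_character, seed, max_len=5000):
    """Extend `seed` one character at a time until its opening brace is closed."""
    text = seed
    depth = seed.count("{") - seed.count("}")
    # max_len is a safety net in case the model never closes the kernel body.
    while depth > 0 and len(text) < max_len:
        c = predict_next_character(text)
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        text += c
    return text

def scripted_model(script):
    """Stand-in for the LSTM: replays `script` one character per call."""
    chars = iter(script)
    return lambda _text: next(chars)

seed = "__kernel void A(__global float* a) {"
body = (
    "\n  int b = get_global_id(0);"
    "\n  if (b < 4) {"
    "\n    a[b] = 0.0f;"
    "\n  }"
    "\n}"
    "\n// never reached"
)
kernel = synthesize_kernel(scripted_model(body), seed)

if __name__ == "__main__":
    print(kernel)
```

Sampling stops at the brace that closes the kernel body, so anything the model would have produced afterwards is never requested.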

  33. $ clgen model.json sampler.json
    model.json:
    {
      "corpus": {
        "path": "~/kernels"
      },
      "model_type": "lstm",
      "rnn_size": 2048,
      "num_layers": 3,
      "max_epochs": 50
    }
    sampler.json:
    {
      "kernels": {
        "args": [
          "__global float*",
          "__global float*",
          "const int"
        ]
      },
      "sampler": {
        "max_kernels": 1000
      }
    }
    Deep Learning Program Generator.

  34. DEMO
    (you had to be there)

  35. __kernel void A(__global float* a,
        __global float* b,
        __global float* c,
        const int d) {
      int e = get_global_id(0);
      float f = 0.0;
      for (int g = 0; g < d; g++) {
        c[g] = 0.0f;
      }
      barrier(1);
      a[get_global_id(0)] = 2 * b[get_global_id(0)];
    }

  36. __kernel void A(__global float* a,
        __global float* b,
        __global float* c,
        const int d) {
      int e = get_global_id(0);
      if (e >= d) {
        return;
      }
      c[e] = a[e] + b[e] + 2 * a[e] + b[e] + 4;
    }

  37. __kernel void A(__global float* a,
        __global float* b,
        __global float* c,
        const int d) {
      unsigned int e = get_global_id(0);
      float16 f = (float16)(0.0);
      for (unsigned int g = 0; g < d; g++) {
        float16 h = a[g];
        f.s0 += h.s0;
        f.s1 += h.s1;
        /* snip ... */
        f.sE += h.sE;
        f.sF += h.sF;
      }
      b[e] = f.s0 + f.s1 + f.s2 + f.s3 + f.s4 +
             f.s5 + f.s6 + f.s7 + f.s8 + f.s9 + f.sA +
             f.sB + f.sC + f.sD + f.sE + f.sF;
    }

  38. CLgen
    [Pipeline diagram, as on slide 14]

  39. CLgen
    [Pipeline diagram, as on slide 14]

  40. CLgen
    [Pipeline diagram, as on slide 14]

  41. __kernel void A(__global float* a,
        __global float* b,
        __global float* c,
        const float d,
        const int e) {
      int f = get_global_id(0);
      if (f >= e) {
        return;
      }
      c[f] = a[f] + b[f] + 2 * c[f] + d + 4;
    }
    Payload for size S:
    Argument             Value
    __global float* a    rand() * [S]
    __global float* b    rand() * [S]
    __global float* c    rand() * [S]
    const float d        rand()
    const int e          S
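
The payload rules on this slide can be written down as a small function. This is a sketch of the three cases shown (pointer, scalar float, scalar int) and nothing more; `make_payload` is a hypothetical name, and how other argument types are handled is not stated on the slide, so they raise an error here.

```python
import random

def make_payload(arg_types, size, rng=random):
    """Build one host-side value per kernel argument for a payload of `size`."""
    payload = []
    for arg in arg_types:
        if arg.endswith("*"):
            # Pointer argument: an array of `size` random values.
            payload.append([rng.random() for _ in range(size)])
        elif "int" in arg:
            # Scalar integer: the slide passes the payload size itself.
            payload.append(size)
        elif "float" in arg:
            # Scalar float: a single random value.
            payload.append(rng.random())
        else:
            raise ValueError("argument type not covered by this sketch: " + arg)
    return payload

ARGS = [
    "__global float*",
    "__global float*",
    "__global float*",
    "const float",
    "const int",
]

if __name__ == "__main__":
    a, b, c, d, e = make_payload(ARGS, size=4, rng=random.Random(0))
    print(len(a), len(b), len(c), d, e)
```

Passing the size as the integer argument matches kernels like the one above, where the integer is used as a bounds check on the global ID.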

  42. CLgen
    [Pipeline diagram, as on slide 14]

  43. CLgen
    [Pipeline diagram, as on slide 14]

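  44. [Dynamic checker diagram]
    Generate inputs: A, B, C, D
    Compute outputs: f(x) for each input
    Verify:
    outputs differ from inputs, else no outputs
    outputs of repeated runs are equal (=), else non deterministic
    outputs for different inputs differ (≠), else input insensitive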
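
One possible reading of this slide as code, with the kernel modelled as a plain Python function from an input list to an output list. Which values the diagram compares is partly lost in the transcript, so the pairing used here (two distinct inputs, each run twice) is an assumption, as is the name `check_kernel`.

```python
def check_kernel(kernel, input_a, input_b):
    """Return "ok", or the name of the first check from the slide that fails."""
    input_a, input_b = list(input_a), list(input_b)
    # Run each input twice, on fresh copies so the kernel cannot alias them.
    out_a1, out_a2 = kernel(list(input_a)), kernel(list(input_a))
    out_b1, out_b2 = kernel(list(input_b)), kernel(list(input_b))
    if out_a1 == input_a or out_b1 == input_b:
        return "no outputs"          # nothing observable was computed
    if out_a1 != out_a2 or out_b1 != out_b2:
        return "non deterministic"   # same input, different results
    if out_a1 == out_b1:
        return "input insensitive"   # different inputs, same result
    return "ok"

def double(xs):
    return [2 * x for x in xs]

def identity(xs):
    return xs

def zeros(xs):
    return [0.0 for _ in xs]

if __name__ == "__main__":
    a, b = [1.0, 2.0], [3.0, 4.0]
    for fn in (double, identity, zeros):
        print(fn.__name__, "->", check_kernel(fn, a, b))
```

Kernels that fail any of the three checks would not make useful benchmarks, since their runtime behaviour either does nothing measurable or cannot be reproduced.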

  45. CLgen
    [Pipeline diagram, as on slide 14]
    2000 benchmarks per machine per day

  46. results

  47. 52%
    blind test
    http://humanorrobot.uk

  48. 7 benchmarks, 1,000 synthetic benchmarks. 1.27x faster
    Speedup          AMD     NVIDIA
    Grewe et al.     1.26    2.5
    w. CLgen         1.57    3.26

  49. 71 benchmarks, 1,000 synthetic benchmarks. 4.30x faster
    Speedup               AMD     NVIDIA
    Grewe et al.          1.26    2.5
    w. CLgen              1.57    3.26
    w. Better Features    3.56    5.04

  50. concluding remarks
    problem: insufficient benchmarks
    use DL to learn PL semantics and usage
    turing tested! ;-)
    improved model performance and design

  51. Synthesizing Benchmarks for Predictive Modeling
    For the paper, code and data: http://chriscummins.cc/cgo17
    Deep Learning Program Generator.
    For the TensorFlow neural network: http://chriscummins.cc/clgen
    [Badge: CGO Artifact Evaluated (AEC): Consistent, Complete, Well Documented, Easy to Reuse]