End-to-end Deep Learning of Optimization Heuristics (PACT'17)

Chris Cummins
September 12, 2017


Paper: https://github.com/ChrisCummins/paper-end2end-dl

Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quality of the features used. These features must be hand-crafted by developers through a combination of expert domain knowledge and trial and error. This makes the quality of the final model directly dependent on the skill and available time of the system architect.

Our work introduces a better way for building heuristics. We develop a deep neural network that learns heuristics over raw code, entirely without using code features. The neural network simultaneously constructs appropriate representations of the code and learns how best to optimize, removing the need for manual feature creation. Further, we show that our neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models, without the help of human experts.

We compare the effectiveness of our automatically generated heuristics against ones with features hand-picked by experts. We examine two challenging tasks: predicting optimal mapping for heterogeneous parallelism and GPU thread coarsening factors. In 89% of the cases, the quality of our fully automatic heuristics matches or surpasses that of state-of-the-art predictive models using hand-crafted features, providing on average 14% and 12% more performance with no human effort expended on designing features.


Transcript

  1. End-to-end Deep Learning of Optimization Heuristics http://chriscummins.cc/pact17

  2. Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh),

     Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh)
  3. Compilers are very complex: hundreds, thousands, millions of choices, made by hand-coded

     heuristics. int main(int argc, char **argv) { ... → _main: .cfi_startproc ## BB#0: pushq %rbp ... (out of date by time of release)
  4. Machine learning in compilers: y = f(x), where y is the optimization decision, f the

     model, and x the features (derived from IR).
  5. Machine learning in compilers: Training Programs → Driver + Feature Extractor → Feature

     Vectors + Best Decisions → Training Data → Optimization Heuristic
  6. Machine learning in compilers: the feature extractor is the human bit! 1. hard to get

     right 2. time consuming 3. repetitious
  7. A learned heuristic over a feature space (Feature “X” vs. Feature “Y”): a decision

     boundary separating “Use a CPU” from “Use a GPU”.
  8. The same learned heuristic over the feature space: we need good features!
  9. Ways to fail: irrelevant (e.g. not capturing the right information), incomplete

     (e.g. missing critical information), unsuitable (e.g. wrong combination of features / model).
  10. What we have: Training Programs → Driver + Feature Extractor → Feature Vectors +

     Best Decisions → Training Data → Predictive Model
  11. What we need: Training Programs → Driver → Best Decisions → Training Data →

     Predictive Model (no feature extractor)
  12. Contributions: heuristics without features; beats the expert approach; learning across heuristics.

  13. Our approach: Program Code (int main(int argc, char **argv) { ...) → Deep

     Learning → Optimization Decision
  14. Our approach, preprocessing: a Rewriter normalizes identifiers and code style

     (1. variable/function names ‘foo’, ‘bar’, … become ‘a’, ‘b’, …; 2. sanitize whitespace; 3. consistent use of optional braces), then an Encoder encodes the code as a sequence of vocabulary indices, using a vocabulary table of characters + language keywords.
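The preprocessing step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`rewrite`, `encode`), the tiny keyword set, and the greedy longest-match tokenization are all assumptions for the sake of a runnable example.

```python
import re

def rewrite(src):
    """Normalize identifiers: rename variables/functions to 'a', 'b', ...
    (keeping language keywords intact) and sanitize whitespace."""
    keywords = {"int", "char", "return", "if", "else", "for", "while", "void"}
    names = {}
    def rename(match):
        word = match.group(0)
        if word in keywords:
            return word
        if word not in names:
            names[word] = chr(ord("a") + len(names))  # 'foo' -> 'a', 'bar' -> 'b', ...
        return names[word]
    src = re.sub(r"[A-Za-z_]\w*", rename, src)
    return re.sub(r"\s+", " ", src).strip()  # collapse whitespace runs

def encode(src, vocab):
    """Encode source as a sequence of vocabulary indices, matching
    multi-character entries (keywords) before single characters."""
    tokens, i = [], 0
    while i < len(src):
        for entry in sorted(vocab, key=len, reverse=True):  # longest match first
            if src.startswith(entry, i):
                tokens.append(vocab[entry])
                i += len(entry)
                break
        else:
            i += 1  # skip characters outside the vocabulary
    return tokens
```

For example, `rewrite("int foo(int bar)")` yields `"int a(int b)"`, which is then encoded against the vocabulary table before being fed to the network.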
  15. Our approach: Code in → Rewriter → Encoder → Embedding (map vocabulary indices

     into real space) → Language Model (summarize the sequence as a vector; 2-layer LSTM network) → Heuristic Model (predict the optimization from the vector; 2-layer DNN) → Optimization Decision
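To make the data flow concrete, here is a minimal NumPy forward pass through the same pipeline shape: embedding lookup, an LSTM summarizing the token sequence into a vector, and a dense softmax layer predicting the decision. The dimensions and random weights are placeholders (the paper's model is a trained 2-layer LSTM + 2-layer DNN; this single-layer sketch only illustrates the shapes).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, hidden_dim, n_decisions = 128, 16, 32, 2

E = rng.normal(size=(vocab_size, embed_dim))  # embedding table: index -> vector

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One LSTM layer's parameters: input, forget, output, and cell-candidate gates.
W = rng.normal(size=(4, hidden_dim, embed_dim + hidden_dim)) * 0.1
b = np.zeros((4, hidden_dim))

def lstm_summarize(token_ids):
    """Run the LSTM over the embedded sequence; the final hidden state
    is the fixed-size summary vector handed to the heuristic model."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for t in token_ids:
        x = np.concatenate([E[t], h])
        i, f, o = (sigmoid(W[k] @ x + b[k]) for k in range(3))
        g = np.tanh(W[3] @ x + b[3])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Heuristic model: dense layer + softmax over decisions (e.g. {CPU, GPU}).
Wd = rng.normal(size=(n_decisions, hidden_dim)) * 0.1

def predict(token_ids):
    logits = Wd @ lstm_summarize(token_ids)
    p = np.exp(logits - logits.max())
    return p / p.sum()
```

The key point the slide makes is that nothing in this pipeline is task-specific except the output layer: the sequence summary replaces hand-crafted feature vectors.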
  16. Our approach, end to end: Code in → Rewriter → Encoder → Embedding →

     Language Model → Heuristic Model → Optimization Decision
  17. How does it work?

  19. How does it work? Well.

  20. Prior art: Heterogeneous Mapping (Grewe et al., CGO’13) and Thread

     Coarsening (Magni et al., PACT’14).
  21. Prior art, decision space and model: Heterogeneous Mapping (CGO’13) is binary

     classification over {CPU, GPU} with a decision tree; Thread Coarsening (PACT’14) is one-of-six classification over {1, 2, 4, 8, 16, 32} with cascading neural networks.
  22. Prior art, features: CGO’13 uses 4 features combined from 7 raw values

     (instruction counts / ratios); PACT’14 uses 7 features, the principal components of 34 raw values (instruction counts / ratios / relative deltas). 2 papers!
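To show the kind of manual pipeline these feature sets imply, here is an illustrative sketch: raw instruction counts are combined into ratios, or many raw values are reduced to a few principal components. The specific counts and ratios here are hypothetical, not the exact features of either paper.

```python
import numpy as np

def ratio_features(counts):
    """CGO'13-style combination: derive a few ratio features from raw
    instruction counts (hypothetical choice of ratios)."""
    return np.array([
        counts["mem"] / max(counts["total"], 1),     # memory-op fraction
        counts["branch"] / max(counts["total"], 1),  # branch fraction
        counts["compute"] / max(counts["mem"], 1),   # compute-to-memory ratio
    ])

def principal_components(X, k):
    """PACT'14-style reduction: project rows of raw feature values onto
    their top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                          # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                             # shape: (n_samples, k)
```

Every choice in such a pipeline (which counts, which ratios, how many components) is a human design decision, which is exactly the effort DeepTune removes.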
  23. Our approach: feed the raw program code (int main(int argc ...) to both tasks.

     1. Use the same model design for both 2. No tweaking of parameters 3. Minimum change: a 3-line diff
  24. Prior art, hardware and training programs: CGO’13 uses 2 CPU-GPU architectures and

     7 benchmark suites; PACT’14 uses 4 GPU architectures and 3 benchmark suites.
  25. Results

  26. 14% and 5% improvements over the state of the art. Heterogeneous Mapping speedup:

     2.38x (DeepTune) vs. 2.09x (state of the art); Thread Coarsening speedup: 1.06x (DeepTune) vs. 1.01x (state of the art).
  27. 14% and 5% improvements over the state of the art. Heterogeneous Mapping speedup:

     2.38x vs. 2.09x (256 benchmarks); Thread Coarsening speedup: 1.06x vs. 1.01x (17 benchmarks).
  28. Transfer learning: the Embedding and Language Model are general across heuristics;

     only the Heuristic Model is specialized to each task (Heterogeneous Mapping, Thread Coarsening).
  29. Transfer learning: initialize the Thread Coarsening model’s general components

     (Embedding + Language Model) with the values learned for Heterogeneous Mapping.
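The transfer step above amounts to copying the trained general parameters into the new task's model while leaving the specialized part freshly initialized. A minimal sketch, assuming parameters stored as plain dictionaries (the real models are neural-network weight tensors):

```python
import copy

GENERAL = {"embedding", "language_model"}   # shared across heuristics
SPECIALIZED = {"heuristic_model"}           # re-learned per task

def transfer(src_params, dst_params):
    """Seed dst's general components with src's trained values;
    keep dst's specialized component at its fresh initialization."""
    out = copy.deepcopy(dst_params)
    for name in GENERAL:
        out[name] = copy.deepcopy(src_params[name])
    return out

# Hypothetical toy values standing in for trained weight tensors.
mapping_model = {"embedding": [0.1, 0.2], "language_model": [0.3], "heuristic_model": [0.9]}
coarsening_init = {"embedding": [0.0, 0.0], "language_model": [0.0], "heuristic_model": [0.0]}
coarsening_model = transfer(mapping_model, coarsening_init)
```

Because the language model has already learned to read code on one task, the second task starts from a much better initialization than random weights.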
  30. 14% and 5% improvements over the state of the art. Heterogeneous Mapping speedup:

     2.38x (DeepTune) vs. 2.09x (state of the art); Thread Coarsening speedup: 1.06x (DeepTune) vs. 1.01x (state of the art).
  31. With transfer learning: Heterogeneous Mapping speedup 2.38x vs. 2.09x; Thread

     Coarsening speedup 1.12x (DeepTune w. transfer learning) vs. 1.06x (DeepTune) vs. 1.01x (state of the art). 14% and 11% improvements over the state of the art.
  32. Try it for yourself! http://chriscummins.cc/pact17: code and data on GitHub;

     runs in the browser. ACM Artifact Evaluated: Consistent, Complete, Well Documented, Easy to Reuse.
  33. Summary: feature design is hard; featureless heuristics; first cross-domain learning;

     11-14% speedups. End-to-end Deep Learning of Optimization Heuristics: http://chriscummins.cc/pact17