End-to-end Deep Learning of Optimization Heuristics (PACT'17)

Chris Cummins
September 12, 2017

Paper: https://github.com/ChrisCummins/paper-end2end-dl

Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quality of the features used. These features must be hand crafted by developers through a combination of expert domain knowledge and trial and error. This makes the quality of the final model directly dependent on the skill and available time of the system architect.

Our work introduces a better way for building heuristics. We develop a deep neural network that learns heuristics over raw code, entirely without using code features. The neural network simultaneously constructs appropriate representations of the code and learns how best to optimize, removing the need for manual feature creation. Further, we show that our neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models, without the help of human experts.
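
The talk's slides describe the network as an embedding, a language model (a 2 layer LSTM network) and a heuristic model (a 2 layer DNN), with transfer learning done by initializing a new model with learned values. The following is a minimal pure-Python sketch of that structure, not the authors' implementation. It assumes a single LSTM layer without biases, a single linear output layer, untrained random weights and tiny hypothetical sizes; all names (new_model, summarize, predict, transfer) are invented for illustration.

```python
import copy
import math
import random

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 16, 8, 4  # hypothetical, tiny sizes


def rand_matrix(rng, rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def new_model(num_classes, seed):
    rng = random.Random(seed)
    return {
        # Embedding: one real-valued vector per vocabulary index.
        "embedding": rand_matrix(rng, VOCAB_SIZE, EMBED_DIM),
        # Language model: one weight matrix per LSTM gate, applied to [x; h].
        "lstm": {g: rand_matrix(rng, HIDDEN_DIM, EMBED_DIM + HIDDEN_DIM)
                 for g in "ifog"},
        # Heuristic model: maps the summary vector to one score per decision.
        "heuristic": rand_matrix(rng, num_classes, HIDDEN_DIM),
    }


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def summarize(model, indices):
    """Run the LSTM over the sequence and return the final hidden state."""
    h = [0.0] * HIDDEN_DIM
    c = [0.0] * HIDDEN_DIM
    for idx in indices:
        xh = model["embedding"][idx] + h  # concatenate input and hidden state
        i = [sigmoid(z) for z in matvec(model["lstm"]["i"], xh)]
        f = [sigmoid(z) for z in matvec(model["lstm"]["f"], xh)]
        o = [sigmoid(z) for z in matvec(model["lstm"]["o"], xh)]
        g = [math.tanh(z) for z in matvec(model["lstm"]["g"], xh)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def predict(model, indices):
    """Softmax over the possible optimization decisions."""
    logits = matvec(model["heuristic"], summarize(model, indices))
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transfer(source, num_classes, seed):
    """Copy the embedding and language model; start a fresh heuristic model."""
    target = new_model(num_classes, seed)
    target["embedding"] = copy.deepcopy(source["embedding"])
    target["lstm"] = copy.deepcopy(source["lstm"])
    return target


mapping_model = new_model(num_classes=2, seed=0)                    # {CPU, GPU}
coarsening_model = transfer(mapping_model, num_classes=6, seed=1)   # {1, 2, 4, 8, 16, 32}
print(predict(mapping_model, [0, 1, 2, 3]))
print(predict(coarsening_model, [0, 1, 2, 3]))
```

In this sketch only the heuristic model differs between the two tasks, so the second model starts from the first model's representation of code rather than from scratch.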

We compare the effectiveness of our automatically generated heuristics against ones with features hand-picked by experts. We examine two challenging tasks: predicting optimal mapping for heterogeneous parallelism and GPU thread coarsening factors. In 89% of the cases, the quality of our fully automatic heuristics matches or surpasses that of state-of-the-art predictive models using hand-crafted features, providing on average 14% and 12% more performance with no human effort expended on designing features.
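
As a worked check on these figures, the sketch below converts the mean speedups shown on the talk's result slides (2.38x against 2.09x for heterogeneous mapping; 1.06x, or 1.12x with transfer learning, against 1.01x for thread coarsening) into relative improvements. The function name relative_improvement is hypothetical, and the inputs are the rounded values printed on the slides, so the results are approximate. That rounding may be why the thread coarsening figure comes out nearer 11% than the 12% quoted above; this explanation is an assumption.

```python
def relative_improvement(new_speedup, baseline_speedup):
    """Percentage by which new_speedup exceeds baseline_speedup."""
    return (new_speedup / baseline_speedup - 1.0) * 100.0


# Speedups as printed on the slides (rounded to two decimals).
results = {
    "heterogeneous mapping": relative_improvement(2.38, 2.09),
    "thread coarsening": relative_improvement(1.06, 1.01),
    "thread coarsening, transfer learning": relative_improvement(1.12, 1.01),
}

for task, gain in results.items():
    print(f"{task}: {gain:.1f}%")
```

The ratio of speedups is used, rather than their difference, because both speedups are measured against the same baseline.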

Transcript

  1. Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh)
  2. Compilers are very complex: hundreds, thousands, millions of choices. Source: int main( int argc, char** arg) {... Assembly: _main: .cfi_startproc ## BB#0: pushq %rbp ... Hand-coded heuristics (out of date by time of release)
  3. Machine learning in compilers. Diagram labels: Training Programs, Driver, Feature Extractor, Feature Vectors, Best Decisions, Training Data (x3), Optimization Heuristic
  4. Machine learning in compilers. Diagram labels: Training Programs, Driver, Feature Extractor, Feature Vectors, Best Decisions, Training Data (x3), Optimization Heuristic. Annotation: the human bit! 1. hard to get right 2. time consuming 3. repetitious
  5. Learned Heuristic in feature space. Axes: Feature “X”, Feature “Y”. Regions: Use a GPU, Use a CPU
  6. Learned Heuristic in feature space. Axes: Feature “X”, Feature “Y”. Regions: Use a GPU, Use a CPU. Annotation: need good features!
  7. Ways to fail: irrelevant (e.g. not capturing the right information); incomplete (e.g. missing critical information); unsuitable (e.g. wrong combination of features / model)
  8. What we have. Diagram labels: Training Programs, Driver, Feature Extractor, Feature Vectors, Best Decisions, Training Data (x3), Predictive Model
  9. Our approach: Deep Learning maps Program Code (int main(int argc, char **argv) { ...) to an Optimization Decision
  10. Our approach: Deep Learning maps Program Code (int main(int argc, char **argv) { ...) to an Optimization Decision. Preprocessing: Code in, Rewriter, Encoder. Rewriter: normalize identifiers & code style: 1. var/fun names: ‘foo’, ‘bar’, … to ‘a’, ‘b’, … 2. sanitize whitespace 3. consistent use of optional braces. Encoder: encode as sequence of vocabulary indices; vocabulary table for characters + lang keywords
  11. Our approach: Program Code (Code in), Rewriter, Encoder, Embedding, Language Model, Heuristic Model, Optimization Decision. Embedding: map vocab indices into real space. Language Model: summarize sequence as vector (2 layer LSTM network). Heuristic Model: predict optimization on vector (2 layer DNN)
  12. Prior Art: decision space and model. Heterogeneous Mapping (CGO’13): binary classification {CPU, GPU}; Decision Tree. Thread Coarsening (PACT’14): one-of-six classification {1, 2, 4, 8, 16, 32}; Cascading Neural Networks
  13. Prior Art: features. Heterogeneous Mapping (CGO’13): 4 features, combined from 7 raw values; instruction counts / ratios. Thread Coarsening (PACT’14): 7 features, principal components of 34 raw values; instruction counts / ratios / relative deltas. 2 papers!
  14. Our Approach, for both Heterogeneous Mapping and Thread Coarsening (input: int main(int argc ...): 1. Use the same model design for both 2. No tweaking of parameters 3. Minimum change - 3 line diff
  15. Prior Art: hardware and training programs. Heterogeneous Mapping (CGO’13): 2x CPU-GPU architectures; 7 Benchmark Suites. Thread Coarsening (PACT’14): 4x GPU architectures; 3 Benchmark Suites
  16. 14% and 5% improvements over state-of-the-art. Speedup, Heterogeneous Mapping: 2.38x (DeepTune), 2.09x (State-of-the-art). Speedup, Thread Coarsening: 1.06x (DeepTune), 1.01x (State-of-the-art). Legend: State-of-the-art, DeepTune, w. Transfer Learning
  17. 14% and 5% improvements over state-of-the-art. Speedup, Heterogeneous Mapping (256 benchmarks): 2.38x (DeepTune), 2.09x (State-of-the-art). Speedup, Thread Coarsening (17 benchmarks): 1.06x (DeepTune), 1.01x (State-of-the-art). Legend: State-of-the-art, DeepTune, w. Transfer Learning
  18. Transfer Learning. Heterogeneous Mapping: Embedding, Language Model, Heuristic Model. Thread Coarsening: Embedding, Language Model, Heuristic Model. Labels: general, specialized
  19. Transfer Learning. Heterogeneous Mapping: Embedding, Language Model, Heuristic Model. Thread Coarsening: Embedding, Language Model, Heuristic Model. Labels: general, specialized. Annotation: initialize with values
  20. 14% and 5% improvements over state-of-the-art. Speedup, Heterogeneous Mapping: 2.38x (DeepTune), 2.09x (State-of-the-art). Speedup, Thread Coarsening: 1.06x (DeepTune), 1.01x (State-of-the-art). Legend: State-of-the-art, DeepTune, w. Transfer Learning
  21. 14% and 11% improvements over state-of-the-art. Speedup, Heterogeneous Mapping: 2.38x (DeepTune), 2.09x (State-of-the-art). Speedup, Thread Coarsening: 1.12x (w. Transfer Learning), 1.06x (DeepTune), 1.01x (State-of-the-art)
  22. Try it for yourself! http://chriscummins.cc/pact17 (code and data on GitHub; runs in the browser). PACT Artifact Evaluated badge: Consistent, Complete, Well Documented, Easy to Reuse
  23. End-to-end Deep Learning of Optimisation Heuristics. Problem: feature design is hard. Featureless heuristics. First cross-domain learning. 11-14% speedups. http://chriscummins.cc/pact17
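
The preprocessing steps listed on slide 10 (rewriter, then encoder) can be sketched as below. This is a minimal regex-based illustration under stated assumptions, not the authors' implementation: the KEYWORDS set and the names rewrite, tokenize, build_vocab and encode are hypothetical, a real rewriter would use a proper parser, and the third normalization step (consistent use of optional braces) is left out.

```python
import re

# Hypothetical, deliberately small keyword set for illustration only.
KEYWORDS = {"int", "float", "void", "char", "const", "return",
            "if", "else", "for", "while"}


def rewrite(source):
    """Rename identifiers to 'a', 'b', ... in order of first use and
    collapse every run of whitespace into a single space."""
    names = {}

    def rename(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word
        if word not in names:
            # Assumes fewer than 27 distinct identifiers.
            names[word] = chr(ord("a") + len(names))
        return names[word]

    renamed = re.sub(r"\b[A-Za-z_]\w*", rename, source)
    return re.sub(r"\s+", " ", renamed).strip()


def tokenize(source):
    """Keep language keywords whole; split everything else into characters."""
    tokens = []
    for piece in re.findall(r"[A-Za-z_]\w*|.", source, flags=re.DOTALL):
        if piece in KEYWORDS:
            tokens.append(piece)
        else:
            tokens.extend(piece)
    return tokens


def build_vocab(sources):
    """Assign each distinct token the next free index, in order of first use."""
    vocab = {}
    for source in sources:
        for token in tokenize(source):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(source, vocab):
    """Encode source as a sequence of vocabulary indices."""
    return [vocab[token] for token in tokenize(source)]


raw = "int  foo(int bar) {\n  return bar + foo(bar);\n}"
clean = rewrite(raw)
vocab = build_vocab([clean])
encoded = encode(clean, vocab)
print(clean)
print(encoded)
```

Building the vocabulary from characters plus keywords keeps it small and closed, so any rewritten program can be encoded as long as its characters were seen when the vocabulary was built.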