
End-to-end Deep Learning of Optimization Heuristics (PACT'17)

Chris Cummins
September 12, 2017

Paper: https://github.com/ChrisCummins/paper-end2end-dl

Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quality of the features used. These features must be hand-crafted by developers through a combination of expert domain knowledge and trial and error. This makes the quality of the final model directly dependent on the skill and available time of the system architect.

Our work introduces a better way for building heuristics. We develop a deep neural network that learns heuristics over raw code, entirely without using code features. The neural network simultaneously constructs appropriate representations of the code and learns how best to optimize, removing the need for manual feature creation. Further, we show that our neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models, without the help of human experts.

We compare the effectiveness of our automatically generated heuristics against ones with features hand-picked by experts. We examine two challenging tasks: predicting optimal mapping for heterogeneous parallelism and GPU thread coarsening factors. In 89% of the cases, the quality of our fully automatic heuristics matches or surpasses that of state-of-the-art predictive models using hand-crafted features, providing on average 14% and 12% more performance with no human effort expended on designing features.

Transcript

  1. End-to-end Deep Learning of Optimization Heuristics
    http://chriscummins.cc/pact17

  2. Chris Cummins (University of Edinburgh)
    Pavlos Petoumenos (University of Edinburgh)
    Zheng Wang (Lancaster University)
    Hugh Leather (University of Edinburgh)

  3. Compilers are very complex
    Hand-coded heuristics make hundreds, thousands, millions of choices
    (e.g. from "int main(int argc, char** argv) {..." down to
    "_main: .cfi_startproc ## BB#0: pushq %rbp ...")
    and are out of date by the time of release

  4. Machine learning in compilers
    y = f(x)
    A model f maps features x (derived from IR) to an optimization decision y

  5. Machine learning in compilers
    Training Programs -> Feature Extractor -> Feature Vectors
    Training Programs -> Driver -> Best Decisions
    (Feature Vectors + Best Decisions) -> Training Data -> Optimization Heuristic
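The pipeline on this slide can be sketched in a few lines of Python. This is a minimal, hypothetical example, not any model from the talk: invented feature names and values, with a toy 1-nearest-neighbor classifier standing in for the learned heuristic (the actual prior-art models are decision trees and neural networks).

```python
# Toy feature-based heuristic: memorize (feature vector, best decision)
# pairs gathered by the driver, then predict by nearest neighbor.
# All feature names and numbers below are invented for illustration.

def train(examples):
    """examples: list of (feature_vector, best_decision) pairs
    collected by running training programs through the driver."""
    return list(examples)  # a 1-NN "model" just memorizes its training data

def predict(model, features):
    """Return the decision of the closest training point."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, decision = min(model, key=lambda ex: dist(ex[0], features))
    return decision

# Hypothetical features: (compute ops, memory ops, data transfer size)
training_data = [
    ((120.0, 10.0, 1.0), "GPU"),   # compute-heavy kernel: GPU won
    ((5.0, 40.0, 90.0), "CPU"),    # transfer-dominated kernel: CPU won
]
model = train(training_data)
print(predict(model, (100.0, 8.0, 2.0)))   # -> GPU
```

The point of the sketch is the data flow, not the model: the hard part, which the next slide calls out, is choosing the feature vector in the first place.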

  6. Machine learning in compilers
    Same pipeline, but the Feature Extractor is the human bit!
    1. hard to get right
    2. time consuming
    3. repetitious

  7. [Figure: a learned heuristic divides a 2D feature space
    (Feature "X" vs Feature "Y") into "Use a CPU" and "Use a GPU" regions]

  8. [Figure: the same CPU/GPU feature-space plot]
    The learned boundary is only as good as the feature space:
    need good features!

  9. Ways to fail
    irrelevant: e.g. not capturing the right information
    incomplete: e.g. missing critical information
    unsuitable: e.g. wrong combination of features / model

  10. What we have
    Training Programs -> Feature Extractor -> Feature Vectors
    Training Programs -> Driver -> Best Decisions
    (Feature Vectors + Best Decisions) -> Training Data -> Predictive Model

  11. What we need
    Training Programs -> Driver -> Best Decisions
    (Programs + Best Decisions) -> Training Data -> Predictive Model

  12. Contributions
    Heuristics without features
    Beats expert approach
    Learning across heuristics

  13. Our approach
    Program Code ("int main(int argc, char **argv) { ...")
    -> Deep Learning -> Optimization Decision

  14. Our approach: preprocessing
    Code in -> Rewriter -> Encoder -> Deep Learning -> Optimization Decision
    Rewriter: normalize identifiers & code style
      1. var/fun names: 'foo', 'bar', ... to 'a', 'b', ...
      2. sanitize whitespace
      3. consistent use of optional braces
    Encoder: encode as a sequence of vocabulary indices, using a
      vocabulary table of characters + language keywords
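The two preprocessing steps can be illustrated with a toy sketch. This is not DeepTune's actual rewriter (which is a proper source-to-source tool); the keyword set, renaming rule, and greedy encoding below are simplified assumptions for illustration.

```python
# Toy rewriter + encoder. Assumptions: a tiny keyword set, regex-based
# identifier detection, and greedy longest-match keyword encoding.
import re
import string

KEYWORDS = {"int", "char", "return", "if", "else", "for", "while", "void"}

def rewrite(code):
    """Normalize identifiers: rename each var/fun name to 'a', 'b', ...
    in order of first appearance, and collapse runs of whitespace."""
    names = {}
    def rename(m):
        tok = m.group(0)
        if tok in KEYWORDS:
            return tok
        if tok not in names:
            names[tok] = string.ascii_lowercase[len(names) % 26]
        return names[tok]
    code = re.sub(r"[A-Za-z_]\w*", rename, code)
    return re.sub(r"\s+", " ", code).strip()

def encode(code):
    """Encode as vocabulary indices: language keywords get their own
    vocabulary entries, everything else is encoded character by character."""
    vocab = sorted(KEYWORDS) + sorted(set(code))
    index = {tok: i for i, tok in enumerate(vocab)}
    out, i = [], 0
    while i < len(code):
        kw = next((k for k in KEYWORDS if code.startswith(k, i)), None)
        if kw:
            out.append(index[kw]); i += len(kw)
        else:
            out.append(index[code[i]]); i += 1
    return out, vocab

src = "int   myCount = fooBar(myCount);"
norm = rewrite(src)
print(norm)  # -> int a = b(a);
seq, vocab = encode(norm)
```

Note how normalization makes the two uses of `myCount` map to the same token `a`, so semantically identical programs with different naming styles produce identical input sequences.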

  15. Our approach
    Code in -> Rewriter -> Encoder -> Embedding -> Language Model
      -> Heuristic Model
    Embedding: map vocab indices into real space
    Language Model: summarize the sequence as a vector (2 layer LSTM network)
    Heuristic Model: predict the optimization from that vector (2 layer DNN)
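A shape-level sketch of this stack, with random untrained weights: the dimensions are illustrative rather than the paper's, and the 2-layer LSTM is stood in for by a mean-pooling placeholder, since the point here is only how data flows from vocabulary indices to a decision.

```python
# Shape sketch of an embedding -> sequence summary -> decision stack.
# Dimensions and weights are arbitrary placeholders, not DeepTune's.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMBED, HIDDEN, CLASSES = 128, 64, 64, 2

embedding = rng.normal(size=(VOCAB, EMBED))  # vocab index -> real vector

def language_model(embedded):
    # Placeholder for the 2-layer LSTM: summarize the sequence as one vector.
    return embedded.mean(axis=0)             # (seq_len, EMBED) -> (EMBED,)

W1 = rng.normal(size=(EMBED, HIDDEN))        # heuristic model: 2-layer DNN
W2 = rng.normal(size=(HIDDEN, CLASSES))

def heuristic_model(v):
    h = np.maximum(W1.T @ v, 0)              # hidden layer with ReLU
    logits = W2.T @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # probability per decision

seq = np.array([3, 17, 42, 17, 99])          # encoded program tokens
embedded = embedding[seq]                    # (5, 64): one vector per token
probs = heuristic_model(language_model(embedded))
print(embedded.shape, probs.shape)           # -> (5, 64) (2,)
```

For the binary heterogeneous-mapping task the two output probabilities would correspond to the {CPU, GPU} decision; swapping the head to six outputs covers the coarsening factors.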

  16. Our approach
    Program Code -> Rewriter -> Encoder -> Embedding -> Language Model
      -> Heuristic Model -> Optimization Decision

  17. How does it work?


  18. [image-only slide]

  19. How does it work?
    well

  20. Prior Art
    Heterogeneous Mapping: Grewe et al., CGO'13
    Thread Coarsening: Magni et al., PACT'14

  21. Prior Art
    Decision Space:
      CGO'13: binary classification {CPU, GPU}
      PACT'14: one-of-six classification {1, 2, 4, 8, 16, 32}
    Model:
      CGO'13: Decision Tree
      PACT'14: Cascading Neural Networks

  22. Prior Art: Features (2 papers!)
    CGO'13: 4 features, combined from 7 raw values.
      Instruction counts / ratios.
    PACT'14: 7 features, Principal Components of 34 raw values.
      Instruction counts / ratios / relative deltas.

  23. Our Approach (same raw code input for Heterogeneous Mapping
    and Thread Coarsening)
    1. Use the same model design for both
    2. No tweaking of parameters
    3. Minimum change: 3 line diff

  24. Prior Art
    Hardware:
      CGO'13: 2x CPU-GPU architectures
      PACT'14: 4x GPU architectures
    Training Programs:
      CGO'13: 7 Benchmark Suites
      PACT'14: 3 Benchmark Suites

  25. Results

  26. 14% and 5% improvements over state-of-the-art
    Heterogeneous Mapping speedup: 2.09x (state-of-the-art) vs 2.38x (DeepTune)
    Thread Coarsening speedup: 1.01x (state-of-the-art) vs 1.06x (DeepTune)

  27. 14% and 5% improvements over state-of-the-art
    Heterogeneous Mapping (256 benchmarks): 2.09x (state-of-the-art)
      vs 2.38x (DeepTune)
    Thread Coarsening (17 benchmarks): 1.01x (state-of-the-art)
      vs 1.06x (DeepTune)

  28. Transfer Learning (Heterogeneous Mapping & Thread Coarsening)
    Two copies of the model stack (Embedding + Language Model
      + Heuristic Model):
    one trained as a general model, one specialized to the target heuristic

  29. Transfer Learning (Heterogeneous Mapping & Thread Coarsening)
    Initialize the specialized model with values learned by the
    general model
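The transfer step can be sketched with plain dicts standing in for real networks: copy the general components from the model trained on one heuristic, and leave the task-specific head freshly initialized. The component names and seed values here are hypothetical placeholders.

```python
# Toy transfer-learning sketch: dicts of "weights" in place of networks.
# Component names and values are hypothetical, for illustration only.

def init_model(seed):
    return {
        "embedding": [seed] * 4,        # general: learned token vectors
        "language_model": [seed] * 8,   # general: learned sequence summarizer
        "heuristic_model": [0.0] * 2,   # specialized: per-task decision head
    }

def transfer(src, dst):
    """Initialize dst's general components with src's trained values."""
    for part in ("embedding", "language_model"):
        dst[part] = list(src[part])     # copy trained weights over
    return dst

het_mapping = init_model(seed=0.5)      # pretend this was trained on task 1
het_mapping["heuristic_model"] = [1.0, -1.0]

coarsening = transfer(het_mapping, init_model(seed=0.0))
print(coarsening["embedding"][0])       # -> 0.5 (inherited from task 1)
print(coarsening["heuristic_model"])    # -> [0.0, 0.0] (fresh head)
```

The design intuition, per the slides, is that the embedding and language model capture general properties of code, so a heuristic with little training data (here, thread coarsening with 17 benchmarks) can reuse what was learned on a better-resourced task.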

  30. 14% and 5% improvements over state-of-the-art
    Heterogeneous Mapping speedup: 2.09x (state-of-the-art) vs 2.38x (DeepTune)
    Thread Coarsening speedup: 1.01x (state-of-the-art) vs 1.06x (DeepTune)

  31. 14% and 11% improvements over state-of-the-art
    Heterogeneous Mapping speedup: 2.09x (state-of-the-art) vs 2.38x (DeepTune)
    Thread Coarsening speedup: 1.01x (state-of-the-art) vs 1.06x (DeepTune)
      vs 1.12x (DeepTune w. Transfer Learning)

  32. Try it for yourself!
    http://chriscummins.cc/pact17
    code and data on GitHub
    runs in the browser
    [PACT Artifact Evaluation badges: Consistent, Complete,
    Well Documented, Easy to Reuse, Evaluated]

  33. End-to-end Deep Learning of Optimisation Heuristics
    Problem: feature design is hard
    Featureless heuristics
    First cross-domain learning
    11-14% speedups
    http://chriscummins.cc/pact17