End-to-end Deep Learning of Optimization Heuristics
http://chriscummins.cc/pact17
Slide 2
Chris Cummins, Pavlos Petoumenos, Hugh Leather (University of Edinburgh)
Zheng Wang (Lancaster University)
Slide 3
Compilers are very complex. Hand-coded heuristics drive hundreds, thousands, millions of choices (and are out of date by the time of release).

    int main(int argc, char** argv) {...

    _main:
        .cfi_startproc
    ## BB#0:
        pushq %rbp
        ...
Slide 4
Machine learning in compilers: y = f(x), where x is a vector of features (derived from the IR), f is the model, and y is the optimization decision.
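For illustration only, a feature-based heuristic of this kind could be sketched with a decision tree over two hypothetical features (made-up compute and memory instruction counts); this is not the model used by the prior work discussed later:

    # Sketch: y = f(x) with hand-crafted features and made-up training values.
    from sklearn.tree import DecisionTreeClassifier

    # x: features derived from the IR, e.g. [compute instructions, memory instructions]
    X = [[120, 30], [40, 200], [300, 10], [25, 180]]
    # y: best decision observed for each training program
    y = ["GPU", "CPU", "GPU", "CPU"]

    model = DecisionTreeClassifier().fit(X, y)   # learn f
    print(model.predict([[90, 60]]))             # y = f(x) for a new program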
Slide 5
Machine learning in compilers: training. Training programs feed a driver, which determines the best decisions, and a feature extractor, which produces feature vectors; the paired (feature vector, best decision) examples form the training data from which an optimization heuristic is learned.
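A rough sketch of the driver stage, assuming hypothetical extract_features() and benchmark() helpers (not any real API): try every decision on each training program, keep the fastest, and pair it with the program's feature vector.

    # Sketch of the driver loop; extract_features() and benchmark() are
    # hypothetical helpers standing in for real instrumentation.
    def collect_training_data(programs, decisions):
        training_data = []
        for program in programs:
            features = extract_features(program)              # feature vector x
            runtimes = {d: benchmark(program, d) for d in decisions}
            best = min(runtimes, key=runtimes.get)            # best decision y
            training_data.append((features, best))
        return training_data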
Slide 6
Machine learning in compilers: training (same pipeline). The feature extractor is the human bit! Designing features is:
1. hard to get right
2. time consuming
3. repetitious
Slide 7
[Scatter plot: feature space with axes Feature "X" and Feature "Y"; a learned heuristic separates the "Use a GPU" region from the "Use a CPU" region.]
Slide 8
[Same feature-space plot.] To draw that boundary, we need good features!
Slide 9
Ways to fail:
irrelevant - e.g. not capturing the right information
incomplete - e.g. missing critical information
unsuitable - e.g. wrong combination of features / model
Slide 10
What we have: Training Programs -> Driver + Feature Extractor -> Feature Vectors and Best Decisions -> Training Data -> Predictive Model.
Slide 11
What we need: Training Programs -> Driver -> Best Decisions -> Training Data -> Predictive Model, with no feature extractor in the loop.
Slide 12
Contributions:
1. Heuristics without features
2. Beats expert approach
3. Learning across heuristics
Slide 13
Our approach: Program Code -> Deep Learning -> Optimization Decision.

    int main(int argc, char **argv) { ...
Slide 14
Our approach: preprocessing. Code in -> Rewriter -> Encoder.

Rewriter - normalize identifiers & code style:
1. variable/function names: 'foo', 'bar', ... to 'a', 'b', ...
2. sanitize whitespace
3. consistent use of optional braces

Encoder - encode as a sequence of vocabulary indices, using a vocabulary table of characters + language keywords.
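A minimal sketch of this preprocessing in Python, using a toy keyword list and regex-based renaming purely for illustration (a real rewriter would work on the parsed code and also normalize braces):

    import re
    import string

    KEYWORDS = ("int", "return", "for", "if")   # toy vocabulary of language keywords

    def rewrite(src):
        # 1. rename variable/function names to 'a', 'b', 'c', ...
        names = []
        for name in re.findall(r"\b[a-zA-Z_]\w*\b", src):
            if name not in KEYWORDS and name not in names:
                names.append(name)
        for new, old in zip(string.ascii_lowercase, names):
            src = re.sub(r"\b%s\b" % re.escape(old), new, src)
        # 2. sanitize whitespace
        return " ".join(src.split())

    def encode(src):
        # hybrid vocabulary: keywords first, single characters added on demand
        vocab = {kw: i for i, kw in enumerate(KEYWORDS)}
        indices, i = [], 0
        while i < len(src):
            for kw in KEYWORDS:
                if src.startswith(kw, i):
                    indices.append(vocab[kw])
                    i += len(kw)
                    break
            else:
                indices.append(vocab.setdefault(src[i], len(vocab)))
                i += 1
        return indices

    print(encode(rewrite("int foo(int bar) { return bar * 2; }")))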
Slide 15
Our approach: the model. Code in -> Rewriter -> Encoder -> Embedding -> Language Model -> Heuristic Model.

Embedding - map vocabulary indices into real space.
Language Model - summarize the sequence as a vector (2-layer LSTM network).
Heuristic Model - predict the optimization from that vector (2-layer DNN).
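In Keras, that stack might be sketched as follows; the vocabulary size, embedding width and layer sizes are placeholders, not the hyperparameters used in the paper:

    # Sketch of the stack: Embedding -> 2-layer LSTM -> 2-layer DNN.
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    model = Sequential([
        Embedding(input_dim=128, output_dim=64),  # vocab indices -> real space
        LSTM(64, return_sequences=True),          # language model, layer 1
        LSTM(64),                                 # language model, layer 2 -> one vector
        Dense(32, activation="relu"),             # heuristic model, layer 1
        Dense(2, activation="softmax"),           # heuristic model, layer 2: the decision
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")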
Slide 16
Our approach, end to end: Program Code -> Rewriter -> Encoder -> Embedding -> Language Model -> Heuristic Model -> Optimization Decision.
Slide 17
How does it work?
Slide 18
Slide 19
How does it work?
well
Slide 20
Heterogeneous Mapping / Thread Coarsening: prior art.
Heterogeneous Mapping: Grewe et al., CGO'13.
Thread Coarsening: Magni et al., PACT'14.
Slide 21
Prior art: decision space and model.
CGO'13 (Heterogeneous Mapping): binary classification {CPU, GPU}; decision tree.
PACT'14 (Thread Coarsening): one-of-six classification {1, 2, 4, 8, 16, 32}; cascading neural networks.
Slide 22
Prior art: features.
CGO'13 (Heterogeneous Mapping): 4 features, combined from 7 raw values; instruction counts / ratios.
PACT'14 (Thread Coarsening): 7 features, principal components of 34 raw values; instruction counts / ratios / relative deltas.
Two papers' worth of feature engineering!
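As a generic illustration of that second feature pipeline (not Magni et al.'s actual counters or values), reducing raw counts to principal components looks roughly like this:

    # Generic sketch: compress raw instruction-count features into principal components.
    import numpy as np
    from sklearn.decomposition import PCA

    raw = np.array([            # rows: kernels, columns: raw counters (made-up values)
        [120, 30, 4, 0.25],
        [40, 200, 8, 5.00],
        [300, 10, 2, 0.03],
    ])
    features = PCA(n_components=2).fit_transform(raw)   # reduced feature vectors
    print(features)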
Slide 23
Our approach: feed the same program code to both tasks.
1. Use the same model design for both
2. No tweaking of parameters
3. Minimum change - a 3 line diff
[Bar chart] 14% and 5% improvements over state-of-the-art. Speedup, Heterogeneous Mapping: state-of-the-art 2.09x, DeepTune 2.38x. Speedup, Thread Coarsening: state-of-the-art 1.01x, DeepTune 1.06x. (Legend: State-of-the-art / DeepTune / DeepTune w. Transfer Learning.)
Slide 27
[Bar chart, repeated] 14% and 5% improvements over state-of-the-art. Heterogeneous Mapping (256 benchmarks): state-of-the-art 2.09x, DeepTune 2.38x. Thread Coarsening (17 benchmarks): state-of-the-art 1.01x, DeepTune 1.06x.
Slide 28
Heterogeneous Mapping / Thread Coarsening: Transfer Learning. Two DeepTune stacks (Embedding -> Language Model -> Heuristic Model): a general one for Heterogeneous Mapping and a specialized one for Thread Coarsening.
Slide 29
Transfer Learning: train the general (Heterogeneous Mapping) stack first, then initialize the specialized (Thread Coarsening) stack with the values it learned.
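A sketch of that transfer step in Keras, reusing the shape of the earlier architecture sketch; the sizes, the dummy data and the hypothetical build_model() are for illustration only:

    # Sketch: train a general model, then copy its embedding + language-model
    # weights into a specialized model before training on the second task.
    import numpy as np
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    def build_model(n_classes):
        return Sequential([
            Embedding(input_dim=128, output_dim=64),
            LSTM(64, return_sequences=True),
            LSTM(64),                                  # language model
            Dense(32, activation="relu"),
            Dense(n_classes, activation="softmax"),    # task-specific heuristic model
        ])

    general = build_model(n_classes=2)      # heterogeneous mapping: {CPU, GPU}
    general.compile("adam", "sparse_categorical_crossentropy")
    general.fit(np.random.randint(128, size=(8, 100)),
                np.random.randint(2, size=8), epochs=1, verbose=0)

    specialized = build_model(n_classes=6)  # thread coarsening: {1, 2, 4, 8, 16, 32}
    specialized.compile("adam", "sparse_categorical_crossentropy")
    specialized.build(input_shape=(None, 100))
    # transfer learning: initialize embedding and LSTM layers with learned values
    for src, dst in zip(general.layers[:3], specialized.layers[:3]):
        dst.set_weights(src.get_weights())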
Slide 30
[Bar chart, repeated] Before transfer learning: 14% and 5% improvements over state-of-the-art (Heterogeneous Mapping 2.38x vs 2.09x; Thread Coarsening 1.06x vs 1.01x).
Slide 31
[Bar chart] With transfer learning: 14% and 11% improvements over state-of-the-art. Heterogeneous Mapping: state-of-the-art 2.09x, DeepTune 2.38x. Thread Coarsening: state-of-the-art 1.01x, DeepTune 1.06x, DeepTune w. Transfer Learning 1.12x.
Slide 32
Try it for yourself! http://chriscummins.cc/pact17 - code and data on GitHub; runs in the browser.
[PACT Artifact Evaluation (AEC) stamp: Consistent * Complete * Well Documented * Easy to Reuse * Evaluated.]
Slide 33
Summary:
Problem: feature design is hard.
Featureless heuristics.
First cross-domain learning.
11-14% speedups.

End-to-end Deep Learning of Optimization Heuristics
http://chriscummins.cc/pact17