Go inliner: past, present and the future
Backend
Iskander (Alex) Sharipov
Go contributor, open source enthusiast
DevFest 2018, Novosibirsk, 23-25 November, 2018
The only slide
Problem: function calls are slow.
Solution: just inline them.
Fin. Problem solved.
The only slide (not really)
*Inlined everything*
Now we have two problems:
1. Overall performance degraded.
2. Binary became larger.
High-level overview
Inlining dichotomy
No inlining makes code run slow.
Too much inlining makes code run slow and binaries become bigger.
Main parts of the inliner
1) Cost model. Evaluates the approximate “code size” of the function body.
2) Decision making. Tells whether a function is inlineable in general or at a particular call site.
3) Inlining algorithm and trade-off handling. How inlining is performed and when it’s not supported.
Inliner: the good parts
(Before we start discussing everything else)
The compiler can inline function calls between packages. This makes packages that provide convenience wrappers around inlineable functions almost free.
func canInlineFuncLitCall() int {
	add1 := func(x int) int {
		return x + 1
	}
	return add1(10) // Inlined
}
// Optimized to return 11
func canInlineClosureCall() int {
	x := 10
	add1 := func() int {
		return x + 1
	}
	return add1() // “return 11”
}
func canInlineExplicitPanic() {
	panic("can be inlined")
}
// Implicit panics are also
// handled. They can occur in
// slice indexing expressions.
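A minimal sketch of the implicit-panic case. The names `first` and `safeFirst` are hypothetical and only illustrate that a slice index can panic without any explicit `panic` call. Whether `first` is actually inlined depends on the compiler version and can be checked with `go build -gcflags=-m`.

```go
package main

import "fmt"

// first is a hypothetical small function. The expression xs[0]
// carries an implicit bounds check that panics when xs is empty.
func first(xs []int) int {
	return xs[0]
}

// safeFirst calls first and converts the implicit panic into ok=false.
func safeFirst(xs []int) (v int, ok bool) {
	defer func() {
		if recover() != nil {
			v, ok = 0, false
		}
	}()
	return first(xs), true
}

func main() {
	fmt.Println(safeFirst([]int{7, 8})) // 7 true
	fmt.Println(safeFirst(nil))         // 0 false
}
```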
We almost have it...
Mid-stack inlining (inlining of non-leaf calls)
It still causes significant code bloat, but it is getting better.
https://golang.org/issue/19348
Function “cost” calculation
Cost is a function over syntax
More syntactic elements mean a higher cost. Sometimes one can rewrite code so that it has a lower cost but operates in the same way.
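To illustrate the idea of cost as a function over syntax, here is a toy sketch that simply counts AST nodes with `go/ast`. It is not the compiler's cost model (the real inliner works on its own representation and weighs nodes differently). The function `toyCost` and the sample functions `verbose` and `terse` are made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// toyCost returns the number of AST nodes in each function body of src.
// This is an illustration only, not the compiler's real cost function.
func toyCost(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	costs := make(map[string]int)
	for _, decl := range f.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		n := 0
		ast.Inspect(fn.Body, func(node ast.Node) bool {
			if node != nil {
				n++ // every visited node adds one unit of "cost"
			}
			return true
		})
		costs[fn.Name.Name] = n
	}
	return costs, nil
}

// Two functions with the same behavior but different syntax.
const src = `package p

func verbose(x int) int {
	y := x + 1
	return y
}

func terse(x int) int {
	return x + 1
}
`

func main() {
	costs, err := toyCost(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("verbose:", costs["verbose"], "terse:", costs["terse"])
}
```

Both functions compute the same result, yet `verbose` gets a higher toy cost because the extra assignment adds syntax nodes.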
Static vs dynamic cost
The inliner is very static. If the cost is calculated and it is over the budget, the function is never an inlining candidate.
Static approach disadvantage
Even if a function could be further optimized, or even constant-folded when called with constant arguments, it just won’t happen.
xs = append(xs, 1) // cost=5
x := 1 // cost=5
// Builtin costs are weird.
//
// Budget of 80 with appends
// leads to >1kb of x86 code!
The cost of wrapping
A forwarding function has no execution-time cost of its own, but every inlining increases the total cost by introducing extra AST nodes.
Inlining dichotomy
It looks like there are so many things to improve...
Yet most of these paths lead to performance regressions.
https://golang.org/issue/17566
“Extra budget” experiment
Hypothesis 1/2
Inlining matters more inside loops. Constant arguments also make a call a better candidate for inlining.
Hypothesis 2/2
A conditional branch that is not known to be “likely taken” makes inlining less beneficial.
Proposed solution
Increase the maximum inlining budget. The budget starts from the base value and can change as the inliner follows the program control flow.
func f(x, y int) {
	// Does something smart.
}
// Imagine that f has a cost of
// 95. Base budget is 80, so we
// can’t inline it normally.
// Base budget is 80.
for _, x := range xs { // 90
	for _, y := range ys { // 100
		f(x, y) // Can inline!
	} // budget is 90 again
} // budget is 80 again
for _, x := range xs { // 90
	for _, y := range ys { // 100
		if x + y < 100 { // 90
			f(x, y) // Won’t inline
		} // budget is 100 again
	} // budget is 90 again
} // budget is 80 again
// Constant arguments effect.
var x, y int
f(x, y) // 80, won’t inline
f(10, y) // 90, won’t inline
f(10, 20) // 100, can inline!
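The budget arithmetic from these slides can be sketched as a small model. This is not the experiment's actual implementation: the `callSite` type, the function names, and the uniform step of 10 are assumptions read off the numbers shown above (80 base, +10 per enclosing loop or constant argument, -10 per enclosing branch).

```go
package main

import "fmt"

const (
	baseBudget = 80 // base budget used on the slides
	step       = 10 // assumed adjustment, inferred from the slide numbers
)

// callSite is a hypothetical description of the context of one call.
type callSite struct {
	loopDepth   int // number of enclosing loops
	branchDepth int // enclosing branches not known to be likely taken
	constArgs   int // number of constant arguments
}

// budget returns the adjusted inlining budget for a call site.
func budget(cs callSite) int {
	return baseBudget + step*(cs.loopDepth+cs.constArgs) - step*cs.branchDepth
}

// canInline reports whether a callee with the given cost fits the budget.
func canInline(cost int, cs callSite) bool {
	return cost <= budget(cs)
}

func main() {
	const fCost = 95 // the imagined cost of f from the slides
	sites := []callSite{
		{},                             // plain call: budget 80
		{loopDepth: 2},                 // two nested loops: budget 100
		{loopDepth: 2, branchDepth: 1}, // loops plus an if: budget 90
		{constArgs: 1},                 // f(10, y): budget 90
		{constArgs: 2},                 // f(10, 20): budget 100
	}
	for _, cs := range sites {
		fmt.Printf("%+v budget=%d inline=%v\n", cs, budget(cs), canInline(fCost, cs))
	}
}
```

Under these assumptions only the doubly nested loop and the call with two constant arguments reach a budget of 100, which is enough for a cost of 95.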
Pros & Cons
Pros: simple to implement, doesn’t make compilation slower, avoids code bloat.
Cons: the real benefits, apart from the microbenchmarks, are not apparent. It doesn’t seem to work.
Recent improvements
(The present)
	var total, i int
loop:
	if i == len(xs) {
		return total
	}
	total += xs[i]
	i++
	goto loop
func countSum(xs []int) int {
	var total int
	for _, x := range xs {
		total += x
	}
	return total
}
// Inlineable with CL148777.
Less inlining in big functions
The budget drops from 80 to 20 for very large functions.
See https://golang.org/cl/125516.
func wrapper(x int) int {
	return nonLeafCall(x)
}
// Inlineable with CL147361.
// Functions with a single non-leaf call
// in their bodies can be inlineable now.
// This is part of the mid-stack inlining
// enabling work done by David Chase.
Potential improvements
(The future?)
Additional inlining round(s)
Some functions can become trivially inlineable after SSA optimizations. A second inlining round could inline them.
Arg-based specializations
Collect control flow specializations for constant arguments if they fit the budget. This could solve issues like #27149.
More special-casing for idioms
Constructor functions, for example, can benefit from being inlined, since inlining can let more of the created objects be stack-allocated.
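A small sketch of the constructor case. The `point` type and `newPoint` are hypothetical. The usual reasoning is that a pointer returned from a non-inlined constructor escapes to the heap, while after inlining the escape analysis may see that the object never leaves the caller. Actual decisions can be inspected with `go build -gcflags=-m`.

```go
package main

import "fmt"

// point is a hypothetical small type used only for illustration.
type point struct{ x, y int }

// newPoint is a typical constructor that returns a pointer.
// If it is not inlined, the returned object has to outlive the call.
func newPoint(x, y int) *point {
	return &point{x: x, y: y}
}

// sum uses the constructed value only locally. If newPoint is inlined
// here, p does not need to outlive sum and may stay on the stack.
func sum() int {
	p := newPoint(1, 2)
	return p.x + p.y
}

func main() {
	fmt.Println(sum()) // 3
}
```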
Fundamental problems
Even within one architecture, like AMD64, optimal inliner parameters can vary. Not to mention how different they are for other architectures (instruction cache is one of the concerns).
The lack of compile flags demands “best fit defaults”. Compile time is important in Go, so we can’t perform expensive operations (and the user can’t ask for them).
Calling convention
If the Go calling convention changes from stack-based to register-based, we should reconsider some inliner heuristics.
Pragmatic hints
Don’t rely on inlining (if you can)
If possible, avoid even thinking about inlining and how it will affect your application's performance. It may be irrelevant.
Write idiomatic Go code
Inlining heuristics may depend on some very widely used Go idioms to choose the right thing.
Manual function specialization
Split big CPU-intensive functions into several smaller (inlineable) ones and call the suitable specialization at the call site.
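One way to read this hint, as a sketch with made-up names: keep a general entry point (`apply`) for callers that only know the operation at run time, and expose small specialized functions (`add`, `mul`) that a hot call site can call directly. In real code the general function would be large enough to be over the budget, and the one here is kept tiny for brevity.

```go
package main

import "fmt"

type op int

const (
	opAdd op = iota
	opMul
)

// add and mul are small specializations that are cheap to inline.
func add(a, b int) int { return a + b }
func mul(a, b int) int { return a * b }

// apply is the general version that dispatches on o at run time.
func apply(o op, a, b int) int {
	switch o {
	case opAdd:
		return add(a, b)
	case opMul:
		return mul(a, b)
	}
	panic("unknown op")
}

// sumOfProducts is a hot loop that knows which operations it needs,
// so it calls the specializations directly instead of apply.
// It assumes len(ys) >= len(xs).
func sumOfProducts(xs, ys []int) int {
	total := 0
	for i := range xs {
		total = add(total, mul(xs[i], ys[i]))
	}
	return total
}

func main() {
	fmt.Println(sumOfProducts([]int{1, 2, 3}, []int{4, 5, 6})) // 32
	fmt.Println(apply(opMul, 3, 4))                            // 12
}
```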
Write inlining tests
If it is crucial for some functions to be inlineable, add a test for that.
github.com/Quasilyte/inltest
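The following is not the inltest API, only a rough sketch of the underlying idea: collect the compiler's `-m` diagnostics and check that the functions you care about are reported as inlineable. The helper `inlineable` is hypothetical, and the sample diagnostics in `main` are hand-written in the shape the compiler prints.

```go
package main

import (
	"fmt"
	"strings"
)

// inlineable extracts function names from "can inline NAME" lines
// found in compiler diagnostics text.
func inlineable(output string) map[string]bool {
	const marker = "can inline "
	set := make(map[string]bool)
	for _, line := range strings.Split(output, "\n") {
		i := strings.Index(line, marker)
		if i < 0 {
			continue // "cannot inline ..." lines do not contain the marker
		}
		fields := strings.Fields(line[i+len(marker):])
		if len(fields) > 0 {
			set[fields[0]] = true
		}
	}
	return set
}

func main() {
	// Hand-written sample, not captured from a real build.
	out := "./foo.go:3:6: can inline add\n" +
		"./foo.go:8:6: cannot inline coldPathFunc: marked go:noinline\n"
	set := inlineable(out)
	fmt.Println(set["add"], set["coldPathFunc"]) // true false
}
```

In a real test the text would come from running `go build -gcflags=-m` on the package, for example through `os/exec`, and the test would fail when an expected name is missing from the set.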
You can forbid function inlining.
//go:noinline
func coldPathFunc() T {
	// Inlineable, but never
	// called from a hot path.
}
You can debug inliner decisions.
$ go tool compile -m=2 foo.go
cannot inline coldPathFunc: marked go:noinline
Also works with “go build”:
$ go build -gcflags='-m=2' foo.go
Thanks for your attention
Iskander (Alex) Sharipov
Go contributor, open source enthusiast
@quasilyte