Slide 1

Slide 1 text

HIGH PERFORMANCE RUBY Friday, September 14, 12

Slide 2

Slide 2 text

Hiya
• Charles Oliver Nutter
• [email protected]
• @headius
• JVM language guy at Red Hat (JBoss)

Slide 3

Slide 3 text

Performance?
• Writing code
• Man hours more expensive than CPU hours
• Developer contentedness
• Running code
• Straight line

Slide 4

Slide 4 text

High Performance?
• Faster than...
• ...other Ruby impls?
• ...other language runtimes?
• ...unmanaged languages, like C?
• ...you need it to be?

Slide 5

Slide 5 text

“Fast Enough”
• 1.8.7 was fast enough
• 1.9.3 is fast enough
• Unless it’s not fast enough
• Does it matter?

Slide 6

Slide 6 text

Performance Wall
• Move to a different runtime
• Move to a different language
• ...in whole or part

Slide 7

Slide 7 text

If you’re not writing perf-sensitive code in Ruby, you’re giving up too easily.

Slide 8

Slide 8 text

Native Extensions
• Not universally bad
• Just bad in MRI
• Invasive
• Pointers
• Few guarantees

Slide 9

Slide 9 text

What We Want
• Faster execution
• Better GC
• Parallel execution
• Big data

Slide 10

Slide 10 text

What We Can’t Have
• Faster execution
• Better GC
• Parallel execution
• Big data

Slide 11

Slide 11 text

Different Approach
• Build our own runtime?
• YARV, Rubinius, MacRuby
• Use an existing runtime?
• JRuby, MagLev, MacRuby, IronRuby

Slide 12

Slide 12 text

Build or Buy
• Making a new VM is “easy”
• Making it competitive is really hard
• I mean really, really, really hard

Slide 13

Slide 13 text

JVM
• 15+ years of engineering by whole teams
• FOSS
• Fastest VM available
• Best GCs available
• Full parallel threading with guarantees
• Broad platform support

Slide 14

Slide 14 text

But Java is Slow!
• Java is very, very fast
• Literally, C fast in many cases
• Java applications can be slow
• Oh hey, just like Ruby?
• The way you write code is more important than the language you use.

Slide 15

Slide 15 text

JRuby
• Java (and Ruby) impl of Ruby on JVM
• Same memory, threading model
• JRuby JITs to JVM bytecode
• End of story, right?

Slide 16

Slide 16 text

Long, Hard Road
• Interpreter optimization
• JVM bytecode compiler
• Optimizing core class methods
• Lather, rinse, and repeat

Slide 17

Slide 17 text


Slide 18

Slide 18 text

Align with JVM
• Individual arguments on call stack
• JVM local variables
• Avoid artificial framing
• Avoid inter-call goo
• Eliminate unnecessary work

Slide 19

Slide 19 text

Unnecessary Work
• Modules are maps
• Name to method
• Name to constant
• Name to class var
• Instance variables as maps
• Wasted cycles without caching

Slide 20

Slide 20 text

Method Lookup
• Inside a class/module
• Current class’s methods (a map)
• Methods retrieved from class + ancestors
• Serial or switch indicates staleness
• Weak list of child classes
• Class mutation cascades down hierarchy

Slide 21

Slide 21 text

Class hierarchy diagram: Thing, Person, Place, Rubyist, Other. Labels: obj.to_s

Slide 22

Slide 22 text

Class hierarchy diagram: Thing, Person, Place, Rubyist, Other. Labels: obj.to_s
Method lookups go up-hierarchy

Slide 23

Slide 23 text

Class hierarchy diagram: Thing, Person, Place, Rubyist, Other. Labels: obj.to_s, to_s
Method lookups go up-hierarchy

Slide 24

Slide 24 text

Class hierarchy diagram: Thing, Person, Place, Rubyist, Other. Labels: obj.to_s, to_s
Method lookups go up-hierarchy
Lookup target caches result

Slide 26

Slide 26 text

Class hierarchy diagram: Thing, Person, Place, Rubyist, Other. Labels: obj.to_s, to_s
Method lookups go up-hierarchy
Lookup target caches result
Modification cascades down

Slide 27

Slide 27 text

Class hierarchy diagram: Thing, Person, Place, Rubyist, Other. Labels: obj.to_s, to_s, to_s
Method lookups go up-hierarchy
Lookup target caches result
Modification cascades down
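
The lookup, caching, and invalidation steps on the method lookup slides can be sketched in plain Ruby. This is a toy model, not JRuby's implementation: ToyClass, find_method, define, and invalidate are hypothetical names, and the parent/child links (Person under Thing, Rubyist under Person) are an assumption about the diagram. A real runtime would hold the child list weakly and use a serial or switch rather than clearing a hash.

```ruby
# Toy model of a class with a per-class method cache (hypothetical, not JRuby code).
class ToyClass
  attr_reader :children, :cache

  def initialize(name, superclass = nil)
    @name = name
    @superclass = superclass
    @table = {}     # methods defined directly on this class
    @cache = {}     # lookup results, including inherited ones
    @children = []  # a real runtime would hold these weakly
    superclass.children << self if superclass
  end

  # On a cache miss, walk up the hierarchy and cache the result on the lookup target.
  def find_method(name)
    @cache.fetch(name) do
      found = @table[name] || (@superclass && @superclass.find_method(name))
      @cache[name] = found
    end
  end

  # Defining a method invalidates the caches of this class and all descendants.
  def define(name, body)
    @table[name] = body
    invalidate
  end

  def invalidate
    @cache.clear
    @children.each(&:invalidate)
  end
end

thing   = ToyClass.new("Thing")
person  = ToyClass.new("Person", thing)
rubyist = ToyClass.new("Rubyist", person)

thing.define(:to_s, -> { "a thing" })
first = rubyist.find_method(:to_s).call   # walks up to Thing, caches on Rubyist
person.define(:to_s, -> { "a person" })   # cascades down, clearing Rubyist's cache
second = rubyist.find_method(:to_s).call  # now resolves to Person's to_s
```

The second lookup only sees the new method because the modification cleared the caches below Person; without that cascade, Rubyist would keep returning the stale entry.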

Slide 28

Slide 28 text

Constant Lookup
• Cache at lookup site
• Global serial/switch indicates staleness
• Complexities of lookup, etc
• Joy of Ruby interfering with Joy of Opto
• Modifying constants triggers invalidation
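
A minimal sketch of a lookup-site cache guarded by one global serial number, assuming the simplest possible scheme. ConstantTable and ConstantSite are hypothetical names for illustration only; lexical scoping and the other "complexities of lookup" are ignored.

```ruby
# Hypothetical constant table with a single global serial number.
class ConstantTable
  attr_reader :serial

  def initialize
    @values = {}
    @serial = 0
  end

  # Any constant (re)definition bumps the serial, which makes every site stale.
  def set(name, value)
    @values[name] = value
    @serial += 1
  end

  def slow_lookup(name)
    @values.fetch(name)
  end
end

# One of these would exist per place in the code that reads a constant.
class ConstantSite
  attr_reader :misses

  def initialize(table, name)
    @table = table
    @name = name
    @value = nil
    @cached_serial = nil
    @misses = 0
  end

  # Only repeat the slow lookup when the global serial has moved.
  def get
    if @cached_serial != @table.serial
      @value = @table.slow_lookup(@name)
      @cached_serial = @table.serial
      @misses += 1
    end
    @value
  end
end

table = ConstantTable.new
table.set(:MY_CONST, 42)
site = ConstantSite.new(table, :MY_CONST)
3.times { site.get }   # one slow lookup, then two cache hits
table.set(:OTHER, 1)   # an unrelated constant still invalidates this site
site.get               # second slow lookup
```

The trade-off of a single global serial is visible in the last two lines: defining any constant forces every site to look up again, which is acceptable only if constants rarely change after startup.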

Slide 29

Slide 29 text

Instance Vars
• Class holds a table of offsets
• Object holds array of values
• Call site caches offset plus class ID
• Same class, no lookup cost
• Can be polymorphically chained
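
The offset-table scheme above could look roughly like this. VarTable, ToyObject, and IvarSite are hypothetical names, and object identity of the table stands in for the class ID; this is a monomorphic sketch with no polymorphic chaining.

```ruby
# Hypothetical per-class table mapping variable names to array offsets.
class VarTable
  def initialize
    @offsets = {}
  end

  # Assign the next free offset the first time a name is seen.
  def offset_for(name)
    @offsets[name] ||= @offsets.size
  end
end

# An object holds only an array of values plus a reference to its class's table.
class ToyObject
  attr_reader :var_table, :values

  def initialize(var_table)
    @var_table = var_table
    @values = []
  end
end

# An access site caches the offset together with the table it was resolved against.
class IvarSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @cached_table = nil
    @cached_offset = nil
    @lookups = 0
  end

  def get(obj)
    unless obj.var_table.equal?(@cached_table)
      @cached_offset = obj.var_table.offset_for(@name)
      @cached_table = obj.var_table
      @lookups += 1
    end
    obj.values[@cached_offset]
  end
end

person_vars = VarTable.new
person_vars.offset_for("@foo")
person_vars.offset_for("@bar")

alice = ToyObject.new(person_vars)
alice.values[1] = "alice"
bob = ToyObject.new(person_vars)
bob.values[1] = "bob"

bar_site = IvarSite.new("@bar")
bar_site.get(alice)   # resolves the offset once
bar_site.get(bob)     # same class, so no lookup cost
```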

Slide 30

Slide 30 text

Optimizing Ruby
• Make calls fast
• Make constants free
• Make instance variables cheap
• Make closures lightweight
• TODO

Slide 31

Slide 31 text

What is invokedynamic?

Slide 32

Slide 32 text

Invoke?

Slide 33

Slide 33 text

Invoke?
That’s one use, but there are many others

Slide 34

Slide 34 text

Dynamic?

Slide 35

Slide 35 text

Dynamic?
Dynamic typing is a common reason, but there are many others

Slide 36

Slide 36 text

JVM 101

Slide 37

Slide 37 text

JVM 101
200 opcodes

Slide 38

Slide 38 text

JVM 101
200 opcodes
Ten (or 16) “data endpoints”

Slide 39

Slide 39 text

JVM 101
200 opcodes
Ten (or 16) “data endpoints”
Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial

Slide 40

Slide 40 text

JVM 101
200 opcodes
Ten (or 16) “data endpoints”
Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial
Field Access: getfield, putfield, getstatic, putstatic

Slide 41

Slide 41 text

JVM 101
200 opcodes
Ten (or 16) “data endpoints”
Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial
Field Access: getfield, putfield, getstatic, putstatic
Array Access: *aload, *astore (type prefixes b, s, c, i, l, d, f, a)

Slide 42

Slide 42 text

JVM 101
200 opcodes
Ten (or 16) “data endpoints”
Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial
Field Access: getfield, putfield, getstatic, putstatic
Array Access: *aload, *astore (type prefixes b, s, c, i, l, d, f, a)
All Java code revolves around these endpoints
Remaining ops are stack, local vars, flow control, allocation, and math/boolean/bit operations

Slide 43

Slide 43 text

JVM Opcodes

Slide 44

Slide 44 text

JVM Opcodes
Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial
Field Access: getfield, putfield, getstatic, putstatic
Array Access: *aload, *astore (type prefixes b, s, c, i, l, d, f, a)

Slide 45

Slide 45 text

JVM Opcodes
Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial
Field Access: getfield, putfield, getstatic, putstatic
Array Access: *aload, *astore (type prefixes b, s, c, i, l, d, f, a)
Other groups: Stack, Local Vars, Flow Control, Allocation, Boolean and Numeric

Slide 48

Slide 48 text


Slide 49

Slide 49 text

In Detail
• JRuby generates code with indy calls
• JVM at first call asks JRuby what to do
• JRuby provides function pointers to code
• Pointers include guards, invalidation logic
• JRuby and JVM cooperate on optimizing
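
The handshake above can be approximated in Ruby as an analogy only. On the JVM the real pieces are the invokedynamic bytecode, a bootstrap method, and java.lang.invoke method handles; ToyCallSite and its link step below are hypothetical stand-ins that show just the shape of the protocol: link on first call, guard on the receiver's class, relink when the guard fails.

```ruby
# Ruby analogy of a guarded, self-linking call site (not the JVM API).
class ToyCallSite
  attr_reader :links

  def initialize(method_name)
    @method_name = method_name
    @guard_class = nil   # class the current target was linked for
    @target = nil        # plays the role of the "function pointer"
    @links = 0
  end

  # Fast path: the guard passes and the cached target is invoked directly.
  def call(receiver, *args)
    link(receiver) unless @guard_class.equal?(receiver.class)
    @target.bind(receiver).call(*args)
  end

  private

  # Slow path: ask the language runtime what to call, then install it.
  def link(receiver)
    @links += 1
    @guard_class = receiver.class
    @target = receiver.class.instance_method(@method_name)
  end
end

upcase_site = ToyCallSite.new(:upcase)
upcase_site.call("a")    # first call links the site for String
upcase_site.call("b")    # guard passes, no relink
upcase_site.call(:sym)   # guard fails, site relinks for Symbol
```

What the analogy cannot show is the payoff: because the JVM sees the guard and target as ordinary method handles, it can inline through them, which a hand-written cache like this one does not get.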

Slide 50

Slide 50 text


Slide 51

Slide 51 text

invokedynamic bytecode

Slide 52

Slide 52 text

Diagram labels: invokedynamic bytecode; bootstrap method

Slide 53

Slide 53 text

Diagram labels: method handles; invokedynamic bytecode; bootstrap method

Slide 54

Slide 54 text

Diagram labels: method handles; invokedynamic bytecode; bootstrap method; target method

Slide 57

Slide 57 text

Dynamic Invocation
Diagram labels: Target Object; Method Table (def foo ..., def bar ...); associated with; obj.foo(); JVM

Slide 58

Slide 58 text

Dynamic Invocation
VM Operations
Diagram labels: Target Object; Method Table (def foo ..., def bar ...); associated with; obj.foo(); JVM; Call Site

Slide 60

Slide 60 text

Dynamic Invocation
VM Operations: Method Lookup
Diagram labels: Target Object; Method Table (def foo ..., def bar ...); associated with; obj.foo(); JVM; def foo ...; Call Site

Slide 61

Slide 61 text

Dynamic Invocation
VM Operations: Method Lookup, Branch
Diagram labels: Target Object; Method Table (def foo ..., def bar ...); associated with; obj.foo(); JVM; def foo ...; Call Site

Slide 62

Slide 62 text

Dynamic Invocation
VM Operations: Method Lookup, Branch, Method Cache
Diagram labels: Target Object; Method Table (def foo ..., def bar ...); associated with; obj.foo(); JVM; def foo ...; Call Site

Slide 63

Slide 63 text

Constants
Diagram labels: Constant Lookup; MY_CONST; JVM; Call Site

Slide 64

Slide 64 text

Constants
VM Operations
Diagram labels: Constant Lookup; MY_CONST; JVM; Call Site

Slide 66

Slide 66 text

Constants
VM Operations: Lookup Value
Diagram labels: Constant Lookup; MY_CONST; JVM; Call Site; value

Slide 67

Slide 67 text

Constants
VM Operations: Lookup Value, Bind Permanently
Diagram labels: Constant Lookup; MY_CONST; JVM; Call Site; value

Slide 68

Slide 68 text

Instance Variables
Diagram labels: Target Object; Offset Table (“@foo” => 0, “@bar” => 1); associated with; @bar; JVM

Slide 69

Slide 69 text

Instance Variables
VM Operations
Diagram labels: Target Object; Offset Table (“@foo” => 0, “@bar” => 1); associated with; @bar; JVM; Access Site

Slide 70

Slide 70 text

Instance Variables
VM Operations: Instance Var Lookup
Diagram labels: Target Object; Offset Table (“@foo” => 0, “@bar” => 1); associated with; @bar; JVM; Access Site

Slide 71

Slide 71 text

Instance Variables
VM Operations: Instance Var Lookup, Offset Cache
Diagram labels: Target Object; Offset Table (“@foo” => 0, “@bar” => 1); associated with; @bar; JVM; 1; Access Site

Slide 72

Slide 72 text

Instance Variables
VM Operations: Instance Var Lookup, Offset Cache, Access Object
Diagram labels: Target Object; Offset Table (“@foo” => 0, “@bar” => 1); associated with; @bar; JVM; 1; Access Site

Slide 74

Slide 74 text

InvokeDynamic lets JRuby teach the JVM how Ruby works

Slide 75

Slide 75 text

How Do We Know We’ve Succeeded?
• Benchmarking
• Monitoring
• User reports

Slide 76

Slide 76 text

Benchmarking is Hard
• Runtimes may improve over time
• Optimizer may eliminate useless code
• Small systems are completely different
• Know how your runtime optimizes!
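
One way to see "runtimes may improve over time" is to time the same workload several times in a single process and compare early runs with late ones. A minimal sketch using the standard Benchmark module; timed_runs is a hypothetical helper, and the run and iteration counts are arbitrary.

```ruby
require "benchmark"

def foo; self; end

# Hypothetical helper: run the same loop `runs` times and record each duration.
def timed_runs(runs, iterations)
  Array.new(runs) do
    Benchmark.realtime do
      i = 0
      while i < iterations
        foo; foo; foo; foo; foo
        i += 1
      end
    end
  end
end

timings = timed_runs(5, 100_000)
timings.each_with_index do |seconds, index|
  puts format("run %d: %.4fs", index + 1, seconds)
end
```

On a JIT-compiling runtime the first runs usually include interpretation and compilation time, so reporting only a single run, or an average that includes the warmup, can misrepresent steady-state speed.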

Slide 77

Slide 77 text

bench_empty_method

def foo; self; end

i = 0
while i < 10_000_000
  foo; foo; foo; foo; foo
  i += 1
end

Slide 78

Slide 78 text

Bar chart, time axis 0s to 4s: Ruby 1.9.3, JRuby, JRuby + indy
ZOMG 40X FASTER!

Slide 79

Slide 79 text

Observations

Slide 80

Slide 80 text

One slow runtime screws up the table

Slide 81

Slide 81 text


Slide 82

Slide 82 text

...do comparisons as ratios against a norm
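
Normalizing against a baseline is a one-line transformation. The timings below are invented placeholders chosen only to make the arithmetic visible, not measurements from the talk, and ratios_against is a hypothetical helper.

```ruby
# Placeholder timings in seconds (invented for illustration, not measured).
timings = { "runtime A" => 8.0, "runtime B" => 2.0, "runtime C" => 1.0 }

# Express every timing as a multiple of the chosen baseline's timing.
def ratios_against(timings, baseline)
  norm = timings.fetch(baseline)
  timings.transform_values { |seconds| seconds / norm }
end

ratios = ratios_against(timings, "runtime B")
ratios.each { |name, ratio| puts format("%-10s %.2fx", name, ratio) }
```

With ratios, one very slow entry no longer stretches the axis so far that the differences among the faster entries disappear.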

Slide 83

Slide 83 text


Slide 84

Slide 84 text

JRuby calls empty methods really fast!!!

Slide 85

Slide 85 text

InvokeDynamic does not do much for us?

Slide 86

Slide 86 text

Bar chart, time axis 0s to 4s: Ruby 1.9.3, JRuby, JRuby + indy

Slide 87

Slide 87 text

JVM Opto 101
• JITs code bodies after 10k calls
• No 10k calls, no JIT (generally)
• Inlines up to two targets
• Optimistic
• Early decisions may be wrong
• Small code looks drastically different

Slide 88

Slide 88 text

SMALL CODE IS DIFFERENT THAN LARGE CODE

Slide 89

Slide 89 text

Inlining
• Call site in method A and method B match
• JVM treats them as though B lived in A
• No call overhead
• Variables visible across call boundary
• More complete view for optimization

Slide 90

Slide 90 text

Optimistic
• Say we have a system...
• The only method dynamically called is “foo”
• All logic for dyncall revolves around “foo”
• Hotspot thinks all dyncalls will be “foo”

Slide 91

Slide 91 text

bench_empty_method2

def foo; self; end
def bar1; self; end
def bar2; self; end

i = 0
while i < 10_000_000
  bar1; bar1; bar1; bar1; bar1
  bar2; bar2; bar2; bar2; bar2
  i += 1
end
...

Slide 92

Slide 92 text

Bar chart, time axis 0s to 0.7s: bench1, bench2, bench1 + indy, bench2 + indy

Slide 93

Slide 93 text

Bar chart, time axis 0s to 0.4s: bench1 + rbx, bench2 + rbx, bench1 + indy, bench2 + indy

Slide 94

Slide 94 text

What Happened?
• An unrelated change slowed our bench?
• Not really unrelated
• Hotspot optimizes early loop first
• Later loop is different...calls “foo”
• Assumptions change, perf looks different

Slide 95

Slide 95 text

Benchmarking is Not Enough
• Need to monitor runtime optimization
• JIT compilation
• Inlining
• Eventual native code (x86 ASM)
• Fun?

Slide 96

Slide 96 text

1711 4 % bench_empty_method::block_0$RUBY$__file__ @ 56 (171 bytes)
@ 59 java.lang.invoke.MethodHandle::invokeExact (33 bytes) inline (hot)
@ 5 java.lang.invoke.MethodHandle::invokeExact (20 bytes) inline (hot)
@ 2 java.lang.invoke.MethodHandle::invokeExact (9 bytes) inline (hot)
@ 2 java.lang.invoke.MutableCallSite::getTarget (5 bytes) inline (hot)
@ 16 java.lang.invoke.MethodHandle::invokeExact (5 bytes) inline (hot)
@ 1 sun.invoke.util.ValueConversions::identity (2 bytes) inline (hot)
@ 12 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
@ 29 java.lang.invoke.MethodHandle::invokeExact (35 bytes) inline (hot)
@ 5 java.lang.invoke.MethodHandle::invokeExact (7 bytes) inline (hot)
@ 3 org.jruby.runtime.invokedynamic.InvocationLinker::testMetaclass (17 bytes) inline (hot)
@ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) inline (hot)
@ 14 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
@ 31 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
@ 6 bench_empty_method::method__0$RUBY$foo (2 bytes) inline (hot)
@ 68 java.lang.invoke.MethodHandle::invokeExact (33 bytes) inline (hot)
@ 5 java.lang.invoke.MethodHandle::invokeExact (20 bytes) inline (hot)
@ 2 java.lang.invoke.MethodHandle::invokeExact (9 bytes) inline (hot)
@ 2 java.lang.invoke.MutableCallSite::getTarget (5 bytes) inline (hot)

Slide 99

Slide 99 text

Decoding compiled method 0x000000010549d7d0:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'method__0$RUBY$foo' '(Lbench_empty_method;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;' in 'bench_empty_method'
# parm0: rsi:rsi = 'bench_empty_method'
# parm1: rdx:rdx = 'org/jruby/runtime/ThreadContext'
# parm2: rcx:rcx = 'org/jruby/runtime/builtin/IRubyObject'
# parm3: r8:r8 = 'org/jruby/runtime/Block'
# [sp+0x20] (sp of caller)
0x000000010549d900: sub $0x18,%rsp
0x000000010549d907: mov %rbp,0x10(%rsp) ;*synchronization entry
; - bench_empty_method::method__0$RUBY$foo@-1 (line 3)
0x000000010549d90c: mov %rcx,%rax
0x000000010549d90f: add $0x10,%rsp
0x000000010549d913: pop %rbp
0x000000010549d914: test %eax,-0xe9f91a(%rip) # 0x00000001045fe000
; {poll_return}
0x000000010549d91a: retq

Slide 102

Slide 102 text

bench_empty_method3

def invoker1
  i = 0
  while i < 1000
    foo; foo; foo; foo; foo
    i+=1
  end
end
...
i = 0
while i < 10000
  invoker1
  i+=1
end

Slide 103

Slide 103 text

Bar chart, time axis 0s to 0.15s: bench1 + indy, bench2 + indy, bench3 + indy

Slide 104

Slide 104 text

Moral
• Benchmarks are synthetic
• Every system is different
• Do your own testing

Slide 105

Slide 105 text

bench_red_black
• Pure-Ruby red/black tree impl
• Build a 100k tree of rand(999_999)
• Delete all nodes
• Build it again
• Search for elements
• In-order walks, min, max

Slide 106

Slide 106 text

bench_red_black
Bar chart, time axis 0s to 5s: Ruby 1.9.3, JRuby - indy, JRuby + indy

Slide 107

Slide 107 text

bench_fractal, bench_flipflop_fractal
• Mandelbrot generator
• Integer loops
• Floating-point math
• Julia generator using flip-flops
• I don’t really understand it.

Slide 108

Slide 108 text


Slide 109

Slide 109 text

def fractal_flipflop
  w, h = 44, 54
  c = 7 + 42 * w
  a = [0] * w * h
  g = d = 0
  f = proc do |n|
    a[c] += 1
    o = a.map {|z| " :#"[z, 1] * 2 }.join.scan(/.{#{w * 2}}/)
    puts "\f" + o.map {|l| l.rstrip }.join("\n")
    d += 1 - 2 * ((g ^= 1 << n) >> n)
    c += [1, w, -1, -w][d %= 4]
  end
  1024.times do
    !!(!!(!!(!!(!!(!!(!!(!!(!!(true...
      f[0])...f[1])...f[2])...
      f[3])...f[4])...f[5])...
      f[6])...f[7])...f[8])
  end
end

Slide 111

Slide 111 text


Slide 112

Slide 112 text

bench_fractal
Bar chart, time axis 0s to 1.5s: Ruby 1.9.3, JRuby - indy, JRuby + indy

Slide 113

Slide 113 text

bench_flipflop_fractal
Bar chart, time axis 0s to 1.5s: Ruby 1.9.3, JRuby - indy, JRuby + indy

Slide 114

Slide 114 text

Rails?

Slide 115

Slide 115 text

Rails Perf
• Mixed bag right now...some fast, some slow
• JVM JIT limits need to be bumped up
• Significant gains for some folks
• Long warmup times for so much code
• Work continues!

Slide 116

Slide 116 text

What Next?

Slide 117

Slide 117 text

Expand Opto
• Mixed-arity (ADD SLIDES ABOUT WHAT WE OPTIMIZE TODAY)
• Super calls
• Much, much lighter-weight closures
• Then what?

Slide 118

Slide 118 text

Wacky Stuff
• define_method methods?
• method_missing call-throughs?
• respond_to???
• proc tables?
• All possible...but worth it?

Slide 119

Slide 119 text

The Future
• JRuby will continue to get faster
• Indy improvements at VM-level
• Compiler improvements at Ruby level
• If you can’t compete with JVM...
• Still FOSS from top to bottom
• Don’t be afraid!

Slide 120

Slide 120 text

Q/A