Slide 1

Invokedynamic them all
by Rémi Forax

Slide 2

Me
Rémi Forax
Maître de Conférences (Associate Professor), Université Paris Est - Marne la Vallée
JCP Expert for JSR 292 (invokedynamic) and JSR 335 (lambda)
Fell in love with Java a long time ago

Slide 3

You
You want to implement the next-gen dynamic language on top of the JVM
You want to know how the JVM optimizes a method call

Slide 4

Invokedynamic
New opcode added in Java 7
Eases the implementation of a dynamic language runtime
Makes it faster too!

Slide 5

public interface I {
  public abstract void m();
}

public class A implements I {
  private static int COUNTER;

  @Override
  public void m() {
    COUNTER++;
  }

  public static void dump() {
    System.out.println(COUNTER);
  }
}

Slide 6

public class Test1 {
  static void test(I i) {
    i.m(); // polymorphic call
  }

  public static void main(String[] args) {
    I i = new A();
    for (int loop = 0; loop < 100_000; loop++) {
      test(i);
    }
    A.dump();
  }
}

Slide 7

Output of javap

Compiled from "Test1.java"
public class Test1 {
  static void test(I);
    Code:
       0: aload_0
       1: invokeinterface #16, 1 // I.m:()V
       6: return
  ...
}

Slide 8

-XX:+PrintCompilation
Prints compiled methods, and compiled loops too (On Stack Replacement)

  193   1     b   Test1::test (7 bytes)
  203   2     b   A::m (9 bytes)
  204   1 %   b   Test1::main @ 13 (33 bytes)
  212   1 %       Test1::main @ -2 made not entrant

Callouts: timestamp since VM start in ms (first column), OSR (%), compiler thread, invalidation (made not entrant)

Slide 9

-XX:+PrintInlining
Prints the inlining tree and type profile information

  190   1     b   Test1::test (7 bytes)
                    @ 1   A::m (9 bytes)   inline (hot)
                     \-> TypeProfile (6700/6700 counts) = A
  202   2     b   A::m (9 bytes)
  203   1 %   b   Test1::main @ 13 (33 bytes)
                    @ 14   Test1::test (7 bytes)   inline (hot)
                      @ 1   A::m (9 bytes)   inline (hot)
                    @ 26   A::dump (10 bytes)   never executed
                    @ 14   Test1::test (7 bytes)   inline (hot)
                      @ 1   A::m (9 bytes)   inline (hot)
  211   1 %       Test1::main @ -2 made not entrant

Callout: loop peeling

Slide 10

-XX:+PrintAssembly
Needs hsdis, a disassembler
http://kenai.com/projects/base-hsdis (binaries)
Must be prefixed by -XX:+UnlockDiagnosticVMOptions
https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly

Slide 11

-XX:+PrintAssembly

[Verified Entry Point]
[Constants]
  # {method} 'test' '(LI;)V' in 'Test1'
  # parm0:    rsi:rsi   = 'I'
  #           [sp+0x20]  (sp of caller)
  ;; N1: #  B1 <- B5 B4 B3  Freq: 1
  ;; B1: #  B5 B2 <- BLOCK HEAD IS JUNK  Freq: 1
  0x...5320: mov    %eax,-0x16000(%rsp)
  0x...5327: push   %rbp
  0x...5328: sub    $0x10,%rsp           ; *synchronization entry
                                         ; - Test1::test@-1 (line 4)
  0x...532c: mov    0x8(%rsi),%r10d      ; implicit exception: dispatches to 0x..5365
  ;; B2: #  B4 B3 <- B1  Freq: 0.999999
  0x...5330: cmp    $0xefe4a91d,%r10d    ; {oop('A')}
  0x...5337: jne    0x...5353            ; *invokeinterface m
                                         ; - Test1::test@1 (line 4)
  ;; B3: #  N1 <- B2  Freq: 0.999998
  0x...5339: mov    $0x7d6bade78,%r10    ; {oop(a 'java/lang/Class' = 'A')}
  0x...5343: incl   0x70(%r10)           ; *synchronization entry
                                         ; - Test1::test@-1 (line 4)
  0x...5347: add    $0x10,%rsp
  0x...534b: pop    %rbp
  0x...534c: test   %eax,0x942fcae(%rip) # 0x00007f66da525000
  ...                                    ; {poll_return}

Slide 12

Prolog & Epilog

[Verified Entry Point]
[Constants]
  # {method} 'test' '(LI;)V' in 'Test1'
  # parm0:    rsi:rsi   = 'I'
  #           [sp+0x20]  (sp of caller)
  0x...5320: mov    %eax,-0x16000(%rsp)
  0x...5327: push   %rbp
  0x...5328: sub    $0x10,%rsp           ; *synchronization entry
                                         ; - Test1::test@-1 (line 4)
  ...
  0x...5347: add    $0x10,%rsp
  0x...534b: pop    %rbp
  0x...534c: test   %eax,0x942fcae(%rip) ; {poll_return}
  ...

Callouts:
  prolog: mov / push / sub
  epilog: add / pop
  GC safepoint polling: test ... {poll_return}

Slide 13

Body

[Verified Entry Point]
[Constants]
  # parm0:    rsi:rsi   = 'I'
  ;; B1: #  B1 <- B5 B4 B3  Freq: 1
  ...
  0x...532c: mov    0x8(%rsi),%r10d      ; implicit exception:
                                         ; dispatches to 0x..5365
  ;; B2: #  B4 B3 <- B1  Freq: 0.999999
  0x...5330: cmp    $0xefe4a91d,%r10d    ; {oop('A')}
  0x...5337: jne    0x...5353            ; *invokeinterface m
                                         ; - Test1::test@1 (line 4)
  ;; B3: #  N1 <- B2  Freq: 0.999998
  0x...5339: mov    $0x7d6bade78,%r10    ; {oop(a 'java/lang/Class' = 'A')}
  0x...5343: incl   0x70(%r10)           ; *synchronization entry
                                         ; - Test1::test@-1 (line 4)
  ...

Callouts:
  mov 0x8(%rsi),%r10d: get the class + implicit NPE
  cmp / jne: cheap class test
  mov / incl: A::counter++

Slide 14

If the class check fails?

  ;; B4: #  N1 <- B2  Freq: 9.99999e-07
  0x...5353: mov    %rsi,%rbp
  0x...5356: mov    $0xffffffde,%esi
  0x...f535b: callq  0x00007f66d10ceae0  ; OopMap{rbp=Oop off=64}
                                         ; *invokeinterface m
                                         ; - Test1::test@1 (line 4)
                                         ; {runtime_call}

Callout: jump to the interpreter => deoptimise!

Slide 15

Java VMs
• Profile the code: record seen types, record branch frequencies
• Generate only the necessary code: de-virtualise, inline, re-organize branches, etc.
• Deoptimise (and reoptimise) when needed
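
The profile, specialise, deoptimise cycle listed above can be provoked with a small program. The following is only a sketch: DeoptSketch and the second class B are hypothetical names that do not appear in the slides, and I and A are re-declared as nested types so that the file is self-contained. The first loop lets the profile see only A. The second loop sends a B through the same call, which is the situation in which the class test of the compiled code fails.

```java
// Hypothetical demo class, not part of the original slides.
final class DeoptSketch {
    interface I { void m(); }

    static final class A implements I {
        static int counter;
        @Override public void m() { counter++; }
    }

    // Assumption: a second implementation, added only to break the
    // "the receiver is always an A" profile.
    static final class B implements I {
        static int counter;
        @Override public void m() { counter++; }
    }

    static void test(I i) {
        i.m(); // only A is seen here during the first phase
    }

    // Runs both phases and returns { calls on A, calls on B }.
    static int[] run(int iterations) {
        A.counter = 0;
        B.counter = 0;
        I a = new A();
        for (int loop = 0; loop < iterations; loop++) {
            test(a); // phase 1: the type profile only contains A
        }
        I b = new B();
        for (int loop = 0; loop < iterations; loop++) {
            test(b); // phase 2: a class test specialised for A fails
        }
        return new int[] { A.counter, B.counter };
    }

    public static void main(String[] args) {
        int[] counts = run(100_000);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Run with -XX:+PrintCompilation, a HotSpot VM will typically report DeoptSketch::test as "made not entrant" around the start of the second phase and compile it again later. The exact lines depend on the VM version and on the compilation flags.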

Slide 16

What if we could do the same for any language (not just Java)?

Slide 18

invokedynamic

public class IndyTest {
  static void test(I);
    Code:
       0: aload_0
       1: invokedynamic #20, 0 // foobar:(LI;)V
                               // RT.bootstrap(Lookup, String, MethodType)
       6: return
  ...

The linking is done by calling a Java method!

Slide 19

Linking

public class RT {
  // findStatic throws checked exceptions, hence the throws clause
  public static CallSite bootstrap(Lookup lookup, String name, MethodType mType)
      throws ReflectiveOperationException {
    MethodHandle mh = lookup.findStatic(RT.class, "method", mType);
    return new ConstantCallSite(mh);
  }

  public static void method(I i) {
    throw new AssertionError("hello indy");
  }
}
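
javac does not emit an invokedynamic instruction that points at a user-written bootstrap method, so trying the linking step normally requires a bytecode generator. The sketch below takes a shortcut: it calls the bootstrap method by hand and uses CallSite.dynamicInvoker() in place of the instruction. LinkingSketch and linkAndCall are hypothetical names, and the target returns a String instead of throwing so that the result can be observed.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-in for the RT class of the slide.
final class LinkingSketch {
    // Same shape as the bootstrap method of the slide: the VM would call it
    // once, the first time the invokedynamic instruction is executed.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(LinkingSketch.class, "method", type);
        return new ConstantCallSite(target);
    }

    // The target installed in the call site.
    static String method(Object receiver) {
        return "hello indy: " + receiver;
    }

    // Plays the role of the VM: link once, then call through the call site.
    static String linkAndCall(Object receiver) {
        try {
            MethodType type = MethodType.methodType(String.class, Object.class);
            CallSite callSite = bootstrap(MethodHandles.lookup(), "foobar", type);
            // dynamicInvoker() returns a handle that always calls the current target.
            return (String) callSite.dynamicInvoker().invokeExact(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkAndCall("A")); // prints "hello indy: A"
    }
}
```

With a real invokedynamic instruction the VM calls the bootstrap method itself and passes the lookup, the name and the method type taken from the class file.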

Slide 20

Bootstrap method

Call site:
  aload_0
  invokedynamic (I)V

bootstrap:
  public static CallSite bootstrap(lookup, name, mType) {
    mh = lookup.findStatic(RT.class, "method", mType);
    return new ConstantCallSite(mh);
  }

target (installed in the call site):
  static void method(I i) { ... }

The signature of the target is too specific!

Slide 21

Generic signature = need target adaptation

Call site:
  aload_0
  invokedynamic (I)V

target (generic signature):
  static Object method(MutableCallSite cs, Object[] args) { ... }

Adaptations between the call site and the target:
  cast          (Object)Object
  varargs box   (Object[])Object
  bind          (MutableCallSite, Object[])Object
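
The chain of adaptations can be checked by looking at the type of the method handle after each step. This is a minimal sketch with hypothetical names (AdaptationSketch, adaptationSteps). The interface I is re-declared as a nested type, and the generic target has a placeholder body.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo class, not part of the original slides.
final class AdaptationSketch {
    interface I { void m(); }

    // Generic target; the body is a placeholder.
    static Object method(MutableCallSite cs, Object[] args) {
        return null;
    }

    // Returns the type of the handle after each adaptation step.
    static MethodType[] adaptationSteps() {
        try {
            MethodType callType = MethodType.methodType(void.class, I.class);
            MutableCallSite cs = new MutableCallSite(callType);
            MethodHandle generic = MethodHandles.lookup().findStatic(
                AdaptationSketch.class, "method",
                MethodType.methodType(Object.class, MutableCallSite.class, Object[].class));
            MethodHandle bound = generic.bindTo(cs);              // (Object[])Object
            MethodHandle boxed = bound.asCollector(
                Object[].class, callType.parameterCount());       // (Object)Object
            MethodHandle cast = boxed.asType(callType);           // (I)void
            return new MethodType[] {
                generic.type(), bound.type(), boxed.type(), cast.type()
            };
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (MethodType type : adaptationSteps()) {
            System.out.println(type);
        }
    }
}
```

Only the last handle has the type of the call site, so it is the only one that can be installed as its target.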

Slide 22

Bind, collect, box, ...

public static Object method(MutableCallSite cs, Object[] args) { ... }

private static final MethodHandle METHOD;
static {
  try {
    METHOD = MethodHandles.lookup().findStatic(
        RT.class, "method",
        methodType(Object.class, MutableCallSite.class, Object[].class));
  } catch (ReflectiveOperationException e) {
    throw new AssertionError(e);
  }
}

public static CallSite bootstrap(Lookup lookup, String name, MethodType mType) {
  // a MutableCallSite must be created with its type (or an initial target)
  MutableCallSite callSite = new MutableCallSite(mType);
  MethodHandle mh = METHOD.bindTo(callSite);                    // bind
  mh = mh.asCollector(Object[].class, mType.parameterCount());  // varargs box
  mh = mh.asType(mType);                                        // cast
  callSite.setTarget(mh);
  return callSite;
}

Slide 23

Resolving a call
getClass() + lookup.find()

public static Object method(MutableCallSite callSite, Object[] args) throws Throwable {
  Class<?> receiverClass = args[0].getClass();
  MethodHandle mh = MethodHandles.lookup().findVirtual(
      receiverClass, "m", callSite.type().dropParameterTypes(0, 1));
  mh = mh.asType(callSite.type());
  return mh.invokeWithArguments(args);
}

Ahhh, not efficient: the lookup is redone at every call!
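
To try this resolution strategy without generating bytecode, the sketch below drives the call site through dynamicInvoker(). SlowPathSketch, lookups and run are hypothetical names. The lookups counter is an addition that makes the cost visible: the target of the call site never changes, so every call goes through findVirtual again.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo class, not part of the original slides.
final class SlowPathSketch {
    interface I { void m(); }

    static final class A implements I {
        static int counter;
        @Override public void m() { counter++; }
    }

    static int lookups; // number of findVirtual calls (added for the demo)

    // Generic target: resolves the method from the receiver class at each call.
    static Object method(MutableCallSite callSite, Object[] args) throws Throwable {
        lookups++;
        Class<?> receiverClass = args[0].getClass();
        MethodHandle mh = MethodHandles.lookup().findVirtual(
            receiverClass, "m", callSite.type().dropParameterTypes(0, 1));
        return mh.asType(callSite.type()).invokeWithArguments(args);
    }

    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MutableCallSite callSite = new MutableCallSite(type);
        MethodHandle generic = lookup.findStatic(SlowPathSketch.class, "method",
            MethodType.methodType(Object.class, MutableCallSite.class, Object[].class));
        callSite.setTarget(generic.bindTo(callSite)
            .asCollector(Object[].class, type.parameterCount())
            .asType(type));
        return callSite;
    }

    // Calls m() through the call site; returns { A.counter, lookups }.
    static int[] run(int calls) {
        try {
            A.counter = 0;
            lookups = 0;
            MethodHandle site = bootstrap(MethodHandles.lookup(), "m",
                MethodType.methodType(void.class, I.class)).dynamicInvoker();
            I receiver = new A();
            for (int k = 0; k < calls; k++) {
                site.invokeExact(receiver);
            }
            return new int[] { A.counter, lookups };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        int[] result = run(1_000);
        System.out.println(result[0] + " calls, " + result[1] + " lookups");
    }
}
```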

Slide 24

Inlining cache?
Idea: do what the VM does
Test if the receiver is of a known class => if OK, the code can be inlined!
Modify the call site to insert the test at runtime

Slide 25

Class check

private static final MethodHandle CLASS_CHECK;
static {
  Lookup lookup = MethodHandles.lookup();
  METHOD = ...
  CLASS_CHECK = lookup.findStatic(RT.class, "classCheck",
      methodType(boolean.class, Class.class, Object.class));
}

public static boolean classCheck(Class<?> type, Object o) {
  return o.getClass() == type;
}
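
The class check is meant to be used as the test of MethodHandles.guardWithTest, which builds an if/then/else out of three method handles. A minimal sketch of that combination follows. ClassCheckSketch, fast, slow, guardFor and call are hypothetical names, and the two branches simply return a label.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class, not part of the original slides.
final class ClassCheckSketch {
    static boolean classCheck(Class<?> type, Object o) {
        return o.getClass() == type;
    }

    static String fast(Object o) { return "fast path"; } // taken when the test succeeds
    static String slow(Object o) { return "fallback"; }  // taken otherwise

    // Builds: if (classCheck(expected, o)) fast(o) else slow(o)
    static MethodHandle guardFor(Class<?> expected) throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType branchType = MethodType.methodType(String.class, Object.class);
        MethodHandle test = lookup.findStatic(ClassCheckSketch.class, "classCheck",
            MethodType.methodType(boolean.class, Class.class, Object.class))
            .bindTo(expected); // (Object)boolean
        MethodHandle target = lookup.findStatic(ClassCheckSketch.class, "fast", branchType);
        MethodHandle fallback = lookup.findStatic(ClassCheckSketch.class, "slow", branchType);
        return MethodHandles.guardWithTest(test, target, fallback);
    }

    static String call(Class<?> expected, Object argument) {
        try {
            return (String) guardFor(expected).invokeExact(argument);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(String.class, "text")); // prints "fast path"
        System.out.println(call(String.class, 42));     // prints "fallback"
    }
}
```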

Slide 26

With a guard

Call site:
  aload_0
  invokedynamic (I)V

Call site target: if / then / else
  if:    static boolean classCheck(Class, Object)
           bind  (Class, I)Z
           cast  (Class, Object)Z
  then:  void m(A a) { counter++; }
           cast  (A)V
  else:  static Object method(MutableCallSite cs, Object[] args)
           cast + varargs box + bind

Slide 27

Inlining cache!

public static Object method(MutableCallSite callSite, Object[] args) throws Throwable {
  Class<?> receiverClass = args[0].getClass();
  MethodType type = callSite.type();
  MethodHandle mh = MethodHandles.lookup().findVirtual(
      receiverClass, "m", type.dropParameterTypes(0, 1));
  mh = mh.asType(type);

  MethodHandle test = CLASS_CHECK;
  test = test.bindTo(receiverClass);
  test = test.asType(type.changeReturnType(boolean.class));

  // if the receiver has the cached class, call mh, else fall back to the previous target
  MethodHandle guard = MethodHandles.guardWithTest(
      test, mh, callSite.getTarget());
  callSite.setTarget(guard);
  return mh.invokeWithArguments(args);
}
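
Put together with the bootstrap method, the cache can be exercised from plain Java. The sketch below is an assumption-laden demo rather than the code of the talk: InlineCacheSketch, the second class B, the slowPathCalls counter and run are hypothetical, and dynamicInvoker() replaces the invokedynamic instruction. The counter shows that the generic method runs once per receiver class instead of once per call.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo class, not part of the original slides.
final class InlineCacheSketch {
    interface I { void m(); }
    static final class A implements I {
        static int counter;
        @Override public void m() { counter++; }
    }
    static final class B implements I {
        static int counter;
        @Override public void m() { counter++; }
    }

    static int slowPathCalls; // how many times the generic fallback ran

    private static final MethodHandle METHOD;
    private static final MethodHandle CLASS_CHECK;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            METHOD = lookup.findStatic(InlineCacheSketch.class, "method",
                MethodType.methodType(Object.class, MutableCallSite.class, Object[].class));
            CLASS_CHECK = lookup.findStatic(InlineCacheSketch.class, "classCheck",
                MethodType.methodType(boolean.class, Class.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static boolean classCheck(Class<?> type, Object o) {
        return o.getClass() == type;
    }

    // Fallback: resolve m() for the receiver class, then put a guard in front
    // of the current target so that the next call with this class skips the lookup.
    static Object method(MutableCallSite callSite, Object[] args) throws Throwable {
        slowPathCalls++;
        Class<?> receiverClass = args[0].getClass();
        MethodType type = callSite.type();
        MethodHandle mh = MethodHandles.lookup()
            .findVirtual(receiverClass, "m", type.dropParameterTypes(0, 1))
            .asType(type);
        MethodHandle test = CLASS_CHECK.bindTo(receiverClass)
            .asType(type.changeReturnType(boolean.class));
        callSite.setTarget(MethodHandles.guardWithTest(test, mh, callSite.getTarget()));
        return mh.invokeWithArguments(args);
    }

    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type) {
        MutableCallSite callSite = new MutableCallSite(type);
        callSite.setTarget(METHOD.bindTo(callSite)
            .asCollector(Object[].class, type.parameterCount())
            .asType(type));
        return callSite;
    }

    // Calls m() on an A callsOnA times, then once on a B, then once more on the A.
    // Returns { slowPathCalls, A.counter, B.counter }.
    static int[] run(int callsOnA) {
        try {
            slowPathCalls = 0;
            A.counter = 0;
            B.counter = 0;
            MethodHandle site = bootstrap(MethodHandles.lookup(), "m",
                MethodType.methodType(void.class, I.class)).dynamicInvoker();
            I a = new A();
            I b = new B();
            for (int k = 0; k < callsOnA; k++) {
                site.invokeExact(a);
            }
            site.invokeExact(b);
            site.invokeExact(a);
            return new int[] { slowPathCalls, A.counter, B.counter };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        int[] r = run(1_000);
        System.out.println(r[0] + " slow path calls, " + r[1] + " on A, " + r[2] + " on B");
    }
}
```

Each new receiver class adds one guard in front of the previous target, so the chain grows with the number of classes seen. The sketch does not bound that chain, a limit a real runtime would probably want.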

Slide 28

Resulting inlining tree
Method handle calls are inlined!

  263   1  n   j.l.Object::getClass (0 bytes)
  264   2  b   IndyTest::test (7 bytes)
    @ 1    j.l.i.MethodHandle::invokeExact   inline (hot)
    @ 3    j.l.i.MethodHandle::invokeExact   inline (hot)
    @ 3    RT::classCheck   inline (hot)
    @ 1    j.l.Object::getClass   (intrinsic)
    @ 10   j.l.i.MethodHandle::invokeExact   inline (hot)
    @ 21   j.l.i.MethodHandle::invokeExact   inline (hot)

Slide 29

In assembly

  0x...d26c: mov    0x8(%rsi),%r11d           ; implicit exception
  0x...d270: mov    0x78(%r12,%r11,8),%rbp    ; *invokevirtual getClass
  0x...d275: mov    $0x7d6bb4228,%r10         ; {oop(a 'j/l/Class' = 'A')}
  0x...d27f: cmp    %r10,%rbp
  0x...d282: jne    0x00007f594911d2c9        ; *if_acmpne
  ;; B3: #  B6 B4 <- B2  Freq: 0.999999
  0x...d284: mov    0x40(%r12,%r11,8),%r10
  0x...d289: mov    $0x77f280d58,%r11         ; {oop('A')}
  0x...d293: cmp    %r11,%r10
  0x...d296: jne    0x00007f594911d2b4        ; *checkcast
  ;; B4: #  B8 B5 <- B3  Freq: 0.999997
  0x...d298: nop
  0x...d299: mov    $0xffffffffffffffff,%rax  ; {oop(NULL)}
  0x...d2a3: callq  0x00007f59490cdf60        ; OopMap{off=72}
                                              ;*invokevirtual m

Callouts:
  checkcast (block B3): not needed!
  callq ... *invokevirtual m: direct call but no inline
  r12: heap base with compressed oops

Slide 30

Conclusion
You can ask the VM to print the inlining tree or the assembly code
invokedynamic allows anyone to write their own optimization code, which will be inlined by the VM
The implementation of invokedynamic is still young; performance will improve