
Invokedynamic them all

forax
April 22, 2012


An English translation of my presentation at Devoxx-Fr 2012 (Devoxx-France)


Transcript

  1. Me

    Rémi Forax, Maître de conférences (associate professor), Université Paris-Est Marne-la-Vallée. JCP Expert for JSR 292 (invokedynamic) and JSR 335 (lambda). Fell in love with Java a long time ago.
  2. You

    You want to implement the next-generation dynamic language on top of the JVM. You want to know how the JVM optimizes a method call.
  3. Invokedynamic

    A new opcode added in Java 7. It eases the implementation of a dynamic language runtime, and makes it faster too!
  4. Interface I and class A

        public interface I {
          public abstract void m();
        }

        public class A implements I {
          private static int COUNTER;

          @Override
          public void m() {
            COUNTER++;
          }

          public static void dump() {
            System.out.println(COUNTER);
          }
        }
  5. Test1

        public class Test1 {
          static void test(I i) {
            i.m(); // polymorphic call
          }

          public static void main(String[] args) {
            I i = new A();
            for (int loop = 0; loop < 100_000; loop++) {
              test(i);
            }
            A.dump();
          }
        }
  6. Output of javap

        Compiled from "Test1.java"
        public class Test1 {
          static void test(I);
            Code:
               0: aload_0
               1: invokeinterface #16, 1   // I.m:()V
               6: return
          ...
        }
  7. -XX:+PrintCompilation

    Prints each compiled method, and compiled loops too (On-Stack Replacement, OSR).

        193   1     b   Test1::test (7 bytes)
        203   2     b   A::m (9 bytes)
        204   1 %   b   Test1::main @ 13 (33 bytes)
        212   1 %       Test1::main @ -2 made not entrant

    Callouts on the slide: timestamp since VM start (ms, first column), OSR (%), compiler thread, invalidation ("made not entrant").
  8. -XX:+PrintInlining

    Prints the inlining tree and the type profile information. Like PrintAssembly, it is a diagnostic option and needs -XX:+UnlockDiagnosticVMOptions.

        190   1     b   Test1::test (7 bytes)
                           @ 1   A::m (9 bytes)   inline (hot)
                             \-> TypeProfile (6700/6700 counts) = A
        202   2     b   A::m (9 bytes)
        203   1 %   b   Test1::main @ 13 (33 bytes)
                           @ 14   Test1::test (7 bytes)   inline (hot)
                             @ 1   A::m (9 bytes)   inline (hot)
                           @ 26   A::dump (10 bytes)   never executed
                           @ 14   Test1::test (7 bytes)   inline (hot)
                             @ 1   A::m (9 bytes)   inline (hot)
        211   1 %       Test1::main @ -2 made not entrant

    The call to test at bytecode index 14 appears twice because of loop peeling.
  9. -XX:+PrintAssembly

    Needs hsdis, a disassembler plugin (binaries: http://kenai.com/projects/base-hsdis). Must be preceded by -XX:+UnlockDiagnosticVMOptions. See https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly
  10. -XX:+PrintAssembly

        [Verified Entry Point]
        [Constants]
          # {method} 'test' '(LI;)V' in 'Test1'
          # parm0:    rsi:rsi   = 'I'
          #           [sp+0x20]  (sp of caller)
          ;; N1: #  B1 <- B5 B4 B3  Freq: 1
          ;; B1: #  B5 B2 <- BLOCK HEAD IS JUNK  Freq: 1
          0x...5320: mov    %eax,-0x16000(%rsp)
          0x...5327: push   %rbp
          0x...5328: sub    $0x10,%rsp          ;*synchronization entry
                                                ; - Test1::test@-1 (line 4)
          0x...532c: mov    0x8(%rsi),%r10d     ; implicit exception: dispatches to 0x..5365
          ;; B2: #  B4 B3 <- B1  Freq: 0.999999
          0x...5330: cmp    $0xefe4a91d,%r10d   ;   {oop('A')}
          0x...5337: jne    0x...5353           ;*invokeinterface m
                                                ; - Test1::test@1 (line 4)
          ;; B3: #  N1 <- B2  Freq: 0.999998
          0x...5339: mov    $0x7d6bade78,%r10   ;   {oop(a 'java/lang/Class' = 'A')}
          0x...5343: incl   0x70(%r10)          ;*synchronization entry
                                                ; - Test1::test@-1 (line 4)
          0x...5347: add    $0x10,%rsp
          0x...534b: pop    %rbp
          0x...534c: test   %eax,0x942fcae(%rip)   # 0x00007f66da525000
                                                ; ...   {poll_return}
  11. Prolog & Epilog

        [Verified Entry Point]
        [Constants]
          # {method} 'test' '(LI;)V' in 'Test1'
          # parm0:    rsi:rsi   = 'I'
          #           [sp+0x20]  (sp of caller)
          0x...5320: mov    %eax,-0x16000(%rsp)
          0x...5327: push   %rbp
          0x...5328: sub    $0x10,%rsp          ;*synchronization entry
                                                ; - Test1::test@-1 (line 4)
          ...
          0x...5347: add    $0x10,%rsp
          0x...534b: pop    %rbp
          0x...534c: test   %eax,0x942fcae(%rip)   ;   {poll_return}
          ...

    The first three instructions are the prolog, add/pop are the epilog, and the final test instruction is the GC safepoint polling.
  12. Body

        [Verified Entry Point]
        [Constants]
          # parm0:    rsi:rsi   = 'I'
          ;; B1: #  B1 <- B5 B4 B3  Freq: 1
          ...
          0x...532c: mov    0x8(%rsi),%r10d     ; implicit exception: dispatches to 0x..5365
          ;; B2: #  B4 B3 <- B1  Freq: 0.999999
          0x...5330: cmp    $0xefe4a91d,%r10d   ;   {oop('A')}
          0x...5337: jne    0x...5353           ;*invokeinterface m
                                                ; - Test1::test@1 (line 4)
          ;; B3: #  N1 <- B2  Freq: 0.999998
          0x...5339: mov    $0x7d6bade78,%r10   ;   {oop(a 'java/lang/Class' = 'A')}
          0x...5343: incl   0x70(%r10)          ;*synchronization entry
                                                ; - Test1::test@-1 (line 4)
          ...

    The first mov gets the class of the receiver and doubles as an implicit null check (NPE); cmp/jne is a cheap class test against A; incl is the inlined A.COUNTER++.
  13. What if the class check fails?

        ;; B4: #  N1 <- B2  Freq: 9.99999e-07
          0x...5353: mov    %rsi,%rbp
          0x...5356: mov    $0xffffffde,%esi
          0x...535b: callq  0x00007f66d10ceae0  ; OopMap{rbp=Oop off=64}
                                                ;*invokeinterface m
                                                ; - Test1::test@1 (line 4)
                                                ;   {runtime_call}

    Jump back to the interpreter => deoptimize!
  14. Java VMs

    • Profile the code: record the types seen and the branch frequencies
    • Generate only the necessary code: devirtualize, inline, reorganize branches, etc.
    • Deoptimize (and reoptimize) when needed
  15. What if we could do the same for any language (not just Java)?
  16. What if we could do the same for any language?
  17. invokedynamic

        public class IndyTest {
          static void test(I);
            Code:
               0: aload_0
               1: invokedynamic #20, 0   // foobar:(LI;)V
                                         // RT.bootstrap(Lookup, String, MethodType)
               6: return
          ...
        }

    The linking is done by calling a Java method!
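  18. Linking

        import java.lang.invoke.CallSite;
        import java.lang.invoke.ConstantCallSite;
        import java.lang.invoke.MethodHandle;
        import java.lang.invoke.MethodHandles.Lookup;
        import java.lang.invoke.MethodType;

        public class RT {
          // Called by the JVM the first time the invokedynamic instruction is executed
          public static CallSite bootstrap(Lookup lookup, String name, MethodType mType)
              throws ReflectiveOperationException {
            MethodHandle mh = lookup.findStatic(RT.class, "method", mType);
            return new ConstantCallSite(mh);   // this target can never be changed
          }

          public static void method(I i) {
            throw new AssertionError("hello indy");
          }
        }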
  19. Bootstrap method

    The instructions aload_0; invokedynamic (I)V call bootstrap(lookup, name, mType), which looks up static void method(I i) and installs it as the call site target. Problem: the signature is too specific! The target must have exactly the call site's type, so method(I) cannot serve call sites of any other type.
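  20. Generic signature = target adaptation needed

    The invokedynamic (I)V call site is linked to one generic method:

        static Object method(MutableCallSite cs, Object[] args) { ... }

    whose type is adapted step by step to the call site type:
    • bind the call site: (MutableCallSite, Object[])Object -> (Object[])Object
    • box the arguments into an array (varargs): (Object[])Object -> (Object)Object
    • cast: (Object)Object -> (I)V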
  21. Bind, collect, box, ...

        // methodType is statically imported from java.lang.invoke.MethodType
        public static Object method(MutableCallSite cs, Object[] args) {
          ...   // body on the next slide
        }

        private static final MethodHandle METHOD;
        static {
          try {
            METHOD = MethodHandles.lookup().findStatic(RT.class, "method",
                methodType(Object.class, MutableCallSite.class, Object[].class));
          } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
          }
        }

        public static CallSite bootstrap(Lookup lookup, String name, MethodType mType) {
          MutableCallSite callSite = new MutableCallSite(mType);
          MethodHandle mh = METHOD.bindTo(callSite);                    // bind
          mh = mh.asCollector(Object[].class, mType.parameterCount());  // collect/box
          mh = mh.asType(mType);                                        // cast
          callSite.setTarget(mh);
          return callSite;
        }
  22. Resolving a call: getClass() + lookup.findVirtual()

        public static Object method(MutableCallSite callSite, Object[] args) throws Throwable {
          Class<?> receiverClass = args[0].getClass();
          MethodHandle mh = MethodHandles.lookup().findVirtual(
              receiverClass, "m", callSite.type().dropParameterTypes(0, 1));
          mh = mh.asType(callSite.type());
          return mh.invokeWithArguments(args);
        }

    Ahhh, not efficient! Every call pays for getClass(), a full method lookup and a generic invokeWithArguments.
  23. Inline cache?

    Idea: do what the VM does. Test whether the receiver is of a known class; if so, the code can be inlined! Modify the call site to insert the test at runtime.
  24. Class check

        private static final MethodHandle CLASS_CHECK;
        static {
          Lookup lookup = MethodHandles.lookup();
          try {
            METHOD = ...;
            CLASS_CHECK = lookup.findStatic(RT.class, "classCheck",
                methodType(boolean.class, Class.class, Object.class));
          } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
          }
        }

        public static boolean classCheck(Class<?> type, Object o) {
          return o.getClass() == type;
        }
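  25. With a guard

    The invokedynamic (I)V call site now targets an if/then/else:
    • if: static boolean classCheck(Class, Object), cast from (Class, Object)Z to (Class, I)Z, then bound to the receiver class: (I)Z
    • then: A.m, seen as void m(A a) { COUNTER++; }, cast from (A)V to (I)V
    • else: the previous target, static Object method(MutableCallSite cs, Object[] args), adapted by cast + varargs box + bind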
  26. Inline cache!

        public static Object method(MutableCallSite callSite, Object[] args) throws Throwable {
          Class<?> receiverClass = args[0].getClass();
          MethodType type = callSite.type();
          MethodHandle mh = MethodHandles.lookup().findVirtual(
              receiverClass, "m", type.dropParameterTypes(0, 1));
          mh = mh.asType(type);

          MethodHandle test = CLASS_CHECK;
          test = test.bindTo(receiverClass);
          test = test.asType(type.changeReturnType(boolean.class));

          // same receiver class: call mh directly; otherwise fall back to the old target
          MethodHandle guard = MethodHandles.guardWithTest(test, mh, callSite.getTarget());
          callSite.setTarget(guard);
          return mh.invokeWithArguments(args);
        }
  27. Resulting inlining tree

    Method handle calls are inlined!

        263   1   n   j.l.Object::getClass (0 bytes)
        264   2   b   IndyTest::test (7 bytes)
                  @ 1    j.l.i.MethodHandle::invokeExact   inline (hot)
                  @ 3    j.l.i.MethodHandle::invokeExact   inline (hot)
                  @ 3    RT::classCheck                    inline (hot)
                  @ 1    j.l.Object::getClass              (intrinsic)
                  @ 10   j.l.i.MethodHandle::invokeExact   inline (hot)
                  @ 21   j.l.i.MethodHandle::invokeExact   inline (hot)
  28. In assembly

        0x...d26c: mov    0x8(%rsi),%r11d         ; implicit exception
        0x...d270: mov    0x78(%r12,%r11,8),%rbp  ;*invokevirtual getClass
        0x...d275: mov    $0x7d6bb4228,%r10       ;   {oop(a 'j/l/Class' = 'A')}
        0x...d27f: cmp    %r10,%rbp
        0x...d282: jne    0x00007f594911d2c9      ;*if_acmpne
        ;; B3: #  B6 B4 <- B2  Freq: 0.999999
        0x...d284: mov    0x40(%r12,%r11,8),%r10
        0x...d289: mov    $0x77f280d58,%r11       ;   {oop('A')}
        0x...d293: cmp    %r11,%r10
        0x...d296: jne    0x00007f594911d2b4      ;*checkcast
        ;; B4: #  B8 B5 <- B3  Freq: 0.999997
        0x...d298: nop
        0x...d299: mov    $0xffffffffffffffff,%rax   ;   {oop(NULL)}
        0x...d2a3: callq  0x00007f59490cdf60      ; OopMap{off=72}
                                                  ;*invokevirtual m

    Callouts on the slide: the checkcast test is not needed (the guard has already checked the class); the call to m is a direct call but is not inlined; r12 holds the heap base because compressed oops are enabled.
  29. Conclusion

    You can ask the VM to print the inlining tree or the generated assembly code. invokedynamic allows anyone to write their own optimization code, which the VM will inline. The implementation of invokedynamic is still young; performance will improve.
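Java source has no syntax for an invokedynamic instruction, so the Linking slide (item 18) relies on generated bytecode. Its linking step can still be tried from plain Java: the minimal sketch below calls the bootstrap method itself with the three arguments the JVM would pass, then invokes the returned call site through `CallSite.dynamicInvoker()`. The class name `LinkDemo` is hypothetical, and calling the bootstrap by hand is only an emulation of what the JVM does on the first execution of `invokedynamic foobar:(LI;)V`.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;

class LinkDemo {
  interface I { void m(); }

  static class A implements I {
    public void m() { }
  }

  // Same shape as RT.bootstrap: (Lookup, String, MethodType) -> CallSite
  public static CallSite bootstrap(Lookup lookup, String name, MethodType mType)
      throws ReflectiveOperationException {
    MethodHandle mh = lookup.findStatic(LinkDemo.class, "method", mType);
    return new ConstantCallSite(mh);
  }

  public static void method(I i) {
    throw new AssertionError("hello indy");
  }

  // Emulation of the JVM's role: bootstrap once, then call through the call site.
  static String linkAndCall() throws Throwable {
    CallSite cs = bootstrap(MethodHandles.lookup(), "foobar",
        MethodType.methodType(void.class, I.class));
    MethodHandle invoker = cs.dynamicInvoker();   // type (I)V
    I i = new A();
    try {
      invoker.invokeExact(i);                     // ends up in method(I)
      return "no error";
    } catch (AssertionError e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) throws Throwable {
    System.out.println(linkAndCall());   // prints "hello indy"
  }
}
```

`dynamicInvoker()` always calls the call site's current target, so the same harness keeps working when the `ConstantCallSite` is later swapped for a `MutableCallSite`.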
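The "signature too specific" problem of the Bootstrap method slide (item 19) shows up concretely in the call site API: a call site's target must have exactly the call site's type. The sketch below, with a hypothetical generic target `generic(Object)`, checks that `MutableCallSite.setTarget` rejects an `(Object)Object` handle on an `(I)V` call site with a `WrongMethodTypeException`, and accepts the same handle once it has been adapted with `asType`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.WrongMethodTypeException;

class SignatureDemo {
  interface I { void m(); }

  // Hypothetical generic target: could serve any call site with one reference argument.
  static Object generic(Object receiver) {
    return null;
  }

  static MethodHandle genericHandle() throws ReflectiveOperationException {
    return MethodHandles.lookup().findStatic(SignatureDemo.class, "generic",
        MethodType.methodType(Object.class, Object.class));   // (Object)Object
  }

  // Installing the generic handle as-is on an (I)V call site is refused.
  static boolean rawTargetRejected() throws ReflectiveOperationException {
    MutableCallSite site = new MutableCallSite(MethodType.methodType(void.class, I.class));
    try {
      site.setTarget(genericHandle());
      return false;
    } catch (WrongMethodTypeException e) {
      return true;
    }
  }

  // After asType, the I argument is widened to Object and the result is discarded: (I)V.
  static boolean adaptedTargetAccepted() throws ReflectiveOperationException {
    MutableCallSite site = new MutableCallSite(MethodType.methodType(void.class, I.class));
    site.setTarget(genericHandle().asType(site.type()));
    return site.getTarget().type().equals(site.type());
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    System.out.println(rawTargetRejected() + " " + adaptedTargetAccepted());   // true true
  }
}
```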
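The adaptation chain of item 20 (bind, varargs box, cast) can be observed by recording the method handle type after each step. The sketch below does this for an `(I)V` call site; the class name `AdaptDemo` and the empty generic `method` body are illustrative only. It uses the same three calls as item 21: `bindTo`, `asCollector` and `asType`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.ArrayList;
import java.util.List;

class AdaptDemo {
  interface I { void m(); }

  // The single generic method: receives its call site and the boxed arguments.
  static Object method(MutableCallSite cs, Object[] args) {
    return null;
  }

  // Returns the handle type after each adaptation step, for an (I)V call site.
  static List<MethodType> adaptationSteps() throws ReflectiveOperationException {
    MethodType siteType = MethodType.methodType(void.class, I.class);
    MutableCallSite cs = new MutableCallSite(siteType);
    List<MethodType> steps = new ArrayList<>();

    MethodHandle mh = MethodHandles.lookup().findStatic(AdaptDemo.class, "method",
        MethodType.methodType(Object.class, MutableCallSite.class, Object[].class));
    steps.add(mh.type());   // (MutableCallSite, Object[])Object
    mh = mh.bindTo(cs);
    steps.add(mh.type());   // bind:        (Object[])Object
    mh = mh.asCollector(Object[].class, siteType.parameterCount());
    steps.add(mh.type());   // varargs box: (Object)Object
    mh = mh.asType(siteType);
    steps.add(mh.type());   // cast:        (I)V
    cs.setTarget(mh);       // accepted: same type as the call site
    return steps;
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    for (MethodType t : adaptationSteps()) {
      System.out.println(t);
    }
  }
}
```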
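Putting items 21 and 22 together gives a working but naive dispatcher: every call goes through `getClass()`, `findVirtual` and `invokeWithArguments`. The sketch below wires them up and counts the full lookups; the `lookups` counter, the second receiver class `B` and the `run()` harness are additions for illustration, and the bootstrap is again called by hand instead of by the JVM.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class NaiveDispatch {
  interface I { void m(); }
  static class A implements I { static int count; public void m() { count++; } }
  static class B implements I { static int count; public void m() { count++; } }

  static int lookups;   // number of full method lookups (single-threaded demo)

  private static final MethodHandle METHOD;
  static {
    try {
      METHOD = MethodHandles.lookup().findStatic(NaiveDispatch.class, "method",
          MethodType.methodType(Object.class, MutableCallSite.class, Object[].class));
    } catch (ReflectiveOperationException e) {
      throw new AssertionError(e);
    }
  }

  public static CallSite bootstrap(Lookup lookup, String name, MethodType mType) {
    MutableCallSite callSite = new MutableCallSite(mType);
    callSite.setTarget(METHOD.bindTo(callSite)
        .asCollector(Object[].class, mType.parameterCount())
        .asType(mType));
    return callSite;
  }

  // Resolves the receiver's m() on every single call: correct, but slow.
  public static Object method(MutableCallSite callSite, Object[] args) throws Throwable {
    lookups++;
    Class<?> receiverClass = args[0].getClass();
    MethodHandle mh = MethodHandles.lookup().findVirtual(
        receiverClass, "m", callSite.type().dropParameterTypes(0, 1));
    mh = mh.asType(callSite.type());
    return mh.invokeWithArguments(args);
  }

  // Returns {A.count, B.count, lookups} after three calls through one call site.
  static int[] run() throws Throwable {
    A.count = 0; B.count = 0; lookups = 0;
    CallSite cs = bootstrap(MethodHandles.lookup(), "foobar",
        MethodType.methodType(void.class, I.class));
    MethodHandle invoker = cs.dynamicInvoker();
    I a = new A();
    I b = new B();
    invoker.invokeExact(a);
    invoker.invokeExact(b);
    invoker.invokeExact(a);
    return new int[] { A.count, B.count, lookups };
  }

  public static void main(String[] args) throws Throwable {
    System.out.println(java.util.Arrays.toString(run()));   // [2, 1, 3]
  }
}
```

Three calls cost three lookups, even though the second call with an A could have reused the first resolution; that is the cost the inline cache removes.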
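The class check of item 24 becomes a guard test once `CLASS_CHECK` is bound to one receiver class and cast to the call site's argument types. A small sketch, where `ClassCheckDemo`, `B` and `SubA` are hypothetical names, shows that the resulting `(I)Z` handle accepts exactly one class.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ClassCheckDemo {
  interface I { void m(); }
  static class A implements I { public void m() { } }
  static class B implements I { public void m() { } }
  static class SubA extends A { }

  public static boolean classCheck(Class<?> type, Object o) {
    return o.getClass() == type;
  }

  private static final MethodHandle CLASS_CHECK;
  static {
    try {
      CLASS_CHECK = MethodHandles.lookup().findStatic(ClassCheckDemo.class, "classCheck",
          MethodType.methodType(boolean.class, Class.class, Object.class));
    } catch (ReflectiveOperationException e) {
      throw new AssertionError(e);
    }
  }

  // Guard test for one receiver class, typed (I)Z like the call site's arguments.
  static MethodHandle testFor(Class<?> receiverClass) {
    return CLASS_CHECK.bindTo(receiverClass)                     // (Object)Z
        .asType(MethodType.methodType(boolean.class, I.class));  // (I)Z
  }

  static boolean check(Class<?> receiverClass, I receiver) throws Throwable {
    return (boolean) testFor(receiverClass).invokeExact(receiver);
  }

  public static void main(String[] args) throws Throwable {
    System.out.println(check(A.class, new A()));      // true
    System.out.println(check(A.class, new B()));      // false
    System.out.println(check(A.class, new SubA()));   // false: exact class, not instanceof
  }
}
```

Comparing with `==` rather than `instanceof` is what a cache keyed on the receiver class needs: the target resolved for A is not necessarily the right one for a subclass that overrides m().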
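The if/then/else of the With a guard slide (item 25) is `MethodHandles.guardWithTest(test, target, fallback)`. The sketch below isolates it, following the slide's order for the test (cast to `(Class, I)Z`, then bind to the class). The branch methods `thenBranch` and `elseBranch` are hypothetical stand-ins that return a label instead of `void`, so the chosen branch is observable.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class GuardDemo {
  interface I { void m(); }
  static class A implements I { public void m() { } }
  static class B implements I { public void m() { } }

  static boolean classCheck(Class<?> type, Object o) { return o.getClass() == type; }

  // Hypothetical branches; they return a label so the chosen path can be checked.
  static String thenBranch(I i) { return "then: receiver is an A"; }
  static String elseBranch(I i) { return "else: fall back"; }

  static MethodHandle guarded() throws ReflectiveOperationException {
    MethodHandles.Lookup lookup = MethodHandles.lookup();
    MethodType type = MethodType.methodType(String.class, I.class);   // (I)String
    MethodHandle test = lookup.findStatic(GuardDemo.class, "classCheck",
            MethodType.methodType(boolean.class, Class.class, Object.class))
        .asType(MethodType.methodType(boolean.class, Class.class, I.class))  // cast: (Class, I)Z
        .bindTo(A.class);                                                    // bind: (I)Z
    MethodHandle then = lookup.findStatic(GuardDemo.class, "thenBranch", type);
    MethodHandle otherwise = lookup.findStatic(GuardDemo.class, "elseBranch", type);
    // test, target and fallback share the parameter list (I); target and fallback share the result
    return MethodHandles.guardWithTest(test, then, otherwise);
  }

  static String call(I receiver) throws Throwable {
    return (String) guarded().invokeExact(receiver);
  }

  public static void main(String[] args) throws Throwable {
    System.out.println(call(new A()));   // then: receiver is an A
    System.out.println(call(new B()));   // else: fall back
  }
}
```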
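The inline cache of item 26 can be checked end to end by counting how often the slow path runs. In the sketch below (`InlineCacheDemo`, the counters and `run()` are illustrative additions; the bootstrap is called by hand), the first call with an A installs a guard, later calls with A skip the lookup, and a call with B adds a second guard in front of the first. Like the slide's code, this sketch lets the guard chain grow by one test per receiver class; a production runtime would typically bound the chain and switch to a generic path after a few classes.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class InlineCacheDemo {
  interface I { void m(); }
  static class A implements I { static int count; public void m() { count++; } }
  static class B implements I { static int count; public void m() { count++; } }

  static int slowPathCalls;   // how often the generic path ran (single-threaded demo)

  private static final MethodHandle METHOD;
  private static final MethodHandle CLASS_CHECK;
  static {
    Lookup lookup = MethodHandles.lookup();
    try {
      METHOD = lookup.findStatic(InlineCacheDemo.class, "method",
          MethodType.methodType(Object.class, MutableCallSite.class, Object[].class));
      CLASS_CHECK = lookup.findStatic(InlineCacheDemo.class, "classCheck",
          MethodType.methodType(boolean.class, Class.class, Object.class));
    } catch (ReflectiveOperationException e) {
      throw new AssertionError(e);
    }
  }

  public static boolean classCheck(Class<?> type, Object o) {
    return o.getClass() == type;
  }

  public static CallSite bootstrap(Lookup lookup, String name, MethodType mType) {
    MutableCallSite callSite = new MutableCallSite(mType);
    callSite.setTarget(METHOD.bindTo(callSite)
        .asCollector(Object[].class, mType.parameterCount())
        .asType(mType));
    return callSite;
  }

  // Slow path: resolve m() for this receiver class, then put a guarded fast path
  // in front of the current target so the next call with this class skips the lookup.
  public static Object method(MutableCallSite callSite, Object[] args) throws Throwable {
    slowPathCalls++;
    Class<?> receiverClass = args[0].getClass();
    MethodType type = callSite.type();
    MethodHandle mh = MethodHandles.lookup().findVirtual(
        receiverClass, "m", type.dropParameterTypes(0, 1)).asType(type);
    MethodHandle test = CLASS_CHECK.bindTo(receiverClass)
        .asType(type.changeReturnType(boolean.class));
    callSite.setTarget(MethodHandles.guardWithTest(test, mh, callSite.getTarget()));
    return mh.invokeWithArguments(args);
  }

  // Returns {A.count, B.count, slowPathCalls}.
  static int[] run() throws Throwable {
    A.count = 0; B.count = 0; slowPathCalls = 0;
    CallSite cs = bootstrap(MethodHandles.lookup(), "foobar",
        MethodType.methodType(void.class, I.class));
    MethodHandle invoker = cs.dynamicInvoker();
    I a = new A();
    I b = new B();
    for (int n = 0; n < 3; n++) {
      invoker.invokeExact(a);   // 1st: slow path installs the A guard; 2nd, 3rd: guard hit
    }
    invoker.invokeExact(b);     // A guard fails: slow path installs a B guard in front
    invoker.invokeExact(a);     // B guard fails, A guard hits: no lookup
    return new int[] { A.count, B.count, slowPathCalls };
  }

  public static void main(String[] args) throws Throwable {
    System.out.println(java.util.Arrays.toString(run()));   // [4, 1, 2]
  }
}
```

Five calls now cost only two lookups, one per receiver class seen; the remaining calls go through the chain of `guardWithTest` handles, which is what item 27 shows the JIT inlining.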