
What Lies Beneath

What really happens when your Java program runs? After the transformation from Java source through bytecode and machine code to microcode, and the various optimizations that take place along the way, the instructions that are actually executed may be very different from what you imagined when you wrote the program. This session shows you tools and techniques for tracing that path. You’ll see what a simple program actually looks like when it really hits the hardware!

Level: Intermediate

mauricen

May 28, 2019

Transcript

  1. What Lies Beneath

    • What, another “Hello, World” talk?
    • Yes! We have a tiny class Computer. We’ll see what happens when we type
          java Computer.java
    • It’s amazingly complicated, and interesting!
  2. Java to Bytecode

    private int add(int value) {
        return value + 254;
    }

    public int compute(int value) {
        return add(value / 0xdeadbeef);
    }

    public static void main(String[] args) {
        System.out.println(new Computer().compute(0xcafebabe));
    }

    Annotations: value / 0xdeadbeef = 1; result: 0xff
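The annotations on slide 2 follow from how Java reads hex literals: both constants have the top bit set, so as int values they are negative, and integer division truncates toward zero. A minimal sketch of that arithmetic, assuming a hypothetical class HexLiteralDemo whose static methods restate the slide's instance methods:

```java
// Hypothetical demo class; add/compute mirror the slide's methods but are static.
final class HexLiteralDemo {
    static int add(int value) {
        return value + 254;
    }

    static int compute(int value) {
        return add(value / 0xdeadbeef);
    }

    public static void main(String[] args) {
        // Both literals exceed Integer.MAX_VALUE, so they wrap to negative ints.
        System.out.println(0xcafebabe);               // -889275714
        System.out.println(0xdeadbeef);               // -559038737
        // Negative / negative, truncated toward zero.
        System.out.println(0xcafebabe / 0xdeadbeef);  // 1
        // 1 + 254 = 255, which is 0xff.
        System.out.println(Integer.toHexString(compute(0xcafebabe))); // ff
    }
}
```

Note that System.out.println on the int itself prints 255 in decimal; the 0xff on the slide is the same value written in hex.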
  3. Java to Bytecode

    Source:

        private int add(int value) { return value + 254; }
        public int compute(int value) { return add(value / 0xdeadbeef); }
        public static void main(String[] args) { System.out.println(new Computer().compute(0xcafebabe)); }

    javac ➞ Bytecode, as shown by javap -verbose:

        Constant pool:
           #1 = ...
           #2 = Integer            -559038737     // 0xdeadbeef
           #3 = Methodref          #5.#30         // wlb/Computer.add:(I)I
        {
          public int compute(int);
            Code:
               0: aload_0
               1: iload_1
               2: ldc           #2      // int -559038737
               4: idiv
               5: invokespecial #3      // Method add:(I)I
               8: ireturn
            LocalVariableTable:
              Start  Length  Slot  Name   Signature
                  0       9     0  this   Lwlb/Computer;
                  0       9     1  value  I
        }
  4. Java to Bytecode

    public int compute(int value) { return add(value / 0xdeadbeef); }

    (Same javap listing for compute(int) as slide 3.)

    Labelled parts of the listing: Constant pool, Code, LocalVariableTable
  5. Executing Bytecode

    public int compute(int value) { return add(value / 0xdeadbeef); }

    (Same javap listing for compute(int) as slide 3.)

    Operand stack after 0: aload_0 (bottom to top): this
  6. Executing Bytecode

    public int compute(int value) { return add(value / 0xdeadbeef); }

    (Same javap listing for compute(int) as slide 3.)

    Operand stack after 1: iload_1 (bottom to top): this, 0xcafebabe
  7. Executing Bytecode

    public int compute(int value) { return add(value / 0xdeadbeef); }

    (Same javap listing for compute(int) as slide 3.)

    Operand stack after 2: ldc #2 (bottom to top): this, 0xcafebabe, 0xdeadbeef
  8. Executing Bytecode

    public int compute(int value) { return add(value / 0xdeadbeef); }

    (Same javap listing for compute(int) as slide 3.)

    Operand stack before 4: idiv (bottom to top): this, 0xcafebabe, 0xdeadbeef
    Operand stack after 4: idiv (bottom to top): this, 1
  9. Executing Bytecode

    public int compute(int value) { return add(value / 0xdeadbeef); }

    (Same javap listing for compute(int) as slide 3.)

    Operand stack before 5: invokespecial #3 (bottom to top): this, 1
    Operand stack after 5: invokespecial #3 (bottom to top): 255
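The stack states on slides 5 to 9 can be replayed by hand. The following is only an illustrative sketch, not how the JVM is implemented: the class OperandStackDemo is hypothetical, a string stands in for the `this` reference, and a plain static method stands in for the invokespecial call to add.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical replay of the six bytecodes of compute(int) on an explicit stack.
final class OperandStackDemo {
    static int add(int value) {
        return value + 254;
    }

    static int run(int value) {
        Object self = "this";                  // stand-in for local slot 0
        Deque<Object> stack = new ArrayDeque<>();

        stack.push(self);                      // 0: aload_0        -> this
        stack.push(value);                     // 1: iload_1        -> this, value
        stack.push(0xdeadbeef);                // 2: ldc #2         -> this, value, 0xdeadbeef

        int divisor = (Integer) stack.pop();   // 4: idiv pops the divisor first,
        int dividend = (Integer) stack.pop();  //    then the dividend,
        stack.push(dividend / divisor);        //    and pushes the quotient

        int argument = (Integer) stack.pop();  // 5: invokespecial pops the argument
        stack.pop();                           //    and the receiver,
        stack.push(add(argument));             //    then pushes the return value

        return (Integer) stack.pop();          // 8: ireturn
    }

    public static void main(String[] args) {
        System.out.println(run(0xcafebabe));   // 255
    }
}
```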
  10. Java to Bytecode

    private int add250(int value) {
        return value + 250; // FF: 250 + 5
    }

    public int caffeinate(int value) {
        return 0xCAFEBABE * add3(value);
    }

    public static void main(String[] args) {
        WhatLiesBeneath wlb = new WhatLiesBeneath();
        System.out.println(new Adder().doubleAdd3(5));
    }

    javac ➞

        public int compute(int);
          Code:
             0: aload_0
             1: iload_1
             2: ldc           #2      // int 0xdeadbeef
             4: idiv
             5: invokespecial #3      // Method add:(I)I
             8: ireturn

    Highlighted instruction: idiv
  11. idiv

        public int compute(int);
          Code:
             0: aload_0
             1: iload_1
             2: ldc           #2      // int 0xdeadbeef
             4: idiv
             5: invokespecial #3      // Method add:(I)I
             8: ireturn

    Template code for idiv, printed by java … -XX:+PrintInterpreter:

        idiv  108 idiv  [0x00007f8f8f72ecc0, 0x00007f8f8f72ed00]  64 bytes

          0x00007f8f8f72ecc0: mov    (%rsp),%eax
          0x00007f8f8f72ecc3: add    $0x8,%rsp
          0x00007f8f8f72ecc7: mov    %eax,%ecx
          0x00007f8f8f72ecc9: mov    (%rsp),%eax
          0x00007f8f8f72eccc: add    $0x8,%rsp
          0x00007f8f8f72ecd0: cmp    $0x80000000,%eax
          0x00007f8f8f72ecd6: jne    0x00007f8f8f72ece7
          0x00007f8f8f72ecdc: xor    %edx,%edx
          0x00007f8f8f72ecde: cmp    $0xffffffff,%ecx
          0x00007f8f8f72ece1: je     0x00007f8f8f72ecea
          0x00007f8f8f72ece7: cltd
          0x00007f8f8f72ece8: idiv   %ecx
          0x00007f8f8f72ecea: movzbl 0x1(%r13),%ebx
          0x00007f8f8f72ecef: inc    %r13
          0x00007f8f8f72ecf2: movabs $0x7f8fadbe4d80,%r10
          0x00007f8f8f72ecfc: jmpq   *(%r10,%rbx,8)

    Annotations: dividend ➞ eax, divisor ➞ ecx
  12. idiv

    (Same bytecode and java … -XX:+PrintInterpreter listing as slide 11.)

    Flowchart over the template code:

        dividend == MIN_VALUE?
          N: divide eax by ecx
          Y: edx = 0, then divisor == -1?
               N: divide eax by ecx
               Y: skip the divide (eax still holds MIN_VALUE)
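The two comparisons in the template guard the one case where the quotient does not fit in an int: Integer.MIN_VALUE divided by -1. The x86 idiv instruction raises a divide error on that overflow, whereas Java defines the result to be Integer.MIN_VALUE with no exception, so the template jumps past the hardware divide. A minimal sketch of that control flow, with a hypothetical class and method name:

```java
// Hypothetical restatement of the template's flowchart in Java.
final class IdivTemplateDemo {
    static int idiv(int dividend, int divisor) {
        if (dividend == Integer.MIN_VALUE && divisor == -1) {
            // Overflow case: the template skips the hardware divide and
            // leaves the dividend in place as the quotient (remainder 0).
            return dividend;
        }
        // Every other case goes to the hardware divide.
        return dividend / divisor;
    }

    public static void main(String[] args) {
        // Java's own division operator already behaves this way.
        System.out.println(Integer.MIN_VALUE / -1 == Integer.MIN_VALUE); // true
        System.out.println(idiv(Integer.MIN_VALUE, -1));                 // -2147483648
        System.out.println(idiv(0xcafebabe, 0xdeadbeef));                // 1
    }
}
```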
  13. Improving Our Program…?

    private int add(int value) {
        return value + 254;
    }

    public int compute(int value) {
        return add(value / 0xdeadbeef);
    }

    public static void main(String[] args) {
        System.out.println(new Computer().compute(0xcafebabe));
    }
  14. Improving Our Program…?

    private int add(int value) {
        return value + 254;
    }

    public int compute(int value) {
        return (value / 0xdeadbeef) + 254;
    }

    public static void main(String[] args) {
        System.out.println(new Computer().compute(0xcafebabe));
    }
  15. Let’s Benchmark!

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @Fork(3)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    public class ComputerBenchmark {

        private Computer computer = new Computer();

        // Keep the benchmark method itself from being inlined into the JMH harness.
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        @Benchmark
        public int compute() {
            return computer.compute(0xcafebabe);
        }
    }
  16. Manual Optimisation

    Benchmark                                     Mode  Cnt    Score   Error  Units
    ComputerBenchmark.compute (-Xint)             avgt    5  178.124 ± 8.136  ns/op
    ComputerBenchmark.compute (-Xint, improved)   avgt    5  139.749 ± 9.503  ns/op
  17. Without Inlining (-XX:-Inline)

    ComputerBenchmark::compute
        ...
        mov    $0xcafebabe,%edx
        callq  0x00007f0568fb3c60        ;*invokevirtual Computer.compute
        ...

    Computer::compute
        ...
        imul   $0x7aeca299,%r10,%r10
        sar    $0x3c,%r10
        mov    %r10d,%r10d
        sub    %r10d,%edx                ;*idiv
        callq  0x00007f0568fb40e0        ;*invokevirtual Computer.add
        ...

    Computer::add
        ...
        add    $0xfe,%edx
        ...
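The compiled Computer::compute contains no divide instruction: the division by the constant 0xdeadbeef has been replaced by a multiply and a shift. The sketch below reproduces that arithmetic in Java. The constant 0x7aeca299 and the shift by 0x3c (60) come from the listing; the steps hidden behind the "..." are assumptions here, namely that the input is sign-extended to 64 bits before the imul and that %edx holds the input's sign (x >> 31) before the sub. The class and method names are hypothetical.

```java
// Hypothetical sketch of division by a constant via multiply and shift.
final class MagicDivideDemo {
    // 0x7aeca299 is 2^60 / 559038737 rounded up; 559038737 is |0xdeadbeef|.
    static final long MAGIC = 0x7aeca299L;

    static int divideByDeadBeef(int x) {
        long product = (long) x * MAGIC;          // imul $0x7aeca299 (input sign-extended: assumed)
        int floorQuotient = (int) (product >> 60); // sar $0x3c, then keep the low 32 bits
        int sign = x >> 31;                        // assumed: 0 for x >= 0, -1 for x < 0
        // floorQuotient is floor(x / 559038737). Negating it divides by the
        // negative constant, and adding the sign turns floor into truncation.
        return sign - floorQuotient;               // sub %r10d,%edx
    }

    public static void main(String[] args) {
        int[] samples = {0xcafebabe, 0, 1, -1, 0xdeadbeef, 559038737,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            System.out.println(x + ": " + divideByDeadBeef(x) + " vs " + (x / 0xdeadbeef));
        }
    }
}
```

The multiply-and-shift form avoids the hardware divide, which is one of the slower integer instructions on most processors.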
  18. JIT Optimisation

    Benchmark                                     Mode  Cnt    Score   Error  Units
    ComputerBenchmark.compute (-Xint)             avgt    5  178.124 ± 8.136  ns/op
    ComputerBenchmark.compute (-Xint, modified)   avgt    5  139.749 ± 9.503  ns/op
    ComputerBenchmark.compute (-XX:-Inline)       avgt    5    6.697 ± 2.253  ns/op
    ComputerBenchmark.compute                     avgt    5    3.571 ± 0.106  ns/op
  19. Data-Dependent Loads

    Diagram labels:
    • Processor
    • L1 data: 64-byte cache lines
    • 4K pages
    • Translation Lookaside Buffer (TLB): page index ➞ page address
    • Virtual address (page index + offset) ➞ physical address

    Prefetching doesn’t work with data-dependent loads!
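With 4K pages and 64-byte cache lines, a virtual address splits into fixed bit fields: the low 12 bits are the offset within the page, the remaining high bits are the page index that the TLB translates, and the low 6 bits are the position within a cache line. A minimal sketch of that split; the class and method names are hypothetical, and the sample address is simply one taken from the interpreter listing on slide 11.

```java
// Hypothetical helper showing how an address decomposes for 4K pages and 64-byte lines.
final class AddressSplitDemo {
    static final int PAGE_SHIFT = 12; // 4K page = 2^12 bytes
    static final int LINE_SHIFT = 6;  // 64-byte cache line = 2^6 bytes

    static long pageIndex(long virtualAddress) {
        return virtualAddress >>> PAGE_SHIFT;
    }

    static long pageOffset(long virtualAddress) {
        return virtualAddress & ((1L << PAGE_SHIFT) - 1);
    }

    static long lineOffset(long virtualAddress) {
        return virtualAddress & ((1L << LINE_SHIFT) - 1);
    }

    public static void main(String[] args) {
        long address = 0x00007f8f8f72ecc0L;
        System.out.println(Long.toHexString(pageIndex(address)));  // 7f8f8f72e
        System.out.println(Long.toHexString(pageOffset(address))); // cc0
        System.out.println(lineOffset(address));                   // 0
    }
}
```

Only the page index goes through translation; the offset is carried over unchanged into the physical address.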
  20. Measuring Cache-Friendliness

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OperationsPerInvocation(1024)
    public void linkedList() {
        if (iterator == null || !iterator.hasNext()) {
            iterator = linkedList.iterator();
        }
        for (int i = 0; i < 1024 && iterator.hasNext(); i++) {
            sink(iterator.next());
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OperationsPerInvocation(1024)
    public void primitiveArray() {
        if (index == intArray.length) {
            index = 0;
        }
        for (int i = 0; i < 1024 && index < intArray.length; i++) {
            sink(intArray[index++]);
        }
    }
  21. LinkedList

    list length (K)                 1       7      63     511
    performance (ns/op)          7.25    9.03   20.87   29.07
    CPI (clockticks/instrn)      0.32    0.41    0.93    1.33

    events/operation
    cycles                      17.97   22.77   51.32   72.66
    instructions                56.08   55.88   54.96   54.49
    L1-dcache-load-misses        1.18    1.83    1.87    2.65
    L1-dcache-loads             18.94   19.39   18.88   18.22
    L1-dcache-stores            12.00   12.18   11.99   11.15
    LLC-load-misses                 0       0    0.41    1.31
    LLC-loads                       0    0.72    1.33    1.56
  22. Primitive Array

    list length (K)                 1       7      63     511
    performance (ns/op)          3.62    3.65    3.65    3.66
    CPI                          0.30    0.30    0.30    0.31

    events/operation
    cycles                       9.09    9.16    9.10    9.13
    instructions                30.24   30.13   29.94   29.85
    L1-dcache-load-misses        0.00    0.01    0.06    0.06
    L1-dcache-loads             12.00   12.00   11.97   12.14
    L1-dcache-stores             6.00    6.02    6.02    6.04
    LLC-load-misses              0.00    0.00    0.00    0.00
    LLC-loads                    0.00    0.00    0.00    0.00
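The CPI rows in the LinkedList and primitive array tables are the cycles row divided by the instructions row. The sketch below recomputes a few of them from the figures on the slides; the class and method names are hypothetical. It makes the pattern explicit: the instruction count per operation barely changes with list length, so the LinkedList slowdown comes from each instruction taking more cycles.

```java
import java.util.Locale;

// Hypothetical helper recomputing CPI from the per-operation event counts on the slides.
final class CpiDemo {
    static double cpi(double cyclesPerOp, double instructionsPerOp) {
        return cyclesPerOp / instructionsPerOp;
    }

    public static void main(String[] args) {
        // LinkedList: cycles grow about fourfold while instructions stay near 55.
        System.out.printf(Locale.ROOT, "LinkedList 1K:   %.2f%n", cpi(17.97, 56.08));
        System.out.printf(Locale.ROOT, "LinkedList 511K: %.2f%n", cpi(72.66, 54.49));
        // Primitive array: both counts are flat across all lengths.
        System.out.printf(Locale.ROOT, "int[] 1K:        %.2f%n", cpi(9.09, 30.24));
        System.out.printf(Locale.ROOT, "int[] 511K:      %.2f%n", cpi(9.13, 29.85));
    }
}
```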
  23. Conclusions

    From javac soup to hardware nuts, there’s a lot of advanced technology here. And as everyone knows, to the ignorant: “Any sufficiently advanced technology is indistinguishable from magic.” Don’t believe in magic!