
JUGNsk Meetup #5. Vladimir Voskresensky: "Slow Java? Performance problems you can't blame on GC".

jugnsk
December 15, 2018

Azul Zing JVM is known for its unique C4 (Continuously Concurrent Compacting Collector), which was designed from the start to solve the problem of Java applications pausing for garbage collection. In this session we look at Java performance problems that cannot be blamed on garbage collection. We discuss how various AOT (ahead-of-time) technologies try to solve some of them, and in particular we look under the hood of ReadyNow, the Profile Guided "AOT" implementation for Zing JVM.

Transcript

  1. ABOUT AZUL ZING • Derived from HotSpot • Known for C4 “Pauseless” GC • ReadyNow! • Falcon – LLVM based JIT • Compile Stashing (JIT cache)
  2. A practical real-world example: Improve Digital (Video Advertising) • Cassandra

    cluster running on 6x AWS i3.2xlarge ◦ Approx. 80/20 write/read split ◦ Data read and written with quorum consistency ◦ 6 client machines sending requests collocated in the same AZs • Service Level Agreement (SLA) for read operations: ◦ 20 ms at 99.9% ◦ 50ms at 99.99% ◦ 100ms at 99.998% (not a typo, last 9 hard to maintain on AWS)
  3. 1 tick/sec x 60 (per min) x 60 (per hour) x 24 (per day) x 365 = 31'536'000 ticks per year; 0.99998 ~= 1.00000
  4. 31'536'000 – 600 = 31'535'400
  5. (31'536'000 – 600) ÷ 365 = 86'398.3562
  6. (31'536'000 – 600) ÷ 365 ÷ 24 = 3'599.93151
  7. (31'536'000 – 600) ÷ 365 ÷ 24 ÷ 60 = 59.9988585
  8. (31'536'000 – 600) ÷ 365 ÷ 24 ÷ 60 ÷ 60 = 0.999980975
  9. (31'536'000 – 600) ÷ 365 ÷ 24 ÷ 60 ÷ 60 = 0.999980975, i.e. 99.998%
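
The arithmetic on these slides can be checked with a few lines of Java. This is only an illustrative sketch, not code from the talk: the class `SlaBudget` and its method names are made up here, and it assumes one tick per second over a 365-day year, as the slides do.

```java
// Illustrative sketch: how many one-per-second ticks may miss a percentile SLA in a year.
final class SlaBudget {
    // 60 s x 60 min x 24 h x 365 days, the product shown on the slides.
    static final long SECONDS_PER_YEAR = 60L * 60 * 24 * 365;

    /** Ticks per year that may exceed the latency bound at the given percentile (0..1). */
    static double allowedMissesPerYear(double percentile) {
        return SECONDS_PER_YEAR * (1.0 - percentile);
    }

    /** Fraction of ticks within the bound if `misses` ticks per year exceed it. */
    static double achievedFraction(long misses) {
        return (SECONDS_PER_YEAR - misses) / (double) SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("ticks per year: %d%n", SECONDS_PER_YEAR);
        System.out.printf("allowed misses at 99.998%%: %.2f%n", allowedMissesPerYear(0.99998));
        System.out.printf("fraction within bound with 600 misses: %.9f%n", achievedFraction(600));
    }
}
```

Under these assumptions the 99.998% level leaves a budget of roughly ten minutes of one-second ticks per year, which is why the slides subtract 600.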
  10. • Hardware ◦ Cache misses ◦ False sharing ◦ Volatile

    CPU Frequency • Virtualization • Guest V-Resources vs Host • V-CPU Throttling • OS ◦ Context switch ◦ Network ◦ I/O ◦ System Malloc contention SLOW? IN CLOUD?
  11. • Hardware ◦ Cache misses ◦ False sharing ◦ Volatile

    CPU Frequency • Virtualization • Guest V-Resources vs Host • V-CPU Throttling • OS ◦ Context switch ◦ Network ◦ I/O ◦ System Malloc contention SLOW? IN CLOUD? ◦ Garbage Collection ◦ Java Runtime ◦ Just-in-Time Compiler
  12. JHICCUP ◦ How Java Got the Hiccups ◦ https://www.azul.com/giltene-how-java-got-the-hiccups ◦ Welcome to jHiccup ◦ https://www.azul.com/jhiccup ◦ Open source ◦ https://github.com/giltene/jHiccup ◦ -javaagent:/PATH/jHiccup.jar=-d,5000,-i,1000,-s,3,-l,hiccuplog%date.%pid.hlog • -XX:+PrintGCDetails -XX:+PrintGCDateStamps • -XX:+PrintGCApplicationStoppedTime • -XX:+PrintGCApplicationConcurrentTime • -Xloggc:gclog.log
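
The idea behind a hiccup meter can be sketched in a few lines: a thread that does nothing but sleep for a fixed interval and record how much later than expected it wakes up. The sketch below is a simplified illustration of that idea only. It is not jHiccup's implementation (which records full histograms), and the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Simplified illustration of the hiccup-measurement idea; NOT jHiccup's actual code.
final class HiccupSketch {
    /** Sleeps `intervalNanos` per sample and returns the worst observed wake-up delay in nanoseconds. */
    static long maxHiccupNanos(int samples, long intervalNanos) {
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                TimeUnit.NANOSECONDS.sleep(intervalNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
            // Anything beyond the requested interval is time the platform "stole" from an idle thread.
            long overshoot = System.nanoTime() - start - intervalNanos;
            worst = Math.max(worst, overshoot);
        }
        return worst;
    }

    public static void main(String[] args) {
        long worst = maxHiccupNanos(1_000, TimeUnit.MILLISECONDS.toNanos(1));
        System.out.printf("worst hiccup: %.3f ms%n", worst / 1e6);
    }
}
```

Because the measuring thread does no work of its own, the delays it sees come from the platform (JVM, OS, hypervisor) rather than from application code.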
  13. A practical real-world example: Improve Digital (Video Advertising) • Cassandra

    cluster running on 6x AWS i3.2xlarge ◦ Approx. 80/20 write/read split ◦ Data read and written with quorum consistency ◦ 6 client machines sending requests collocated in the same AZs • SLA requirements for read operations: ◦ 20 ms at 99.9% ◦ 50ms at 99.99% ◦ 100ms at 99.998% (not a typo, last 9 hard to maintain on AWS) • HotSpot+G1: can maintain ~4K TPS before SLA breach • Zing: can maintain ~21K TPS before SLA breach • 5x - Q.E.D.
  14. START vs WARMED-UP [Chart: XML validation throughput in ops/min (0 to 1600) for warm-up and runs 1 to 20, over 5 mins]
  15. java -XX:+PrintCompilation 01 PRINT COMPILATION 61 1 3 java.lang.String::hashCode (55

    bytes) 64 2 3 java.lang.String::charAt (29 bytes) 64 4 3 java.lang.String::indexOf (70 bytes) 64 3 3 java.lang.String::length (6 bytes) 65 5 3 java.lang.Object::<init> (1 bytes) … 68 12 3 java.lang.String::getChars (62 bytes) 69 13 1 java.lang.ref.Reference::get (5 bytes) 78 14 3 java.lang.String::indexOf (7 bytes) 78 15 1 java.lang.Object::<init> (1 bytes) 81 16 % 3 …SimpleProgram::main @ 15 (48 bytes) 81 17 3 …SimpleProgram::main (48 bytes) 81 18 % 4 …SimpleProgram::main @ 15 (48 bytes)
  16. LOG COMPILATION XML: java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation <task compile_id='2' level='3' stamp='0.126' method='java/lang/String charAt (I)C' bytes='29' count='1955' iicount='1955' > … <task_done success='1' stamp='0.127' nmsize='616' count='2269'/> </task> <task_queued compile_id='5' level='3' stamp='0.130' method='java/lang/Object &lt;init&gt; ()V' bytes='1' count='1536' iicount='1536' comment='tiered' hot_count='1536'/>
  17. 0: Interpreter 1-3: C1 4: C2 1: No overhead 2:

    Counters 3: Counters + Profiling 0 0 0 3 3 3 1 2 4 4 OPENJDK TIERED COMPILATION normal flow tier 4 busy trivial method
  18. ZING TIERING SYSTEM Interpreter C1 C2 / Falcon 0 3

    4 OpenJDK Normal Flow Zing Flow Interpreter Baseline Optimized
  19. 79 18 3 …Unreached::hotMethod (26 bytes) 83 33 4 …Unreached::hotMethod

    (26 bytes) 5089 33 4 …Unreached::hotMethod (26 bytes) made not entrant 5089 36 3 …Unreached::hotMethod (26 bytes) 5090 38 4 …Unreached::hotMethod (26 bytes)
  20. UNREACHED/UNSTABLE 'IF'

      public class Unreached {
          public static volatile Object thing = null;

          public static void main(final String[] args) throws InterruptedException {
              for ( int i = 0; i < 20_000; ++i ) {
                  hotMethod();
              }
              Thread.sleep(5_000); // wait for JIT
              thing = new Object();
              for ( int i = 0; i < 20_000; ++i ) {
                  hotMethod();
              }
              Thread.sleep(5_000); // wait for JIT again
          }

          static final void hotMethod() {
              if ( thing == null ) System.out.print("-");
              else System.out.print("+");
          }
      }
  21. First loop: only the `thing == null` branch is ever taken.

      public class Unreached {
          public static volatile Object thing = null;

          public static void main(final String[] args) throws InterruptedException {
              for ( int i = 0; i < 20_000; ++i ) {
                  hotMethod(); // <===
              }
              Thread.sleep(5_000); // wait for JIT
              thing = new Object();
              for ( int i = 0; i < 20_000; ++i ) {
                  hotMethod();
              }
              Thread.sleep(5_000); // wait for JIT again
          }

          static final void hotMethod() {
              if ( thing == null ) System.out.print("-"); // <===
              else System.out.print("+");
          }
      }
  22. static final void hotMethod() { if ( thing == null ) System.out.print(""); else System.out.print(""); } static final void hotMethod() { if ( thing == null ) System.out.print(""); else uncommon_trap(:unreached); } <bc code='199' bci='3'/> <branch target_bci='17' taken='0' not_taken='5800' cnt='5800' prob='never'/> <uncommon_trap bci='3' reason='unstable_if' action='reinterpret' comment='taken never'/>
  24. UNREACHED/UNSTABLE 'IF': second loop, the previously unreached `else` branch is now taken.

      public class Unreached {
          public static volatile Object thing = null;

          public static void main(final String[] args) throws InterruptedException {
              for ( int i = 0; i < 20_000; ++i ) {
                  hotMethod();
              }
              Thread.sleep(5_000); // wait for JIT
              thing = new Object();
              for ( int i = 0; i < 20_000; ++i ) {
                  hotMethod(); // <===
              }
              Thread.sleep(5_000); // wait for JIT again
          }

          static final void hotMethod() {
              if ( thing == null ) System.out.print("-");
              else System.out.print("+"); // <===
          }
      }
  25. static final void hotMethod() { if ( thing == null ) System.out.print(""); else System.out.print(""); } static final void hotMethod() { if ( thing == null ) System.out.print(""); else uncommon_trap(:unreached); } <uncommon_trap thread='7171' stamp='5.104' compile_id='29' compiler='C2' level='4' reason='unstable_if' action='reinterpret' > <jvms method='…Unreached hotMethod ()V' bci='3'…/> </uncommon_trap> <bc code='199' bci='3'/> <branch target_bci='17' taken='0' not_taken='5800' cnt='5800' prob='never'/> <uncommon_trap bci='3' reason='unstable_if' action='reinterpret' comment='taken never'/>
  26. Bail To Interpreter + Lock Interpreter C1 C2 new call

    after trap uncommon trap + bail to interpreter
  27. Bail To Interpreter + Lock Interpreter C1 C2 new call

    after trap bail to interpreter uncommon trap + bail to interpreter
  28. JIT DOESN’T KNOW Fields Methods Parent Class Interfaces Anything …

    if ( always ) { return Class1.getStatic(); } else { return Class2.getStatic(); } if ( always ) { return Class1.getStatic(); } else { uncommon_trap(:unloaded); } Give Up! UNLOADED https://blog.takipi.com/java-on-steroids-5-super-useful-jit-optimization-techniques/
  29. BAIL TO INTERPRETER • Interpreter, C1, C2 / Falcon • uncommon trap + bail to interpreter + initialize class • LOG COMPILATION: <uncommon_trap thread='7171' compile_id='24' compile_kind='osr' compiler='C2' level='4' reason='uninitialized' action='reinterpret' stamp='1.038'> … </uncommon_trap> • https://wiki.openjdk.java.net/display/HotSpot/LogCompilation+overview • https://blog.takipi.com/java-on-steroids-5-super-useful-jit-optimization-techniques/
  30. BAIL TO INTERPRETER + MAKE NOT ENTRANT • OpenJDK, Zing: Interpreter, C1, C2 / Falcon; new callers • uncommon trap + bail to interpreter + initialize class • make not entrant • LOG COMPILATION: <make_not_entrant thread='7171' compile_id='24' compile_kind='osr' compiler='C2' level='4' stamp='1.038' /> • https://wiki.openjdk.java.net/display/HotSpot/LogCompilation+overview • https://blog.takipi.com/java-on-steroids-5-super-useful-jit-optimization-techniques/
  31. MAKE NOT ENTRANT: Not all traps make not entrant, but most do. [Chart: counts of traps and tier 4 not-entrant events vs. time (seconds, 0 to 660), with two phase changes marked]
  32. HARD TO IDENTIFY DISRUPTION • 10-100ms running slow, but not stopped • Not visible in GC logs • Visible in LogCompilation, but only hottest methods matter
  33. [Class diagram: Func with apply(double): double; Square with apply(double): double] System.out.println("Using Square..."); Func func = new Square(); ….. for ( int i = 0; i < 20_000; ++i ) { apply1(func, i); apply2(func, i); … apply7(func, i); apply8(func, i); } Thread.sleep(5_000); System.out.printf( "Loading %s to Deoptimize Now!%n", Sqrt.class); Thread.sleep(25_000); … void apply*(Func func, int x) { func.apply(x); }
  34. UNGUARDED DEVIRTUALIZATION via CLASS HIERARCHY ANALYSIS (CHA) • [Class diagram: Func with apply(double): double; Square with apply(double): double] • func.apply(x); • Devirtualize: square.apply(x); // no type check • Inline: x * x // no type check
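
A self-contained version of this experiment might look like the sketch below. It is a reconstruction under assumptions, not the speaker's code: `Func` is modelled as an abstract class, only one `apply*` method is kept, and `Sqrt` is loaded by name through the hypothetical helper `loadSqrt()` to make it less likely that the class is loaded before the second phase. Whether a deoptimization actually shows up depends on the JVM and its flags.

```java
// Sketch of a class-hierarchy-analysis (CHA) experiment; a reconstruction, not the original demo.
final class ClassDevirtualization {
    // Assumption: Func is an abstract class; the slide only shows its apply(double) method.
    abstract static class Func {
        abstract double apply(double x);
    }

    static final class Square extends Func {
        @Override double apply(double x) { return x * x; }
    }

    static final class Sqrt extends Func {
        @Override double apply(double x) { return Math.sqrt(x); }
    }

    // While Square is the only loaded subclass of Func, a JIT may bind this call
    // directly to Square.apply and inline it without a type check.
    static double apply1(Func func, int x) {
        return func.apply(x);
    }

    static double run(Func func, int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; ++i) {
            sum += apply1(func, i);
        }
        return sum;
    }

    // Hypothetical helper: loads Sqrt by name, so the second subclass appears only here.
    static Func loadSqrt() {
        try {
            Class<?> sqrt = Class.forName(ClassDevirtualization.class.getName() + "$Sqrt");
            return (Func) sqrt.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Using Square...");
        run(new Square(), 20_000);
        Thread.sleep(5_000); // give the JIT time to compile apply1
        System.out.println("Loading Sqrt to Deoptimize Now!");
        run(loadSqrt(), 20_000);
    }
}
```

Running it with `-XX:+PrintCompilation` is one way to look for `made not entrant` lines around the moment the second subclass is loaded.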
  35. LOG COMPILATION XML ... System.out.printf( "Loading %s to Deoptimize Now!%n", Sqrt.class); … <dependency_failed stamp='5.226' type='abstract_with_unique_concrete_subtype' ctxk='…Func' x='…Square' witness='…Sqrt' /> <make_not_entrant stamp='5.227' thread='13571' compile_id='31' compiler='C2' level='4'/>
  36. -XX:+PrintCompilation -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 85 29 4 …ClassDevirtualization::apply1 (7 bytes)

    85 30 4 …ClassDevirtualization::apply2 (7 bytes) … 87 36 4 …ClassDevirtualization::apply8 (7 bytes) 87 35 4 …ClassDevirtualization::apply7 (7 bytes) 5091 35 4 …ClassDevirtualization::apply7 (7 bytes) made not entrant 5091 36 4 …ClassDevirtualization::apply8 (7 bytes) made not entrant 5091 31 4 …ClassDevirtualization::apply3 (7 bytes) made not entrant … 5091 33 4 …ClassDevirtualization::apply5 (7 bytes) made not entrant 5091 29 4 …ClassDevirtualization::apply1 (7 bytes) made not entrant 5091 32 4 …ClassDevirtualization::apply4 (7 bytes) made not entrant vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count 5.099: Deoptimize [ 9 0 0 ] [ 0 0 0 0 0 ] 0
  37. DEPENDENCY FAILURES [Chart: counts of failed dependencies and tier 4 not-entrant events vs. time (seconds, 0 to 660), with two phase changes marked]
  38. DEOPT STATISTICS • Traps: 163 • STW Deopts: 33 • Not Entrant: 213 • Recompiled: 84 • Making a Hot Method Not Entrant: 10-100ms Latency Event
  39. non-sampling period sampling period full-speed period C2 C1 Interpreter C1

    C2 additional sampling period stable period Trap + Bail to Interpreter 5x 10-100+x REVISED MENTAL MODEL
  40. Individual XML validation results [Chart: transaction time / latency over 5 mins, with counts (axis ticks 0, 10, 100, 1000) of C1, C2 / Falcon, and C2 / Falcon Not Entrant events]
  41. When may a class be initialized? JVMS for Java 8, Section 5.5: Initialization. Initialization of a class or interface consists of executing its class or interface initialization method (§2.9). A class or interface C may be initialized only as a result of: • The execution of any one of the Java Virtual Machine instructions new, getstatic, putstatic, or invokestatic that references C (§new, §getstatic, §putstatic, §invokestatic). These instructions reference a class or interface directly or indirectly through either a field reference or a method reference ….. https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf
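
The rule quoted above can be observed directly. In the following sketch (the names `LazyInitDemo` and `Config` are illustrative, not from the talk), evaluating a class literal does not run the static initializer, while the first read of a non-constant static field (a `getstatic`) does.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of lazy class initialization.
final class LazyInitDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static final class Config {
        static int value = 42;                      // non-constant static field, read via getstatic
        static { EVENTS.add("Config.<clinit>"); }   // runs once, on first initialization
    }

    static List<String> run() {
        EVENTS.add("before class literal");
        Class<?> literal = Config.class;            // a class literal does not initialize Config
        EVENTS.add("after class literal");
        int v = Config.value;                       // first getstatic triggers initialization
        EVENTS.add("read " + v);
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

This laziness is what makes initialization state a compile-time question for a JIT: code compiled before the first such access has to either guard the access or trap.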
  42. TROUBLESOME FOR TRANSACTIONS if ( MySystem.isWarmingUp() ) { transaction.abort(); }

    else { transaction.commit(); } HOT! COLD? Trap on first real use!*?
  43. MyClass.getStatic() if ( !vm.is_init(MyClass) ) { vm.init(MyClass); } MyClass.getStatic() Slows

    down by 10+% http://joeduffyblog.com/2015/12/19/safe-native-code/ INITIALIZATION CHECKING Bloats size by 20%
  44. START vs WARMED-UP, XML validation [Chart: ops/min (0 to 1600) for warm-up and runs 1 to 20 over 5 mins, speculative optimizations enabled vs. disabled: -30%] ≅80 Discarded Compilations* (* out of 800 optimized tier compilations)
  45. INITIALLY, WE SEE JIT... transaction time / latency 5 mins

    Original Just-in-Time is Slightly Too Late
  46. EVENTUALLY, WE’LL SEE... transaction time / latency 5 mins Original

    With ReadyNow Need Something… Proactive Not Reactive
  47. EVENTUALLY, WE’LL SEE... transaction time / latency 5 mins Original

    With ReadyNow! Need Something… Proactive Not Reactive
  48. MORE LIKE TRADITIONAL PROFILE GUIDED OPTIMIZATION hpp GCC Compile cpp

    prof hpp cpp GCC 0 prof + instrumented exe instrumented exe optimized exe Execute & Profile Profile Guided Compile
  49. READYNOW ISN’T... • AOT • JIT Cache. READYNOW IS... • Compilation Log • Profile Cache • Add-on to existing compilers: C1 & C2
  50. ZING TIERING SYSTEM Interpreter C1 C2 / Falcon 0 3

    4 OpenJDK Normal Flow Zing Flow Interpreter Baseline Optimized
  51. CONCATENATIVE TRANSACTION LOG • UPSERT-s into a database • Grows with each compilation • Plain text, uncompressed • Backwards compatible • Accidentally similar to CI Replay • 10-100 MiB on disk • 10-50 MiB from native memory
  52. Interpreter C1 Current Profile C2 / Falcon Out In Prior

    Profile Merged Profile CI READYNOW PROFILE CACHE
  53. MERGE PROFILES • Live Call Histogram: Foo 7500, Bar 1500, Baz 1000 • Recorded Call Histogram: Quux 7500, Bar 1500, Foo 1000 • Merged Call Histogram: Foo max(7500, 1000), Bar max(1500, 1500), Baz 1000, Quux 7500 • Prefer Live Data over Recorded Data • Idempotent
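
The merge rule on this slide (take the maximum count per method) can be sketched as follows. This is a toy model under assumptions: the real profile format is not shown in the deck, the class `ProfileMerge` is hypothetical, and the first histogram on the slide is assumed to be the live one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of merging call histograms by per-method maximum; not ReadyNow's actual code.
final class ProfileMerge {
    static Map<String, Integer> merge(Map<String, Integer> live, Map<String, Integer> recorded) {
        Map<String, Integer> merged = new LinkedHashMap<>(live);
        // For methods present in both histograms keep the larger count; add the rest as-is.
        recorded.forEach((method, count) -> merged.merge(method, count, Integer::max));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> live = new LinkedHashMap<>();
        live.put("Foo", 7500);
        live.put("Bar", 1500);
        live.put("Baz", 1000);

        Map<String, Integer> recorded = new LinkedHashMap<>();
        recorded.put("Quux", 7500);
        recorded.put("Bar", 1500);
        recorded.put("Foo", 1000);

        Map<String, Integer> merged = merge(live, recorded);
        System.out.println(merged);
        // Merging the same recorded data again changes nothing (idempotent).
        System.out.println(merge(merged, recorded).equals(merged));
    }
}
```

Taking the maximum rather than the sum is what makes the merge idempotent: replaying the same recorded profile any number of times gives the same result.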
  54. JIT DOESN’T KNOW Fields Methods Parent Class Interfaces Anything …

    if ( always ) { return Class1.getStatic(); } else { return Class2.getStatic(); } if ( always ) { return Class1.getStatic(); } else { uncommon_trap(:unloaded); } Give Up! UNLOADED https://blog.takipi.com/java-on-steroids-5-super-useful-jit-optimization-techniques/
  55. INITIALIZE CLASS AT FIRST Static Field Access Static Method Call

    New Initialization of Child ... LAZY INITIALIZED http://cr.openjdk.java.net/~twisti/slides/JVMLS%202015%20-%20Java%20Goes%20AOT.pdf
  56. MyClass.getStatic() if ( !vm.is_init(MyClass) ) { vm.init(MyClass); } MyClass.getStatic() Slows

    down by 10+% http://joeduffyblog.com/2015/12/19/safe-native-code/ INITIALIZATION CHECKING Bloats size by 20%
  57. JIT RUNS LATE MOST CLASSES ARE ALREADY INITIALIZED MyClass.getStatic() MyClass.getStatic()

    vm.is_init(MyClass) initialized uncommon_trap(:uninitialized) uninitialized At compile time… https://blog.takipi.com/java-on-steroids-5-super-useful-jit-optimization-techniques/
  58. SPECULATION ALSO LEADS TO TRAPS https://blog.takipi.com/java-on-steroids-5-super-useful-jit-optimization-techniques/ static final void hotMethod()

    { if ( thing == null ) System.out.print("always"); else System.out.print(“never"); } static final void hotMethod() { if ( thing == null ) System.out.print(“always"); else uncommon_trap(:unstable_if); } Branch comp methodId 17 5800 0 5800 Branch comp methodId 17 5800 1 5800 1 update untaken branch trap + bail
  59. PICK LAST COMPILE / PROFILE • Log: Profile Foo::foo @ time 1; Profile Bar::bar @ time 1; Profile Bar::bar @ time 2; Profile Foo::foo @ time 3; Profile Bar::bar @ time 3 • Reuse last profile for each method • Last profile contains any updates from failed speculations • Produces stable compilations • Compile Foo::foo; Compile Bar::bar; Recompile Foo::foo
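
Reading an append-only log so that the last entry per method wins amounts to an upsert into a map. The sketch below illustrates that with the entries from this slide. The `Entry` type and `lastProfilePerMethod` are hypothetical names, and a real profile would of course carry more than a timestamp.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of "reuse last profile for each method"; not ReadyNow's actual log format.
final class LastProfile {
    static final class Entry {
        final String method;
        final int time;
        Entry(String method, int time) {
            this.method = method;
            this.time = time;
        }
    }

    /** Replays the log in order; a later entry for the same method overwrites the earlier one. */
    static Map<String, Integer> lastProfilePerMethod(List<Entry> log) {
        Map<String, Integer> last = new LinkedHashMap<>();
        for (Entry e : log) {
            last.put(e.method, e.time); // upsert
        }
        return last;
    }

    public static void main(String[] args) {
        List<Entry> log = Arrays.asList(
                new Entry("Foo::foo", 1),
                new Entry("Bar::bar", 1),
                new Entry("Bar::bar", 2),
                new Entry("Foo::foo", 3),
                new Entry("Bar::bar", 3));
        System.out.println(lastProfilePerMethod(log));
    }
}
```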
  60. MyClass.getStatic() requires MyClass init Class 23 SomeClass Class 80 MyClass

    C2Compilation foo.Bar::baz { 23, 80 } { 23 } loading prerequisites* initialization prerequisites * actually linking COMPILATION PREREQUISITES
  61. AOT READYNOW ... CONSIDER Check Code is Unchanged Check Code

    is Unchanged Install Initialization Guards Wait for Initialization
  62. AOT READYNOW ... CONSIDER Check Code is Unchanged Check Code

    is Unchanged Install Initialization Guards Wait for Initialization No speculative optimizations Speculative optimizations (if profile matches)
  63. AOT JIT CACHE READYNOW ... CONSIDER Check Code is Unchanged

    Check Code is Unchanged Check Code is Unchanged Install Initialization Guards Wait for Initialization Wait for Initialization No speculative optimizations Speculative optimizations (if profile matches) Speculative optimizations (if profile matches)
  64. foo.Bar::baz Loading Initialization SomeClass MyClass MyClass notify Class Loading C2

    / Falcon Queue notify C2 / Falcon Live SomeClass Recorded SomeClass
  65. foo.Bar::baz Loading Initialization SomeClass MyClass MyClass notify notify enqueue Class

    Load/Init C2 / Falcon Merged Profile Current Profile Prior Profile Live MyClass Recorded MyClass C2 / Falcon Queue
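
The notify/enqueue flow on the last two slides can be modelled as a small bookkeeping structure: a compilation waits until every class it needs is loaded, and every class it needs initialized is initialized, and only then goes onto the compile queue. The sketch below is a toy model built on that reading of the diagrams. `PrerequisiteTracker` and all of its methods are hypothetical and say nothing about how Zing implements this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Toy model of compilation prerequisites; hypothetical, not Zing's implementation.
final class PrerequisiteTracker {
    private static final class Task {
        final String method;
        final Set<String> needsLoaded;
        final Set<String> needsInitialized;
        Task(String method, Collection<String> loaded, Collection<String> initialized) {
            this.method = method;
            this.needsLoaded = new HashSet<>(loaded);
            this.needsInitialized = new HashSet<>(initialized);
        }
    }

    private final Set<String> loaded = new HashSet<>();
    private final Set<String> initialized = new HashSet<>();
    private final List<Task> waiting = new ArrayList<>();
    final Queue<String> compileQueue = new ArrayDeque<>();

    void register(String method, Collection<String> needsLoaded, Collection<String> needsInitialized) {
        waiting.add(new Task(method, needsLoaded, needsInitialized));
        enqueueReadyTasks();
    }

    void onClassLoaded(String className) {
        loaded.add(className);
        enqueueReadyTasks();
    }

    void onClassInitialized(String className) {
        loaded.add(className); // an initialized class is necessarily loaded
        initialized.add(className);
        enqueueReadyTasks();
    }

    // Moves every task whose prerequisites are all satisfied onto the compile queue.
    private void enqueueReadyTasks() {
        for (Iterator<Task> it = waiting.iterator(); it.hasNext(); ) {
            Task t = it.next();
            if (loaded.containsAll(t.needsLoaded) && initialized.containsAll(t.needsInitialized)) {
                compileQueue.add(t.method);
                it.remove();
            }
        }
    }
}
```

With `foo.Bar::baz` registered as needing `SomeClass` and `MyClass` loaded and `MyClass` initialized, the method reaches the queue only after the `onClassInitialized("MyClass")` notification.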
  66. Original With ReadyNow 5 mins transaction time / latency Peak

    Perf -0% Traps -60% STW Deopts -67% Not Entrant -75% Recompiled -67% ASAP COMPILATION
  67. foo.Bar::baz Loading Initialization SomeClass MyClass MyClass eager load notify enqueue

    C2 / Falcon Queue notify C2/Falcon ... AT START- UP Eager Load Classes Eager Initialize Most Classes Resolve Constant Pool Entries Live SomeClass Recorded SomeClass
  68. 5 mins transaction time / latency Peak Perf +2% Traps

    -49% STW Deopts -67% Not Entrant -72% Recompiled -55% Original ReadyNow v1 Start-up Compilation
  69. [Chart: transaction time / latency over 5 mins from start of main, with counts (axis ticks 0, 10, 100, 1000) of C1, C2 / Falcon, and C2 / Falcon Not Entrant events] Peak Perf +2% • Traps -49% • STW Deopts -67% • Not Entrant -72% • Recompiled -55%
  70. ENUMS … enum MyEnum { FOO, BAR, BAZ; } is compiled roughly as: final class MyEnum extends Enum<MyEnum> { public static final MyEnum FOO = new MyEnum("FOO", 0); public static final MyEnum BAR = new MyEnum("BAR", 1); public static final MyEnum BAZ = new MyEnum("BAZ", 2); private static final MyEnum[] ENUM$VALUES = { FOO, BAR, BAZ }; private MyEnum(String name, int ordinal) { super(name, ordinal); } public static MyEnum[] values() { return ENUM$VALUES; } } …
  71. ASSERTS class Assert { void foo(int x) { assert(x >

    0); } } class Assert { static final boolean $assertionsDisabled; static { $assertionsDisabled = Assert.class.desiredAssertionStatus(); } void foo(int x) { if ( !$assertionsDisabled ) { if ( !(x > 0) ) throw new AssertionError(); } } }
  72. DON’T WAIT FOR INITIALIZATION foo.Bar::baz Loading Initialization SomeClass MyClass MyClass

    MyClass.getStatic() if (!vm.is_init(MyClass)) { vm.init(MyClass); } MyClass.getStatic() Falcon Queue
  73. Bloats size by 20+% Slows down by 10+% Slows down

    by 10+% Allows for Earlier Stable Compilation INLINE INITIALIZATION CHECKS MyClass.getStatic() if ( !vm.is_init(MyClass)) { vm.init(MyClass); } MyClass.getStatic()
  74. 5 mins transaction time / latency Peak Perf -7% Traps

    +40% STW Deopts -67% Not Entrant +169% Recompiled +456% Original Start-up Compilation Quick Start + Init Checking AOT-LIKE COMPILATION
  75. [Chart: transaction time / latency over 5 mins from start of main, with counts (axis ticks 0, 10, 100, 1000) of C1, C2 / Falcon, and C2 / Falcon Not Entrant events] Peak Perf -7% • Traps +40% • STW Deopts -67% • Not Entrant +169% • Recompiled +456%
  76. FOLLOW-UP COMPILATION foo.Bar::baz Loading SomeClass MyClass foo.Bar::baz Loading Initialization SomeClass

    MyClass MyClass Quick Start Checklist Maximum Performance Checklist Falcon Queue
  77. KEEP ENTRANT + REPLACEMENT new callers Interpreter C1 C2 /

    Falcon C2 / Falcon Queue uncommon trap + bail to interpreter C2 / Falcon Replacement
  78. AND WITH ONE MORE ROUND OF PROFILING Original Quick Start

    2nd Gen Profile Quick Start 1st Gen Profile transaction time / latency 5 mins
  79. 60-80% Peak Performance at Start-up 1.5ms - 150ms Initial Transactions

    3rd Transaction Usually at Peak -10 to +10% Impact on Peak Performance RESULTS VARY
  80. Not an AOT Not a JIT Cache Yet “Natural” for

    OpenJDK Matching Engine Compilation Log ASAP Compilation Profile Cache READYNOW ISN’T READYNOW IS WHAT IS READYNOW?
  81. ABOUT AZUL ZING • Derived from HotSpot • Known for C4 “Pauseless” GC • ReadyNow! • Falcon – LLVM based JIT • Compile Stashing (JIT cache) • Get Started with Zing® Feature Preview: http://docs.azul.com/zing/zing-quick-start-fp.htm
  82. ReadyNow drives start-up — hits JIT cache to minimize waiting

    ReadyNow’s profile cache backs up JIT cache “EASY” TO ADD A JIT CACHE FUTURE BENEFITS OF FALCON
  83. ADDING A JIT CACHE: BEST OF BOTH WORLDS • JIT cache misses on code changes and code generators • Profile cache still hits on unchanged methods
  84. REFERENCES: Java AOTs • IBM J9 AOT: https://www.ibm.com/support/knowledgecenter/en/SSYKE2_7.1.0/com.ibm.java.aix.71.doc/diag/understanding/aot.html • Excelsior JET: https://www.excelsiorjet.com/tutorials/ • Java 9 JAOTC: http://openjdk.java.net/jeps/295 • Java Goes AOT, by Christian Thalinger: http://cr.openjdk.java.net/~twisti/slides/JVMLS%202015%20-%20Java%20Goes%20AOT.pdf
  85. REFERENCES: JVM Logging & Inspection Tools • PrintCompilation, by Stephen Colebourne: http://blog.joda.org/2011/08/printcompilation-jvm-flag.html • LogCompilation, from OpenJDK Wiki: https://wiki.openjdk.java.net/display/HotSpot/LogCompilation+overview • JITWatch, created by Chris Newland: https://github.com/AdoptOpenJDK/jitwatch
  86. REFERENCES: JVM Compilation & Speculative Optimization • Tiered Compilation, by Igor Veresov: https://www.slideshare.net/maddocig/tiered • Black Magic Method Dispatch, by Aleksey Shipilev: https://shipilev.net/blog/2015/black-magic-method-dispatch/ • Safepoints in HotSpot JVM, by Alexey Ragozin: http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html • 5 Super Useful JIT Optimization Techniques, by Alex Zhitnitsky: https://blog.takipi.com/java-on-steroids-5-super-useful-jit-optimization-techniques/