Gerrit Grunwald on What the CRaC... SUPERFAST JVM STARTUP

Gerrit Grunwald's presentation at the eJUG Event 28.9.2023

Transcript

  1. EXECUTION ENGINE Interpreter C1 JIT Compiler (client) C2 JIT Compiler

    (server) Profiler Garbage Collector
  2. EXECUTION ENGINE Interpreter C1 JIT Compiler (client) C2 JIT Compiler

    (server) Profiler Garbage Collector Tiered compilation DEFAULT SINCE JDK 8
  3. INTERPRETER Converts ByteCode into instruction set of CPU Detects hot

    spots by counting method calls and loop back edges JVM THRESHOLD REACHED
  4. Pass the "hot" code to C1 JIT Compiler JVM C1

    JIT COMPILER Compiles code as quickly as possible with low optimisation
  5. C1 JIT COMPILER Compiles code as quickly as possible with

    low optimisation Profiles the running code (detecting hot code) JVM THRESHOLD REACHED
  6. Pass the "hot" code to C2 JIT Compiler JVM C2

    JIT COMPILER Compiles code with best optimisation possible (slower)
  7. EXECUTION CYCLE: INTERPRETATION, slow (Execution Level 0) ->

    PROFILING, finding "hot spots" -> COMPILING C1, fast compile, low optimisation (Execution Level 3)
  8. EXECUTION CYCLE: INTERPRETATION, slow (Execution Level 0) ->

    PROFILING, finding "hot spots" -> COMPILING C1, fast compile, low optimisation (Execution Level 3) -> PROFILING, finding "hot code"
  9. EXECUTION CYCLE: INTERPRETATION, slow (Execution Level 0) ->

    PROFILING, finding "hot spots" -> COMPILING C1, fast compile, low optimisation (Execution Level 3) -> PROFILING, finding "hot code" -> COMPILING C2, slower compile, high optimisation (Execution Level 4)
  10. EXECUTION CYCLE: INTERPRETATION, slow (Execution Level 0) ->

    PROFILING, finding "hot spots" -> COMPILING C1, fast compile, low optimisation (Execution Level 3) -> PROFILING, finding "hot code" -> COMPILING C2, slower compile, high optimisation (Execution Level 4) -> DEOPTIMISATION, can happen (performance hit)
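
The execution cycle on slides 7 to 10 can be mimicked with a toy model. The sketch below is not how HotSpot is implemented: the class `ToyMethodCounter` and both thresholds are hypothetical, chosen only to show a method moving from level 0 to level 3 to level 4 as its invocation count grows, and falling back to level 0 when it is deoptimised.

```java
/** Toy model of the execution cycle; names and thresholds are illustrative only. */
final class ToyMethodCounter {
    // Hypothetical thresholds, not tuning values taken from the talk.
    static final int C1_THRESHOLD = 200;
    static final int C2_THRESHOLD = 5_000;

    private int invocations;
    private int level; // 0 = interpreted, 3 = C1 with profiling, 4 = C2

    /** Records one call and returns the level the method runs at afterwards. */
    int invoke() {
        invocations++;
        if (level == 0 && invocations >= C1_THRESHOLD) {
            level = 3;
        } else if (level == 3 && invocations >= C2_THRESHOLD) {
            level = 4;
        }
        return level;
    }

    /** In this toy model a failed speculation discards the compiled code and restarts counting. */
    void deoptimise() {
        level = 0;
        invocations = 0;
    }

    int level() {
        return level;
    }

    public static void main(String[] args) {
        ToyMethodCounter method = new ToyMethodCounter();
        for (int call = 1; call <= 6_000; call++) {
            int before = method.level();
            int after = method.invoke();
            if (before != after) {
                System.out.println("call " + call + ": level " + before + " -> " + after);
            }
        }
        method.deoptimise();
        System.out.println("after deoptimisation: level " + method.level());
    }
}
```

The point of the model is only the shape of the cycle: counting happens first, compilation follows once a threshold is reached, and deoptimisation sends the method back to the slow path.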
  11. DEOPTIMISATION e.g. BRANCH ANALYSIS value > 9 bias = compute(value)

    bias = 1 Math.log10(bias + 99) TRUE FALSE int computeMagnitude(int value) { int bias; if (value > 9) { bias = compute(value); } else { bias = 1; } return Math.log10(bias + 99); }
  12. DEOPTIMISATION e.g. BRANCH ANALYSIS value > 9 bias = compute(value)

    bias = 1 Math.log10(bias + 99) TRUE FALSE int computeMagnitude(int value) { int bias; if (value > 9) { bias = compute(value); } else { bias = 1; } return Math.log10(bias + 99); } Value was never greater than 9
  13. DEOPTIMISATION e.g. BRANCH ANALYSIS int computeMagnitude(int value) { if (value

    > 9) { uncommonTrap(); } int bias = 1; return Math.log10(bias + 99); } value > 9 deoptimise Math.log10(1 + 99) TRUE FALSE
  14. DEOPTIMISATION e.g. BRANCH ANALYSIS int computeMagnitude(int value) { if (value

    > 9) { uncommonTrap(); } int bias = 1; return Math.log10(bias + 99); } value > 9 deoptimise Math.log10(1 + 99) TRUE FALSE
  15. DEOPTIMISATION e.g. BRANCH ANALYSIS int computeMagnitude(int value) { if (value

    > 9) { uncommonTrap(); } return Math.log10(100); } value > 9 deoptimise Math.log10(100) TRUE FALSE
  16. DEOPTIMISATION e.g. BRANCH ANALYSIS int computeMagnitude(int value) { if (value

    > 9) { uncommonTrap(); } return 2; } value > 9 deoptimise return 2 TRUE FALSE
  17. DEOPTIMISATION e.g. BRANCH ANALYSIS int computeMagnitude(int value) { if (value

    > 9) { uncommonTrap(); } return 2; } value > 9 deoptimise return 2 TRUE FALSE
  18. DEOPTIMISATION e.g. BRANCH ANALYSIS int computeMagnitude(int value) { if (value

    > 9) { uncommonTrap(); } return 2; } value > 9 deoptimise return 2 TRUE FALSE
  19. DEOPTIMISATION e.g. BRANCH ANALYSIS int computeMagnitude(int value) { int bias;

    if (value > 9) { bias = compute(value); } else { bias = 1; } return Math.log10(bias + 99); } INTERPRETER C1 C2 value > 9 bias = compute(value) bias = 1 Math.log10(bias + 99) TRUE FALSE
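
The branch-analysis example on slides 11 to 19 can be tried out with a small runnable version. This is a sketch under two assumptions: `compute` is a hypothetical stand-in (the slides do not show its body), and an `(int)` cast is added because `Math.log10` returns a `double`, so the method as printed on the slide would not compile.

```java
final class BranchSpeculationDemo {
    // Hypothetical stand-in for the compute(value) call on the slide.
    static int compute(int value) {
        return value * value;
    }

    // The slide's method; the (int) cast is an addition so that it compiles.
    static int computeMagnitude(int value) {
        int bias;
        if (value > 9) {
            bias = compute(value);
        } else {
            bias = 1;
        }
        return (int) Math.log10(bias + 99);
    }

    public static void main(String[] args) {
        long sum = 0;
        // Warm-up: value is never greater than 9, so only the else branch is ever taken.
        for (int i = 0; i < 1_000_000; i++) {
            sum += computeMagnitude(i % 10);
        }
        System.out.println("sum after warm-up = " + sum);
        // First call that takes the other branch. A JIT that speculated on
        // "value is never greater than 9" has to deoptimise here; the result stays correct.
        System.out.println("computeMagnitude(100) = " + computeMagnitude(100));
    }
}
```

Running this with `-XX:+PrintCompilation` may show the compiled version of `computeMagnitude` being invalidated around the last call, although whether and when that happens depends on the JVM and its settings.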
  20. JVM STARTUP JVM Load & Initialize Optimization JVM Load application

    classes Initialize all resources Kick off application specific logic Optimization FAST TAKES A BIT JVM START APPLICATION START
  21. JVM STARTUP JVM Load & Initialize Optimization JVM Load application

    classes Initialize all resources Kick off application specific logic Optimization FAST TAKES A BIT Generally referred to as JVM Startup (Time to first response) JVM START APPLICATION START
  22. JVM STARTUP JVM Load & Initialize Optimization JVM Load application

    classes Initialize all resources Kick off application specific logic Optimization JVM Optimizing (Compile/Decompile) FAST TAKES A BIT TAKES SOME TIME Generally referred to as JVM Startup (Time to first response) App Apply application specific workloads JVM START APPLICATION START APPLICATION WARMUP
  23. JVM STARTUP JVM Load & Initialize Optimization JVM Load application

    classes Initialize all resources Kick off application specific logic Optimization JVM Optimizing (Compile/Decompile) FAST TAKES A BIT TAKES SOME TIME Generally referred to as JVM Startup (Time to first response) Generally referred to as JVM Warmup (Time to n operations) App Apply application specific workloads JVM START APPLICATION START APPLICATION WARMUP
  24. MICROSERVICE ENVIRONMENT FIRST RUN JVM STARTUP Performance SECOND RUN JVM

    STARTUP Performance THIRD RUN JVM STARTUP Performance
  25. WOULDN'T IT BE GREAT...? FIRST RUN JVM STARTUP Performance SECOND

    RUN NO STARTUP OVERHEAD Performance THIRD RUN NO STARTUP OVERHEAD Performance
  26. WHAT ABOUT CDS? Dump internal class representations into file

    Shared on each JVM start (CDS) No optimization or hotspot detection
  27. WHAT ABOUT CDS? Dump internal class representations into file

    Shared on each JVM start (CDS) No optimization or hotspot detection Only reduces class loading time
  28. WHAT ABOUT CDS? Dump internal class representations into file

    Shared on each JVM start (CDS) No optimization or hotspot detection Only reduces class loading time Startup up to 2 seconds faster
  29. WHAT ABOUT CDS? Dump internal class representations into file

    Shared on each JVM start (CDS) No optimization or hotspot detection Only reduces class loading time Startup up to 2 seconds faster Good info from Ionut Balosin
  30. WHY NOT USE AOT? No interpreting bytecodes No analysis of

    hotspots No runtime compilation of code
  31. WHY NOT USE AOT? No interpreting bytecodes No analysis of

    hotspots No runtime compilation of code Start at 'full speed', straight away
  32. WHY NOT USE AOT? No interpreting bytecodes No analysis of

    hotspots No runtime compilation of code Start at 'full speed', straight away GraalVM native image does that PROBLEM SOLVED...?
  33. NOT SO FAST... AOT is, by definition, static

    Code is compiled before it is run
  34. NOT SO FAST... AOT is, by definition, static

    Code is compiled before it is run Compiler has no knowledge of how the code will actually run
  35. NOT SO FAST... AOT is, by definition, static

    Code is compiled before it is run Compiler has no knowledge of how the code will actually run Profile Guided Optimisation (PGO) can partially help
  36. JVM PERFORMANCE GRAPH AOT Compiled Code AOT Compiled Code with

    Profile Guided Optimisation (not in GraalVM Community) Needs to run once for profiling Performance
  37. AOT VS JIT

    AOT: Limited use of method inlining. No runtime bytecode generation. Reflection is possible but complicated. Unable to use speculative optimisations. Must be compiled for least common denominator. Overall performance will typically be lower. Deployed env != Development env. 'Full speed' from the start. No overhead to compile code at runtime. Small memory footprint.
    JIT: Can use aggressive method inlining at runtime. Can use runtime bytecode generation. Reflection is simple. Can use speculative optimisations. Can even optimise for Haswell, Skylake, Ice Lake etc. Overall performance will typically be higher. Deployed env. == Development env. Requires more time to start up (but will be faster). Overhead to compile code at runtime. Larger memory footprint.
  38. JIT DISADVANTAGES Requires more time to start up (requires many

    slow operations to happen before optimisation and faster execution can happen)
  39. JIT DISADVANTAGES Requires more time to start up (requires many

    slow operations to happen before optimisation and faster execution can happen) CPU overhead to compile code at runtime
  40. JIT DISADVANTAGES Requires more time to start up (requires many

    slow operations to happen before optimisation and faster execution can happen) CPU overhead to compile code at runtime Larger memory footprint
  41. Linux project Part of kernel >= 3.11 (2013) Freeze a

    running container/application CRIU
  42. Linux project Part of kernel >= 3.11 (2013) Freeze a

    running container/application Checkpoint its state to disk CRIU
  43. Linux project Part of kernel >= 3.11 (2013) Freeze a

    running container/application Checkpoint its state to disk Restore the container/application from the saved data. CRIU
  44. Linux project Part of kernel >= 3.11 (2013) Freeze a

    running container/application Checkpoint its state to disk Restore the container/application from the saved data. Used by/integrated in OpenVZ, LXC/LXD, Docker, Podman and others CRIU
  45. Heavily relies on /proc file system It can checkpoint:

    Processes and threads Application memory, memory mapped files and shared memory Open files, pipes and FIFOs Sockets Interprocess communication channels Timers and signals CRIU
  46. Heavily relies on /proc file system It can checkpoint:

    Processes and threads Application memory, memory mapped files and shared memory Open files, pipes and FIFOs Sockets Interprocess communication channels Timers and signals Can rebuild TCP connection from one side only CRIU
  47. Restart from saved state on another machine (open files,

    shared memory etc.) CRIU CHALLENGES
  48. Restart from saved state on another machine (open files,

    shared memory etc.) Start multiple instances of same state on same machine (PID will be restored which will lead to problems) CRIU CHALLENGES
  49. Restart from saved state on another machine (open files,

    shared memory etc.) Start multiple instances of same state on same machine (PID will be restored which will lead to problems) A Java Virtual Machine would assume it was continuing its tasks (very difficult to use effectively, e.g. running applications might have open files etc.) CRIU CHALLENGES
  50. RUNNING APPLICATION Aware of checkpoint being created RUNNING APPLICATION Aware

    of restore happening CRaC A way to solve the problems when checkpointing a JVM (e.g. no open files, sockets etc.)
  51. CRaC Comes with a simple API Creates checkpoints using code

    or jcmd Throws CheckpointException (in case of open files/sockets)
  52. CRaC Comes with a simple API Creates checkpoints using code

    or jcmd Throws CheckpointException (in case of open files/sockets) Heap is cleaned, compacted (using JVM safepoint mechanism -> JVM is in a safe state)
  53. <<interface>> Resource beforeCheckpoint() afterRestore() Resource interface (can be notified

    about a Checkpoint and Restore) Classes in application code implement the Resource interface CRaC API
  54. <<interface>> Resource beforeCheckpoint() afterRestore() Resource interface (can be notified

    about a Checkpoint and Restore) Classes in application code implement the Resource interface Application receives callbacks during checkpointing and restoring CRaC API
  55. <<interface>> Resource beforeCheckpoint() afterRestore() Resource interface (can be notified

    about a Checkpoint and Restore) Classes in application code implement the Resource interface Application receives callbacks during checkpointing and restoring Makes it possible to close/restore resources (e.g. open files, sockets) CRaC API
  56. Resource objects need to be registered with a Context so

    that they can receive notifications CRaC API
  57. Resource objects need to be registered with a Context so

    that they can receive notifications There is a global Context accessible via the static method Core.getGlobalContext() CRaC API
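
The callback flow described on slides 53 to 57 can be sketched without a CRaC-enabled JDK. To stay self-contained, the example declares small local stand-ins for `Resource` and `Context`; they only imitate the shape of the API named on the slides, the notification order is simplified, and `NameConnection` and `simulateCheckpointAndRestore` are hypothetical names. No real checkpoint is taken.

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained sketch; the nested types only imitate the shape of the CRaC API. */
final class CracLifecycleSketch {

    /** Stand-in for the Resource interface (the two callbacks shown on the slide). */
    interface Resource {
        void beforeCheckpoint(Context context) throws Exception;
        void afterRestore(Context context) throws Exception;
    }

    /** Hypothetical, simplified stand-in for the global Context. */
    static final class Context {
        private final List<Resource> resources = new ArrayList<>();

        void register(Resource resource) {
            resources.add(resource);
        }

        // Simulates the notifications around a checkpoint; ordering is simplified.
        void simulateCheckpointAndRestore() {
            try {
                for (Resource r : resources) {
                    r.beforeCheckpoint(this);
                }
                for (Resource r : resources) {
                    r.afterRestore(this);
                }
            } catch (Exception e) {
                throw new IllegalStateException("resource callback failed", e);
            }
        }
    }

    /** Example resource that must not hold an open connection during a checkpoint. */
    static final class NameConnection implements Resource {
        private boolean open = true;
        final List<String> events = new ArrayList<>();

        boolean isOpen() {
            return open;
        }

        @Override
        public void beforeCheckpoint(Context context) {
            open = false; // a real resource would close its files or sockets here
            events.add("closed");
        }

        @Override
        public void afterRestore(Context context) {
            open = true; // and re-open them here
            events.add("reopened");
        }
    }

    public static void main(String[] args) {
        Context global = new Context();
        NameConnection connection = new NameConnection();
        global.register(connection);
        global.simulateCheckpointAndRestore();
        System.out.println(connection.events + ", open = " + connection.isOpen());
    }
}
```

With the real library, the class would implement the actual `Resource` interface and register itself through `Core.getGlobalContext()`, as the demo code later in the deck does.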
  58. Start your app with -XX:+PrintCompilation Apply typical workload to your

    app Observe the moment the compilations are ramped down WHEN TO CHECKPOINT ?
  59. Start your app with -XX:+PrintCompilation Apply typical workload to your

    app Observe the moment the compilations are ramped down Create the checkpoint WHEN TO CHECKPOINT ?
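
Besides reading the `-XX:+PrintCompilation` output, the ramp-down of JIT activity can also be watched from inside the application. The following is an alternative sketch, not the approach shown in the talk: it polls the standard `CompilationMXBean` while a hypothetical `workload` method stands in for "typical workload". When the reported increments approach zero, compilation has largely settled.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class CompilationWatch {
    /** Accumulated JIT compilation time in ms, or -1 if the JVM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        if (bean == null || !bean.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return bean.getTotalCompilationTime();
    }

    // Hypothetical stand-in for the application's typical workload.
    static long workload(int rounds) {
        long acc = 0;
        for (int i = 0; i < rounds; i++) {
            acc += Integer.toString(i).hashCode();
        }
        return acc;
    }

    public static void main(String[] args) {
        long previous = totalCompilationMillis();
        for (int step = 1; step <= 10; step++) {
            workload(2_000_000);
            long now = totalCompilationMillis();
            // Shrinking increments suggest that compilations are ramping down.
            System.out.println("step " + step + ": +" + (now - previous) + " ms spent compiling");
            previous = now;
        }
    }
}
```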
  60. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() Register resources in global context
  61. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() Warm up the application
  62. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() JVM notifies the resources
  63. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() Application closes open resources
  64. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() JVM stores checkpoint to disk
  65. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() Restore from checkpoint java -XX:CRaCRestoreFrom
  66. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() JVM notifies the resources
  67. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() Application re-opens resources
  68. CRaC OVERVIEW JVM APPLICATION RESOURCE 1 RESOURCE 2 beforeCheckpoint() afterRestore()

    beforeCheckpoint() afterRestore() No JVM startup and no application warmup !!!
  69. Run app in a docker container Create checkpoint (store in

    container or external volume) TYPICAL USAGE...
  70. Run app in a docker container Create checkpoint (store in

    container or external volume) Commit the state of container (only if checkpoint in container) TYPICAL USAGE...
  71. Run app in a docker container Create checkpoint (store in

    container or external volume) Commit the state of container (only if checkpoint in container) Start the container (point jvm to container or external volume) TYPICAL USAGE...
  72. Designed to provide smooth CRaC adoption Total mirror of jdk.crac

    api at compile-time Can be used with any OpenJDK implementation ORG.CRAC
  73. Designed to provide smooth CRaC adoption Total mirror of jdk.crac

    api at compile-time Can be used with any OpenJDK implementation Detects CRaC implementation at runtime ORG.CRAC
  74. Designed to provide smooth CRaC adoption Total mirror of jdk.crac

    api at compile-time Can be used with any OpenJDK implementation Detects CRaC implementation at runtime No CRaC support -> won't call CRaC specific code ORG.CRAC
  75. Designed to provide smooth CRaC adoption Total mirror of jdk.crac

    api at compile-time Can be used with any OpenJDK implementation Detects CRaC implementation at runtime No CRaC support -> won't call CRaC specific code CRaC support -> will forward all CRaC specific calls to jdk.crac ORG.CRAC
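
The runtime detection described on slides 73 to 75 can be illustrated with a plain class lookup. This is not the org.crac source, only a sketch of the idea: the helper names are hypothetical, and it assumes that a CRaC-enabled JDK exposes a class named `jdk.crac.Core` (the package name is taken from the slide, the `Core` class from the API slides).

```java
final class CracDetection {
    /** True if a class with the given name can be found, without initialising it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, CracDetection.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Assumption: a CRaC-enabled JDK ships jdk.crac.Core. */
    static boolean cracAvailable() {
        return isPresent("jdk.crac.Core");
    }

    public static void main(String[] args) {
        if (cracAvailable()) {
            System.out.println("CRaC support found: calls would be forwarded to jdk.crac");
        } else {
            System.out.println("No CRaC support: CRaC specific code would be skipped");
        }
    }
}
```

Because the check happens at runtime, the same application jar can run on a JDK without CRaC and simply skip the checkpoint logic.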
  76. Upgrade (Haswell -> restore: Ice Lake, no problem) Downgrade (Ice

    Lake -> restore: Haswell, problematic) COMPATIBILITY...
  77. Upgrade (Haswell -> restore: Ice Lake, no problem) Downgrade (Ice

    Lake -> restore: Haswell, problematic) Solved in CRaC by specific flag (little drop in performance) COMPATIBILITY...
  78. Upgrade (Haswell -> restore: Ice Lake, no problem) Downgrade (Ice

    Lake -> restore: Haswell, problematic) Solved in CRaC by specific flag (little drop in performance) Node groups stick to same cpu architecture COMPATIBILITY...
  79. Upgrade (Haswell -> restore: Ice Lake, no problem) Downgrade (Ice

    Lake -> restore: Haswell, problematic) Solved in CRaC by specific flag (little drop in performance) Node groups stick to same cpu architecture Virtualized Linux environments work on all OS's (as long as cpu architecture is x64/aarch64) COMPATIBILITY...
  80. DEMO Name service Load 258 000 names from json file

    at startup Return 5 random names for girls
  81. DEMO Name service Load 258 000 names from json file

    at startup Return 5 random names for girls Return 5 random names for boys
  82. public class Main implements Resource { public Main() { System.out.println("Start

    without CRaC"); Core.getGlobalContext().register(Main.this); init(); printRandomGirlNames(5); printRandomBoyNames(5); System.out.println("Time to first response: " + ((System.nanoTime() - startTime) / MILLISECOND_IN_NS) + "ms"); } private void init() { allNames = loadNames(); } @Override public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {} @Override public void afterRestore(Context<? extends Resource> context) throws Exception {} DEMO CALLED AT FIRST STARTUP
  83. public class Main implements Resource { public Main() {} private

    void init() {} @Override public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {} @Override public void afterRestore(Context<? extends Resource> context) throws Exception { System.out.println("Start using CRaC"); startTime = System.nanoTime(); printRandomGirlNames(5); printRandomBoyNames(5); System.out.println("Time to first response: " + ((System.nanoTime() - startTime) / MILLISECOND_IN_NS) + "ms"); } DEMO CALLED AFTER RESTORE
  84. DEMO SHELL 1 (NORMAL START): >docker run -it --rm --name crac8 hansolo/crac8 java -jar /opt/app/crac8-17.0.0.jar

    SHELL 2 (AFTER RESTORE): >docker run -it --privileged --rm --name crac8 hansolo/crac8:checkpoint java -XX:CRaCRestoreFrom=/opt/crac-files
  85. DEMO SHELL 1 (NORMAL START): >docker run -it --rm --name crac8 hansolo/crac8 java -jar /opt/app/crac8-17.0.0.jar

    SHELL 2 (AFTER RESTORE): >docker run -it --privileged --rm --name crac8 hansolo/crac8:checkpoint java -XX:CRaCRestoreFrom=/opt/crac-files Folder where the checkpoint will be stored
  86. DEMO SHELL 1 (NORMAL START): >docker run -it --rm --name crac6 hansolo/crac6 java -jar /opt/app/crac6-17.0.0.jar JVM startup time -> 20ms Start without CRaC Loading 258000 names took 1292ms 5 random names for girls: Colleen Dedra Elisabeth Frankie Samantha 5 random names for boys: Clayton Cliff Hollis Johnnie Winfield Time to first response: 1360ms

    SHELL 2 (AFTER RESTORE): >docker run -it --privileged --rm --name hansolo/crac6:checkpoint java -XX:CRaCRestoreFrom=/opt/crac-files Start using CRaC 5 random names for girls: Adelaide Angelina Christa Kathleen Rebecca 5 random names for boys: Antione Burl Jerel Trenton Wyatt Time to first response: 48ms
  87. Time to first operation [ms] (bar chart): Spring-Boot, Micronaut, Quarkus,

    xml-transform. OpenJDK values in extraction order: 4,352 980 1,001 3,898
  88. Time to first operation [ms] (bar chart): Spring-Boot, Micronaut, Quarkus,

    xml-transform. OpenJDK on CRaC values in extraction order: 53 33 46 38. OpenJDK values in extraction order: 4,352 980 1,001 3,898
  89. CRaC is a way to pause and restore a JVM

    based application It doesn't require a closed world as with a native image SUMMARY...
  90. CRaC is a way to pause and restore a JVM

    based application It doesn't require a closed world as with a native image Extremely fast time to full performance level SUMMARY...
  91. CRaC is a way to pause and restore a JVM

    based application It doesn't require a closed world as with a native image Extremely fast time to full performance level No need for hotspot identification, method compiles, recompiles and deoptimisations SUMMARY...
  92. CRaC is a way to pause and restore a JVM

    based application It doesn't require a closed world as with a native image Extremely fast time to full performance level No need for hotspot identification, method compiles, recompiles and deoptimisations Improved throughput from start SUMMARY...
  93. CRaC is a way to pause and restore a JVM

    based application It doesn't require a closed world as with a native image Extremely fast time to full performance level No need for hotspot identification, method compiles, recompiles and deoptimisations Improved throughput from start CRaC is an OpenJDK project SUMMARY...
  94. CRaC is a way to pause and restore a JVM

    based application It doesn't require a closed world as with a native image Extremely fast time to full performance level No need for hotspot identification, method compiles, recompiles and deoptimisations Improved throughput from start CRaC is an OpenJDK project CRaC can save infrastructure cost SUMMARY...
  95. INFRASTRUCTURE COST (chart of CPU utilization in % over time)

    Labels: Checkpoint; JVM startup time; Interpretation + Compilation Overhead; Start after restore; Eliminates startup time; Eliminates CPU overhead
  96. No pre-build/early access Getting updates (17.0.9, 17.0.10, 17.0.11...) Commercial support

    (Patches, Hotline, etc.) Support for x64 and aarch64 SUPPORTED VERSION...
  97. No pre-build/early access Getting updates (17.0.9, 17.0.10, 17.0.11...) Commercial support

    (Patches, Hotline, etc.) Support for x64 and aarch64 Possibility to get JDK8 or JDK11 with CRaC (If high demand) SUPPORTED VERSION...