
JCConf Taiwan 2024

Slides for JCConf Taiwan 2024, held in Taipei.

Akihiro Nishikawa

September 27, 2024

Transcript

  1. Quick off the blocks! Rapid start options for your Java application

    NISHIKAWA, Akihiro (@logico_jp), Cloud Solution Architect, Microsoft
  2. Who am I?

    { "name": "Akihiro Nishikawa", "country": "Japan", "working-for": "Microsoft", "favourites": [ "JVM", "GraalVM", "Azure" ], "expertise": [ "Application integration", "Container and Serverless" ] }
  3. Survey in JCConf Taiwan 2024 (as of 13:50)

    Which JDK version are you using now? (Bar chart, # of users per version: 23 or later, 21, 17, 11, 8, 6 or earlier)
    Which framework are you using mainly? (Bar chart, # of users per framework: Spring Boot, Struts 2, Hibernate, Quarkus, Ktor, Armeria, React)
  4. Survey at JJUG night seminar in Tokyo (as of September 12, 2024)

    Bar chart of respondents by JDK version. Categories: 23 or later, 22, LTS (11/17/21), 8, 7, 6 or earlier. Data labels (in extraction order): 1, 4, 38, 18, 0, 2.
  5. In the serverless and container era, short-lived applications are favoured over resident ones.

                                                         Startup  Latency  Throughput  Footprint
    Java apps running on application server              △        ◎[1]     ◎[1]        ◦
    Expectations from serverless container perspective   ◎        ◎        ◎           ◎
    [1] This improves gradually over time.
  6. Startup: what happens when a Java application starts?

    JVM Startup (fast)
    • JVM: load and initialize; generate bytecode templates
    Application Startup (quick)
    • JVM: load application classes; initialize application classes; application-specific initialization
    Application Warmup (long time)
    • JVM: compile/deoptimize/recompile
    • Application: process specific workloads
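The phases above can be observed roughly from inside the application. The following is a minimal sketch, not the speaker's measurement method: it uses the standard `RuntimeMXBean` to read how long the JVM has been up when `main` is reached and again once initialization is done. The class name `StartupTimer` is an assumption made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

class StartupTimer {
    /** Milliseconds the JVM has been running, as reported by the runtime MXBean. */
    static long jvmUptimeMillis() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getUptime();
    }

    public static void main(String[] args) {
        // Reached after JVM startup and loading of the main class.
        long atMain = jvmUptimeMillis();

        // Application-specific initialization would happen here.

        long ready = jvmUptimeMillis();
        System.out.printf("main reached after %d ms, ready after %d ms%n", atMain, ready);
    }
}
```

Warmup is not captured by this sketch; it only brackets the JVM startup and application startup phases.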
  7. Tiered compilation

    C1 (a.k.a. client compiler)
    • Shorter compilation time
    • Less aggressively optimized code
    • Lower throughput
    C2 (a.k.a. server compiler)
    • Longer compilation time
    • Highly optimized code
    • Better throughput
  8. Compilation levels

    0: Interpreter
    1: C1 full optimization (no profiling)
    2: C1 with invocation and back-edge counters
    3: C1 full profiling (level 2 + MDO: MethodDataOop)
    4: C2
  9. Transitions between compilation levels

    Diagram of compilation levels 0 to 4 (Interpreter, C1, C2) showing three kinds of transition: normal path, delayed due to C2 capacity, and deoptimization.
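To get a feel for how much work the JIT compilers do while code warms up, one can read the accumulated compilation time from the standard `CompilationMXBean`. This is a rough sketch under assumptions: the class `JitTime`, the `hotLoop` workload, and the iteration counts are made up for illustration, and the reported time does not distinguish between C1 and C2.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitTime {
    /** Total JIT compilation time in ms, or -1 if the JVM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    /** A small, hypothetical workload that becomes hot when called repeatedly. */
    static long hotLoop(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();
        long checksum = 0;
        for (int round = 0; round < 20_000; round++) {
            checksum += hotLoop(1_000);
        }
        long after = totalCompilationMillis();
        System.out.printf("checksum=%d, JIT time before=%d ms, after=%d ms%n",
                checksum, before, after);
    }
}
```

Running the same class with `-XX:TieredStopAtLevel=1` and without it gives a crude comparison of compilation effort.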
  10. Custom JRE, CDS Archive, Native Image, Warm up in advance, Code caching, CRaC/CRIU, C1 only, JIT Centralization, Leyden
  11. Benchmark environment

    Allocated resources per container: vCore: 2, RAM: 4 GiB
    JDK: 21 (21.0.4); GC: G1; Max heap: 75% of allocation
    Application framework: Micronaut 4.6.2
    Option: +UseStringDeduplication (other options might be specified in each case)
    Measurement: run 1000 times; Average / Percentile (50, 90, 95, 99)
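The measurement step (average and 50/90/95/99 percentiles over repeated runs, then a ratio against a baseline) can be sketched as follows. This is an illustration only: the talk does not say which percentile definition was used, so the nearest-rank method here is an assumption, and `StartupStats` and the sample values in `main` are made up.

```java
import java.util.Arrays;

class StartupStats {
    /** Arithmetic mean of the samples. */
    static double average(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** Expresses a value as a percentage of a baseline (baseline = 100). */
    static double ratioPercent(double value, double baseline) {
        return value / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Made-up startup times in milliseconds, for illustration only.
        double[] startupMillis = {912, 940, 925, 931, 1003, 918, 950, 929, 935, 967};
        System.out.printf("avg=%.1f p50=%.0f p90=%.0f p99=%.0f%n",
                average(startupMillis),
                percentile(startupMillis, 50),
                percentile(startupMillis, 90),
                percentile(startupMillis, 99));
    }
}
```

`ratioPercent` is the kind of normalization behind a "Base=100, smaller is better" chart: each option's time is divided by the baseline time for the same statistic.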
  12. Result: JAR file extraction

    Average time: (JAR) 931 ms, (Extracted) 888 ms
    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, Extracted JAR 92.86%.
  13. Custom JRE: reduce the number of classes to be loaded.

    jdeps
        jdeps -R \
          -cp "target/lib/*" \
          --print-module-deps \
          --ignore-missing-deps \
          --multi-release 21 \
          target/App.jar
    jlink
        jlink --module-path $JAVA_HOME/jmods \
          --add-modules ${MODULE_LIST} \
          --no-header-files \
          --no-man-pages \
          --output linked
        # --compress=0/1/2 is deprecated
  14. Result: Custom JRE

    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, Extracted JAR 92.86%, jlink 94.85%.
  15. Benefits and cautions (Custom JRE)

    Benefits
    • Startup time and memory footprint improve because fewer classes need to be loaded.
    Cautions
    • Some effort is required to create a custom JRE (e.g., a multi-stage build to create the container image).
    • Note that jdeps sometimes fails to find required modules such as jdk.crypto.ec.
      • From JDK 22, jdk.crypto.ec is included in java.base.
    • Only the JRE size is reduced.
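Because jdeps can miss a module, it may help to check from inside the custom runtime which modules are actually present. The sketch below uses the standard `ModuleLayer` API; the class name `RuntimeModules` is an assumption, and `jdk.crypto.ec` is used only because it is the module named above.

```java
import java.util.List;
import java.util.stream.Collectors;

class RuntimeModules {
    /** True if the named module is part of the boot layer of this runtime image. */
    static boolean hasModule(String name) {
        return ModuleLayer.boot().findModule(name).isPresent();
    }

    /** Sorted names of all modules in the boot layer. */
    static List<String> moduleNames() {
        return ModuleLayer.boot().modules().stream()
                .map(Module::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("modules in this runtime: " + moduleNames());
        System.out.println("jdk.crypto.ec present: " + hasModule("jdk.crypto.ec"));
    }
}
```

Running this with the `java` binary from the jlink output directory shows whether the module list passed to `--add-modules` was complete for the application's needs.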
  16. CDS Archive: change the way classes are loaded

    • App CDS (JEP 310 / JDK 10)
      • Application Class Data Sharing (AppCDS) stores classes used by your applications in an archive file. (The java Command (oracle.com))
    • Default CDS (JEP 341 / JDK 12)
      • Created at JDK build time by running -Xshare:dump, using G1 GC and a 128M Java heap. (Oracle JDK / Class Data Sharing (oracle.com))
    • Dynamic CDS (JEP 350 / JDK 13)
      • Dynamic CDS archiving extends application class-data sharing (AppCDS) to allow dynamic archiving of classes when a Java application exits. (Class Data Sharing (oracle.com))
  17. CDS Archive: commands

    # Create Static CDS archive
    $ java -Xshare:off \
        -XX:DumpLoadedClassList=<ClassFileList> -jar app.jar
    $ java -Xshare:dump -XX:SharedArchiveFile=<Archive> \
        -XX:SharedClassListFile=<ClassFileList>
    # Create Dynamic CDS archive when the application exits
    $ java -XX:ArchiveClassesAtExit=<Archive> -jar app.jar
    # Use the CDS archive with the application
    $ java -XX:SharedArchiveFile=<Archive> -jar app.jar
    # Create CDS archive automatically (since JDK 19)
    $ java -XX:+AutoCreateSharedArchive \
        -XX:SharedArchiveFile=<Archive> -jar app.jar
    Other options are found in The java Command (oracle.com).
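One way to check that an archive is actually being used is to start the application with class-load logging (`-Xlog:class+load=info`) and count how many classes report the shared archive as their source. The sketch below only parses such log lines; the class name `CdsLogCheck` and the two sample lines in `main` are illustrative assumptions, not output captured from the talk's benchmark.

```java
import java.util.List;

class CdsLogCheck {
    /** Counts class-load log lines whose source is a CDS archive. */
    static long countShared(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("source: shared objects file"))
                .count();
    }

    public static void main(String[] args) {
        // Illustrative lines in the shape of -Xlog:class+load output (not real measurements).
        List<String> sample = List.of(
                "[0.015s][info][class,load] java.lang.Object source: shared objects file",
                "[0.210s][info][class,load] com.example.App source: file:/app/app.jar");
        System.out.printf("%d of %d classes came from the archive%n",
                countShared(sample), sample.size());
    }
}
```

A low count for application classes suggests the archive was created without them or no longer matches the class path.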
  18. Result (Static CDS only): CDS Archive

    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, Extracted JAR 92.86%, jlink 94.85%, CDS 43.52%.
  19. Result (Static & Dynamic CDS w/ training): CDS Archive

    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, Extracted JAR 92.86%, jlink 94.85%, CDS 43.52%, CDS Combined 42.55%.
  20. Benefits and cautions (CDS Archive)

    Benefits
    • Class loading time improves.
    • Available on any platform.
    • Dynamic CDS and static CDS can coexist.
    • CDS archives can also be used with a custom JRE.
    Cautions
    • The CDS archive must be recreated whenever the application is updated.
  21. What if CDS is used with an extracted JAR? (CDS Archive)

    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, CDS 43.52%, CDS Combined 42.55%, CDS extracted 88.30%, CDS extracted combined 89.33%.
  22. C1 only: use only C1 without profiling

    -XX:TieredStopAtLevel=1
    • By default, the JVM selects C2 when the platform has a multi-core CPU or uses a 64-bit VM.
    • With C1 alone there is no profiling overhead, so could we get better performance than with profiling enabled?
    • According to some cloud vendors' documentation, C1 is one of the options for improving startup time. (Customize Java runtime startup behavior for Lambda functions - AWS Lambda (amazon.com))
  23. Result: C1 only

    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, Extracted JAR 92.86%, CDS 43.52%, CDS Combined 42.55%, C1 83.90%.
  24. Benefits and cautions (C1 only)

    Benefits
    • Short-lived applications benefit.
    • Because no profiling occurs, startup time is reduced.
    • Can be combined with a custom JRE and a CDS archive.
    Cautions
    • This setting is not useful for long-running applications, which should leverage the highly optimized code generated by C2.
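Since the C1-only setting is easy to leave on by accident in a long-running service, an application can log at startup which tier limit it was launched with. This is a small sketch using the standard `RuntimeMXBean.getInputArguments()`; the class `TieredLevel` and its parsing helper are assumptions for illustration, and the helper only sees flags that appear in the argument list it is given.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.OptionalInt;

class TieredLevel {
    private static final String PREFIX = "-XX:TieredStopAtLevel=";

    /** Returns the level from the last -XX:TieredStopAtLevel=N argument, if any. */
    static OptionalInt tieredStopAtLevel(List<String> jvmArgs) {
        OptionalInt level = OptionalInt.empty();
        for (String arg : jvmArgs) {
            if (arg.startsWith(PREFIX)) {
                try {
                    level = OptionalInt.of(Integer.parseInt(arg.substring(PREFIX.length())));
                } catch (NumberFormatException ignored) {
                    // Malformed value: keep whatever was found earlier.
                }
            }
        }
        return level;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        OptionalInt level = tieredStopAtLevel(jvmArgs);
        System.out.println(level.isPresent()
                ? "TieredStopAtLevel=" + level.getAsInt()
                : "TieredStopAtLevel not set on the command line");
    }
}
```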
  25. AOT (Ahead of time) compilation

    Resolve dependencies and compile code at build time.
    • JDK 9-17: experimental (deprecated and removed)
    JDK support: GraalVM (Native Image), OpenJ9, OpenJDK (Project Leyden), etc.
  26. GraalVM Native Image

    Generic
        $ native-image App.class
        $ native-image -jar App.jar
    Micronaut
        $ mvn package -Dpackaging=native-image
        $ gradle nativeCompile
    Spring Boot
        $ mvn -Pnative spring-boot:build-image
        $ gradle bootBuildImage
        # Using Native Build Tools
        $ mvn -Pnative native:compile
        $ gradle nativeCompile
  27. Result: Native Image

    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, Extracted JAR 92.86%, jlink 94.85%, CDS 43.52%, CDS Combined 42.55%, C1 83.90%, Native 0.91%.
  28. Result: Native Image (with zoomed view)

    Same chart as the previous slide, with a zoomed inset for Native (0.91%).
  29. Benefits and cautions (Native Image)

    Benefits
    • Applications can start rapidly.
    • Lower memory footprint and other advantages.
    Cautions
    • Hardware/platform (CPU/OS) specific
    • Longer build time
    • Some effort is required for reflection support.
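The reflection caution refers to code like the following, where the class to instantiate is only known as a string at run time. On a regular JVM this works as is; in a native image, classes reached this way generally need to be declared to the build through reachability metadata. The sketch is a plain-JVM illustration, and `ReflectiveGreeter` and `Greeter` are hypothetical names.

```java
import java.lang.reflect.Method;

class ReflectiveGreeter {
    /** A hypothetical class that is only ever reached through reflection. */
    public static class Greeter {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    /** Looks up the class by name at run time, instantiates it, and calls greet(String). */
    static String greetReflectively(String className, String who) {
        try {
            Class<?> type = Class.forName(className);
            Object instance = type.getDeclaredConstructor().newInstance();
            Method greet = type.getMethod("greet", String.class);
            return (String) greet.invoke(instance, who);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed for " + className, e);
        }
    }

    public static void main(String[] args) {
        // In real code the name often comes from configuration or an annotation scan.
        String className = Greeter.class.getName();
        System.out.println(greetReflectively(className, "JCConf"));
    }
}
```

Frameworks such as Micronaut and Spring Boot generate much of this metadata at build time, which is part of why the effort is described as small rather than zero.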
  30. CRaC/CRIU: use checkpoints

    • CRIU (Checkpoint/Restore in Userspace)
      CRIU support - (eclipse.dev)
    • CRaC (Coordinated Restore at Checkpoint)
      Java on CRaC - Optimize JVM Start-Up | Azul
  31. Please note that... (CRaC/CRIU)

    "CRaC implementation creates the checkpoint only if the whole Java instance state can be stored in the image. Resources like open files or sockets are cannot, so it is required to release them when checkpoint is made. CRaC emits notifications for an application to prepare for the checkpoint and return to operating state after restore." https://github.com/CRaC/docs
  32. CRaC

    # 1. Start an application in the checkpoint mode.
    $JAVA_HOME/bin/java \
      -XX:CRaCCheckpointTo=<CheckPointFileDir> -jar App.jar
    # 2. After warm up, request a checkpoint.
    jcmd App.jar JDK.checkpoint
    # 3. Restore the snapshot.
    $JAVA_HOME/bin/java -XX:CRaCRestoreFrom=<CheckPointFileDir>
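The notifications mentioned in the CRaC documentation quote give the application a chance to release files and sockets before the checkpoint and reacquire them after restore. The sketch below shows that pattern only. To stay runnable without the CRaC library, it uses a hypothetical stand-in interface `CheckpointAware`; in a real application the corresponding type is `org.crac.Resource`, registered with `Core.getGlobalContext().register(...)`, and its callbacks take a context argument that is omitted here.

```java
class CheckpointSketch {
    /** Hypothetical stand-in for org.crac.Resource so the sketch compiles without the library. */
    interface CheckpointAware {
        void beforeCheckpoint();
        void afterRestore();
    }

    /** A resource holder that drops its connection for the checkpoint and reopens it afterwards. */
    static class ManagedConnection implements CheckpointAware {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        @Override
        public void beforeCheckpoint() {
            // Real code would close sockets and file handles here.
            open = false;
        }

        @Override
        public void afterRestore() {
            // Real code would reconnect and reload anything environment-specific here.
            open = true;
        }
    }

    public static void main(String[] args) {
        ManagedConnection connection = new ManagedConnection();
        connection.beforeCheckpoint();
        System.out.println("open during checkpoint: " + connection.isOpen());
        connection.afterRestore();
        System.out.println("open after restore: " + connection.isOpen());
    }
}
```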
  33. Result: CRaC/CRIU

    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, Extracted JAR 92.86%, jlink 94.85%, CDS 43.52%, CDS Combined 42.55%, C1 83.90%, Native 0.91%, CRaC 2.39%.
  34. Result: CRaC/CRIU (with zoomed view)

    Same chart as the previous slide, with a zoomed inset comparing Native (0.91%) and CRaC (2.39%).
  35. Benefits and cautions (CRaC/CRIU)

    Benefits
    • Works well for containers.
    • Startup time is very short.
    Cautions
    • Strictly identical dependencies and environment are required between executions.
    • The project is still in progress.
    • Privileged operations are required.
    • Some effort is needed to capture checkpoints (automation is key...).
  36. Startup time ratio (Base=100, smaller is better)

    Chart series: Average, 50P, 90P, 95P, 99P. Labelled values: JAR 100.00%, Extracted JAR 92.86%, jlink 94.85%, CDS 43.52%, CDS Combined 42.55%, C1 83.90%, Native 0.91%, CRaC 2.39%.
  37. Project Leyden https://openjdk.org/projects/leyden/

    Goal
    • Improve the startup time, time to peak performance, and footprint of Java programs.
    Focus
    • Standardize AOT for the HotSpot JVM
    • Start native, but support and optimize dynamic stuff later
    Resources
    • Project Leyden (openjdk.org)
  38. Concept: Shifting computation (1/2)

    Shifting computation from runtime to earlier experimental executions, known as training runs.
    • Unified Cache Data Store (CDS) Archive: store class metadata, heap objects, profiling data, and compiled code.
      -XX:CacheDataStore
    • Loaded Classes in CDS Archives: preload classes as soon as the application starts.
      -XX:+PreloadSharedClasses
    • Method Profiles in CDS Archives: store method profiles from training runs in the CDS archive, allowing the Just-In-Time (JIT) compiler to start compiling earlier during warm-up.
      -XX:+RecordTraining -XX:+ReplayTraining
    • AOT Resolution of Constant Pool Entries: resolve many constant pool entries during the training run, improving start-up times and enabling better code generation by the AOT compiler.
      -XX:+ArchiveFieldReferences -XX:+ArchiveMethodReferences -XX:+ArchiveInvokeDynamic
  39. Concept: Shifting computation (2/2)

    Shifting computation from runtime to earlier experimental executions, known as training runs.
    • AOT Compilation of Java Methods: identify frequently used methods during the training run, compile them, and store them with the CDS archive.
      -XX:+StoreCachedCode -XX:+LoadCachedCode -XX:CachedCodeFile
    • AOT Generation of Dynamic Proxies and Reflection Data: reduce start-up times by generating dynamic proxies and reflection data ahead of time.
      -XX:+ArchiveDynamicProxies -XX:+ArchiveReflectionData
    • Class Loader Lookup Cache: speed up repeated class lookups, which are common in application frameworks, by caching them.
      -XX:+ArchiveLoaderLookupCache
  40. Benchmark environment for Leyden

    Allocated resources per container: vCore: 2, RAM: 4 GiB
    JDK: JDK 21 (21.0.4), Leyden Early-Access Builds (java.net); GC: G1; Max heap: 75% of allocation
    Application framework: Micronaut 4.6.2
    Option: +UseStringDeduplication -XX:CacheDataStore=<cds_file>
    Measurement: run 1000 times; Average / Percentile (50, 90, 95, 99)
  41. First, run the following command to train the application and generate AOT-compiled code.

    java -XX:+UseG1GC \
      -XX:MaxRAMPercentage=75 \
      -XX:InitialRAMPercentage=75 \
      -XX:+UseStringDeduplication \
      -XX:CacheDataStore=<cds_file> \
      -jar App.jar
  42. (Currently) two files are generated.

    <cds_file>
    The file contains classes, heap objects and profiling data harvested from the training run.
    <cds_file>.code
    The file contains AOT-compiled methods, optimized for the execution behaviors observed during the training run.
    [NOTE] Data in this file will be merged into <cds_file> in a future release.
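When the training run is automated (for example in a container build), it can be worth checking that both files exist before shipping the image. The following is a minimal sketch under assumptions: the class `CacheDataStoreFiles` and its method names are made up, and the `.code` suffix is taken from the slide above, which also notes that the two files are expected to be merged in a future release.

```java
import java.nio.file.Files;
import java.nio.file.Path;

class CacheDataStoreFiles {
    /** Path of the companion file holding AOT-compiled code, i.e. "<cds_file>.code". */
    static Path codeFileFor(Path cdsFile) {
        return cdsFile.resolveSibling(cdsFile.getFileName() + ".code");
    }

    /** True if both the cache data store and its ".code" companion exist as regular files. */
    static boolean trainingOutputPresent(Path cdsFile) {
        return Files.isRegularFile(cdsFile) && Files.isRegularFile(codeFileFor(cdsFile));
    }

    public static void main(String[] args) {
        // The file name is a placeholder; pass the real path as the first argument.
        Path cdsFile = Path.of(args.length > 0 ? args[0] : "app.cds");
        System.out.println(cdsFile + " and " + codeFileFor(cdsFile)
                + " present: " + trainingOutputPresent(cdsFile));
    }
}
```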
  43. Next, run the same command again to start the application with the generated CDS (Cache Data Store) file.

    java -XX:+UseG1GC \
      -XX:MaxRAMPercentage=75 \
      -XX:InitialRAMPercentage=75 \
      -XX:+UseStringDeduplication \
      -XX:CacheDataStore=<cds_file> \
      -jar App.jar
  44. Result: Leyden

    Chart: startup time ratio relative to JAR (series: Average, 50P, 90P, 95P, 99P). Labelled values: JAR 100.00%, Extracted JAR 92.86%, jlink 94.85%, CDS 43.52%, CDS Combined 42.55%, C1 83.90%, Native 0.91%, CRaC 2.39%, Leyden 33.10%.
  45. Result: Leyden (with zoomed view)

    Same chart as the previous slide, with a zoomed inset comparing Native (0.91%), CRaC (2.39%), and Leyden (33.10%).
  46. Startup time ratio (Base=100, smaller is better)

    Chart series: Average, 50P, 90P, 95P, 99P. Labelled values: JAR 100.00%, Extracted JAR 92.86%, jlink 94.85%, CDS 43.52%, CDS Combined 42.55%, C1 83.90%, Native 0.91%, CRaC 2.39%, Leyden 33.10%.
  47. Takeaways

    • We have several options to improve startup time.
    • Updating the Java version is another option.
    • Several projects that aim to improve startup time, such as CRaC and Leyden, are ongoing.
    • Choose the most suitable technique based on the characteristics and requirements of your application.