Slide 1

Slide 1 text

May the force be with your Java applications - they can start more rapidly and run faster! NISHIKAWA, Akihiro (@logico_jp) Cloud Solution Architect Microsoft

Slide 2

Slide 2 text

Who am I? { "name": "Akihiro Nishikawa", "country": "Japan", "favourites": [ "JVM", "GraalVM", "Azure" ], "expertise": [ "Application integration", "Container and Serverless" ] }

Slide 3

Slide 3 text

Agenda
- Why startup gets important
- Startup procedure of Java Applications
- Options to improve startup time
- Future

Slide 4

Slide 4 text

Why startup gets important

Slide 5

Slide 5 text

Performance... Startup, CPU usage, Throughput, Latency, Size, Memory footprint

Slide 6

Slide 6 text

Situation changed
I understand all aspects are important, of course.

                                                    Startup  Latency  Throughput  Footprint
Java applications (run on app servers)              △        ◎        ◎           ○
Serverless applications and autoscaling containers  ◎        ◎        ◎           ◎

Slide 7

Slide 7 text

Serverless adoption
1. Node.js (62.9%)
2. Python (20.8%)
3. Go (6.4%)
4. Java (6.1%)
5. C# (3.8%)
Source: The Future of Java by Mark Little – YouTube [Devoxx UK 2022]

Slide 8

Slide 8 text

Startup procedure

Slide 9

Slide 9 text

Startup
What happens when a Java application starts?
1. JVM Startup (fast)
   JVM: load and initialize; generate bytecode templates
2. Application Startup (quick)
   JVM: load application classes; initialize application classes; application-specific initialization
3. Application Warmup (long time)
   JVM: compile/deoptimize/recompile
   Application: process specific workloads

Slide 10

Slide 10 text

Life cycle (image). Legend: CL = class loading, JIT = JIT compilation, GC = garbage collection

Slide 11

Slide 11 text

Tiered compilation
C1 (a.k.a. client compiler)
- Shorter time to compile
- Less aggressively optimized
- Lower throughput
C2 (a.k.a. server compiler)
- Longer time to compile
- Highly optimized
- Better throughput

Slide 12

Slide 12 text

Compilation Level
0: Interpreter
1: C1 full optimization (no profiling)
2: C1 with invocation and back-edge counters
3: C1 full profiling (level 2 + MDO: MethodDataOop)
4: C2

Slide 13

Slide 13 text

Compilation Level (diagram): transitions among levels 0-4 (Interpreter, C1, C2), showing the normal path, the path taken when C2 compilation is delayed due to C2 capacity, and deoptimization.

Slide 14

Slide 14 text

Method compilation life cycle (diagram)
- Run: Interpreter, C1, C2
- Interpreter: interpret and profile
- C1: profiling; C1 compiled code is saved in the code cache
- C2: C2 compiled code is saved in the code cache
- Deoptimization: deoptimize compiled code

Slide 15

Slide 15 text

Thresholds

    static bool apply_scaled(const methodHandle& method, CompLevel cur_level, int i, int b, double scale) {
      double threshold_scaling;
      if (CompilerOracle::has_option_value(method, CompileCommand::CompileThresholdScaling, threshold_scaling)) {
        scale *= threshold_scaling;
      }
      switch(cur_level) {
      case CompLevel_none:
      case CompLevel_limited_profile:
        return (i >= Tier3InvocationThreshold * scale) ||
               (i >= Tier3MinInvocationThreshold * scale && i + b >= Tier3CompileThreshold * scale);
      case CompLevel_full_profile:
        return (i >= Tier4InvocationThreshold * scale) ||
               (i >= Tier4MinInvocationThreshold * scale && i + b >= Tier4CompileThreshold * scale);
      default:
        return true;
      }
    }

Source: jdk/src/hotspot/share/compiler/compilationPolicy.cpp at jdk-21+35 · openjdk/jdk (github.com)

Slide 16

Slide 16 text

Thresholds (highlighted excerpt of apply_scaled, jdk/src/hotspot/share/compiler/compilationPolicy.cpp at jdk-21+35 · openjdk/jdk (github.com))

    case CompLevel_limited_profile:
      return (i >= Tier3InvocationThreshold * scale) ||
             (i >= Tier3MinInvocationThreshold * scale && i + b >= Tier3CompileThreshold * scale);
    case CompLevel_full_profile:
      return (i >= Tier4InvocationThreshold * scale) ||
             (i >= Tier4MinInvocationThreshold * scale && i + b >= Tier4CompileThreshold * scale);

Slide 17

Slide 17 text

Thresholds
A method moves to the next level when:

    # of Executions >= TierXInvocationThreshold * Scale
    OR
    (# of Executions >= TierXMinInvocationThreshold * Scale
     AND # of Executions + # of Iterations >= TierXCompileThreshold * Scale)

                                 Level 3   Level 4
    TierXInvocationThreshold     200       5_000
    TierXMinInvocationThreshold  100       600
    TierXCompileThreshold        2_000     15_000
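The rule above can be sketched in Java to make the OR/AND structure concrete. This is only an illustration, not HotSpot code: the class name `TierThresholds` and the method `reachesLevel` are made up for this example, and the constants are the default values from the table above.

```java
// Illustrative sketch of the threshold rule above (not HotSpot code).
// TierThresholds and reachesLevel are hypothetical names for this example.
class TierThresholds {
    // Default values taken from the table above.
    static final int TIER3_INVOCATION = 200;
    static final int TIER3_MIN_INVOCATION = 100;
    static final int TIER3_COMPILE = 2_000;
    static final int TIER4_INVOCATION = 5_000;
    static final int TIER4_MIN_INVOCATION = 600;
    static final int TIER4_COMPILE = 15_000;

    /**
     * Returns true when the counters satisfy the rule for the target level (3 or 4).
     * invocations = number of executions, backEdges = number of loop iterations,
     * scale = threshold scaling factor (1.0 by default).
     */
    static boolean reachesLevel(int targetLevel, int invocations, int backEdges, double scale) {
        int invocationThreshold;
        int minInvocationThreshold;
        int compileThreshold;
        if (targetLevel == 3) {
            invocationThreshold = TIER3_INVOCATION;
            minInvocationThreshold = TIER3_MIN_INVOCATION;
            compileThreshold = TIER3_COMPILE;
        } else if (targetLevel == 4) {
            invocationThreshold = TIER4_INVOCATION;
            minInvocationThreshold = TIER4_MIN_INVOCATION;
            compileThreshold = TIER4_COMPILE;
        } else {
            throw new IllegalArgumentException("targetLevel must be 3 or 4");
        }
        return invocations >= invocationThreshold * scale
                || (invocations >= minInvocationThreshold * scale
                    && invocations + backEdges >= compileThreshold * scale);
    }

    public static void main(String[] args) {
        // A method called 150 times with a hot loop (5,000 back edges in total).
        System.out.println("level 3? " + reachesLevel(3, 150, 5_000, 1.0));
        // The same counters are not yet enough for level 4.
        System.out.println("level 4? " + reachesLevel(4, 150, 5_000, 1.0));
    }
}
```

A method with few invocations but a hot loop can still be promoted through the second branch, which is why the rule counts loop iterations as well as calls.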

Slide 18

Slide 18 text

$ java -XX:+PrintFlagsFinal -version | grep Threshold

Slide 19

Slide 19 text

$ java -XX:+PrintFlagsFinal -version | grep Threshold

      intx CompileThreshold                         = 10000      {pd product} {default}
    double CompileThresholdScaling                  = 1.000000   {product} {default}
    double G1PeriodicGCSystemLoadThreshold          = 0.000000   {manageable} {default}
     uintx G1SATBBufferEnqueueingThresholdPercent   = 60         {product} {default}
     uintx IncreaseFirstTierCompileThresholdAt      = 50         {product} {default}
     uintx InitialTenuringThreshold                 = 7          {product} {default}
    size_t LargePageHeapSizeThreshold               = 134217728  {product} {default}
     uintx MaxTenuringThreshold                     = 15         {product} {default}
    size_t PretenureSizeThreshold                   = 0          {product} {default}
      uint StringDeduplicationAgeThreshold          = 3          {product} {default}
    double SweeperThreshold                         = 15.000000  {product} {default}
     uintx ThresholdTolerance                       = 10         {product} {default}
      intx Tier2BackEdgeThreshold                   = 0          {product} {default}
      intx Tier2CompileThreshold                    = 0          {product} {default}
      intx Tier3BackEdgeThreshold                   = 60000      {product} {default}
      intx Tier3CompileThreshold                    = 2000       {product} {default}
      intx Tier3InvocationThreshold                 = 200        {product} {default}
      intx Tier3MinInvocationThreshold              = 100        {product} {default}
      intx Tier4BackEdgeThreshold                   = 40000      {product} {default}
      intx Tier4CompileThreshold                    = 15000      {product} {default}
      intx Tier4InvocationThreshold                 = 5000       {product} {default}
      intx Tier4MinInvocationThreshold              = 600        {product} {default}
    openjdk version "21" 2023-09-19
    OpenJDK Runtime Environment (build 21+35-2513)
    OpenJDK 64-Bit Server VM (build 21+35-2513, mixed mode, sharing)
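The same flag values can also be read from inside a running JVM, which may help when checking what a container actually started with. The sketch below assumes a HotSpot-based JDK with the `jdk.management` module present; the class name `ShowTierFlags` and the `"n/a"` fallback are choices made for this example only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: read tiered-compilation flags at runtime.
// Assumes a HotSpot JVM; on other JVMs the helper returns "n/a".
class ShowTierFlags {
    /** Returns the current value of a VM flag, or "n/a" if it cannot be read. */
    static String flag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag, or the HotSpot diagnostic bean is unavailable.
            return "n/a";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "TieredStopAtLevel",
            "Tier3InvocationThreshold",
            "Tier3MinInvocationThreshold",
            "Tier3CompileThreshold",
            "Tier4InvocationThreshold",
            "Tier4MinInvocationThreshold",
            "Tier4CompileThreshold"
        };
        for (String name : names) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```

If a custom JRE is used, `jdk.management` would have to be included in the image for this to work.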

Slide 20

Slide 20 text

Startup time and performance - Fibonacci numbers

    // java Fib.java 45 --> 45th number is 1_134_903_170
    // -XX:+UseG1GC -Xmx2g -Xms2g -XX:+UseStringDeduplication
    public class Fib {
        public static void main(String... args) {
            if (args.length != 1) return;
            // parseLong returns a primitive long and avoids boxing
            long num = Long.parseLong(args[0]);
            System.out.printf("%d(st/nd/rd/th) >> %d%n", num, fib(num));
        }

        // Deliberately naive recursion, used as a CPU-bound workload
        static long fib(long n) {
            if (n < 2) return n;
            return fib(n - 2) + fib(n - 1);
        }
    }

Slide 21

Slide 21 text

Results (seconds)
3rd Gen AMD EPYC™ 7763v (8 vCPUs, 32 GiB memory)

                                              Java 17.0.8              Java 21                  GraalVM 23.1 (Java 21)
                                              Compile        Run       Compile        Run       Compile         Run
    Interpreter only                          N/A            250.315   N/A            143.185   N/A             143.750
    C1 only (no profiling in Interpreter)     0.390          4.597     0.381          5.183     0.163           4.945
    C2 only (no profiling in C1)              3.961          7.803     5.585          10.117    3.655           6.941
    Tiered compilation (Interpreter -> C1)    0.009          4.141     0.012          4.717     0.022           4.758
    Tiered compilation                        C1: 0.009      3.542     C1: 0.011      4.094     C1: 0.052       3.254
      (Interpreter -> C1 -> C2)               C2: 0.002                C2: 0.005                C2*: 0.053

(*) In GraalVM, the JVMCI-native compiler is used instead of C2.

Slide 22

Slide 22 text

Options to improve startup time

Slide 23

Slide 23 text

Custom JRE, C1 only, AOT compilation, Warm up in advance, Code cache, Checkpoint, CDS Archive, JIT Centralization

Slide 24

Slide 24 text

Startup time ratio (Base=100, smaller is better) [chart]. Cases: Base, jlink, CDS, CDS Combined, C1, Native, CRaC. Series: Average, 50P, 90P, 95P, 99P.

Slide 25

Slide 25 text

Benchmark environment
- Allocation per container: 2 vCores, 4 GiB RAM
- JDK 17 (17.0.8.1)
- GC: G1
- Max heap: 75% of the allocation
- Application framework: Micronaut 4.1.3
- Option: -XX:+UseStringDeduplication (other options might be specified in each case)
- Measurement: run 1000 times; average and percentiles (50, 90, 95, 99)
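The slides do not say how the average and percentiles were computed from the 1000 runs. One common approach is the nearest-rank method, sketched below; the class name `StartupStats` and the sample values in `main` are invented for illustration and are not benchmark data from this talk.

```java
import java.util.Arrays;

// Illustrative sketch: average and nearest-rank percentiles of startup samples.
// StartupStats is a hypothetical helper, not the tool used for the benchmark.
class StartupStats {
    /** Arithmetic mean of the samples (for example, startup times in milliseconds). */
    static double average(long[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        return Arrays.stream(samples).average().getAsDouble();
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up startup times in milliseconds, only to exercise the helpers.
        long[] samples = {120, 100, 110, 300, 105, 101, 102, 103, 104, 106};
        System.out.println("Average: " + average(samples));
        for (double p : new double[] {50, 90, 95, 99}) {
            System.out.println(p + "P: " + percentile(samples, p));
        }
    }
}
```

Reporting high percentiles next to the average matters for startup measurements because a few slow starts can be hidden by the mean.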

Slide 26

Slide 26 text

Baseline

    java -XX:+UseG1GC \
      -XX:MaxRAMPercentage=75 \
      -XX:InitialRAMPercentage=75 \
      -XX:+UseStringDeduplication \
      -jar App.jar

Slide 27

Slide 27 text

Improve class loading Custom JRE CDS Archive

Slide 28

Slide 28 text

Custom JRE
Reduce the number of classes to be loaded.

jdeps

    jdeps -R \
      -cp "target/lib/*" \
      --print-module-deps \
      --ignore-missing-deps \
      --multi-release 17 \
      target/App.jar
    # java.base,java.compiler,java.desktop,java.management,
    # java.naming,java.sql,java.xml,jdk.unsupported

jlink (the module list must not contain spaces)

    jlink --compress=2 \
      --module-path $JAVA_HOME/jmods \
      --add-modules java.base,java.compiler,java.desktop,java.management,java.naming,java.sql,java.xml,jdk.unsupported \
      --no-header-files \
      --no-man-pages \
      --output linked
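To check which modules a runtime actually resolves, a small program run with the linked image can list the modules in the boot layer. This is a sketch for verification purposes only; the class name `ListRuntimeModules` is made up, and the output depends on the runtime that executes it.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: list the named modules resolved in the boot layer.
// Run it with the java launcher of the custom JRE to see what the image provides.
class ListRuntimeModules {
    /** Names of all modules in the boot layer, sorted alphabetically. */
    static List<String> bootModules() {
        return ModuleLayer.boot().modules().stream()
                .map(Module::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        bootModules().forEach(System.out::println);
    }
}
```

If a module the application needs at runtime does not appear in the list, it was probably not passed to `--add-modules`.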

Slide 29

Slide 29 text

Result: Custom JRE [chart]. Startup time ratio for Base and jlink (axis 99% to 101%). Series: Average, 50P, 90P, 95P, 99P.

Slide 30

Slide 30 text

Benefits and cautions: Custom JRE
Benefits
- Startup time and memory footprint improve because fewer classes need to be loaded.
Cautions
- Some effort is required to create a custom JRE (e.g., a multi-stage build to create the container image).
- Note that jdeps sometimes does not find dependency modules such as jdk.crypto.ec.

Slide 31

Slide 31 text

CDS Archive
Change the way classes are loaded.
- App CDS (JEP 310 / JDK 10): Application Class Data Sharing (AppCDS) stores classes used by your applications in an archive file. (The java Command (oracle.com))
- Default CDS (JEP 341 / JDK 12): Created at the JDK build time by running -Xshare:dump, using G1 GC and 128M Java heap. (Oracle JDK / Class Data Sharing (oracle.com))
- Dynamic CDS (JEP 350 / JDK 13): Dynamic CDS archive extends application class-data sharing (AppCDS) to allow dynamic archiving of classes when a Java application exits. (Class Data Sharing (oracle.com))

Slide 32

Slide 32 text

CDS Archive

    # Create Static CDS archive
    $ java -Xshare:off \
        -XX:DumpLoadedClassList= -jar app.jar
    $ java -Xshare:dump -XX:SharedArchiveFile= \
        -XX:SharedClassListFile=

    # Create Dynamic CDS archive when the application exits
    $ java -XX:ArchiveClassesAtExit= -jar app.jar

    # Use the CDS archive with the application
    $ java -XX:SharedArchiveFile= -jar app.jar

    # Create CDS archive automatically (since JDK 19)
    $ java -XX:+AutoCreateSharedArchive \
        -XX:SharedArchiveFile= -jar app.jar

Other options are found in The java Command (oracle.com)

Slide 33

Slide 33 text

Result (Static CDS only) [chart]. Startup time ratio for Base, jlink, CDS (axis 75% to 105%). Series: Average, 50P, 90P, 95P, 99P.

Slide 34

Slide 34 text

Result (Static & Dynamic CDS w/ training) [chart]. Startup time ratio for Base, jlink, CDS, CDS Combined (axis 30% to 110%). Series: Average, 50P, 90P, 95P, 99P.

Slide 35

Slide 35 text

Benefits and cautions: CDS Archive
Benefits
- Class loading time is reduced.
- Available on any platform.
- Dynamic and static CDS archives can be used together.
- CDS archives can be used with a custom JRE.
Cautions
- Whenever the application is updated, the CDS archive has to be recreated.

Slide 36

Slide 36 text

Note
- Neither -Xverify:none nor -noverify is used.
  - Both are deprecated since JDK 13 and will be removed in a future release. [JDK-8218003] Release Note: Deprecated Java Options -Xverify:none and -noverify - Java Bug System (openjdk.org)
- For users who need to run without startup verification:
  - AppCDS can archive their classes. The classes are verified during archiving, which avoids verification at runtime.

Slide 37

Slide 37 text

Use only C1 without profiling
-XX:TieredStopAtLevel=1
- By default, the JVM selects C2 on multi-core processors or 64-bit VMs.
- If using only C1:
  - There is no profiling overhead.
  - Performance is better than when profiling is enabled.

Slide 38

Slide 38 text

Result: C1 only [chart]. Startup time ratio for Base, jlink, CDS, CDS Combined, C1 (axis 30% to 110%). Series: Average, 50P, 90P, 95P, 99P.

Slide 39

Slide 39 text

Benefits and cautions: C1 only
Benefits
- Short-lived applications benefit the most.
- Since no profiling occurs, startup time is reduced.
- Custom JRE, CDS archive, and C1 only can be used together.
Cautions
- This setting is not useful for long-running applications, which should leverage the highly optimized code generated by C2.

Slide 40

Slide 40 text

Offloading JIT compilation AOT compilation JIT Centralization

Slide 41

Slide 41 text

AOT (Ahead of time) compilation
- Resolve dependencies and compile code at build time.
- In the JDK: experimental (jaotc) since JDK 9, deprecated and then removed in JDK 17
- JDK support: GraalVM (Native Image), Azul Zulu, OpenJ9, etc.
- Development framework support: Micronaut, Spring Boot

Slide 42

Slide 42 text

Generic

    $ native-image App.class
    $ native-image -jar App.jar

Micronaut

    $ mvn package -Dpackaging=native-image
    $ gradle nativeCompile

Spring Boot

    $ mvn -Pnative spring-boot:build-image
    $ gradle bootBuildImage
    # Using Native Build Tools
    $ mvn -Pnative native:compile
    $ gradle nativeCompile

Slide 43

Slide 43 text

Result: GraalVM Native Image [chart]. Startup time ratio for Base, jlink, CDS, CDS Combined, C1, Native (axis 0% to 110%). Series: Average, 50P, 90P, 95P, 99P.

Slide 44

Slide 44 text

Result: GraalVM Native Image [chart]. Startup time ratio for Base, jlink, CDS, CDS Combined, C1, Native (axis 0% to 110%), with an inset zooming in on Native (axis 1.50% to 1.65%). Series: Average, 50P, 90P, 95P, 99P.

Slide 45

Slide 45 text

Result: AOT enabled Framework [chart]. Startup time ratio for Base, Base + framework AOT support, jlink, CDS, CDS Combined, C1, Native, Native + framework AOT support (axis 0% to 110%). Series: Average, 50P, 90P, 95P, 99P.

Slide 46

Slide 46 text

Result: AOT enabled Framework (Base) [chart]. Same cases as the previous chart, with an inset comparing Base and Base + framework AOT support (axis 99.0% to 100.2%). Series: Average, 50P, 90P, 95P, 99P.

Slide 47

Slide 47 text

Result: AOT enabled Framework (Native) [chart]. Same cases as the previous chart, with an inset comparing Native and Native + framework AOT support (axis 1.35% to 1.70%). Series: Average, 50P, 90P, 95P, 99P.

Slide 48

Slide 48 text

Benefits and cautions: AOT compilation
Benefits
- Applications can start rapidly.
- Lower memory footprint and other advantages.
Cautions
- AOT compilation support: not all application development frameworks or distributions support AOT.
- GraalVM Native Image in particular:
  - Executables are hardware/platform (CPU/OS) specific.
  - Build time is long.
  - As of now, generated executables are not suitable for long-running applications.
  - Some effort is required for reflection support.

Slide 49

Slide 49 text

Centralized JIT
- Offload JIT compilation to another environment, which returns compiled code to the runtime environment (e.g., containers), to improve application startup time.
- OpenJ9 JITServer (Eclipse OpenJ9): JITServer technology - (eclipse.dev)
- Azul Cloud Native Compiler: Java Compilation in the Cloud | Cloud Native Compiler (azul.com)

Slide 50

Slide 50 text

Concept: JIT Centralization
- Ordinarily, JIT compilation runs in each JVM.
(Diagram: each VM or container runs a Java application on its own JVM, and each JVM performs its own JIT compilation.)

Slide 51

Slide 51 text

Concept: JIT Centralization
- JIT compilation runs in a dedicated JVM instance.
- Each JVM instance communicates with the JIT JVM instance.
(Diagram: the JVMs in each VM or container send compilation requests to the dedicated JVM instance(s) for JIT compilation, which return the generated code.)

Slide 52

Slide 52 text

Benefits and cautions: JIT Centralization
Benefits
- Java applications can run with fewer resources.
  - Especially useful for apps running in containers.
  - Fewer CPU cores and less memory might be allocated to each container.
- With compiled code cached in the dedicated JIT server instance, JIT compilation might be faster.
Cautions
- Network latency (using it along with Kubernetes is recommended).
- Might not be suitable for very short-lived applications.
- Not supported by all distributions.

Slide 53

Slide 53 text

Short-cut to reach peak performance Warm up in advance Code cache Checkpoint

Slide 54

Slide 54 text

Use profiled data to warm up applications
- JWarmup (Alibaba Dragonwell): JEP draft: JWarmup precompile java hot methods at application startup (openjdk.org)
- Azul ReadyNow! (Azul): ReadyNow!® - Azul | Better Java Performance, Superior Java Support

Slide 55

Slide 55 text

Benefits and cautions: Train applications
Benefits
- No code change is required, since the characteristics of the Java JIT compiler are leveraged to reduce startup time.
Cautions
- Not supported by all distributions.
- How profile logs/data are provided and collected differs by distribution.
  [ReadyNow!] -XX:ProfileLogIn=  -XX:ProfileLogOut=
  [JWarmup] -XX:CompilationWarmUpLogFile=

Slide 56

Slide 56 text

Use code cache
- Compile Stashing (Azul): Using Compile Stashing (azul.com)
- Dynamic AOT and Shared Class Cache (OpenJ9): AOT Compiler - (eclipse.dev), Introduction - (eclipse.dev)

Slide 57

Slide 57 text

Benefits and cautions: Use code cache
Benefits
- Reduces startup time, especially compilation time.
- A code cache combined with a warm-up feature might let applications run faster and obtain optimized code sooner.
Cautions
- Not supported by all distributions.

Slide 58

Slide 58 text

Use checkpoints
- CRIU (Checkpoint/Restore in Userspace): CRIU support - (eclipse.dev)
- CRaC (Coordinated Restore at Checkpoint): Java on CRaC - Optimize JVM Start-Up | Azul

Slide 59

Slide 59 text

CRaC (Coordinated Restore at Checkpoint) (diagram: restoring from a checkpoint bypasses the phases before the application start point)

Slide 60

Slide 60 text

Please note that... "CRaC implementation creates the checkpoint only if the whole Java instance state can be stored in the image. Resources like open files or sockets are cannot, so it is required to release them when checkpoint is made. CRaC emits notifications for an application to prepare for the checkpoint and return to operating state after restore." https://github.com/CRaC/docs Use checkpoint
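The release-and-reopen pattern described in the quote can be sketched as follows. To keep the example compilable without extra dependencies, it defines its own `CheckpointAware` interface and `Registry` class; both are hypothetical stand-ins invented here. In the actual CRaC API, the corresponding pieces are the `org.crac.Resource` interface (`beforeCheckpoint` / `afterRestore`) and registration through `Core.getGlobalContext().register(...)`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the checkpoint notification pattern.
// CheckpointAware and Registry are hypothetical stand-ins, NOT the org.crac API.
class CheckpointLifecycleSketch {
    /** Stand-in for a resource that is notified around checkpoint and restore. */
    interface CheckpointAware {
        void beforeCheckpoint();
        void afterRestore();
    }

    /** Holds something that cannot be stored in the image, e.g. an open socket. */
    static class ConnectionHolder implements CheckpointAware {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        @Override
        public void beforeCheckpoint() {
            open = false; // release the connection so the state can be stored
        }

        @Override
        public void afterRestore() {
            open = true; // reopen the connection to return to the operating state
        }
    }

    /** Stand-in for the context that notifies registered resources. */
    static class Registry {
        private final List<CheckpointAware> resources = new ArrayList<>();

        void register(CheckpointAware resource) {
            resources.add(resource);
        }

        void notifyBeforeCheckpoint() {
            resources.forEach(CheckpointAware::beforeCheckpoint);
        }

        void notifyAfterRestore() {
            resources.forEach(CheckpointAware::afterRestore);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        ConnectionHolder connection = new ConnectionHolder();
        registry.register(connection);

        registry.notifyBeforeCheckpoint();
        System.out.println("open before checkpoint image is written? " + connection.isOpen());

        registry.notifyAfterRestore();
        System.out.println("open after restore? " + connection.isOpen());
    }
}
```

Frameworks with CRaC support may handle this for their own resources, but connections the application opens itself typically need this kind of handling.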

Slide 61

Slide 61 text

CRaC

    # 1. Start an application in the checkpoint mode.
    $JAVA_HOME/bin/java \
      -XX:CRaCCheckpointTo= -jar App.jar

    # 2. After warm up, request a checkpoint.
    jcmd App.jar JDK.checkpoint

    # 3. Restore the snapshot.
    $JAVA_HOME/bin/java -XX:CRaCRestoreFrom=

Slide 62

Slide 62 text

Result: Use checkpoint [chart]. Startup time ratio for Base, jlink, CDS, CDS Combined, C1, Native, CRaC (axis 0% to 110%). Series: Average, 50P, 90P, 95P, 99P.

Slide 63

Slide 63 text

Result: Use checkpoint [chart]. Startup time ratio for Base, jlink, CDS, C1, CRaC, Native (axis 0% to 110%), with an inset comparing Native and CRaC (axis 0.00% to 8.00%). Series: Average, 50P, 90P, 95P, 99P.

Slide 64

Slide 64 text

Benefits and cautions: Use checkpoint
Benefits
- Works well for containers.
- Startup time is quite short.
Cautions
- Exactly the same dependencies and environment are required between the checkpoint and the restore.
- The project is still under development.
- Privileged operations are required.
- Some effort is needed to capture the checkpoint (automation is key...).

Slide 65

Slide 65 text

Startup time ratio (Base=100, smaller is better) [chart]. Cases: Base, jlink, CDS, CDS Combined, C1, Native, CRaC. Series: Average, 50P, 90P, 95P, 99P.

Slide 66

Slide 66 text

Future

Slide 67

Slide 67 text

Project Leyden
openjdk.org/projects/leyden
- Goal: improve the startup time, time to peak performance, and footprint of Java programs.
- Focus
  - Standardize AOT for the HotSpot JVM
  - Start native, but support and optimize dynamic stuff later
- Resources
  - Project Leyden - Capturing Lightning in a Bottle - YouTube
  - 202308-Leyden-JVMLS.pdf (openjdk.org)
  - leyden-premain-petclinic-2023-09-12.pdf (openjdk.org)
  - Project Leyden By Brian Goetz - YouTube

Slide 68

Slide 68 text

Concept: Shifting computation
Using both the existing features and newly added ones.
(Diagram labels: Static CDS Archive; Dynamic CDS Archive; Cached Code; archive classes and heap snapshot; training data; pre-compiled machine code.)
We can use these techniques now!

Slide 69

Slide 69 text

Takeaways

Slide 70

Slide 70 text

Takeaways
- We have several options to improve startup time.
- Updating the Java version is another option.
- Several projects to improve startup time are ongoing.
- Most important is choosing the technique that best suits the characteristics and requirements of your application.
