Slide 1

Slide 1 text

Remedies for OOMKills, warm-ups, and slowness in JVM containers (Redux)

Slide 2

Slide 2 text

Speakers: Brice Dutheil @BriceDutheil, Jean-Philippe Bempel @jpbempel

Slide 3

Slide 3 text

Agenda
● My container gets oomkilled
● How the memory actually works
● Some hands-on cases
● Container gets respawned
● Things that slow down startup
● Break

Slide 4

Slide 4 text

My container gets oomkilled

Slide 5

Slide 5 text

The containers are restarting. What's going on?

$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
my-pod-5759f56c55-cjv57   3/3     Running   7          3d1h

$ kubectl describe pod my-pod-5759f56c55-cjv57
...
    State:          Running
      Started:      Mon, 06 Jun 2020 13:39:40 +0200
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 06 Jun 2020 09:20:21 +0200
      Finished:     Mon, 06 Jun 2020 13:39:38 +0200

Slide 6

Slide 6 text

🔥🔥🔥 🚨 Crisis mode 🚨 If containers are oomkilled, just increase the container memory limits and investigate later.

Slide 7

Slide 7 text

Linux oomkiller
● Out Of Memory Killer: a Linux mechanism employed to kill processes when memory is critically low
● For regular processes, the oomkiller selects victims based on their badness score
● Within a constrained container, i.e. with memory limits:
  ○ if the available memory in this container reaches 0, the oomkiller terminates its processes
  ○ there is usually a single process per container

Slide 8

Slide 8 text

Linux oomkiller and containers
Monitor it
● kube_pod_container_status_last_terminated_reason: if the exit code is 137, the attached reason label will be set to OOMKilled
● Trigger an alert by coupling it with kube_pod_container_status_restarts_total

Synthetic oomkill
docker run --memory-swap=100m --memory=100m \
    --rm -it azul/zulu-openjdk:11 \
    java -Xms100m -XX:+AlwaysPreTouch --version

Slide 9

Slide 9 text

Monitor the resident memory of a process
E.g. with Micrometer you may have these:
● Heap Max: jvm_memory_max_bytes
● Heap Live: jvm_memory_bytes_used
● Process RSS: process_memory_rss_bytes
And system ones, e.g. Kubernetes metrics:
● Container RSS: container_memory_rss
● Memory limit: kube_pod_container_resource_limits_memory_bytes
(Chart: RSS, Heap Max, Heap Liveset, memory limit)
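As a sketch (not from the original slides), the JVM heap gauges above can be registered with Micrometer's JvmMemoryMetrics binder; the underscore metric names come from the Prometheus naming convention, and the process RSS gauge typically comes from a separate binder or from your framework:

import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class JvmMemoryMetricsExample {
    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        // Registers jvm.memory.used / jvm.memory.committed / jvm.memory.max gauges,
        // exported as jvm_memory_*_bytes by a Prometheus-style registry
        new JvmMemoryMetrics().bindTo(registry);
        registry.getMeters().forEach(meter -> System.out.println(meter.getId()));
    }
}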

Slide 10

Slide 10 text

Memory of a JVM process

Slide 11

Slide 11 text

Memory
As JVM-based developers we are:
● Used to thinking about JVM heap sizing, mostly -Xms, -Xmx, …
● Possibly some deployments use container-aware flags: -XX:MinRAMPercentage, -XX:MaxRAMPercentage, …
(Diagram: JVM Heap, Xmx or MaxRAMPercentage)

Slide 12

Slide 12 text

Why should I be concerned by native memory? We still have no idea what is happening off-heap. How can you size the heap or the container properly? (Diagram: JVM Heap)

Slide 13

Slide 13 text

JVM Memory Breakdown
Running a JVM requires memory for:
● The Java heap
● The Metaspace (pre-JDK 8: the Permanent Generation)
● Direct byte buffers
● Code cache (compiled code)
● Garbage collector (like the card table)
● Compiler (C1/C2)
● Threads
● Symbols
● etc.
These are the JVM subsystems.

Slide 14

Slide 14 text

JVM Memory Breakdown
Except for a few flags for the Metaspace, the code cache, or direct memory, there is no control over the memory consumption of the other components. But it is possible to get their size at runtime.

Slide 15

Slide 15 text

By monitoring, e.g. with Micrometer
Time series:
● jvm_memory_used_bytes
● jvm_memory_committed_bytes
● jvm_memory_max_bytes
Dimensions:
● area: heap or nonheap
● id: memory zone, depends on the GC and the JVM (e.g. G1 Eden Space, Compressed Class Space, …)

Slide 16

Slide 16 text

Monitoring is only as good as the data that is there. Observability metrics rely on MBeans to get the memory areas. Most JVMs don't export metrics for everything that uses memory.

Slide 17

Slide 17 text

Time to investigate the footprint With diagnostic tools

Slide 18

Slide 18 text

jcmd, a Swiss Army knife
Get the actual flag values:
$ jcmd $(pidof java) VM.flags | tr ' ' '\n'
6:
...
-XX:InitialHeapSize=4563402752
-XX:InitialRAMPercentage=85.000000
-XX:MarkStackSize=4194304
-XX:MaxHeapSize=4563402752
-XX:MaxNewSize=2736783360
-XX:MaxRAMPercentage=85.000000
-XX:MinHeapDeltaBytes=2097152
-XX:NativeMemoryTracking=summary
...
(6 is the PID; InitialHeapSize corresponds to Xms, MaxHeapSize to Xmx)

Slide 19

Slide 19 text

JVM's Native Memory Tracking
1. Start the JVM with -XX:NativeMemoryTracking=summary
2. Later run: jcmd $(pidof java) VM.native_memory
Modes:
● summary
● detail
● baseline / diff
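For the baseline/diff modes, the usual flow (a minimal sketch, not from the original slides) is to take a baseline first and ask for a diff later:

$ jcmd $(pidof java) VM.native_memory baseline
... let the application run for a while ...
$ jcmd $(pidof java) VM.native_memory summary.diff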

Slide 20

Slide 20 text

$ jcmd $(pidof java) VM.native_memory
6:
Native Memory Tracking:

Total: reserved=7168324KB, committed=5380868KB
- Java Heap (reserved=4456448KB, committed=4456448KB)
            (mmap: reserved=4456448KB, committed=4456448KB)
- Class (reserved=1195628KB, committed=165788KB)
        (classes #28431)
        (instance classes #26792, array classes #1639)
        (malloc=5740KB #87822)
        (mmap: reserved=1189888KB, committed=160048KB)
        (Metadata:)
        (  reserved=141312KB, committed=139876KB)
        (  used=135945KB)
        (  free=3931KB)
        (  waste=0KB =0.00%)
        (Class space:)
        (  reserved=1048576KB, committed=20172KB)
        (  used=17864KB)
        (  free=2308KB)
        (  waste=0KB =0.00%)
- Thread (reserved=696395KB, committed=85455KB)
         (thread #674)
         (stack: reserved=692812KB, committed=81872KB)
         (malloc=2432KB #4046)
         (arena=1150KB #1347)
- Code (reserved=251877KB, committed=105201KB)
       (malloc=4189KB #11718)
       (mmap: reserved=247688KB, committed=101012KB)
- GC (reserved=230739KB, committed=230739KB)
     (malloc=32031KB #63631)
     (mmap: reserved=198708KB, committed=198708KB)
- Compiler (reserved=5914KB, committed=5914KB)
           (malloc=6143KB #3281)
           (arena=180KB #5)
- Internal (reserved=24460KB, committed=24460KB)
           (malloc=24460KB #13140)
- Other (reserved=267034KB, committed=267034KB)

Slide 21

Slide 21 text

(Same VM.native_memory output as the previous slide, highlighting:)
Java Heap (reserved=4456448KB, committed=4456448KB)
(classes #28431)
(thread #674)

Slide 22

Slide 22 text

(Same VM.native_memory output, continued with the remaining sections:)
- Other (reserved=267034KB, committed=267034KB)
        (malloc=267034KB #631)
- Symbol (reserved=28915KB, committed=28915KB)
         (malloc=25423KB #330973)
         (arena=3492KB #1)
- Native Memory Tracking (reserved=8433KB, committed=8433KB)
                         (malloc=117KB #1498)
                         (tracking overhead=8316KB)
- Arena Chunk (reserved=217KB, committed=217KB)
              (malloc=217KB)
- Logging (reserved=7KB, committed=7KB)
          (malloc=7KB #266)
- Arguments (reserved=19KB, committed=19KB)
            (malloc=19KB #521)

Highlights:
Total: reserved=7168324KB, committed=5380868KB
Class (reserved=1195628KB, committed=165788KB)
Thread (reserved=696395KB, committed=85455KB)
Code (reserved=251877KB, committed=105201KB)
GC (reserved=230739KB, committed=230739KB)
Compiler (reserved=5914KB, committed=5914KB)
Internal (reserved=24460KB, committed=24460KB)
Other (reserved=267034KB, committed=267034KB)

Slide 23

Slide 23 text

Direct byte buffers
These are memory segments allocated outside the Java heap. Unused buffers are only freed upon GC. Netty, for example, uses them.
● < JDK 11: they are reported in the Internal section
● ≥ JDK 11: they are reported in the Other section
Internal (reserved=24460KB, committed=24460KB)
Other (reserved=267034KB, committed=267034KB)
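A minimal sketch of what allocates this kind of memory (the 64 MiB size is only illustrative); the total direct memory can be capped with -XX:MaxDirectMemorySize:

import java.nio.ByteBuffer;

public class DirectBufferExample {
    public static void main(String[] args) {
        // Allocated off-heap; the native memory is released only when the buffer
        // object is garbage collected (or freed internally by libraries like Netty)
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        System.out.println("direct buffer capacity = " + buffer.capacity());
    }
}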

Slide 24

Slide 24 text

Garbage Collection
Full-blown memory management for the Java heap (and Metaspace). It requires memory for its internal data structures (like G1 regions, remembered sets, etc.). On small containers this might be a thing to consider.
GC (reserved=230739KB, committed=230739KB)
   (malloc=32031KB #63631)
   (mmap: reserved=198708KB, committed=198708KB)

Slide 25

Slide 25 text

💡 Memory mapped files
They are not reported by Native Memory Tracking at this time, but they are accounted for in the RSS.
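A minimal sketch of such a mapping (the file name is a placeholder); mapped pages become resident, and thus part of RSS, as they are touched, even though NMT does not report them:

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileExample {
    public static void main(String[] args) throws Exception {
        try (FileChannel channel = FileChannel.open(Path.of("data.bin"), StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Touching a page makes it resident, increasing RSS
            System.out.println("first byte = " + mapped.get(0));
        }
    }
}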

Slide 26

Slide 26 text

Native Memory Tracking
👍 Good insights on the JVM sub-systems, but
Does NMT show everything? Nope.
Is NMT data correct? Yes, but not for resident memory usage.
⚠ Careful about the overhead! Measure it if this is important for you!

Slide 27

Slide 27 text

Huh, what are virtual, committed, and reserved memory?
● virtual memory: a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large memory"
● reserved memory: a contiguous chunk of virtual memory that the program requested from the OS
● committed memory: the writable subset of reserved memory, which might be backed by physical storage
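One way to see the difference on a live process, as a sketch: pmap reports the virtual size next to the resident set.

$ pmap -x $(pidof java) | tail -1
(the last line shows the total virtual size, RSS, and dirty pages, in KB)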

Slide 28

Slide 28 text

(Diagram, theory vs reality: Xmx = 4.25 GiB; touched pages of the Java heap = 3 GiB; touched off-heap; RSS; memory limit = 5 GiB; Total: reserved=7168324KB, committed=5380868KB)

Slide 29

Slide 29 text

RSS is the real footprint
Only dirty memory pages count for the oomkiller.
$ ps o pid,rss -p $(pidof java)
  PID   RSS
    6 4701120

Slide 30

Slide 30 text

What does it mean for JVM flags?
-Xms / -Xmx for the Java heap, -XX:MaxPermSize, -XX:MaxMetaspaceSize, -Xss, -XX:MaxDirectMemorySize
⇒ an indication of how much memory is/can be reserved
⚠ These flags do have a big impact on JVM subsystems; bad settings may trigger unwanted behaviors:
- GC if the Metaspace is too small
- Heap resizing if Xms ≠ Xmx
- …
Be careful.

Slide 31

Slide 31 text

Avoid -XX:*RAMPercentage
Sets the Java heap size as a function of the available physical memory. It works.
⚠ Small containers:
-XX:InitialRAMPercentage ⟹ -Xms
mem < 96 MiB: -XX:MinRAMPercentage ⟹ -Xmx
mem > 96 MiB: -XX:MaxRAMPercentage ⟹ -Xmx
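To see what the percentage resolves to in a given container, a sketch (the image and the 1 GiB limit are only illustrative):

docker run -m 1g --rm azul/zulu-openjdk:11 \
    java -XX:MaxRAMPercentage=85 -XX:+PrintFlagsFinal -version | grep -E 'MaxHeapSize|MaxRAMPercentage'
(MaxHeapSize is derived from the 1 GiB container limit, not from the host memory)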

Slide 32

Slide 32 text

Avoid -XX:*RAMPercentage
Different environments ⇒ different load, different subsystem behavior. E.g.:
preprod: 100 req/s 👉 40 Java threads total 👉 mostly liveness endpoints 👉 low versatility in data 👉 low GC activity requirement
prod-us: 1000 req/s 👉 200 Java threads total 👉 mostly business endpoints 👉 variance in data 👉 higher GC requirements

Slide 33

Slide 33 text

Avoid -XX:*RAMPercentage
With MaxRAMPercentage = 85:
preprod (1 GiB container): Java heap ≃ 850 MiB, ~150 MiB left for every other subsystem. Maybe OK for quiet workloads.
prod-us (4 GiB container): Java heap ≃ 3.4 GiB, ~600 MiB left for every other subsystem. Likely not enough for loaded systems ⟹ leads to oomkill.

Slide 34

Slide 34 text

Avoid -XX:*RAMPercentage
Traffic and load are not linear, and do not have linear effects.
● MaxRAMPercentage is a linear function of the container's available RAM
● Too low a MaxRAMPercentage ⟹ waste of space
● Too high a MaxRAMPercentage ⟹ risk of oomkills
● Requires finding the sweet spot for all deployments
● Requires adjusting if the load changes
● Requires converting a percentage back to a raw value

Slide 35

Slide 35 text

Avoid -XX:*RAMPercentage
The -XX:*RAMPercentage flags sort of work, but their drawbacks don't make them quite compelling.
✅ Prefer -Xms / -Xmx

Slide 36

Slide 36 text

How to configure the memory requirement
Off-heap memory consumption under production load is hard to estimate.
● It requires understanding each JVM subsystem and making predictions.
● Fixing some of these values may have side effects, and it needs to be maintained.
We propose an empirical approach instead.

Slide 37

Slide 37 text

How to configure the memory requirement
In our experience it's best to actually retrofit. What does that mean? Give the container a larger memory limit, much higher than the max heap size.
(Diagram: heap, container memory limit at 5 GiB)

Slide 38

Slide 38 text

How to configure the memory requirement
With the larger memory limit in place:
1. Observe the RSS evolution
(Diagram: heap, RSS, container memory limit at 5 GiB)

Slide 39

Slide 39 text

How to configure the memory requirement
1. Observe the RSS evolution
2. If RSS stabilizes after some time…
(Diagram: heap, RSS stabilizing, container memory limit at 5 GiB)

Slide 40

Slide 40 text

How to configure the memory requirement
1. Observe the RSS evolution
2. If RSS stabilizes after some time…
3. Set the new memory limit with enough leeway for RSS increase (e.g. 200 MiB), as in the sketch below
(Diagram: heap, RSS, new memory limit with some leeway, previous container memory limit at 5 GiB)
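A sketch of what step 3 could look like in the pod spec, with hypothetical numbers (assuming RSS stabilized around 3.6 GiB, plus roughly 200 MiB of leeway):

resources:
  requests:
    memory: 3800Mi
  limits:
    memory: 3800Mi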

Slide 41

Slide 41 text

How to configure the memory requirement
If RSS is less than the heap size at start, it's the virtual memory trap: if actual heap usage grows up to Xmx (i.e. the pages get touched), this will lead to an oomkill.
(Diagram: RSS, touched pages of the Java heap = 3 GiB, touched off-heap, theoretical Java heap memory available, actual available memory)

Slide 42

Slide 42 text

How to configure the memory requirement
To avoid the virtual memory pitfall, use -XX:+AlwaysPreTouch so that all Java heap pages are touched (and counted in RSS) from the start.
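A sketch of a typical startup command combining a fixed heap with pre-touching (the heap size and jar name are placeholders):

java -Xms4g -Xmx4g -XX:+AlwaysPreTouch -jar app.jar
(the whole 4 GiB heap is committed and resident right after startup)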

Slide 43

Slide 43 text

Other things to consider
● A growing number of direct byte buffers (e.g. with Netty, also used by other libraries like gRPC)
● The native allocator: glibc's malloc can fragment the native memory, a slow but steady increase that can represent gigabytes on long-running containers
● Memory mapped files: in theory reclaimable memory, but still accounted for in RSS

Slide 44

Slide 44 text

My container gets respawned

Slide 45

Slide 45 text

Container Restarted

Slide 46

Slide 46 text

Quick fix
Increase the liveness probe timeout (either the initial delay or the interval):
livenessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10

Slide 47

Slide 47 text

JIT Compilation

Slide 48

Slide 48 text

Troubleshooting compile time
Use jstat -compiler to see the cumulative compilation time (in seconds):
$ jstat -compiler 1
Compiled Failed Invalid   Time   FailedType FailedMethod
    6002      0       0  101.16           0

Slide 49

Slide 49 text

Troubleshooting using JFR
Start the JVM with java -XX:StartFlightRecording, then:
$ jcmd 1 JFR.dump name=1 filename=petclinic.jfr
$ jfr print --events jdk.CompilerStatistics petclinic.jfr

Slide 50

Slide 50 text

Troubleshooting using JFR

Slide 51

Slide 51 text

Measuring startup time
docker run --cpus= -ti spring-petclinic

CPUs   JVM startup time (s)   Compile time (s)
4      8.402                  17.36
2      8.458                  10.17
1      15.797                 20.22
0.8    20.731                 21.71
0.4    41.55                  46.51
0.2    86.279                 92.93

Slide 52

Slide 52 text

C1 vs C2
                         C1 + C2    C1 only
# compiled methods       6,117      5,084
# C1 compiled methods    5,254      5,084
# C2 compiled methods    863        0
Total time (ms)          21,678     1,234
Total time in C1 (ms)    2,071      1,234
Total time in C2 (ms)    19,607     0

Slide 53

Slide 53 text

Compiler settings
To only use the C1 JIT compiler: -XX:TieredStopAtLevel=1
To adjust the number of compiler threads: -XX:CICompilerCount=
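A sketch of a startup line restricted to C1 (the jar name is a placeholder):

java -XX:TieredStopAtLevel=1 -jar app.jar
(methods are only compiled by C1, trading peak throughput for a much faster warm-up)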

Slide 54

Slide 54 text

Measuring startup time
docker run --cpus= -ti spring-petclinic
(last two columns: with -XX:TieredStopAtLevel=1)

CPUs   Startup (s)   Compile (s)   Startup (s)       Compile (s)
4      8.402         17.36         6.908 (-18%)      1.47
2      8.458         10.17         6.877 (-19%)      1.41
1      15.797        20.22         8.821 (-44%)      1.74
0.8    20.731        21.71         10.857 (-48%)     2.08
0.4    41.55         46.51         22.225 (-47%)     3.67
0.2    86.279        92.93         45.706 (-47%)     6.95

Slide 55

Slide 55 text

GC

Slide 56

Slide 56 text

Troubleshooting GC
Use -Xlog:gc (JDK 9+) / -XX:+PrintGCDetails (JDK 8)

Slide 57

Slide 57 text

Troubleshooting GC with JFR/JMC

Slide 58

Slide 58 text

Setting up GC properly: Metadata Threshold
To avoid Full GCs due to class loading and Metaspace resizing, set the initial Metaspace size high enough to load all your required classes:
-XX:MetaspaceSize=512M

Slide 59

Slide 59 text

Setting up GC properly
Use a fixed heap size: -Xms = -Xmx (-XX:InitialHeapSize = -XX:MaxHeapSize).
Heap resizing is done during Full GCs for SerialGC and ParallelGC. G1 is able to resize without a Full GC (the regions, not the Metaspace).
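Putting the two GC settings together, a sketch of a startup line (sizes and jar name are placeholders to adapt to your application):

java -Xms2g -Xmx2g -XX:MetaspaceSize=512m -jar app.jar
(fixed heap, so no resizing, and a Metaspace threshold large enough to avoid GCs triggered by the Metadata GC Threshold)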

Slide 60

Slide 60 text

GC ergonomics: GC selection
To verify in the GC log (-Xlog:gc):
CPU    Memory    GC
< 2    < 2 GB    Serial
≥ 2    < 2 GB    Serial
< 2    ≥ 2 GB    Serial
≥ 2    ≥ 2 GB    Parallel (JDK 8) / G1 (JDK 9+)

Slide 61

Slide 61 text

GC ergonomics: number of threads selection
-XX:ParallelGCThreads=   used to parallelize work during STW phases
-XX:ConcGCThreads=       used for concurrent work while the application is running

Slide 62

Slide 62 text

CPU resource tuning

Slide 63

Slide 63 text

CPU resources: shares, quotas?

Slide 64

Slide 64 text

CPU shares
Sharing CPU among the containers of a node. Corresponds to requests in Kubernetes. Allows a container to use all the CPUs if needed, sharing with all the other containers.

resources:
  requests:
    cpu: 500m
$ cat /sys/fs/cgroup/cpu.weight
20

resources:
  requests:
    cpu: 250m
$ cat /sys/fs/cgroup/cpu.weight
10

Slide 65

Slide 65 text

CPU quotas
Fixing the limit of CPU used by a container. Corresponds to limits in Kubernetes. In cgroup v2, cpu.max shows the quota and the period in µs (e.g. 50000 100000 means 50 ms of CPU time every 100 ms, i.e. 0.5 CPU).

resources:
  limits:
    cpu: 500m
$ cat /sys/fs/cgroup/cpu.max
50000 100000

resources:
  limits:
    cpu: 250m
$ cat /sys/fs/cgroup/cpu.max
25000 100000

Slide 66

Slide 66 text

Shares / Quotas
Shares and quotas have nothing to do with a hardware socket.
resources:
  limits:
    cpu: 1

Slide 67

Slide 67 text

availableProcessors ergonomics
Setting CPU shares/quotas has a direct impact on the Runtime.availableProcessors() API.
The Runtime.availableProcessors() API is used to:
● size some concurrent structures
● size the ForkJoinPool, used for parallel streams, CompletableFuture, …
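A quick way to check what the JVM sees inside the container, as a sketch:

public class CpuErgonomics {
    public static void main(String[] args) {
        // Reflects the container CPU quota/shares, not the host's core count
        System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
        // The common ForkJoinPool parallelism is derived from it
        System.out.println("commonPool parallelism = " + java.util.concurrent.ForkJoinPool.commonPool().getParallelism());
    }
}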

Slide 68

Slide 68 text

Tuning CPU
Trade off CPU needs for startup time vs request time:
● Adjust CPU shares / CPU quotas
● Adjust the liveness timeout
● Use readiness / startup probes (see the sketch below)
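A sketch of a startup probe with hypothetical values; it gives the JVM time to warm up before the liveness probe starts counting failures:

startupProbe:
  httpGet:
    path: /
    port: 8080
  failureThreshold: 30
  periodSeconds: 10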

Slide 69

Slide 69 text

Conclusion

Slide 70

Slide 70 text

Memory
● JVM memory is not only the Java heap
● The native parts are less known, and difficult to monitor and estimate
● Yet they are important moving parts to account for to avoid OOMKills
● Bonus: revise virtual memory
💡 Careful with the units, KB ≠ KiB

Slide 71

Slide 71 text

Startup
● Containers with < 2 CPUs are a constrained environment for the JVM
● Keep in mind that JVM subsystems like the JIT or the GC need to be adjusted to your requirements
● Being aware of these subsystems helps find the balance between the resources and the requirements of your application

Slide 72

Slide 72 text

References

Slide 73

Slide 73 text

References
Using JDK Flight Recorder and JDK Mission Control
MaxRAMPercentage is not what I wished for
Off-Heap reconnaissance
Startup, Containers and TieredCompilation
HotSpot JVM performance tuning guidelines
Application Dynamic Class Data Sharing in HotSpot JVM
JDK 18 G1/Parallel GC Changes
Unthrottled: fixing CPU limits in the cloud
Best practices: Java single-core containers
Containerize your Java applications
Université à Devoxx 2022