
DevoxxFr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneurs JVM

My JVM containers are in production; oops, they get _oomkilled_, _oops_, startup drags on forever, _oops_, they are permanently slow. We have lived through these situations.

These problems arise because a container is by nature a constrained environment. Its configuration has an impact on the Java process, yet that process also has its own needs in order to run.

There is a gap between the Java heap and the RSS: this is off-heap memory, and it breaks down into several zones. What are they for? How should they be taken into account? The CPU configuration affects the JVM in several ways: what are the interactions between the GC and the CPU? What should you choose between startup speed and CPU consumption?

During this university session we will see how to diagnose, understand, and remedy these problems.

Brice Dutheil

April 20, 2022

Transcript

  1. Agenda My container gets oomkilled How does the memory actually

    work Some cases in hand Container gets respawned Things that slow down startup Break
  2. The containers are restarting. What’s going on ? $ kubectl

    get pods NAME READY STATUS RESTARTS AGE my-pod-5759f56c55-cjv57 3/3 Running 7 3d1h
  3. The containers are restarting. What’s going on ? On Kubernetes,

    one should inspect the suspicious pod $ kubectl describe pod my-pod-5759f56c55-cjv57 ... State: Running Started: Mon, 06 Jun 2020 13:39:40 +0200 Last State: Terminated Reason: OOMKilled Exit Code: 137 Started: Thu, 06 Jun 2020 09:20:21 +0200 Finished: Mon, 06 Jun 2020 13:39:38 +0200
  4. 🔥 🔥 🔥🔥 🔥 🔥 🔥 🚨 Crisis mode 🚨

    If containers are oomkilled Just increase the container memory limits and investigate later
  5. Monitor the oomkills In Kubernetes cluster monitor the terminations with

    metrics • kube_pod_container_status_last_terminated_reason, if the exit code is 137, the attached reason label will be set to OOMKilled • Trigger an alert by coupling with kube_pod_container_status_restarts_total
  6. Monitor the resident memory of a process Depending on the

    telemetry libraries (e.g. Micrometer) you may have these • Heap Max : jvm_memory_max_bytes • Heap Live : jvm_memory_bytes_used • Process RSS : process_memory_rss_bytes And system ones, e.g. Kubernetes metrics • Container RSS : container_memory_rss • Memory limit : kube_pod_container_resource_limits_memory_bytes
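
If the telemetry library does not expose such a gauge, a minimal sketch (Linux only; the class name is made up for illustration) is to read VmRSS straight from /proc/self/status:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch (Linux only): read the process RSS from /proc/self/status
// when the telemetry library does not expose a process RSS gauge.
public class RssProbe {
    public static void main(String[] args) throws Exception {
        long rssBytes = Files.readAllLines(Path.of("/proc/self/status")).stream()
                .filter(line -> line.startsWith("VmRSS:"))
                // Line looks like "VmRSS:   123456 kB" -- the value is in KiB
                .mapToLong(line -> Long.parseLong(line.replaceAll("[^0-9]", "")) * 1024)
                .findFirst()
                .orElse(-1);
        System.out.println("RSS = " + rssBytes + " bytes");
    }
}
```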
  7. 💡 Pay attention to the unit The SI notation, decimal

    based : 1 MB reads as megabyte and means 1000² bytes The IEC notation, binary based : 1 MiB reads as mebibyte and means 1024² bytes ⚠ The JVM uses the binary notation, but uses the legacy units KB, MB, etc. OS command line tools generally use the binary notation. https://en.wikipedia.org/wiki/Binary_prefix#/media/File:Binaryvdecimal.svg At gigabyte scale the difference is almost 7% 1GB ≃ 0.93 GiB
  8. Linux Oomkiller • Out Of Memory Killer Linux mechanism employed

    to kill processes when the memory is critically low • For regular processes the oomkiller selects the “worst” ones (highest oom_score) • Within a restrained container, i.e. with memory limits, ◦ If available memory reaches 0 in this container then the oomkiller terminates the processes in that container ◦ There is usually a single process in a container
  9. Linux oomkiller Oomkills can be reproduced synthetically docker run --memory-swap=100m

    --memory=100m \ --rm -it azul/zulu-openjdk:11 \ java -Xms100m -XX:+AlwaysPreTouch --version
  10. Linux oomkiller And in the system logs $ tail -50

    -f $HOME/Library/Containers/com.docker.docker/Data/log/vm/console.log ... [ 6744.445271] java invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0 ... [ 6744.451951] Memory cgroup out of memory: Killed process 4379 (java) total-vm:3106656kB, anon-rss:100844kB, file-rss:15252kB, shmem-rss:0kB, UID:0 pgtables:432kB oom_score_adj:0 [ 6744.473995] oom_reaper: reaped process 4379 (java), now anon-rss:0kB, file-rss:32kB, shmem-rss:0kB ...
  11. Oomkilled ? Is it a memory leak ? or …

    Is it misconfiguration ?
  12. Memory As JVM based developers • Used to think about

    JVM Heap sizing, mostly -Xms, -Xmx, … • Possibly some deployments use container-aware flags: -XX:MinRAMPercentage, -XX:MaxRAMPercentage, … JVM Heap Xmx or MaxRAMPercentage
  13. Why should I be concerned by native? • But ignoring

    other memory zones 💡 Referred to as native memory, or on the JVM side as off-heap memory
  14. Why should I be concerned by native? JDK9 landed container

    support! But you still need to do the cross-multiplication (rule of three) yourself.
  15. Why should I be concerned by native? Still have no

    idea what is happening off-heap JVM Heap ❓
  16. Why should I be concerned by native? Still have no

    idea what is happening off-heap JVM Heap https://giphy.com/gifs/bitcoin-crypto-blockchain-trN9ht5RlE3Dcwavg2
  17. If you don’t know what’s there, … How can you

    properly size the heap or the container ? JVM Heap
  18. JVM Memory Breakdown Running a JVM requires memory: • The

    Java Heap • The Meta Space (pre-JDK 8 the Permanent Generation) • …
  19. JVM Memory Breakdown Running a JVM requires memory: • The

    Java Heap • The Meta Space (pre-JDK 8 the Permanent Generation) • Direct byte buffers • Code cache (compiled code) • Garbage Collector (like card table) • Compiler (C1/C2) • Symbols • etc.
  20. JVM Memory Breakdown Running a JVM requires memory: • The

    Java Heap • The Meta Space (pre-JDK 8 the Permanent Generation) • Direct byte buffers • Code cache (compiled code) • Garbage Collector (like card table) • Compiler (C1/C2) • Threads • Symbols • etc. JVM subsystems
  21. JVM Memory Breakdown Except for a few flags for metaspace,

    code cache, or direct memory, there’s no control over the memory consumption of the other components. But it is possible to get their size at runtime.
  22. Let’s try first to monitor E.g. with Micrometer time series

    • jvm_memory_used_bytes • jvm_memory_committed_bytes • jvm_memory_max_bytes Dimensions • area : heap or nonheap • id : memory zone, depends on GC and JVM jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 8231168.0 jvm_memory_used_bytes{area="heap",id="G1 Survivor Space",} 5242880.0 jvm_memory_used_bytes{area="heap",id="G1 Old Gen",} 1.164288E7 jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 4.180964E7 jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 1233536.0 jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 1.2582912E7 jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space",} 5207416.0 jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 1590528.0
  23. Let’s try first to monitor Don’t forget the JVM native

    buffers • jvm_buffer_total_capacity_bytes
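
A minimal sketch of wiring this up with Micrometer (the SimpleMeterRegistry is only for illustration, any registry works): the JvmMemoryMetrics binder registers the jvm.memory.* gauges and also the BufferPoolMXBean-based jvm.buffer.* gauges mentioned above.

```java
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Sketch: bind the JVM memory binder to a registry; besides jvm.memory.used/committed/max
// it also registers the buffer-pool gauges (jvm.buffer.total.capacity, jvm.buffer.memory.used, ...).
public class MemoryMetricsSetup {
    public static void main(String[] args) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        new JvmMemoryMetrics().bindTo(registry);
        registry.getMeters().forEach(meter -> System.out.println(meter.getId()));
    }
}
```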
  24. Monitoring is only as good as the data that is there Observability

    metrics rely on MBeans to get memory areas Most JVMs don’t export metrics for everything that uses memory
  25. RSS is the real footprint $ ps o pid,rss -p

    $(pidof java) PID RSS 6 4701120
  26. jcmd – a Swiss Army knife Get the actual flag values

    $ jcmd $(pidof java) VM.flags | tr ' ' '\n' 6: ... -XX:InitialHeapSize=4563402752 -XX:InitialRAMPercentage=85.000000 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=4563402752 -XX:MaxNewSize=2736783360 -XX:MaxRAMPercentage=85.000000 -XX:MinHeapDeltaBytes=2097152 -XX:NativeMemoryTracking=summary ... PID Xms Xmx
  27. JVM’s Native Memory Tracking 1. Start the JVM with -XX:NativeMemoryTracking=summary

    2. Later run jcmd $(pidof java) VM.native_memory Modes • summary • detail • baseline/ diff
  28. $ jcmd $(pidof java) VM.native_memory 6: Native Memory Tracking: Total:

    reserved=7168324KB, committed=5380868KB - Java Heap (reserved=4456448KB, committed=4456448KB) (mmap: reserved=4456448KB, committed=4456448KB) - Class (reserved=1195628KB, committed=165788KB) (classes #28431) ( instance classes #26792, array classes #1639) (malloc=5740KB #87822) (mmap: reserved=1189888KB, committed=160048KB) ( Metadata: ) ( reserved=141312KB, committed=139876KB) ( used=135945KB) ( free=3931KB) ( waste=0KB =0.00%) ( Class space:) ( reserved=1048576KB, committed=20172KB) ( used=17864KB) ( free=2308KB) ( waste=0KB =0.00%) - Thread (reserved=696395KB, committed=85455KB) (thread #674) (stack: reserved=692812KB, committed=81872KB) (malloc=2432KB #4046) (arena=1150KB #1347) - Code (reserved=251877KB, committed=105201KB) (malloc=4189KB #11718) (mmap: reserved=247688KB, committed=101012KB) - GC (reserved=230739KB, committed=230739KB) (malloc=32031KB #63631) (mmap: reserved=198708KB, committed=198708KB) - Compiler (reserved=5914KB, committed=5914KB) (malloc=6143KB #3281) (arena=180KB #5) - Internal (reserved=24460KB, committed=24460KB) (malloc=24460KB #13140) - Other (reserved=267034KB, committed=267034KB)
  29. $ jcmd $(pidof java) VM.native_memory 6: Native Memory Tracking: Total:

    reserved=7168324KB, committed=5380868KB - Java Heap (reserved=4456448KB, committed=4456448KB) (mmap: reserved=4456448KB, committed=4456448KB) - Class (reserved=1195628KB, committed=165788KB) (classes #28431) ( instance classes #26792, array classes #1639) (malloc=5740KB #87822) (mmap: reserved=1189888KB, committed=160048KB) ( Metadata: ) ( reserved=141312KB, committed=139876KB) ( used=135945KB) ( free=3931KB) ( waste=0KB =0.00%) ( Class space:) ( reserved=1048576KB, committed=20172KB) ( used=17864KB) ( free=2308KB) ( waste=0KB =0.00%) - Thread (reserved=696395KB, committed=85455KB) (thread #674) (stack: reserved=692812KB, committed=81872KB) (malloc=2432KB #4046) (arena=1150KB #1347) - Code (reserved=251877KB, committed=105201KB) (malloc=4189KB #11718) (mmap: reserved=247688KB, committed=101012KB) - GC (reserved=230739KB, committed=230739KB) (malloc=32031KB #63631) (mmap: reserved=198708KB, committed=198708KB) - Compiler (reserved=5914KB, committed=5914KB) (malloc=6143KB #3281) (arena=180KB #5) - Internal (reserved=24460KB, committed=24460KB) (malloc=24460KB #13140) - Other (reserved=267034KB, committed=267034KB) (classes #28431) (thread #674) Java Heap (reserved=4456448KB, committed=4456448KB)
  30. $ jcmd $(pidof java) VM.native_memory 6: Native Memory Tracking: Total:

    reserved=7168324KB, committed=5380868KB - Java Heap (reserved=4456448KB, committed=4456448KB) (mmap: reserved=4456448KB, committed=4456448KB) - Class (reserved=1195628KB, committed=165788KB) (classes #28431) ( instance classes #26792, array classes #1639) (malloc=5740KB #87822) (mmap: reserved=1189888KB, committed=160048KB) ( Metadata: ) ( reserved=141312KB, committed=139876KB) ( used=135945KB) ( free=3931KB) ( waste=0KB =0.00%) ( Class space:) ( reserved=1048576KB, committed=20172KB) ( used=17864KB) ( free=2308KB) ( waste=0KB =0.00%) - Thread (reserved=696395KB, committed=85455KB) (thread #674) (stack: reserved=692812KB, committed=81872KB) (malloc=2432KB #4046) (arena=1150KB #1347) - Code (reserved=251877KB, committed=105201KB) (malloc=4189KB #11718) (mmap: reserved=247688KB, committed=101012KB) - GC (reserved=230739KB, committed=230739KB) (malloc=32031KB #63631) (mmap: reserved=198708KB, committed=198708KB) - Compiler (reserved=5914KB, committed=5914KB) (malloc=6143KB #3281) (arena=180KB #5) - Internal (reserved=24460KB, committed=24460KB) (malloc=24460KB #13140) - Other (reserved=267034KB, committed=267034KB) (malloc=267034KB #631) - Symbol (reserved=28915KB, committed=28915KB) (malloc=25423KB #330973) (arena=3492KB #1) - Native Memory Tracking (reserved=8433KB, committed=8433KB) (malloc=117KB #1498) (tracking overhead=8316KB) - Arena Chunk (reserved=217KB, committed=217KB) (malloc=217KB) - Logging (reserved=7KB, committed=7KB) (malloc=7KB #266) - Arguments (reserved=19KB, committed=19KB) (malloc=19KB #521) Total: reserved=7168324KB, committed=5380868KB Class (reserved=1195628KB, committed=165788KB) Thread (reserved=696395KB, committed=85455KB) Code (reserved=251877KB, committed=105201KB) GC (reserved=230739KB, committed=230739KB) Compiler (reserved=5914KB, committed=5914KB) Internal (reserved=24460KB, committed=24460KB) Other (reserved=267034KB, committed=267034KB)
  31. Direct byte buffers Those are the memory segments that are

    allocated outside the Java heap. Unused buffers are only freed upon GC. Netty for example uses them. • < JDK 11, they are reported in the Internal section • ≥ JDK 11, they are reported in the Other section Internal (reserved=24460KB, committed=24460KB) Other (reserved=267034KB, committed=267034KB)
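
A small sketch to see this from inside the JVM (the 64 MiB size is arbitrary): the direct pool reported by the BufferPoolMXBean — the same source as the jvm_buffer_* metrics — grows, while the Java heap usage does not.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Sketch: direct buffers live outside the Java heap; the "direct" buffer pool
// reflects them, the heap usage does not.
public class DirectBufferDemo {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64 MiB off-heap
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s pool: used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
        // The native memory is only released once 'buffer' becomes unreachable
        // and is collected by the GC.
    }
}
```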
  32. Garbage Collection GC is actually more than only taking care

    of the garbage. It’s full-blown memory management for the Java heap, and it requires memory for its internal data structures (e.g. for G1: regions, remembered sets, etc.). On small containers this might be a thing to consider GC (reserved=230739KB, committed=230739KB)
  33. Threads Threads also appear to take some space Thread (reserved=696395KB,

    committed=85455KB) (thread #674) (stack: reserved=692812KB, committed=81872KB) (malloc=2432KB #4046) (arena=1150KB #1347)
  34. Native Memory Tracking Good insights on the JVM sub-systems, but

    Does NMT show everything ? Is NMT data correct ? ⚠ Careful about the overhead! Measure if this is important for you !
  35. Huh what virtual, committed, reserved memory? virtual memory : memory

    management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large memory". reserved memory : Contiguous chunk of memory from the virtual memory that the program requested to the OS. committed memory : writable subset of reserved memory, might be backed by physical storage
  36. Huh, what virtual, committed, reserved memory? used heap : amount

    of memory occupied by live objects and, to a certain extent, objects that are unreachable but not yet collected by the GC committed heap : the size of the writable heap memory where the JVM can write objects. This value sits between the -Xms and -Xmx values heap max size : the limit of the heap (-Xmx) #JVM
  37. Native Memory Tracking Basically what NMT shows is : how

    the JVM subsystems are using the available space
  38. Native Memory Tracking Good insights on the JVM sub-systems, but

    Does NMT show everything ? Is NMT data correct ?
  39. Virtual memory ? Virtual memory implies memory management. It is

    an OS feature • to maximize the utilization of the physical RAM • to reduce the complexity of handling shared access to physical RAM By providing processes an abstraction of the available memory
  40. Virtual memory On Linux memory is split in pages (usually

    4 KiB). Pages that are never used remain virtual, that is, without physical storage. Used pages are called resident memory
  41. Virtual memory The numbers shown in NMT are actually about

    what the JVM asked for. Total: reserved=7168324KB, committed=5380868KB
  42. Native Memory Tracking Good insights on the JVM sub-systems, but

    Does NMT show everything ? Nope Is NMT data correct ? Yes, but not for resident memory usage
  43. What does it mean for JVM flags ? For the

    Java heap, -Xms / -Xmx ⇒ an indication of how much heap memory is reserved
  44. What does it mean for JVM flags ? For the

    Java heap, -Xms / -Xmx ⇒ an indication of how much heap memory is reserved Also -XX:MaxPermSize, -XX:MaxMetaspaceSize, -Xss, -XX:MaxDirectMemorySize ⇒ an indication of how much memory is/can be reserved These flags do have a big impact on JVM subsystems, as they may or may not trigger certain behaviors, like : - GC if the metaspace is too small - Heap resizing if Xms ≠ Xmx - …
  45. 💡Memory mapped files They are not reported by Native Memory

    Tracking, yet they are accounted for in the RSS.
  46. 💡Memory mapped files In Java, using FileChannel.read alone, ⇒ Rely

    on the native OS read method (in unistd.h) ⇒ Use the OS page cache
  47. 💡Memory mapped files In Java, using FileChannel.read alone, ⇒ Rely

    on the native OS read method (in unistd.h) ⇒ Use the OS page cache But using FileChannel.map(MapMode, pos, length) ⇒ Rely on the mmap OS method (in sys/mman.h) ⇒ Load the requested content into the addressable space of the process ❌
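
A short sketch contrasting the two (the file name is only a placeholder): the read() path copies through the page cache into our buffer, while the map() path mmaps the file into the process address space, so touched pages become resident and show up in pmap and RSS.

```java
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch contrasting the two approaches; "large-file.tar.xz" is just a placeholder path.
public class ReadVsMap {
    public static void main(String[] args) throws Exception {
        Path file = Path.of("large-file.tar.xz");

        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // read(): goes through the OS page cache, copies into our (heap or direct) buffer
            ByteBuffer copy = ByteBuffer.allocate(8192);
            channel.read(copy);

            // map(): mmaps the file into the process address space; pages show up
            // in pmap / smaps and count towards RSS once they are touched
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            mapped.get(0); // touching a byte faults the page in -> resident
        }
    }
}
```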
  48. pmap To really dig deep you need to explore the

    memory mapping • Via /proc/{pid}/smaps • Or via pmap [-x|-X] {pid} Address Kbytes RSS Dirty Mode Mapping ... 00007fe51913b000 572180 20280 0 r--s- large-file.tar.xz ...
  49. How to configure memory requirement Different environments ⇒ different load,

    different subsystem behavior E.g. preprod : 100 req/s 👉 40 java threads total 👉 mostly liveness endpoints 👉 low versatility in data 👉 low GC activity requirement prod-us : 1000 req/s 👉 200 java threads total 👉 mostly business endpoints 👉 variance in data 👉 higher GC requirements
  50. How to configure memory requirement Is it possible to extract

    a formula ? Not that straightforward. Some might point to the -XX:*RAMPercentage flags; they set the Java heap size as a function of the available physical memory. It works. ⚠ -XX:InitialRAMPercentage ⟹ -Xms | memory < ~96 MiB: -XX:MinRAMPercentage ⟹ -Xmx | memory ≥ ~96 MiB: -XX:MaxRAMPercentage ⟹ -Xmx
  51. How to configure memory requirement Is it possible to extract

    a formula ? 1 GiB 4 GiB prod-us preprod MaxRAMPercentage = 85 Java heap Java heap
  52. How to configure memory requirement Is it possible to extract

    a formula ? 1 GiB 4 GiB prod-us preprod Java heap MaxRAMPercentage = 85 Java heap
  53. How to configure memory requirement Is it possible to extract

    a formula ? 1 GiB 4 GiB prod-us preprod Java heap ≃ 850 MiB Java heap ≃ 3.40 GiB MaxRAMPercentage = 85
  54. How to configure memory requirement Is it possible to extract

    a formula ? 1 GiB 4 GiB prod-us preprod Java heap ≃ 850 MiB Java heap ≃ 3.4 GiB MaxRAMPercentage = 85 ~150 MiB for all the other subsystems Maybe OK for quiet workloads ~600 MiB for all the other subsystems Likely not enough for loaded systems ⟹ leads to oomkill
  55. How to configure memory requirement Traffic, Load are not linear,

    and do not have linear effects • MaxRAMPercentage is a linear function of the container available RAM • Too low MaxRAMPercentage ⟹ waste of space • Too high MaxRAMPercentage ⟹ risk of oomkills • Requires to find the sweet spot for all deployments • Requires to adjust if load changes • Need to convert back a percentage to raw value
  56. How to configure memory requirement -XX:*RAMPercentage flags sort of work,

    but their drawbacks don’t make them quite compelling. ✅ Prefer -Xms / -Xmx
  57. How to configure memory requirement If Xms and Xmx have

    the same size, heap is fixed, so focus on “native” memory RSS • GC internals • Threads • Direct memory buffers • Mapped file buffers • Metaspace • Code cache • …
  58. How to configure memory requirement It is very hard to

    predict the actual requirement for all of these. Can we add up the values of these zones? Yes, but it’s not really maintainable. Don’t mess with it until you actually need to! E.g. for each JVM subsystem you’d need a deep understanding to predict the actual size. It’s hard, and requires deep knowledge of the JVM. Just don’t !
  59. How to configure memory requirement In our experience it’s best

    to actually retrofit. What does it mean ? Give a larger memory limit to the container, much higher than the max heap size. Heap Container memory limit at 5 GiB
  60. How to configure memory requirement In our experience it’s best

    to actually retrofit. What does it mean ? Give a larger memory limit to the container, much higher than the max heap size. 1. Observe the RSS evolution Heap RSS Container memory limit at 5 GiB
  61. How to configure memory requirement In our experience it’s best

    to actually retrofit. What does it mean ? Give a larger memory limit to the container, much higher than the max heap size. 1. Observe the RSS evolution 2. If RSS stabilizes after some time Heap RSS RSS stabilizing Container memory limit at 5 GiB
  62. How to configure memory requirement In our experience it’s best

    to actually retrofit. What does it mean ? Give a larger memory limit to the container, much higher than the max heap size. 1. Observe the RSS evolution 2. If RSS stabilizes after some time 3. Set the new memory limit with enough leeway (eg 200 MiB) Heap RSS New memory limit With some leeway for RSS increase Container memory limit at 5 GiB
  63. How to configure memory requirement If the graphs show this

    RSS less than Heap size 🧐 Remember virtual memory! If a page has not been used, then it’s virtual RSS Java heap untouched RSS
  64. How to configure memory requirement ⚠ If the Java heap

    is not fully used (as in RSS), the RSS measure to get the max memory utilisation will be wrong To avoid the virtual memory pitfall use -XX:+AlwaysPreTouch All Java heap pages touched RSS
  65. Case in hand : Netty Buffers • Handles pool of

    DirectByteBuffers (simplified) • Allocates large chunks and subdivides to satisfy allocations Problem: the more requests to handle the more it may allocate buffers & consume direct memory (native) If not capped ⟹ OOMKill
  66. Controlling Netty Buffers • JVM options -XX:MaxDirectMemorySize Hard limit on

    direct ByteBuffers total size Throws OutOfMemoryError • Property io.netty.maxDirectMemory to control only Netty buffers? ⇒ No, it’s more complicated
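
A hedged sketch of the hard-limit behavior (the 64m value and class name are just examples): keeping the buffers reachable prevents them from being freed, so with -XX:MaxDirectMemorySize=64m the allocation loop ends in an OutOfMemoryError rather than in an ever-growing RSS and an oomkill.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: run with e.g. `java -XX:MaxDirectMemorySize=64m CapDirectMemory`
// (the 64m value is only an example). Keeping the buffers reachable prevents
// them from being freed, so the cap is hit and an OutOfMemoryError is thrown
// instead of the process RSS growing until the container is oomkilled.
public class CapDirectMemory {
    public static void main(String[] args) {
        List<ByteBuffer> retained = new ArrayList<>();
        try {
            while (true) {
                retained.add(ByteBuffer.allocateDirect(8 * 1024 * 1024)); // 8 MiB each
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Hit the direct memory cap after " + retained.size() + " buffers: " + e);
        }
    }
}
```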
  67. Controlling Netty Buffers Properties controlling Netty Buffer Pool ThreadLocal Caches!

    Depends on the number of threads in EventLoops • io.netty.allocator.cacheTrimInterval • io.netty.allocator.useCacheForAllThreads • io.netty.allocation.cacheTrimIntervalMillis • io.netty.allocator.maxCachedBufferCapacity • io.netty.allocator.numDirectArenas
  68. Controlling Netty Buffers ThreadLocal Caches! Depends on the number of

    threads in EventLoops private static EventLoopGroup getBossGroup(boolean useEpoll) { if (useEpoll) { return new EpollEventLoopGroup(NB_THREADS); } else { return new NioEventLoopGroup(NB_THREADS); } }
  69. Case in hand : native allocator If something doesn’t add

    up : check the native allocator, but why ? To get memory any program must either • Call the OS asking for a memory mapping via mmap function • Call the C standard library malloc function On Linux, standard library = glibc
  70. Case in hand : native allocator The glibc’s malloc is

    managing memory via a technique called arena memory management. Unfortunately there’s no serviceability tooling around glibc arena management (unless you modify the program to call the C API). It may be possible to extrapolate things using a tool like pmap
  71. Case in hand : native allocator Analyzing memory mapping 00007fe164000000

    2736 2736 2736 rw--- [ anon ] 00007fe1642ac000 62800 0 0 ----- [ anon ] Virtual 64 MiB RSS ~ 2.6 MiB
  72. Case in hand : native allocator Analyzing memory mapping 00007fe164000000

    2736 2736 2736 rw--- [ anon ] 00007fe1642ac000 62800 0 0 ----- [ anon ] Virtual 64 MiB x 257 ⟹ RSS ~1.2 GiB RSS ~ 2.6 MiB
  73. Case in hand : native allocator • Glibc reacts to

    CPUs, application threads • On each access there’s a lock • Higher number of threads ⟹ higher contention on the arenas ⟹ leads glibc to create more arenas • There are some tuning options, in particular MALLOC_ARENA_MAX, M_MMAP_THRESHOLD, … ⚠ Requires a significant understanding of how glibc’s malloc works, allocation sizes, etc.
  74. Case in hand : native allocator Better solution, change the

    application’s native allocator • tcmalloc from Google’s gperftools • jemalloc from Facebook • mimalloc from Microsoft LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so
  75. Case in hand : native allocator If using tcmalloc or

    jemalloc, you are one step away from native allocation profiling. Useful to narrow down a native memory leak.
  76. Quick Fix Increase the liveness probe timeout (either initial delay or

    interval) livenessProbe: httpGet: path: / port: 8080 initialDelaySeconds: 60 periodSeconds: 10
  77. Troubleshooting Compile Time Use jstat -compiler <pid> to see cumulated

    compilation time (in s) $ jstat -compiler 1 Compiled Failed Invalid Time FailedType FailedMethod 6002 0 0 101.16 0
  78. Troubleshooting using JFR Start the JVM with java -XX:StartFlightRecording, then

    jcmd 1 JFR.dump name=1 filename=petclinic.jfr followed by jfr print --events jdk.CompilerStatistics petclinic.jfr
  79. Measuring startup time docker run --cpus=<n> -ti spring-petclinic

    CPUs | JVM startup time (s) | Compile time (s)
    4    | 8.402  | 17.36
    2    | 8.458  | 10.17
    1    | 15.797 | 20.22
    0.8  | 20.731 | 21.71
    0.4  | 41.55  | 46.51
    0.2  | 86.279 | 92.93
  80. C1 vs C2

                            | C1 + C2 | C1 only
    # compiled methods      | 6,117   | 5,084
    # C1 compiled methods   | 5,254   | 5,084
    # C2 compiled methods   | 863     | 0
    Total Time (ms)         | 21,678  | 1,234
    Total Time in C1 (ms)   | 2,071   | 1,234
    Total Time in C2 (ms)   | 19,607  | 0
  81. TieredCompilation Heuristics Level transitions: • 0 ➟ 2 ➟ 3

    ➟ 4 (C2 Q too long) • 0 ➟ (3 ➟ 2) ➟ 4 (C1 Q too long, change level in-Q) • 0 ➟ (3 or 2) ➟ 1 (trivial method or can’t be compiled by C2) • 0 ➟ 4 (can’t be compiled by C1) Note: level 3 is 30% slower than level 2 Interpreter C1 + Profiling C2 Comp Level 0 3 4 C1 1 C1 + Limited Profiling 2
  82. Compiler Settings To only use C1 JIT compiler: -XX:TieredStopAtLevel=1 To

    adjust C2 compiler threads: -XX:CICompilerCount=<n>
  83. Measuring startup time docker run --cpus=<n> -ti spring-petclinic

    CPUs | JVM startup time (s) | Compile time (s) | Startup time (s) with -XX:TieredStopAtLevel=1 | Compile time (s)
    4    | 8.402  | 17.36 | 6.908 (-18%)  | 1.47
    2    | 8.458  | 10.17 | 6.877 (-19%)  | 1.41
    1    | 15.797 | 20.22 | 8.821 (-44%)  | 1.74
    0.8  | 20.731 | 21.71 | 10.857 (-48%) | 2.08
    0.4  | 41.55  | 46.51 | 22.225 (-47%) | 3.67
    0.2  | 86.279 | 92.93 | 45.706 (-47%) | 6.95
  84. GC

  85. Setting GC properly: Metadata Threshold To avoid Full GC for

    loading more classes and Metaspace resizes: set the initial Metaspace size high enough to load all your required classes -XX:MetaspaceSize=512M
  86. Setting GC properly Use a fixed heap size : -Xms

    = -Xmx -XX:InitialHeapSize = -XX:MaxHeapSize Heap resize done during Full GC for SerialGC & Parallel GC. G1 is able to resize without FullGC (regions, not the metaspace)
  87. GC ergonomics: GC selection To verify in the GC log (-Xlog:gc):

    CPU | Memory | GC
    < 2 | < 2 GB | Serial
    ≥ 2 | < 2 GB | Serial
    < 2 | ≥ 2 GB | Serial
    ≥ 2 | ≥ 2 GB | Parallel (<JDK9) / G1 (≥JDK9)
    [0.004s][info][gc] Using G1
  88. GC ergonomics: # threads selection -XX:ParallelGCThreads=<n> Used for Parallelizing work

    during STW phases. # physical cores ≤ 8 ⟹ ParallelGCThreads = # cores ; # physical cores > 8 ⟹ ParallelGCThreads = 8 + ⅝ × (# cores − 8)
  89. GC ergonomics: # threads selection -XX:ConcGCThreads=<n> Used for concurrent work

    while the application is running. G1: max((ParallelGCThreads + 2) / 4, 1) ; Shenandoah: ¼ of # cores ; ZGC: ¼ of # cores if dynamic, otherwise ⅛ of # cores
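
The heuristics above can be mirrored in a few lines to see what a given container CPU count yields (this is only a sketch of the formulas as stated here; the real defaults are computed inside HotSpot):

```java
// Sketch mirroring the heuristics above to see what a given CPU count yields.
public class GcThreadHeuristics {
    static int parallelGcThreads(int cores) {
        return cores <= 8 ? cores : (int) (8 + 5.0 / 8.0 * (cores - 8));
    }

    static int g1ConcGcThreads(int parallelGcThreads) {
        return Math.max((parallelGcThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        for (int cores : new int[] {1, 2, 4, 8, 16, 32}) {
            int parallel = parallelGcThreads(cores);
            System.out.printf("cores=%d -> ParallelGCThreads=%d, G1 ConcGCThreads=%d%n",
                    cores, parallel, g1ConcGcThreads(parallel));
        }
    }
}
```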
  90. CPU shares Sharing CPU among the containers of a node Corresponds

    to Requests in Kubernetes Allows using all the CPUs if needed, sharing with all the other containers $ cat /sys/fs/cgroup/cpu.weight 20 $ cat /sys/fs/cgroup/cpu.weight 10 resources: requests: cpu: 500m resources: requests: cpu: 250m
  91. CPU quotas Fixing a limit on the CPU used by a container

    Corresponds to Limits in Kubernetes resources: limits: cpu: 500m resources: limits: cpu: 250m $ cat /sys/fs/cgroup/cpu.max 50000 100000 $ cat /sys/fs/cgroup/cpu.max 25000 100000
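
As a sketch (cgroup v2, executed inside the container), the quota/period pair in cpu.max can be turned into an effective CPU count, e.g. 25000 100000 ⟹ 0.25 CPU:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch (cgroup v2, run inside the container): convert the "quota period" pair
// from cpu.max into an effective number of CPUs, e.g. "25000 100000" -> 0.25.
public class CpuQuotaProbe {
    public static void main(String[] args) throws Exception {
        String[] parts = Files.readString(Path.of("/sys/fs/cgroup/cpu.max")).trim().split("\\s+");
        if ("max".equals(parts[0])) {
            System.out.println("No CPU limit set");
        } else {
            double effectiveCpus = Double.parseDouble(parts[0]) / Double.parseDouble(parts[1]);
            System.out.println("Effective CPUs = " + effectiveCpus);
        }
    }
}
```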
  92. Shares / Quotas CPU is shared among multiple processes. Ill-behaved

    processes could consume all the computing bandwidth. Cgroups help prevent that but require defining boundaries. A 100% A 100% A 100% A 100% C waiting to be scheduled B waiting to be scheduled 🚦
  93. Shares / Quotas The lower bound of a CPU request

    is called shares. A CPU core is divided into 1024 “slices”. A host with 4 CPUs will have 4096 shares.
  94. Shares / Quotas Programs also have the notion of shares.

    The OS distributes these computing slices proportionally. Process asking for 1432 shares (~1.4 CPU) Process asking for 2048 shares (2 CPU) Process asking for 616 shares = 4096 Each is guaranteed to get what it asked for.
  95. Shares / Quotas Programs also have the notion of shares.

    The OS distributes these computing slices proportionally. Process asking for 1432 shares (~1.4 CPU) Process asking for 2048 shares (2 CPU) Process asking for 616 shares = 4096 Each is guaranteed to get what it asked for. 💡 Upper bounds are not enforced; if there’s CPU available a process can burst
  96. Shares / Quotas Programs also have the notion of shares.

    The OS distributes these computing slices proportionally. Process asking for 1432 shares (~1.4 CPU) Process asking for 2048 shares (2 CPU) Process asking for 616 shares = 4096 Each is guaranteed to get what it asked for. 💡 Pod schedulers like Kubernetes use this mechanism to place a pod where enough compute is available 💡 Upper bounds are not enforced; if there’s CPU available a process can burst
  97. Shares / Quotas Different mechanism used to limit a process.

    CPU time is split into periods of 100 ms (by default). A fraction of a CPU is called a millicore, a thousandth of a CPU. Example : 100 ms × ( 500 / 1000 ) = 50 ms resources: limits: cpu: 500m 50 ms per period of 100 ms Period millicores cpu fraction
  98. Shares / Quotas Indeed, the limit applies

    across all accounted cores : 4 CPUs ⟹ 4 x 100 ms = 400 ms of CPU time per period resources: limits: cpu: 2500m 250 ms per period of 100 ms 🧐
  99. Shares / Quotas Shares and quota have nothing to do

    with a hardware socket resources: limits: cpu: 1 limits: cpu: 1
  100. Shares / Quotas Shares and quota have nothing to do

    with a hardware socket resources: limits: cpu: 1 limits: cpu: 1
  101. Shares / Quotas If the process reaches its limit, it

    will get throttled, i.e. it will have to wait for the next period. E.g. a process can consume a 200 ms budget on • 2 cores with 100 ms each • 8 cores with 25 ms each
  102. CPU Throttling When you reach the limit with CPU quotas, throttling

    happens Throttling ⟹ STW pauses Monitor throttling: Cgroup v1: /sys/fs/cgroup/cpu,cpuacct/<container>/cpu.stat Cgroup v2: /sys/fs/cgroup/cpu.stat • nr_periods – number of periods that any thread in the cgroup was runnable • nr_throttled – number of runnable periods in which the application used its entire quota and was throttled • throttled_time – sum total amount of time individual threads within the cgroup were throttled
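
A small sketch to dump those counters from inside the container (cgroup v2 path assumed; under cgroup v1 the file lives under cpu,cpuacct and the last key is throttled_time, in nanoseconds):

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch (run inside the container): dump the throttling counters from cpu.stat.
// Under cgroup v2 the keys are nr_periods, nr_throttled and throttled_usec.
public class ThrottlingProbe {
    public static void main(String[] args) throws Exception {
        Path cpuStat = Path.of("/sys/fs/cgroup/cpu.stat"); // cgroup v2 layout assumed
        for (String line : Files.readAllLines(cpuStat)) {
            if (line.startsWith("nr_") || line.startsWith("throttled_")) {
                System.out.println(line);
            }
        }
    }
}
```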
  103. availableProcessors ergonomics Setting CPU shares/quotas has a direct impact on

    the Runtime.availableProcessors() API
    Shares | Quotas  | Period  | availableProcessors()
    4096   | -1      | 100 000 | 4 (Shares / 1024)
    1024   | 300 000 | 100 000 | 3 (Quotas / Period)
  104. availableProcessors ergonomics Runtime.availableProcessors() API is used to : • size

    some concurrent structures • ForkJoinPool, used for Parallel Streams, CompletableFuture, …
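
A sketch to observe this (run it with different docker run --cpus=<n> values): availableProcessors() follows the cgroup limits, and the common ForkJoinPool derives its default parallelism from it (availableProcessors() - 1, with a minimum of 1).

```java
import java.util.concurrent.ForkJoinPool;

// Sketch: run with different `docker run --cpus=<n>` values to see how the cgroup
// limits flow into availableProcessors() and, from there, into the common ForkJoinPool.
public class CpuErgonomics {
    public static void main(String[] args) {
        System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
        System.out.println("commonPool parallelism = " + ForkJoinPool.commonPool().getParallelism());
    }
}
```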
  105. Tuning CPU Trade-off cpu needs for startup time VS request

    time • Adjust CPU shares / CPU quotas • Adjust liveness timeout • Use readiness / startup probes
  106. Memory • JVM memory is not only Java heap •

    Native parts are less known, and difficult to monitor and estimate • Yet they are important moving parts to account for in order to avoid OOMKills • Bonus: revise virtual memory
  107. Startup • Containers with <2 CPUs are a constrained environment

    for the JVM • Keep in mind that JVM subsystems like the JIT or the GC need to be adjusted to your requirements • Being aware of these subsystems helps find the balance between the resources and the requirements of your application
  108. References Using Jdk Flight Recorder and Jdk Mission Control MaxRAMPercentage

    is not what I wished for Off-Heap reconnaissance Startup, Containers and TieredCompilation Hotspot JVM performance tuning guidelines Application Dynamic Class Data Sharing in HotSpot JVM Jdk18 G1 Parallel GC Changes Unthrottled: fixing CPU limits in the cloud Best Practices Java single-core containers Containerize your Java applications