Memory API: Patterns, Use Cases, and Performance

José
October 08, 2024

The way Java stores and processes large amounts of off-heap data had not changed since Java 4 (2002), when ByteBuffer was introduced. Since then, operating systems have moved from 32 bits to 64 bits, and the RAM available in a regular machine has grown from megabytes to gigabytes and more. A new API was much needed, as a ByteBuffer is indexed with 32-bit integers, which is not enough for modern applications. First published as a preview feature in Java SE 19, the Foreign Function and Memory API became a final feature in Java SE 22. The Memory part brings several new concepts. Among them, Arena and MemorySegment let you manage gigabytes of contiguous off-heap memory, while MemoryLayout gives you an elegant, C-struct-like way to organize your data in memory.
This presentation walks you through this complex API step by step. It explains how your data is organized and aligned in memory, and the impact this has on the API. It also focuses on the delicate use of VarHandle, a critical element for accessing your data. It then shows how to load large files into memory segments, and the performance you can get when processing billions of data elements.

Transcript

  1. Memory API: Patterns, Use Cases, and Performance

    From the Panama Foreign Function and Memory API
    José Paumard, Java Developer Advocate, Java Platform Group
    Rémi Forax, Maître de conférences, Université Gustave Eiffel
  2. https://twitter.com/Nope! https://github.com/forax https://speakerdeck.com/forax

    OpenJDK, ASM, Tatoo, Pro, etc…
    One of the fathers of:
    - invokedynamic (Java 7)
    - Lambda (Java 8), Module (Java 9)
    - Constant dynamic (Java 11)
    - Record, text blocks, sealed types (Java 14 / 15)
    - Valhalla (Java 25+)
  3. Tune in!

    Inside Java Newscast, JEP Café, Road To 21 series, Inside.java, Inside Java Podcast, Sip of Java, Cracking the Java coding interview
    10/7/2024 Copyright © 2023, Oracle and/or its affiliates
  4. Final Feature in the JDK 22

    https://openjdk.org/projects/panama/
  5. What is Panama About?

    - Heal the rift between Java and C
    - Fixing issues in the Java NIO API
    - Namely, fix and update what you can do with ByteBuffer
    - ByteBuffer was released in Java 4, in 2002
  6. What’s New in Java 4? (2002)

    - New assert keyword
    - Exception chaining
    - XML Parser
    - Java NIO! (New Input / Output, JSR 51)
  7. What’s New in Java 4?

    - New assert keyword
    - Exception chaining
    - XML Parser
    - Java NIO
  8. Dynamic Web Site in 2002

    - IE 5.5 SP2? IE 6?
    - ActiveXObject("Microsoft.XMLHTTP")
    - Tomcat 3, maybe 4?
  9. The World Before Java 4

    [Diagram labels: heap memory, native memory, write(), read(), array[i], array[i], Servlet doPost() doGet() ..., COPY!]
  10. The World with NIO

    [Diagram labels: heap memory, native memory, write(), read(), buf.put(), buf.get(), ByteBuffer (position, limit, capacity, ...), Servlet doPost() doGet() ...]
  11. Creating a ByteBuffer

    Off-heap allocation:
    var buffer = ByteBuffer.allocateDirect(1_024); // int
    File mapping (map() is an instance method of an open FileChannel):
    var buffer = fileChannel.map(READ_WRITE, position, size); // longs
  12. Issues with the ByteBuffer API

    - Too high level for a Memory Access API: position, capacity, reset are not needed
    - 32 bits indexing only
    - Allows for unaligned access, but it may be very slow
    - Non-deterministic deallocation! Closing a mapped file does not close the ByteBuffer
  13. How is Deallocation Working?

    The GC
    - selects a region containing the ByteBuffer
    - then sees that the ByteBuffer is dead
    - then a Cleaner code (weakref) is pushed to a Cleaner queue
    Later, a cleaner thread dequeues the Cleaner code and calls free on the off-heap memory (or not…)
  14. Welcome to Panama

    - Panama brings a new API
    - At a lower level than the ByteBuffer API
    - The goal is to fix these issues
    - Called the MemorySegment API
  15. MemorySegment

    A MemorySegment:
    - is safe (cannot be used once freed)
    - gives you control over the allocation / deallocation
    - brings close to C performance
    - offers direct access, indexed 64 bits access, structured access
    - offers opt-in unsafe access (for C interop, may crash later)
    - retrofits ByteBuffer on top
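
The safety claims above can be checked with a small experiment. This is a minimal sketch, assuming JDK 22 or later; the class and method names (SegmentSafety, rejectsUseAfterClose, rejectsOutOfBounds) are illustrative and do not come from the slides.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class SegmentSafety {

    // Returns true if reading a segment after its arena was closed is rejected.
    static boolean rejectsUseAfterClose() {
        MemorySegment segment;
        try (Arena arena = Arena.ofConfined()) {
            segment = arena.allocate(16);
            segment.set(ValueLayout.JAVA_INT, 0L, 42);
        } // closing the arena frees the off-heap memory
        try {
            segment.get(ValueLayout.JAVA_INT, 0L);
            return false;
        } catch (IllegalStateException e) {
            return true; // the segment is no longer alive
        }
    }

    // Returns true if reading past the end of a segment is rejected.
    static boolean rejectsOutOfBounds() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16);
            segment.get(ValueLayout.JAVA_INT, 16L); // bytes 16..19 are outside the segment
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("use after close rejected: " + rejectsUseAfterClose());
        System.out.println("out of bounds rejected: " + rejectsOutOfBounds());
    }
}
```

Both failures surface as ordinary Java exceptions rather than as a crash of the JVM, which is the difference with sun.misc.Unsafe.
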
  16. What about sun.misc.Unsafe?

    It is unsafe!
    - Close to C performance
    - No use after free protection (security)
    - Can peek/poke everywhere (may crash later)
    - No null check for on heap array access (may crash)
    Memory access methods are
    - deprecated for removal (JEP 471)
    - warnings since 2006
  17. Alignment

    Most CPUs require your data to be aligned in memory
    These are properly aligned
    [Diagram: addresses 0x00FFA000, 0x00FFA004, 0x00FFA008, 0x00FFA00C, 0x00FFA010 holding byte, short, int, short, byte]
  18. Alignment

    Most CPUs require your data to be aligned in memory
    These are misaligned
    [Diagram: addresses 0x00FFA000, 0x00FFA004, 0x00FFA008, 0x00FFA00C, 0x00FFA010 holding short, int]
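
The effect of alignment on the API can be sketched as follows. This assumes JDK 22 or later; AlignmentDemo and its methods are hypothetical names. The segment is explicitly allocated with an 8-byte alignment so that offset 4 is aligned for an int and offset 1 is not.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class AlignmentDemo {

    // An int written at a 4-byte aligned offset can be read back with JAVA_INT.
    static boolean alignedAccessWorks() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 8); // 16 bytes, 8-byte aligned
            segment.set(ValueLayout.JAVA_INT, 4L, 42);
            return segment.get(ValueLayout.JAVA_INT, 4L) == 42;
        }
    }

    // JAVA_INT carries a 4-byte alignment constraint: offset 1 violates it.
    static boolean misalignedAccessIsRejected() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 8);
            segment.get(ValueLayout.JAVA_INT, 1L);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // The *_UNALIGNED layouts drop the constraint, possibly at a performance cost.
    static int unalignedAccess() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 8);
            segment.set(ValueLayout.JAVA_INT_UNALIGNED, 1L, 42);
            return segment.get(ValueLayout.JAVA_INT_UNALIGNED, 1L);
        }
    }

    public static void main(String[] args) {
        System.out.println(alignedAccessWorks());
        System.out.println(misalignedAccessIsRejected());
        System.out.println(unalignedAccess());
    }
}
```
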
  19. The MemorySegment API

    Off-heap allocation, direct access
    var arena = Arena.global();
    var segment = arena.allocate(1_024); // long, off-heap
    segment.set(ValueLayout.JAVA_INT, 4L, 42);
    var value = segment.get(ValueLayout.JAVA_INT, 4L);
  20. The MemorySegment API

    Heap allocation, indexed access
    var ints = new int[] {1, 2, 3, 4};
    var segment = MemorySegment.ofArray(ints); // on-heap
    segment.setAtIndex(ValueLayout.JAVA_INT, 2L, 65);
    var cell = segment.getAtIndex(ValueLayout.JAVA_INT, 2L);
  21. The MemorySegment API: String

    Setting strings adds a trailing \0
    var segment = Arena.global().allocate(512L);
    println("segment " + segment);
    segment.setString(0, "hello", UTF_8); // adds a '\0'
    var text = segment.getString(0, UTF_8);
    println(text);
  22. The MemorySegment API: File Mapping

    Copy from on-heap to off-heap
    var ints = new int[] {1, 2, 3, 4};
    var arraySegment = MemorySegment.ofArray(ints); // on-heap
    var offHeapSegment = Arena.global().allocate(64L); // off-heap
    offHeapSegment.copyFrom(arraySegment);
  23. The MemorySegment API: File Mapping

    Writing data to a mapped file: you need a ByteBuffer
    var ints = new int[] {1, 2, 3, 4};
    var arraySegment = MemorySegment.ofArray(ints); // on-heap
    var offHeapSegment = Arena.global().allocate(64L); // off-heap
    offHeapSegment.copyFrom(arraySegment);
    var byteBuffer = offHeapSegment.asByteBuffer(); // this is a view!
    byteBuffer.limit(ints.length * (int) ValueLayout.JAVA_INT.byteSize());
    try (var file = FileChannel.open(path, CREATE, WRITE)) {
      file.write(byteBuffer);
    }
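
For completeness, a file can also be mapped directly as a MemorySegment with the FileChannel.map overload that takes an Arena. The following is a minimal sketch, assuming JDK 22 or later; MappedInts, writeThenSum and demo are hypothetical names, and the temporary file is only there to make the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
import static java.nio.channels.FileChannel.MapMode.READ_WRITE;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.WRITE;

class MappedInts {

    // Writes the ints through a read-write mapping, then maps the file again and sums them.
    static int writeThenSum(Path path, int[] ints) throws IOException {
        long byteSize = ints.length * ValueLayout.JAVA_INT.byteSize();
        try (var arena = Arena.ofConfined();
             var channel = FileChannel.open(path, READ, WRITE)) {
            MemorySegment mapped = channel.map(READ_WRITE, 0L, byteSize, arena);
            mapped.copyFrom(MemorySegment.ofArray(ints)); // on-heap to mapped memory
            mapped.force();                               // flush the mapping to the file
        } // closing the arena unmaps the file
        try (var arena = Arena.ofConfined();
             var channel = FileChannel.open(path, READ)) {
            MemorySegment mapped = channel.map(READ_ONLY, 0L, channel.size(), arena);
            int sum = 0;
            for (long i = 0; i < ints.length; i++) {
                sum += mapped.getAtIndex(ValueLayout.JAVA_INT, i);
            }
            return sum;
        }
    }

    // Runs the round trip on a temporary file.
    static int demo() {
        try {
            Path path = Files.createTempFile("ints", ".bin");
            try {
                return writeThenSum(path, new int[] {1, 2, 3, 4});
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Unlike a mapped ByteBuffer, the mapping here is released deterministically, when the arena is closed.
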
  24. Introducing Arena

    - An Arena can create off-heap memory segments
    - It initializes memory segments with zeroes
    - It is AutoCloseable (more on this in a minute)
    - It deallocates the memory segments it created on close()
  25. What is this Arena object?

    There are four of them:
    var global = Arena.global(); // singleton
    var confined = Arena.ofConfined();
    var shared = Arena.ofShared();
    var auto = Arena.ofAuto(); // supports legacy ByteBuffer semantics
  26. What is this Arena object?

    There are four of them:

                Closeable   Bounded lifetime   Shared among threads
    Confined    Yes         Yes                No
    Shared      Yes         Yes                Yes
    Global      No          No                 Yes
    Auto        No          Yes                Yes
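
The "Shared among threads" column can be observed directly. This is a minimal sketch, assuming JDK 22 or later; ArenaConfinement and its methods are illustrative names, and CompletableFuture is used only as a convenient way to run the read on another thread.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.CompletableFuture;

class ArenaConfinement {

    // Tries to read the first int of the segment from a different thread.
    static boolean readableFromAnotherThread(MemorySegment segment) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                segment.get(ValueLayout.JAVA_INT, 0L);
                return true;
            } catch (WrongThreadException e) {
                return false; // the segment is confined to the thread that created the arena
            }
        }).join();
    }

    static boolean confinedIsReadableElsewhere() {
        try (Arena arena = Arena.ofConfined()) {
            return readableFromAnotherThread(arena.allocate(16));
        }
    }

    static boolean sharedIsReadableElsewhere() {
        try (Arena arena = Arena.ofShared()) {
            return readableFromAnotherThread(arena.allocate(16));
        }
    }

    public static void main(String[] args) {
        System.out.println("confined: " + confinedIsReadableElsewhere());
        System.out.println("shared: " + sharedIsReadableElsewhere());
    }
}
```
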
  27. Benchmarks

    int[] vs memorySegment.get(JAVA_INT, ...)

    Array       0.728 ± 0.009 ns/op
    OfArray     1.358 ± 0.003 ns/op
    Confined    1.255 ± 0.002 ns/op
    Auto        1.254 ± 0.002 ns/op
    Shared      1.254 ± 0.013 ns/op
    Global      1.258 ± 0.026 ns/op
    Unsafe      0.627 ± 0.001 ns/op
  28. Benchmarks

    Looping and summing 512 ints

    Array       128.338 ± 0.084 ns/op
    OfArray     131.927 ± 0.761 ns/op
    Confined    131.829 ± 0.077 ns/op
    Auto        131.832 ± 0.491 ns/op
    Shared      131.760 ± 0.068 ns/op
    Global      131.727 ± 0.137 ns/op
    Unsafe      128.083 ± 0.131 ns/op
  29. Performance

    Access time is independent of the type of arena
    For random direct access
    - Overhead is significant (2x)
    - 3 checks
      1. Is it the right thread?
      2. Has the Arena been closed?
      3. Is access in bounds?
  30. Performance

    Access time is independent of the type of arena
    For loop + indexed access
    - Fixed cost at the beginning of the loop
    - 3 checks are hoisted out of the loop
      1. Is it the right thread? Is done once
      2. Has the Arena been closed? Is done once
      3. Is access in bounds? Is elided
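
The loop shape these numbers refer to might look like the following. It is a sketch of the access pattern only, not the authors' benchmark code; it assumes JDK 22 or later, and SumLoop, sum and demo are hypothetical names.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class SumLoop {

    // Sums every int of the segment with indexed access in a counted loop.
    static int sum(MemorySegment segment) {
        long count = segment.byteSize() / ValueLayout.JAVA_INT.byteSize();
        int sum = 0;
        for (long i = 0; i < count; i++) {
            sum += segment.getAtIndex(ValueLayout.JAVA_INT, i);
        }
        return sum;
    }

    // Fills 512 ints with 0, 1, ..., 511 and sums them.
    static int demo() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(
                512 * ValueLayout.JAVA_INT.byteSize(),
                ValueLayout.JAVA_INT.byteAlignment());
            for (long i = 0; i < 512; i++) {
                segment.setAtIndex(ValueLayout.JAVA_INT, i, (int) i);
            }
            return sum(segment);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```
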
  31. After the Break

    - Memory fragmentation
    - Application integrity
    - JExtract
    - Memory Layout
    - Structured memory access with offsets and VarHandle
    - Using Records to create a simple access API
  32. Welcome Back!

    About Arenas:
    - Different arena types with different semantics
    - Same access time
    What about allocation / deallocation?
  33. Benchmarks

    Allocation/Deallocation of an Arena + MemorySegment

    Array               2.522 ± 0.015 ns/op
    OfArray             6.494 ± 0.093 ns/op
    Confined            82.287 ± 1.530 ns/op
    Unsafe (malloc)     22.834 ± 0.097 ns/op
    Unsafe with init    72.338 ± 0.390 ns/op
  34. How Do Arenas Allocate Segments?

    With two arenas
    var arena1 = Arena.ofConfined();
    var s11 = arena1.allocate(...);
    [Diagram labels: heap memory, native memory, Arena, JVM, s11]
  35. How Do Arenas Allocate Segments?

    With two arenas
    var arena1 = Arena.ofConfined();
    var s11 = arena1.allocate(...);
    var s12 = arena1.allocate(...);
    [Diagram labels: heap memory, native memory, Arena, JVM, s11, s12]
  36. How Do Arenas Allocate Segments?

    With two arenas
    var arena1 = Arena.ofConfined();
    var arena2 = Arena.ofConfined();
    var s21 = arena2.allocate(...);
    var s11 = arena1.allocate(...);
    var s22 = arena2.allocate(...);
    var s12 = arena1.allocate(...);
    var s23 = arena2.allocate(...);
    var s24 = arena2.allocate(...);
    [Diagram labels: memory, s22, s21, s23, s22, s24, s21]
  37. How Do Arenas Allocate Segments?

    Then arena1 is closed
    And all its memory segments are deallocated
    arena1.close();
    [Diagram labels: memory, s22, s23, s24, s21]
  38. How Do Arenas Allocate Segments?

    Then arena3 is created
    var arena3 = Arena.ofConfined();
    var s31 = arena3.allocate(...);
    [Diagram labels: memory, s22, s23, s24, s31, s21]
  39. How Do Arenas Allocate Segments?

    Then arena3 is created
    var arena3 = Arena.ofConfined();
    var s31 = arena3.allocate(...);
    var s32 = arena3.allocate(...);
    [Diagram labels: memory, s32, s22, s23, s24, s31, s21, X, X]
  40. How Do Arenas Allocate Segments?

    Then arena3 is created
    And you end up with fragmentation!
    var arena3 = Arena.ofConfined();
    var s31 = arena3.allocate(...);
    var s32 = arena3.allocate(...);
    [Diagram labels: memory, s31, s22, s23, s24, s31, s21]
  41. Fragmentation

    You end up with holes in your memory
    It leads to two problems:
    - Allocation is slow, you need to find a large enough space for your memory segment
    - Not enough contiguous free memory may prevent the creation of a large memory segment
  42. Benchmarks

    Allocation/Deallocation of an Arena + MemorySegment

    Array               2.522 ± 0.015 ns/op
    OfArray             6.494 ± 0.093 ns/op
    Confined            82.287 ± 1.530 ns/op
    Auto                434.694 ± 247.868 ns/op
    Shared              6696.144 ± 37.833 ns/op
    Unsafe (malloc)     22.834 ± 0.097 ns/op
    Unsafe with init    72.338 ± 0.390 ns/op
  43. Deallocation / close()

    ofAuto(): deallocation has the same semantics as ByteBuffer (it is driven by the GC, the arena itself cannot be closed)
    Slooooow!
    In the worst case scenario it calls System.gc() (even slooooooower!)
  44. Deallocation / close()

    ofShared(): you want to avoid having a volatile access in get() (to know if the arena has been closed)
    close() asks all threads to go to a (GC) safepoint, then
    - checks whether the method on top of the stack is annotated as performing an access
    - checks whether the locals contain the closing arena
    ⇒ Linear with the number of platform threads
  45. Unsafe MemorySegment

    Memory segments are safe by default
    - even creating a memory segment from a long is safe (byteSize is 0)
    MemorySegment.reinterpret(newSize)
    - opt-in to unsafe
    - requires --enable-native-access on the command line
    - emits a warning in Java 23, will be an error in the future
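
The zero-length behavior can be sketched as below, assuming JDK 22 or later; ZeroLengthSegment and its methods are hypothetical names. The last method calls the restricted reinterpret method, so depending on the JDK version and on the --enable-native-access flag it is expected to print a warning, or to be refused.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class ZeroLengthSegment {

    // A segment created from a raw address has a size of 0 bytes.
    static long sizeOfSegmentFromAddress() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 8);
            MemorySegment raw = MemorySegment.ofAddress(segment.address());
            return raw.byteSize();
        }
    }

    // Any access to a zero-length segment is out of bounds, which keeps it safe.
    static boolean zeroLengthAccessIsRejected() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 8);
            MemorySegment raw = MemorySegment.ofAddress(segment.address());
            raw.get(ValueLayout.JAVA_INT, 0L);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Opting in to unsafe: reinterpret() trusts the size you give it.
    static int readAfterReinterpret() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 8);
            segment.set(ValueLayout.JAVA_INT, 0L, 42);
            MemorySegment raw = MemorySegment.ofAddress(segment.address());
            MemorySegment resized = raw.reinterpret(16); // restricted method
            return resized.get(ValueLayout.JAVA_INT, 0L);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeOfSegmentFromAddress());
        System.out.println(zeroLengthAccessIsRejected());
        System.out.println(readAfterReinterpret());
    }
}
```
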
  46. JEP 261: Module System

    JEP 260: Encapsulate Most Internal APIs
    JEP 396: Strongly Encapsulate JDK Internals by Default
    JEP 403: Strongly Encapsulate JDK Internals
    JEP 451: Prepare to Disallow the Dynamic Loading of Agents
    JEP 471: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal
    JEP 472: Prepare to Restrict the Use of JNI
    Draft JEP: Integrity by Default
  47. What is JExtract?

    Simplify C interoperability
    - JExtract takes a .h file and creates Java classes from it
    - It creates one class for the .h with the function definitions
    - Then one per struct
  48. What is JExtract?

    - Uses LLVM internally
    - To correctly parse C declarations
    - And to extract platform/OS definitions (e.g. what is the size of an int)
    - It is an external tool that needs to be downloaded separately
    - JExtract 22 works fine with the JDK 23+
  49. What is MemoryLayout?

    It is an interface that describes a piece of memory
    [Diagram: MemoryLayout hierarchy with SequenceLayout, GroupLayout (StructLayout, UnionLayout), PaddingLayout, ValueLayout]
  50. A MemoryLayout can be Named

    Defining a struct Point
    var pointLayout = MemoryLayout.structLayout(
      ValueLayout.JAVA_INT,
      ValueLayout.JAVA_INT
    );
  51. MemoryLayout Size and Offset

    Size of the struct Point, offset of x and y in Point
    var pointLayout = MemoryLayout.structLayout(
      ValueLayout.JAVA_INT,
      ValueLayout.JAVA_INT
    );
    long pointLayoutSize = pointLayout.byteSize();
    long xOffset = pointLayout.byteOffset(PathElement.groupElement(0)); // by index
  52. MemoryLayout Size and Offset

    Size of the struct Point, offset of x and y in Point
    var pointLayout = MemoryLayout.structLayout(
      ValueLayout.JAVA_INT.withName("x"),
      ValueLayout.JAVA_INT.withName("y")
    ).withName("point");
    long pointLayoutSize = pointLayout.byteSize();
    long xOffset = pointLayout.byteOffset(PathElement.groupElement(0)); // by index
    long yOffset = pointLayout.byteOffset(PathElement.groupElement("y")); // by name
  53. Alignment and Padding

    Sometimes memory layouts need padding
    struct { char kind; int payload; char extra; }
    static final MemoryLayout LAYOUT = MemoryLayout.structLayout(
      ValueLayout.JAVA_BYTE.withName("kind"),
      MemoryLayout.paddingLayout(3),
      ValueLayout.JAVA_INT.withName("payload"),
      ValueLayout.JAVA_BYTE.withName("extra"),
      MemoryLayout.paddingLayout(3)
    );
    [Diagram: bit offsets 0, 16, 32, 48, 64, 80, 96 showing kind, padding, payload, extra, padding]
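
The sizes and offsets of this padded layout can be queried as follows. This is a minimal sketch, assuming JDK 22 or later (where paddingLayout takes a size in bytes); PaddingDemo and its methods are illustrative names.

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

class PaddingDemo {

    // Same shape as the C struct { char kind; int payload; char extra; }.
    static final StructLayout LAYOUT = MemoryLayout.structLayout(
        ValueLayout.JAVA_BYTE.withName("kind"),
        MemoryLayout.paddingLayout(3),
        ValueLayout.JAVA_INT.withName("payload"),
        ValueLayout.JAVA_BYTE.withName("extra"),
        MemoryLayout.paddingLayout(3)
    );

    static long byteOffsetOf(String field) {
        return LAYOUT.byteOffset(PathElement.groupElement(field));
    }

    // Without padding, the int would sit at offset 1: the struct layout is rejected.
    static boolean rejectsMissingPadding() {
        try {
            MemoryLayout.structLayout(ValueLayout.JAVA_BYTE, ValueLayout.JAVA_INT);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("size = " + LAYOUT.byteSize());
        System.out.println("payload at " + byteOffsetOf("payload"));
        System.out.println("extra at " + byteOffsetOf("extra"));
        System.out.println("missing padding rejected: " + rejectsMissingPadding());
    }
}
```
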
  54. What is a VarHandle?

    An object that gives access to fields with different semantics:
    - a get / set access (plain, opaque, volatile)
    - and concurrent access: compareAndSet, getAndAdd, …
    It hides the offset and size computations to access the elements of your memory layout
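
As an illustration, VarHandles can be derived from a layout path. The sketch below assumes JDK 22 or later, where such a handle takes the segment, a base offset and one index per open sequence element; PointSum, POINT, POINTS, X and Y are hypothetical names.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SequenceLayout;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

class PointSum {

    static final StructLayout POINT = MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y")
    ).withName("point");

    static final SequenceLayout POINTS = MemoryLayout.sequenceLayout(512, POINT);

    // Coordinates of both handles: (MemorySegment, long baseOffset, long index)
    static final VarHandle X = POINTS.varHandle(
        PathElement.sequenceElement(), PathElement.groupElement("x"));
    static final VarHandle Y = POINTS.varHandle(
        PathElement.sequenceElement(), PathElement.groupElement("y"));

    // Stores point i as (i, 1), then sums x + y over the 512 points.
    static long sum() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment points = arena.allocate(POINTS);
            for (long i = 0; i < 512; i++) {
                X.set(points, 0L, i, (int) i);
                Y.set(points, 0L, i, 1);
            }
            long sum = 0;
            for (long i = 0; i < 512; i++) {
                sum += (int) X.get(points, 0L, i) + (int) Y.get(points, 0L, i);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```

The handles are kept in static final fields, which is the usual way to let the JIT treat them as constants.
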
  55. VarHandle Caveats

    1) The compiler doesn’t do any type checking: it trusts you! (and the different IDEs are not there to help you…)
    2) It allows conversion at runtime
    It can be convenient, but can lead to autoboxing
    You can use withInvokeExactBehavior()
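
Both caveats can be reproduced with a few lines. This is a sketch assuming JDK 22 or later, where ValueLayout.JAVA_INT.varHandle() takes a segment and a long offset; ExactVarHandle and its methods are illustrative names.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

class ExactVarHandle {

    // Coordinates: (MemorySegment, long offset), value type: int
    static final VarHandle INT = ValueLayout.JAVA_INT.varHandle();

    // Default behavior: the call site is adapted at run time (widening, boxing).
    static int lenientRead() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 4);
            INT.set(segment, 0L, 42);
            Object boxed = INT.get(segment, 0); // int offset widened, result boxed
            return (Integer) boxed;
        }
    }

    // Exact behavior: the same sloppy call site now fails fast.
    static boolean exactRejectsConversion() {
        VarHandle exact = INT.withInvokeExactBehavior();
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 4);
            exact.set(segment, 0L, 42);              // exact types: accepted
            int value = (int) exact.get(segment, 0L); // exact types: accepted
            Object boxed = exact.get(segment, 0L);    // Object instead of int: rejected
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lenientRead());
        System.out.println(exactRejectsConversion());
    }
}
```
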
  56. Benchmarks

    Compute the sum of 512 point.x + point.y

    OfArray with offset       214.479 ± 1.712 ns/op
    OfArray with VarHandle    141.404 ± 0.081 ns/op
    Unsafe with offset        137.518 ± 0.476 ns/op
    Arena with offset         212.881 ± 3.436 ns/op
    Arena with VarHandle      141.190 ± 0.107 ns/op
  57. Using Offsets or VarHandle?

    - Offset computation by the user is slow
    - VarHandle offers an access pattern to the JVM, which gives you better performance
    - JExtract does not create VarHandles, only offsets
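
For comparison, the offset-based variant of the same computation might look like this. It is a sketch of the access pattern only, not the benchmarked code; it assumes JDK 22 or later, and OffsetSum and its fields are hypothetical names.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

class OffsetSum {

    static final StructLayout POINT = MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y")
    );

    static final long X_OFFSET = POINT.byteOffset(PathElement.groupElement("x"));
    static final long Y_OFFSET = POINT.byteOffset(PathElement.groupElement("y"));

    // Stores point i as (i, 1), then sums x + y, computing every offset by hand.
    static long sum() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment points =
                arena.allocate(512 * POINT.byteSize(), POINT.byteAlignment());
            for (long i = 0; i < 512; i++) {
                long base = i * POINT.byteSize();
                points.set(ValueLayout.JAVA_INT, base + X_OFFSET, (int) i);
                points.set(ValueLayout.JAVA_INT, base + Y_OFFSET, 1);
            }
            long sum = 0;
            for (long i = 0; i < 512; i++) {
                long base = i * POINT.byteSize();
                sum += points.get(ValueLayout.JAVA_INT, base + X_OFFSET)
                     + points.get(ValueLayout.JAVA_INT, base + Y_OFFSET);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```
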
  58. Record Mapping API

    - Automatically maps MemoryLayout to Records, the API is simpler
    - If the JVM can see the record creation and its last use, then the escape analysis can remove the allocation
    - Which will be guaranteed when Valhalla is there!
  59. Access API

    - It allows users to use strings for struct fields: simpler API
    - The string must be a constant
    - If not, the JIT will generate a cascade of if-else
    - Performance cliff, hard to diagnose