

Michael Storozhilov
February 29, 2020


Working with memory outside the Java heap: do ByteBuffers have a future?

Today, ByteBuffers, part of the NIO (New I/O) API, are the only supported
way to access memory outside the Java heap (off-heap). Despite its
popularity, the API suffers from significant limitations that prevent
optimal performance in some usage scenarios.

The talk covers the Memory Access API, which is due to appear in JDK 14 as
an incubator module (JEP 370). The new API provides safe and efficient
access to foreign memory from Java: a from-scratch design frees it from the
limitations inherent in NIO and allows higher speed.


Transcript

  1. Copyright © 2020, Oracle and/or its affiliates. All rights reserved.

    Beyond ByteBuffers: Surfing off-heap
    Vladimir Ivanov, Senior Principal Software Engineer, HotSpot JVM Compilers, JPG, Oracle
    February 2020
  2.

    Safe Harbor Statement

    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
  3.

    Going off-heap: A reality check

    • Accessing off-heap memory from a Java application can be useful in a number of circumstances:
      – avoid the cost and unpredictability associated with garbage collection
      – share memory across multiple processes
      – interact with native libraries
      – allow memory contents to be serialized and deserialized by mapping files into memory pages (e.g. mmap)
    • Java’s de facto API for accessing off-heap memory is ByteBuffer
      – Other alternatives are also available: Unsafe, JNI
  4.

    ByteBuffers: History

    • The ByteBuffer API was added in Java SE 1.4 as part of the NIO effort
    • Goal: provide better scalability (vs. stream-oriented IO)
      – Buffer-oriented: read data from a source (e.g. FileChannel) into a buffer
      – Non-blocking: if there’s no data to be read, return immediately (channel dependent)
      – Multiplexing: handle multiple channels in a single thread, using selectors
    • Rich and stateful API design to help with idiomatic IO code
      – Lends itself well to partial reads/writes (useful in network processing/charset encoding)
      – Built-in prevention of buffer underruns/overruns
  5.

    Working with buffers: An example

        try (FileChannel inChannel = aFile.getChannel()) {
            ByteBuffer str = ByteBuffer.allocate(10);
            while (inChannel.read(str) > 0) {
                str.flip();                              // switch from filling to draining
                while (str.hasRemaining()) {
                    System.out.print((char) str.get());  // relative get(); ByteBuffer has no getByte()
                }
                str.clear();                             // reset position and limit for the next read
            }
        }
  6.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str; markers: position, limit == capacity]
  7.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: position, limit == capacity]
  8.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: position, limit, capacity]
  9.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: position, limit, capacity]
  10.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: position, limit, capacity]
  11.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: position, limit, capacity]
  12.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: position, limit, capacity]
  13.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: position, limit, capacity]
  14.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: capacity, position == limit]
  15.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: capacity, position == limit]
  16.

    Working with buffers: An example (same code as slide 5)
    [Diagram: buffer str holding ‘H’ ‘e’ ‘l’ ‘l’ ‘o’ ‘!’; markers: position, limit == capacity]
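The cursor movement in the diagrams above can be traced with a small program. This is a minimal sketch: the class name `BufferStateDemo` is made up for illustration, and a `put` of the bytes of "Hello!" stands in for `inChannel.read(str)` so that no file is needed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BufferStateDemo {
    // Formats the three cursor fields as "position/limit/capacity".
    static String state(ByteBuffer b) {
        return b.position() + "/" + b.limit() + "/" + b.capacity();
    }

    // Returns the buffer state after each step, followed by the drained text.
    static String[] trace() {
        ByteBuffer str = ByteBuffer.allocate(10);
        String afterAllocate = state(str);

        // Assumption: put() stands in for inChannel.read(str) delivering 6 bytes.
        str.put("Hello!".getBytes(StandardCharsets.US_ASCII));
        String afterFill = state(str);

        str.flip();                       // limit = old position, position = 0
        String afterFlip = state(str);

        StringBuilder drained = new StringBuilder();
        while (str.hasRemaining()) {
            drained.append((char) str.get());
        }
        String afterDrain = state(str);   // position has caught up with limit

        str.clear();                      // position = 0, limit = capacity; data is not erased
        String afterClear = state(str);

        return new String[] {
            afterAllocate, afterFill, afterFlip, afterDrain, afterClear, drained.toString()
        };
    }

    public static void main(String[] args) {
        for (String s : trace()) {
            System.out.println(s);
        }
    }
}
```

Note that `clear()` only resets the cursors, which matches the last diagram, where the old characters are still in the buffer.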
  17.

    Towards direct buffers

    • Problem: how to implement FileChannel.read(ByteBuffer)?
      – the OS needs a stable address to write things into
      – a ByteBuffer is backed by a regular Java array which can be moved by the GC
    • Possible solutions
      – allocate a temporary (off-heap) buffer and manually copy data to the Java heap
      – tell the GC not to move the buffer array (aka object pinning)
    • Better solution: direct buffers
      – Like a ByteBuffer, but backed by off-heap memory
  18.

    Working with buffers: An example

        try (FileChannel inChannel = aFile.getChannel()) {
            ByteBuffer str = ByteBuffer.allocate(10);        // heap buffer
            while (inChannel.read(str) > 0) {
                str.flip();
                while (str.hasRemaining()) {
                    // one byte per character; getChar() would consume two bytes at a time
                    System.out.print((char) str.get());
                }
                str.clear();
            }
        }
  19.

    Working with direct buffers: An example

        try (FileChannel inChannel = aFile.getChannel()) {
            ByteBuffer str = ByteBuffer.allocateDirect(10);  // the only change: off-heap backing memory
            while (inChannel.read(str) > 0) {
                str.flip();
                while (str.hasRemaining()) {
                    // one byte per character; getChar() would consume two bytes at a time
                    System.out.print((char) str.get());
                }
                str.clear();
            }
        }
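The difference between the two allocation calls can be observed through the buffer's own query methods. A minimal sketch; the class name `DirectVsHeap` is an illustrative assumption.

```java
import java.nio.ByteBuffer;

final class DirectVsHeap {
    // A buffer from allocate() lives on the Java heap and exposes its backing array.
    static boolean heapIsDirect() {
        return ByteBuffer.allocate(10).isDirect();
    }

    static boolean heapHasArray() {
        return ByteBuffer.allocate(10).hasArray();
    }

    // A buffer from allocateDirect() is backed by memory outside the Java heap.
    static boolean directIsDirect() {
        return ByteBuffer.allocateDirect(10).isDirect();
    }

    public static void main(String[] args) {
        System.out.println("heap buffer direct?   " + heapIsDirect());
        System.out.println("heap buffer array?    " + heapHasArray());
        System.out.println("direct buffer direct? " + directIsDirect());
    }
}
```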
  20.

    Direct buffers and off-heap

    • Direct buffers give developers a new weapon: allocate and access off-heap memory in a safe and efficient fashion
      – Bounds checks to prevent out-of-bounds access
      – GC-driven deallocation to prevent clients from accessing already freed buffers
      – Fast Unsafe access under the hood
    • But how good are direct buffers as a general off-heap API?
  21.

    Direct buffers and off-heap: Bounds check

        static final long BASE_ADDRESS = UNSAFE.allocateMemory(4 * 10);

        @Benchmark
        public int testUnsafeGet() {
            int sum = 0;
            for (int i = 0; i < 10; i++) {
                sum += UNSAFE.getInt(BASE_ADDRESS + (i * 4));
            }
            return sum;
        }

    benchmark                      throughput
    testUnsafeGet                  171.06 Mops/s
  22.

    Direct buffers and off-heap: Bounds check

        static final ByteBuffer DBB = ByteBuffer.allocateDirect(4 * 10);

        @Benchmark
        public int testBufferGet() {
            int sum = 0;
            while (DBB.hasRemaining()) {
                sum += DBB.getInt();
            }
            DBB.clear();
            return sum;
        }

    benchmark                      throughput
    testUnsafeGet                  171.06 Mops/s
    testBufferGet                   29.17 Mops/s
  23.

    Direct buffers and off-heap: Bounds check

        static final ByteBuffer DBB = ByteBuffer.allocateDirect(4 * 10);

        @Benchmark
        public int testBufferAbsoluteGet() {
            int sum = 0;
            for (int i = 0; i < 10; i++) {
                sum += DBB.getInt(i * 4);
            }
            return sum;
        }

    benchmark                      throughput
    testUnsafeGet                  171.06 Mops/s
    testBufferGet                   29.17 Mops/s
    testBufferAbsoluteGet           43.79 Mops/s
  24.

    Direct buffers and off-heap: Bounds check

        static final ByteBuffer DBB =
            ByteBuffer.allocateDirect(4 * 10).order(ByteOrder.nativeOrder());

        @Benchmark
        public int testBufferAbsoluteNativeGet() {
            int sum = 0;
            for (int i = 0; i < 10; i++) {
                sum += DBB.getInt(i * 4);
            }
            return sum;
        }

    benchmark                      throughput
    testUnsafeGet                  171.06 Mops/s
    testBufferGet                   29.17 Mops/s
    testBufferAbsoluteGet           43.79 Mops/s
    testBufferAbsoluteNativeGet     46.63 Mops/s
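The `.order(ByteOrder.nativeOrder())` call in the last variant matters because a freshly created ByteBuffer is always big-endian, whatever the platform. The sketch below, with the made-up class name `ByteOrderDemo`, shows the default and how the chosen order changes the byte layout of the same int.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    // New byte buffers start out big-endian regardless of the hardware.
    static boolean defaultIsBigEndian() {
        return ByteBuffer.allocateDirect(4 * 10).order() == ByteOrder.BIG_ENDIAN;
    }

    // Stores 0x01020304 with the given order and returns the first byte in memory.
    private static int firstByte(ByteOrder order) {
        ByteBuffer b = ByteBuffer.allocate(4).order(order);
        b.putInt(0, 0x01020304);
        return b.get(0);
    }

    static int firstByteBigEndian() {
        return firstByte(ByteOrder.BIG_ENDIAN);     // most significant byte first
    }

    static int firstByteLittleEndian() {
        return firstByte(ByteOrder.LITTLE_ENDIAN);  // least significant byte first
    }

    public static void main(String[] args) {
        System.out.println("default big-endian: " + defaultIsBigEndian());
        System.out.println("native order:       " + ByteOrder.nativeOrder());
        System.out.println("first byte (BE):    " + firstByteBigEndian());
        System.out.println("first byte (LE):    " + firstByteLittleEndian());
    }
}
```

On little-endian hardware a big-endian buffer has to swap bytes on every multi-byte access, which is one plausible reading of the small gap between the last two benchmark rows.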
  25.

    Direct buffers and off-heap: Allocation

        @Param({"16"})
        public int size;

        @Benchmark
        public void testUnsafeAllocFree() {
            long addr = UNSAFE.allocateMemory(size);
            UNSAFE.freeMemory(addr);
        }

    benchmark              allocation size   throughput
    testUnsafeAllocFree    16 bytes          7.13 Mops/s
  26.

    Direct buffers and off-heap: Allocation

        @Param({"16"})
        public int size;

        @Benchmark
        public void testBufferAlloc() {
            ByteBuffer.allocateDirect(size);
        }

    benchmark              allocation size   throughput
    testUnsafeAllocFree    16 bytes          7.13 Mops/s
    testBufferAlloc        16 bytes          1.17 Mops/s
  27.

    Direct buffers and off-heap: Allocation

        @Param({"16"})
        public int size;

        @Benchmark
        public void testBufferAllocFree() {
            ByteBuffer dbb = ByteBuffer.allocateDirect(size);
            UNSAFE.invokeCleaner(dbb);
        }

    benchmark              allocation size   throughput
    testUnsafeAllocFree    16 bytes          7.13 Mops/s
    testBufferAlloc        16 bytes          1.17 Mops/s
    testBufferAllocFree    16 bytes          3.05 Mops/s
  28.

    Direct buffers and off-heap: Allocation (same code as slide 27)

    benchmark              allocation size   throughput    GC rate
    testUnsafeAllocFree    16 bytes          7.13 Mops/s   ≈ 10⁻⁵ bytes/op
    testBufferAlloc        16 bytes          1.17 Mops/s   136 bytes/op
    testBufferAllocFree    16 bytes          3.05 Mops/s   136 bytes/op
  29.

    Direct buffers and off-heap: Allocation

        @Param({"16", "16K", "16M"})   // sizes abbreviated as on the slide
        public int size;

        @Benchmark
        public void testBufferAllocFree() {
            ByteBuffer dbb = ByteBuffer.allocateDirect(size);
            UNSAFE.invokeCleaner(dbb);
        }

    benchmark              allocation size   throughput       GC rate
    testUnsafeAllocFree    16 bytes          7.13 Mops/s      ≈ 10⁻⁵ bytes/op
    testBufferAlloc        16 bytes          1.17 Mops/s      136 bytes/op
    testBufferAllocFree    16 bytes          3.05 Mops/s      136 bytes/op
    testBufferAllocFree    16 Kbytes         0.96 Mops/s      136 bytes/op
    testBufferAllocFree    16 Mbytes         ≈ 0.001 Mops/s   136 bytes/op
  30.

    Direct buffers and off-heap: Square peg, round hole

    • Direct buffers work well assuming you use them as intended!
      – Direct buffers are seldom allocated, relatively small and typically reused
      – Cost of IO typically dominates other costs (e.g. bounds check)
    • But direct buffers fail to scale when used as a general off-heap API
      – Bad for native structs: expensive allocation/non-deterministic release gets in the way
      – When mapping persistent memory file descriptors, the 2GB limit hurts
      – Contents accessed sequentially, or DIY offset computation (with absolute addressing)
    • Time has come to design a supported off-heap API from the ground up
      – Or we can keep piling up workarounds (e.g. absolute addressing, Unsafe hacks)
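To make "DIY offset computation (with absolute addressing)" concrete, here is a sketch that stores an array of two-int points in a direct buffer. The class `PointsInByteBuffer`, its constants and its helper methods are illustrative assumptions, and the layout assumes no padding between fields.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PointsInByteBuffer {
    static final int POINT_BYTES = 2 * Integer.BYTES;  // struct Point { int x; int y; }, no padding assumed
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = Integer.BYTES;

    // All offsets are computed by hand from the constants above.
    static int xIndex(int i) { return i * POINT_BYTES + X_OFFSET; }
    static int yIndex(int i) { return i * POINT_BYTES + Y_OFFSET; }

    static ByteBuffer allocatePoints(int count) {
        return ByteBuffer.allocateDirect(count * POINT_BYTES).order(ByteOrder.nativeOrder());
    }

    // Fills point i with (i, 10 * i) and returns the sum of all y fields.
    static int sumOfYs(int count) {
        ByteBuffer pts = allocatePoints(count);
        for (int i = 0; i < count; i++) {
            pts.putInt(xIndex(i), i);
            pts.putInt(yIndex(i), 10 * i);
        }
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += pts.getInt(yIndex(i));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("byte index of pts[3].y: " + yIndex(3));
        System.out.println("sum of ys for 5 points: " + sumOfYs(5));
    }
}
```

Every index here is an `int`, which is where the 2GB ceiling comes from, and a change to the struct means updating each constant by hand.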
  31.

    Direct buffers and off-heap: ByteBuffer enhancement list
  32.

    Memory Access API

    • Goal: provide low-level, safe and efficient memory access across a variety of memory sources (on-heap, off-heap)
      – Fill the gap that leads to Unsafe/ByteBuffer usage!
    • Key abstractions
      – MemorySegment → memory region with spatial and temporal bounds
      – MemoryAddress → offset within a segment
      – MemoryLayout → description of a segment’s contents
    • Memory dereference operations supported via VarHandle API
      – Don’t reinvent the wheel!
  33.

    Segment and addresses

        struct Point { int x; int y; } pts[5];

    [Diagram: segment A holding x0 y0 x1 y1 x2 y2 x3 y3 x4 y4, delimited by baseA and limitA]
  34.

    Segment and addresses: Moving addresses

        struct Point { int x; int y; } pts[5];

    [Diagram: segment A holding x0 y0 x1 y1 x2 y2 x3 y3 x4 y4, delimited by baseA and limitA; offset(O) moves baseA to baseA + O]
  35.

    Segment and addresses: Slicing segments

        struct Point { int x; int y; } pts[5];

    [Diagram: segment A holding x0 y0 x1 y1 x2 y2 x3 y3 x4 y4, delimited by baseA and limitA; slice(O, L) yields segment B with baseB = baseA + O and limitB = baseB + L]
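The slice arithmetic (baseB = baseA + O, limitB = baseB + L) can be tried out with existing `java.nio` buffers, which have a comparable slicing operation. This is an analogy only, not the Memory Access API, and the class `SliceDemo` and its `slice` helper are made up for illustration.

```java
import java.nio.ByteBuffer;

final class SliceDemo {
    // Returns a view covering bytes [offset, offset + length) of the parent.
    static ByteBuffer slice(ByteBuffer parent, int offset, int length) {
        ByteBuffer view = parent.duplicate();  // leave the parent's cursors untouched
        view.position(offset);                 // base of the slice  = base + O
        view.limit(offset + length);           // limit of the slice = base + O + L
        return view.slice();
    }

    static int sliceCapacity() {
        return slice(ByteBuffer.allocate(40), 8, 16).capacity();
    }

    // Index 0 of the slice aliases index 8 of the parent.
    static boolean sharesMemory() {
        ByteBuffer parent = ByteBuffer.allocate(40);
        ByteBuffer b = slice(parent, 8, 16);
        parent.putInt(8, 42);
        return b.getInt(0) == 42;
    }

    // Reading 4 bytes at index 16 would cross the slice's limit.
    static boolean rejectsOutOfBounds() {
        ByteBuffer b = slice(ByteBuffer.allocate(40), 8, 16);
        try {
            b.getInt(16);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("slice capacity:        " + sliceCapacity());
        System.out.println("shares parent memory:  " + sharesMemory());
        System.out.println("rejects out-of-bounds: " + rejectsOutOfBounds());
    }
}
```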
  36.

    Memory access handles

    • Memory in a segment is accessed through a memory access VarHandle
      – Supported carrier types: byte, char, short, int, long, float, double
    • At least one mandated access coordinate (of type MemoryAddress)
      – the address at which the dereference occurs
    • Combinators can add more access coordinates (of type long) to achieve more nuanced addressing modes
      – E.g. add a variable offset to the accessed address, to mimic multi-dimensional access
  37.

    Accessing segment memory: Leaf accessor

        VarHandle x = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder()); // (MemoryAddress) -> int

    [Diagram: segment A holding x0 y0 x1 y1 x2 y2 x3 y3 x4 y4, delimited by baseA and limitA; x(baseA) points at x0]
  38.

    Accessing segment memory: Displaced access combinator

        VarHandle x = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder()); // (MemoryAddress) -> int
        VarHandle y = MemoryHandles.withOffset(x, 4);                              // (MemoryAddress) -> int

    [Diagram: segment A holding x0 y0 x1 y1 x2 y2 x3 y3 x4 y4, delimited by baseA and limitA; y(baseA) at offset(4)]
  39.

    Accessing segment memory: Indexed access combinator

        VarHandle x = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder()); // (MemoryAddress) -> int
        VarHandle y = MemoryHandles.withOffset(x, 4);                              // (MemoryAddress) -> int
        VarHandle ys = MemoryHandles.withStride(y, 8);                             // (MemoryAddress, long) -> int

    [Diagram: segment A holding x0 y0 x1 y1 x2 y2 x3 y3 x4 y4, delimited by baseA and limitA; ys(baseA, 0) at offset(4), then ys(baseA, 1) through ys(baseA, 4), each a further offset(8)]
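The addressing that the offset and stride combinators build up amounts to byteIndex = 4 + 8 * i. The sketch below reproduces that arithmetic with `MethodHandles.byteBufferViewVarHandle`, a VarHandle factory available since JDK 9. It is a stand-in, not the incubating `MemoryHandles` API: its coordinates are (ByteBuffer, int), and the stride is applied by hand in the hypothetical helper `yByteIndex`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StridedAccess {
    // Coordinates: (ByteBuffer, int byteIndex) -> int
    static final VarHandle INT_VH =
        MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    static final int Y_OFFSET = 4;  // displacement of y inside a point
    static final int STRIDE = 8;    // size of one point

    // Same arithmetic as withStride(withOffset(x, 4), 8) applied to index i.
    static int yByteIndex(int i) {
        return Y_OFFSET + STRIDE * i;
    }

    static void setY(ByteBuffer pts, int i, int value) {
        INT_VH.set(pts, yByteIndex(i), value);
    }

    static int getY(ByteBuffer pts, int i) {
        return (int) INT_VH.get(pts, yByteIndex(i));
    }

    // Writes i * i into every y field of 5 points and reads back the y of point 3.
    static int demo() {
        ByteBuffer pts = ByteBuffer.allocateDirect(5 * STRIDE);
        for (int i = 0; i < 5; i++) {
            setY(pts, i, i * i);
        }
        return getY(pts, 3);
    }

    public static void main(String[] args) {
        System.out.println("byte index of ys(base, 4): " + yByteIndex(4));
        System.out.println("y of point 3:              " + demo());
    }
}
```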
  40.

    The life of a segment: Temporal bounds

    • Goal: deterministic deallocation
      – Segments are created, accessed and then closed
    • When a memory segment is closed, additional resources can be released
      – E.g. closing a segment frees the native memory backing the segment
    • Accessing an already closed segment is not allowed
      – Safety requirement, more on that later
    • Forgetting to close segments can lead to memory leaks
      – Try-with-resources to the rescue!
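The create, access, close life cycle can be modelled with a small `AutoCloseable` wrapper. This is a toy sketch of the liveness check only: `ScopedBuffer` is a hypothetical class, it wraps a heap buffer so nothing native is actually freed, and it is not thread-safe.

```java
import java.nio.ByteBuffer;

final class ScopedBuffer implements AutoCloseable {
    private final ByteBuffer buffer;
    private boolean alive = true;

    ScopedBuffer(int bytes) {
        this.buffer = ByteBuffer.allocate(bytes);
    }

    // Every access first checks that the buffer has not been closed.
    private void checkAlive() {
        if (!alive) {
            throw new IllegalStateException("Segment is not alive");
        }
    }

    int getInt(int offset) {
        checkAlive();
        return buffer.getInt(offset);
    }

    void putInt(int offset, int value) {
        checkAlive();
        buffer.putInt(offset, value);
    }

    @Override
    public void close() {
        alive = false;  // a real segment would release its backing memory here
    }

    // Returns true if a reference that escaped the try block is rejected after close.
    static boolean accessAfterCloseFails() {
        ScopedBuffer escaped = new ScopedBuffer(16);
        try (ScopedBuffer scoped = escaped) {
            scoped.putInt(0, 42);
        }
        try {
            escaped.getInt(0);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("access after close rejected: " + accessAfterCloseFails());
    }
}
```

Try-with-resources guarantees that `close()` runs on exit from the block, and the liveness flag turns a use-after-free into an exception rather than a crash.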
  41.

    Working with segments: An example

        VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
        VarHandle xHandle = MemoryHandles.withStride(MemoryHandles.withOffset(intHandle, 0), 8);
        VarHandle yHandle = MemoryHandles.withStride(MemoryHandles.withOffset(intHandle, 4), 8);

        try (MemorySegment points = MemorySegment.allocateNative(4 * 2 * 5)) {
            MemoryAddress base = points.baseAddress();
            for (long i = 0; i < 5; i++) {
                xHandle.set(base, i, (int) i);
                yHandle.set(base, i, (int) i);
            }
        }
  42.

    Working with segments: An example (same code as slide 41)
    [Annotation at the closing brace of the try block: // points.close() frees memory]
    [Diagram: segment points]
  43.

    Working with segments: An example (same code as slide 41)
    [Diagram: segment points; xHandle(base, 0) through xHandle(base, 4) write 0 1 2 3 4 into the x fields]
  44.

    Working with segments: An example (same code as slide 41)
    [Diagram: segment points; yHandle(base, 0) through yHandle(base, 4) write the y fields; contents 0 0 1 1 2 2 3 3 4 4]
  45.

    Working with segments: An example (same code as slide 41)
    [Callout: hardwired constants]
  46.

    Memory layouts: Navigating segments

    • Goal: reduce the brittleness of the code
      – Byte size computation performed by hand
      – Creation of var handle chains contains hardwired byte constants
    • Idea: describe memory layouts programmatically, with an API
      – sizes, offsets, alignment constraints can all be derived from layouts
    • Bonus point: memory access handles can be derived from layouts too!
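The idea that sizes and offsets can be derived from a description instead of being written by hand can be shown with a toy layout class. `ToyStructLayout` is a hypothetical illustration and not the `MemoryLayout` API; it assumes fields are packed with no padding or alignment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ToyStructLayout {
    private final Map<String, Long> offsets = new LinkedHashMap<>();
    private long size = 0;

    // Appends a named field; its offset is the size accumulated so far.
    ToyStructLayout field(String name, long bytes) {
        offsets.put(name, size);
        size += bytes;
        return this;
    }

    long byteSize() {
        return size;
    }

    long offsetOf(String name) {
        Long offset = offsets.get(name);
        if (offset == null) {
            throw new IllegalArgumentException("No such field: " + name);
        }
        return offset;
    }

    // Offset of a field in element `index` of an array of this struct.
    long offsetOf(long index, String name) {
        return index * size + offsetOf(name);
    }

    // struct Point { int x; int y; }
    static ToyStructLayout point() {
        return new ToyStructLayout().field("x", 4).field("y", 4);
    }

    public static void main(String[] args) {
        ToyStructLayout point = point();
        System.out.println("sizeof(Point):       " + point.byteSize());
        System.out.println("offset of y:         " + point.offsetOf("y"));
        System.out.println("offset of pts[3].y:  " + point.offsetOf(3, "y"));
        System.out.println("size of pts[5]:      " + 5 * point.byteSize());
    }
}
```

Adding a field to `point()` updates every derived size and offset, which is the brittleness the hardwired constants cannot avoid.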
  47.

    Working with layouts: An example

        struct Point { int x; int y; } pts[5];
  48.

    Working with layouts: An example (build step; struct as on slide 47)

        MemoryLayout.ofSequence(5, );
  49.

    Working with layouts: An example (build step; struct as on slide 47)

        MemoryLayout.ofSequence(5,
            MemoryLayout.ofStruct( )
        );
  50.

    Working with layouts: An example (build step; struct as on slide 47)

        MemoryLayout.ofSequence(5,
            MemoryLayout.ofStruct(
                MemoryLayout.ofValueBits(32),
                MemoryLayout.ofValueBits(32)
            )
        );
  51.

    Working with layouts: An example (struct as on slide 47)

        MemoryLayout.ofSequence(5,
            MemoryLayout.ofStruct(
                MemoryLayout.ofValueBits(32).withName("x"),
                MemoryLayout.ofValueBits(32).withName("y")
            )
        );
  52.

    Layout paths: Selecting layouts

    • Given a complex layout, we can select one of its nested layout elements using a so-called layout path
      – Useful for computing offsets, generating memory access handles
    • Rules of the game
      1. A layout path must start at a compound layout (group or sequence)
      2. From a path to a sequence layout, we can obtain a path to a sequence element
      3. From a path to a group layout, we can obtain a path to a (named) group element
      4. A layout path must end at a value layout
  53.

    Path expressions: An example

    • Goal: obtain a memory access var handle to access all y coordinates

        SequenceLayout seq = MemoryLayout.ofSequence(5,
            MemoryLayout.ofStruct(
                MemoryLayout.ofValueBits(32).withName("x"),
                MemoryLayout.ofValueBits(32).withName("y")
            )
        );

        VarHandle yHandle = ???
  54.

    Path expressions: An example (build step; seq as on slide 53)

        VarHandle yHandle = seq.varHandle(int.class, ); // (MemoryAddress) -> int
  55.

    Path expressions: An example (build step; seq as on slide 53)

        VarHandle yHandle = seq.varHandle(int.class,
            MemoryLayout.PathElement.sequenceElement(),
        ); // (MemoryAddress, long) -> int
  56.

    Path expressions: An example

        SequenceLayout seq = MemoryLayout.ofSequence(5,
            MemoryLayout.ofStruct(
                MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("x"),
                MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("y")
            )
        );

        VarHandle yHandle = seq.varHandle(int.class,
            MemoryLayout.PathElement.sequenceElement(),
            MemoryLayout.PathElement.groupElement("y")
        ); // (MemoryAddress, long) -> int
  57.

    Working with segments: Layouts to the rescue!

        SequenceLayout seq = MemoryLayout.ofSequence(5,
            MemoryLayout.ofStruct(
                MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("x"),
                MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("y")
            )
        );

        var xHandle = seq.varHandle(int.class,
            MemoryLayout.PathElement.sequenceElement(),
            MemoryLayout.PathElement.groupElement("x"));
        var yHandle = seq.varHandle(int.class,
            MemoryLayout.PathElement.sequenceElement(),
            MemoryLayout.PathElement.groupElement("y"));

        try (MemorySegment points = MemorySegment.allocateNative(seq)) {
            MemoryAddress base = points.baseAddress();
            long size = seq.elementCount().getAsLong();
            for (long i = 0; i < size; i++) {
                xHandle.set(base, i, (int) i);
                yHandle.set(base, i, (int) i);
            }
        }
  58.

    Working with segments: Layouts to the rescue! (same code as slide 57)
    [Callout: information flows from layouts]
  59.

    Safety: Spatial and temporal safety

    • Principle: safe API, hard VM crashes should not be possible!
      – Non-critical conditions (e.g. reading an int value as a float) treated as user errors
    • Memory access: what can go wrong?
      – Out-of-bounds access
      – Access to already freed memory
    • To achieve safety, the memory access API provides strong spatial and temporal safety guarantees
      – Bounds and liveness state checks enforced on every access
  60.

    Breaking segments: An example

        VarHandle intHandle = MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder())
                                          .varHandle(int.class);
        MemorySegment segment = MemorySegment.allocateNative(10);
        MemoryAddress base = segment.baseAddress();
        intHandle.set(base.addOffset(8), 42);

        Exception in thread "main" java.lang.IllegalStateException: Out of bound access on segment MemorySegment{ id:0x42517214 limit:10 }; new offset = 8; new length = 4
            at jdk.incubator.foreign/jdk.internal.foreign.MemorySegmentImpl.checkRange(MemorySegmentImpl.java:139)
            at jdk.incubator.foreign/jdk.internal.foreign.MemoryAddressImpl.checkAccess(MemoryAddressImpl.java:83)
            at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts.checkAddress(…)
            at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts.set0(VarHandleMemoryAddressAsInts.java:72)
            at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts0/0x0000000800c73840.set(Unknown Source)
            at java.base/java.lang.invoke.VarHandleGuards.guard_LI_V(VarHandleGuards.java:114)
  61.

    Breaking segments: An example

        VarHandle intHandle = MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder())
                                          .varHandle(int.class);
        MemorySegment segment = MemorySegment.allocateNative(10);
        MemoryAddress base = segment.baseAddress();
        intHandle.set(base.addOffset(8), 42);
        segment.close();
        intHandle.set(base, 42);

        Exception in thread "main" java.lang.IllegalStateException: Segment is not alive
            at jdk.incubator.foreign/jdk.internal.foreign.MemorySegmentImpl.checkValidState(…)
            at jdk.incubator.foreign/jdk.internal.foreign.MemorySegmentImpl.checkRange(MemorySegmentImpl.java:135)
            at jdk.incubator.foreign/jdk.internal.foreign.MemoryAddressImpl.checkAccess(MemoryAddressImpl.java:83)
            at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts.checkAddress(…)
            at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts.set0(VarHandleMemoryAddressAsInts.java:72)
            at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts0/0x0000000800c73840.set(Unknown Source)
            at java.base/java.lang.invoke.VarHandleGuards.guard_LI_V(VarHandleGuards.java:114)
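The out-of-bounds failure on the 10-byte segment follows from a simple range test: a 4-byte access at offset 8 ends at byte 12, past the limit of 10. The sketch below shows such a check in plain Java. `RangeCheck` is a hypothetical helper, not the JDK's internal `checkRange`, and it throws `IndexOutOfBoundsException` where the slide's output shows `IllegalStateException`.

```java
final class RangeCheck {
    // Rejects any access whose byte range [offset, offset + length) leaves [0, limit).
    static void checkRange(long offset, long length, long limit) {
        // Written as offset > limit - length so the comparison cannot overflow.
        if (offset < 0 || length < 0 || offset > limit - length) {
            throw new IndexOutOfBoundsException(
                "Out of bound access: limit = " + limit
                + "; offset = " + offset + "; length = " + length);
        }
    }

    static boolean inBounds(long offset, long length, long limit) {
        try {
            checkRange(offset, length, limit);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("int at offset 6 of 10 bytes: " + inBounds(6, 4, 10));
        System.out.println("int at offset 8 of 10 bytes: " + inBounds(8, 4, 10));
    }
}
```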
  62.

    Direct buffers and off-heap: Bounds check (same code as slide 24)

    benchmark                      throughput
    testUnsafeGet                  171.06 Mops/s
    testBufferAbsoluteNativeGet     46.63 Mops/s
  63.

    Memory segments and off-heap: Bounds check

        static final MemoryAddress MA = MemorySegment.allocateNative(4 * 10).baseAddress();
        static final VarHandle INT_VH = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());

        @Benchmark
        public int testMemoryAddressGet() {
            int sum = 0;
            for (int i = 0; i < 10; i++) {
                sum += (int) INT_VH.get(MA.addOffset(i * 4));
            }
            return sum;
        }

    benchmark                      throughput
    testUnsafeGet                  171.06 Mops/s
    testBufferAbsoluteNativeGet     46.63 Mops/s
    testMemoryAddressGet            33.81 Mops/s
  64.

    The price of safety: Challenges

    • Idiomatic memory access API usage can be problematic for JITs
      – Escape analysis can sometimes be an issue (C2)
      – Certain paths (e.g. try-with-resources) not hot enough (C2)
      – Memory barriers before/after Unsafe calls can hinder hoisting (C2/Graal)
      – Range check elimination works on ints, not longs (C2/Graal)
    • Some targeted solutions are in the works (C2), with promising early results
    • Supporting loops with 64-bit trip counters is tricky and requires more work
  65.

    Memory segments and off-heap: Bounds check

        static final MemoryAddress MA = MemorySegment.allocateNative(4 * 10).baseAddress();
        static final VarHandle INT_VH = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
        static final VarHandle INT_VH_INDEXED = MemoryHandles.withStride(INT_VH, 4);

        @Benchmark
        public int testMemoryAddressIndexedGet() {
            int sum = 0;
            for (int i = 0; i < 10; i++) {
                sum += (int) INT_VH_INDEXED.get(MA, (long) i);
            }
            return sum;
        }

    benchmark                      throughput
    testUnsafeGet                  171.06 Mops/s
    testBufferAbsoluteNativeGet     46.63 Mops/s
    testMemoryAddressGet            33.81 Mops/s
    testMemoryAddressIndexedGet     47.01 Mops/s
  66.

    Memory segments and off-heap: Bounds check (same code as slide 63)

    benchmark                            throughput
    testUnsafeGet                        171.06 Mops/s
    testBufferAbsoluteNativeGet           46.63 Mops/s
    testMemoryAddressGet (C2)             33.81 Mops/s
    testMemoryAddressIndexedGet (C2)      47.01 Mops/s
    testMemoryAddressGet (Graal)         121.45 Mops/s
  67.

    Memory segments and off-heap: Bounds check

        static final MemoryAddress MA = MemorySegment.allocateNative(4 * 10).baseAddress();
        static final VarHandle INT_VH = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());

        @Benchmark
        public int testMemoryAddressGet() {
            int sum = 0;
            for (int i = 0; i < 10; i++) {
                sum += (int) INT_VH.get(MA.addOffset(i * 4));
            }
            return sum;
        }

        benchmark                             throughput
        testUnsafeGet                         171.06 Mops/s
        testBufferAbsoluteNativeGet            46.63 Mops/s
        testMemoryAddressGet (C2)              33.81 Mops/s
        testMemoryAddressGet (Graal)          121.45 Mops/s
        testMemoryAddressGet (C2 w/ patch)    154.27 Mops/s
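
For readers who want to try VarHandle-based off-heap reads without the incubating module, a similar access pattern exists for byte buffers in the standard library. The sketch below is a stand-in, not the code measured above: it uses MethodHandles.byteBufferViewVarHandle (available since JDK 9) over a direct ByteBuffer, and the class name BufferVarHandleSum and its helper methods are invented for this example. Note that the second coordinate of this VarHandle is a byte index, not an element index.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; only the JDK calls are real API.
final class BufferVarHandleSum {
    // Views a ByteBuffer as a sequence of ints in native byte order.
    static final VarHandle INT_VH =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Allocates a direct buffer holding the ints 1, 2, ..., count.
    static ByteBuffer filled(int count) {
        ByteBuffer bb = ByteBuffer.allocateDirect(count * Integer.BYTES);
        for (int i = 0; i < count; i++) {
            INT_VH.set(bb, i * Integer.BYTES, i + 1);
        }
        return bb;
    }

    // Sums the first count ints; every access is bounds-checked against the buffer.
    static int sum(ByteBuffer bb, int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += (int) INT_VH.get(bb, i * Integer.BYTES);
        }
        return sum;
    }
}
```

Calling BufferVarHandleSum.sum(BufferVarHandleSum.filled(10), 10) adds up the ten stored values. Wrapping such a method in a JMH @Benchmark would be one way to compare it against the numbers on the slides, with the caveat that results depend on JDK version and hardware.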
  68.

    Direct buffers and off-heap: Allocation

        @Param({"16"})
        public int size;

        @Benchmark
        public void testBufferAllocFree() {
            ByteBuffer dbb = ByteBuffer.allocateDirect(size);
            UNSAFE.invokeCleaner(dbb);
        }

        benchmark             allocation size   throughput    GC rate
        testUnsafeAllocFree   16 bytes          7.13 Mops/s   ≈ 10⁻⁵ bytes/op
        testBufferAllocFree   16 bytes          3.05 Mops/s   136 bytes/op
  69.

    Memory segments and off-heap: Allocation

        @Param({"16"})
        public int size;

        @Benchmark
        public long testSegmentAllocFree() {
            try (MemorySegment segment = MemorySegment.allocateNative(size)) {
                return segment.byteSize();
            }
        }

        benchmark              allocation size   throughput    GC rate
        testUnsafeAllocFree    16 bytes          7.13 Mops/s   ≈ 10⁻⁵ bytes/op
        testBufferAllocFree    16 bytes          3.05 Mops/s   136 bytes/op
        testSegmentAllocFree   16 bytes          4.13 Mops/s   24 bytes/op
  70.

    Memory segments and off-heap: Allocation

        // -Djdk.internal.foreign.skipZeroMemory=true
        @Param({"16"})
        public int size;

        @Benchmark
        public long testSegmentAllocFree() {
            try (MemorySegment segment = MemorySegment.allocateNative(size)) {
                return segment.byteSize();
            }
        }

        benchmark                           allocation size   throughput    GC rate
        testUnsafeAllocFree                 16 bytes          7.13 Mops/s   ≈ 10⁻⁵ bytes/op
        testBufferAllocFree                 16 bytes          3.05 Mops/s   136 bytes/op
        testSegmentAllocFree (zeroing)      16 bytes          4.13 Mops/s   24 bytes/op
        testSegmentAllocFree (no zeroing)   16 bytes          6.86 Mops/s   24 bytes/op
  71.

    Memory segments and off-heap: Allocation

        // -Djdk.internal.foreign.skipZeroMemory=true
        @Param({"16"})
        public int size;

        @Benchmark
        public long testSegmentAllocFree() {
            try (MemorySegment segment = MemorySegment.allocateNative(size)) {
                return segment.byteSize();
            }
        }

        benchmark                              allocation size   throughput    GC rate
        testUnsafeAllocFree                    16 bytes          7.13 Mops/s   ≈ 10⁻⁵ bytes/op
        testBufferAllocFree                    16 bytes          3.05 Mops/s   136 bytes/op
        testSegmentAllocFree (C2)              16 bytes          6.86 Mops/s   24 bytes/op
        testSegmentAllocFree (Graal)           16 bytes          6.98 Mops/s   0.004 bytes/op
        testSegmentAllocFree (C2 + Valhalla)   16 bytes          7.08 Mops/s   ≈ 10⁻⁵ bytes/op
  72.

    Memory segments and byte buffers: More safety issues

    • The memory access API provides bidirectional interop with byte buffers
      – Create memory segments from existing byte buffers
      – Create byte buffer views from existing memory segments
    • Byte buffer interop: what can go wrong?
      – Access a segment when the underlying byte buffer memory has been released
      – Access a byte buffer view backed by an already closed segment
    • Safety counter-measures
      – Keep a strong reference to the underlying byte buffer, to avoid premature GC
      – Temporal checks when accessing byte buffer views backed by memory segments
  73.

    Breaking buffer segments: An example

        MemorySegment segment = MemorySegment.allocateNative(8);
        ByteBuffer bb = segment.asByteBuffer();
        bb.getInt();       // fine: the backing segment is still alive
        segment.close();
        bb.getInt();       // fails the temporal check, see the exception below

        Exception in thread "main" java.lang.IllegalStateException: Segment is not alive
            at jdk.incubator.foreign/jdk.internal.foreign.MemorySegmentImpl.checkValidState(MemorySegmentImpl.java:125)
            at java.base/java.nio.ScopedByteBuffer.checkAlive(ScopedByteBuffer.java:48)
            at java.base/java.nio.ScopedByteBuffer.getInt(ScopedByteBuffer.java:526)
            ...
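
The idea behind a temporal check can be sketched in a few lines. The class below is a toy model, assuming nothing about how MemorySegmentImpl or ScopedByteBuffer are actually implemented: ScopedInts is a made-up name, the liveness flag is a plain boolean with no thread safety, and close() only marks the object dead (the direct buffer itself is still reclaimed by the GC, unlike a real segment, which frees its memory deterministically).

```java
import java.nio.ByteBuffer;

// Hypothetical toy model of a liveness ("temporal") check; not the JDK implementation.
final class ScopedInts implements AutoCloseable {
    private final ByteBuffer buffer;
    private boolean alive = true;   // not thread-safe; single-threaded use assumed

    ScopedInts(int count) {
        this.buffer = ByteBuffer.allocateDirect(count * Integer.BYTES);
    }

    // Every access first verifies that close() has not been called yet.
    private void checkAlive() {
        if (!alive) {
            throw new IllegalStateException("Segment is not alive");
        }
    }

    int get(int index) {
        checkAlive();
        return buffer.getInt(index * Integer.BYTES);
    }

    void set(int index, int value) {
        checkAlive();
        buffer.putInt(index * Integer.BYTES, value);
    }

    @Override
    public void close() {
        alive = false;   // later accesses fail fast instead of touching the memory
    }
}
```

Used in a try-with-resources block, any get or set after the block has exited throws IllegalStateException, which mirrors the behavior shown in the stack trace above.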
  74.

    Concurrent access: “Off to the races”

    • Concurrent memory access: what can go wrong?
      – Access vs. access: two threads access the same memory address concurrently
      – Access vs. close: one thread accesses a segment while another is closing it
    • Access vs. access races can be mitigated with VarHandle primitives
      – compareAndSet, getVolatile, …
    • Access vs. close races undermine safety, but preventing them is hard
      – How to know if/when a segment can be closed?
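
As an illustration of mitigating access vs. access races with VarHandle primitives, here is a minimal sketch that increments an off-heap int from several threads. It relies on the standard byteBufferViewVarHandle API over a direct ByteBuffer rather than on memory segments, and the class name OffHeapCounter and its method are assumptions made for this example. Atomic access modes on such a handle require the accessed address to be aligned, which is why the counter lives at byte index 0 of a freshly allocated direct buffer.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; only the JDK calls are real API.
final class OffHeapCounter {
    static final VarHandle INT_VH =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Starts the given number of threads, each adding 1 to the same off-heap int.
    static int countConcurrently(int threads, int perThread) {
        ByteBuffer bb = ByteBuffer.allocateDirect(Integer.BYTES);   // zero-initialized
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Atomic read-modify-write; a plain get followed by set would lose updates.
                    int previous = (int) INT_VH.getAndAdd(bb, 0, 1);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (int) INT_VH.getVolatile(bb, 0);
    }
}
```

This addresses only the access vs. access case. Nothing here prevents one thread from releasing the memory while another is still using it, which is the harder problem the slide points to.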
  75.

    Concurrent access: Segments and confinement

    • Solution: segments start off in a thread-confined state
      – A segment is owned by a thread; only that thread can access or close it
      – Ownership is established once and for all, when the segment is created
    • Too strict? Segment ownership can be transferred to a different thread, or removed, to allow for concurrent access
      – Ownership changes invalidate existing segments and create new ones
      – Threads must acquire confined copies of shared segments
      – A shared segment can be closed only when all its confined copies have been closed
    • http://cr.openjdk.java.net/~mcimadamore/panama/confinement.html
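
Thread confinement itself is simple to model: remember the creating thread and reject everyone else. The sketch below shows only that first step, under stated assumptions: ConfinedInts, its methods, and the exception message are invented for illustration, and ownership transfer, shared segments, and confined copies from the list above are deliberately left out.

```java
import java.nio.ByteBuffer;

// Hypothetical toy model of thread confinement; not the incubating API.
final class ConfinedInts {
    private final Thread owner = Thread.currentThread();   // fixed at creation time
    private final ByteBuffer buffer;

    ConfinedInts(int count) {
        this.buffer = ByteBuffer.allocateDirect(count * Integer.BYTES);
    }

    // Rejects any access that does not come from the creating thread.
    private void checkOwner() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("accessed outside the owning thread");
        }
    }

    int get(int index) {
        checkOwner();
        return buffer.getInt(index * Integer.BYTES);
    }

    void set(int index, int value) {
        checkOwner();
        buffer.putInt(index * Integer.BYTES, value);
    }

    // Helper for demonstration: tries a read from a fresh thread and reports whether it was rejected.
    static boolean rejectedFromOtherThread(ConfinedInts ints) {
        boolean[] rejected = {false};
        Thread intruder = new Thread(() -> {
            try {
                ints.get(0);
            } catch (IllegalStateException e) {
                rejected[0] = true;
            }
        });
        intruder.start();
        try {
            intruder.join();   // join makes the write to rejected[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rejected[0];
    }
}
```

Because only the owner can reach the memory, an access vs. close race cannot occur in this model; the price is that sharing requires an explicit hand-off, which is what the ownership-transfer rules on the slide describe.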
  76.

    Memory access API: Summary

    • The memory access API is the basic building block for off-heap access
      – Wean developers off uses of ByteBuffer/Unsafe APIs
    • The memory access API is safe
      – Spatial and temporal checks upon access
      – Robust ownership model to control concurrent access and races
    • The memory access API is efficient
      – Constant spatial bounds lead to better JIT optimizations
      – Deterministic deallocation removes GC from the picture
      – Mostly on par with Unsafe access, some work to do on problematic use cases
  77.

    Memory access API: Present and future

    • The API will be available as an incubating module in JDK 14
      – --add-modules jdk.incubator.foreign
      – https://openjdk.java.net/jeps/370
    • Next steps
      – Performance work
      – Refine the incubating API based on feedback
      – Integrate the API into higher layers of the Panama interop story (new FFI)
  78.

    Safe Harbor Statement

    The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
  79.

    Beyond ByteBuffers: Surfing off-heap

    Vladimir Ivanov
    Senior Principal Software Engineer
    HotSpot JVM Compilers team, JPG, Oracle
    February 2020