Understanding Go Memory Allocation - Gophercon UK

Ever wondered how Go manages memory allocation? In this talk we are going to explore Go’s memory allocator and understand how its algorithm interacts with the operating system to manage memory!

André Carvalho

August 03, 2018

Transcript

  1. Understanding Go Memory Allocation
    André Carvalho, @andresantostc
  2. Developer @ tsuru
    andrestc.com
  3. (no text on this slide)
  4. Virtual Memory
    • Processes do not read directly from physical memory
      ◦ Security
      ◦ Coordination between multiple processes
    • Virtual memory abstracts that away from the processes
      ◦ Segmentation
      ◦ Page tables
  5. Virtual Memory
    [Figure: a page table maps a process's pages (Page 0 to Page 7) to frames (Frame 0 to Frame 7) in RAM; some pages live on disk and some frames belong to another process. Example entry: page 3, frame 6.]
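  6. Example program (vmemory):
        package main

        import (
            "fmt"
            "math/rand"
            "time"
        )

        func main() {
            rand.Seed(time.Now().UnixNano())
            i := rand.Intn(100)
            fmt.Printf("%v at %p\n", i, &i)
            for {
            } // spin forever so two instances can run at the same time
        }
  7. Running two instances at the same time…
        $ ./vmemory
        53 at 0xc420016110
        $ ./vmemory
        68 at 0xc420016110
    Same virtual address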
  8. Process Memory Layout
    • Text: code
    • Data: initialized static variables
    • BSS: uninitialized static variables
    • Heap: dynamically allocated variables (bounded by the program break)
    • Stack: function stack frames
  9. Stack Allocation
    [Figure: a stack split into a used and an unused region by the stack pointer (SP)]
    • Allocation: SP += size; return Stack[SP-size];
    • Deallocation: SP -= size;
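The stack pseudocode above can be modelled in a few lines of Go. This is only a toy sketch: the `stack` type, its fixed buffer and the overflow check are assumptions made for illustration, not how the Go runtime manages goroutine stacks.

```go
package main

import "fmt"

// stack is a toy model of the slide: a fixed buffer plus a stack pointer
// (sp) marking the boundary between used and unused bytes.
type stack struct {
	buf []byte
	sp  int
}

// alloc bumps the stack pointer and returns the bytes just reserved,
// mirroring "SP += size; return Stack[SP-size]".
func (s *stack) alloc(size int) ([]byte, bool) {
	if size < 0 || s.sp+size > len(s.buf) {
		return nil, false // would overflow the stack
	}
	s.sp += size
	return s.buf[s.sp-size : s.sp], true
}

// free releases the most recent allocation, mirroring "SP -= size".
func (s *stack) free(size int) {
	if size < 0 || size > s.sp {
		return // ignore invalid requests in this sketch
	}
	s.sp -= size
}

func main() {
	s := &stack{buf: make([]byte, 64)}
	a, _ := s.alloc(16)
	b, _ := s.alloc(8)
	fmt.Println(len(a), len(b), s.sp)
	s.free(8)
	s.free(16)
	fmt.Println(s.sp)
}
```

Both operations are a single addition or subtraction, which is why stack allocation is so cheap compared with the heap allocation discussed next.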
  10. Heap Allocation
    • For objects whose size is only known at runtime
    • C provides malloc and free
    • C++ provides new and delete
    • Go uses escape analysis and has garbage collection
  11. Minimal Allocator
  12. Minimal Allocator
    We need to implement two functions:
        void *malloc(size_t size)
        void free(void *ptr)
  13. Minimal Allocator
    [Figure: Application, Allocator, OS. The application calls malloc and free; the allocator calls mmap, munmap and madvise.]
    The allocator uses syscalls like mmap/munmap to allocate/deallocate memory
  14. Minimal Allocator
    Linked list with free objects
    [Figure: Head → header (size=n, next=*) followed by n bytes → header (size=m, next=nil) followed by m bytes]
  15. Minimal Allocator - Allocating
    malloc(10)
    [Figure: Head → NULL]
  16. Minimal Allocator - Allocating
    [Figure: virtual address space, with the mapping starting at 0x000000c000000000]
        mmap(0x000000c000000000, 4096, PROT_WRITE | PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, ...)
    Arguments: start address, size, permissions, flags
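For readers who want to try the mmap call from Go, here is a minimal sketch using the standard `syscall` package. It assumes a Linux (or other Unix-like) system. Note that `syscall.Mmap` does not take an address hint, so unlike the call on the slide the kernel chooses the start address. The helper names `mapAnonymous` and `unmap` are made up for this example.

```go
package main

import (
	"fmt"
	"syscall"
)

// mapAnonymous asks the kernel for size bytes of private, zero-filled,
// read/write memory that is not backed by any file (fd = -1).
func mapAnonymous(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
}

// unmap returns the mapping to the kernel (munmap).
func unmap(mem []byte) error {
	return syscall.Munmap(mem)
}

func main() {
	mem, err := mapAnonymous(4096)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer unmap(mem)

	mem[0] = 42 // the memory is usable like any other byte slice
	fmt.Printf("mapped %d bytes at %p, first byte = %d\n", len(mem), &mem[0], mem[0])
}
```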
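  17. Minimal Allocator - Allocating
    malloc(10)
    [Figure: the 4096-byte mapping holds a 12-byte header followed by a 4084-byte free block; Head points to it]
  18. Minimal Allocator - Allocating
    malloc(10)
    [Figure: 22 bytes (header plus 10 bytes) are split off; Head points to the remaining 4062-byte free block]
  19. Minimal Allocator - Allocating
    malloc(10)
    [Figure: Head → 4062-byte free block; p points into the 10-byte block]
    The allocator returns p, which points right after the header
  20. Minimal Allocator - Deallocating
    free(p)
    [Figure: Head → 4062-byte free block]
  21. Minimal Allocator - Deallocating
    free(p)
    [Figure: the header of the 10-byte block is found at p - size(header); Head and the 4062-byte free block are shown next to it]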
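The malloc/free walkthrough on the previous slides can be sketched in Go over a plain byte slice standing in for the mmap'ed region. Everything here is an illustrative assumption: offsets replace pointers, the header layout (4-byte size plus 8-byte next) is chosen to match the 12-byte header on the slides, blocks are carved from the tail of a free block, and there is no coalescing or thread safety.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	headerSize = 12 // 4-byte size + 8-byte next offset, as on the slides
	none       = -1 // plays the role of a NULL next pointer
)

type allocator struct {
	arena []byte // stands in for memory obtained from mmap
	head  int    // offset of the first free block's header, or none
}

func (a *allocator) size(off int) int { return int(binary.LittleEndian.Uint32(a.arena[off:])) }
func (a *allocator) next(off int) int { return int(int64(binary.LittleEndian.Uint64(a.arena[off+4:]))) }
func (a *allocator) setHeader(off, size, next int) {
	binary.LittleEndian.PutUint32(a.arena[off:], uint32(size))
	binary.LittleEndian.PutUint64(a.arena[off+4:], uint64(int64(next)))
}

// newAllocator creates one free block that spans the whole arena.
func newAllocator(n int) *allocator {
	a := &allocator{arena: make([]byte, n), head: 0}
	a.setHeader(0, n-headerSize, none)
	return a
}

// malloc does a first-fit search and returns the offset of n usable bytes,
// or none if no free block is large enough.
func (a *allocator) malloc(n int) int {
	if n <= 0 {
		return none
	}
	prev := none
	for off := a.head; off != none; prev, off = off, a.next(off) {
		sz := a.size(off)
		switch {
		case sz >= n+headerSize: // split: carve the new block from the tail
			a.setHeader(off, sz-n-headerSize, a.next(off))
			blk := off + sz - n
			a.setHeader(blk, n, none)
			return blk + headerSize
		case sz >= n: // too small to split: unlink and hand out the whole block
			if prev == none {
				a.head = a.next(off)
			} else {
				a.setHeader(prev, a.size(prev), a.next(off))
			}
			return off + headerSize
		}
	}
	return none
}

// free finds the header at p - headerSize and pushes the block onto the free list.
func (a *allocator) free(p int) {
	blk := p - headerSize
	a.setHeader(blk, a.size(blk), a.head)
	a.head = blk
}

func main() {
	a := newAllocator(4096)
	fmt.Println("free bytes at start:", a.size(a.head))
	p := a.malloc(10)
	fmt.Println("free bytes after malloc(10):", a.size(a.head))
	a.free(p)
	fmt.Println("first free block after free(p):", a.size(a.head))
}
```

Because `free` never merges neighbouring blocks, repeated use leaves the list full of small blocks, which is the fragmentation issue listed on the next slide.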
  22. Minimal Allocator
    • Can be implemented in a few hundred lines of code
    • Issues
      ◦ Fragmentation
      ◦ Corruption
      ◦ Releasing memory back to the OS
        ▪ When?
        ▪ How? munmap, madvise...
      ◦ Multi-threading
      ◦ ...
  23. Go Runtime Allocator
    • TCMalloc
    • Invoking the Allocator
    • Go’s Allocator
  24. Thread-Caching Malloc (TCMalloc)
    • Originally implemented for the C language by Google
    • Served as the basis for Go’s runtime allocator
    • Reduces lock contention for multithreaded programs
  25. TCMalloc
    • Each thread has a local cache
    • Two types of allocations
      ◦ Small allocations (<= 32 kB)
      ◦ Large allocations
    • Manages memory in units called spans
      ◦ Runs of contiguous memory pages
      ◦ Metadata is kept separate from the allocation arena
  26. TCMalloc - Large Allocations
    • Served by the central heap
    • Requested size is rounded up to a whole number of pages (4 kB each)
        malloc(34 kB) ⇒ malloc(36 kB) ⇒ 9 pages
        malloc(33 kB) ⇒ malloc(36 kB) ⇒ 9 pages
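The rounding on this slide is a ceiling division by the page size. A small sketch, assuming the 4 kB page size from the slide (`pagesFor` is a name made up for the example):

```go
package main

import "fmt"

const pageSize = 4 * 1024 // 4 kB pages, as on the slide

// pagesFor returns how many whole pages are needed to hold size bytes.
func pagesFor(size int) int {
	return (size + pageSize - 1) / pageSize
}

func main() {
	for _, kb := range []int{33, 34} {
		n := pagesFor(kb * 1024)
		fmt.Printf("malloc(%d kB) => malloc(%d kB) => %d pages\n", kb, n*pageSize/1024, n)
	}
}
```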
  27. TCMalloc - Large Allocations
    [Figure: the central heap keeps lists of spans grouped by length: 1 page, 2 pages, ..., 254 pages, > 255 pages]
  28. TCMalloc - Large Allocations
    [Figure: Application → Central Heap → OS. The application asks for 34 kB; the central heap asks the OS for X pages and hands X pages back to the application.]
  29. TCMalloc - Small Allocations
    • Served by the local thread cache
    • Requested size is rounded up to one of the size classes
        malloc(4 bytes) ⇒ malloc(8 bytes)
        malloc(6 bytes) ⇒ malloc(8 bytes)
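Rounding up to a size class amounts to finding the smallest class that fits the request. The sketch below uses a shortened, made-up class table (`sizeClasses`); real allocators have many more classes with different spacing.

```go
package main

import (
	"fmt"
	"sort"
)

// sizeClasses is a hypothetical, shortened table kept in ascending order.
var sizeClasses = []int{8, 16, 32, 48, 64}

// roundToClass returns the index and size of the smallest class that can
// hold size bytes; ok is false if the request does not fit any class.
func roundToClass(size int) (class, rounded int, ok bool) {
	i := sort.SearchInts(sizeClasses, size)
	if size <= 0 || i == len(sizeClasses) {
		return 0, 0, false
	}
	return i, sizeClasses[i], true
}

func main() {
	for _, size := range []int{4, 6, 9} {
		class, rounded, _ := roundToClass(size)
		fmt.Printf("malloc(%d bytes) => malloc(%d bytes), class index %d\n", size, rounded, class)
	}
}
```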
  30. TCMalloc - Small Allocations
    [Figure: Local Thread Cache with one list per size class: Class 0, Class 1, Class 2, ...]
  31. TCMalloc - Small Allocations
    [Figure: Local Thread Cache with one list per size class: Class 0, Class 1, Class 2, ...]
  32. TCMalloc - Small Allocations
    [Figure: Local Thread Cache (Class 0, Class 1, Class 2, ...) and the Central Free List for Class 1 (Span, Span, Span, ...)]
    Span: run of contiguous pages
  33. TCMalloc - Small Allocations
    [Figure: Local Thread Cache (Class 0, Class 1, Class 2, ...) and the Central Free List for Class 1 (Span, Span, Span, ...)]
  34. TCMalloc - Small Allocations
    [Figure: Central Free List for Class 1 (spans) and the Central Heap span lists (1 page, 2 pages, ..., > 255 pages)]
  35. TCMalloc - Small Allocations
    [Figure: Central Free List for Class 1 (spans) and the Central Heap span lists (1 page, 2 pages, ..., > 255 pages)]
  36. TCMalloc - Small Allocations
    [Figure: Application → Local Thread Cache → Central Free List → Central Heap → OS. The application asks for 4 bytes; the thread cache asks for N 8-byte objects; the central free list asks for X pages; the central heap asks the OS for Y*X pages. The same amounts flow back.]
  37. TCMalloc - Deallocation
    [Figure: Pages 1 to 6 grouped into Span A, Span B and Span C]
  38. TCMalloc - Deallocation
    free( ): Page → Span
  39. TCMalloc - Deallocation
    free( ): Page → Span
    Small object: Local Thread Cache (Class 0, Class 1, Class 2, ...)
  40. TCMalloc - Deallocation
    free( ): Page → Span
    Large object
    [Figure: Pages 1 to 4 grouped into Span A, Span B and Span C]
  41. TCMalloc - Deallocation
    free( ): Page → Span
    Large object
    [Figure: Pages 1 to 4 grouped into Span A and Span B]
  42. TCMalloc - Deallocation
    free( ): Page → Span
    Large object
    [Figure: Span B on the Central Heap span lists (1 page, 2 pages, ..., > 255 pages)]
  43. TCMalloc - Deallocation
    free( ): Page → Span
    Large object
    [Figure: Span B on the Central Heap span lists (1 page, 2 pages, ..., > 255 pages)]
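The deallocation path on these slides (address → page → span, then small or large handling) might be sketched as below. This is a simplified model with assumed names and conventions: a Go map stands in for the page map, `sizeClass == 0` is used here to mark a span holding one large object, and merging of neighbouring free spans is left out.

```go
package main

import "fmt"

const pageSize = 4096 // assumed page size

// span is a run of contiguous pages. In this sketch sizeClass 0 marks a
// span that holds a single large object.
type span struct {
	startPage, numPages, sizeClass int
}

type heap struct {
	pageToSpan  map[uintptr]*span // page number -> owning span
	threadCache map[int][]uintptr // size class -> freed small objects
	centralHeap []*span           // spans handed back by large frees
}

func newHeap() *heap {
	return &heap{pageToSpan: map[uintptr]*span{}, threadCache: map[int][]uintptr{}}
}

// addSpan registers every page of the span in the page map.
func (h *heap) addSpan(s *span) {
	for p := s.startPage; p < s.startPage+s.numPages; p++ {
		h.pageToSpan[uintptr(p)] = s
	}
}

// free looks up the span that owns addr and reports which path was taken.
func (h *heap) free(addr uintptr) string {
	s, ok := h.pageToSpan[addr/pageSize]
	if !ok {
		return "unknown"
	}
	if s.sizeClass != 0 {
		// Small object: goes back to the thread cache list of its class.
		h.threadCache[s.sizeClass] = append(h.threadCache[s.sizeClass], addr)
		return "small"
	}
	// Large object: the whole span goes back to the central heap.
	h.centralHeap = append(h.centralHeap, s)
	return "large"
}

func main() {
	h := newHeap()
	h.addSpan(&span{startPage: 1, numPages: 2, sizeClass: 1})
	h.addSpan(&span{startPage: 3, numPages: 4})
	fmt.Println(h.free(1*pageSize + 16))
	fmt.Println(h.free(3 * pageSize))
}
```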
  44. Go Runtime Allocator
    • TCMalloc
    • Invoking the Allocator
    • Go’s Allocator
  45. Example program (main.go):
        package main

        func main() {
            f()
        }

        //go:noinline
        func f() *int {
            i := 10
            return &i
        }
  46. Same program, built with escape analysis diagnostics:
        $ go build -gcflags "-m -m" main.go
        # command-line-arguments
        ./main.go:8:6: cannot inline f: marked go:noinline
        ./main.go:3:6: cannot inline main: non-leaf function
        ./main.go:10:9: &i escapes to heap
        ./main.go:10:9: from ~r0 (return) at ./main.go:10:2
        ./main.go:9:2: moved to heap: i
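One way to observe the effect of escape analysis at runtime, rather than in compiler output, is to count heap allocations with `testing.AllocsPerRun` from the standard library. The sketch below is an illustration under the default compiler settings; the function names `escapes`, `staysOnStack` and `heapAllocs` are made up for the example.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the returned pointer reachable so the allocation is not optimized away.
var sink *int

// escapes returns the address of a local, so i must live on the heap.
//
//go:noinline
func escapes() *int {
	i := 10
	return &i
}

// staysOnStack only uses the address locally, so i can stay on the stack.
//
//go:noinline
func staysOnStack() int {
	i := 10
	p := &i
	return *p
}

// heapAllocs reports the average number of heap allocations per call of f.
func heapAllocs(f func()) float64 {
	return testing.AllocsPerRun(100, f)
}

func main() {
	fmt.Println("escapes:     ", heapAllocs(func() { sink = escapes() }))
	fmt.Println("staysOnStack:", heapAllocs(func() { _ = staysOnStack() }))
}
```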
  47. Compiler output for the same program:
        $ go tool compile -S main.go
        ...
        0x001d 00029 (main.go:9) LEAQ type.int(SB), AX
        0x0024 00036 (main.go:9) MOVQ AX, (SP)
        0x0028 00040 (main.go:9) PCDATA $0, $0
        0x0028 00040 (main.go:9) CALL runtime.newobject(SB)
        ...
  48. The call to runtime.newobject in that output leads to:
        func newobject(typ *_type) unsafe.Pointer {
            return mallocgc(typ.size, typ, true)
        }
  49. Go Runtime Allocator
    • TCMalloc
    • Invoking the Allocator
    • Go’s Allocator
  50. Go’s Allocator
    • Based on TCMalloc
    • Garbage Collector
      ◦ Tightly coupled with the allocator
      ◦ Makes it hard (impossible?) to replace with other implementations
    • Three types of allocations
      ◦ Tiny allocations (size < 16 bytes, no pointers)
      ◦ Small allocations (size <= 32 kB)
      ◦ Large allocations
  51. Go’s Allocator - Sweeping
    Garbage Collector ⇒ concurrent mark and sweep
    1. Scan all objects
    2. Mark objects that are live
    3. Sweep objects that are not live
      a. In the background
      b. In response to allocations
  52. Go’s Allocator - Large Allocations
    [Figure: mheap busy spans, in lists grouped by length: 1 page, 2 pages, ..., > 255 pages]
    Before allocating, mheap sweeps the requested number of pages
  53. Go’s Allocator - Large Allocations
    [Figure: mheap free spans, in lists grouped by length: 1 page, 2 pages, ..., > 255 pages]
  54. Go’s Allocator - Large Allocations
    [Figure: mheap free spans, in lists grouped by length: 1 page, 2 pages, ..., > 255 pages]
    mtreap ⇒ randomized binary tree
  55. Go’s Allocator - Large Allocations
    After allocating, depending on the total amount of live memory...
    the goroutine may perform additional work for the GC!
  56. Go’s Allocator - Small Allocations
    [Figure: P 1 and P 2, each with its own mcache]
    Each logical processor (P) has a local cache (mcache)
  57. Go’s Allocator - Small Allocations
    [Figure: P 1 and P 2, each with an mcache holding spans for class 0, class 1, ...]
    Each mcache maintains a span for each size class
  58. Go’s Allocator - Small Allocations
        class  bytes/obj  bytes/span  objects
            1          8        8192     1024
            2         16        8192      512
            3         32        8192      256
            4         48        8192      170
          ...
           65      28672       57344        2
           66      32768       32768        1
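The objects column of the size class table is the span size divided by the object size, rounded down. The sketch below recomputes it for the rows shown. It assumes class 4 holds 48-byte objects, which is the value consistent with 170 objects in an 8192-byte span (a 64-byte class would give 128). The `sizeClass` type is made up for the example.

```go
package main

import "fmt"

// sizeClass mirrors one row of the table on the slide.
type sizeClass struct {
	class, objBytes, spanBytes int
}

// objects is how many whole objects fit in one span of this class.
func (c sizeClass) objects() int {
	return c.spanBytes / c.objBytes
}

func main() {
	classes := []sizeClass{
		{1, 8, 8192},
		{2, 16, 8192},
		{3, 32, 8192},
		{4, 48, 8192}, // assumption: 48 bytes, see the note above
		{65, 28672, 57344},
		{66, 32768, 32768},
	}
	for _, c := range classes {
		waste := c.spanBytes - c.objects()*c.objBytes
		fmt.Printf("class %2d: %5d bytes/obj, %5d bytes/span, %4d objects, %d bytes unused\n",
			c.class, c.objBytes, c.spanBytes, c.objects(), waste)
	}
}
```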
  59. Go’s Allocator - Small Allocations
    [Figure: P 1 with its mcache and the spans for class 0, class 1, ...]
    mcache returns the address of a free object on the span
  60. Go’s Allocator - Small Allocations
    [Figure: P 1 with its mcache and the spans for class 0, class 1, ...]
    mcache requests a new span from mcentral for this size class
  61. Go’s Allocator - Small Allocations
    [Figure: P 1 with its mcache; one mcentral per size class (class 0, class 1, ...), each holding spans]
    Each mcentral has two linked lists, empty and nonempty spans
  62. Go’s Allocator - Small Allocations
    [Figure: P 1 with its mcache; one mcentral per size class (class 0, class 1, ...), each holding spans]
    A span with free objects will be given to the mcache
  63. Go’s Allocator - Small Allocations
    [Figure: P 1 with its mcache; one mcentral per size class (class 0, class 1, ...), each holding spans]
    If there are no nonempty spans…
    mcentral will try to sweep existing spans
  64. Go’s Allocator - Small Allocations
    [Figure: the mcentral for class 0, its spans, and mheap]
    As a last resort, mcentral will ask for a new span from mheap
  65. Go’s Allocator - Small Allocations
    [Figure: the mcentral for class 0, its spans, and mheap with the new span]
    mcentral will give this span to mcache
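The fallback chain for small allocations (mcache, then mcentral, then mheap) can be sketched for a single size class as follows. This is a heavily simplified model, not the runtime's code: spans only count free objects, the sweeping step is skipped, there is no locking, and `objectsPerSpan` is an invented parameter.

```go
package main

import "fmt"

// span tracks only how many free objects it still holds.
type span struct{ free int }

// mheap hands out fresh spans; objectsPerSpan is a made-up parameter.
type mheap struct{ objectsPerSpan int }

func (h *mheap) allocSpan() *span { return &span{free: h.objectsPerSpan} }

// mcentral holds the spans of one size class that still have free objects.
type mcentral struct {
	nonempty []*span
	heap     *mheap
}

// cacheSpan returns a span with free objects and where it came from.
func (c *mcentral) cacheSpan() (*span, string) {
	if n := len(c.nonempty); n > 0 {
		s := c.nonempty[n-1]
		c.nonempty = c.nonempty[:n-1]
		return s, "mcentral"
	}
	// Last resort: ask mheap for a brand new span.
	return c.heap.allocSpan(), "mheap"
}

// mcache is the per-P cache holding the current span for this size class.
type mcache struct {
	current *span
	central *mcentral
}

// alloc takes one object and reports which level had to be consulted.
func (c *mcache) alloc() string {
	from := "mcache"
	if c.current == nil || c.current.free == 0 {
		c.current, from = c.central.cacheSpan()
	}
	c.current.free--
	return from
}

func main() {
	central := &mcentral{nonempty: []*span{{free: 1}}, heap: &mheap{objectsPerSpan: 2}}
	cache := &mcache{central: central}
	for i := 0; i < 4; i++ {
		fmt.Println(cache.alloc())
	}
}
```

The common case touches only the mcache, which belongs to one P and needs no lock; the shared structures are reached only when the local span runs out.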
  66. Go’s Allocator - Tiny Allocations
    Allocations for objects with no pointers and size < 16 bytes
    The main targets of tiny allocator are small strings and standalone escaping variables. On a json benchmark the allocator reduces number of allocations by ~12% and reduces heap size by ~20%.
  67. Go’s Allocator - Tiny Allocations
    [Figure: a block divided into allocated and free regions]
    • Each P keeps a 16-byte object allocated from a span
    • Each tiny allocation appends a subobject
  68. Go’s Allocator - Tiny Allocations
    [Figure: P 1 and its mcache, with the old block and a new free block]
    • When the block is full, grab a new object from the mcache ≃ small allocation
    • Eventually, the GC will deallocate the old object
  69. Go’s Allocator - Releasing memory to the OS
    • The runtime periodically releases memory to the OS
    • Releases spans that were swept more than 5 minutes ago
    • On Linux, uses the madvise(2) syscall
        madvise(addr, size, _MADV_DONTNEED)
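The effect of MADV_DONTNEED can be seen from user code with the standard `syscall` package. This sketch is Linux-specific: on Linux, private anonymous pages released this way read back as zeros on the next access, while the address range stays mapped. The helper name `releaseDemo` is made up for the example, and this is not the runtime's own code path.

```go
package main

import (
	"fmt"
	"syscall"
)

// releaseDemo maps size bytes, dirties the first byte, tells the kernel the
// pages are no longer needed, and returns the byte before and after.
func releaseDemo(size int) (before, after byte, err error) {
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Munmap(mem)

	mem[0] = 0xFF
	before = mem[0]

	// The mapping stays valid, but the kernel may drop the physical pages.
	if err := syscall.Madvise(mem, syscall.MADV_DONTNEED); err != nil {
		return before, 0, err
	}
	return before, mem[0], nil
}

func main() {
	before, after, err := releaseDemo(4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("before madvise: %#x, after madvise: %#x\n", before, after)
}
```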
  70. Heap statistics via runtime.MemStats:
        stats := runtime.MemStats{}
        runtime.ReadMemStats(&stats)

        type MemStats struct {
            ...
            // Heap memory statistics.
            HeapAlloc    uint64
            HeapSys      uint64
            HeapIdle     uint64
            HeapInuse    uint64
            HeapReleased uint64
            HeapObjects  uint64
            ...
        }
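A small program can put these fields to use by reading them before and after allocating some memory. The sketch below is only an illustration; the amount allocated and the `heapStats` helper are arbitrary choices, and the exact numbers depend on the Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
)

// keep holds references so the allocations stay live.
var keep [][]byte

// heapStats returns a snapshot of the runtime's memory statistics.
func heapStats() runtime.MemStats {
	var s runtime.MemStats
	runtime.ReadMemStats(&s)
	return s
}

func main() {
	before := heapStats()
	for i := 0; i < 100; i++ {
		keep = append(keep, make([]byte, 64<<10)) // 100 large (64 kB) objects
	}
	after := heapStats()

	fmt.Printf("HeapAlloc:    %d -> %d bytes\n", before.HeapAlloc, after.HeapAlloc)
	fmt.Printf("HeapSys:      %d -> %d bytes\n", before.HeapSys, after.HeapSys)
	fmt.Printf("HeapInuse:    %d -> %d bytes\n", before.HeapInuse, after.HeapInuse)
	fmt.Printf("HeapIdle:     %d -> %d bytes\n", before.HeapIdle, after.HeapIdle)
	fmt.Printf("HeapReleased: %d -> %d bytes\n", before.HeapReleased, after.HeapReleased)
	fmt.Printf("HeapObjects:  %d -> %d\n", before.HeapObjects, after.HeapObjects)
}
```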
  71. References
    1. http://goog-perftools.sourceforge.net/doc/tcmalloc.html
    2. https://www.ardanlabs.com/blog/2017/05/language-mechanics-on-stacks-and-pointers.html
    3. https://gabrieletolomei.wordpress.com/miscellanea/operating-systems/in-memory-layout/
    4. Lec 10 | MIT 6.172 - https://www.youtube.com/watch?v=p0bc1f6ULxw
    5. https://faculty.washington.edu/aragon/pubs/rst89.pdf
    6. http://man7.org/linux/man-pages/man2/mmap.2.html
    7. http://man7.org/linux/man-pages/man2/madvise.2.html
    8. https://nostarch.com/tlpi
  72. Thanks!
    andrestc.com
    @andresantostc