Understanding Go Memory Allocation - Gophercon UK

Ever wondered how Go manages memory allocation? In this talk we are going to explore Go’s memory allocator and see how its algorithm interacts with the operating system to manage memory!

André Carvalho

August 03, 2018

Transcript

  2. Virtual Memory
    • Processes do not read directly from physical memory
      ◦ Security
      ◦ Coordination between multiple processes
    • Virtual memory abstracts that away from the processes
      ◦ Segmentation
      ◦ Page tables
  3. Virtual Memory
    [Diagram: the process's pages (Page 0 to Page 7) are mapped through the page table to frames in RAM (Frame 0 to Frame 7); other labels: Disk, Other process]
  4. Running two instances at the same time:
      $ ./vmemory
      53 at 0xc420016110
      $ ./vmemory
      68 at 0xc420016110
    Same virtual address, different values: each process has its own virtual address space.
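The deck does not show the source of `vmemory`. A minimal hypothetical version in Go could store the process ID in a heap-allocated `int` and print the value together with its address; `describe` is an illustrative helper, not code from the talk. Started twice at the same time, the two processes print different values and may well report the same address, since each one has its own virtual address space. The exact address depends on the Go version and OS.

```go
package main

import (
	"fmt"
	"os"
)

// describe formats a value together with the virtual address it lives at.
func describe(p *int) string {
	return fmt.Sprintf("%d at %p", *p, p)
}

func main() {
	v := new(int)    // hypothetical stand-in for the demo's variable
	*v = os.Getpid() // differs between the two running instances
	fmt.Println(describe(v))
}
```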
  5. Process Memory Layout
    • Text: code
    • Data: initialized static variables
    • BSS: uninitialized static variables
    • Heap: dynamically allocated variables (ends at the program break)
    • Stack: function stack frames
  6. Stack Allocation
    [Diagram: the stack, split by the stack pointer (SP) into a used and an unused region]
    • Allocation: SP += size; return Stack[SP-size];
    • Deallocation: SP -= size;
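The two stack operations above amount to a bump pointer. A minimal sketch, modelling the stack as a Go byte slice that grows upward as on the slide (real hardware stacks often grow downward); `stack`, `alloc` and `free` are hypothetical names:

```go
package main

import "fmt"

// stack models the slide: bytes below sp are used, bytes from sp up are unused.
type stack struct {
	mem []byte
	sp  int // offset of the first unused byte
}

func newStack(capacity int) *stack {
	return &stack{mem: make([]byte, capacity)}
}

// alloc implements SP += size; return Stack[SP-size].
func (s *stack) alloc(size int) ([]byte, error) {
	if size < 0 {
		return nil, fmt.Errorf("negative size %d", size)
	}
	if s.sp+size > len(s.mem) {
		return nil, fmt.Errorf("stack overflow: need %d bytes, %d free", size, len(s.mem)-s.sp)
	}
	s.sp += size
	// The three-index slice caps the frame so appends cannot spill into the next one.
	return s.mem[s.sp-size : s.sp : s.sp], nil
}

// free implements SP -= size. Frames must be released in LIFO order,
// so no per-allocation bookkeeping is needed.
func (s *stack) free(size int) {
	if size < 0 || size > s.sp {
		panic("stack underflow")
	}
	s.sp -= size
}

func main() {
	s := newStack(64)
	f, _ := s.alloc(16)               // frame for a function f
	g, _ := s.alloc(8)                // frame for g, called by f
	fmt.Println(len(f), len(g), s.sp) // 16 8 24
	s.free(8)                         // g returns
	s.free(16)                        // f returns
	fmt.Println(s.sp)                 // 0
}
```

Because frames are released in strict LIFO order, deallocation is a single subtraction; heap objects have no such ordering, which is why the heap needs a real allocator.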
  7. Heap Allocation
    • For objects whose size is only known at runtime
    • C provides malloc and free
    • C++ provides new and delete
    • Go uses escape analysis and has garbage collection
  8. Minimal Allocator
    We need to implement two functions:
      void* malloc(size_t size)
      void free(void *ptr)
  9. Minimal Allocator
    [Diagram: Application → malloc/free → Allocator → mmap/munmap/madvise → OS]
    The allocator uses syscalls like mmap/munmap to allocate and deallocate memory.
  10. Minimal Allocator - Allocating
    [Diagram: the process's virtual address space, with a region at 0x000000c000000000]
      mmap(0x000000c000000000,          // start address
           4096,                        // size
           PROT_WRITE | PROT_READ,      // permission
           MAP_PRIVATE | MAP_ANONYMOUS, // flags
           ...)
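  11. Minimal Allocator - Allocating
    malloc(10)
    [Diagram: the mapped region split into a header ("Head"), the 10-byte block starting at p, and a remaining region labelled 4062]
    The allocator returns p, which points right after the header.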
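One way to get the header-then-payload layout is a first-fit free list. The sketch below is an assumption, not the talk's implementation: a byte slice stands in for the mmap'd page, offsets stand in for pointers, and each block starts with a hypothetical 8-byte header (payload size plus a free flag), so its numbers differ from the slide's 4062. Requests are rounded up to 8 bytes for alignment.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	arenaSize  = 4096 // one page, as if obtained from a single mmap call
	headerSize = 8    // 4 bytes payload size + 4 bytes free flag
	align      = 8
)

type allocator struct {
	arena []byte // stands in for the memory obtained from the OS
}

func newAllocator() *allocator {
	a := &allocator{arena: make([]byte, arenaSize)}
	a.writeHeader(0, arenaSize-headerSize, true) // one free block spanning the arena
	return a
}

func (a *allocator) writeHeader(off, size int, free bool) {
	binary.LittleEndian.PutUint32(a.arena[off:], uint32(size))
	var flag uint32
	if free {
		flag = 1
	}
	binary.LittleEndian.PutUint32(a.arena[off+4:], flag)
}

func (a *allocator) readHeader(off int) (size int, free bool) {
	size = int(binary.LittleEndian.Uint32(a.arena[off:]))
	free = binary.LittleEndian.Uint32(a.arena[off+4:]) == 1
	return size, free
}

// malloc returns the offset of a payload of at least n bytes, located right
// after its header, or -1 if no free block is large enough.
func (a *allocator) malloc(n int) int {
	if n <= 0 {
		return -1
	}
	n = (n + align - 1) &^ (align - 1)
	for off := 0; off < len(a.arena); {
		size, free := a.readHeader(off)
		if free && size >= n {
			if size >= n+headerSize+align {
				// Split: keep n bytes, turn the rest into a new free block.
				a.writeHeader(off, n, false)
				a.writeHeader(off+headerSize+n, size-n-headerSize, true)
			} else {
				a.writeHeader(off, size, false)
			}
			return off + headerSize
		}
		off += headerSize + size // jump to the next block's header
	}
	return -1
}

// free finds the header just before p and marks the block free again.
func (a *allocator) free(p int) {
	off := p - headerSize
	size, _ := a.readHeader(off)
	a.writeHeader(off, size, true)
}

func main() {
	a := newAllocator()
	p := a.malloc(10)
	q := a.malloc(100)
	fmt.Println(p, q) // 8 32
	a.free(p)
	fmt.Println(a.malloc(8)) // 8: the freed block is reused
}
```

This free never merges adjacent free blocks, so a long-running program would fragment the arena, which is the first issue on the next slide.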
  12. Minimal Allocator
    • Can be implemented in a few hundred lines of code
    • Issues
      ◦ Fragmentation
      ◦ Corruption
      ◦ Releasing memory back to the OS
        ▪ When?
        ▪ How? munmap, madvise...
      ◦ Multithreading
      ◦ ...
  13. Thread-Caching Malloc (TCMalloc)
    • Originally implemented by Google for C programs
    • Served as the basis for Go’s runtime allocator
    • Reduces lock contention in multithreaded programs
  14. TCMalloc
    • Each thread has a local cache
    • Two types of allocations
      ◦ Small allocations (<= 32 kB)
      ◦ Large allocations
    • Manages memory in units called spans
      ◦ Runs of contiguous memory pages
      ◦ Metadata is kept separate from the allocation arena
  15. TCMalloc - Large Allocations
    • Served by the central heap
    • Requested size is rounded up to a whole number of pages (4 kB each)
      malloc(34 kB) ⇒ malloc(36 kB) ⇒ 9 pages
      malloc(33 kB) ⇒ malloc(36 kB) ⇒ 9 pages
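Rounding a large request up to whole pages is a ceiling division. A minimal Go sketch, assuming the 4 kB page size from the slide; `pagesFor` is a hypothetical helper, not TCMalloc's code:

```go
package main

import "fmt"

const pageSize = 4 << 10 // 4 kB, the page size used on the slide

// pagesFor returns how many whole pages are needed to hold size bytes
// (a ceiling division).
func pagesFor(size int) int {
	return (size + pageSize - 1) / pageSize
}

func main() {
	for _, kb := range []int{34, 33} {
		n := pagesFor(kb << 10)
		fmt.Printf("malloc(%d kB) => malloc(%d kB) => %d pages\n", kb, n*pageSize>>10, n)
	}
}
```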
  16. TCMalloc - Large Allocations
    [Diagram: the central heap keeps free lists of spans indexed by length in pages: 1 page, 2 pages, ..., 254 pages, > 255 pages]
  17. TCMalloc - Small Allocations
    • Served by the local thread cache
    • Requested size is rounded up to one of the size classes
      malloc(4 bytes) ⇒ malloc(8 bytes)
      malloc(6 bytes) ⇒ malloc(8 bytes)
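Rounding up to a size class is a search for the first class at least as large as the request. A sketch with a truncated, illustrative class table (its entries are the first few classes of Go's own table, not TCMalloc's); `roundToClass` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
)

// classSizes is an illustrative, truncated, sorted size-class table.
var classSizes = []int{8, 16, 32, 48, 64, 80, 96, 112, 128}

// roundToClass returns the smallest class size that can hold size bytes,
// or ok == false if the request is larger than every class in the table.
func roundToClass(size int) (rounded int, ok bool) {
	i := sort.SearchInts(classSizes, size) // first index with classSizes[i] >= size
	if i == len(classSizes) {
		return 0, false
	}
	return classSizes[i], true
}

func main() {
	for _, n := range []int{4, 6, 33} {
		r, _ := roundToClass(n)
		fmt.Printf("malloc(%d bytes) => malloc(%d bytes)\n", n, r)
	}
}
```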
  18. TCMalloc - Small Allocations
    [Diagram: the local thread cache holds one free list per size class (Class 0, Class 1, Class 2, ...); the central free list for Class 1 holds spans, each a run of contiguous pages]
  19. TCMalloc - Small Allocations
    [Diagram: the local thread cache (Class 0, Class 1, Class 2, ...) and the central free list for Class 1 with its spans]
  20. TCMalloc - Small Allocations
    [Diagram: the central free list for Class 1 and the central heap, whose free lists hold spans of 1 page, 2 pages, ..., > 255 pages]
  21. TCMalloc - Small Allocations
    [Diagram: the central free list for Class 1 and the central heap (1 page, 2 pages, ..., > 255 pages); one span has moved from the central heap to the Class 1 central free list]
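  22. TCMalloc - Small Allocations
    [Diagram: Application → Local Thread Cache → Central Free List → Central Heap → OS]
      ◦ Application ↔ local thread cache: 4 bytes
      ◦ Local thread cache ↔ central free list: N 8-byte objects
      ◦ Central free list ↔ central heap: X pages
      ◦ Central heap ↔ OS: Y*X pages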
  23. TCMalloc - Deallocation
    [Diagram: pages 1 to 6 grouped into spans A, B and C]
  24. TCMalloc - Deallocation
    free(p) on a small object: p → page → span; the object goes back to its size-class list (Class 0, Class 1, Class 2, ...) in the local thread cache.
  25. TCMalloc - Deallocation
    free(p) on a large object: p → page → span
    [Diagram: pages 1 to 4 belonging to spans A, B and C]
  26. TCMalloc - Deallocation
    free(p) on a large object: p → page → span
    [Diagram: pages 1 to 4, now covered by spans A and B only; the freed span has been merged with a free neighbour]
  27. TCMalloc - Deallocation
    free(p) on a large object: p → page → span
    [Diagram: span B goes back to the central heap, on the free list for its length (1 page, 2 pages, ..., > 255 pages)]
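A simplified model of the large-object deallocation path, under stated assumptions: a page map (one entry per page, here a Go slice) finds the span that owns an address, and the freed span is merged with free neighbours on either side. The names `pageHeap`, `setSpan` and `freeLarge` are hypothetical, and real TCMalloc keeps more state per span than this.

```go
package main

import "fmt"

const pageSize = 4096

// span is a run of contiguous pages; free reports whether it is unused.
type span struct {
	start, npages int
	free          bool
}

// pageHeap maps every page number to the span that owns it.
type pageHeap struct {
	pageMap []*span
}

func (h *pageHeap) setSpan(s *span) {
	for p := s.start; p < s.start+s.npages; p++ {
		h.pageMap[p] = s
	}
}

// freeLarge releases the span containing addr, merges it with free
// neighbours on either side, and returns the resulting span.
func (h *pageHeap) freeLarge(addr int) *span {
	s := h.pageMap[addr/pageSize] // address -> page -> span
	s.free = true
	if s.start > 0 {
		if left := h.pageMap[s.start-1]; left != nil && left.free {
			left.npages += s.npages
			s = left
		}
	}
	if end := s.start + s.npages; end < len(h.pageMap) {
		if right := h.pageMap[end]; right != nil && right.free {
			s.npages += right.npages
		}
	}
	h.setSpan(s) // point every page of the merged run at the merged span
	return s
}

func main() {
	h := &pageHeap{pageMap: make([]*span, 6)}
	for _, s := range []*span{
		{start: 0, npages: 2, free: true},  // span A
		{start: 2, npages: 1, free: false}, // span B, holds a large object
		{start: 3, npages: 3, free: true},  // span C
	} {
		h.setSpan(s)
	}
	merged := h.freeLarge(2*pageSize + 100)  // an address inside span B
	fmt.Println(merged.start, merged.npages) // 0 6
}
```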
  29. main.go:
      package main

      func main() {
          f()
      }

      //go:noinline
      func f() *int {
          i := 10
          return &i
      }

      $ go build -gcflags "-m -m" main.go
      # command-line-arguments
      ./main.go:8:6: cannot inline f: marked go:noinline
      ./main.go:3:6: cannot inline main: non-leaf function
      ./main.go:10:9: &i escapes to heap
      ./main.go:10:9: from ~r0 (return) at ./main.go:10:2
      ./main.go:9:2: moved to heap: i
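The same effect can be observed at run time with `testing.AllocsPerRun`, which reports the average number of heap allocations per call. A sketch with hypothetical functions: `escapes` mirrors `f` above, while `staysOnStack` takes an address that never leaves the function. With a normal optimized build the first is expected to report one allocation per call and the second none, though exact results depend on the compiler version.

```go
package main

import (
	"fmt"
	"testing"
)

var sink *int // package-level sink keeps the result alive

// escapes mirrors f from the slide: &i outlives the call, so i moves to the heap.
//
//go:noinline
func escapes() *int {
	i := 10
	return &i
}

// staysOnStack takes an address that never leaves the function,
// so escape analysis can keep i on the stack.
//
//go:noinline
func staysOnStack() int {
	i := 10
	p := &i
	return *p
}

func main() {
	heapAllocs := testing.AllocsPerRun(1000, func() { sink = escapes() })
	stackAllocs := testing.AllocsPerRun(1000, func() { _ = staysOnStack() })
	fmt.Println(heapAllocs, stackAllocs)
}
```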
  30. $ go tool compile -S main.go
      ...
      0x001d 00029 (main.go:9) LEAQ type.int(SB), AX
      0x0024 00036 (main.go:9) MOVQ AX, (SP)
      0x0028 00040 (main.go:9) PCDATA $0, $0
      0x0028 00040 (main.go:9) CALL runtime.newobject(SB)
      ...
  31. Same compiler output as slide 30; the CALL runtime.newobject(SB) instruction enters the runtime, where newobject calls mallocgc:
      func newobject(typ *_type) unsafe.Pointer {
          return mallocgc(typ.size, typ, true)
      }
  32. Go’s Allocator
    • Based on TCMalloc
    • Garbage collector
      ◦ Tightly coupled with the allocator
      ◦ Makes it hard (impossible?) to replace with other implementations
    • Three types of allocations
      ◦ Tiny allocations (size < 16 bytes, no pointers)
      ◦ Small allocations (size <= 32 kB)
      ◦ Large allocations
  33. Go’s Allocator - Sweeping
    Garbage collector ⇒ concurrent mark and sweep
      1. Scan all objects
      2. Mark objects that are live
      3. Sweep objects that are not live
         a. In the background
         b. In response to allocations
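A toy, stop-the-world version of mark and sweep, for intuition only: it marks everything reachable from a set of roots, then sweeps the rest in one pass. Go's collector is concurrent and sweeps lazily (in the background and in response to allocations), none of which this sketch models; all names are hypothetical.

```go
package main

import "fmt"

// object is a heap object with outgoing pointers to other objects.
type object struct {
	name   string
	refs   []*object
	marked bool
}

// heap tracks every allocated object plus the roots (globals, stacks).
type heap struct {
	objects []*object
	roots   []*object
}

func (h *heap) alloc(name string) *object {
	o := &object{name: name}
	h.objects = append(h.objects, o)
	return o
}

// mark sets marked on every object reachable from the roots.
func (h *heap) mark() {
	work := append([]*object(nil), h.roots...)
	for len(work) > 0 {
		o := work[len(work)-1]
		work = work[:len(work)-1]
		if o.marked {
			continue
		}
		o.marked = true
		work = append(work, o.refs...)
	}
}

// sweep frees every unmarked object and clears the marks of survivors so
// the next cycle starts fresh. It returns the names of freed objects.
func (h *heap) sweep() []string {
	var freed []string
	live := h.objects[:0]
	for _, o := range h.objects {
		if o.marked {
			o.marked = false
			live = append(live, o)
		} else {
			freed = append(freed, o.name)
		}
	}
	for i := len(live); i < len(h.objects); i++ {
		h.objects[i] = nil // drop references held by the old tail
	}
	h.objects = live
	return freed
}

func main() {
	h := &heap{}
	a := h.alloc("a")
	b := h.alloc("b")
	h.alloc("c") // never referenced
	a.refs = []*object{b}
	h.roots = []*object{a}

	h.mark()
	freed := h.sweep()
	fmt.Println(freed, len(h.objects)) // [c] 2
}
```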
  34. Go’s Allocator - Large Allocations
    [Diagram: mheap with free lists of spans indexed by length (1 page, 2 pages, ..., > 255 pages) and a list of busy spans]
    Before allocating, mheap sweeps the requested number of pages.
  35. Go’s Allocator - Large Allocations
    [Diagram: mheap free spans, in lists indexed by length (1 page, 2 pages, ..., > 255 pages)]
  36. Go’s Allocator - Large Allocations
    [Diagram: mheap free spans in lists indexed by length (1 page, 2 pages, ..., > 255 pages); the largest free spans are kept in an mtreap]
    mtreap ⇒ a randomized binary search tree (treap)
  37. Go’s Allocator - Large Allocations
    After allocating, depending on the total amount of live memory, the goroutine may perform additional work for the GC!
  38. Go’s Allocator - Small Allocations
    Each logical processor (P) has a local cache (mcache).
    [Diagram: P 1 and P 2, each with its own mcache]
  39. Go’s Allocator - Small Allocations
    Each mcache maintains a span for each size class.
    [Diagram: the mcaches of P 1 and P 2, each holding one span per size class (class 0, class 1, ...)]
  40. Go’s Allocator - Small Allocations
      class  bytes/obj  bytes/span  objects
          1          8        8192     1024
          2         16        8192      512
          3         32        8192      256
          4         48        8192      170
        ...
         65      28672       57344        2
         66      32768       32768        1
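The objects column follows from the other two: bytes/span divided by bytes/obj, rounded down, with the remainder left unused at the end of the span. A small sketch that recomputes a few rows; in Go's size-class table class 4 is 48 bytes, which is what yields 170 objects per 8192-byte span (8192 / 64 would give 128). `sizeClass` and `objectsPerSpan` are hypothetical names.

```go
package main

import "fmt"

// sizeClass holds one row of the table, sizes in bytes.
type sizeClass struct {
	id, objBytes, spanBytes int
}

var table = []sizeClass{
	{1, 8, 8192}, {2, 16, 8192}, {3, 32, 8192}, {4, 48, 8192},
	{65, 28672, 57344}, {66, 32768, 32768},
}

// objectsPerSpan returns how many objects fit in one span and how many
// bytes at the end of the span are left unused ("tail waste").
func objectsPerSpan(c sizeClass) (objects, tailWaste int) {
	return c.spanBytes / c.objBytes, c.spanBytes % c.objBytes
}

func main() {
	for _, c := range table {
		n, waste := objectsPerSpan(c)
		fmt.Printf("class %2d: %5d bytes/obj %5d bytes/span %4d objects %2d bytes tail waste\n",
			c.id, c.objBytes, c.spanBytes, n, waste)
	}
}
```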
  41. Go’s Allocator - Small Allocations
    mcache returns the address of a free object in the span.
    [Diagram: P 1's mcache with one span per size class (class 0, class 1, ...)]
  42. Go’s Allocator - Small Allocations
    When the span has no free objects left, mcache requests a new span from mcentral for this size class.
    [Diagram: P 1's mcache with one span per size class (class 0, class 1, ...)]
  43. Go’s Allocator - Small Allocations
    Each mcentral has two linked lists of spans: empty and nonempty.
    [Diagram: P 1's mcache and one mcentral per size class (class 0, class 1, ...), each holding spans]
  44. Go’s Allocator - Small Allocations
    A span with free objects will be given to the mcache.
    [Diagram: a span moving from the mcentral to P 1's mcache]
  45. Go’s Allocator - Small Allocations
    If there are no nonempty spans, mcentral will try to sweep existing spans.
  46. Go’s Allocator - Small Allocations
    As a last resort, mcentral will ask mheap for a new span.
    [Diagram: the class 0 mcentral and mheap]
  47. Go’s Allocator - Small Allocations
    mcentral will give this span to the mcache.
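The refill path from slides 41 to 47 can be condensed into a small model. This is a hypothetical sketch, not runtime code: spans only count free slots, there is no locking or sweeping, and mheap always has memory. It shows the order of fallbacks: the mcache's current span, then a nonempty span from mcentral, then a fresh span from mheap.

```go
package main

import "fmt"

// span only tracks how many free object slots it has left.
type span struct {
	class int
	free  int
}

// mheap hands out fresh spans; in this sketch it never runs out.
type mheap struct{ spansCreated int }

func (h *mheap) allocSpan(class, objects int) *span {
	h.spansCreated++
	return &span{class: class, free: objects}
}

// mcentral holds the spans of one size class that still have free objects.
// The runtime also keeps a list of full spans and sweeps them; omitted here.
type mcentral struct {
	class    int
	objects  int     // objects per span for this class
	nonempty []*span // spans with at least one free object
	heap     *mheap
}

func (c *mcentral) cacheSpan() *span {
	if n := len(c.nonempty); n > 0 {
		s := c.nonempty[n-1]
		c.nonempty = c.nonempty[:n-1]
		return s
	}
	return c.heap.allocSpan(c.class, c.objects) // last resort: ask mheap
}

// mcache belongs to a single P, so allocating from it needs no lock.
type mcache struct {
	spans   map[int]*span // current span per size class
	central map[int]*mcentral
	refills int
}

func (m *mcache) alloc(class int) {
	s := m.spans[class]
	if s == nil || s.free == 0 {
		s = m.central[class].cacheSpan() // current span is full: get another one
		m.spans[class] = s
		m.refills++
	}
	s.free-- // hand out one object from the span
}

func main() {
	heap := &mheap{}
	cache := &mcache{
		spans:   map[int]*span{},
		central: map[int]*mcentral{1: {class: 1, objects: 4, heap: heap}},
	}
	for i := 0; i < 10; i++ {
		cache.alloc(1)
	}
	fmt.Println(cache.refills, heap.spansCreated) // 3 3
}
```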
  48. Go’s Allocator - Tiny Allocations
    Allocations for objects with no pointers and size < 16 bytes.
    "The main targets of tiny allocator are small strings and standalone escaping variables. On a json benchmark the allocator reduces number of allocations by ~12% and reduces heap size by ~20%."
  49. Go’s Allocator - Tiny Allocations
    [Diagram: a 16-byte block with allocated and free regions]
    • Each P keeps a 16-byte object allocated from a span
    • Each tiny allocation appends a subobject to it
  50. Go’s Allocator - Tiny Allocations
    [Diagram: P 1, its mcache, and the current tiny object with allocated and free regions]
    • When a new subobject does not fit, grab a new object from the mcache (≃ a small allocation)
    • Eventually, the GC will deallocate the old object
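A sketch of bump-style tiny allocation within one block. It assumes the 16-byte block size used in the Go runtime source (`maxTinySize`) and loosely follows the runtime's size-based alignment; blocks are numbered instead of having addresses, and `tinyAllocator` is a hypothetical type.

```go
package main

import "fmt"

// maxTinySize is the tiny block size used by the Go runtime (16 bytes).
const maxTinySize = 16

func alignUp(n, a int) int { return (n + a - 1) &^ (a - 1) }

// tinyAllocator models one P's tiny block: how many blocks have been taken
// from the mcache and how much of the current one is used.
type tinyAllocator struct {
	blocks int // number of 16-byte blocks taken so far
	offset int // bytes used in the current block
}

// alloc places a pointer-free object of size < maxTinySize and returns the
// block it landed in and its offset inside that block.
func (ta *tinyAllocator) alloc(size int) (block, off int) {
	if size <= 0 || size >= maxTinySize {
		panic("not a tiny allocation")
	}
	off = ta.offset
	// Conservative alignment chosen from the size.
	switch {
	case size&7 == 0:
		off = alignUp(off, 8)
	case size&3 == 0:
		off = alignUp(off, 4)
	case size&1 == 0:
		off = alignUp(off, 2)
	}
	if ta.blocks > 0 && off+size <= maxTinySize {
		ta.offset = off + size // append a subobject to the current block
		return ta.blocks - 1, off
	}
	// Does not fit: take a fresh block. The old block is freed by the GC
	// once none of its subobjects are reachable.
	ta.blocks++
	ta.offset = size
	return ta.blocks - 1, 0
}

func main() {
	var ta tinyAllocator
	for _, size := range []int{5, 3, 8, 1} {
		b, off := ta.alloc(size)
		fmt.Printf("alloc(%d) -> block %d, offset %d\n", size, b, off)
	}
}
```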
  51. Go’s Allocator - Releasing memory to the OS
    • The runtime periodically releases memory to the OS
    • It releases spans that were swept more than 5 minutes ago
    • On Linux, it uses the madvise(2) syscall:
        madvise(addr, size, _MADV_DONTNEED)
  52. stats := runtime.MemStats{}
      runtime.ReadMemStats(&stats)

      type MemStats struct {
          ...
          // Heap memory statistics.
          HeapAlloc    uint64
          HeapSys      uint64
          HeapIdle     uint64
          HeapInuse    uint64
          HeapReleased uint64
          HeapObjects  uint64
          ...
      }
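A runnable sketch that prints the heap fields above around a large allocation, and uses `debug.FreeOSMemory` from `runtime/debug` to force a garbage collection and return memory to the OS right away instead of waiting for the periodic release. The 64 MB size is arbitrary, and the printed numbers vary by Go version and OS.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

var sink []byte

// heapStats takes a snapshot of the runtime's memory statistics.
func heapStats() runtime.MemStats {
	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)
	return stats
}

func printHeap(label string, s runtime.MemStats) {
	fmt.Printf("%-8s HeapAlloc=%d HeapSys=%d HeapIdle=%d HeapInuse=%d HeapReleased=%d HeapObjects=%d\n",
		label, s.HeapAlloc, s.HeapSys, s.HeapIdle, s.HeapInuse, s.HeapReleased, s.HeapObjects)
}

func main() {
	printHeap("start", heapStats())

	sink = make([]byte, 64<<20) // a large allocation
	printHeap("alloc", heapStats())

	sink = nil           // drop the only reference
	debug.FreeOSMemory() // force a GC, then return as much memory to the OS as possible
	printHeap("released", heapStats())
}
```

HeapIdle minus HeapReleased estimates the memory the runtime is holding on to but could return to the OS.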