Slide 1

Slide 1 text

Understanding Go Memory Allocation
André Carvalho
@andresantostc

Slide 2

Slide 2 text

Developer @ tsuru
andrestc.com

Slide 3

Slide 3 text


Slide 4

Slide 4 text

Virtual Memory
● Processes do not read directly from physical memory
  ○ Security
  ○ Coordination between multiple processes
● Virtual memory abstracts that away from the processes
  ○ Segmentation
  ○ Page tables

Slide 5

Slide 5 text

Virtual Memory
[Diagram: the pages of a process (Page 0 to Page 7) are mapped through a page table to frames in RAM (Frame 0 to Frame 7); some frames belong to another process and some pages live on disk. Example page table entry: page 3 maps to frame 6.]
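
The lookup in the diagram can be sketched in Go. This is only a toy model: the 4096-byte page size, the `pageTable` type and the `translate` helper are assumptions made for illustration, and real address translation is done by the MMU and the kernel rather than by user code.

```go
package main

import "fmt"

// pageSize is an assumed page size of 4 kB.
const pageSize = 4096

// pageTable is a hypothetical per-process map from page number to frame number.
type pageTable map[uint64]uint64

// translate splits a virtual address into page number and offset, then
// swaps the page number for the frame number found in the table.
func (pt pageTable) translate(vaddr uint64) (uint64, bool) {
	page, offset := vaddr/pageSize, vaddr%pageSize
	frame, ok := pt[page]
	if !ok {
		return 0, false // unmapped page: a real system would raise a page fault
	}
	return frame*pageSize + offset, true
}

func main() {
	pt := pageTable{3: 6} // page 3 lives in frame 6
	paddr, ok := pt.translate(3*pageSize + 42)
	fmt.Println(paddr, ok) // 24618 true
}
```

Two processes can hold different tables, which is why the same virtual address may refer to different physical memory in each of them.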

Slide 6

Slide 6 text

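    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    func main() {
        rand.Seed(time.Now().UnixNano())
        i := rand.Intn(100)
        fmt.Printf("%v at %p\n", i, &i)
        // Spin forever so that a second instance can run at the same time.
        for {
        }
    }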

Slide 7

Slide 7 text

    $ ./vmemory
    53 at 0xc420016110
    $ ./vmemory
    68 at 0xc420016110

Running two instances at the same time: different values, same virtual address.

Slide 8

Slide 8 text

Process Memory Layout
● Text: code
● Data: initialized static variables
● BSS: uninitialized static variables
● Heap: dynamically allocated variables (the program break marks its end)
● Stack: function stack frames

Slide 9

Slide 9 text

Stack Allocation
[Diagram: a stack split into a used region and an unused region, with the stack pointer (SP) at the boundary]
Allocation:   SP += size; return Stack[SP-size];
Deallocation: SP -= size;
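
The two operations above can be sketched in Go as a bump allocator over a byte slice. The `stack` type and its `alloc` and `free` methods are hypothetical names used for illustration; a real call stack is managed by the compiler and the runtime.

```go
package main

import "fmt"

// stack is a toy bump allocator over a fixed byte slice.
type stack struct {
	mem []byte
	sp  int // index of the first unused byte
}

// alloc reserves size bytes by advancing sp, as in "SP += size".
func (s *stack) alloc(size int) []byte {
	if s.sp+size > len(s.mem) {
		return nil // out of stack space
	}
	s.sp += size
	return s.mem[s.sp-size : s.sp]
}

// free releases the most recent size bytes, as in "SP -= size".
func (s *stack) free(size int) {
	s.sp -= size
}

func main() {
	s := &stack{mem: make([]byte, 64)}
	frame := s.alloc(16)
	fmt.Println(len(frame), s.sp) // 16 16
	s.free(16)
	fmt.Println(s.sp) // 0
}
```

Both operations are a single addition or subtraction, which is why stack allocation is so cheap compared to heap allocation.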

Slide 10

Slide 10 text

Heap Allocation
● For objects with size only known at runtime
● C provides malloc and free
● C++ provides new and delete
● Go uses escape analysis and has garbage collection

Slide 11

Slide 11 text

Minimal Allocator

Slide 12

Slide 12 text

Minimal Allocator
We need to implement two functions:

    void *malloc(size_t size);
    void free(void *ptr);

Slide 13

Slide 13 text

Minimal Allocator
[Diagram: Application, Allocator, OS. The application calls malloc and free on the allocator; the allocator calls mmap, munmap and madvise on the OS.]
Allocator uses syscalls like mmap/munmap to allocate/deallocate

Slide 14

Slide 14 text

Minimal Allocator
Linked list with free objects
[Diagram: Head points to a header (size=n, next=*) followed by n bytes; its next pointer leads to a header (size=m, next=nil) followed by m bytes.]

Slide 15

Slide 15 text

Minimal Allocator - Allocating
malloc(10)
[Diagram: Head is NULL, the free list is empty.]

Slide 16

Slide 16 text

Minimal Allocator - Allocating
[Diagram: virtual address space with a mapping at 0x000000c000000000]

    mmap(0x000000c000000000,           /* start address */
         4096,                         /* size */
         PROT_WRITE | PROT_READ,       /* permission */
         MAP_PRIVATE | MAP_ANONYMOUS,  /* flags */
         ...)

Slide 17

Slide 17 text

Minimal Allocator - Allocating
malloc(10)
[Diagram: Head points to a single free block covering the 4096-byte mapping: a 12-byte header followed by 4084 free bytes.]

Slide 18

Slide 18 text

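Minimal Allocator - Allocating
malloc(10)
[Diagram: the block is split. 22 bytes are carved out for the allocation (header plus 10 bytes); Head points to the remaining free block of 4062 bytes.]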

Slide 19

Slide 19 text

Minimal Allocator - Allocating
malloc(10)
[Diagram: Head points to the free block of 4062 bytes; p points into the 10-byte allocated block.]
Allocator returns p, which points right after the header

Slide 20

Slide 20 text

Minimal Allocator - Deallocating
free(p)
[Diagram: Head points to the free block of 4062 bytes.]

Slide 21

Slide 21 text

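Minimal Allocator - Deallocating
free(p)
[Diagram: the header of the 10-byte block is found at p - size(header); the block joins the free list alongside the 4062-byte block.]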
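
The allocation and deallocation steps above can be sketched in Go as a first-fit free list. This is an illustration under stated assumptions, not the allocator from the talk: a byte slice stands in for the mmap'ed region, offsets stand in for pointers, and the `Allocator` type, its methods and the 12-byte header layout (4-byte size plus 8-byte next offset) are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	headerSize = 12 // assumed layout: 4-byte size + 8-byte next offset
	nilOffset  = -1 // plays the role of a NULL next pointer
)

// Allocator manages free blocks inside arena; head is the offset of the
// first free block's header.
type Allocator struct {
	arena []byte
	head  int
}

func NewAllocator(n int) *Allocator {
	a := &Allocator{arena: make([]byte, n)}
	a.writeHeader(0, n-headerSize, nilOffset) // one big free block
	return a
}

func (a *Allocator) writeHeader(off, size, next int) {
	binary.LittleEndian.PutUint32(a.arena[off:], uint32(size))
	binary.LittleEndian.PutUint64(a.arena[off+4:], uint64(int64(next)))
}

func (a *Allocator) readHeader(off int) (size, next int) {
	size = int(binary.LittleEndian.Uint32(a.arena[off:]))
	next = int(int64(binary.LittleEndian.Uint64(a.arena[off+4:])))
	return size, next
}

// Malloc returns the offset right after the header of the first free block
// that fits, splitting the block when the remainder can hold a header.
func (a *Allocator) Malloc(size int) int {
	prev, cur := nilOffset, a.head
	for cur != nilOffset {
		blockSize, next := a.readHeader(cur)
		if blockSize >= size {
			replacement := next
			if blockSize-size > headerSize {
				replacement = cur + headerSize + size
				a.writeHeader(replacement, blockSize-size-headerSize, next)
				a.writeHeader(cur, size, nilOffset)
			}
			if prev == nilOffset {
				a.head = replacement
			} else {
				prevSize, _ := a.readHeader(prev)
				a.writeHeader(prev, prevSize, replacement)
			}
			return cur + headerSize
		}
		prev, cur = cur, next
	}
	return nilOffset // a real allocator would mmap more memory here
}

// Free finds the header at p - headerSize and pushes the block onto the list.
func (a *Allocator) Free(p int) {
	block := p - headerSize
	size, _ := a.readHeader(block)
	a.writeHeader(block, size, a.head)
	a.head = block
}

func main() {
	a := NewAllocator(4096)
	p := a.Malloc(10)
	free, _ := a.readHeader(a.head)
	fmt.Println(p, free) // 12 4062
	a.Free(p)
	size, next := a.readHeader(a.head)
	fmt.Println(size, next) // 10 22
}
```

The sketch never merges neighbouring free blocks, so repeated use fragments the arena into small pieces.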

Slide 22

Slide 22 text

Minimal Allocator
● Can be implemented in a few hundred LOCs
● Issues
  ○ Fragmentation
  ○ Corruption
  ○ Releasing memory back to OS
    ■ When?
    ■ How? munmap, madvise...
  ○ Multi-thread
  ○ ...

Slide 23

Slide 23 text

Go Runtime Allocator
● TCMalloc
● Invoking the Allocator
● Go’s Allocator

Slide 24

Slide 24 text

Thread-Caching Malloc (TCMalloc)
● Originally implemented for the C language by Google
● Served as basis for Go’s runtime allocator
● Reduces lock contention for multithreaded programs

Slide 25

Slide 25 text

TCMalloc
● Each thread has a local cache
● Two types of allocations
  ○ Small allocations (<= 32 kB)
  ○ Large allocations
● Manages memory in units called Spans
  ○ Runs of contiguous memory pages
  ○ Metadata is kept separate from the allocation arena

Slide 26

Slide 26 text

TCMalloc - Large Allocations
● Served by the central heap
● Requested size is rounded up to a whole number of pages (4 kB each)

malloc(34 kB) ⇒ malloc(36 kB) ⇒ 9 pages
malloc(33 kB) ⇒ malloc(36 kB) ⇒ 9 pages
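
The rounding above is a ceiling division by the page size. A minimal Go sketch, assuming the 4 kB page size from the slide; `pagesFor` is a hypothetical helper, not part of TCMalloc:

```go
package main

import "fmt"

// pageSize is the 4 kB page size used on the slide.
const pageSize = 4 << 10

// pagesFor returns how many whole pages are needed to hold size bytes.
func pagesFor(size int) int {
	return (size + pageSize - 1) / pageSize
}

func main() {
	for _, kb := range []int{33, 34} {
		pages := pagesFor(kb << 10)
		rounded := (pages * pageSize) >> 10
		fmt.Printf("malloc(%d kB) => malloc(%d kB) => %d pages\n", kb, rounded, pages)
	}
}
```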

Slide 27

Slide 27 text

TCMalloc - Large Allocations
[Diagram: the central heap keeps lists of free spans indexed by length: 1 page, 2 pages, ..., 254 pages, > 255 pages.]

Slide 28

Slide 28 text

TCMalloc - Large Allocations
[Diagram: the application requests 34 kB from the central heap and gets back X pages; the central heap requests X pages from the OS.]

Slide 29

Slide 29 text

TCMalloc - Small Allocations
● Served by the local thread cache
● Requested size is rounded up to one of the size classes

malloc(4 bytes) ⇒ malloc(8 bytes)
malloc(6 bytes) ⇒ malloc(8 bytes)
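
Rounding up to a size class is a search for the smallest class that fits the request. A minimal Go sketch; the `sizeClasses` values are made up for illustration and are not TCMalloc's real table, and `roundUp` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
)

// sizeClasses is an illustrative, sorted list of object sizes in bytes.
var sizeClasses = []int{8, 16, 32, 48, 64, 80, 96, 112, 128}

// roundUp returns the smallest size class that fits size,
// or false when size is larger than every class.
func roundUp(size int) (int, bool) {
	i := sort.SearchInts(sizeClasses, size)
	if i == len(sizeClasses) {
		return 0, false
	}
	return sizeClasses[i], true
}

func main() {
	for _, n := range []int{4, 6, 9} {
		c, _ := roundUp(n)
		fmt.Printf("malloc(%d bytes) => malloc(%d bytes)\n", n, c)
	}
}
```

Fixed classes trade a little wasted space per object for free lists in which every object has the same size.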

Slide 30

Slide 30 text

TCMalloc - Small Allocations
[Diagram: the local thread cache holds one list of free objects per size class (Class 0, Class 1, Class 2, ...).]

Slide 31

Slide 31 text

TCMalloc - Small Allocations
[Diagram: the local thread cache holds one list of free objects per size class (Class 0, Class 1, Class 2, ...).]

Slide 32

Slide 32 text

TCMalloc - Small Allocations
[Diagram: local thread cache (Class 0, Class 1, Class 2, ...) next to the central free list for Class 1, which holds spans.]
Span: run of contiguous pages

Slide 33

Slide 33 text

TCMalloc - Small Allocations
[Diagram: local thread cache (Class 0, Class 1, Class 2, ...) next to the central free list for Class 1, which holds spans.]

Slide 34

Slide 34 text

TCMalloc - Small Allocations
[Diagram: the central free list for Class 1 (a list of spans) next to the central heap, with its span lists for 1 page, 2 pages, ..., > 255 pages.]

Slide 35

Slide 35 text

TCMalloc - Small Allocations
[Diagram: the central free list for Class 1 (a list of spans) next to the central heap, with its span lists for 1 page, 2 pages, ..., > 255 pages.]

Slide 36

Slide 36 text

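TCMalloc - Small Allocations
[Diagram: request chain. The application asks the local thread cache for 4 bytes; the thread cache asks the central free list for N 8-byte objects; the central free list asks the central heap for X pages; the central heap asks the OS for Y*X pages. Each layer returns what was requested.]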

Slide 37

Slide 37 text

TCMalloc - Deallocation
[Diagram: Page 1 to Page 6, each mapped to the span that contains it (Span A, Span B, Span C).]

Slide 38

Slide 38 text

TCMalloc - Deallocation
[Diagram: free( ) maps the pointer to its page, and the page to its span.]

Slide 39

Slide 39 text

TCMalloc - Deallocation
[Diagram: free( ) maps the pointer to its page and span; a small object goes back to its size class list (Class 0, Class 1, Class 2, ...) in the local thread cache.]

Slide 40

Slide 40 text

TCMalloc - Deallocation
[Diagram: free( ) of a large object. Page 1 to Page 4 are shown with their spans (Span A, Span B, Span C).]

Slide 41

Slide 41 text

TCMalloc - Deallocation
[Diagram: free( ) of a large object. Page 1 to Page 4 are now shown with Span A and Span B only.]

Slide 42

Slide 42 text

TCMalloc - Deallocation
[Diagram: free( ) of a large object. Span B is placed in the central heap's span lists (1 page, 2 pages, ..., > 255 pages).]

Slide 43

Slide 43 text

TCMalloc - Deallocation
[Diagram: free( ) of a large object. Span B is placed in the central heap's span lists (1 page, 2 pages, ..., > 255 pages).]

Slide 44

Slide 44 text

Go Runtime Allocator
● TCMalloc
● Invoking the Allocator
● Go’s Allocator

Slide 45

Slide 45 text

    package main

    func main() {
        f()
    }

    //go:noinline
    func f() *int {
        i := 10
        return &i
    }

Slide 46

Slide 46 text

    package main

    func main() {
        f()
    }

    //go:noinline
    func f() *int {
        i := 10
        return &i
    }

    $ go build -gcflags "-m -m" main.go
    # command-line-arguments
    ./main.go:8:6: cannot inline f: marked go:noinline
    ./main.go:3:6: cannot inline main: non-leaf function
    ./main.go:10:9: &i escapes to heap
    ./main.go:10:9:     from ~r0 (return) at ./main.go:10:2
    ./main.go:9:2: moved to heap: i

Slide 47

Slide 47 text

    $ go tool compile -S main.go
    ...
    0x001d 00029 (main.go:9) LEAQ type.int(SB), AX
    0x0024 00036 (main.go:9) MOVQ AX, (SP)
    0x0028 00040 (main.go:9) PCDATA $0, $0
    0x0028 00040 (main.go:9) CALL runtime.newobject(SB)
    ...

Slide 48

Slide 48 text

    $ go tool compile -S main.go
    ...
    0x001d 00029 (main.go:9) LEAQ type.int(SB), AX
    0x0024 00036 (main.go:9) MOVQ AX, (SP)
    0x0028 00040 (main.go:9) PCDATA $0, $0
    0x0028 00040 (main.go:9) CALL runtime.newobject(SB)
    ...

    func newobject(typ *_type) unsafe.Pointer {
        return mallocgc(typ.size, typ, true)
    }
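
The effect of escape analysis can also be observed at run time by counting allocations. The sketch below uses `testing.AllocsPerRun` from the standard library; the functions `escapes` and `staysOnStack` and the `sink` variable are hypothetical names, and the exact counts may depend on the Go version and compiler flags.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the returned pointer reachable so the call is not optimized away.
var sink *int

// escapes returns the address of a local, so i has to live on the heap.
//
//go:noinline
func escapes() *int {
	i := 10
	return &i
}

// staysOnStack takes the address of i but never lets it leave the function.
//
//go:noinline
func staysOnStack() int {
	i := 10
	p := &i
	return *p
}

func main() {
	heap := testing.AllocsPerRun(1000, func() { sink = escapes() })
	stack := testing.AllocsPerRun(1000, func() { _ = staysOnStack() })
	fmt.Printf("escapes: %v allocs/call, staysOnStack: %v allocs/call\n", heap, stack)
}
```

Each call to `escapes` should go through the allocator, while `staysOnStack` should not allocate at all.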

Slide 49

Slide 49 text

Go Runtime Allocator
● TCMalloc
● Invoking the Allocator
● Go’s Allocator

Slide 50

Slide 50 text

Go’s Allocator
● Based on TCMalloc
● Garbage Collector
  ○ Tightly coupled with the allocator
  ○ Makes it hard (impossible?) to replace with other implementations
● Three types of allocations
  ○ Tiny allocations (size < 16 bytes, no pointers)
  ○ Small allocations (size <= 32 kB)
  ○ Large allocations

Slide 51

Slide 51 text

Go’s Allocator - Sweeping
Garbage Collector ⇒ Concurrent mark and sweep
1. Scan all objects
2. Mark objects that are live
3. Sweep objects that are not live
   a. In background
   b. In response to allocations
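
The mark and sweep steps above can be illustrated with a toy collector in Go. This is a deliberately simplified, stop-the-world sketch and not how the Go runtime implements its concurrent collector; the `object` type and the `mark` and `sweep` functions are hypothetical.

```go
package main

import "fmt"

// object is a toy heap object that may reference other objects.
type object struct {
	name   string
	refs   []*object
	marked bool
}

// mark flags every object reachable from o as live.
func mark(o *object) {
	if o == nil || o.marked {
		return
	}
	o.marked = true
	for _, r := range o.refs {
		mark(r)
	}
}

// sweep keeps marked objects (clearing the mark for the next cycle)
// and reports the names of the objects that were not reached.
func sweep(heap []*object) (live []*object, freed []string) {
	for _, o := range heap {
		if o.marked {
			o.marked = false
			live = append(live, o)
		} else {
			freed = append(freed, o.name)
		}
	}
	return live, freed
}

func main() {
	b := &object{name: "b"}
	a := &object{name: "a", refs: []*object{b}}
	c := &object{name: "c"} // nothing points to c
	heap := []*object{a, b, c}

	mark(a) // a is the only root
	live, freed := sweep(heap)
	fmt.Println(len(live), freed) // 2 [c]
}
```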

Slide 52

Slide 52 text

Go’s Allocator - Large Allocations
[Diagram: mheap busy spans, kept in lists by length (1 page, 2 pages, ..., > 255 pages).]
Before allocating, mheap sweeps the requested number of pages

Slide 53

Slide 53 text

Go’s Allocator - Large Allocations
[Diagram: mheap free spans, kept in lists by length (1 page, 2 pages, ..., > 255 pages).]

Slide 54

Slide 54 text

Go’s Allocator - Large Allocations
[Diagram: mheap free spans, kept in lists by length (1 page, 2 pages, ..., > 255 pages).]
mtreap ⇒ randomized binary tree

Slide 55

Slide 55 text

Go’s Allocator - Large Allocations
After allocating, depending on the total amount of live memory, the goroutine may perform additional work for the GC!

Slide 56

Slide 56 text

Go’s Allocator - Small Allocations
[Diagram: P 1 and P 2, each with its own mcache.]
Each logical processor (P) has a local cache (mcache)

Slide 57

Slide 57 text

Go’s Allocator - Small Allocations
[Diagram: P 1 and P 2, each with an mcache holding one span per size class (class 0, class 1, ...).]
Each mcache maintains a span for each size class

Slide 58

Slide 58 text

Go’s Allocator - Small Allocations

class  bytes/obj  bytes/span  objects
    1          8        8192     1024
    2         16        8192      512
    3         32        8192      256
    4         48        8192      170
  ...
   65      28672       57344        2
   66      32768       32768        1
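
The objects column follows from integer division of the span size by the object size. A small Go sketch of that arithmetic; the `sizeClass` type and its methods are hypothetical and are not the runtime's own data structures:

```go
package main

import "fmt"

// sizeClass describes one row: object size and span size in bytes.
type sizeClass struct {
	objSize, spanSize int
}

// objects is how many whole objects fit in one span.
func (c sizeClass) objects() int { return c.spanSize / c.objSize }

// tailWaste is the space left at the end of the span that fits no object.
func (c sizeClass) tailWaste() int { return c.spanSize % c.objSize }

func main() {
	rows := []sizeClass{
		{8, 8192}, {16, 8192}, {32, 8192}, {48, 8192},
		{28672, 57344}, {32768, 32768},
	}
	for _, c := range rows {
		fmt.Printf("%5d bytes/obj %5d bytes/span %4d objects %2d bytes unused\n",
			c.objSize, c.spanSize, c.objects(), c.tailWaste())
	}
}
```

For example, a span of 8192 bytes holds 170 objects of 48 bytes and leaves 32 bytes unused.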

Slide 59

Slide 59 text

Go’s Allocator - Small Allocations
[Diagram: P 1 with its mcache and one span per size class (class 0, class 1, ...).]
mcache returns the address for a free object on the span

Slide 60

Slide 60 text

Go’s Allocator - Small Allocations
[Diagram: P 1 with its mcache and one span per size class (class 0, class 1, ...).]
mcache requests a new span from mcentral for this size class

Slide 61

Slide 61 text

Go’s Allocator - Small Allocations
[Diagram: P 1 with its mcache; one mcentral per size class (class 0, class 1, ...), each holding spans.]
Each mcentral has two linked lists, empty and nonempty spans

Slide 62

Slide 62 text

Go’s Allocator - Small Allocations
[Diagram: P 1 with its mcache; one mcentral per size class (class 0, class 1, ...), each holding spans.]
A span with free objects will be given to the mcache

Slide 63

Slide 63 text

Go’s Allocator - Small Allocations
[Diagram: P 1 with its mcache; one mcentral per size class (class 0, class 1, ...), each holding spans.]
If there are no nonempty spans, mcentral will try to sweep existing spans

Slide 64

Slide 64 text

Go’s Allocator - Small Allocations
[Diagram: the mcentral for class 0 with its spans, next to mheap.]
As a last resort, mcentral will ask for a new span from mheap

Slide 65

Slide 65 text

Go’s Allocator - Small Allocations
[Diagram: the mcentral for class 0 with its spans, next to mheap, which hands over a new span.]
mcentral will give this span to mcache
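
The small allocation path (mcache, then mcentral, then mheap) can be summarized with a toy model in Go. The type names mirror the runtime's, but everything else here is an assumption made for illustration: the sketch covers a single size class, tracks only object counts, and leaves out sweeping and the empty span list.

```go
package main

import "fmt"

// span is a toy span that only tracks how many free objects it has left.
type span struct{ free int }

// mheap hands out fresh spans and counts them.
type mheap struct{ spansAllocated int }

func (h *mheap) allocSpan(objects int) *span {
	h.spansAllocated++
	return &span{free: objects}
}

// mcentral holds the spans of one size class that still have free objects.
type mcentral struct {
	nonempty       []*span
	heap           *mheap
	objectsPerSpan int
}

// cacheSpan prefers a nonempty span and falls back to mheap as a last resort.
func (c *mcentral) cacheSpan() *span {
	if n := len(c.nonempty); n > 0 {
		s := c.nonempty[n-1]
		c.nonempty = c.nonempty[:n-1]
		return s
	}
	return c.heap.allocSpan(c.objectsPerSpan)
}

// mcache is the per-P cache; this sketch models a single size class.
type mcache struct {
	current *span
	central *mcentral
}

// alloc takes one object from the cached span, refilling it when exhausted.
func (m *mcache) alloc() {
	if m.current == nil || m.current.free == 0 {
		m.current = m.central.cacheSpan()
	}
	m.current.free--
}

func main() {
	heap := &mheap{}
	cache := &mcache{central: &mcentral{heap: heap, objectsPerSpan: 2}}
	for i := 0; i < 3; i++ {
		cache.alloc()
	}
	fmt.Println(heap.spansAllocated) // 2
}
```

Most calls to `alloc` touch only the per-P cache, which is the point of the design: the shared structures are visited only when a span runs out.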

Slide 66

Slide 66 text

Go’s Allocator - Tiny Allocations
Allocations for objects with no pointers and size < 16 bytes

The main targets of tiny allocator are small strings and standalone escaping variables. On a json benchmark the allocator reduces number of allocations by ~12% and reduces heap size by ~20%.

Slide 67

Slide 67 text

Go’s Allocator - Tiny Allocations
[Diagram: a 64-byte block divided into allocated and free regions.]
● Each P keeps a 64-byte object allocated from a span
● Each tiny allocation appends a subobject

Slide 68

Slide 68 text

Go’s Allocator - Tiny Allocations
[Diagram: P 1 and its mcache, with the old block (allocated, little free space) and a new, free block.]
● Grab a new object from the mcache ≃ small allocation
● Eventually, GC will deallocate the old object

Slide 69

Slide 69 text

Go’s Allocator - Releasing memory to the OS
● Runtime periodically releases memory to the OS
● Releases spans that were swept more than 5 minutes ago
● On Linux, uses the madvise(2) syscall

    madvise(addr, size, _MADV_DONTNEED)
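
The effect of `MADV_DONTNEED` can be tried from user code with the `syscall` package. The sketch below is Linux-only and is not the runtime's code path: it maps one anonymous page, writes to it, advises the kernel that the page is no longer needed, and reads it again. The `dontNeedDemo` helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"syscall"
)

// dontNeedDemo returns the first byte of a private anonymous page before
// and after calling madvise(MADV_DONTNEED) on it.
func dontNeedDemo() (byte, byte, error) {
	mem, err := syscall.Mmap(-1, 0, syscall.Getpagesize(),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Munmap(mem)

	mem[0] = 42
	before := mem[0]
	// The mapping stays valid, but the kernel may drop the physical page.
	if err := syscall.Madvise(mem, syscall.MADV_DONTNEED); err != nil {
		return before, 0, err
	}
	// On Linux, the next access to a private anonymous page is zero-filled.
	return before, mem[0], nil
}

func main() {
	before, after, err := dontNeedDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(before, after)
}
```

The address range remains reserved after the call, so the memory can be reused later without a new mmap.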

Slide 70

Slide 70 text

    stats := runtime.MemStats{}
    runtime.ReadMemStats(&stats)

    type MemStats struct {
        ...
        // Heap memory statistics.
        HeapAlloc    uint64
        HeapSys      uint64
        HeapIdle     uint64
        HeapInuse    uint64
        HeapReleased uint64
        HeapObjects  uint64
        ...
    }
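
A complete program that reads these fields might look like the sketch below. `runtime.ReadMemStats` and the `MemStats` fields are part of the standard library; the `heapStats` helper, the `sink` variable and the allocation loop are assumptions added only to make the numbers move.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations reachable so the GC cannot free them.
var sink [][]byte

// heapStats returns a snapshot of the runtime memory statistics.
func heapStats() runtime.MemStats {
	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)
	return stats
}

func main() {
	before := heapStats()
	for i := 0; i < 1024; i++ {
		sink = append(sink, make([]byte, 1<<10)) // 1024 allocations of 1 kB
	}
	after := heapStats()

	fmt.Printf("HeapAlloc:    %d -> %d bytes\n", before.HeapAlloc, after.HeapAlloc)
	fmt.Printf("HeapObjects:  %d -> %d\n", before.HeapObjects, after.HeapObjects)
	fmt.Printf("HeapSys:      %d bytes\n", after.HeapSys)
	fmt.Printf("HeapInuse:    %d bytes\n", after.HeapInuse)
	fmt.Printf("HeapIdle:     %d bytes\n", after.HeapIdle)
	fmt.Printf("HeapReleased: %d bytes\n", after.HeapReleased)
}
```

`ReadMemStats` briefly stops the world, so it is better suited to occasional sampling than to hot paths.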

Slide 71

Slide 71 text

References
1. http://goog-perftools.sourceforge.net/doc/tcmalloc.html
2. https://www.ardanlabs.com/blog/2017/05/language-mechanics-on-stacks-and-pointers.html
3. https://gabrieletolomei.wordpress.com/miscellanea/operating-systems/in-memory-layout/
4. Lec 10 | MIT 6.172 - https://www.youtube.com/watch?v=p0bc1f6ULxw
5. https://faculty.washington.edu/aragon/pubs/rst89.pdf
6. http://man7.org/linux/man-pages/man2/mmap.2.html
7. http://man7.org/linux/man-pages/man2/madvise.2.html
8. https://nostarch.com/tlpi

Slide 72

Slide 72 text

Thanks!
andrestc.com
@andresantostc