Understanding Go Memory Allocation - Gophercon UK

Ever wondered how Go manages memory allocation? In this talk we are going to explore Go’s memory allocator and understand how its algorithm interacts with the operating system to manage memory!

André Carvalho

August 03, 2018

Transcript

  1. Understanding Go Memory Allocation
    André Carvalho
    @andresantostc

  2. Developer @ tsuru
    andrestc.com

  3. [Image-only slide, no text]

  4. Virtual Memory
    ● Processes do not read directly from physical memory
    ○ Security
    ○ Coordination between multiple processes
    ● Virtual Memory abstracts that away from the processes
    ○ Segmentation
    ○ Page tables

  5. Virtual Memory
    [Diagram: a process's pages (Page 0 to Page 7) are mapped by its page table to physical frames (Frame 0 to Frame 7) in RAM; some pages are backed by disk instead, and some frames belong to another process.]
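
The page-table lookup in the diagram can be sketched as follows. This is a toy model, assuming 4096-byte pages and a hypothetical mapping (page 0 to frame 3, page 1 to frame 6); `pageTable` and `translate` are illustrative names, not an OS or hardware API. The virtual address is split into a page number and an offset, the page number is looked up, and the offset is reattached to the frame.

```go
package main

import "fmt"

const pageSize = 4096 // assumed page size in bytes

// pageTable maps virtual page numbers to physical frame numbers (hypothetical).
type pageTable map[uint64]uint64

// translate splits vaddr into (page, offset), looks the page up, and rebuilds
// the physical address from the frame number and the unchanged offset.
func (pt pageTable) translate(vaddr uint64) (uint64, bool) {
	page, offset := vaddr/pageSize, vaddr%pageSize
	frame, ok := pt[page]
	if !ok {
		return 0, false // unmapped page: real hardware would raise a page fault
	}
	return frame*pageSize + offset, true
}

func main() {
	pt := pageTable{0: 3, 1: 6}
	fmt.Println(pt.translate(1*pageSize + 42)) // 24618 true
	fmt.Println(pt.translate(5 * pageSize))    // 0 false
}
```

Two processes can hold the same virtual address because each one has its own page table.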

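  6. package main
    import (
        "fmt"
        "math/rand"
        "time"
    )
    func main() {
        rand.Seed(time.Now().UnixNano())
        i := rand.Intn(100)
        fmt.Printf("%v at %p\n", i, &i) // value of i and its (virtual) address
        for { // spin forever so two instances can run at the same time
        }
    }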

  7. $ ./vmemory
    53 at 0xc420016110
    $ ./vmemory
    68 at 0xc420016110
    Running two instances at the same time…
    Same virtual address

  8. Process Memory Layout
    Text: code
    Data: initialized static variables
    BSS: uninitialized static variables
    Heap: dynamically allocated variables (grows up to the program break)
    Stack: function stack frames

  9. Stack Allocation
    [Diagram: the used part of the stack, the stack pointer (SP), and the unused part]
    Allocation:
        SP += size;
        return Stack[SP-size];
    Deallocation:
        SP -= size;
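
A minimal sketch of the SP-bumping scheme above, assuming a fixed 64-byte region and offsets in place of raw pointers. The `stack` type is hypothetical and illustrates the idea only; it is not how the Go runtime manages goroutine stacks.

```go
package main

import "fmt"

// stack is a toy bump allocator: allocation moves SP up, deallocation moves it
// back down, so blocks must be freed in LIFO order.
type stack struct {
	mem [64]byte
	sp  int // stack pointer: index of the first unused byte
}

// alloc reserves size bytes and returns the offset where they start.
func (s *stack) alloc(size int) (int, bool) {
	if s.sp+size > len(s.mem) {
		return 0, false // stack overflow
	}
	s.sp += size
	return s.sp - size, true
}

// free releases the most recently allocated size bytes.
func (s *stack) free(size int) {
	s.sp -= size
}

func main() {
	var s stack
	a, _ := s.alloc(8)
	b, _ := s.alloc(16)
	fmt.Println(a, b, s.sp) // 0 8 24
	s.free(16)
	fmt.Println(s.sp) // 8
}
```

Both operations are a single addition or subtraction, which is why stack allocation is so much cheaper than heap allocation.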

  10. Heap Allocation
    ● For objects whose size is only known at runtime
    ● C provides malloc and free
    ● C++ provides new and delete
    ● Go uses escape analysis and has garbage collection

  11. Minimal Allocator

  12. Minimal Allocator
    We need to implement two functions:
    void* malloc(size_t size)
    void free(void *ptr)

  13. Minimal Allocator
    Application → Allocator: malloc, free
    Allocator → OS: mmap, munmap, madvise
    The allocator uses syscalls like mmap/munmap to allocate/deallocate memory

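  14. Minimal Allocator
    Linked list with free objects
    [Diagram: Head points to the first free block; each block starts with a header (size=n, next=*) followed by n bytes; the last block has size=m, next=nil, followed by m bytes]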

  15. Minimal Allocator - Allocating
    malloc(10)
    [Diagram: the free list is empty: Head points to NULL]

  16. Minimal Allocator - Allocating
    mmap(
        0x000000c000000000,          // start address in the virtual address space
        4096,                        // size
        PROT_WRITE | PROT_READ,      // permissions
        MAP_PRIVATE | MAP_ANONYMOUS, // flags
        ...)

  17. Minimal Allocator - Allocating
    malloc(10)
    [Diagram: the new 4096-byte region becomes a single free block, a 12-byte header followed by 4084 bytes, pointed to by Head]

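  18. Minimal Allocator - Allocating
    malloc(10)
    [Diagram: the 10-byte block and its header (22 bytes in total) are taken from the end of the free block, which shrinks to 4062 bytes]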

  19. Minimal Allocator - Allocating
    malloc(10)
    Allocator returns p, which points right after the header

  20. Minimal Allocator - Deallocating
    free(p)

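  21. Minimal Allocator - Deallocating
    free(p)
    [Diagram: the allocator finds the block's header at p - size(header) and links the 10-byte block back into the free list, starting from Head, alongside the 4062-byte block]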
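
Slides 14 to 21 can be condensed into a small first-fit allocator. This is a sketch under stated assumptions: a Go byte slice stands in for the mmap'd page, offsets stand in for pointers, and the 12-byte header is assumed to be a 4-byte size plus an 8-byte next offset. `allocator`, `malloc`, and `free` are hypothetical names; the sketch does not coalesce neighbouring free blocks.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	headerSize = 12 // assumed layout: 4-byte size + 8-byte next offset
	nilOff     = -1 // "next = nil": end of the free list
)

// allocator carves blocks out of one arena and keeps freed blocks in a singly
// linked list whose headers live inside the arena itself.
type allocator struct {
	arena []byte
	head  int // offset of the first free block's header, or nilOff
}

func newAllocator(size int) *allocator {
	a := &allocator{arena: make([]byte, size)} // head = 0
	a.setHeader(0, size-headerSize, nilOff)    // one free block spanning the arena
	return a
}

func (a *allocator) setHeader(off, size, next int) {
	binary.LittleEndian.PutUint32(a.arena[off:], uint32(size))
	binary.LittleEndian.PutUint64(a.arena[off+4:], uint64(int64(next)))
}

func (a *allocator) header(off int) (size, next int) {
	size = int(binary.LittleEndian.Uint32(a.arena[off:]))
	next = int(int64(binary.LittleEndian.Uint64(a.arena[off+4:])))
	return size, next
}

// malloc walks the free list (first fit). A block with room to spare is split
// and its tail handed out; a block that is just big enough is unlinked whole.
func (a *allocator) malloc(n int) (int, bool) {
	prev := nilOff
	for off := a.head; off != nilOff; {
		size, next := a.header(off)
		if size >= n+headerSize {
			a.setHeader(off, size-n-headerSize, next) // shrink the free block
			blk := off + size - n                     // new block sits at its tail
			a.setHeader(blk, n, nilOff)
			return blk + headerSize, true // p points right after the header
		}
		if size >= n {
			if prev == nilOff {
				a.head = next
			} else {
				ps, _ := a.header(prev)
				a.setHeader(prev, ps, next)
			}
			return off + headerSize, true
		}
		prev, off = off, next
	}
	return 0, false // out of space: a real allocator would mmap more memory
}

// free steps back over the header (p - headerSize) and pushes the block onto
// the front of the free list.
func (a *allocator) free(p int) {
	blk := p - headerSize
	size, _ := a.header(blk)
	a.setHeader(blk, size, a.head)
	a.head = blk
}

func main() {
	a := newAllocator(4096)
	p, _ := a.malloc(10)
	rest, _ := a.header(a.head)
	fmt.Println(p, rest) // 4086 4062
	a.free(p)
	q, _ := a.malloc(10)
	fmt.Println(q == p) // true: the freed block is reused
}
```

The numbers match the slides: one 4096-byte region, 4084 usable bytes after the first header, and 4062 left after `malloc(10)` takes 22 bytes.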

  22. Minimal Allocator
    ● Can be implemented in a few hundred LOCs
    ● Issues
    ○ Fragmentation
    ○ Corruption
    ○ Releasing memory back to the OS
    ■ When?
    ■ How? munmap, madvise...
    ○ Multi-threading
    ○ ...

  23. Go Runtime Allocator
    ● TCMalloc
    ● Invoking the Allocator
    ● Go’s Allocator

  24. Thread-Caching Malloc (TCMalloc)
    ● Originally implemented by Google for C and C++ programs
    ● Served as the basis for Go’s runtime allocator
    ● Reduces lock contention for multithreaded programs

  25. TCMalloc
    ● Each thread has a local cache
    ● Two types of allocations
    ○ Small allocations (<= 32 kB)
    ○ Large allocations
    ● Manages memory in units called Spans
    ○ Runs of contiguous memory pages
    ○ Metadata is kept separate from the allocation arena

  26. TCMalloc - Large Allocations
    ● Served by the central heap
    ● Requested size is rounded up to a whole number of pages (4 kB each)
    malloc(34 kB) ⇒ malloc(36 kB) ⇒ 9 pages
    malloc(33 kB) ⇒ malloc(36 kB) ⇒ 9 pages
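
The rounding above is a ceiling division by the page size. A minimal sketch, assuming 4 kB pages as on the slide; `pagesFor` is a hypothetical helper name.

```go
package main

import "fmt"

const pageSize = 4 << 10 // 4 kB pages, as on the slide

// pagesFor rounds a large request up to a whole number of pages.
func pagesFor(size int) int {
	return (size + pageSize - 1) / pageSize
}

func main() {
	for _, kb := range []int{34, 33} {
		n := pagesFor(kb << 10)
		fmt.Printf("malloc(%d kB) => malloc(%d kB) => %d pages\n", kb, n*pageSize>>10, n)
	}
}
```

Both requests land on 9 pages; the unused tail of the last page is internal fragmentation.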

  27. TCMalloc - Large Allocations
    [Diagram: the central heap keeps free lists of spans indexed by length in pages: 1 page, 2 pages, ..., 254 pages, and > 255 pages]

  28. TCMalloc - Large Allocations
    Application → Central Heap: 34 kB
    Central Heap ↔ OS: X pages

  29. TCMalloc - Small Allocations
    ● Served by the local thread cache
    ● Requested size is rounded up to one of the size classes
    malloc(4 bytes) ⇒ malloc(8 bytes)
    malloc(6 bytes) ⇒ malloc(8 bytes)
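
Rounding to a size class amounts to finding the smallest class that fits. A sketch, assuming an illustrative, truncated list of classes (the values are hypothetical and are not TCMalloc's actual table); `roundToClass` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// classes is a small, illustrative list of size classes in bytes (hypothetical).
var classes = []int{8, 16, 32, 48, 64, 80, 96, 112, 128}

// roundToClass returns the smallest class that fits size, or -1 when the
// request exceeds every class (it would then be a large allocation).
func roundToClass(size int) int {
	i := sort.SearchInts(classes, size) // first index with classes[i] >= size
	if i == len(classes) {
		return -1
	}
	return classes[i]
}

func main() {
	fmt.Println(roundToClass(4), roundToClass(6), roundToClass(33)) // 8 8 48
}
```

A fixed set of classes means each per-class free list holds interchangeable objects, at the cost of some wasted bytes per object.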

  30. TCMalloc - Small Allocations
    [Diagram: the local thread cache keeps one free list per size class: Class 0, Class 1, Class 2, ...]

  31. TCMalloc - Small Allocations

  32. TCMalloc - Small Allocations
    [Diagram: a size-class list in the local thread cache (Class 0, Class 1, Class 2, ...) is refilled from the central free list for that class (here Class 1), which holds spans; a span is a run of contiguous pages]

  33. TCMalloc - Small Allocations

  34. TCMalloc - Small Allocations
    [Diagram: the Class 1 central free list gets new spans from the central heap, which keeps free lists of spans by length: 1 page, 2 pages, ..., > 255 pages]

  35. TCMalloc - Small Allocations
    [Diagram: a span taken from the central heap is added to the Class 1 central free list]

  36. TCMalloc - Small Allocations
    Application → Local Thread Cache: 4 bytes
    Local Thread Cache → Central Free List: N 8-byte objects
    Central Free List → Central Heap: X pages
    Central Heap → OS: Y*X pages
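
The request chain above can be sketched for a single size class. This is a simplified model, assuming 8-byte objects, one-page (4096-byte) spans, and a refill batch of 32 objects; addresses are fake `uintptr` values, and `centralFreeList` and `threadCache` are hypothetical names. The point it shows is that only the shared central list takes a lock, while the per-thread fast path does not.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	objSize   = 8    // the size class used here: 8-byte objects
	spanBytes = 4096 // hypothetical one-page span, cut into objects
	batch     = 32   // objects moved from the central list to a cache at once
)

// centralFreeList is shared by all threads for one size class, so it is locked.
type centralFreeList struct {
	mu       sync.Mutex
	free     []uintptr
	nextSpan uintptr // fake address of the next span the "central heap" hands out
}

// fetch returns up to n objects, carving a new span into objects when empty.
func (c *centralFreeList) fetch(n int) []uintptr {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.free) == 0 {
		for off := uintptr(0); off < spanBytes; off += objSize {
			c.free = append(c.free, c.nextSpan+off)
		}
		c.nextSpan += spanBytes
	}
	if n > len(c.free) {
		n = len(c.free)
	}
	out := append([]uintptr(nil), c.free[len(c.free)-n:]...)
	c.free = c.free[:len(c.free)-n]
	return out
}

// threadCache belongs to a single thread, so its fast path takes no lock.
type threadCache struct {
	central *centralFreeList
	free    []uintptr
}

func (t *threadCache) alloc() uintptr {
	if len(t.free) == 0 {
		t.free = t.central.fetch(batch) // slow path: refill a whole batch
	}
	p := t.free[len(t.free)-1]
	t.free = t.free[:len(t.free)-1]
	return p
}

func main() {
	central := &centralFreeList{nextSpan: 0x10000}
	cache := &threadCache{central: central}
	for i := 0; i < 3; i++ {
		fmt.Printf("%#x\n", cache.alloc())
	}
	fmt.Println(len(cache.free), len(central.free)) // 29 480
}
```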

  37. TCMalloc - Deallocation
    [Diagram: pages 1 to 6 divided into spans A, B and C]

  38. TCMalloc - Deallocation
    free(object) ⇒ page ⇒ span

  39. TCMalloc - Deallocation
    free(object) ⇒ page ⇒ span
    Small object: returned to the local thread cache list for its size class (Class 0, Class 1, Class 2, ...)

  40. TCMalloc - Deallocation
    free(object) ⇒ page ⇒ span
    Large object: [Diagram: pages 1 to 4 divided into spans A, B and C]

  41. TCMalloc - Deallocation
    free(object) ⇒ page ⇒ span
    Large object: [Diagram: the freed span is merged with its free neighbour, so pages 1 to 4 now form spans A and B]

  42. TCMalloc - Deallocation
    free(object) ⇒ page ⇒ span
    Large object: the resulting Span B goes into the central heap free list for its length (1 page, 2 pages, ..., > 255 pages)

  43. TCMalloc - Deallocation
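
The free(object) ⇒ page ⇒ span lookup and the merge with free neighbours can be sketched as follows. This assumes a page-to-span map and spans described by a start page and a length; `span`, `heap`, `record`, and `free` are hypothetical names, and the sketch omits putting the merged span back on the free list for its length.

```go
package main

import "fmt"

// span is a run of contiguous pages [start, start+npages).
type span struct {
	start, npages int
	free          bool
}

// heap keeps a page -> span map, so the spans on either side of a freed span
// can be found in O(1) and merged with it.
type heap struct {
	pagemap map[int]*span
}

// record points every page of s at s.
func (h *heap) record(s *span) {
	for p := s.start; p < s.start+s.npages; p++ {
		h.pagemap[p] = s
	}
}

// free marks s free and coalesces it with free neighbours on both sides.
func (h *heap) free(s *span) *span {
	s.free = true
	if left, ok := h.pagemap[s.start-1]; ok && left.free {
		left.npages += s.npages // absorb s into the left neighbour
		s = left
	}
	if right, ok := h.pagemap[s.start+s.npages]; ok && right.free {
		s.npages += right.npages // absorb the right neighbour
	}
	h.record(s)
	return s
}

func main() {
	// Six pages split into spans A (1-2), B (3-4), C (5-6); A and C are free.
	h := &heap{pagemap: map[int]*span{}}
	a := &span{start: 1, npages: 2, free: true}
	b := &span{start: 3, npages: 2}
	c := &span{start: 5, npages: 2, free: true}
	for _, s := range []*span{a, b, c} {
		h.record(s)
	}
	merged := h.free(b)
	fmt.Println(merged.start, merged.npages) // 1 6
}
```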

  44. Go Runtime Allocator
    ● TCMalloc
    ● Invoking the Allocator
    ● Go’s Allocator

  45. package main

    func main() {
        f()
    }

    //go:noinline
    func f() *int {
        i := 10
        return &i
    }

  46. package main

    func main() {
        f()
    }

    //go:noinline
    func f() *int {
        i := 10
        return &i
    }

    $ go build -gcflags "-m -m" main.go
    # command-line-arguments
    ./main.go:8:6: cannot inline f: marked go:noinline
    ./main.go:3:6: cannot inline main: non-leaf function
    ./main.go:10:9: &i escapes to heap
    ./main.go:10:9:   from ~r0 (return) at ./main.go:10:2
    ./main.go:9:2: moved to heap: i
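
The effect of escape analysis can also be measured at run time. This sketch pairs the slide's `f` (renamed `escapes`) with a hypothetical `staysOnStack`, whose pointer never leaves the function, and counts heap allocations with `testing.AllocsPerRun` from the standard library. On a normal optimized build we would expect about one allocation per call for `escapes` and none for `staysOnStack`.

```go
package main

import (
	"fmt"
	"testing"
)

//go:noinline
func escapes() *int {
	i := 10
	return &i // i outlives the call, so the compiler moves it to the heap
}

//go:noinline
func staysOnStack() int {
	i := 10
	p := &i // the pointer never leaves the function, so i can stay on the stack
	return *p
}

func main() {
	// AllocsPerRun reports the average number of heap allocations per call.
	fmt.Println("escapes:", testing.AllocsPerRun(100, func() { escapes() }))
	fmt.Println("staysOnStack:", testing.AllocsPerRun(100, func() { staysOnStack() }))
}
```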

  47. $ go tool compile -S main.go
    ...
    0x001d 00029 (main.go:9) LEAQ type.int(SB), AX
    0x0024 00036 (main.go:9) MOVQ AX, (SP)
    0x0028 00040 (main.go:9) PCDATA $0, $0
    0x0028 00040 (main.go:9) CALL runtime.newobject(SB)
    ...

  48. $ go tool compile -S main.go
    ...
    0x001d 00029 (main.go:9) LEAQ type.int(SB), AX
    0x0024 00036 (main.go:9) MOVQ AX, (SP)
    0x0028 00040 (main.go:9) PCDATA $0, $0
    0x0028 00040 (main.go:9) CALL runtime.newobject(SB)
    ...
    // runtime/malloc.go
    func newobject(typ *_type) unsafe.Pointer {
        return mallocgc(typ.size, typ, true)
    }

  49. Go Runtime Allocator
    ● TCMalloc
    ● Invoking the Allocator
    ● Go’s Allocator

  50. Go’s Allocator
    ● Based on TCMalloc
    ● Garbage Collector
    ○ Tightly coupled with the allocator
    ○ Makes it hard (impossible?) to replace the allocator with other implementations
    ● Three types of allocations
    ○ Tiny Allocations (size < 16 bytes, no pointers)
    ○ Small Allocations (size <= 32 kB)
    ○ Large Allocations

  51. Go’s Allocator - Sweeping
    Garbage Collector ⇒ Concurrent mark and sweep
    1. Scan all objects
    2. Mark objects that are live
    3. Sweep objects that are not live
    a. In background
    b. In response to allocations
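
The mark and sweep phases listed above can be illustrated with a toy, stop-the-world collector over an explicit object graph. This is deliberately simpler than Go's concurrent collector: nothing runs in the background or alongside allocations, and `object`, `mark`, and `sweep` are hypothetical names.

```go
package main

import "fmt"

// object is a node in a toy heap; refs are the pointers it holds.
type object struct {
	name   string
	refs   []*object
	marked bool
}

// mark does a depth-first traversal from a root, marking everything reachable.
// Already-marked objects are skipped, so cycles terminate.
func mark(o *object) {
	if o == nil || o.marked {
		return
	}
	o.marked = true
	for _, r := range o.refs {
		mark(r)
	}
}

// sweep keeps marked objects (clearing the mark for the next cycle) and drops
// the unmarked ones, returning the surviving heap.
func sweep(heap []*object) []*object {
	live := heap[:0]
	for _, o := range heap {
		if o.marked {
			o.marked = false
			live = append(live, o)
		}
	}
	return live
}

func main() {
	a, b, c := &object{name: "a"}, &object{name: "b"}, &object{name: "c"}
	a.refs = []*object{b} // nothing points to c
	heap := []*object{a, b, c}
	mark(a) // a is the only root
	var names []string
	for _, o := range sweep(heap) {
		names = append(names, o.name)
	}
	fmt.Println(names) // [a b]
}
```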

  52. Go’s Allocator - Large Allocations
    [Diagram: mheap keeps free lists of spans by length (1 page, 2 pages, ..., > 255 pages), plus busy spans]
    Before allocating, mheap sweeps the requested number of pages

  53. Go’s Allocator - Large Allocations
    [Diagram: mheap free spans, kept in lists by length (1 page, 2 pages, ..., > 255 pages)]

  54. Go’s Allocator - Large Allocations
    [Diagram: mheap free spans by length (1 page, 2 pages, ...); the largest free spans are kept in an mtreap]
    mtreap ⇒ randomized binary tree

  55. Go’s Allocator - Large Allocations
    After allocating, depending on the total amount of live memory, the goroutine may perform additional work for the GC!

  56. Go’s Allocator - Small Allocations
    Each logical processor (P) has a local cache (mcache)
    [Diagram: P1 and P2, each with its own mcache]

  57. Go’s Allocator - Small Allocations
    Each mcache maintains a span for each size class
    [Diagram: the mcaches of P1 and P2 each hold one span per size class (class 0, class 1, ...)]

  58. Go’s Allocator - Small Allocations
    class  bytes/obj  bytes/span  objects
        1          8        8192     1024
        2         16        8192      512
        3         32        8192      256
        4         48        8192      170
      ...
       65      28672       57344        2
       66      32768       32768        1
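
The objects column follows from the other two: a span holds as many whole objects as fit, and any remainder is tail waste. A short sketch of that arithmetic over a few rows of the table; `objectsPerSpan` is a hypothetical helper.

```go
package main

import "fmt"

// objectsPerSpan returns how many whole objects of objSize fit in a span of
// spanSize bytes, and how many bytes are left over at the end (tail waste).
func objectsPerSpan(objSize, spanSize int) (objects, tail int) {
	objects = spanSize / objSize
	return objects, spanSize - objects*objSize
}

func main() {
	// {class, bytes/obj, bytes/span}. Class 4 holds 48-byte objects:
	// 8192/48 rounds down to 170, whereas 8192/64 would give 128.
	rows := [][3]int{{1, 8, 8192}, {2, 16, 8192}, {3, 32, 8192}, {4, 48, 8192},
		{65, 28672, 57344}, {66, 32768, 32768}}
	for _, r := range rows {
		n, tail := objectsPerSpan(r[1], r[2])
		fmt.Printf("class %2d: %4d objects, %2d bytes of tail waste\n", r[0], n, tail)
	}
}
```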

  59. Go’s Allocator - Small Allocations
    mcache returns the address of a free object in the span
    [Diagram: P1's mcache with one span per size class (class 0, class 1, ...)]

  60. Go’s Allocator - Small Allocations
    When the span has no free objects left, mcache requests a new span from mcentral for this size class

  61. Go’s Allocator - Small Allocations
    Each mcentral has two linked lists of spans: empty and nonempty
    [Diagram: one mcentral per size class (class 0, class 1, ...), each holding lists of spans]

  62. Go’s Allocator - Small Allocations
    A span with free objects will be given to the mcache

  63. Go’s Allocator - Small Allocations
    If there are no nonempty spans, mcentral will try to sweep existing spans

  64. Go’s Allocator - Small Allocations
    As a last resort, mcentral will ask mheap for a new span
    [Diagram: class 0 mcentral and mheap]

  65. Go’s Allocator - Small Allocations
    mcentral will give this span to mcache
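
The fallback chain from slides 59 to 65 (mcache, then mcentral, then mheap) can be sketched with counters in place of real memory. This is a simplified model for one size class: spans only track their free-object count, sweeping is left out, and `mcache`, `mcentral`, `mheap`, `cacheSpan`, and `allocSpan` are hypothetical stand-ins named after, but not copied from, the runtime's types.

```go
package main

import "fmt"

// span tracks how many free object slots it has left.
type span struct{ free int }

// mheap hands out fresh spans; here it only counts how many it created.
type mheap struct{ created int }

func (h *mheap) allocSpan(objects int) *span {
	h.created++
	return &span{free: objects}
}

// mcentral holds spans of one size class that still have free objects.
type mcentral struct {
	nonempty []*span
	heap     *mheap
	objects  int // objects per span for this size class
}

func (c *mcentral) cacheSpan() *span {
	if n := len(c.nonempty); n > 0 {
		s := c.nonempty[n-1]
		c.nonempty = c.nonempty[:n-1]
		return s
	}
	// The runtime would try sweeping here first; as a last resort, ask mheap.
	return c.heap.allocSpan(c.objects)
}

// mcache is per-P, so in the runtime this fast path needs no locks.
type mcache struct {
	current *span
	central *mcentral
}

func (m *mcache) alloc() {
	if m.current == nil || m.current.free == 0 {
		m.current = m.central.cacheSpan() // refill from mcentral
	}
	m.current.free--
}

func main() {
	heap := &mheap{}
	central := &mcentral{heap: heap, objects: 2, nonempty: []*span{{free: 1}}}
	cache := &mcache{central: central}
	for i := 0; i < 3; i++ {
		cache.alloc()
	}
	fmt.Println(heap.created) // 1: the first refill came from mcentral's list
}
```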

  66. Go’s Allocator - Tiny Allocations
    Allocations for objects with no pointers and size < 16 bytes
    The main targets of tiny allocator are small strings and standalone escaping variables. On a json benchmark the allocator reduces number of allocations by ~12% and reduces heap size by ~20%.

  67. Go’s Allocator - Tiny Allocations
    [Diagram: a 16-byte block split into allocated and free regions]
    ● Each P keeps a 16-byte object allocated from a span
    ● Each tiny allocation appends a subobject

  68. Go’s Allocator - Tiny Allocations
    [Diagram: when P1's current block is full, it is replaced by a fresh block from its mcache]
    ● Grab a new object from the mcache ≃ small allocation
    ● Eventually, GC will deallocate the old object
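
The tiny allocator is essentially a bump allocator inside a small block. This sketch assumes a 16-byte block and an alignment rule similar to (but not copied from) the runtime's; a block counter stands in for "grab a new object from the mcache", and `tinyAlloc` is a hypothetical name.

```go
package main

import "fmt"

const tinyBlock = 16 // assumed block size in bytes

// tinyAlloc packs small, pointer-free objects into one block by bumping an
// offset; block is a counter standing in for "the current tiny object".
type tinyAlloc struct {
	block  int
	offset int
}

// alloc returns the block the object lands in and its offset inside it.
func (t *tinyAlloc) alloc(size int) (block, off int) {
	// Align the offset to a boundary that suits the object's size.
	switch {
	case size&7 == 0:
		t.offset = (t.offset + 7) &^ 7
	case size&3 == 0:
		t.offset = (t.offset + 3) &^ 3
	case size&1 == 0:
		t.offset = (t.offset + 1) &^ 1
	}
	if t.offset+size > tinyBlock {
		// Does not fit: take a fresh block (a small allocation in Go). The old
		// block is reclaimed by the GC once none of its subobjects are live.
		t.block++
		t.offset = 0
	}
	off = t.offset
	t.offset += size
	return t.block, off
}

func main() {
	var t tinyAlloc
	for _, size := range []int{5, 4, 8, 3} {
		b, off := t.alloc(size)
		fmt.Printf("size %d -> block %d, offset %d\n", size, b, off)
	}
}
```

Because a block is only freed when every subobject in it is dead, one long-lived tiny object can keep its whole block alive; a small block size bounds that waste.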

  69. Go’s Allocator - Releasing memory to the OS
    ● Runtime periodically releases memory to the OS
    ● Releases spans that were swept more than 5 minutes ago
    ● In Linux, uses the madvise(2) syscall
    madvise(addr, size, _MADV_DONTNEED)

  70. stats := runtime.MemStats{}
    runtime.ReadMemStats(&stats)

    type MemStats struct {
        ...
        // Heap memory statistics.
        HeapAlloc    uint64
        HeapSys      uint64
        HeapIdle     uint64
        HeapInuse    uint64
        HeapReleased uint64
        HeapObjects  uint64
        ...
    }

  71. References
    1. http://goog-perftools.sourceforge.net/doc/tcmalloc.html
    2. https://www.ardanlabs.com/blog/2017/05/language-mechanics-on-stacks-and-pointers.html
    3. https://gabrieletolomei.wordpress.com/miscellanea/operating-systems/in-memory-layout/
    4. Lec 10 | MIT 6.172 - https://www.youtube.com/watch?v=p0bc1f6ULxw
    5. https://faculty.washington.edu/aragon/pubs/rst89.pdf
    6. http://man7.org/linux/man-pages/man2/mmap.2.html
    7. http://man7.org/linux/man-pages/man2/madvise.2.html
    8. https://nostarch.com/tlpi

  72. Thanks!
    andrestc.com
    @andresantostc