OSv malloc

A brief explanation of malloc() in OSv

Satoru Takeuchi

October 12, 2017

Transcript

  1. 2.

    Introduction
    • This article is for day 21 of the OSv Advent Calendar [1]
    • Prerequisites
      ◦ Basic knowledge of operating systems and OSv [2]
      ◦ Knowledge of the C++11 specification
      ◦ Basic knowledge of algorithms and data structures
      ◦ It’s better if you’re familiar with the Linux kernel source
    • Source tree: the latest master branch [3] as of Dec 21, 2014
      ◦ HEAD commit: f7d3ddd648b38789daa8287626a66863a780f139
      ◦ For simplicity, debug features, trace features, and exclusive control mechanisms are omitted
    • Pathnames in this document are relative to the top of the OSv source tree
  2. 3.

    Summary
    • Defined in core/mempool.cc: less than 2K lines
      ◦ The implementation is very simple
      ◦ A comment says ”Our malloc() is very coarse” :-)
    • Specialized for small objects (I guess it’s for the JVM)
    • One malloc() implementation is used for both the kernel and user apps: it’s the OSv way!
      ◦ It’s like calling kmalloc() directly from a user application on Linux
    • Allocation sizes are rounded up to 2^n >= 8 (n is a positive integer)
      ◦ e.g. malloc(4) returns an 8-byte memory object
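    The rounding rule in the last bullet can be sketched as follows. This is a minimal illustration, not OSv's code: the helper name round_up_alloc_size is hypothetical, and only the "power of two, at least 8" rule is taken from the slide.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OSv function): returns the smallest power of
// two that is >= size and >= 8, as described on the Summary slide.
// Assumes size is small enough that the shift cannot overflow.
inline std::size_t round_up_alloc_size(std::size_t size)
{
    std::size_t rounded = 8;      // minimum object size from the slide
    while (rounded < size) {
        rounded <<= 1;            // 8, 16, 32, ...
    }
    return rounded;
}
```

    With this sketch, round_up_alloc_size(4) yields 8, which matches the malloc(4) example above.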
  3. 4.

    The structure of the memory management subsystem
    • Different mechanisms are used depending on the requested size
      ◦ <= 1KiB: mempool
      ◦ > 1KiB: page allocator
    [Figure: the user or kernel calls malloc()/free() into the memory management subsystem, which consists of mempool (it’s like the sl[auo]b allocator in the Linux kernel) and the page allocator (it’s like the buddy allocator in the Linux kernel)]
  4. 5.

    Simplified call trace
    malloc(size)
      -> std_malloc(size, align)    # Used by other memory allocation APIs (*1)
           Round size up to 2^n >= 8 bytes (n is a positive integer)
           if (size <= 1KiB && after SMP initialization (always true for applications))
               malloc_pools[lg(size)].alloc()    # Allocate from mempool
           else if (1KiB < size <= 4KiB)
               memory::alloc_page()              # Allocate one page from the page allocator
           else
               malloc_large()                    # Allocate pages from the page allocator
                                                 # This is the most complex of the three paths
                                                 # (not explained here)
    *1) If you set the alignment explicitly, the call trace becomes more complex. For more information, please refer to the implementation of std_malloc()
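    The three-way decision in the call trace can be sketched as a small classifier. This is an illustration under assumptions, not OSv's std_malloc(): the names alloc_path and choose_path are hypothetical, the thresholds (1KiB and a 4KiB page) come from the slide, alignment handling is ignored, and SMP initialization is assumed to have completed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names; only the thresholds are taken from the slide.
enum class alloc_path { mempool, single_page, large };

inline alloc_path choose_path(std::size_t size)
{
    constexpr std::size_t max_pool_object = 1024;  // 1KiB
    constexpr std::size_t page_size = 4096;        // x86_64 page size

    // Round the request up to a power of two that is at least 8 bytes.
    std::size_t rounded = 8;
    while (rounded < size) {
        rounded <<= 1;
    }

    if (rounded <= max_pool_object) {
        return alloc_path::mempool;      // malloc_pools[...].alloc()
    }
    if (rounded <= page_size) {
        return alloc_path::single_page;  // memory::alloc_page()
    }
    return alloc_path::large;            // malloc_large()
}
```

    For example, a 1025-byte request is rounded to 2KiB and therefore takes the single-page path in this sketch.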
  5. 6.

    Allocating an object <= 1KiB
    • Uses mempool, as described before
      ◦ Managed by class malloc_pool (inherits from class pool)
    • Definition: malloc_pool malloc_pools[]
      ◦ One pool for each object size
        ▪ The number of entries in this array is lg(page size)+1. On x86_64 (page size 4KiB), it’s 12+1=13
        ▪ Each pool corresponds to an allocation size: 1, 2, 4, 8, ..., 4KiB, 8KiB
          • I wonder why the last entry, corresponding to 8KiB, exists...
      ◦ malloc() from applications never uses mempool[11,…,(page_size+1)]
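    A sketch of how a request size could map to an index into such an array, assuming the index is the base-2 logarithm of the rounded size. The names pool_index, lg_page_size and pool_count are hypothetical; OSv's own helper may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants: lg(4KiB) = 12, so the array has 12 + 1 = 13 entries.
constexpr unsigned lg_page_size = 12;
constexpr unsigned pool_count = lg_page_size + 1;

// Hypothetical helper: index of the pool serving a request, assuming
// index = lg(size rounded up to a power of two, minimum 8 bytes).
inline unsigned pool_index(std::size_t size)
{
    unsigned index = 3;           // lg(8)
    std::size_t rounded = 8;
    while (rounded < size) {
        rounded <<= 1;
        ++index;
    }
    return index;
}
```

    Under this assumption a 1KiB request, the largest one served from mempool, maps to index 10, so indices from 11 upward are never reached from malloc().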
  6. 7.

    mempool
    • The slab allocator [4] in OSv. Used for small objects <= page size
      ◦ Managed by class pool  # I think its name is too abstract
    • Handles memory objects from 8 bytes to 1 page (4KiB)
      ◦ [8, 1KiB] => Multiple objects in one page
      ◦ (1KiB, 4KiB] => One object in one page
    • Has a per-CPU cache to improve scalability on MP systems
      ◦ Because of locality of reference, this raises the probability of allocating an object that the calling CPU recently used
      ◦ Exclusive control is not necessary for memory allocation from mempool
    • For more information, please refer to the comment beginning with ”Memory allocation strategy” and the source code
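    The "multiple objects in one page" idea can be sketched with a free list threaded through one page. This is a simplified illustration, not OSv's class pool: the class name page_pool_sketch and its members are hypothetical, it manages a single page only, and the per-CPU cache is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

constexpr std::size_t sketch_page_size = 4096;  // x86_64 page size

// Hypothetical single-page pool; the real class pool is more elaborate.
class page_pool_sketch {
public:
    // object_size is assumed to be a power of two between 8 bytes and 1KiB.
    explicit page_pool_sketch(std::size_t object_size)
        : _page(sketch_page_size)
    {
        // Carve the page into object-sized slots and chain them on a free list.
        for (std::size_t off = 0; off + object_size <= sketch_page_size;
             off += object_size) {
            free_object(_page.data() + off);
        }
    }
    page_pool_sketch(const page_pool_sketch&) = delete;
    page_pool_sketch& operator=(const page_pool_sketch&) = delete;

    // Pop one free slot, or return nullptr when the page is exhausted.
    void* alloc()
    {
        if (!_free) {
            return nullptr;
        }
        slot* s = _free;
        _free = s->next;
        return s;
    }

    // Push a slot back onto the head of the free list.
    void free_object(void* p)
    {
        _free = new (p) slot{_free};
    }

private:
    struct slot { slot* next; };
    std::vector<unsigned char> _page;  // stands in for one page of memory
    slot* _free = nullptr;
};
```

    Because freed slots go back on the head of the list, the most recently freed object is handed out first, which is the same locality argument the slide makes for the per-CPU cache.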
  7. 8.

    page allocator
    • Used for objects of page size or more
      ◦ Managed by “class page_range_allocator”
    • Has two levels of cache, named L1 and L2
      ◦ L1: per-CPU cache
      ◦ L2: global cache
    • Global page allocator: the bottom layer under these two caches
    [Figure: layers of the page allocator, from top to bottom: kernel subsystems (including mempool), L1 cache (per-CPU cache), L2 cache (shared among all CPUs), global page allocator]
  8. 9.

    page allocator: L1 cache
    • Per-CPU cache
    • Managed by “struct l1”
      ◦ Caches up to “l1::max(=512)” pages
    • Definition: “l1 percpu_l1[<# of CPUs>]”
    • API: the “*_local” variants don’t fill the cache or return pages synchronously
      ◦ Page allocation: l1::alloc_page{,_local}
      ◦ Page free: l1::free_page{,_local}
    • Interface to L2 cache
      ◦ Async: uses a per-CPU thread, “page_pool_l1_<cpu>”
        ▪ # of pages < l1::max*¼ => Fill the cache with some pages from the L2 cache
        ▪ # of pages > l1::max*¾ => Return some pages to the L2 cache
      ◦ Sync: if # of pages becomes 0 or l1::max, fill the cache or return some pages immediately
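    The watermark rules for the L1 cache can be sketched as a pure decision function. This is an illustration under assumptions: the names l1_action and l1_next_action are hypothetical, only the thresholds (max of 512, ¼ and ¾) are taken from the slide, and the actual page movement between L1 and L2 is left out.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t l1_max = 512;  // l1::max from the slide

// Hypothetical result type: what the cache should do next.
enum class l1_action { none, fill_async, unload_async, fill_sync, unload_sync };

// Decide what to do for a cache currently holding nr_pages pages.
inline l1_action l1_next_action(std::size_t nr_pages)
{
    if (nr_pages == 0) {
        return l1_action::fill_sync;     // empty: the caller must fill now
    }
    if (nr_pages >= l1_max) {
        return l1_action::unload_sync;   // full: the caller must give pages back now
    }
    if (nr_pages < l1_max / 4) {
        return l1_action::fill_async;    // below the low watermark
    }
    if (nr_pages > l1_max * 3 / 4) {
        return l1_action::unload_async;  // above the high watermark
    }
    return l1_action::none;              // comfortably between the watermarks
}
```

    Keeping the asynchronous thresholds away from 0 and l1::max means the per-CPU thread usually rebalances the cache before an allocation or free has to do so synchronously.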
  9. 10.

    page allocator: L2 cache
    • Cache shared among all CPUs
    • Managed by class l2
      ◦ Caches up to “l2::max” pages
    • API (used by mempool): the “try_*” variants don’t fill the cache or return pages synchronously
      ◦ Multiple page allocation: l2::{try_,}alloc_page_batch
      ◦ Multiple page free: l2::{try_,}free_page_batch
    • Interface to global page allocator
      ◦ Async: uses a thread, “page_pool_l2”
        ▪ # of pages < l2::max*¼ => Fill the cache with some pages from the global page allocator
        ▪ # of pages > l2::max*¾ => Return some pages to the global page allocator
      ◦ Sync: if # of pages becomes 0 or l2::max, fill the cache or return some pages immediately
  10. 11.

    page allocator: global page allocator
    • The deepest component in OSv’s memory management subsystem
    • Manages all free pages in the whole system
    • Not explained here because I ran out of time… ;-(
  11. 12.

    References
    1. OSv Advent Calendar 2014 http://qiita.com/advent-calendar/2014/osv
    2. OSvのご紹介 (Introduction to OSv) in OSC2014 Tokyo/Fall, Takuya ASADA, Cloudius Systems http://www.slideshare.net/syuu1228/osv-in-osc2014-tokyofall
    3. The source tree of OSv https://github.com/cloudius-systems/osv
    4. Slab allocation at Wikipedia http://en.wikipedia.org/wiki/Slab_allocation
    5. mallocの旅(Glibc編) (A Journey through malloc, glibc edition), こさき@ぬまづ http://www.slideshare.net/kosaki55tea/glibc-malloc
  12. 13.

    Extra: My impressions after reading OSv’s code
    • At first I tried to read just the code of malloc(). In the end, however, I read most of the memory management code.
      ◦ I forgot that the kernel and user apps run in the same memory space
    • The source is simple and easy to read (compared with the giant Linux kernel code)
      ◦ There seems to be plenty of room to improve performance
      ◦ Nice code for learning about OSes
    • There is malloc(size, align), which is similar to posix_memalign()
      ◦ Only the kernel can use it because it’s not exported to user applications
      ◦ It is made possible by C++’s overloading feature. Viva C++!
        ▪ When I encountered these two functions, I couldn’t understand why it works, since I had forgotten about overloading at that time
          • My last experience with C++ was 10 years ago (C++98 era)...
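    The overloading point can be illustrated with two functions that share a name but differ in parameter count. This is a sketch, not OSv's definitions: the namespace sketch is hypothetical, and the aligned variant is built on posix_memalign() purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // malloc, free, posix_memalign (POSIX)

namespace sketch {  // hypothetical namespace, to avoid clashing with libc

// One-argument form: behaves like the ordinary malloc().
inline void* malloc(std::size_t size)
{
    return ::malloc(size);
}

// Two-argument overload: same name, extra alignment parameter.
// align must be a power of two and a multiple of sizeof(void*).
inline void* malloc(std::size_t size, std::size_t align)
{
    void* p = nullptr;
    if (::posix_memalign(&p, align, size) != 0) {
        return nullptr;
    }
    return p;
}

}  // namespace sketch
```

    The compiler picks the overload from the number of arguments at the call site, so sketch::malloc(100) and sketch::malloc(100, 64) resolve to different functions.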