Slide 1

Slide 1 text

Out of memory or plenty to spare?
The fine details of reading memory consumption
Joshua Thijssen (jaytaph)

Slide 2

Slide 2 text

Disclaimers:
➡ No PHP (but we will still talk about it).
➡ Pretty advanced stuff.
➡ Simplified.

Slide 3

Slide 3 text

Q: How much memory is our server using?

Slide 6

Slide 6 text

$ free -m
             total       used       free     shared    buffers     cached
Mem:          3963       3500        462          0        722       1263
-/+ buffers/cache:       1515       2448
Swap:          400         20        379
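
The "-/+ buffers/cache" row can be derived from the "Mem:" row. A minimal sketch of that arithmetic, assuming the older free layout shown on this slide; the struct and function names are made up for illustration:

```c
#include <assert.h>

/* Values from the "Mem:" row of `free -m`, in MB (hypothetical struct). */
struct mem_row {
    long total, used, free, buffers, cached;
};

/* Memory really held by applications: "used" minus reclaimable caches. */
long used_by_apps(struct mem_row m) {
    assert(m.used >= m.buffers + m.cached);
    return m.used - m.buffers - m.cached;
}

/* Memory available to applications: "free" plus reclaimable caches. */
long free_for_apps(struct mem_row m) {
    return m.free + m.buffers + m.cached;
}
```

With the slide's numbers, used_by_apps() gives 1515, matching the slide. free_for_apps() gives 2447, one less than the 2448 shown, presumably because free rounds each column to MB separately.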

Slide 8

Slide 8 text

[Figure: memory bar labeled Free, Used, and Buffers / Cache.]

Slide 9

Slide 9 text

[Figure: memory bar labeled Free, Used, and Buffers / Cache, with a second Free label.]

Slide 11

Slide 11 text

slabtop

 Active / Total Objects (% used)    : 2187767 / 2283870 (95.8%)
 Active / Total Slabs (% used)      : 261417 / 261421 (100.0%)
 Active / Total Caches (% used)     : 114 / 192 (59.4%)
 Active / Total Size (% used)       : 1013948.21K / 1024061.72K (99.0%)
 Minimum / Average / Maximum Object : 0.02K / 0.45K / 4096.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
825808 825378  99%    0.98K 206452        4    825808K ext4_inode_cache
522921 512677  98%    0.19K  24901       21     99604K dentry
394290 364506  92%    0.10K  10110       39     40440K buffer_head
166914 142785  85%    0.04K   1686       99      6744K ext4_extent_status
143756 142975  99%    0.05K   1732       83      6928K jbd2_inode
 63700  62887  98%    0.56K   9100        7     36400K radix_tree_node
 42273  29552  69%    0.06K    671       63      2684K kmalloc-64
 23870  19345  81%    0.12K    770       31      3080K kmalloc-96
 16884  14623  86%    0.06K    268       63      1072K anon_vma_chain
 12144  12024  99%    0.12K    368       33      1472K kernfs_node_cache
 11424  10582  92%    0.19K    544       21      2176K vm_area_struct
 10044   7897  78%    0.03K     81      124       324K kmalloc-32
  9331   9135  97%    0.56K   1333        7      5332K inode_cache
  7434   6552  88%    0.06K    118       63       472K anon_vma
  6528   6528 100%    0.62K   1088        6      4352K proc_inode_cache
  3312   2330  70%    0.25K    207       16       828K filp
  2490   2458  98%    0.05K     30       83       120K ftrace_event_field
  2294   2207  96%    0.12K     74       31       296K kmalloc-128
  1596   1440  90%    0.19K     76       21       304K kmalloc-192
  1568   1476  94%    0.07K     28       56       112K Acpi-Operand
  1112   1068  96%    1.00K    278        4      1112K kmalloc-1024
  1104   1076  97%    0.09K     24       46        96K ftrace_event_file
   882    518  58%    0.19K     42       21       168K cred_jar
   880    644  73%    0.25K     55       16       220K skbuff_head_cache
   828    802  96%    0.65K    138        6       552K shmem_inode_cache
   648    552  85%    0.11K     18       36        72K jbd2_journal_head
   608    523  86%    0.25K     38       16       152K kmalloc-256

Slide 12

Slide 12 text

Q: How much memory is our application using?

Slide 14

Slide 14 text

Processes

Slide 18

Slide 18 text

Processes
➡ Completely isolated from each other.
➡ Act like they own the place.
➡ Must ask permission for pretty much everything.

Slide 19

Slide 19 text

[Figure: Operating system (kernel); Process 1, Process 2, Process 3, Process 4; Network, Disk I/O, Screen.]

Slide 23

Slide 23 text

➡ Every process can access up to 4 GB of memory*
➡ Even if your computer does not have 4 GB of memory.
➡ Even if your computer has MORE than 4 GB of memory.

Slide 24

Slide 24 text

➡ On 64-bit machines:
  ➡ Theoretically: 16 exabytes (for comparison, the size of the internet in 2013 was estimated at 672 exabytes).
  ➡ But in practice, usually only between 8 and 128 TB can be used.
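
The sizes on these slides follow directly from the number of address bits. A small sketch of the arithmetic; the function names are made up, and the 47-bit figure is an assumption (user space on x86-64 Linux with 48-bit virtual addresses):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with `bits` address bits, for bits < 64. */
uint64_t addressable_bytes(unsigned bits) {
    assert(bits < 64);              /* 1 << 64 does not fit in a uint64_t */
    return (uint64_t)1 << bits;
}

/* Convert bytes to whole gigabytes (2^30 bytes each). */
uint64_t to_gb(uint64_t bytes) {
    return bytes >> 30;
}
```

to_gb(addressable_bytes(32)) is 4, the 4 GB of a 32-bit process. to_gb(addressable_bytes(47)) is 131072, which is 128 TB. The full 64 bits would give 2^64 bytes, or 16 exabytes, which is one more than a uint64_t can hold.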

Slide 26

Slide 26 text

[Figure: Physical Memory and Virtual Memory.]

Slide 35

Slide 35 text

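[Figure: 32-bit process address space from 0x00000000 to 0xFFFFFFFF, with markers at 0x00010000 and 0xC0000000 splitting it into a 3 GB part and a 1 GB part. Regions are labeled program + data, shared, and stack.]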

Slide 36

Slide 36 text

[Figure: memory divided into 4 KB pages starting at 0x12340000, 0x12341000, 0x12342000, 0x12343000, 0x12344000, 0x12345000, and 0x12346000.]
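
With 4 KB pages, the low 12 bits of an address are the offset inside the page and the remaining bits select the page. A minimal sketch, assuming 32-bit addresses; the macro and function names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

#define SLIDE_PAGE_SIZE 4096u   /* 4 KB pages, as on the slide */

/* Start address of the page that contains `addr`. */
uint32_t page_base(uint32_t addr) {
    return addr & ~(SLIDE_PAGE_SIZE - 1);
}

/* Offset of `addr` inside its page (0..4095). */
uint32_t page_offset(uint32_t addr) {
    uint32_t off = addr & (SLIDE_PAGE_SIZE - 1);
    assert(page_base(addr) + off == addr);   /* base + offset rebuilds the address */
    return off;
}
```

For the address 0x12341234, page_base() returns 0x12341000 (the second page in the figure) and page_offset() returns 0x234.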

Slide 41

Slide 41 text

➡ Every process gets its own page table.
➡ Only pages that are actually used are filled!
➡ If all pages are filled (i.e. 4 GB is mapped), the page table would be 4 MB in size.
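
The 4 MB figure can be checked with a little arithmetic. This sketch assumes a single flat table, 4 KB pages, and 4-byte entries (as on 32-bit x86 without PAE); none of these details are stated on the slide:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a flat page table that covers `mapped_bytes` of address space. */
uint64_t page_table_bytes(uint64_t mapped_bytes, uint64_t page_size,
                          uint64_t entry_size) {
    assert(page_size > 0 && mapped_bytes % page_size == 0);
    uint64_t entries = mapped_bytes / page_size;   /* one entry per page */
    return entries * entry_size;
}
```

4 GB divided by 4 KB is 1,048,576 pages; at 4 bytes per entry that is 4 MB, so page_table_bytes(4ULL << 30, 4096, 4) returns 4194304.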

Slide 42

Slide 42 text

Virt          Phys
0x12340000    0x00001000
0x12341000    0x00522000
0x12342000    0x00852000
0x12346000    0x00633000

Slide 43

Slide 43 text

➡ Every virtual address that is used MUST be converted to an actual physical address through the page tables.
➡ The CPU caches these translations in the Translation Lookaside Buffer (TLB).
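
A toy version of that translation, using the four Virt/Phys mappings from the earlier slide. This is a sketch only: real page tables are multi-level structures walked by the hardware, and the names here are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_MASK_4K 0xFFFu   /* low 12 bits: offset inside a 4 KB page */

struct mapping { uint32_t virt; uint32_t phys; };

/* The four mappings shown on the Virt/Phys slide. */
static const struct mapping table[] = {
    {0x12340000u, 0x00001000u},
    {0x12341000u, 0x00522000u},
    {0x12342000u, 0x00852000u},
    {0x12346000u, 0x00633000u},
};

/* Translate a virtual address. Returns 1 on success; returns 0 and leaves
   *phys untouched if the page is not mapped (a real CPU would raise a
   page fault instead). */
int translate(uint32_t vaddr, uint32_t *phys) {
    assert(phys != NULL);
    uint32_t page = vaddr & ~OFFSET_MASK_4K;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].virt == page) {
            *phys = table[i].phys | (vaddr & OFFSET_MASK_4K);
            return 1;
        }
    }
    return 0;
}
```

translate(0x12341234, &p) succeeds and sets p to 0x00522234. translate(0x12343000, &p) fails, because that page has no entry in the table.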

Slide 44

Slide 44 text

[Figure: panels Virtual and Physical; pages 1a, 1b, 1c.]

Slide 45

Slide 45 text

[Figure: panels Virtual and Physical; pages 1a, 1b, 1c shown in both.]

Slide 46

Slide 46 text

[Figure: two Virtual panels and one Physical panel; pages 1a-1c and 2a-2c shown in Virtual and in Physical.]

Slide 47

Slide 47 text

[Figure: as the previous figure, with pages 3a-3c added in Virtual and in Physical.]

Slide 48

Slide 48 text

[Figure: as the previous figure, with pages 4a, 4b, 4c added; they appear only once.]

Slide 49

Slide 49 text

[Figure: panels Virtual, Virtual, and Physical; pages 1a-1c, 2a-2c, 3a-3c, 4a-4c; two entries marked *.]

Slide 50

Slide 50 text

[Figure: as the previous figure, with a Swap panel added.]

Slide 51

Slide 51 text

[Figure: as the previous figure, with pages 2a and 1c drawn apart from the other physical pages.]

Slide 52

Slide 52 text

[Figure: as the previous figure, with a second copy of pages 4a, 4b, 4c added.]

Slide 53

Slide 53 text

SWAP
➡ The CPU tells the OS when a page is not loaded in memory (page fault).
➡ The OS loads the page from swap.
➡ The OS returns control to the process.
➡ The process is - NEVER - aware of this.

Slide 54

Slide 54 text

Quick recap
➡ Every process can use up to 4 GB.
➡ Every process gets its own page table.
➡ Every process starts with the smallest possible page table.
➡ Pages can be swapped in and out of memory by the CPU/OS without the process being aware of it.

Slide 55

Slide 55 text

What happens when a process wants (more) memory?

Slide 56

Slide 56 text

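[Figure: 32-bit process address space from 0x00000000 to 0xFFFFFFFF, with markers at 0x00010000 and 0xC0000000 splitting it into a 3 GB part and a 1 GB part. Regions are labeled program + data, shared, and stack.]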

Slide 58

Slide 58 text

Ask the OS: (s)brk() or mmap()

Slide 59

Slide 59 text

(s)brk() changes the data segment size.

Slide 62

Slide 62 text

[Figure: sbrk(10000), with a new region labeled Heap.]

Slide 67

Slide 67 text

➡ sbrk() does not ALLOCATE (physical) memory.
➡ sbrk() creates table entries in the page table.
➡ Physical memory usage stays the same.
➡ Virtual memory usage goes up.
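
The effect of sbrk() on the virtual side can be observed by asking for the current program break before and after the call. A minimal sketch, assuming a Unix-like system where sbrk() is still available; grow_break() is a hypothetical helper name:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Grow the data segment by `bytes` and return how far the program break
   moved, or -1 if the kernel refused. No page in the new range is touched,
   so physical memory usage should not change. */
intptr_t grow_break(intptr_t bytes) {
    assert(bytes >= 0);
    char *before = sbrk(0);            /* current end of the data segment */
    if (sbrk(bytes) == (void *)-1) {
        return -1;
    }
    char *after = sbrk(0);
    return after - before;
}
```

Calling grow_break(4096) should return 4096: the address range grew by one page, even though nothing has been written to it yet.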

Slide 69

Slide 69 text

#include <unistd.h>

int main(void) {
    sbrk(1024 * 1024 * 1024);   /* grow the data segment by 1 GB; nothing is touched */
    pause();                    /* keep the process alive so it shows up in top */
    return 0;
}

top - 11:11:28 up 8 min,  2 users,  load average: 0.01, 0.01, 0.00
Tasks: 141 total,   1 running, 140 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.9%sy,  0.0%ni, 99.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3922404k total,   492688k used,  3429716k free,    81944k buffers
Swap:  1675260k total,        0k used,  1675260k free,   171296k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2132 jthijsse  20   0 1027m  400  324 S  0.0  0.0   0:00.00 memory
 1669 jthijsse  20   0  128m 5896 2064 S  0.0  0.2   0:00.83 zsh
 1871 jthijsse  20   0  125m 5220 1860 S  0.0  0.1   0:00.18 zsh
 1668 jthijsse  20   0 95844 1740  800 S  0.0  0.0   0:00.38 sshd
 1870 jthijsse  20   0 95844 1736  796 S  0.0  0.0   0:00.21 sshd
 2143 jthijsse  20   0 15028 1328  984 R  6.9  0.0   0:00.09 top

Slide 70

Slide 70 text

➡ Actual memory usage only goes up when we access the virtual memory.
➡ Only pages that are accessed will be created/loaded into physical memory.
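
One way to watch this happen is to map some pages, touch only a few of them, and ask the kernel which ones are resident. The sketch below assumes Linux, where mincore() reports residency per page; resident_after_touch() is a made-up helper, and the exact count can differ between systems:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `total` anonymous pages, write to the first `touched` of them, and
   return how many pages the kernel reports as resident (-1 on error). */
long resident_after_touch(size_t total, size_t touched) {
    assert(total > 0 && touched <= total);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *mem = mmap(NULL, total * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    for (size_t i = 0; i < touched; i++) {
        mem[i * page] = 1;             /* first write faults the page in */
    }

    unsigned char *vec = malloc(total);
    long resident = -1;
    if (vec != NULL && mincore(mem, total * page, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < total; i++) {
            resident += vec[i] & 1;    /* bit 0 set means the page is in RAM */
        }
    }
    free(vec);
    munmap(mem, total * page);
    return resident;
}
```

On a typical Linux machine, resident_after_touch(16, 3) would be expected to report about 3 resident pages out of 16, although the virtual size grew by all 16.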

Slide 71

Slide 71 text

➡ Allocating 1 GB of memory is cheap/fast.
➡ Iterating over 1 GB of memory is not.

Slide 73

Slide 73 text

#include <stdlib.h>
#include <unistd.h>

int main(void) {
    size_t j = 1024UL * 1024 * 1024;   /* 1 GB */
    char *a = malloc(j);
    if (a == NULL) return 1;
    for (size_t i = 0; i != j; i++) {
        a[i] = 1;                      /* touching each page makes it resident */
    }
    pause();                           /* keep the process alive so it shows up in top */
    return 0;
}

top - 11:09:49 up 7 min,  2 users,  load average: 0.07, 0.02, 0.00
Tasks: 141 total,   1 running, 140 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3922404k total,  1543680k used,  2378724k free,    81940k buffers
Swap:  1675260k total,        0k used,  1675260k free,   171264k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2095 jthijsse  20   0 1027m 1.0g  352 S  0.0 26.7   0:03.46 memory2
 1669 jthijsse  20   0  128m 5896 2064 S  0.0  0.2   0:00.83 zsh
 1871 jthijsse  20   0  125m 5220 1860 S  0.0  0.1   0:00.15 zsh
 1668 jthijsse  20   0 95844 1740  800 S  0.0  0.0   0:00.38 sshd
 1870 jthijsse  20   0 95844 1736  796 S  0.0  0.0   0:00.17 sshd
 2059 jthijsse  20   0 15028 1332  992 R  0.0  0.0   0:00.69 top

Slide 74

Slide 74 text

➡ mmap() maps files into memory.
➡ Lazy loading: mmap() only creates virtual pages. Parts of the file are loaded into physical memory and connected when needed.
➡ mmap() can also be used to allocate memory (without connecting to files).
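
A minimal sketch of mapping a file and reading it through memory instead of read(). It assumes a POSIX system with a writable /tmp; the helper name and the temporary file name pattern are made up for illustration:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write `text` to a temporary file, map the file read-only, and return the
   first mapped byte (or -1 on error). */
int first_byte_via_mmap(const char *text) {
    assert(text != NULL && text[0] != '\0');
    char path[] = "/tmp/mmap-demo-XXXXXX";    /* hypothetical temp file name */
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                             /* file goes away once closed */

    size_t len = strlen(text);
    int result = -1;
    if (write(fd, text, len) == (ssize_t)len) {
        char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            result = (unsigned char)map[0];   /* page is loaded on first access */
            munmap(map, len);
        }
    }
    close(fd);
    return result;
}
```

first_byte_via_mmap("hello") returns 'h'. The mmap() call itself only sets up the mapping; the file data is brought in when map[0] is read.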

Slide 77

Slide 77 text

[Figure: region labeled mmap()ed.]

Slide 79

Slide 79 text

a = "Hello World"
if (a == "foobar") {
    // Do something
}

[root@localhost ~]# pmap -x 1271
1271:   php-fpm: master process (/etc/php-fpm.conf)
Address   Kbytes     RSS   Dirty Mode   Mapping
001ee000     248       8       0 r-x--  libgssapi_krb5.so.2.2
0022c000       4       4       4 r----  libgssapi_krb5.so.2.2
0022d000       4       4       4 rw---  libgssapi_krb5.so.2.2
0022f000      28       4       0 r-x--  libcrypt-2.12.so
00236000       4       4       4 r----  libcrypt-2.12.so
00237000       4       4       4 rw---  libcrypt-2.12.so
....
08048000    3400     204       0 r-x--  php-fpm
0839a000     328     140      20 rw---  php-fpm
083ec000      96      32      32 rw---    [ anon ]
092b4000    1316    1176    1176 rw---    [ anon ]
...
af483000       4       4       4 rw-s-  zero (deleted)
af484000      28       0       0 r--s-  gconv-modules.cache
af48b000  131072       0       0 rw-s-  zero (deleted)
b748b000     160       4       4 rw---    [ anon ]
b74b3000    2048       8       0 r----  locale-archive
b76b3000    1312     124     124 rw---    [ anon ]
b77fb000       4       4       4 rw-s-  zero (deleted)
b77fc000       4       4       4 rw-s-  zero (deleted)
b77fd000       4       4       4 rw-s-  zero (deleted)
b77fe000       4       4       4 rw-s-  zero (deleted)
b77ff000       4       4       4 rw---    [ anon ]
bf876000      84      48      48 rw---    [ stack ]
-------- ------- ------- ------- -------
total kB  155544       -       -       -

Slide 81

Slide 81 text

[Figure: Process 1, Process 2, Process 3.]

Slide 82

Slide 82 text

[Figure: Process 1 (pages 1a, 1b, 1c) and Process 2 (pages 2a, 2b, 2c) next to Physical memory.]

Slide 83

Slide 83 text

[Figure: Process 1 (pages 1a, 1b, 1c) and Process 2 (pages 2a, 2b, 2c) next to Physical memory, with additional pages a, b, c.]

Slide 85

Slide 85 text

fork()

Slide 88

Slide 88 text

[Figure: Process 1, fork() =>, Process 2.]

Slide 89

Slide 89 text

[Figure: fork() => two Virtual panels with pages 1a, 1b, 1c and 1'a, 1'b, 1'c, next to a Physical panel with pages a, b, c.]

Slide 90

Slide 90 text

[Figure: fork() => two Virtual panels with pages 1a, 1b, 1c and 1'a, 2b, 1'c, next to a Physical panel with pages 1a, 1b, 1c and a separate page 2b.]
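
The fork() figures appear to show parent and child sharing physical pages until one of them writes (commonly called copy-on-write). The sketch below shows only the observable effect, not the sharing itself: a write in the child never reaches the parent. The function name is made up.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that overwrites a heap byte, then return the value the
   parent still sees (or -1 on error). */
int parent_value_after_child_write(void) {
    char *buf = malloc(1);
    if (buf == NULL) return -1;
    buf[0] = 'a';

    pid_t pid = fork();
    if (pid == -1) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        buf[0] = 'b';      /* the write lands in the child's private copy */
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    int seen = buf[0];     /* the parent's page was never modified */
    free(buf);
    return seen;
}
```

parent_value_after_child_write() returns 'a': both processes use the same virtual address, but after the child's write they no longer refer to the same physical page.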

Slide 92

Slide 92 text

Recap
➡ Don't worry about high virtual memory usage.
➡ The resident set size (RSS) is key.
➡ When fork()ing, memory usage (even RSS) becomes hard to manage and detect.
➡ Don't swap!
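
A process can read its own virtual and resident sizes from /proc. A minimal sketch, assuming Linux and its "VmSize" / "VmRSS" lines in /proc/self/status; proc_status_kb() is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value in kB of a field such as "VmRSS" or "VmSize" from
   /proc/self/status, or -1 if the file or the field is unavailable. */
long proc_status_kb(const char *field) {
    assert(field != NULL);
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) return -1;          /* not Linux, or /proc not mounted */

    char line[256];
    long kb = -1;
    size_t n = strlen(field);
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "VmRSS:      1234 kB". */
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            if (sscanf(line + n + 1, "%ld", &kb) != 1) kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}
```

Comparing proc_status_kb("VmSize") with proc_status_kb("VmRSS") shows the gap between what a process has mapped and what actually sits in RAM, which corresponds to the VIRT and RES columns in top.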

Slide 93

Slide 93 text

303 See Other
➡ https://techtalk.intersec.com/2013/07/memory-part-2-understanding-process-memory/
➡ http://locklessinc.com/articles/memory_usage/
➡ http://rhaas.blogspot.nl/2012/01/linux-memory-reporting.html
➡ http://people.freebsd.org/~lstewart
➡ http://deathbytape.com/post/110371790629/intro-virtual-memoryarticles/cpumemory.pdf
➡ http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html

Slide 94

Slide 94 text

http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

Slide 95

Slide 95 text

Find me on twitter: @jaytaph
Find me for development and training: www.noxlogic.nl
Find me on email: [email protected]
Find me for blogs: www.adayinthelifeof.nl