Slide 1

Slide 1 text

THIS IS YOUR DATABASE ON LINUX @SAMKOTTLER

Slide 2

Slide 2 text

ABOUT ME ▸ Platform engineering @ DigitalOcean ▸ Formerly of Red Hat & Venmo ▸ Committer to CentOS, Fedora, Icinga, Ansible

Slide 3

Slide 3 text

PROFESSIONALLY CAREMAD AT OS'S

Slide 4

Slide 4 text

WHAT WE'LL COVER ▸ Memory ▸ Disk ▸ Network

Slide 5

Slide 5 text

MEMORY

Slide 6

Slide 6 text

VM.SWAPPINESS ▸ Instructs the OS about when to swap ▸ Default value of 60 ▸ Set this to 0
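
To check what a host is actually running before changing it, the value can be read straight from procfs. This is a minimal sketch, assuming the usual mapping of sysctl names onto paths under /proc/sys; the helper names `read_sysctl` and `swappiness_matches` and the `root` parameter are made up for illustration and are not from the talk.

```python
import os
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl such as 'vm.swappiness'.

    Dots in the name are mapped to directory separators under `root`.
    """
    path = Path(root).joinpath(*name.split("."))
    return path.read_text().strip()


def swappiness_matches(expected: int, root: str = "/proc/sys") -> bool:
    """True when vm.swappiness equals `expected` (the slide suggests 0)."""
    return int(read_sysctl("vm.swappiness", root)) == expected


if __name__ == "__main__":
    # Only touch the real procfs when it is present (i.e. on Linux).
    if os.path.exists("/proc/sys/vm/swappiness"):
        print("vm.swappiness =", read_sysctl("vm.swappiness"))
```

The `root` argument exists only so the helper can be pointed at a fake directory tree when testing.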

Slide 7

Slide 7 text

"My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful." - Andrew Morton

Slide 8

Slide 8 text

TRANSPARENT HUGE PAGES ▸ Intended to keep fewer entries in the TLB ▸ Hands out 2M pages (rather than 4K) for anonymous memory such as malloc() ▸ Breaks when paired with madvise(), particularly with jemalloc ▸ Disable THP
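
Whether THP is currently on can be read from sysfs, where the active mode is the bracketed word in a line such as `always madvise [never]`. The following is a sketch only: the function name `active_choice` is hypothetical, and the sysfs path is the common upstream location, which some distribution kernels may place elsewhere.

```python
import os

# Common upstream location; assumed here, may differ on some distro kernels.
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_choice(text: str) -> str:
    """Return the bracketed option from a line like 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active option found in {text!r}")


if __name__ == "__main__":
    # Skip quietly on systems without the THP sysfs file.
    if os.path.exists(THP_ENABLED):
        with open(THP_ENABLED) as fh:
            print("THP mode:", active_choice(fh.read()))
```

A result of `never` would correspond to the "Disable THP" advice on the slide.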

Slide 9

Slide 9 text

BIT.LY/DO-ALLOCATORS

Slide 10

Slide 10 text

DIRTY RATIO ▸ vm.dirty_background_ratio: start flushing to disk ▸ vm.dirty_ratio: require synchronous I/O ▸ /proc/vmstat should be carefully monitored
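
One way to keep an eye on /proc/vmstat is to pull out the dirty-page counters on a schedule and ship them to whatever monitoring is already in place. A rough sketch, assuming the standard `name value` line format; `parse_vmstat` and `dirty_pages` are illustrative names, and the counters are reported in pages, not bytes.

```python
import os
from typing import Dict


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def dirty_pages(text: str) -> Dict[str, int]:
    """Keep only the counters relevant to dirty-page flushing, if present."""
    wanted = ("nr_dirty", "nr_writeback")
    counters = parse_vmstat(text)
    return {key: counters[key] for key in wanted if key in counters}


if __name__ == "__main__":
    # Only read the real file when it exists (i.e. on Linux).
    if os.path.exists("/proc/vmstat"):
        with open("/proc/vmstat") as fh:
            print(dirty_pages(fh.read()))
```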

Slide 11

Slide 11 text

NUMA ▸ Unlike UMA, NUMA means that each CPU has its own local memory ▸ Modern systems generally have 2+ NUMA nodes ▸ Allocations kept local to one CPU can cause swapping, even if memory is available on other nodes ▸ /usr/bin/numactl --interleave all
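
The numactl invocation above is usually applied by wrapping the database's own start command. A small sketch of building that argument list; `interleaved` is a hypothetical helper, and the `mongod` command in the usage line is only a placeholder for whichever database is being started.

```python
import shlex
from typing import List, Sequence

NUMACTL = "/usr/bin/numactl"  # path as given on the slide


def interleaved(command: Sequence[str]) -> List[str]:
    """Prefix a command so its memory is interleaved across all NUMA nodes."""
    if not command:
        raise ValueError("command must not be empty")
    return [NUMACTL, "--interleave=all", *command]


if __name__ == "__main__":
    # Placeholder database command; substitute the real one.
    argv = interleaved(["mongod", "--config", "/etc/mongod.conf"])
    print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run` or pasted into a service unit's start line.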

Slide 12

Slide 12 text

DISK

Slide 13

Slide 13 text

FILESYSTEMS ▸ ext4 is still a very safe bet ▸ xfs has had performance issues, largely solved now ▸ btrfs is interesting

Slide 14

Slide 14 text

YEAH, SO DON'T RUN BTRFS

Slide 15

Slide 15 text

SPINNING DISKS ▸ Mostly just don't, except for cold storage ▸ CFQ/elevator is a good bet

Slide 16

Slide 16 text

SSD'S ▸ Use them, they're generally okay ▸ Don't run multi-threaded workloads on consumer grade drives ▸ deadline/noop scheduler
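
The scheduler in use for a block device is exposed in sysfs, with the active one in brackets (for example `noop [deadline] cfq`). The sketch below compares it with the suggestions from the last two slides. The function names and the device name `sda` are assumptions for illustration; newer multi-queue kernels expose different scheduler names (such as `mq-deadline` and `none`), so the mapping here follows the slides rather than any particular kernel version.

```python
import os
from typing import Tuple


def active_scheduler(text: str) -> str:
    """Return the active scheduler from e.g. 'noop [deadline] cfq'."""
    tokens = text.split()
    for token in tokens:
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    if len(tokens) == 1:
        # Some devices report a single scheduler without brackets.
        return tokens[0]
    raise ValueError(f"cannot determine scheduler from {text!r}")


def suggested_schedulers(rotational: bool) -> Tuple[str, ...]:
    """Slide advice: CFQ for spinning disks, deadline or noop for SSDs."""
    return ("cfq",) if rotational else ("deadline", "noop")


if __name__ == "__main__":
    queue = "/sys/block/sda/queue"  # 'sda' is a placeholder device name
    if os.path.exists(queue):
        with open(os.path.join(queue, "rotational")) as fh:
            rotational = fh.read().strip() == "1"
        with open(os.path.join(queue, "scheduler")) as fh:
            current = active_scheduler(fh.read())
        print(current, "suggested:", suggested_schedulers(rotational))
```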

Slide 17

Slide 17 text

MORE BITS ON SSD'S ▸ Controller firmware quality is generally bad ▸ Did I mention controller firmware quality is low? ▸ Your drives might just sometimes die because of firmware ▸ Find a working firmware release, rarely change versions

Slide 18

Slide 18 text

NVME ▸ Non-volatile memory via PCIe ▸ SATA/SAS/Fibre channel are too slow for high-end flash ▸ Currently economical for use as read through/write back cache ▸ Supported in Linux since 3.3

Slide 19

Slide 19 text

NETWORK

Slide 20

Slide 20 text

TCP METRICS ▸ The kernel stores congestion and window size information for roughly 1k connections ▸ Window size and congestion settings for new connections are based on those previous conditions ▸ Set net.ipv4.tcp_no_metrics_save to 1

Slide 21

Slide 21 text

FIN TIMEOUT ▸ Determines how long an orphaned connection waits in FIN_WAIT_2 for the final FIN ▸ Defaults to 60 seconds; set net.ipv4.tcp_fin_timeout to something below that
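
The TCP settings from this slide and the previous one could be kept together in a sysctl.conf-style fragment. A minimal sketch of rendering one: `render_sysctl_conf` is a made-up helper, and the value 30 for `net.ipv4.tcp_fin_timeout` is purely an assumption, since the slide only says "below 60".

```python
from typing import Dict

NETWORK_SETTINGS: Dict[str, int] = {
    "net.ipv4.tcp_no_metrics_save": 1,
    # Assumed value: the talk only recommends "something below 60".
    "net.ipv4.tcp_fin_timeout": 30,
}


def render_sysctl_conf(settings: Dict[str, int]) -> str:
    """Render settings as 'key = value' lines in insertion order."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    print(render_sysctl_conf(NETWORK_SETTINGS), end="")
```

The output is in the form that `sysctl -p` reads from a file, so it can be dropped into a configuration-managed file rather than applied by hand.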

Slide 22

Slide 22 text

OTHER CHANGES ▸ Disable SYN cookies ▸ ECN

Slide 23

Slide 23 text

THANKS! ▸ [email protected] ▸ github.com/skottler