ABOUT ME
▸ Platform engineering @ DigitalOcean
▸ Formerly of Red Hat & Venmo
▸ Committer to CentOS, Fedora, Icinga, Ansible
PROFESSIONALLY CAREMAD
AT OS'S
WHAT WE'LL COVER
▸ Memory
▸ Disk
▸ Network
MEMORY
VM.SWAPPINESS
▸ Tells the kernel how aggressively to swap out process memory
▸ Default value of 60
▸ Set this to 0
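A minimal sketch of checking the current value against the target from this slide. It assumes the standard procfs path /proc/sys/vm/swappiness; the helper names (parse_swappiness, read_swappiness, needs_change) are made up for illustration.

```python
from pathlib import Path

# Standard procfs location of the tunable on Linux.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(raw: str) -> int:
    """Parse the text content of the swappiness file into an integer."""
    value = int(raw.strip())
    if value < 0:
        raise ValueError(f"swappiness cannot be negative: {value}")
    return value


def read_swappiness(path: Path = SWAPPINESS_PATH):
    """Return the current value, or None if the file is missing or unreadable."""
    try:
        return parse_swappiness(path.read_text())
    except (OSError, ValueError):
        return None


def needs_change(current: int, target: int = 0) -> bool:
    """True when the running value differs from the target (0, per this slide)."""
    return current != target
```

If needs_change() returns True, the value can be changed at runtime with `sysctl -w vm.swappiness=0`, or persisted through a sysctl configuration file.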
"My point is that decreasing the tendency of the kernel to swap stuff
out is wrong. You really don't want hundreds of megabytes of
BloatyApp's untouched memory floating about in the machine. Get it out
on the disk, use the memory for something useful." - Andrew Morton
TRANSPARENT HUGE PAGES
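▸ Intended to reduce the number of TLB entries needed
▸ Transparently backs anonymous memory (e.g. from malloc()) with 2M
pages; 1G pages need hugetlbfs instead
▸ Interacts badly with allocators that release memory via madvise(),
particularly jemalloc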
▸ Disable THP
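To confirm THP is actually off, the kernel exposes the active mode in /sys/kernel/mm/transparent_hugepage/enabled, with the current choice in square brackets (for example `always madvise [never]`). A rough sketch of parsing that format; the function names are hypothetical.

```python
import re


def active_option(raw: str) -> str:
    """Return the bracketed (currently active) choice from a sysfs option line."""
    match = re.search(r"\[([^\]]+)\]", raw)
    if match is None:
        raise ValueError(f"no active option found in: {raw!r}")
    return match.group(1)


def thp_disabled(raw: str) -> bool:
    """True when the active THP mode is 'never'."""
    return active_option(raw) == "never"
```

Feed it the contents of the sysfs file, for example `thp_disabled(open("/sys/kernel/mm/transparent_hugepage/enabled").read())`.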
BIT.LY/DO-ALLOCATORS
DIRTY RATIO
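▸ vm.dirty_background_ratio: percentage of memory that may be dirty
before background flushing to disk starts
▸ vm.dirty_ratio: percentage of memory that may be dirty before writers
are forced into synchronous I/O
▸ Monitor /proc/vmstat carefully (nr_dirty, nr_writeback)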
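One way to watch this, sketched under assumptions: /proc/vmstat reports nr_dirty alongside the kernel's computed nr_dirty_background_threshold and nr_dirty_threshold (all in pages), so a monitor can compare them directly. The state labels and function names below are illustrative, and the comparison is a simplification of what the kernel really does when throttling writers.

```python
def parse_vmstat(text: str) -> dict:
    """Parse 'name value' lines from /proc/vmstat into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def writeback_state(counters: dict) -> str:
    """Classify dirty-page pressure: 'idle', 'background' or 'over-limit'."""
    dirty = counters["nr_dirty"]
    if dirty >= counters["nr_dirty_threshold"]:
        return "over-limit"   # writers are likely being throttled
    if dirty >= counters["nr_dirty_background_threshold"]:
        return "background"   # background flushing should be running
    return "idle"


# Made-up sample values, only to show the expected input shape.
SAMPLE = """nr_dirty 1500
nr_writeback 10
nr_dirty_threshold 4000
nr_dirty_background_threshold 1000
"""
```

On a live system, pass the contents of /proc/vmstat instead of SAMPLE and alert when the state stays at "over-limit".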
NUMA
▸ Unlike UMA, NUMA means each CPU has its own local memory; memory on
other nodes is slower to reach
▸ Modern systems generally have 2+ NUMA nodes
▸ Allocations on a CPU's local node can cause swapping, even if memory
is available on other nodes.
▸ /usr/bin/numactl --interleave=all <command>
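A sketch of spotting the imbalance described above: one node nearly full while another still has room. It assumes the per-node files /sys/devices/system/node/node<N>/meminfo, whose lines look like `Node 0 MemFree: 1234 kB`. The 5% and 25% cut-offs and all function names are arbitrary choices for illustration.

```python
def parse_node_meminfo(text: str) -> dict:
    """Parse one node's meminfo file into {field: value}."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: ["Node", "0", "MemFree:", "1234", "kB"]
        if len(parts) >= 4 and parts[0] == "Node" and parts[3].isdigit():
            fields[parts[2].rstrip(":")] = int(parts[3])
    return fields


def free_fraction(fields: dict) -> float:
    """Fraction of the node's memory that is currently free."""
    return fields["MemFree"] / fields["MemTotal"]


def starved_nodes(nodes: dict, low: float = 0.05, high: float = 0.25) -> list:
    """Return IDs of nodes that are nearly full while another node has room."""
    fractions = {node: free_fraction(f) for node, f in nodes.items()}
    has_room = any(frac >= high for frac in fractions.values())
    return sorted(n for n, frac in fractions.items() if frac < low and has_room)
```

A non-empty result suggests workloads pinned to a full node; interleaving with numactl spreads allocations across nodes instead.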
DISK
FILESYSTEMS
▸ ext4 is still a very safe bet
▸ xfs has had performance issues, largely solved now
▸ btrfs is interesting
YEAH, SO DON'T RUN BTRFS
SPINNING DISKS
▸ Mostly just don't, except for cold storage
▸ The CFQ elevator (I/O scheduler) is a good bet
SSD'S
▸ Use them, they're generally okay
▸ Don't run multi-threaded workloads on consumer-grade drives
▸ Use the deadline or noop scheduler
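The scheduler advice for spinning disks and SSDs can be sketched as a small picker. It assumes the sysfs files /sys/block/<dev>/queue/rotational ("1" for spinning disks, "0" otherwise) and /sys/block/<dev>/queue/scheduler (space-separated names, active one in brackets). The preference table mirrors these slides; newer multi-queue kernels offer different scheduler names, in which case this returns None. Function names are hypothetical.

```python
# Preferences taken from the slides: CFQ for spinning disks,
# deadline or noop for SSDs. Keyed by "is the device rotational?".
PREFERENCES = {True: ["cfq"], False: ["deadline", "noop"]}


def available_schedulers(raw: str) -> list:
    """Split a scheduler line, dropping the [] around the active entry."""
    return [name.strip("[]") for name in raw.split()]


def pick_scheduler(rotational_raw: str, scheduler_raw: str):
    """Return the preferred scheduler the device offers, or None if none match."""
    is_rotational = rotational_raw.strip() == "1"
    offered = available_schedulers(scheduler_raw)
    for name in PREFERENCES[is_rotational]:
        if name in offered:
            return name
    return None
```

The chosen name can then be written to the device's scheduler file as root; that step is left out here on purpose.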
MORE BITS ON SSD'S
▸ Controller firmware quality is generally bad
▸ Did I mention controller firmware quality is low?
▸ Your drives might just sometimes die because of firmware
▸ Find a working firmware release, rarely change versions
NVME
▸ Non-volatile memory via PCIe
▸ SATA/SAS/Fibre Channel are too slow for high-end flash
▸ Currently economical for use as a read-through/write-back cache
▸ Supported in Linux since 3.3
NETWORK
TCP METRICS
▸ Caches congestion and window-size information for roughly 1k
connections.
▸ New connections start with a window size and congestion state based
on previous conditions.
▸ Set net.ipv4.tcp_no_metrics_save to 1.
FIN TIMEOUT
▸ Determines how long a closing connection waits for the peer's FIN
▸ Set net.ipv4.tcp_fin_timeout to something below 60 (the default)
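The two network settings above could be rendered into a sysctl configuration fragment along these lines. This is only a sketch: the 30-second timeout is an assumed example (the slide only says "below 60"), and the function names are invented.

```python
def network_settings(fin_timeout: int = 30) -> dict:
    """Build the network sysctls from the slides; fin_timeout=30 is an assumption."""
    if not 0 < fin_timeout < 60:
        raise ValueError("tcp_fin_timeout should be positive and below 60")
    return {
        "net.ipv4.tcp_no_metrics_save": 1,
        "net.ipv4.tcp_fin_timeout": fin_timeout,
    }


def render_sysctl(settings: dict) -> str:
    """Render settings as 'key = value' lines, sorted by key."""
    lines = [f"{key} = {settings[key]}" for key in sorted(settings)]
    return "\n".join(lines) + "\n"
```

The rendered text can be saved under /etc/sysctl.d/ and loaded with `sysctl --system`.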