Slide 1

Slide 1 text

Lessons from defending the indefensible
Marek Majkowski
marek@cloudflare.com
@majek04

Slide 2

Slide 2 text

Denial of service (DoS) (source: the internet)

Slide 3

Slide 3 text

Unique view

Slide 4

Slide 4 text

DoS attempts daily (chart axis: DoS events per day)

Slide 5

Slide 5 text

Defending from DoS is hard
(Diagram: Attacker, Visitor, example.com; X marks blocked traffic)

Slide 6

Slide 6 text

Attack surface
(Diagram: Internet, Router, NIC, Kernel, App; capacity labels: ~2M pps, 0.3M pps, 0.1M pps, >20M pps)

Slide 7

Slide 7 text

Attack surface
(Diagram: Internet, Router, NIC, Kernel, App; label: this presentation)

Slide 8

Slide 8 text

Agenda
1. Network congestion
2. L3 - High volume packet floods
3. L4 - Packet floods against TCP stack
4. L7 - Botnets
5. L7+ - Very large botnets

Slide 9

Slide 9 text

Network congestion

Slide 10

Slide 10 text

BGP null routing
    route 1.2.3.4/32 {
        discard;
        community [ 13335:666 13335:668 13335:36006 ];
    }

Slide 11

Slide 11 text

Application integration
    dig A example.com
(Diagram: 1.2.3.4, 1.2.3.5, 1.2.3.6, 1.2.3.7; one address is marked X)

Slide 12

Slide 12 text

High volume packet floods (L3)

Slide 13

Slide 13 text

Let it flow (source: Yogendra Joshi)

Slide 14

Slide 14 text

High volume packet flood (chart axis: Packets per second)

Slide 15

Slide 15 text

UDP DNS flood
    IP 202.194.181.95.15443 > 1.2.3.4:53: 63476% [1au] A? example.com. (50)
    IP 221.12.236.115.6570 > 1.2.3.4:53: 11406% [1au] A? example.com. (50)
    IP 203.94.134.43.18473 > 1.2.3.4:53: 8559% [1au] A? example.com. (50)
    IP 203.196.66.75.32573 > 1.2.3.4:53: 47971% [1au] A? example.com. (50)
    IP 124.240.198.136.23336 > 1.2.3.4:53: 61152% [1au] A? example.com. (50)
    IP 218.247.70.185.11679 > 1.2.3.4:53: 16360% [1au] A? example.com. (50)
    IP 202.109.218.98.27549 > 1.2.3.4:53: 17829% [1au] A? example.com. (50)
    IP 203.148.240.82.21825 > 1.2.3.4:53: 22590% [1au] A? example.com. (50)
    IP 211.167.108.67.25782 > 1.2.3.4:53: 17663% [1au] A? example.com. (50)
    IP 203.209.60.18.20221 > 1.2.3.4:53: 38257% [1au] A? example.com. (50)
    IP 203.81.181.168.12749 > 1.2.3.4:53: 53492% [1au] A? example.com. (50)

Slide 16

Slide 16 text

Sad DNS server

Slide 17

Slide 17 text

Spoofed? (source: DaPuglet)

Slide 18

Slide 18 text

Drop!

Slide 19

Slide 19 text

1 in 10K packets

Slide 20

Slide 20 text

Packet characteristics
• Packet length
• Payload
• Goal: limit false positives

Slide 21

Slide 21 text

Matching on payload in iptables

Slide 22

Slide 22 text

Payload matching with BPF
    iptables -A INPUT \
        --dst 1.2.3.4 \
        -p udp --dport 53 \
        -m bpf --bytecode "14,0 0 0 20,177 0 0 0,12 0 0 0,7 0 0 0,64 0 0 0,21 0 7 124090465,64 0 0 4,21 0 5 1836084325,64 0 0 8,21 0 3 56848237,80 0 0 12,21 0 1 0,6 0 0 1,6 0 0 0" \
        -j DROP

Slide 23

Slide 23 text

BPF bytecode
        ldx 4*([14]&0xf)
        ld #34
        add x
        tax
    lb_0:
        ldb [x + 0]
        add x
        add #1
        tax
        ld [x + 0]
        jneq #0x07657861, lb_1
        ld [x + 4]
        jneq #0x6d706c65, lb_1
        ld [x + 8]
        jneq #0x03636f6d, lb_1
        ldb [x + 12]
        jneq #0x00, lb_1
        ret #1
    lb_1:
        ret #0
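The constants compared in the bytecode are the DNS wire encoding of example.com (length-prefixed labels plus a zero terminator), read in 4-byte big-endian words. A minimal sketch of how such constants could be derived; the helper names `dns_wire_name` and `match_words` are hypothetical and are not part of bpftools:

```python
def dns_wire_name(domain: str) -> bytes:
    """Encode a domain in DNS wire format: length-prefixed labels, zero terminator."""
    out = bytearray()
    for label in domain.strip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label length: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label terminates the name
    return bytes(out)


def match_words(wire: bytes):
    """Split the name into (offset, size, value) comparisons using 4-, 2- and 1-byte loads."""
    checks = []
    offset = 0
    while offset < len(wire):
        remaining = len(wire) - offset
        size = 4 if remaining >= 4 else 2 if remaining >= 2 else 1
        value = int.from_bytes(wire[offset:offset + size], "big")
        checks.append((offset, size, value))
        offset += size
    return checks


wire = dns_wire_name("example.com")
checks = match_words(wire)
for offset, size, value in checks:
    print(f"[x + {offset}] size={size} value=0x{value:0{size * 2}x}")
```

The three 4-byte values are the same numbers that appear in decimal in the iptables `--bytecode` string (124090465, 1836084325, 56848237).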

Slide 24

Slide 24 text

Tcpdump expressions
• Originally: tcpdump -n "udp and port 53"
• xt_bpf implemented in 2013 by Willem de Bruijn
• Tcpdump expressions are limited - no variables
• Benefits in hand-crafting BPF

Slide 25

Slide 25 text

BPF tools
• Open source:
  • https://github.com/cloudflare/bpftools
• Can match various DNS patterns:
  • *.example.com
  • --case-insensitive *.example.com
  • --invalid-dns

Slide 26

Slide 26 text

~2M pps

Slide 27

Slide 27 text

Happy DNS server

Slide 28

Slide 28 text

Sad OS - interrupt storms

Slide 29

Slide 29 text

Payload matching close to NIC

Slide 30

Slide 30 text

Modern NICs
(Diagram: Ethernet into a network card with RX Queue #1, #2, #3 ... #N; each queue is served by CPU #1, #2, #3 ... #N)

Slide 31

Slide 31 text

Traditional kernel bypass
(Diagram: Ethernet into a network card; RX Queue #1, #2, #3 ... #N are all delivered to user space)

Slide 32

Slide 32 text

Partial kernel bypass (aka bifurcated driver)
(Diagram: Ethernet into a network card; RX Queue #1, #2 ... #N go to the kernel, RX Queue #? goes to user space)

Slide 33

Slide 33 text

Partial kernel bypass
• Open sourced netmap patch, tested on Intel:
  • https://github.com/luigirizzo/netmap/pull/87
• Or EFVI for Solarflare NICs:
  • http://www.openonload.org/

Slide 34

Slide 34 text

Iptables offload
(Diagram: Ethernet into a network card; RX Queue #1, #2 ... #N go to the kernel, RX Queue #? goes to a userspace offload)

Slide 35

Slide 35 text

>3M pps
It works really well

Slide 36

Slide 36 text

It works really well

Slide 37

Slide 37 text

No characteristics: Attacks against TCP/IP network stack (L4)

Slide 38

Slide 38 text

ACK floods
    IP 48.60.32.50.15244 > 1.2.3.4.80: Flags [.], ack 1754729313, win 16153
    IP 31.102.214.103.13396 > 1.2.3.4.80: Flags [.], ack 1569851274, win 15707
    IP 112.36.216.55.56515 > 1.2.3.4.80: Flags [.], ack 2051477187, win 16102
    IP 65.130.63.30.10341 > 1.2.3.4.80: Flags [.], ack 2108282782, win 16112
    IP 16.18.205.115.15962 > 1.2.3.4.80: Flags [.], ack 1359019408, win 16119
    IP 128.177.247.54.13752 > 1.2.3.4.80: Flags [.], ack 1416531343, win 16102
    IP 204.59.118.78.61528 > 1.2.3.4.80: Flags [.], ack 348671255, win 16101
    IP 119.195.142.20.3344 > 1.2.3.4.80: Flags [.], ack 1917538144, win 16161
    IP 70.197.6.24.39340 > 1.2.3.4.80: Flags [.], ack 1920842431, win 16124

Slide 39

Slide 39 text

~0.3M pps

Slide 40

Slide 40 text

Stateful firewall - conntrack
    iptables -A INPUT \
        --dst 1.2.3.4 \
        -m conntrack --ctstate INVALID \
        -j DROP

    sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

Slide 41

Slide 41 text

~2M pps

Slide 42

Slide 42 text

Effective against TCP attacks
• Works well against:
  • ACK
  • FIN
  • RST
  • X-mas
• What about SYN floods?

Slide 43

Slide 43 text

SYN floods
    IP 94.242.250.109.47330 > 1.2.3.4:80: Flags [S], seq 1444613291, win 63243
    IP 188.138.1.240.61454 > 1.2.3.4:80: Flags [S], seq 1995637287, win 60551
    IP 207.244.90.205.17572 > 1.2.3.4:80: Flags [S], seq 1523683071, win 61607
    IP 94.242.250.224.65127 > 1.2.3.4:80: Flags [S], seq 928944042, win 61778
    IP 207.244.90.205.43074 > 1.2.3.4:80: Flags [S], seq 137074667, win 63891
    IP 64.22.81.44.23865 > 1.2.3.4:80: Flags [S], seq 838596928, win 63808
    IP 188.138.1.137.23373 > 1.2.3.4:80: Flags [S], seq 593106072, win 60272
    IP 207.244.90.205.39653 > 1.2.3.4:80: Flags [S], seq 47289666, win 63210
    IP 208.66.78.204.64197 > 1.2.3.4:80: Flags [S], seq 1850809890, win 62714
    IP 207.244.90.205.33108 > 1.2.3.4:80: Flags [S], seq 319707959, win 63351
    IP 207.244.90.205.6937 > 1.2.3.4:80: Flags [S], seq 1591500126, win 63902
    IP 213.152.180.151.60560 > 1.2.3.4:80: Flags [S], seq 1902119375, win 62511
    IP 64.22.79.127.11061 > 1.2.3.4:80: Flags [S], seq 1456438676, win 62148

Slide 44

Slide 44 text

0M pps

Slide 45

Slide 45 text

SYN in Linux
(Diagram: a SYN enters the SYN backlog (SYN_RECV) and is answered with SYN+ACK; the ACK moves the connection to the listen backlog (ESTABLISHED); the app takes it with accept())

Slide 46

Slide 46 text

SYN Cookies
sequence number: 5 bits t mod 32 | 3 bits MSS | 24 bits hash(ip, port, t)
timestamp: 26 bits timestamp | 1 bit ECN | 1 bit SACK | 4 bits wscale
    sysctl -w net.ipv4.tcp_syncookies=1
    sysctl -w net.ipv4.tcp_timestamps=1
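The sequence-number layout can be sketched as simple bit packing. This is an illustration under stated assumptions, not the Linux implementation: the field order (t in the top bits, hash in the low bits), the unkeyed SHA-256 hash, the `MSS_TABLE` values, and the names `make_cookie` and `check_cookie` are all hypothetical. A real implementation mixes in a secret key and more connection fields.

```python
import hashlib

# Illustrative MSS table; 3 bits allow up to 8 entries. Values are assumptions.
MSS_TABLE = [536, 1300, 1440, 1460]


def cookie_hash(src_ip: str, src_port: int, t: int) -> int:
    """24-bit stand-in for hash(ip, port, t). Unkeyed, so NOT secure."""
    digest = hashlib.sha256(f"{src_ip}:{src_port}:{t}".encode()).digest()
    return int.from_bytes(digest[:3], "big")


def make_cookie(src_ip: str, src_port: int, t: int, mss_index: int) -> int:
    """Pack 5 bits of t mod 32, 3 bits of MSS index and a 24-bit hash into 32 bits."""
    if not 0 <= mss_index < len(MSS_TABLE):
        raise ValueError("mss_index out of range")
    return ((t % 32) << 27) | (mss_index << 24) | cookie_hash(src_ip, src_port, t)


def check_cookie(cookie: int, src_ip: str, src_port: int, now: int, max_age: int = 2):
    """Return the MSS encoded in a valid, fresh cookie, or None."""
    t_low = cookie >> 27
    mss_index = (cookie >> 24) & 0x7
    expected_hash = cookie & 0xFFFFFF
    # Recover the most recent counter value <= now whose low 5 bits match.
    t = now - ((now - t_low) % 32)
    if now - t > max_age:
        return None
    if cookie_hash(src_ip, src_port, t) != expected_hash:
        return None
    if mss_index >= len(MSS_TABLE):
        return None
    return MSS_TABLE[mss_index]


cookie = make_cookie("203.0.113.7", 40000, t=1000, mss_index=3)
print(hex(cookie), check_cookie(cookie, "203.0.113.7", 40000, now=1001))
```

The point of the encoding is that the server keeps no per-SYN state: everything needed to validate the final ACK is carried in the sequence number it echoes back.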

Slide 47

Slide 47 text

~0.3M pps

Slide 48

Slide 48 text

Recent changes
• The idea is to remove the LISTEN lock
• Heavy refactoring of the SYN queue
• Submitted by Eric Dumazet in early October 2015
• Merged to net-next, will land in 4.4

Slide 49

Slide 49 text

Connections from a botnet (L7)

Slide 50

Slide 50 text

Real TCP/IP connections

Slide 51

Slide 51 text

Small volume (chart axis: Packets per second)

Slide 52

Slide 52 text

Symptoms
• Concurrent connection count going up
• Many sockets in "orphaned" state
• "Time wait" socket state indicates churn

Slide 53

Slide 53 text

Sad HTTP server

Slide 54

Slide 54 text

IP reputation (source: the internet)

Slide 55

Slide 55 text

Reputation in iptables
1. Conntrack connlimit
2. Hashlimits
   • Rate limit SYN packets per IP
3. Ipset
   • Manual blacklisting - feed IP blacklist from HTTP server logs
   • Supports subnets, timeouts
   • Automatic blacklisting with hashlimits

Slide 56

Slide 56 text

Make it a SYN flood
    GET / HTTP/1.1
    Host: www.example.com

    GET / HTTP/1.1
    Host: www.example.com

    GET / HTTP/1.1
    Host: www.example.com
    ...
• Disable HTTP keep-alives
• Make it a SYN flood

Slide 57

Slide 57 text

Very large botnets (L7+)

Slide 58

Slide 58 text

Very large botnets
    GET /forum.php HTTP/1.1
    Accept: */*
    Accept-Language: zh-cn
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/5.0 (compatible; Baiduspider/2.0;...
    Host: www.example.com:80
    Connection: Keep-Alive
• Blacklist IPs based on payload
• "BPF" or "string" module for match + ipsets auto expiry

Slide 59

Slide 59 text

300k RPS, 650k uniques (source: CloudFlare blog)

Slide 60

Slide 60 text

Detection

Slide 61

Slide 61 text

Sflow for real time analytics
(Diagram: switches send sflow to central aggregation)

Slide 62

Slide 62 text

Centralized Sflow
    $ tailsflow -i sflow | tcpdump -n -r - -c 10 'vlan and ip'
    reading from file -, link-type EN10MB (Ethernet)
    IP 10.11.8.17.8070 > 10.11.8.82.24982:
    IP 10.16.8.95.8070 > 10.16.10.139.33176:
    18:55:22.345369 IP 70.215.131.237.3232 > 104.16.19.35.80:
    18:55:22.345371 IP 162.222.178.71.35563 > 173.245.58.146.53:
    IP 199.71.213.20.40150 > 173.245.58.146.53:
    18:55:22.345430 IP 195.175.255.138.62803 > 173.245.58.221.53:
    IP 220.213.193.137.52163 > 104.31.188.8.80:
    IP 10.40.8.97.8070 > 10.40.8.59.46943:
    IP 115.231.91.118.35120 > 173.245.58.146.53:
    IP 10.12.11.5.8070 > 10.12.8.106.24514:

Slide 63

Slide 63 text

Host-sflowd
    iptables -I INPUT \
        -m statistic \
        --mode random --probability 0.00048828125 \
        -j NFLOG --nflog-group 33

    hsflowd -d -f hsflowd.conf -o /var/run/hsflowd.auto -p /var/run/hsflowd.pid

hsflowd.conf:
    sflow {
        DNSSD = off
        collector {
            ip = 4.3.2.1
            udpport = 6343
        }
        nflogProbability = 0.00048828125
        nflogGroup = 33
        polling = 300
    }
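The probability 0.00048828125 is exactly 1/2048, so each sampled packet stands for roughly 2048 received packets. A small sketch of scaling sample counts back to a packet rate; the function name `estimate_pps` and the example numbers are hypothetical:

```python
from fractions import Fraction

SAMPLING_PROBABILITY = 0.00048828125  # value used for --probability and nflogProbability

# The float is a power of two, so the conversion to a fraction is exact.
rate = Fraction(SAMPLING_PROBABILITY)
packets_per_sample = rate.denominator


def estimate_pps(samples: int, interval_seconds: float) -> float:
    """Scale the number of sampled packets back to an estimated packet rate."""
    return samples * packets_per_sample / interval_seconds


print(rate, packets_per_sample, estimate_pps(1000, 1.0))
```

The estimate is statistical: with random sampling, short intervals and small sample counts give noisy numbers.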

Slide 64

Slide 64 text

• You WILL BGP null-route
  • Prepare your application for that
• DROP all the packets! (only 1 in 10k could be valid!)
  • With BPF
  • Partial kernel bypass for better speed
• Iptables are powerful
  • Connlimit, hashlimits, ipsets
Thanks!
(please fill the attendee excitement form!)
marek@cloudflare.com
@majek04

Slide 66

Slide 66 text

Appendix A: Exciting system tweaks

Slide 67

Slide 67 text

NIC: discard with flow steering
    ethtool -N eth3 flow-type udp4 \
        dst-ip 192.168.254.30 \
        dst-port 53 action -1

Slide 68

Slide 68 text

Tip: Flow steering for priority
    ethtool -X eth3 weight 0 1 1 1 1 1 1 1 1 1 1
    ethtool -N eth3 flow-type tcp4 \
        dst-port 22 action 0

Slide 69

Slide 69 text

SYN backlog size
1. Listen backlog size: listen(int sockfd, int backlog)
2. Capped by somaxconn: sysctl -w net.core.somaxconn=65535
3. SYN backlog capped with: sysctl -w net.ipv4.tcp_max_syn_backlog=65535
4. Rounded to next power of two: 127 --> 128, 128 --> 256
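The four steps can be sketched as a small calculation. This follows the slide's description only (including its examples 127 --> 128 and 128 --> 256); the function name `effective_syn_backlog` is hypothetical, and the exact kernel behaviour depends on the kernel version:

```python
def effective_syn_backlog(listen_backlog: int, somaxconn: int, tcp_max_syn_backlog: int) -> int:
    """Apply the caps from the slide, then round up to the next power of two.

    The rounding is strict: a value that is already a power of two is doubled,
    matching the slide's examples 127 --> 128 and 128 --> 256.
    """
    if min(listen_backlog, somaxconn, tcp_max_syn_backlog) < 1:
        raise ValueError("all limits must be positive")
    capped = min(listen_backlog, somaxconn, tcp_max_syn_backlog)
    return 1 << capped.bit_length()


for backlog in (127, 128, 1024, 100000):
    print(backlog, effective_syn_backlog(backlog, somaxconn=65535, tcp_max_syn_backlog=65535))
```

The practical consequence is that raising only the `listen()` backlog has no effect once it exceeds either sysctl cap.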

Slide 70

Slide 70 text

SYN backlog decay
    sysctl -w net.ipv4.tcp_synack_retries=1

Slide 71

Slide 71 text

L7 connection count
    sysctl -w net.ipv4.tcp_max_orphans=262144
    sysctl -w net.ipv4.tcp_orphan_retries=1

    sysctl -w net.ipv4.tcp_max_tw_buckets=360000
    sysctl -w net.ipv4.tcp_tw_reuse=1
    sysctl -w net.ipv4.tcp_fin_timeout=5

Slide 72

Slide 72 text

Appendix B: Iptables examples

Slide 73

Slide 73 text

L3: u32
    iptables -A INPUT \
        --dst 1.2.3.4 \
        -p udp -m udp --dport 53 \
        -m u32 --u32 "6&0xFF=0x6 && 4&0x1FFF=0 && 0>>22&0x3C@4=0x29" \
        -j DROP

Slide 74

Slide 74 text

L4: Conntrack
    iptables -t raw -A PREROUTING \
        -i eth2 \
        --dst 1.2.3.4 \
        -j ACCEPT

    iptables -t raw -A PREROUTING \
        -i eth2 \
        -j NOTRACK

    iptables -A INPUT \
        --dst 1.2.3.4 \
        -m conntrack --ctstate INVALID \
        -j DROP

Slide 75

Slide 75 text

Tuning conntrack
    sysctl -w net.netfilter.nf_conntrack_helper=0

    sysctl -w net.nf_conntrack_max=2000000
    echo 2500000 > /sys/module/nf_conntrack/parameters/hashsize

    sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

Slide 76

Slide 76 text

L7: Connlimit
    iptables -t raw -A PREROUTING \
        -i eth2 \
        --dst 1.2.3.4 \
        -j ACCEPT

    iptables -A INPUT \
        --dst 1.2.3.4 \
        -p tcp -m tcp --dport 80 \
        --tcp-flags FIN,SYN,RST,ACK SYN \
        -m connlimit \
        --connlimit-above 10 \
        --connlimit-mask 32 \
        --connlimit-saddr \
        -j DROP

Slide 77

Slide 77 text

L7: ipset for blacklisting
    ipset -exist create ta_d335c5 hash:net family inet

    ipset add ta_d335c5 192.168.0.0/16
    ipset add ta_d335c5 10.0.0.0/8

    iptables -A INPUT \
        -m set --match-set ta_d335c5 src \
        -j DROP

Slide 78

Slide 78 text

L7: being evil - TARPIT
    iptables -A INPUT \
        -m set --match-set ta_d335c5 src \
        -j TARPIT

Slide 79

Slide 79 text

L7: hashlimit for rate limiting
    iptables -A INPUT \
        --dst 1.2.3.4 -p tcp -m tcp --dport 80 \
        --tcp-flags FIN,SYN,RST,PSH,ACK,URG SYN \
        -m hashlimit \
        --hashlimit-above 123/sec \
        --hashlimit-burst 5 \
        --hashlimit-mode srcip \
        --hashlimit-srcmask 24 \
        --hashlimit-name 341654b1d4af9bf \
        -j DROP
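The idea behind this rule (one rate budget per source /24, with a small burst allowance) can be approximated with a token bucket keyed by subnet. This is a simplified model for illustration, not the kernel's hashlimit accounting; the class name `SubnetRateLimiter` and the example addresses are hypothetical:

```python
import ipaddress


class SubnetRateLimiter:
    """Token bucket per source subnet: `rate_per_sec` sustained, `burst` at once."""

    def __init__(self, rate_per_sec: float, burst: int, prefix_len: int = 24):
        self.rate = rate_per_sec
        self.burst = burst
        self.prefix_len = prefix_len
        self.buckets = {}  # subnet -> (tokens, time of last update)

    def _key(self, src_ip: str):
        # strict=False masks the host bits, like --hashlimit-srcmask 24.
        return ipaddress.ip_network(f"{src_ip}/{self.prefix_len}", strict=False)

    def over_limit(self, src_ip: str, now: float) -> bool:
        """Return True if this packet exceeds the subnet's budget (i.e. would be dropped)."""
        key = self._key(src_ip)
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return False
        self.buckets[key] = (tokens, now)
        return True


limiter = SubnetRateLimiter(rate_per_sec=123, burst=5)
# Seven SYNs from different hosts in the same /24, all at the same instant.
results = [limiter.over_limit(f"198.51.100.{host}", now=0.0) for host in range(1, 8)]
print(results)
```

Because the key is the /24 rather than the single address, a botnet spread across one subnet shares a single budget, at the cost of also limiting legitimate neighbours.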

Slide 80

Slide 80 text

L7: auto-blacklisting
    ipset -exist create blacklist hash:net timeout 60

    iptables -A INPUT \
        --dst 1.2.3.4 \
        -m set --match-set blacklist src \
        -j DROP

    iptables -A INPUT \
        --dst 1.2.3.4 -p tcp -m tcp --dport 80 \
        --tcp-flags FIN,SYN,RST,PSH,ACK,URG SYN \
        -m hashlimit \
        --hashlimit-above 100/sec \
        --hashlimit-mode srcip \
        --hashlimit-srcmask 24 \
        --hashlimit-name hl_blacklist \
        -j SET --add-set blacklist src

Slide 81

Slide 81 text

L7+: payload in TCP - string
    iptables -A INPUT \
        --dst 1.2.3.4 \
        -p tcp --dport 80 \
        -m string \
        --hex-string 486f73743a207777772e787878787878782e... \
        --from 231 --to 300 \
        -j DROP

Slide 82

Slide 82 text

L7+: payload in TCP - BPF
    $ ./fixed_offset.py 'Host: www.xxxxxxx.com:80\r\n' 231

    ip[231:4] == 0x486f7374 and ip[235:4] == 0x3a207777 and ip[239:4] == 0x772e7878 and ip[243:4] == 0x78787878 and ip[247:4] == 0x782e636f and ip[251:4] == 0x6d3a3830 and ip[255:2] == 0x0d0a
(source: fixed_offset.py)
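The expression above is the payload cut into 4-byte and 2-byte big-endian comparisons at a fixed offset from the start of the IP header. A minimal sketch that reproduces this output; it is a hypothetical re-implementation for illustration, not the original fixed_offset.py, and the function name `fixed_offset_expr` is an assumption:

```python
def fixed_offset_expr(payload: bytes, offset: int) -> str:
    """Build a tcpdump expression matching `payload` at `offset` from the IP header."""
    terms = []
    pos = 0
    while pos < len(payload):
        remaining = len(payload) - pos
        # tcpdump byte accessors support sizes 1, 2 and 4.
        size = 4 if remaining >= 4 else 2 if remaining >= 2 else 1
        chunk = payload[pos:pos + size]
        terms.append(f"ip[{offset + pos}:{size}] == 0x{chunk.hex()}")
        pos += size
    return " and ".join(terms)


expr = fixed_offset_expr(b"Host: www.xxxxxxx.com:80\r\n", 231)
print(expr)
```

A fixed offset only matches when the headers before the payload have the same length on every packet, which is why this works best against a botnet sending byte-identical requests.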