Slide 1

Slide 1 text

Automatic DoS mitigation (streaming algorithms)
Marek Majkowski
marek@cloudflare.com
@majek04

Slide 2

Slide 2 text

Who we are

Slide 3

Slide 3 text

Large network

Slide 4

Slide 4 text

Content neutral

Slide 5

Slide 5 text

DoS is a problem
Chart: DoS events per day

Slide 6

Slide 6 text

Defending from DoS is hard
Diagram labels: X, example.com

Slide 7

Slide 7 text

Two DoS types
• L3 - spoofed IP packets
  • source IP addresses are fake
  • very large
  • this is what you hear about in the news
• L7 - fully established TCP connections
  • IP reputation is effective

Slide 8

Slide 8 text

L3 volume per server
Chart: packets per second

Slide 9

Slide 9 text

Automatic attack handling
Diagram labels: Attack Detection, Mitigation database, Reactive Automation, sflow, iptables

Slide 10

Slide 10 text

Automatic attack detection
Diagram labels: Attack Detection, sflow

Slide 11

Slide 11 text

Streaming algorithms
• infinite data stream on input
• approximate
Diagram labels: Data stream, Streaming algorithm, Results

Slide 12

Slide 12 text

Attack detection is streaming!
• sflow packet samples as input
• detected attacks on output
Diagram labels: Packet samples, Streaming algorithms, Attacks

Slide 13

Slide 13 text

Streaming algorithms
• EWMA - Exponentially weighted moving average
  • Counting rates of packets
• Space saving
  • Known as Top-N or Heavy Hitters
• Simplified hierarchical heavy hitters
• Hyper log log
  • Cardinality estimation - Counting unique things

Slide 14

Slide 14 text

The problem: PPS
Mpps   Descr
3.878  --ip=141.245.59.191/32
2.878  --ip=141.245.59.192/32
1.878  --ip=141.245.59.193/32
1.878  --ip=141.245.59.194/32
1.878  --ip=141.245.59.195/32
1.878  --ip=141.245.59.196/32
1.878  --ip=141.245.59.197/32
1.878  --ip=141.245.59.198/32
1.878  --ip=141.245.59.199/32
...

Slide 15

Slide 15 text

Naive approach
pps    IP
12.2M  1.2.3.4
2.4M   42.1.2.4
0.01M  2.4.3.1
0.01M  192.168.1.1

Slide 16

Slide 16 text

There is no such thing as pps

Slide 17

Slide 17 text

Naive: Moving average
Sample timestamps: 1.0s, 1.1s, 1.3s, 1.8s, 1.99s, 2.1s, 2.4s, 2.41s (now: t=2.50s)
Precisely 5 samples

Slide 18

Slide 18 text

Not-smoothed values
Sample timestamps: 1.0s, 1.1s, 1.3s, 1.8s, 1.99s, 2.1s, 2.4s, 2.41s
raw pps = 100, 3, 50, 5, 2, 5, 10

Slide 19

Slide 19 text

Not-smoothed values

Slide 20

Slide 20 text

Linux load average - charge

Slide 21

Slide 21 text

Linux load average - discharge

Slide 22

Slide 22 text

Better: EWMA
Formula labels: old load, difference, dampening factor, measurement frequency, half-life time
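The formula itself did not survive in this transcript. A minimal sketch of an EWMA update built from the labelled terms, assuming the dampening factor is derived from the half-life as 0.5 ** (dt / half_life), where dt is the time since the previous measurement. The function and parameter names are illustrative and not taken from the talk.

```python
def ewma_update(old, sample, dt, half_life):
    """Move `old` towards `sample`; after `half_life` seconds the gap has halved."""
    # Dampening factor: the fraction of the old value that survives dt seconds.
    # Assumption: exponential decay parameterised by a half-life.
    damp = 0.5 ** (dt / half_life)
    # new = old + difference * (1 - dampening factor)
    return old + (sample - old) * (1.0 - damp)


load = 0.0
for _ in range(3):
    # One measurement of 1.0 every 5 seconds, with a half-life of 5 seconds.
    load = ewma_update(load, 1.0, dt=5.0, half_life=5.0)
    print(load)  # 0.5, then 0.75, then 0.875
```

Deriving the factor from dt rather than fixing it lets the same update handle measurements that arrive at irregular intervals.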

Slide 23

Slide 23 text

23

Slide 24

Slide 24 text

24

Slide 25

Slide 25 text

EWMA - summary
• Smoothed average
• The same maths as Linux "load average"
• Charges slowly (half-life)
• Discharges quickly
• Can also be used to count rates of packets
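One way the last point can be sketched: treat 1/dt between consecutive packets as the raw rate and smooth it with an EWMA. This is an assumption about how rates could be counted, not the implementation from the talk; the class name EwmaRate and its fields are hypothetical.

```python
class EwmaRate:
    """Estimate events per second from event timestamps (hypothetical helper)."""

    def __init__(self, half_life):
        self.half_life = half_life
        self.last = None   # timestamp of the previous event
        self.rate = 0.0    # smoothed events per second

    def update(self, now):
        if self.last is not None and now > self.last:
            dt = now - self.last
            instant = 1.0 / dt                    # raw, unsmoothed rate
            damp = 0.5 ** (dt / self.half_life)   # weight kept by the old value
            self.rate += (instant - self.rate) * (1.0 - damp)
        self.last = now
        return self.rate


r = EwmaRate(half_life=1.0)
for i in range(1000):
    r.update(i * 0.1)      # steady stream: one packet every 100 ms
print(round(r.rate, 3))    # 10.0
```

Only two numbers are kept per counter (the last timestamp and the smoothed rate), so the memory cost does not grow with the number of packets seen.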

Slide 26

Slide 26 text

The problem: PPS
Mpps   Descr
3.878  --ip=141.245.59.191/32
2.878  --ip=141.245.59.192/32
1.878  --ip=141.245.59.193/32
1.878  --ip=141.245.59.194/32
1.878  --ip=141.245.59.195/32
1.878  --ip=141.245.59.196/32
1.878  --ip=141.245.59.197/32
1.878  --ip=141.245.59.198/32
1.878  --ip=141.245.59.199/32
...

Slide 27

Slide 27 text

The problem: Memory
pps    IP
12.2M  1.2.3.4
2.4M   42.1.2.4
0.01M  2.4.3.1
0.01M  192.168.1.1
...

Slide 28

Slide 28 text

Top-N problem
• aka: heavy hitters
• A fixed-memory data structure
• That can "count" top-N items
• think: top URLs, top customer IPs, etc.
• Count-Min sketch, Space Saving

Slide 29

Slide 29 text

Space saving
error  count  key

Slide 30

Slide 30 text

Space saving
error  count  key
0      1      Alice
Incoming: Alice

Slide 31

Slide 31 text

Space saving
error  count  key
0      2      Alice
Incoming: Alice

Slide 32

Slide 32 text

Space saving
error  count  key
0      2      Alice
0      1      Ben
Incoming: Ben

Slide 33

Slide 33 text

Space saving
error  count  key
0      2      Alice
0      1      Ben
0      1      Charlie
Incoming: Charlie

Slide 34

Slide 34 text

Space saving
error  count  key
0      2      Alice
0      1      Ben
0      1      Charlie
Incoming: Eric?

Slide 35

Slide 35 text

Space saving
error  count  key
0      2      Alice
0      1      Ben
0      1      Charlie
Incoming: Eric?

Slide 36

Slide 36 text

Space saving
error  count  key
0      2      Alice
1      0      Eric
0      1      Charlie
+ Eric

Slide 37

Slide 37 text

Space saving
error  count  key
0      2      Alice
1      1      Eric
0      1      Charlie
Incoming: Eric

Slide 38

Slide 38 text

Space saving
error  count  key      Counter?
0      2      Alice    2
1      1      Eric     1 .. 2
0      1      Charlie  1
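The walkthrough on slides 29 to 38 can be replayed with a short sketch. It follows the table layout of the slides: `count` is what was seen since the key entered the table, and `error` is what the key inherited from the slot it evicted, so the true count lies between `count` and `error + count`. The class name and the linear scan for the smallest slot are simplifications chosen for illustration.

```python
class SpaceSaving:
    """Fixed-size top-N table: key -> [error, count], as in the slides."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = {}

    def add(self, key):
        if key in self.slots:
            self.slots[key][1] += 1
        elif len(self.slots) < self.capacity:
            self.slots[key] = [0, 1]
        else:
            # Table full: evict the key with the smallest upper bound.
            # The newcomer inherits that bound as its possible error.
            victim = min(self.slots, key=lambda k: sum(self.slots[k]))
            error, count = self.slots.pop(victim)
            self.slots[key] = [error + count, 1]

    def top(self):
        """Return (key, lower bound, upper bound), largest upper bound first."""
        rows = [(k, c, e + c) for k, (e, c) in self.slots.items()]
        return sorted(rows, key=lambda row: -row[2])


ss = SpaceSaving(capacity=3)
for name in ["Alice", "Alice", "Ben", "Charlie", "Eric"]:
    ss.add(name)
for key, low, high in ss.top():
    print(key, low, high)
```

With three slots, Eric replaces Ben and ends up with a count somewhere between 1 and 2, matching the last table. A production version would keep the slots ordered instead of scanning for the minimum on every eviction.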

Slide 39

Slide 39 text

39

Slide 40

Slide 40 text

What about rates?
• It's hard
• was: GetAll()
• now: GetAll(time.Time)
• No longer O(1)
• Instead O(log n)

Slide 41

Slide 41 text

41

Slide 42

Slide 42 text

Summary - Space saving
• Top-N / Heavy-hitter algorithm
• Fixed memory size
• Strong error guarantees

Slide 43

Slide 43 text

Aggregating attacks
Mpps   Descr
3.878  --ip=141.245.59.191/32
2.878  --ip=141.245.59.192/32
1.878  --ip=141.245.59.193/32
1.878  --ip=141.245.59.194/32
1.878  --ip=141.245.59.195/32
1.878  --ip=141.245.59.196/32
1.878  --ip=141.245.59.197/32
1.878  --ip=141.245.59.198/32
1.878  --ip=141.245.59.199/32
...

vs

Mpps    Descr
35.878  --ip=141.245.59.0/24

Slide 44

Slide 44 text

Hierarchical Heavy Hitters

Slide 45

Slide 45 text

Simplified HHH

Slide 46

Slide 46 text

Multiple dimensions

pps    IP:port
12.2M  1.2.3.4:53
2.4M   42.1.2.4:80
0.01M  2.4.3.1:80
0.01M  192.168.1.1:443

pps    IP
12.2M  1.2.3.4
2.4M   42.1.2.4
0.01M  2.4.3.1
0.01M  192.168.1.1

pps    subnet
12.2M  1.2.3.0/24
2.4M   42.1.2.0/24
0.01M  2.4.3.0/24
0.01M  192.168.1.0/24

Slide 47

Slide 47 text

Multiple dimensions

pps    IP:port
12.2M  1.2.3.4:53
2.4M   42.1.2.4:80
0.01M  2.4.3.1:80
0.01M  192.168.1.1:443

pps    IP
12.2M  1.2.3.4
2.4M   42.1.2.4
0.01M  2.4.3.1
0.01M  192.168.1.1

pps    subnet
12.2M  1.2.3.0/24
2.4M   42.1.2.0/24
0.01M  2.4.3.0/24
0.01M  192.168.1.0/24

incoming sample: 42.1.2.4:80

Slide 48

Slide 48 text

Multiple dimensions

pps    IP:port
12.2M  1.2.3.4:53
2.4M   42.1.2.4:80
0.01M  2.4.3.1:80
0.01M  192.168.1.1:443

pps    IP
12.2M  1.2.3.4
2.4M   42.1.2.4
0.01M  2.4.3.1
0.01M  192.168.1.1

pps    subnet
12.2M  1.2.3.0/24
2.4M   42.1.2.0/24
0.01M  2.4.3.0/24
0.01M  192.168.1.0/24

reporting threshold: 1M

Slide 49

Slide 49 text

Attack report
Mpps  Descr
12.2  --ip=1.2.3.4 --port=53
2.4   --ip=42.1.2.4 --port=80
12.2  --ip=1.2.3.4
2.4   --ip=42.1.2.4
12.2  --ip=1.2.3.0/24
2.4   --ip=42.1.2.0/24

Slide 50

Slide 50 text

Multiple dimensions

pps    IP:port
12.2M  1.2.3.4:53
2.4M   42.1.2.4:80
0.01M  2.4.3.1:80
0.01M  192.168.1.1:443

pps    IP
0.1M   1.2.3.4
0M     42.1.2.4
0.01M  2.4.3.1
0.01M  192.168.1.1

pps    subnet
0.1M   1.2.3.0/24
0M     42.1.2.0/24
0.01M  2.4.3.0/24
0.01M  192.168.1.0/24

incoming sample: 42.1.2.4:80

Slide 51

Slide 51 text

Attack report
Mpps  Descr
12.2  --ip=1.2.3.4 --port=53
2.4   --ip=42.1.2.4 --port=80
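Slides 46 to 51 suggest the following scheme: every sample is counted in each dimension, and when a specific key is reported its volume is subtracted from the coarser tables so that the same traffic is not reported three times. A minimal sketch under that reading, with exact `Counter` objects standing in for the fixed-memory top-N tables and plain counts standing in for pps rates. The names `SimplifiedHHH` and `subnet24` are hypothetical.

```python
from collections import Counter
import ipaddress


def subnet24(ip):
    """Return the /24 network containing `ip`, e.g. '1.2.3.0/24'."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


class SimplifiedHHH:
    def __init__(self):
        # Assumption: exact counters replace the space-saving tables.
        self.by_ip_port = Counter()
        self.by_ip = Counter()
        self.by_subnet = Counter()

    def add(self, ip, port, weight=1):
        # Every sample is counted once in each dimension.
        self.by_ip_port[(ip, port)] += weight
        self.by_ip[ip] += weight
        self.by_subnet[subnet24(ip)] += weight

    def report(self, threshold):
        by_ip = Counter(self.by_ip)            # work on copies
        by_subnet = Counter(self.by_subnet)
        out = []
        for (ip, port), n in self.by_ip_port.most_common():
            if n >= threshold:
                out.append((n, f"--ip={ip} --port={port}"))
                by_ip[ip] -= n                 # already explained at a finer level
                by_subnet[subnet24(ip)] -= n
        for ip, n in by_ip.most_common():
            if n >= threshold:
                out.append((n, f"--ip={ip}"))
                by_subnet[subnet24(ip)] -= n
        for net, n in by_subnet.most_common():
            if n >= threshold:
                out.append((n, f"--ip={net}"))
        return out


hhh = SimplifiedHHH()
hhh.add("1.2.3.4", 53, weight=12)
hhh.add("42.1.2.4", 80, weight=3)
hhh.add("2.4.3.1", 80, weight=1)
for i in range(1, 10):
    hhh.add(f"10.0.0.{i}", 53)    # traffic spread thinly across one /24
reports = hhh.report(threshold=2)
for volume, descr in reports:
    print(volume, descr)
```

The two concentrated flows are reported only at the IP:port level, while the traffic spread over 10.0.0.0/24 surfaces only at the subnet level, which is the aggregation shown on the "Aggregating attacks" slide.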

Slide 52

Slide 52 text

Scales well

Slide 53

Slide 53 text

Summary - SHHH
• Approximate
• High error in pps
• Works well in practice
• Scales well
• Fast and simple to implement

Slide 54

Slide 54 text

Spoofed Source IP?
Mpps    Description
23.833  --target=173.245.59.2 --agent=WAW --iface=659 Est= 57364
23.067  --target=173.245.58.1 --agent=WAW --iface=659 Est= 56995
 7.139  --target=173.245.58.1 --agent=DUS --iface=893 Est= 11493
 6.366  --target=173.245.59.2 --agent=DUS --iface=893 Est= 11240
 2.590  --target=173.245.58.1 --agent=SIN --iface=657 Est=219987
 2.557  --target=173.245.59.2 --agent=SIN --iface=657 Est=220380
 1.045  --target=173.245.58.1 --agent=MAN --iface=756 Est=   207
 1.039  --target=173.245.59.2 --agent=MAN --iface=756 Est=   200

Slide 55

Slide 55 text

Hyper log log
Diagram: "Alice" is added to the HLL; the HLL reports 22 unique items!

Slide 56

Slide 56 text

Hyper log log
(HLL#1 OR HLL#2) = 44 unique items
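A compact sketch of how two HyperLogLog sketches can be combined. In the standard algorithm the merge is a register-wise maximum, which plays the role of the "OR" on the slide; the result is identical to a sketch that saw both streams. The register count (2**10), the SHA-1 based hash, and the class layout are choices made for this example, not details from the talk.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                  # number of index bits
        self.m = 1 << p             # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(item.encode()).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank: position of the leftmost 1 bit in `rest`, counted from 1.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Union of two sketches: register-wise maximum."""
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


a = HyperLogLog()
b = HyperLogLog()
for i in range(20000):
    a.add(f"ip-{i}")
for i in range(10000, 30000):
    b.add(f"ip-{i}")                # overlaps with half of a's items
merged = a.merge(b)
print(round(a.count()), round(b.count()), round(merged.count()))
```

With 1024 registers the expected relative error is roughly 3%, so the merged estimate should land near 30000 rather than near the sum of the two individual estimates, because overlapping items are counted once.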

Slide 57

Slide 57 text

57

Slide 58

Slide 58 text

What about rates?
Diagram labels: HLL #1, HLL #2, HLL #3, HLL #4

Slide 59

Slide 59 text

Hard drives

Slide 60

Slide 60 text

Summary
• Attack detection is a streaming problem
• Streaming algorithms are awesome
• Applicable to many more problems
Thanks!
marek@cloudflare.com