Streaming Algorithms
majek04
March 15, 2016
Transcript
1 Automatic DoS mitigation (streaming algorithms)
Marek Majkowski, marek@cloudflare.com, @majek04
2 Who we are
3 Large network
4 Content neutral
5 DoS is a problem (chart: DoS events per day)
6 Defending from DoS is hard (illustration labelled "example.com")
7 Two DoS types
• L3: spoofed IP packets
  • source IP addresses are fake
  • very large
  • this is what you hear about in the news
• L7: fully established TCP connections
  • IP reputation is effective
8 L3 volume per server (chart: packets per second)
9 Automatic attack handling (diagram components: sflow, Attack Detection, Mitigation database, Reactive Automation, iptables)
10 Automatic attack detection (diagram: sflow feeding Attack Detection)
11 Streaming algorithms
• infinite data stream on input
• approximate
(diagram: data stream → streaming algorithm → results)
12 Attack detection is streaming!
• sflow packet samples as input
• detected attacks as output
(diagram: packet samples → streaming algorithms → attacks)
13 Streaming algorithms
• EWMA: exponentially weighted moving average
  • counting rates of packets
• Space Saving
  • known as Top-N or Heavy Hitters
• Simplified Hierarchical Heavy Hitters
• HyperLogLog
  • cardinality estimation: counting unique things
14 The problem: PPS
  Mpps   Description
  3.878  --ip=141.245.59.191/32
  2.878  --ip=141.245.59.192/32
  1.878  --ip=141.245.59.193/32
  1.878  --ip=141.245.59.194/32
  1.878  --ip=141.245.59.195/32
  1.878  --ip=141.245.59.196/32
  1.878  --ip=141.245.59.197/32
  1.878  --ip=141.245.59.198/32
  1.878  --ip=141.245.59.199/32
  ...
15 Naive approach
  pps    IP
  12.2M  1.2.3.4
  2.4M   42.1.2.4
  0.01M  2.4.3.1
  0.01M  192.168.1.1
16 There is no such thing as pps
17 Naive: moving average (timeline: samples arrive at 1.0s, 1.1s, 1.3s, 1.8s, 1.99s, 2.1s, 2.4s, 2.41s; at t=2.50s the averaging window holds precisely 5 samples)
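A naive moving average counts the samples inside a trailing window and divides by the window length. The sketch below replays the slide 17 timeline, assuming a one-second window (consistent with the 5 samples counted at t=2.50s); `window_rate` and `SAMPLES` are illustrative names, not from the talk.

```python
from bisect import bisect_right

# Sample arrival times in seconds, read off slide 17.
SAMPLES = [1.0, 1.1, 1.3, 1.8, 1.99, 2.1, 2.4, 2.41]


def window_rate(sample_times, now, window=1.0):
    """Naive rate: number of samples in (now - window, now], per second.

    sample_times must be sorted ascending; `window` (seconds) is an assumed parameter.
    """
    first = bisect_right(sample_times, now - window)  # index of the oldest sample still inside
    last = bisect_right(sample_times, now)            # one past the newest sample <= now
    return (last - first) / window


print(window_rate(SAMPLES, 2.50))  # 5 samples in (1.5, 2.5]
print(window_rate(SAMPLES, 2.81))  # the 1.8s sample has dropped out
```

Even with no new traffic, the estimate steps from 5 to 4 at t=2.81s and to 3 by t=3.00s, jumping each time an old sample leaves the window. This is the kind of jitter that motivates the smoothed EWMA on slide 22.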
18 Not-smoothed values (same timeline, annotated with raw pps values 100, 3, 50, 5, 2, 5, 10)
19 Not-smoothed values
20 Linux load average - charge
21 Linux load average - discharge
22 Better: EWMA (formula with labelled terms: old load, difference, dampening factor, measurement frequency, half-life time)
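One way to write the labelled formula on slide 22 follows. It assumes the dampening factor is expressed through a half-life $T_{1/2}$ and the measurement interval $\Delta t$ (the reciprocal of the measurement frequency). The symbols $L$ and $x$ are illustrative and not taken from the slide.

```latex
\underbrace{L_{\text{new}}}_{\text{new load}}
  = \underbrace{L_{\text{old}}}_{\text{old load}}
  + \underbrace{\left(x - L_{\text{old}}\right)}_{\text{difference}}
    \cdot \underbrace{\left(1 - 2^{-\Delta t / T_{1/2}}\right)}_{\text{dampening factor}}
```

If the measurement $x$ stays constant, the gap $x - L$ is multiplied by $2^{-\Delta t / T_{1/2}}$ at every step, so it halves every $T_{1/2}$ seconds. The Linux load average has the same shape. Roughly every 5 seconds it computes $\text{load} \cdot e + n \cdot (1 - e)$ with $e = \exp(-5\,\text{s} / 1\,\text{min})$ for the 1-minute figure, which rearranges to $\text{load} + (n - \text{load})(1 - e)$. There the decay is expressed as a time constant rather than a half-life.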
25 EWMA - summary
• Smoothed average
• The same maths as the Linux "load average"
• Charges slowly (half-life)
• Discharges quickly
• Can also be used to count packet rates
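One common way to use exponential decay for packet rates is an exponentially decayed event counter. This is an assumption: the talk does not show its code. Each event adds 1, and the running total decays by half every `half_life` seconds. A steady rate r accumulates r · half_life / ln 2 weighted events, so scaling by ln 2 / half_life turns the counter into packets per second. `DecayingRate` and `half_life` are hypothetical names.

```python
import math


class DecayingRate:
    """Exponentially decayed event counter used as a packets-per-second estimate."""

    def __init__(self, half_life: float) -> None:
        self.half_life = half_life  # seconds for an old event's weight to halve
        self.count = 0.0            # decayed number of events as of self.last
        self.last = None            # timestamp (seconds) of the last update

    def add(self, now: float, n: float = 1.0) -> None:
        """Record n events seen at time `now`; timestamps must not go backwards."""
        if self.last is not None:
            self.count *= 2.0 ** (-(now - self.last) / self.half_life)
        self.count += n
        self.last = now

    def rate(self, now: float) -> float:
        """Events per second as of `now`, without modifying the state."""
        if self.last is None:
            return 0.0
        decayed = self.count * 2.0 ** (-(now - self.last) / self.half_life)
        # A steady rate r accumulates r * half_life / ln 2 weighted events.
        return decayed * math.log(2) / self.half_life


# A steady 100 packets/s for one half-life only "charges" to about half the true rate.
warm = DecayingRate(half_life=1.0)
for i in range(100):
    warm.add(i * 0.01)
print(round(warm.rate(0.99)))  # roughly 50
```

This reproduces the "charges slowly (half-life)" behaviour from the summary: after one half-life of constant traffic, the estimate sits near half the true rate, and after many half-lives it converges to it.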
26 The problem: PPS (same table as slide 14)
27 The problem: memory
  pps    IP
  12.2M  1.2.3.4
  2.4M   42.1.2.4
  0.01M  2.4.3.1
  0.01M  192.168.1.1
  ...
28 Top-N problem
• aka heavy hitters
• A fixed-memory data structure that can "count" the top-N items
• think: top URLs, top customer IPs, etc.
• Count-Min sketch, Space Saving
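Slide 28 names the Count-Min sketch as one fixed-memory option. The minimal sketch below assumes SHA-256 with a row-number prefix as the per-row hash and illustrative sizes (width 1024, depth 4). Each key increments one counter per row, and the estimate is the minimum across rows. Collisions can only add to a counter, so the estimate may overcount but never undercounts. `CountMinSketch` is a hypothetical name.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters, fixed memory."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # A separate hash per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, n: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += n

    def estimate(self, key: str) -> int:
        # Every row overcounts by its collisions; the minimum is the tightest bound.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


cms = CountMinSketch()
for key in ["Alice", "Alice", "Ben"]:
    cms.add(key)
print(cms.estimate("Alice"))
```

The sketch answers "how many?" for a given key but cannot list which keys are heaviest on its own. To get a top-N list you would pair it with a small heap of candidate keys. Space Saving keeps the keys directly.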
29 Space saving: an empty table with columns error, count, key
30 Space saving, incoming: Alice
  error  count  key
  0      1      Alice
31 Space saving, incoming: Alice
  error  count  key
  0      2      Alice
32 Space saving, incoming: Ben
  error  count  key
  0      2      Alice
  0      1      Ben
33 Space saving, incoming: Charlie
  error  count  key
  0      2      Alice
  0      1      Ben
  0      1      Charlie
34-35 Space saving, incoming: Eric? (the table is full)
  error  count  key
  0      2      Alice
  0      1      Ben
  0      1      Charlie
36 Space saving: Eric replaces Ben, an entry with the lowest count, and inherits Ben's count as its error; then + Eric
  error  count  key
  0      2      Alice
  1      0      Eric
  0      1      Charlie
37 Space saving, after counting Eric
  error  count  key
  0      2      Alice
  1      1      Eric
  0      1      Charlie
38 Space saving: Eric's counter? Between 1 and 2 (count 1, error 1)
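The walkthrough above can be sketched as a minimal Space Saving table. The sketch assumes the slides' split between count (hits since the key entered the table) and error (the count inherited from the evicted entry). It performs slides 36 and 37 in one step, inserting Eric with count 1 directly. Eviction scans linearly for the smallest count + error. A production version would keep entries ordered (for example in a heap) instead. `SpaceSaving` and `STREAM` are hypothetical names.

```python
class SpaceSaving:
    """Space Saving top-N sketch with a fixed number of slots."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = {}  # key -> [count, error]

    def add(self, key: str) -> None:
        if key in self.entries:
            self.entries[key][0] += 1
        elif len(self.entries) < self.capacity:
            self.entries[key] = [1, 0]
        else:
            # Evict the entry with the smallest upper bound (count + error).
            # min() returns the first minimum, i.e. the earliest-inserted key on ties.
            victim = min(self.entries, key=lambda k: sum(self.entries[k]))
            count, error = self.entries.pop(victim)
            # The newcomer may already have been seen up to count + error times.
            self.entries[key] = [1, count + error]

    def bounds(self, key: str) -> tuple:
        """(lower, upper) bounds on how many times `key` was seen."""
        count, error = self.entries[key]
        return count, count + error

    def top(self, n: int) -> list:
        return sorted(self.entries, key=lambda k: -sum(self.entries[k]))[:n]


# Replay the stream from slides 30-37 with three slots.
STREAM = ["Alice", "Alice", "Ben", "Charlie", "Eric"]
ss = SpaceSaving(capacity=3)
for item in STREAM:
    ss.add(item)
print(ss.entries)         # Ben evicted; Eric has count 1, error 1
print(ss.bounds("Eric"))  # (1, 2), as on slide 38
```

Every arrival raises the sum of all count + error values by exactly one, so that sum equals the stream length N. The smallest entry is therefore at most N / capacity, and no error can exceed that bound. This is where the "strong error guarantees" come from.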
39-41 What about rates?
• It's hard
• was: GetAll()
• now: GetAll(time.Time)
• No longer O(1)
• Instead O(log n)
42 Summary - Space saving
• Top-N / heavy-hitter algorithm
• Fixed memory size
• Strong error guarantees
43 Aggregating attacks
  Mpps   Description
  3.878  --ip=141.245.59.191/32
  2.878  --ip=141.245.59.192/32
  1.878  --ip=141.245.59.193/32
  1.878  --ip=141.245.59.194/32
  1.878  --ip=141.245.59.195/32
  1.878  --ip=141.245.59.196/32
  1.878  --ip=141.245.59.197/32
  1.878  --ip=141.245.59.198/32
  1.878  --ip=141.245.59.199/32
  ...
vs
  Mpps    Description
  35.878  --ip=141.245.59.0/24
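Rolling per-address rates up into a prefix is a plain sum keyed by the enclosing network. The sketch below uses Python's `ipaddress` module and only the nine rows visible on slide 43. Because the slide's "..." rows are not reproduced, the total is lower than the slide's 35.878 Mpps for the full /24. `aggregate` and `VISIBLE` are illustrative names.

```python
import ipaddress
from collections import defaultdict


def aggregate(rates, prefix_len=24):
    """Sum per-address rates (Mpps) into networks of the given prefix length."""
    totals = defaultdict(float)
    for addr, mpps in rates.items():
        # strict=False masks off the host bits, e.g. 141.245.59.191/24 -> 141.245.59.0/24.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        totals[str(net)] += mpps
    return dict(totals)


# The nine /32 rows visible on slide 43 (the slide's "..." rows are not included).
VISIBLE = {"141.245.59.191": 3.878, "141.245.59.192": 2.878}
VISIBLE.update({f"141.245.59.{i}": 1.878 for i in range(193, 200)})

print(aggregate(VISIBLE))  # one /24 entry, about 19.902 Mpps from the visible rows
```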
44 Hierarchical Heavy Hitters
45 Simplified HHH
46 Multiple dimensions
  pps    IP:port
  12.2M  1.2.3.4:53
  2.4M   42.1.2.4:80
  0.01M  2.4.3.1:80
  0.01M  192.168.1.1:443

  pps    IP
  12.2M  1.2.3.4
  2.4M   42.1.2.4
  0.01M  2.4.3.1
  0.01M  192.168.1.1

  pps    subnet
  12.2M  1.2.3.0/24
  2.4M   42.1.2.0/24
  0.01M  2.4.3.0/24
  0.01M  192.168.1.0/24
47 Multiple dimensions (same three tables as slide 46; incoming sample: 42.1.2.4:80)
48 Multiple dimensions (same three tables as slide 46; reporting threshold: 1M)
49 Attack report
  Mpps  Description
  12.2  --ip=1.2.3.4 --port=53
  2.4   --ip=42.1.2.4 --port=80
  12.2  --ip=1.2.3.4
  2.4   --ip=42.1.2.4
  12.2  --ip=1.2.3.0/24
  2.4   --ip=42.1.2.0/24
50 Multiple dimensions (incoming sample: 42.1.2.4:80)
  pps    IP:port
  12.2M  1.2.3.4:53
  2.4M   42.1.2.4:80
  0.01M  2.4.3.1:80
  0.01M  192.168.1.1:443

  pps    IP
  0.1M   1.2.3.4
  0M     42.1.2.4
  0.01M  2.4.3.1
  0.01M  192.168.1.1

  pps    subnet
  0.1M   1.2.3.0/24
  0M     42.1.2.0/24
  0.01M  2.4.3.0/24
  0.01M  192.168.1.0/24
51 Attack report
  Mpps  Description
  12.2  --ip=1.2.3.4 --port=53
  2.4   --ip=42.1.2.4 --port=80
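The slides do not spell out the SHHH update rule. Slides 50 and 51 suggest that once 1.2.3.4:53 is reported, the coarser 1.2.3.4 and 1.2.3.0/24 entries only count the remaining traffic, so the report stops listing the same attack three times. Below is an exact, offline sketch of that reporting rule over one batch of per-flow rates. It is not the streaming SHHH structure itself, which would keep approximate per-level counters. The three levels, `keys_for`, `discounted_report` and `FLOWS` are assumptions for illustration.

```python
import ipaddress
from collections import defaultdict


def keys_for(ip, port):
    """Report keys for one flow, most specific level first (hypothetical levels)."""
    subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
    return [f"--ip={ip} --port={port}", f"--ip={ip}", f"--ip={subnet}"]


def discounted_report(flows, threshold):
    """flows: list of (ip, port, pps). Report keys at or above threshold, where a
    coarser key only counts traffic not already explained by a reported finer key."""
    report = []
    remaining = list(flows)  # traffic not yet attributed to a reported key
    for level in range(3):
        totals = defaultdict(int)
        for ip, port, pps in remaining:
            totals[keys_for(ip, port)[level]] += pps
        hot = {key for key, total in totals.items() if total >= threshold}
        report.extend(sorted(((key, totals[key]) for key in hot), key=lambda kv: -kv[1]))
        # Drop explained traffic so coarser levels see only the residual.
        remaining = [(ip, port, pps) for ip, port, pps in remaining
                     if keys_for(ip, port)[level] not in hot]
    return report


# Flows from slide 46 in packets per second; 1M reporting threshold from slide 48.
FLOWS = [("1.2.3.4", 53, 12_200_000), ("42.1.2.4", 80, 2_400_000),
         ("2.4.3.1", 80, 10_000), ("192.168.1.1", 443, 10_000)]

for key, pps in discounted_report(FLOWS, 1_000_000):
    print(pps, key)  # only the two IP:port rows, as on slide 51
```

A broad attack made of many small flows inside one /24 still surfaces at the subnet level. For example, 200 flows of 10k pps each add up to 2M pps for the /24, which clears the threshold.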
52 Scales well
53 Summary - SHHH
• Approximate
• High error in pps
• Works well in practice
• Scales well
• Fast and simple to implement
54 Spoofed source IP?
  Mpps    Description
  23.833  --target=173.245.59.2 --agent=WAW --iface=659 Est=57364
  23.067  --target=173.245.58.1 --agent=WAW --iface=659 Est=56995
  7.139   --target=173.245.58.1 --agent=DUS --iface=893 Est=11493
  6.366   --target=173.245.59.2 --agent=DUS --iface=893 Est=11240
  2.590   --target=173.245.58.1 --agent=SIN --iface=657 Est=219987
  2.557   --target=173.245.59.2 --agent=SIN --iface=657 Est=220380
  1.045   --target=173.245.58.1 --agent=MAN --iface=756 Est=207
  1.039   --target=173.245.59.2 --agent=MAN --iface=756 Est=200
55 HyperLogLog (diagram: "Alice" is added to an HLL, which reports "22 unique items!")
56 HyperLogLog: (HLL#1 OR HLL#2) = 44 unique items
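A compact version of the standard HyperLogLog algorithm follows. It is not necessarily the implementation behind the talk. It assumes p = 10 (1024 registers, roughly 3% standard error) and the first 8 bytes of SHA-1 as a 64-bit hash. Merging takes the register-wise maximum, which is what slide 56 draws as "OR". The merged registers are exactly those of a single sketch fed both streams, so per-server sketches can be combined without double-counting. A use like this would fit slide 54, which presumably reports an estimate of distinct source addresses in its Est= column. `HyperLogLog` and its method names are hypothetical.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 10) -> None:
        self.p = p                     # index bits (assumed default)
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash: the first p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return raw

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: register-wise maximum."""
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


a, b = HyperLogLog(), HyperLogLog()
for i in range(6000):
    a.add(f"ip-{i}")
for i in range(4000, 10000):
    b.add(f"ip-{i}")
print(round(a.merge(b).estimate()))  # close to 10000 unique items
```

Adding the same item twice never changes a register, so duplicates are free. Memory stays at 1024 small integers however many items arrive.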
58 What about rates? (diagram: HLL #1, HLL #2, HLL #3, HLL #4)
59 Hard drives
60 Summary
• Attack detection is a streaming problem
• Streaming algorithms are awesome
• Applicable to many more problems
Thanks! marek@cloudflare.com