Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
#10 “Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency”
Search
cafenero_777
June 14, 2023
Technology
0
42
#10 “Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency”
SOCC ’14
ACM Symposium on Cloud Computing
https://sites.google.com/site/2014socc/home/program
cafenero_777
June 14, 2023
Tweet
Share
More Decks by cafenero_777
See All by cafenero_777
#51 “Empowering Azure Storage with RDMA”
cafenero_777
3
270
#49 “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems”
cafenero_777
2
56
#50 “Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction”
cafenero_777
0
51
#33 “Destroying networks for fun (and profit)”
cafenero_777
0
63
#34 “MTPSA: Multi-Tenant Programmable Switches”
cafenero_777
0
26
#37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”
cafenero_777
1
89
#39 “Profiling a warehouse-scale computer”
cafenero_777
0
15
#23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”
cafenero_777
0
180
#24 “Ananta: Cloud Scale Load Balancing”
cafenero_777
0
130
Other Decks in Technology
See All in Technology
2024春 注目のWeb系 OSS & SaaS 3選
makies
0
210
AWS アーキテクチャ作図入門/aws-architecture-diagram-101
ma2shita
16
6.6k
.NET Profiler in 2024.
kkamegawa
2
2.9k
TypescriptでのContextualな構造化ロギングと社内全体への導入
leveragestech
3
300
NewSQL Landscape
oracle4engineer
PRO
5
2.9k
From here to resilience - a travel guide
ufried
1
120
中年男性がメインフレームから クラウドへキャリアシフトしてみた
uechishingo
1
440
RailsConf 2024 Keynote "Startups on Rails in 2024"
irinanazarova
0
120
Babylon.jsと色々なものを組み合わせる:ブラウザのAPIやガジェットや2D描画ライブラリなど / Babylon.js 勉強会 vol.3
you
PRO
0
200
cgroup v2 で何が変わったのか / TechFeed Experts Night #28
tenforward
2
120
パスワードを保存しますか?
hanacchi
0
230
認知症フレンドリーテックとスタックチャン
naokiuc
0
380
Featured
See All Featured
Web Components: a chance to create the future
zenorocha
306
41k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
34
8.9k
A Tale of Four Properties
chriscoyier
153
22k
Large-scale JavaScript Application Architecture
addyosmani
504
110k
Creatively Recalculating Your Daily Design Routine
revolveconf
211
11k
Making Projects Easy
brettharned
109
5.5k
It's Worth the Effort
3n
180
27k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
14
8.4k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
221
21k
We Have a Design System, Now What?
morganepeng
43
6.8k
A Philosophy of Restraint
colly
197
16k
How to Ace a Technical Interview
jacobian
273
22k
Transcript
Research Paper Introduction #10 “Tales of the Tail: Hardware, OS,
and Application-level Sources of Tail Latency” @cafenero_777 2020/05/12
• ॕʂ10ճʂʢࢲͷͰࢉʣ
$ which • Tales of the Tail: Hardware, OS, and
Application-level Sources of Tail Latency • Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports, and Steven D. Gribble • University of Washington • SOCC ’14 • ACM Symposium on Cloud Computing • https://sites.google.com/site/2014socc/home/program
Agenda • ֓ཁͱಡ͏ͱͨ͠ཧ༝ • Introduction • Queuing Models and Predicted
Latency • Measurement Method • Sources of Tail Latency • Related Work • Discussion • Conclusion
֓ཁͱಡ͏ͱͨ͠ཧ༝ • ֓ཁ • ϚϧνίΞ্ͷHW/OS/AppϨΠϠʔ͔ΒlatencyΛௐࠪ • ϞσϧԽͯ͠RPC/Memcached/NginxͰଌఆ͠ɺݪҼͱτϨʔυΦϑΛௐࠪ • ಡ͏ͱͨ͠ཧ༝ •
Tail latencyͷݟํΛΓ͔͔ͨͬͨΒɻ • େنࢄγεςϜTail latency͕ͨΓલͷੈքʢΒ͍͠ʣͷͰɻ • Podcastܦ༝ • https://misreading.chat/2019/03/27/episode-54-tales-of-the-tail/
Introduction • ωοτϫʔΫӽ͠ʹΓऔΓ͢ΔࢄγεςϜ • େنڥͩͱதԝ͕ܻҧ͍ʹେ͖͘ͳΔʢ=99%ileͰ֬తʹेେ͖͍ʣ • ઍͷmemcached@facebook, ̍ສͷindexαʔό@MS Bing •
null-RPC, Memcached, Nginx (web-server)Ͱݕূ • ཧϞσϧΑΓѱ͍݁Ռʹͳͬͨ • ݪҼΛௐͯtail-latencyΛվળ • ྫɿMemcached 99.9%ile latency: 14ms -> 32us • ྫɿthroughputͱlatencyͷτϨʔυΦϑ
Queuing Models and Predicted Latency (1/3) • ϕʔεϥΠϯʢཧͷԆʣԿ͔ʁ • ϞσϧԽ
• γϯάϧΩϡʔ͕cݸͷworker (core, thread, process, etc)ͰFIFOͤ͞Δ • A/S/c queue (Kendallදه) • Arrival distributionʢ౸ணʣ, Service time distributionʢαʔϏε࣌ؒ ʣ, ಠཱͨ͠cݸͷϫʔΧʔ • ॲཧʹݻఆ͔͔࣌ؒΔωοτϫʔΫӽ͠ͷαʔϏεΛఆ
Queuing Models and Predicted Latency (2/3) • Arrival distributions •
ྫɿϦΫΤετॲཧ͕50usͰྃ͢ΔFIFOαʔϏε • 50usҎʹϥϯμϜϦΫΤετ͕དྷΔͱ”ͪ”͕ൃੜ • -> tail-latencyൃੜ • Utilization • ϫʔΧʔ1ݻఆͷ··ɺฏۉϦΫΤετΛ૿͢=utilΛ্͛Δ • ಉ࣌ʹϦΫΤετ͕དྷΔ͕֬૿͑Δ • -> tail-latencyൃੜ • ͔͠util 50%->95%Ͱ99%ile latency͕10ഒ૿͑Δʂ • ϚΠΫϩόʔετ • ҰճͰϦΫΤετ͕”͔ͿΔ”ͱɺԆΛҾ͖ى͜͢ΩϡʔΛ࡞ͬͯ͠·͏ ܭࢉ ܭࢉ
Queuing Models and Predicted Latency (3/3) • Parallel servers feeding
from one queue • ϫʔΧʔc૿ͤ1/cͰlatencyݮΔ • ڞ༗ΩϡʔΛ͍ͬͯΔ߹ (ϑΥʔΫฒͼ) • ݸผΩϡʔͩͱlatencyมΘΒͣɺthroughput͕૿͑Δ • Queuing discipline • Random worker: ݸผFIFOΩϡʔΛ࣋ͬͯΔ֤ϫʔΧʔׂΓͯ • Random request: ڞ༗ΩϡʔͷϦΫΤετΛϥϯμϜʹબׂ͠Γͯʢ౸ண࣌ؒؔͳ͠ʣ • Ωϡʔ͔ΒҾ͖ग़͢ํࣜʹΑͬͯlatency͕มΘΔ • medianͱ99%ileͰlatencyٯస͢Δ߹͋Γ • FIFO V.S. LIFO (stack) • FIFO V.S. Random request ܭࢉ ܭࢉ
Measurement Method • Null RPC server • TCPͰϦΫΤετ128byteΛड͚ͯ128byteϨεϙϯεฦ͢ • ΞΫηϓτεϨου->ϫʔΧʔੜ->
read/write system call • OSґଘɿTCP, εϨουεέδϡʔϥ • Memcached • O(1)ͳhash-tableΛ࣋ͭin-memory KVSΞϓϦέʔγϣϯɻϫʔΧʔίΞʹൺྫ • UDPϞʔυɿ֤ϫʔΧʔεϨου͕FIFO • TCPϞʔυɿTCPίωΫγϣϯຖʹϫʔΧʔ͕ܾ·͍ͬͯΔʢׂॲཧ1-2usఔʣ • Nginx • ඇಉظI/O system callΛଟ༻ • ϫʔΧʔຖʢίΞຖʣʹΫϥΠΞϯτΛׂΓͯ • 85byte http request -> 849byte http response, ੩తϑΝΠϧΛฦͨ͢ΊɺόοϑΝΩϟογϡʹͨΔʢετϨʔδӨڹແࢹͰ͖Δʣ • epoll systemcallΛ͍ͬͯΔ=४උ͕Ͱ͖ͨॱʹϑΝΠϧσΟεΫϦϓλΛฦ͢ -> FIFO • ֤ΞϓϦͰCPU100%༻ͳঢ়ଶʹͯ͠ɺεϧʔϓοτΛଌఆ͠ɺϦΫΤετॲཧ࣌ؒΛݟੵΔ
Sources of Tail Latency (Background Processes) • 1CPU, 1core, HT
disabled • εέδϡʔϥ͕linuxσʔϞϯʹׂ࣌ؒ->ϦΫΤετ͕ͨ·Δ->tail-latency૿Ճʂ • niceͰεέδϡʔϥͷ༏ઌʢׂ࣌ؒʣΛௐɻׂΓͯΒΕͳ͍ͱͪɻ • ϦΞϧλΠϜεέδϡʔϥɿϦΞϧλΠϜϓϩηεͱͯ͠ࢦఆ͢Δͱ”ׂ࣌ؒΓࠐΈ”͕Ͱ͖Δ • ઐ༻ίΞɿεέδϡʔϥ͕ͪͳ͍ͷͰવ͍ɻίϯςΩετεΠονແ͠
Sources of Tail Latency (Non-FIFO Scheduling) • CFS (Completely Fair
Scheduler) -> ॱংΑΓެฏੑॏࢹɺඇFIFO • ϚϧνεϨουΞϓϦ: ͲͷεϨουʹ࣌ؒΛׂΓͯΔ͔OS࣍ୈ • ૣ͘ऴΘ͔ͬͨɺͰͳ͍ • ϦΞϧλΠϜεέδϡʔϥʹ͢ΔͱɺFIFO͔ͭόοΫάϥϯυׯবݮ ଌఆ ܭࢉ
Sources of Tail Latency (Multicore) • ಉҰNUMA্Ͱ1~4core͏ • Null RPC
serverվળ • γϯάϧΩϡʔ • ଞ2ͭ1coreͱมΘΒͣ • ϦΫΤετ͕TCPίωΫγϣϯ͍·Θ͠ • TCP͕ಛఆϫʔΧʔʹׂΓͯͷͨΊɺϫʔΧʔ͕ภΔ • Memcached • UDPͰγϯάϧΩϡʔʹͳΔ->վળ • Nginx • TCP (http)Λ్தͰcloseͯ͠ɺ࠶ͭͳ͗͠ɺͰվળ • workload࣍ୈɻɻ
Sources of Tail Latency (Interrupt Processing) • packetड৴ͰΧʔωϧׂࠐൃੜ -> irqbalance͕શcoreʹ͜ΕΛࢄ
• ׂࠐൃੜ༧ظͰ͖ͳ͍ʢ=ॲཧ͕࣌ؒҰఆͰͳ͘ͳΔʣ • ڞ༗ΩϡʔͷFIFOͰͳ͍ • ઐ༻ίΞͳΒ͜ΕΛճආ • load͕͍ͱແବʢεϧʔϓοτ͕͍ʣ • େنϚϧνίΞCPUͩͱઐ༻ίΞར༻ʁ
Sources of Tail Latency (NUMA Effects) • 8coreΛ2CPUʹࢄ • σϑΥϧτͰϝϞϦׂΓ͕ͯnode0͔Βɻ
• memcachedεϨουͷϝϞϦΞΫηε͕NUMAΛ·͙ͨ • -> latency૿Ճ • null RPC/NginxϝϞϦ༻ྔ͕গͳ͔ͬͨͷͰӨڹͳ͔ͬͨ • numactlͰcore/memory nodeΛࢦఆ • վળʂ
Sources of Tail Latency (Power Saving Optimizations) • CPU༻10%Ͱଌఆ •
CPU stateɿ C-state͔ΒcoreΛ”ى͜͢”͕͔͔࣌ؒΔ -> tail-latencyʹͳΔ • C3-state͔Βͷwakeup200usɺ͜ΕΛଌఆ • पͷԼɿͬͯͳ͍ͱCPUΫϩοΫपΛݮΒ͢ • NginxCPUෛՙ͕ߴ͍ͨΊɺएׯվળ
Sources of Tail Latency: Summary • nice͚ͩͰෆेɻϦΞϧλΠϜεέδϡʔϥ༗ޮ • ϚϧνεϨουΞϓϦέʔγϣϯFIFOεέδϡʔϥͳΒ༗ޮ •
ϚϧνίΞ༗ޮ͕ͩɺҰൠతʹʢTCPͳͲಛఆίωΫγϣϯΛಛఆ ϫʔΧʹׂΓͯΔΞʔΩςΫνϟͩͱʣޮՌ͕ऑ͍ • NUMAεϨουͱϝϞϦׂΓͯnodeΛ߹ΘͤΔ • ిྗͱtail latencyτϨʔυΦϑ
Related Work • MapReduce/Spark • Ϩεϙϯε͕͍ͱผϗετʹ࠶ϦΫΤετൃߦ • શϨϓϦΧʹಉ࣌ʹ͖͛ͬͯͨͷΛ࠾༻ʢεϧʔϓοτͷແବݣ͍ʣ • ෆશͳ݁ՌΛڐ༰͢Δ
• Ϛϧνςφϯτڥ • latency sensitive VMͱCPU sensitive VMͰϗετΛ͚Δ • DCNWͷεΠονͷΩϡʔᷓΕ • DCTCPతͳΞϓϩʔν • ిྗ • LBͰ௨৴دͤΔɻͬͯͳ͍αʔόফඅిྗঢ়ଶ
Discussion • Ϧιʔε֬อͷํ • ࣌ؒతʢCFSʣ V.S. ۭؒతʢCPUίΞઐ༗ʣ • εϨου V.S.
Πϕϯτ • εϨου+FIFO • Πϕϯτ+ϦΫΤετͷϫʔΧʔׂΓͯͷ࠷దԽ
Conclusion • Tail latencyͷݪҼΛϚϧνίΞHW, OS, ΞϓϦέʔγϣϯϨϕϧͰௐࠪ • ཧͱൺֱ • όοΫάϥϯυϓϩηεͷׯব
• ΩϡʔΠϯάͷํ๏ͱεέδϡʔϥ • ΧʔωϧׂࠐNUMAɺCPUলిྗػೳ • ࠷దԽ͢Δͱ99.9%ileΛେ෯ʹݮՄೳ
EoP
༧උεϥΠυ
Queuing Models and Predicted Latency (1/3) • ϕʔεϥΠϯʢཧͷԆʣԿ͔ʁ • શͯͷϦΫΤετʹಉ͡Ԡ࣌ؒͰॲཧ͢Δ
-> ࣮ࡍ͋Γ͑ͳ͍ • ϦΫΤετ͕དྷΔλΠϛϯά͕όϥόϥʢ=ಉ࣌ʹདྷΔͱ͖͋Γʣ-> ϚΠΫϩόʔετԆൃੜ • ϦΫΤετॲཧ͕࣌ؒಉ͡Ͱશମͷlatency͕ҧ͏͜ͱ͕͋Δ • ߴ͍loadͷͱ͖latencyߴ͍ʁ • ϚϧνίΞԽͰlatencyվળʁ • ΩϡʔΠϯάFIFO͕࠷ʁ • ϞσϧԽ • γϯάϧΩϡʔ͕cݸͷworker (core, thread, process, etc)ͰFIFOͤ͞Δ • A/S/c queue (Kendallදه) • Arrival distributionʢ౸ணʣ, Service time distributionʢαʔϏε࣌ؒʣ, ಠཱͨ͠cݸͷϫʔΧʔ • ॲཧʹݻఆ͔͔࣌ؒΔωοτϫʔΫӽ͠ͷαʔϏεΛఆ
Measurement Method (Timestamping) • NICͰड͚ͯɺNIC͔Βग़͍ͯ͘·Ͱͷ࣌ࠁT1 ~ T6 • ΧʔωϧɺNWυϥΠόɺL7ϓϩτίϧΛमਖ਼ͯ͠ϦΫΤετύέοτʹ30byteՃ •
NTP disabled • T1: NWυϥΠό͕packetॲཧՄೳͱ௨ͨ࣌͠ • T2: TCP/UDPॲཧޙɺΞϓϦॲཧલ • T3: ΞϓϦ͕ίΞʹεέδϡʔϦϯά͞Εͨޙ • T4: ΞϓϦ͕read system callൃߦޙ=Ϣʔβϥϯυʹσʔλ͕ίϐʔ͞Εͨޙ • T5: ΞϓϦ͕write system callൃߦޙ • T6: packetΛૹ৴͢Δ࣌