cafenero_777
June 14, 2023
#10 “Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency”
SOCC ’14
ACM Symposium on Cloud Computing
https://sites.google.com/site/2014socc/home/program
Transcript
Research Paper Introduction #10 “Tales of the Tail: Hardware, OS,
and Application-level Sources of Tail Latency” @cafenero_777 2020/05/12
• Celebrating the 10th installment! (by my own count)
$ which
• Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency
• Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports, and Steven D. Gribble
• University of Washington
• SOCC ’14
• ACM Symposium on Cloud Computing
• https://sites.google.com/site/2014socc/home/program
Agenda
• Overview and why I read this paper
• Introduction
• Queuing Models and Predicted Latency
• Measurement Method
• Sources of Tail Latency
• Related Work
• Discussion
• Conclusion
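Overview and why I read this paper
• Overview
• Investigates latency across the HW/OS/App layers on multicore machines
• Builds a model, measures RPC/Memcached/Nginx against it, and examines causes and trade-offs
• Why I read it
• I wanted to learn how to look at tail latency.
• In large-scale distributed systems, tail latency is (apparently) a fact of life.
• Found via a podcast
• https://misreading.chat/2019/03/27/episode-54-tales-of-the-tail/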
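Introduction
• Distributed systems that communicate over the network
• In large-scale environments the median grows by orders of magnitude (= the 99%ile is hit with sufficiently high probability)
• Thousands of memcached servers at Facebook, 10,000 index servers at MS Bing
• Validated with null-RPC, Memcached, and Nginx (web server)
• Results were worse than the ideal model
• Investigated the causes and improved tail latency
• Example: Memcached 99.9%ile latency: 14ms -> 32us
• Example: trade-off between throughput and latency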
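The point about scale can be made concrete with a small calculation. This is a minimal sketch under an assumption the slide does not state: each of N servers independently exceeds its own 99th-percentile latency with probability 0.01, so a request fanned out to all N waits on at least one slow server with probability 1 - 0.99^N. The function name is hypothetical.

```python
def prob_hits_tail(num_servers: int, percentile: float = 0.99) -> float:
    """Probability that at least one of `num_servers` independent servers
    answers slower than its own `percentile` latency (idealized model)."""
    return 1.0 - percentile ** num_servers


if __name__ == "__main__":
    # Fan-out sizes chosen for illustration only.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} servers: {prob_hits_tail(n):.4f}")
```

With a fan-out in the hundreds, the "rare" per-server tail is what most end-to-end requests actually experience.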
Queuing Models and Predicted Latency (1/3)
• What is the baseline (the ideal latency)?
• Modeling
• A single queue is drained FIFO by c workers (core, thread, process, etc.)
• A/S/c queue (Kendall notation)
• Arrival distribution, Service time distribution, c independent workers
• Assume a service reached over the network whose processing takes a fixed amount of time
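Queuing Models and Predicted Latency (2/3)
• Arrival distributions
• Example: a FIFO service that finishes processing each request in 50us
• If a random request arrives within 50us of the previous one, "waiting" occurs
• -> tail latency appears
• Utilization
• Keep the number of workers fixed at 1 and raise the average request rate (= raise utilization)
• The probability that requests arrive at the same time increases
• -> tail latency appears
• Moreover, going from 50% to 95% utilization increases the 99%ile latency 10x!
• Microbursts
• If requests "overlap" even once, a queue builds up and causes delay
• (Figure labels: computed, computed)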
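The utilization effect on this slide can be reproduced with a short simulation. This is a sketch under stated assumptions, not the paper's own model code: one FIFO worker, a fixed 50us service time as in the slide's example, and Poisson arrivals (the arrival process is my assumption). Function names are hypothetical. Waiting time follows the Lindley recursion: the next request waits for whatever work is still pending when it arrives.

```python
import random


def simulate_fifo(utilization, service_us=50.0, n_requests=200_000, seed=1):
    """Latencies (us) of a single FIFO worker with fixed service time
    and exponentially distributed inter-arrival gaps."""
    rng = random.Random(seed)
    mean_gap = service_us / utilization  # average time between arrivals
    wait = 0.0                           # queueing delay of the current request
    latencies = []
    for _ in range(n_requests):
        latencies.append(wait + service_us)
        gap = rng.expovariate(1.0 / mean_gap)
        # Work left over when the next request arrives (never negative).
        wait = max(0.0, wait + service_us - gap)
    return latencies


def percentile(values, p):
    """p-th quantile (0 < p < 1) by simple rank selection."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


if __name__ == "__main__":
    for util in (0.50, 0.95):
        p99 = percentile(simulate_fifo(util), 0.99)
        print(f"utilization {util:.0%}: 99%ile latency = {p99:.0f} us")
```

No request is ever served faster than 50us, yet the 99th percentile grows sharply as utilization approaches 1, which is the queueing-only baseline the measurements are compared against.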
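Queuing Models and Predicted Latency (3/3)
• Parallel servers feeding from one queue
• Adding workers (c) cuts latency to 1/c
• This holds when a shared queue is used (a single line feeding all workers)
• With per-worker queues, latency does not change and only throughput increases
• Queuing discipline
• Random worker: each request is assigned to one of the workers, each of which has its own FIFO queue
• Random request: a request is picked at random from the shared queue and assigned (regardless of arrival time)
• Latency changes depending on how requests are pulled from the queue
• In some cases the ordering at the median and at the 99%ile is reversed
• FIFO vs. LIFO (stack)
• FIFO vs. Random request
• (Figure labels: computed, computed)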
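The shared-queue versus per-worker-queue comparison can be sketched as follows. Assumptions that are mine rather than the slide's: Poisson arrivals, a fixed 50us service time, 4 workers at 80% utilization. "Shared FIFO" hands each request to whichever worker frees up first; "random worker" puts each request in a randomly chosen worker's private FIFO queue. LIFO and random-request disciplines are left out to keep the sketch short. All function names are hypothetical.

```python
import heapq
import random


def poisson_arrivals(rate_per_us, n, rng):
    """Arrival timestamps (us) with exponential inter-arrival gaps."""
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate_per_us)
        out.append(t)
    return out


def shared_fifo(arrival_times, workers, service_us):
    """One FIFO queue: the next request goes to the earliest-free worker."""
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    latencies = []
    for t in arrival_times:
        start = max(t, heapq.heappop(free_at))
        heapq.heappush(free_at, start + service_us)
        latencies.append(start + service_us - t)
    return latencies


def random_worker(arrival_times, workers, service_us, rng):
    """Per-worker FIFO queues: each request picks a worker at random."""
    free_at = [0.0] * workers
    latencies = []
    for t in arrival_times:
        w = rng.randrange(workers)
        start = max(t, free_at[w])
        free_at[w] = start + service_us
        latencies.append(free_at[w] - t)
    return latencies


def p99(values):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]


def compare(workers=4, utilization=0.8, service_us=50.0, n=100_000, seed=7):
    """Return (shared-queue p99, random-worker p99) for the same arrivals."""
    rng = random.Random(seed)
    rate = utilization * workers / service_us
    times = poisson_arrivals(rate, n, rng)
    return (p99(shared_fifo(times, workers, service_us)),
            p99(random_worker(times, workers, service_us, rng)))


if __name__ == "__main__":
    shared, per_worker = compare()
    print(f"shared FIFO p99: {shared:.0f} us, random worker p99: {per_worker:.0f} us")
```

Both variants see the same arrival sequence, so any difference in the 99th percentile comes from the queue layout alone.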
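Measurement Method
• Null RPC server
• Receives a 128-byte request over TCP and returns a 128-byte response
• Accept thread -> spawns a worker -> read/write system calls
• OS dependencies: TCP, thread scheduler
• Memcached
• In-memory KVS application with an O(1) hash table. The number of workers scales with the number of cores
• UDP mode: worker threads are served FIFO
• TCP mode: the worker is fixed per TCP connection (assignment processing takes about 1-2us)
• Nginx
• Makes heavy use of asynchronous I/O system calls
• Clients are assigned per worker (per core)
• 85-byte HTTP request -> 849-byte HTTP response; it serves a static file, so it hits the buffer cache (storage effects can be ignored)
• Uses the epoll system call = file descriptors are returned in the order they become ready -> FIFO
• For each application, drive the CPU to 100% utilization, measure throughput, and estimate the per-request processing time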
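Sources of Tail Latency (Background Processes)
• 1 CPU, 1 core, HT disabled
• The scheduler gives CPU time to Linux daemons -> requests pile up -> tail latency increases!
• nice adjusts the scheduler priority (share of CPU time). A process that is not given time has to wait.
• Real-time scheduler: a process marked as real-time can "preempt" others
• Dedicated core: no waiting on the scheduler, so naturally fast. No context switches either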
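The two mitigations on this slide (real-time scheduling class, pinning to a core) map onto Linux calls that Python exposes in the `os` module. This is a minimal sketch, not the paper's setup: the helper name, the choice of `SCHED_FIFO`, and priority 1 are my assumptions. Switching to a real-time class normally needs root or CAP_SYS_NICE, so the helper reports what succeeded instead of failing. Pinning alone does not make a core dedicated; other tasks would also have to be kept off it.

```python
import os


def pin_and_prioritize(cpu: int, rt_priority: int = 1) -> dict:
    """Try to pin the calling process to `cpu` and move it to SCHED_FIFO.

    Returns which of the two steps succeeded. Both calls are Linux-only,
    hence the hasattr guards.
    """
    result = {"pinned": False, "realtime": False}

    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {cpu})  # pid 0 = the calling process
            result["pinned"] = True
        except OSError:
            pass  # e.g. the CPU is not available to this process

    if hasattr(os, "sched_setscheduler"):
        try:
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
            result["realtime"] = True
        except OSError:
            pass  # typically PermissionError without CAP_SYS_NICE

    return result


if __name__ == "__main__":
    print(pin_and_prioritize(cpu=0))
```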
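Sources of Tail Latency (Non-FIFO Scheduling)
• CFS (Completely Fair Scheduler) -> favors fairness over ordering, so it is non-FIFO
• Multithreaded applications: which thread gets CPU time is up to the OS
• It is not decided by which one finished first
• Switching to the real-time scheduler gives FIFO ordering and also reduces background interference
• (Figure labels: measured, computed)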
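Sources of Tail Latency (Multicore)
• Use 1 to 4 cores on the same NUMA node
• Null RPC server: improves
• Single queue
• The other two are no different from 1 core
• Requests reuse TCP connections
• TCP connections are assigned to specific workers, so load across workers is skewed
• Memcached
• With UDP it becomes a single queue -> improves
• Nginx
• Improves when the TCP (HTTP) connection is closed partway and re-established
• Depends on the workload...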
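Sources of Tail Latency (Interrupt Processing)
• Receiving a packet raises a kernel interrupt -> irqbalance spreads these across all cores
• Interrupts cannot be predicted (= processing time is no longer constant)
• The shared queue is no longer served FIFO either
• A dedicated core avoids this
• Wasteful at some load levels (lower throughput)
• On large multicore CPUs, use a dedicated core?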
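Sources of Tail Latency (NUMA Effects)
• Spread 8 cores across 2 CPUs
• By default, memory is allocated starting from node0.
• Memory accesses by memcached threads cross NUMA nodes
• -> latency increases
• null RPC/Nginx use little memory, so they were not affected
• Specify the core/memory node with numactl
• Improved!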
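As an illustration of the numactl step, the sketch below only builds the command line; it does not run anything. `--cpunodebind` and `--membind` are numactl's options for restricting CPUs and memory to a node. The helper name, the choice of node 0, and the memcached arguments are hypothetical and not taken from the paper.

```python
import shlex


def numactl_command(node: int, command: list[str]) -> list[str]:
    """Prefix `command` so its threads and its memory stay on one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *command]


if __name__ == "__main__":
    # Hypothetical invocation: a 4-thread memcached confined to node 0.
    print(shlex.join(numactl_command(0, ["memcached", "-t", "4"])))
```

Running one such instance per node keeps each thread's accesses local, which is the alignment of threads and memory that the slide describes.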
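Sources of Tail Latency (Power Saving Optimizations)
• Measured at 10% CPU utilization
• CPU states: "waking" a core from a C-state takes time -> this becomes tail latency
• Wakeup from the C3 state takes 200us; this is what was measured
• Frequency scaling: the CPU clock frequency is lowered when the core is not in use
• Nginx: CPU load is high, so it improves only slightly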
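To see what wakeup costs a given machine advertises, Linux exposes each idle state under the cpuidle sysfs directory, with a `name` file and a `latency` file (exit latency in microseconds). The sketch below reads them. The helper name is hypothetical, and the reported values are the driver's claims, not measurements like the 200us figure on the slide.

```python
from pathlib import Path


def read_cpuidle_states(cpu: int = 0, root: str = "/sys/devices/system/cpu"):
    """Return [(state_name, exit_latency_us), ...] for one CPU.

    Returns an empty list when the cpuidle directory does not exist
    (non-Linux systems, some virtual machines).
    """
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        return []

    states = []
    # Directories are named state0, state1, ...; sort them numerically.
    for state_dir in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        try:
            name = (state_dir / "name").read_text().strip()
            latency_us = int((state_dir / "latency").read_text())
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        states.append((name, latency_us))
    return states


if __name__ == "__main__":
    for name, latency_us in read_cpuidle_states():
        print(f"{name}: exit latency {latency_us} us")
```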
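Sources of Tail Latency: Summary
• nice alone is not enough. The real-time scheduler is effective
• For multithreaded applications, a FIFO scheduler is effective
• Multicore helps, but in general the effect is weak (in architectures that bind a particular connection, such as TCP, to a particular worker)
• NUMA: keep threads and their memory allocation on the same node
• Power savings and tail latency are a trade-off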
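Related Work
• MapReduce/Spark
• If a response is slow, reissue the request to another host
• Send to all replicas at once and take whichever answers first (wastes throughput)
• Tolerate incomplete results
• Multi-tenant environments
• Use separate hosts for latency-sensitive VMs and CPU-sensitive VMs
• Queue overflow in DC network switches
• DCTCP-style approaches
• Power
• Consolidate traffic with a load balancer; put unused servers into a low-power state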
Discussion
• How to reserve resources
• Temporal (CFS) vs. spatial (dedicated CPU cores)
• Threads vs. events
• Threads + FIFO
• Events + optimizing how requests are assigned to workers
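Conclusion
• Investigated the causes of tail latency at the multicore HW, OS, and application levels
• Compared against the ideal
• Interference from background processes
• Queuing method and scheduler
• Kernel interrupts, NUMA, CPU power-saving features
• With optimization, the 99.9%ile can be reduced substantially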
EoP
Backup slides
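Queuing Models and Predicted Latency (1/3)
• What is the baseline (the ideal latency)?
• Serving every request with the same response time -> impossible in practice
• Request arrival times are irregular (= requests sometimes arrive together) -> microburst delays occur
• Even with identical request processing times, overall latency can differ
• Is latency high under high load?
• Does going multicore improve latency?
• Is FIFO the best queuing discipline?
• Modeling
• A single queue is drained FIFO by c workers (core, thread, process, etc.)
• A/S/c queue (Kendall notation)
• Arrival distribution, Service time distribution, c independent workers
• Assume a service reached over the network whose processing takes a fixed amount of time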
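Measurement Method (Timestamping)
• Timestamps T1 to T6, from reception at the NIC until the response leaves the NIC
• The kernel, the NW driver, and the L7 protocol were modified to add 30 bytes to the request packet
• NTP disabled
• T1: when the NW driver signals that the packet is ready for processing
• T2: after TCP/UDP processing, before application processing
• T3: after the application has been scheduled onto a core
• T4: after the application issues the read system call = after the data has been copied to userland
• T5: after the application issues the write system call
• T6: when the packet is sent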
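Given the six timestamps, the per-stage breakdown is just the difference between neighbours. The sketch below shows that arithmetic. The stage labels are my own reading of the T1 to T6 definitions above, not names from the paper, and the sample values are made up.

```python
# Hypothetical labels for the five intervals between T1..T6.
STAGE_LABELS = (
    "T1->T2 network stack",
    "T2->T3 waiting to be scheduled",
    "T3->T4 read system call",
    "T4->T5 application processing and write",
    "T5->T6 transmit path",
)


def stage_breakdown(timestamps_us):
    """Map each stage label to its duration, given [T1, ..., T6] in us."""
    if len(timestamps_us) != 6:
        raise ValueError("expected exactly six timestamps (T1..T6)")
    return {
        label: later - earlier
        for label, earlier, later in zip(STAGE_LABELS, timestamps_us, timestamps_us[1:])
    }


if __name__ == "__main__":
    sample = [0.0, 5.0, 9.0, 10.0, 40.0, 45.0]  # made-up values in us
    for label, duration in stage_breakdown(sample).items():
        print(f"{label}: {duration:.1f} us")
```

Because all six timestamps are taken on the same host, the differences do not depend on clock synchronization between client and server.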