#10 “Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency”
SOCC ’14
ACM Symposium on Cloud Computing
https://sites.google.com/site/2014socc/home/program
cafenero_777
June 14, 2023
Transcript
Research Paper Introduction #10 “Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency” @cafenero_777 2020/05/12
• Celebrating installment #10! (by my own count)
$ which
• Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency
• Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports, and Steven D. Gribble
• University of Washington
• SOCC ’14
• ACM Symposium on Cloud Computing
• https://sites.google.com/site/2014socc/home/program
Agenda
• Overview and why I read it
• Introduction
• Queuing Models and Predicted Latency
• Measurement Method
• Sources of Tail Latency
• Related Work
• Discussion
• Conclusion
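Overview and why I read it
• Overview
• Investigates latency across the HW/OS/App layers on multicore machines
• Builds a model, measures RPC/Memcached/Nginx against it, and investigates causes and trade-offs
• Why I read it
• I wanted to learn how to look at tail latency.
• Tail latency is (apparently) a fact of life in large-scale distributed systems.
• Found via a podcast
• https://misreading.chat/2019/03/27/episode-54-tales-of-the-tail/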
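Introduction
• Distributed systems that communicate over the network
• At large scale the median grows by orders of magnitude (= the 99th percentile is hit with high enough probability)
• Thousands of memcached servers @Facebook, 10,000 index servers @MS Bing
• Validated with null-RPC, Memcached, and Nginx (web server)
• Results were worse than the ideal model predicts
• Investigated the causes and improved tail latency
• Example: Memcached 99.9%ile latency: 14ms -> 32us
• Example: the trade-off between throughput and latency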
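The fan-out point above can be illustrated with a small calculation. This is a sketch that assumes each server's latency is independent of the others; the function name `prob_any_slow` is made up for this note and does not come from the paper.

```python
def prob_any_slow(fanout: int, percentile: float = 0.99) -> float:
    """Probability that at least one of `fanout` independent servers
    answers slower than its own `percentile` latency."""
    return 1.0 - percentile ** fanout


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"fan-out {n:>4}: {prob_any_slow(n):.3f}")
```

With a fan-out of 100, roughly 63% of requests hit at least one server's 99th-percentile latency, so the per-server tail effectively becomes the end-to-end median.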
Queuing Models and Predicted Latency (1/3)
• What is the baseline (ideal latency)?
• Modeling
• A single queue served in FIFO order by c workers (core, thread, process, etc.)
• A/S/c queue (Kendall notation)
• Arrival distribution, Service time distribution, c independent workers
• Assumes a networked service whose request processing takes a fixed time
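Queuing Models and Predicted Latency (2/3)
• Arrival distributions
• Example: a FIFO service that completes each request in 50us
• If a random request arrives within 50us of the previous one, it has to wait
• -> tail latency appears
• Utilization
• Keep the number of workers fixed at 1 and raise the average request rate = raise utilization
• The probability of requests arriving at the same time increases
• -> tail latency appears
• Moreover, going from 50% to 95% utilization increases 99%ile latency 10x!
• Micro-bursts
• Even a single overlap between requests builds a queue that causes delay
[Two figures, both labeled "calculated"]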
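The utilization effect can be reproduced with a short simulation. This sketch assumes Poisson arrivals, one worker, and the 50us fixed service time from the slide's example; it is not the paper's code, and the function name and request count are arbitrary.

```python
import random

SERVICE_US = 50.0  # fixed service time taken from the slide's example


def p99_latency_us(utilization: float, n_requests: int = 200_000, seed: int = 1) -> float:
    """Simulate a single-worker FIFO queue and return the 99th-percentile latency."""
    rng = random.Random(seed)
    arrival_rate = utilization / SERVICE_US  # requests per microsecond
    wait = 0.0  # queueing delay seen by the current request
    latencies = []
    for _ in range(n_requests):
        latencies.append(wait + SERVICE_US)
        gap = rng.expovariate(arrival_rate)  # time until the next arrival
        # Lindley recursion: the next request waits for whatever work is left.
        wait = max(0.0, wait + SERVICE_US - gap)
    latencies.sort()
    return latencies[int(0.99 * len(latencies))]


if __name__ == "__main__":
    for util in (0.5, 0.8, 0.95):
        print(f"utilization {util:.0%}: p99 = {p99_latency_us(util):.0f} us")
```

The service time never changes here, so any growth in the 99th percentile comes purely from queueing behind earlier requests.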
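Queuing Models and Predicted Latency (3/3)
• Parallel servers feeding from one queue
• Increasing the number of workers c reduces latency by 1/c
• This holds when a shared queue is used (a single line that forks to the workers)
• With per-worker queues latency does not change; only throughput increases
• Queuing discipline
• Random worker: each request is assigned to a worker that has its own FIFO queue
• Random request: a request is picked at random from the shared queue and assigned (arrival time is ignored)
• Latency depends on how requests are pulled from the queue
• The ordering can be reversed between the median and the 99%ile
• FIFO vs. LIFO (stack)
• FIFO vs. Random request
[Two figures, both labeled "calculated"]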
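The difference between a shared queue and per-worker queues can be sketched with a simulation. It assumes Poisson arrivals, a fixed 50us service time, and 4 workers at 80% utilization; all of these values and the policy names `"shared"` and `"random"` are illustrative choices, not numbers from the paper.

```python
import random

SERVICE_US = 50.0  # fixed service time (illustrative)


def p99_us(policy: str, workers: int = 4, utilization: float = 0.8,
           n_requests: int = 100_000, seed: int = 1) -> float:
    """Return the 99th-percentile latency for 'shared' or 'random' dispatch."""
    rng = random.Random(seed)
    rate = utilization * workers / SERVICE_US  # total arrivals per microsecond
    free_at = [0.0] * workers  # time at which each worker becomes idle
    now = 0.0
    latencies = []
    for _ in range(n_requests):
        now += rng.expovariate(rate)
        if policy == "shared":
            # One FIFO queue: the request goes to whichever worker frees up first.
            i = min(range(workers), key=free_at.__getitem__)
        else:
            # Random worker: each worker has its own FIFO queue.
            i = rng.randrange(workers)
        start = max(now, free_at[i])
        free_at[i] = start + SERVICE_US
        latencies.append(free_at[i] - now)
    latencies.sort()
    return latencies[int(0.99 * len(latencies))]


if __name__ == "__main__":
    for policy in ("shared", "random"):
        print(f"{policy:>6}: p99 = {p99_us(policy):.0f} us")
```

With random dispatch a request can queue behind a busy worker while another worker sits idle, which is why the shared queue gives the lower tail.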
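Measurement Method
• Null RPC server
• Receives a 128-byte request over TCP and returns a 128-byte response
• Accept thread -> spawn worker -> read/write system calls
• OS-dependent parts: TCP, thread scheduler
• Memcached
• In-memory KVS application with an O(1) hash table. The number of workers is proportional to the number of cores
• UDP mode: each worker thread is FIFO
• TCP mode: the worker is fixed per TCP connection (the assignment takes about 1-2us)
• Nginx
• Makes heavy use of asynchronous I/O system calls
• Clients are assigned per worker (per core)
• 85-byte HTTP request -> 849-byte HTTP response; it serves a static file, so it hits the buffer cache (storage effects can be ignored)
• Uses the epoll system call = file descriptors are returned in the order they become ready -> FIFO
• Each application is driven to 100% CPU utilization, throughput is measured, and the per-request processing time is estimated from it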
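The last step (estimating request processing time from saturated throughput) is simple arithmetic. This is a sketch with hypothetical numbers; the 20,000 requests/s figure is invented for illustration and is not a measurement from the paper.

```python
def estimated_service_time_us(throughput_rps: float, workers: int = 1) -> float:
    """Per-request service time implied by the throughput measured at 100% CPU."""
    return workers * 1_000_000 / throughput_rps


def utilization(offered_rps: float, service_time_us: float, workers: int = 1) -> float:
    """Fraction of worker capacity consumed by a given request rate."""
    return offered_rps * service_time_us / (workers * 1_000_000)


if __name__ == "__main__":
    service_us = estimated_service_time_us(20_000)  # hypothetical saturated throughput
    print(f"service time: {service_us:.1f} us")
    print(f"utilization at 10,000 req/s: {utilization(10_000, service_us):.0%}")
```

Once the service time is known, a target utilization can be turned back into a request rate for the load generator.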
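Sources of Tail Latency (Background Processes)
• 1 CPU, 1 core, HT disabled
• The scheduler gives CPU time to Linux daemons -> requests pile up -> tail latency increases!
• nice adjusts the scheduler priority (share of CPU time). A process that is not given time has to wait.
• Real-time scheduler: a process designated as real-time can cut in and take CPU time
• Dedicated core: no waiting on the scheduler, so naturally the best. No context switches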
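The three mitigations above map onto standard Linux interfaces, which Python exposes through the `os` module. This is a minimal, Linux-only sketch; the helper names are invented here, and raising priority or switching to `SCHED_FIFO` normally requires root or `CAP_SYS_NICE`, so those calls are wrapped to fail softly.

```python
import os


def try_raise_priority(niceness_delta: int = -10) -> bool:
    """nice: ask the scheduler for a larger share of CPU time."""
    try:
        os.nice(niceness_delta)
        return True
    except PermissionError:
        return False  # lowering niceness needs privileges


def try_realtime_fifo(priority: int = 1) -> bool:
    """Real-time scheduler: run this process under SCHED_FIFO."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False


def pin_to_cores(cores: set) -> set:
    """Dedicated core: restrict this process to the given CPU cores."""
    os.sched_setaffinity(0, cores)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("real-time FIFO enabled:", try_realtime_fifo())
    print("now running on cores:", pin_to_cores({min(os.sched_getaffinity(0))}))
```

Pinning alone does not make a core dedicated: other processes also have to be kept off it, for example with the `isolcpus` boot parameter or cpusets.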
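Sources of Tail Latency (Non-FIFO Scheduling)
• CFS (Completely Fair Scheduler) -> favors fairness over ordering, so it is non-FIFO
• Multithreaded apps: which thread gets CPU time is up to the OS
• It is not decided by which one finished first
• Switching to the real-time scheduler gives FIFO ordering and reduces background interference
[Two figures, labeled "measured" and "calculated"]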
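Sources of Tail Latency (Multicore)
• Use 1 to 4 cores on the same NUMA node
• Null RPC server: improved
• Single queue
• The other two: no different from 1 core
• Requests reuse TCP connections
• Each TCP connection is assigned to a specific worker, so load is skewed across workers
• Memcached
• UDP gives a single queue -> improved
• Nginx
• Closing the TCP (HTTP) connection partway and reconnecting improved things
• Depends on the workload...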
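Sources of Tail Latency (Interrupt Processing)
• Receiving a packet raises a kernel interrupt -> irqbalance spreads these across all cores
• Interrupts cannot be predicted (= processing time is no longer constant)
• The shared queue is no longer FIFO
• A dedicated core avoids this
• Wasteful when load is low (lower throughput)
• Use dedicated cores on large multicore CPUs?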
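Steering interrupts to a dedicated core on Linux is typically done by writing a hexadecimal CPU bitmask to `/proc/irq/<n>/smp_affinity`. This sketch only covers CPUs 0 to 31 and uses helper names invented for this note; the write needs root, and a running irqbalance daemon may overwrite the setting.

```python
def cpu_mask_hex(cpus) -> str:
    """Hex bitmask with one bit set per CPU index (CPUs 0-31 only in this sketch)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0-31")
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq: int, cpus) -> None:
    """Route one IRQ to the given CPUs. Requires root."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpu_mask_hex(cpus))


if __name__ == "__main__":
    print(cpu_mask_hex([0]))           # interrupts on core 0 only
    print(cpu_mask_hex([0, 1, 2, 3]))  # interrupts spread over cores 0-3
```

The application threads would then be pinned to the remaining cores so that request processing is never preempted by interrupt handling.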
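Sources of Tail Latency (NUMA Effects)
• 8 cores spread across 2 CPUs
• By default, memory is allocated starting from node0.
• Memory accesses by memcached threads cross NUMA nodes
• -> latency increases
• null RPC/Nginx used little memory, so they were not affected
• Specify the core/memory node with numactl
• Improved!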
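As a sketch of the numactl step, the helper below builds a command line that binds both CPU and memory allocation to one node. The helper name and the memcached arguments are illustrative assumptions; the slides do not give the exact invocation used in the paper.

```python
import shlex


def numactl_command(node: int, program: list) -> list:
    """Command that runs `program` with its CPUs and memory bound to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *program]


if __name__ == "__main__":
    cmd = numactl_command(0, ["memcached", "-t", "4"])
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run() to launch it
```

Running one instance per node this way keeps each worker's memory accesses local.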
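Sources of Tail Latency (Power Saving Optimizations)
• Measured at 10% CPU utilization
• CPU state: waking a core from a C-state takes time -> this becomes tail latency
• Wakeup from the C3 state takes 200us; this was measured
• Frequency scaling: the CPU clock frequency is lowered when the core is not in use
• Nginx has a high CPU load, so it improves only slightly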
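One way to keep cores out of deep C-states on Linux is the PM QoS device `/dev/cpu_dma_latency`: a process writes a 32-bit maximum wakeup latency in microseconds and the limit holds for as long as the file stays open. This is an assumed mitigation shown for illustration, not necessarily how the paper disabled power saving, and opening the device normally requires root.

```python
import struct
from contextlib import contextmanager

PM_QOS_DEVICE = "/dev/cpu_dma_latency"


def encode_latency_us(max_latency_us: int) -> bytes:
    """The device expects a native-endian signed 32-bit integer."""
    return struct.pack("=i", max_latency_us)


@contextmanager
def cap_wakeup_latency(max_latency_us: int = 0):
    """Hold a wakeup-latency limit while the `with` block runs. Requires root."""
    with open(PM_QOS_DEVICE, "wb", buffering=0) as dev:
        dev.write(encode_latency_us(max_latency_us))
        yield  # the limit is dropped when the file is closed


# Usage (as root):
#     with cap_wakeup_latency(0):
#         run_latency_sensitive_benchmark()   # hypothetical function
```

This trades power for latency, which is the trade-off the slide describes.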
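Sources of Tail Latency: Summary
• nice alone is not enough. The real-time scheduler is effective
• Multithreaded applications benefit from a FIFO scheduler
• Multicore helps, but in general the effect is weak for architectures that assign a specific connection (e.g. TCP) to a specific worker
• NUMA: match each thread with the node its memory is allocated on
• Power and tail latency are a trade-off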
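Related Work
• MapReduce/Spark
• Reissue the request to another host when a response is slow
• Send to all replicas at once and use the first response that comes back (wastes throughput)
• Tolerate incomplete results
• Multi-tenant environments
• Separate hosts for latency-sensitive VMs and CPU-sensitive VMs
• Queue overflow in data center network switches
• DCTCP-style approaches
• Power
• Consolidate traffic with the load balancer. Put unused servers into a low-power state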
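The "reissue to another host when the response is slow" idea can be sketched as a hedged request. This is a generic illustration using threads; `hedged_call` and its parameters are names invented here, and the replicas are stand-in callables rather than real RPC clients.

```python
import concurrent.futures as cf
import time


def hedged_call(replicas, hedge_after_s: float):
    """Call replicas[0]; if it has not answered within `hedge_after_s`,
    also call replicas[1] and return whichever result arrives first."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(replicas[0])
        done, _ = cf.wait([first], timeout=hedge_after_s)
        if done:
            return first.result()
        second = pool.submit(replicas[1])  # the backup request
        done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)  # do not block on the slower call


if __name__ == "__main__":
    def slow_replica():
        time.sleep(0.5)
        return "slow replica"

    def fast_replica():
        return "fast replica"

    print(hedged_call([slow_replica, fast_replica], hedge_after_s=0.05))
```

The backup request is extra load, which is the throughput cost the slide points out for the send-to-all-replicas variant.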
Discussion
• How to reserve resources
• Temporal (CFS) vs. spatial (dedicated CPU cores)
• Threads vs. events
• Threads + FIFO
• Events + optimized assignment of requests to workers
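Conclusion
• Investigated the causes of tail latency at the multicore HW, OS, and application levels
• Compared against the ideal model
• Interference from background processes
• Queuing method and scheduler
• Kernel interrupts, NUMA, and CPU power-saving features
• With these optimizations the 99.9%ile can be reduced substantially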
EoP
Backup slides
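Queuing Models and Predicted Latency (1/3)
• What is the baseline (ideal latency)?
• Every request is processed with the same response time -> impossible in practice
• Requests arrive at irregular times (= sometimes simultaneously) -> micro-bursts cause delay
• Even with identical per-request processing time, overall latency can differ
• Is latency high when load is high?
• Does going multicore improve latency?
• Is FIFO the best queuing discipline?
• Modeling
• A single queue served in FIFO order by c workers (core, thread, process, etc.)
• A/S/c queue (Kendall notation)
• Arrival distribution, Service time distribution, c independent workers
• Assumes a networked service whose request processing takes a fixed time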
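Measurement Method (Timestamping)
• Timestamps T1 to T6, from arrival at the NIC to departure from the NIC
• Modified the kernel, NW driver, and L7 protocol to append 30 bytes to each request packet
• NTP disabled
• T1: when the NW driver signals that the packet is ready for processing
• T2: after TCP/UDP processing, before application processing
• T3: after the application is scheduled onto a core
• T4: after the application issues the read system call = after the data is copied to userland
• T5: after the application issues the write system call
• T6: when the packet is transmitted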
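Given the six timestamps, the latency of one request can be split into stages by subtracting adjacent values. This is a sketch; the stage labels are my own reading of T1 to T6 above, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class RequestTimestamps:
    """Timestamps T1..T6 for one request, all in microseconds."""
    t1: int
    t2: int
    t3: int
    t4: int
    t5: int
    t6: int

    def breakdown(self) -> dict:
        """Time spent between each pair of adjacent timestamps."""
        return {
            "kernel network stack (T1->T2)": self.t2 - self.t1,
            "waiting to be scheduled (T2->T3)": self.t3 - self.t2,
            "read system call (T3->T4)": self.t4 - self.t3,
            "application work and write (T4->T5)": self.t5 - self.t4,
            "transmit path (T5->T6)": self.t6 - self.t5,
        }

    def total(self) -> int:
        return self.t6 - self.t1


if __name__ == "__main__":
    ts = RequestTimestamps(0, 5, 12, 14, 30, 33)  # invented sample values
    for stage, micros in ts.breakdown().items():
        print(f"{stage:<38} {micros:>4} us")
    print(f"{'total':<38} {ts.total():>4} us")
```

Collecting the per-stage values over many requests and taking the 99th or 99.9th percentile of each shows which layer contributes to the tail.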