Agenda
• Resource Management in Kubernetes
• How Kubernetes Resource Control is implemented
• Checking the Throttling
• CPU Throttling Impacts and Fixes
Slide 5
Slide 5 text
Understanding CPU throttling
in Kubernetes
to improve application performance
2020/06/13 KubeFest Tokyo 2020 @ponde_m
Slide 6
Slide 6 text
What is Throttling?
Slide 7
Slide 7 text
What is Throttling?
• I have heard the term, but I use it only by feel
• Something seems to be restricted, but I cannot put it into words
Slide 8
Slide 8 text
What is Throttling?
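• A throttle is a mechanism for controlling a fluid: a device that regulates the flow rate by changing the cross-sectional area of the flow path. Its main component, the valve, is called the throttle valve. The parts used to operate the valve are called the throttle lever, throttle pedal, gas pedal (US), throttle grip, and so on. The operating part itself is sometimes simply called the throttle.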
https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%AD%E3%83%83%E3%83%88%E3%83%AB
Slide 10
Slide 10 text
What is Throttling?
• So, to rephrase the theme of this talk:
• "Understand how CPU flow is controlled in order to improve application performance"
• That is roughly how I think of it
Slide 11
Slide 11 text
What is Throttling?
• Throttling CPU usage with Linux cgroups
http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html
Slide 12
Slide 12 text
Resource Management
Slide 13
Slide 13 text
Resource Management
• In Kubernetes, you can specify the resources allocated to a Pod
Slide 14
Slide 14 text
Requests
• The minimum amount of resources allocated to a Pod
Slide 15
Slide 15 text
Limits
• The upper bound on the resources allocated to a Pod
• When the CPU limit is reached, the CPU is throttled
cpu.shares
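• cpu.shares comes into play when all containers try to use CPU resources at full load
• The amount of CPU resources determined from cpu.shares is guaranteed
• The amount of CPU written in the Pod's CPU Requests is guaranteed
• When some containers are idle, other containers can use the CPU they leave free
• As long as a Pod has not reached its CPU Limits, it can make good use of spare CPU resources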
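The relationship between CPU Requests and cpu.shares can be sketched as follows. This is a minimal sketch, assuming cgroup v1 and the commonly described conversion of 1024 shares per CPU with a minimum of 2. The function names and the example Pods are hypothetical and do not come from the talk.

```python
SHARES_PER_CPU = 1024  # assumed cgroup v1 convention: 1 CPU = 1024 shares
MIN_SHARES = 2         # assumed lower bound for cpu.shares


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores into a cpu.shares value."""
    if milli_cpu <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // 1000)


def cpu_under_contention(requests_milli: dict, node_cores: float) -> dict:
    """Split node_cores in proportion to cpu.shares when every container is busy."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_milli.items()}
    total = sum(shares.values())
    return {name: node_cores * s / total for name, s in shares.items()}


if __name__ == "__main__":
    # Hypothetical Pods on a 4-core node, all trying to use as much CPU as possible.
    print(milli_cpu_to_shares(500))
    print(cpu_under_contention({"a": 500, "b": 1500}, node_cores=4.0))
```

The shares only set a ratio. When a container is idle, the others are free to take its portion, which is the behaviour the bullets above describe.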
Slide 38
Slide 38 text
CPU Limits
Slide 39
Slide 39 text
CPU Limits
• CPU Limits are implemented using a mechanism of CFS (Completely Fair Scheduler)
• The settings are written to cpu.cfs_period_us and cpu.cfs_quota_us
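How a CPU limit turns into a quota per period can be sketched like this. It is a simplified model, assuming the default 100 ms CFS period and threads that are always runnable; the function names are hypothetical.

```python
CFS_PERIOD_US = 100_000  # assumed default cpu.cfs_period_us (100 ms)


def cpu_limit_to_quota_us(limit_milli: int, period_us: int = CFS_PERIOD_US) -> int:
    """cpu.cfs_quota_us for a CPU limit given in millicores (simplified)."""
    return limit_milli * period_us // 1000


def throttled_ms_per_period(threads: int, limit_milli: int,
                            period_us: int = CFS_PERIOD_US) -> float:
    """Wall-clock time per period during which the container is throttled,
    assuming `threads` threads run in parallel and are always busy."""
    if threads <= 0 or limit_milli <= 0:
        raise ValueError("threads and limit_milli must be positive")
    period_ms = period_us / 1000
    quota_ms = cpu_limit_to_quota_us(limit_milli, period_us) / 1000
    run_ms = quota_ms / threads  # wall time until the quota is used up
    return max(0.0, period_ms - run_ms)


if __name__ == "__main__":
    print(cpu_limit_to_quota_us(500))        # quota for a 500m limit
    print(throttled_ms_per_period(4, 1000))  # 4 busy threads, 1 CPU limit
    print(throttled_ms_per_period(1, 1000))  # 1 busy thread, 1 CPU limit
```

In this model, four busy threads under a limit of 1 CPU use up the 100 ms quota in 25 ms of wall time and then wait for the remaining 75 ms of the period, while a single busy thread is never throttled.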
Checking the Throttling
• Datadog provides a metric named kubernetes.cpu.cfs.throttled.seconds
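Without a monitoring service, similar information can be read from the cpu.stat file of a container's cgroup. The sketch below assumes the cgroup v1 fields nr_periods, nr_throttled, and throttled_time (in nanoseconds). The sample values are made up for illustration, and the helper names are hypothetical.

```python
# Made-up sample of a cgroup v1 cpu.stat file.
SAMPLE_CPU_STAT = """\
nr_periods 1200
nr_throttled 300
throttled_time 45000000000
"""


def parse_cpu_stat(text: str) -> dict:
    """Parse 'key value' lines from cpu.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict) -> float:
    """Fraction of CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


if __name__ == "__main__":
    stats = parse_cpu_stat(SAMPLE_CPU_STAT)
    print(f"throttled in {throttled_ratio(stats):.0%} of periods")
    print(f"throttled for {stats['throttled_time'] / 1e9:.1f} s in total")
```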
Slide 52
Slide 52 text
The Impact of CPU Throttling
Slide 53
Slide 53 text
The Impact of CPU Throttling
• It can lead to worse error rates and response times
• github.com/hjacobs/kubernetes-failure-stories
• A repository that collects failure stories related to Kubernetes
• Several CPU throttling cases appear in it
Slide 54
Slide 54 text
CPU Throttling of the Application Pod in Quipper
[Chart series: CPU Limits, CPU Requests, CPU throttled, CPU Usage]
Slide 55
Slide 55 text
Why was the CPU Throttling gone?
After a certain change was made, CPU throttling disappeared
Slide 56
Slide 56 text
Fix
Slide 57
Slide 57 text
Fix
• We removed the CPU Limits
Slide 58
Slide 58 text
Fix
[Chart annotation on the CPU Limits series: the limit was removed here, and CPU throttling disappeared afterwards]
Slide 59
Slide 59 text
Remove CPU Limits
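• When CPU Limits are set, a Pod cannot use more than the limit even if the CPU has spare resources
• Being unable to use idle CPU is basically a disadvantage for the application
• CPU Limits are unnecessary unless you want to restrict excessive usage
• The amount specified in CPU Requests is guaranteed to be usable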
Slide 60
Slide 60 text
Remove CPU Limits
• At our company, we maintain a document that describes the policy for setting Pod resources
• It states that CPU Limits are basically unnecessary
• We do not yet enforce the removal of CPU Limits
• We have not removed CPU Limits from every Pod, but we watch the CPU throttling metrics and respond when throttling occurs
Slide 62
Slide 62 text
Remove CPU Limits
• Caveat: without CPU Limits, the Pod's QoS class does not become Guaranteed
QoS Class
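• At our company, Pods becoming Burstable after the removal of CPU Limits has not caused any trouble so far
• One reason is that we had almost no Guaranteed Pods to begin with
• Evictions rarely happen in the first place
• There are cases where it does matter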
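The effect of removing CPU Limits on the QoS class can be sketched with a simplified classifier. It assumes the usual rules: Guaranteed when every container sets CPU and memory limits with requests equal to them, BestEffort when nothing is set, and Burstable otherwise. The function is hypothetical and compares quantity strings literally, so "500m" and "0.5" would not be treated as equal.

```python
def qos_class(containers: list) -> str:
    """Classify a Pod from its containers' 'requests' and 'limits' dicts."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An unset request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    with_limit = [{"requests": {"cpu": "1", "memory": "1Gi"},
                   "limits": {"cpu": "1", "memory": "1Gi"}}]
    without_cpu_limit = [{"requests": {"cpu": "1", "memory": "1Gi"},
                          "limits": {"memory": "1Gi"}}]
    print(qos_class(with_limit))
    print(qos_class(without_cpu_limit))
```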
Slide 65
Slide 65 text
Static CPU Manager Policy
• CPUs can be allocated exclusively to a Pod
• It can be enabled with the kubelet option cpu-manager-policy=static
• Conditions for using CPUs exclusively
• The Pod's QoS class is Guaranteed
• The Pod's CPU Requests is an integer
Slide 67
Slide 67 text
cpu-cfs-quota=false
• Pass cpu-cfs-quota=false as a kubelet option
• CFS quota is disabled, so throttling caused by CPU Limits no longer occurs
• CPU Limits themselves can still be specified, so the QoS class can be Guaranteed
Slide 68
Slide 68 text
cpu-cfs-quota=false
• Start the kubelet with --cpu-manager-policy=static --cpu-cfs-quota=false
(--kube-reserved and --system-reserved are specified separately)
• Prepare a Pod whose QoS class is Guaranteed and whose CPU Requests is an integer
• On a 4-core Node, dedicate 1 core to the Nginx Pod
Slide 69
Slide 69 text
cpu-cfs-quota=false
• The Pod's QoS class is Guaranteed and cpu.cfs_quota_us is -1 (unlimited)
Slide 70
Slide 70 text
Static CPU Manager Policy
• Checking the cpuset.cpus file confirms that the Nginx Pod is set to use a single CPU
Slide 71
Slide 71 text
Static CPU Manager Policy
• Checking the cpuset.cpus file confirms that the Nginx Pod is set to use a single CPU
• In the cpuset directory, the cpuset.cpus file lists the numbers of the CPUs the container can access
Slide 72
Slide 72 text
Static CPU Manager Policy
[Diagram: cores CPU 0, CPU 1, CPU 2, CPU 3]
cpuset.cpus = 2
The Pod can use only CPU 2
Slide 73
Slide 73 text
Static CPU Manager Policy
• Checking the cpuset.cpus file of some other Burstable Pod shows that only CPU 2 is excluded
• It can use CPUs 0 to 1 and CPU 3 (CPU 2, which the Nginx Pod is using, is excluded)
Slide 74
Slide 74 text
Static CPU Manager Policy
[Diagram: cores CPU 0, CPU 1, CPU 2, CPU 3]
cpuset.cpus = 0-1, 3
The Pod can use only CPUs 0, 1, and 3
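The cpuset.cpus values shown in these slides use a list format of single CPU numbers and ranges. A small sketch of reading that format follows; the parser name is hypothetical, and it tolerates the space after the comma as written on the slide.

```python
def parse_cpuset(spec: str) -> list:
    """Expand a cpuset list such as '0-1, 3' into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    exclusive = parse_cpuset("2")    # the Nginx Pod in the slides
    shared = parse_cpuset("0-1, 3")  # another Burstable Pod
    print(exclusive, shared)
    # The two sets do not overlap: CPU 2 is reserved for the Nginx Pod.
    print(set(exclusive).isdisjoint(shared))
```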
Other Workarounds
• Increase the Pod's CPU Limits
• Reduce the number of threads in the application
Slide 77
Slide 77 text
Good Bye Throttling
Slide 78
Slide 78 text
Conclusion
Slide 79
Slide 79 text
Conclusion
• CPU throttling can hurt application performance and reliability
• Monitor CPU throttling
• You can counter CPU throttling by removing CPU Limits or by disabling CFS quota
Slide 80
Slide 80 text
Thank You for Listening
Slide 81
Slide 81 text
References
• Understanding resource limits in kubernetes: cpu time
• https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
• CPU limits and aggressive throttling in Kubernetes - Omio Engineering - Medium
• https://medium.com/omio-engineering/cpu-limits-and-aggressive-throttling-in-kubernetes-c5b20bd8a718
• Understanding Linux Container Scheduling — Squarespace / Engineering
• https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling
Slide 82
Slide 82 text
References
• Restricting the resources and privileges available to Docker containers (Trying out Docker's latest features: Part 3) | さくらのナレッジ
• https://knowledge.sakura.ad.jp/5118/
• OpenShiftͷResource requestͱlimit - nekop's blog
• https://nekop.hatenablog.com/entry/2017/12/20/182523
• How to Evolve Kubernetes Resource Management Model
• https://www.infoq.com/presentations/evolve-kubernetes-resource-manager/
Slide 83
Slide 83 text
References
• 3.2. cpu Red Hat Enterprise Linux 6 | Red Hat Customer Portal
• https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu
• 3.4. cpuset Red Hat Enterprise Linux 6 | Red Hat Customer Portal
• https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuset
• Chapter 4. Using CPU Manager, OpenShift Container Platform 4.1 | Red Hat Customer Portal
• https://access.redhat.com/documentation/ja-jp/openshift_container_platform/4.1/html/scalability_and_performance/using-cpu-manager
References
• Kubernetes best practices: Resource requests and limits | Google Cloud Blog
• https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
• community/resource-qos.md at master · kubernetes/community
• https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md#qos-classes
• community/resources.md at master · kubernetes/community
• https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md
Slide 86
Slide 86 text
References
• Control CPU Management Policies on the Node - Kubernetes
• https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/
• Feature Highlight: CPU Manager - Kubernetes
• https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/
• kubelet - Kubernetes
• https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
• Reserve Compute Resources for System Daemons - Kubernetes
• https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
Slide 87
Slide 87 text
References
• CFS quotas can lead to unnecessary throttling · Issue #67577 · kubernetes/kubernetes
• https://github.com/kubernetes/kubernetes/issues/67577
• Throttling: New Developments in Application Performance with CPU Limits - Dave Chiluk, Indeed - YouTube
• https://www.youtube.com/watch?v=UE7QX98-kO0
• Throttling CPU usage with Linux cgroups
• http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html
• hjacobs/kubernetes-failure-stories: Compilation of public failure/horror stories related to Kubernetes
• https://github.com/hjacobs/kubernetes-failure-stories