Understanding CPU throttling in Kubernetes to improve application performance #k8sjp

d-kuro
June 13, 2020

  1. Understanding CPU throttling in Kubernetes to improve application performance 2020/06/13

    KubeFest Tokyo 2020 @ponde_m
  2. I work at @ponde_m @d-kuro

  3. Introduction • There is a lot to say about Resource Management in Kubernetes; this talk focuses on CPU

    • I will not talk about memory
  4. Agenda • Resource Management in Kubernetes • How Kubernetes Resource Control is implemented • Checking the Throttling • CPU Throttling Impacts and Fixes
  6. What is Throttling?

  7. What is Throttling? • You have heard the word and use it by feel • Something seems to be getting limited, but it is hard to put into words

  8. What is Throttling? • A throttle is a mechanism for controlling the flow of a fluid: a device that controls the flow rate by changing the cross-sectional area of the flow path. Its main component, the valve, is called a throttle valve, and the mechanism used to operate the valve is called a throttle lever, throttle pedal, gas pedal (US), throttle grip, and so on. The operating part itself is sometimes simply called the throttle.

    https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%AD%E3%83%83%E3%83%88%E3%83%AB
  10. What is Throttling? • So today's theme could be rephrased as: • “Understand CPU flow control to improve application performance” •

    That is roughly how I would put it
  11. What is Throttling? • Throttling CPU usage with Linux cgroups

    http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html
  12. Resource Management

  13. Resource Management • In Kubernetes, you can specify the resources allocated to a Pod

  14. Requests • The minimum resources allocated to a Pod

  15. Limits • The upper bound on the resources allocated to a Pod • When a Pod reaches its Limits, its CPU is throttled
  16. Millicores • 1000 millicores == 1 Core • To use 2 cores' worth of CPU: 2000m • To use ¼ of one core's worth of CPU: 250m
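The millicore arithmetic above can be sketched as a small conversion helper. This is a minimal sketch, not the Kubernetes quantity parser: the function name cpu_to_millicores is hypothetical, and it only handles plain core values ("2", "0.25") and the m suffix ("250m").

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "2", "0.25" or "250m" to millicores.

    Hypothetical helper: only plain decimal cores and the "m" suffix are
    handled; other suffixes and exponent notation are not supported.
    """
    q = quantity.strip()
    if q.endswith("m"):
        # Already expressed in millicores.
        return int(q[:-1])
    # 1 core == 1000 millicores; Decimal avoids float rounding for "0.1" etc.
    # Any fraction of a millicore is truncated by int().
    return int(Decimal(q) * 1000)


print(cpu_to_millicores("2"))     # 2000
print(cpu_to_millicores("0.25"))  # 250
print(cpu_to_millicores("250m"))  # 250
```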
  17. Scheduling

  18. Scheduling • Pods are scheduled onto Nodes based on their Requests

    [Figure: a Pod with resources.requests.cpu: 1000m and resources.limits.cpu: 2000m is scheduled onto a Node; each Pod occupies a "Requests 1000m" block on the Node's Cores]
  19. Scheduling • If a Pod's Requests are larger than the Node's resources, the Pod is not scheduled

    [Figure: a Pod requesting cpu: 1000m (limit 2000m) can be scheduled; a Pod requesting cpu: 5000m (limit 6000m) cannot]
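  20. Scheduling • Limits are not considered during scheduling, so there is no guarantee that a Pod can actually use resources up to its Limits • The total of the Limits can exceed the Node's available resources (overcommitted state)

    [Figure: four Pods with Requests 1000m each fill the Node's Cores; one of them has Limits 2000m]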
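The scheduling rules in slides 18 to 20 can be sketched as two checks: placement sums only Requests, while Limits are never checked and can add up past the Node's capacity. This is a simplified illustration, not the real scheduler (which has many more filters); the function names are hypothetical, and the 4-core Node and the other Pods' 1000m limits are assumptions for the example.

```python
def fits_on_node(allocatable_m: int, scheduled_requests_m: list, new_request_m: int) -> bool:
    """Return True if a Pod's CPU request fits on the Node.

    Only Requests are summed; Limits play no part in this check.
    """
    return sum(scheduled_requests_m) + new_request_m <= allocatable_m


def is_overcommitted(allocatable_m: int, limits_m: list) -> bool:
    """Return True if the Pods' CPU Limits add up to more than the Node has."""
    return sum(limits_m) > allocatable_m


node_m = 4000  # assumed: a 4-core Node with all 4000m allocatable
print(fits_on_node(node_m, [1000, 1000, 1000], 1000))      # True: slide 18
print(fits_on_node(node_m, [], 5000))                      # False: slide 19
print(is_overcommitted(node_m, [2000, 1000, 1000, 1000]))  # True: slide 20
```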
  21. Overcommitted State • How do Pods behave once the Node is in the overcommitted state? • For CPU, each Pod gets the CPU amount in its CPU Requests, and the rest is throttled • CPU is a compressible resource

    [Figure: four Pods with Requests 1000m each; the Pod with Limits 2000m is throttled] https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md
  22. Tips: Capacity and Allocatable • A Node's status has two fields, Capacity and Allocatable

    • Allocatable: the resources available for running Pods • Capacity: the resources available on the Node as a whole • You can check them with, for example, kubectl describe node
  23. Tips: Capacity and Allocatable • kube-reserved: resources reserved for Node components such as the kubelet

    • system-reserved: resources reserved for the remaining system components • eviction-thresholds: the thresholds for eviction
  24. Tips: Capacity and Allocatable

    [Figure: Node Capacity split into kube-reserved, system-reserved, eviction-threshold, and Allocatable]
  25. Tips: Capacity and Allocatable • kube-reserved and system-reserved have no default values

    • The scheduler schedules Pods without taking into account the resources used by system components such as the kubelet • This can make Nodes unstable, so it is a good idea to set them • Managed Kubernetes offerings on the various public clouds sometimes set them sensibly for you
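The Capacity and Allocatable slides boil down to a subtraction: Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold. The sketch below is an illustration with hypothetical CPU numbers in millicores; the reservations default to 0, matching the point that they have no default values. Note that kubelet eviction thresholds are defined for memory and storage rather than CPU, so for CPU that term is normally 0.

```python
def allocatable(capacity: int, kube_reserved: int = 0, system_reserved: int = 0,
                eviction_threshold: int = 0) -> int:
    """Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold.

    All arguments are in the same unit (millicores for CPU in this example).
    Reservations default to 0, as they do when nothing is configured.
    """
    return capacity - kube_reserved - system_reserved - eviction_threshold


# Hypothetical 4-core Node reserving 100m for kube and 100m for the system.
print(allocatable(4000, kube_reserved=100, system_reserved=100))  # 3800
# Nothing reserved: Pods can be scheduled onto every core, kubelet included.
print(allocatable(4000))  # 4000
```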
  26. How Kubernetes Resource Control is implemented

  27. How Kubernetes Resource Control is implemented • Kubernetes controls resource allocation using cgroups

    • We will look at the details using the Pod on the left
  28. CPU Requests

  29. CPU Requests • On the Node where the Pod was scheduled, change to the following directory

  33. CPU Requests

  34. CPU Requests • CPU Requests are reflected in a file called cpu.shares • The values differ (250 vs 256) because cgroups divide one CPU core into 1024 shares, whereas Kubernetes divides it into 1000 millicores
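The 250 to 256 mismatch comes from rescaling 1000 millicores per core to 1024 shares per core. A minimal sketch of that conversion, assuming integer arithmetic and a floor of 2 (the smallest value cpu.shares accepts); as far as I understand, the kubelet performs an equivalent calculation in Go, but the Python function name here is hypothetical.

```python
SHARES_PER_CPU = 1024      # cgroup v1 splits one core into 1024 shares
MILLICORES_PER_CPU = 1000  # Kubernetes splits one core into 1000m
MIN_SHARES = 2             # smallest value cpu.shares accepts


def millicores_to_shares(millicores: int) -> int:
    """Convert a CPU request in millicores to a cpu.shares value."""
    shares = millicores * SHARES_PER_CPU // MILLICORES_PER_CPU
    return max(shares, MIN_SHARES)


print(millicores_to_shares(250))   # 256
print(millicores_to_shares(1000))  # 1024
print(millicores_to_shares(0))     # 2 (no request at all)
```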
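  35. cpu.shares • If cpu.shares is the same for every container, CPU resources are distributed evenly • If cpu.shares differs between containers, CPU resources are distributed in proportion to it • cpu.shares is a weight for CPU distribution • It is a relative value, because each container's portion is computed against the total of cpu.shares

    [Figure: four containers with cpu.shares 1024 each get 25% apiece; containers with cpu.shares 1536, 512, 1024, and 1024 get 37.5%, 12.5%, 25%, and 25%]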
  36. cpu.shares • With cpu.shares 1536, 512, 1024, and 1024, the first container gets 1536/4096 = 37.5%

    • Adding a container with cpu.shares 2048 raises the total to 6144, so the same container now gets 1536/6144 = 25%; the 512 container gets about 8.3%, each 1024 container about 16.7%, and the new 2048 container about 33.3%
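The proportional split in slides 35 to 37 can be reproduced with a flat model: each busy container gets shares / (sum of busy shares). This is a simplifying sketch (the kernel applies shares per cgroup level and per CPU), and cpu_distribution and its busy flags are hypothetical; marking a container idle shows how its unused portion goes to the others.

```python
from typing import List, Optional


def cpu_distribution(shares: List[int], busy: Optional[List[bool]] = None) -> List[float]:
    """Fraction of total CPU each container receives under contention.

    Simplified flat model: containers marked busy want unlimited CPU and
    idle containers want none, so idle shares are split among busy ones.
    """
    if busy is None:
        busy = [True] * len(shares)
    total = sum(s for s, b in zip(shares, busy) if b)
    if total == 0:
        return [0.0] * len(shares)
    return [s / total if b else 0.0 for s, b in zip(shares, busy)]


print(cpu_distribution([1536, 512, 1024, 1024]))  # [0.375, 0.125, 0.25, 0.25]
print(cpu_distribution([1536, 512, 1024, 1024, 2048]))
# Two idle containers leave their CPU to the busy ones.
print(cpu_distribution([1024, 1024, 1024, 1024], [True, True, False, False]))  # [0.5, 0.5, 0.0, 0.0]
```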
  37. cpu.shares • cpu.shares determines the distribution when every container tries to use as much CPU as it can • The CPU distribution determined by cpu.shares is guaranteed

    • That is, the CPU specified in a Pod's CPU Requests is guaranteed • If some containers are idle, other containers can use the CPU they leave free • As long as a Pod has not reached its CPU Limits, it can make flexible use of spare CPU
  38. CPU Limits

  39. CPU Limits • CPU Limits are implemented using the bandwidth control mechanism of CFS (Completely Fair Scheduler)

    • The settings are written to cpu.cfs_period_us and cpu.cfs_quota_us
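  40. cpu.cfs_period_us / cpu.cfs_quota_us • Unlike cpu.shares, which is used for Requests, the CPU allocation used for Limits is based on a time period

    • cpu.cfs_period_us: the length of the period; the default is 100000us (100ms) • cpu.cfs_quota_us: the total CPU time that may be used within one period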
  41. cpu.cfs_period_us / cpu.cfs_quota_us • CPU Limits are converted into cpu.cfs_quota_us

    • In the example shown, the container can use the CPU for 50ms out of every 100ms period (that is, a limit of 500m)
  42. cpu.cfs_period_us / cpu.cfs_quota_us • Specifying CPU Limits: 2000m results in cpu.cfs_period_us: 100000 and cpu.cfs_quota_us: 200000
  43. cpu.cfs_period_us / cpu.cfs_quota_us • If there are no CPU Limits, cpu.cfs_quota_us is set to -1

    • A value of -1 means unlimited
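Slides 40 to 43 describe how a CPU limit becomes a CFS period/quota pair: quota = limit (in cores) × period, or -1 when there is no limit. A minimal sketch under those rules; the function name is hypothetical, and the 1ms floor reflects the kernel's minimum allowed quota rather than anything shown on the slides.

```python
from typing import Optional, Tuple

DEFAULT_PERIOD_US = 100_000  # default cpu.cfs_period_us (100ms)
MIN_QUOTA_US = 1_000         # the kernel does not accept quotas below 1ms


def limit_to_cfs(limit_m: Optional[int], period_us: int = DEFAULT_PERIOD_US) -> Tuple[int, int]:
    """Return (cpu.cfs_period_us, cpu.cfs_quota_us) for a CPU limit in millicores."""
    if not limit_m:
        # No CPU Limits (None or 0): quota -1 means unlimited.
        return period_us, -1
    quota_us = limit_m * period_us // 1000
    return period_us, max(quota_us, MIN_QUOTA_US)


print(limit_to_cfs(500))   # (100000, 50000): 50ms per 100ms
print(limit_to_cfs(2000))  # (100000, 200000): two cores' worth per period
print(limit_to_cfs(None))  # (100000, -1)
```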
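  44. cpu.cfs_period_us / cpu.cfs_quota_us • A single-threaded application that needs 250ms of processing time to complete its work

    [Figure: timeline (ms) marked at 100, 200, 300, and 400; without throttling the task runs for 250ms straight]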
  45. cpu.cfs_period_us / cpu.cfs_quota_us • With cpu.cfs_period_us: 100000 and cpu.cfs_quota_us: 50000

    [Figure: timeline (ms); in each 100ms period the task runs for 50ms and is throttled for the remaining 50ms]
  46. cpu.cfs_period_us / cpu.cfs_quota_us • With cpu.cfs_period_us: 100000 and cpu.cfs_quota_us: 50000, the task is throttled, so it takes 450ms to complete instead of 250ms
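The 450ms result in slides 44 to 46 can be reproduced by stepping through CFS periods. This is an idealized sketch: it assumes a single always-runnable thread that starts at a period boundary with no other contention, and it ignores scheduler granularity and the kernel bug described next. The function name is hypothetical.

```python
def completion_time_ms(work_ms: float, quota_ms: float, period_ms: float = 100.0) -> float:
    """Wall-clock time for one thread needing work_ms of CPU under a CFS quota.

    Idealized model: the thread is always runnable, starts at a period
    boundary, and nothing else competes for the CPU.
    """
    if quota_ms <= 0 or period_ms <= 0:
        raise ValueError("quota_ms and period_ms must be positive")
    elapsed = 0.0
    remaining = work_ms
    while True:
        # One thread can use at most one period of CPU time per period,
        # and no more than the quota allows.
        run = min(remaining, quota_ms, period_ms)
        remaining -= run
        if remaining <= 0:
            return elapsed + run
        # Quota exhausted: throttled until the next period starts.
        elapsed += period_ms


print(completion_time_ms(250, 50))   # 450.0 (quota 50000us)
print(completion_time_ms(250, 200))  # 250.0 (quota 200000us never limits one thread)
```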
  47. Kernel Bugs

  48. Kernel Bugs • The kernel had a bug that caused containers to be throttled even when their CPU usage was low

    • A blog post by an engineer at Indeed covers it in detail, from the specifics of the bug through to submitting a patch • Unthrottling: Fixing CPU Limits in the Cloud - Indeed Engineering Blog https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e3%81%ae%e8%a7%a3%e9%99%a4-%e6%9c%89%e5%8a%b9%e3%81%aa%e4%bf%ae%e6%ad%a3%e3%81%8c%e4%b8%8d%e5%85%b7%e5%90%88%e3%81%ae%e5%8e%9f%e5%9b%a0/
  49. Checking the Throttling

  50. Checking the Throttling • Check cpu.stat • nr_periods: the number of periods that have elapsed

    • nr_throttled: how many of those nr_periods were throttled • throttled_time: the total time spent throttled (ns)
  51. Checking the Throttling • Datadog provides a metric called kubernetes.cpu.cfs.throttled.seconds
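The cpu.stat fields from slide 50 are enough to compute how often a container is throttled. A minimal sketch, assuming the file's text has already been read from the container's cpu cgroup directory (the exact path depends on the Node's cgroup driver and layout); the sample numbers are made up and the function names are hypothetical. On cgroup v2 the total is reported as throttled_usec (microseconds) instead of throttled_time.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the "key value" lines of a cgroup cpu.stat file into ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict) -> float:
    """Fraction of elapsed CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Hypothetical contents of a container's cpu.stat (cgroup v1 field names).
sample = """nr_periods 1000
nr_throttled 250
throttled_time 12500000000
"""
stats = parse_cpu_stat(sample)
print(throttled_ratio(stats))         # 0.25
print(stats["throttled_time"] / 1e9)  # 12.5 seconds throttled in total
```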

  52. The Impact of CPU Throttling

  53. The Impact of CPU Throttling • It can degrade error rates and response times

    • github.com/hjacobs/kubernetes-failure-stories • A repository that collects Kubernetes failure stories • Several of them involve CPU Throttling
  54. CPU Throttling of the Application Pod at Quipper

    [Figure: graph of CPU Limits, CPU Requests, CPU throttled, and CPU Usage]
  55. Why was the CPU Throttling gone? • After a certain change was made, the CPU Throttling disappeared

  56. Fix

  57. Fix • Removed the CPU Limits

  58. Fix [Figure: CPU Limits graph] • The limit was removed here, and the CPU Throttling disappeared from that point on

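  59. Remove CPU Limits • Once CPU Limits are set, a container cannot use more CPU than that, even when there is idle CPU

    • Being unable to use idle CPU is generally a disadvantage for an application • Unless you want to prevent excessive usage, you do not need CPU Limits • The amount specified in CPU Requests is guaranteed to be available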
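  60. Remove CPU Limits • At our company, we operate with a document describing the policy to follow when setting Pod resources

    • It states that CPU Limits are basically unnecessary • We do not yet enforce the removal of CPU Limits • We have not removed CPU Limits from every Pod, but we watch the CPU Throttling metrics and deal with throttling case by case whenever it occurs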
  62. Remove CPU Limits • Caveat: without CPU Limits, the Pod's QoS class cannot be Guaranteed
  63. QoS Class • The Pod priority taken into account during eviction • Guaranteed

    • Requests and Limits match (for both CPU and memory, in every container) • Burstable • Requests and Limits do not match • BestEffort • No Requests or Limits at all
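The three QoS classes can be sketched as a classifier over each container's requests and limits. This is a simplified model: qos_class is hypothetical, init containers are ignored, and values are assumed to be pre-normalized numbers (for example millicores and MiB) so equal quantities compare equal. It also applies the Kubernetes rule that a request defaults to the limit when only a limit is given, which is why dropping the CPU limit (slide 62) turns a Pod Burstable.

```python
RESOURCES = ("cpu", "memory")  # only CPU and memory affect the QoS class


def qos_class(containers: list) -> str:
    """Classify a Pod from its containers' "requests"/"limits" dicts."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # When only a limit is given, the request defaults to that limit.
        requests = {**limits, **c.get("requests", {})}
        for r in RESOURCES:
            if r in requests or r in limits:
                any_set = True
            if r not in limits or requests[r] != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


full = {"requests": {"cpu": 500, "memory": 256}, "limits": {"cpu": 500, "memory": 256}}
no_cpu_limit = {"requests": {"cpu": 500, "memory": 256}, "limits": {"memory": 256}}
print(qos_class([full]))          # Guaranteed
print(qos_class([no_cpu_limit]))  # Burstable
print(qos_class([{}]))            # BestEffort
```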
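  64. QoS Class • At our company, having the QoS Class set to Burstable after removing CPU Limits has not caused us any problems so far

    • Admittedly, we had hardly any Guaranteed Pods to begin with • Evictions do not happen very often in the first place • There are cases where it does cause problems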
  65. Static CPU Manager Policy • Lets you assign CPUs exclusively to a Pod

    • Enabled with the kubelet option cpu-manager-policy=static • Conditions for using CPUs exclusively • The Pod's QoS Class is Guaranteed • The Pod's CPU Requests are an integer
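The two conditions for exclusive CPUs under the static policy can be written as a small predicate. This is a hypothetical helper, not kubelet code; the kubelet actually evaluates this per container in a Guaranteed Pod, and the request is assumed to be given in millicores.

```python
def exclusive_cpus(qos: str, cpu_request_m: int) -> int:
    """Number of exclusive CPUs the static policy would grant, else 0.

    Hypothetical helper: requires QoS Guaranteed and a whole number of cores.
    """
    if qos == "Guaranteed" and cpu_request_m > 0 and cpu_request_m % 1000 == 0:
        return cpu_request_m // 1000
    return 0


print(exclusive_cpus("Guaranteed", 1000))  # 1: one dedicated core
print(exclusive_cpus("Guaranteed", 1500))  # 0: not an integer number of cores
print(exclusive_cpus("Burstable", 2000))   # 0: not Guaranteed
```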
  67. cpu-cfs-quota=false • Pass cpu-cfs-quota=false as a kubelet option

    • CFS Quota is disabled, so CPU Limits no longer cause throttling • CPU Limits themselves can still be specified, so the QoS Class can still be Guaranteed
  68. cpu-cfs-quota=false • Start the kubelet with --cpu-manager-policy=static --cpu-cfs-quota=false (--kube-reserved and --system-reserved are also specified separately)

    • Prepare a Pod whose QoS is Guaranteed and whose CPU Requests are an integer • Have the Nginx Pod occupy one of the Node's four cores exclusively
  69. cpu-cfs-quota=false • The Pod's QoS Class is Guaranteed, and cpu.cfs_quota_us is -1 (unlimited)
  70. Static CPU Manager Policy • Checking the cpuset.cpus file confirms that the Nginx Pod is set to use a single CPU
  71. Static CPU Manager Policy • In the cpuset directory, the cpuset.cpus file lists the numbers of the CPUs the container can access
  72. Static CPU Manager Policy [Figure: CPU 0, CPU 1, CPU 2, CPU 3; cpuset.cpus = 2] • The Pod can use only CPU 2
  73. Static CPU Manager Policy • Checking the cpuset.cpus file of some other Burstable Pod shows that only CPU 2 is excluded

    • That Pod can use CPUs 0-1 and 3 (CPU 2, used by the Nginx Pod, is excluded)
  74. Static CPU Manager Policy [Figure: CPU 0, CPU 1, CPU 2, CPU 3; cpuset.cpus = 0-1,3] • The Pod can use only CPUs 0, 1, and 3
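The cpuset.cpus values in slides 72 and 74 use the Linux CPU list format ("2", "0-1,3"). A minimal parser sketch, plus a helper that removes exclusively assigned CPUs from the Node's CPUs to get the pool left for other Pods; both function names are hypothetical, strided list forms are not handled, and the 4-CPU Node mirrors the demo.

```python
def parse_cpuset(spec: str) -> list:
    """Parse a cpuset list such as "2" or "0-1,3" into CPU numbers."""
    cpus = []
    for part in spec.strip().replace(" ", "").split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def shared_pool(num_cpus: int, exclusive: list) -> list:
    """CPUs left for other Pods once the exclusive ones are removed."""
    return sorted(set(range(num_cpus)) - set(exclusive))


nginx = parse_cpuset("2")     # [2]
print(shared_pool(4, nginx))  # [0, 1, 3]
print(parse_cpuset("0-1,3"))  # [0, 1, 3]
```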
  75. cpu-cfs-quota=false • One caveat: because these are kubelet options, they are enabled or disabled per Node

    • It is a good idea to check which Pods are running on the Nodes where you enable them
  76. Other Workarounds • Increase the Pod's CPU Limits • Reduce the number of threads in the application
  77. Good Bye Throttling

  78. Conclusion

  79. Conclusion • CPU Throttling can hurt application performance and reliability • Monitor CPU Throttling

    • You can counter CPU Throttling by removing CPU Limits or disabling CFS Quota
  80. Thank You for Listening

  81. References • Understanding resource limits in kubernetes: cpu time

    • https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b • CPU limits and aggressive throttling in Kubernetes - Omio Engineering - Medium • https://medium.com/omio-engineering/cpu-limits-and-aggressive-throttling-in-kubernetes-c5b20bd8a718 • Understanding Linux Container Scheduling - Squarespace / Engineering • https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling
  82. References • Limiting the resources and privileges available to Docker containers (Trying out Docker's latest features, Part 3) | さくらのナレッジ • https://knowledge.sakura.ad.jp/5118/ • Resource request and limit in OpenShift - nekop's blog

    • https://nekop.hatenablog.com/entry/2017/12/20/182523 • How to Evolve Kubernetes Resource Management Model • https://www.infoq.com/presentations/evolve-kubernetes-resource-manager/
  83. References • 3.2. cpu Red Hat Enterprise Linux 6 | Red Hat Customer Portal

    • https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu • 3.4. cpuset Red Hat Enterprise Linux 6 | Red Hat Customer Portal • https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuset • Chapter 4. Using the CPU Manager OpenShift Container Platform 4.1 | Red Hat Customer Portal • https://access.redhat.com/documentation/ja-jp/openshift_container_platform/4.1/html/scalability_and_performance/using-cpu-manager
  84. References • Unthrottling: Fixing CPU Limits in the Cloud - Indeed Engineering Blog

    • https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e8%a7%a3%e9%99%a4-%e3%82%af%e3%83%a9%e3%82%a6%e3%83%89%e3%81%ab%e3%81%8a%e3%81%91%e3%82%8b-cpu-%e3%81%ae%e5%88%b6%e9%99%90%e3%81%ae/ • Unthrottling: How a Valid Fix Became the Cause of a Bug - Indeed Engineering Blog • https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e3%81%ae%e8%a7%a3%e9%99%a4-%e6%9c%89%e5%8a%b9%e3%81%aa%e4%bf%ae%e6%ad%a3%e3%81%8c%e4%b8%8d%e5%85%b7%e5%90%88%e3%81%ae%e5%8e%9f%e5%9b%a0/
  85. References • Kubernetes best practices: Resource requests and limits | Google Cloud Blog

    • https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits • community/resource-qos.md at master · kubernetes/community • https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md#qos-classes • community/resources.md at master · kubernetes/community • https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md
  86. References • Control CPU Management Policies on the Node -

    Kubernetes • https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/ • Feature Highlight: CPU Manager - Kubernetes • https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/ • kubelet - Kubernetes • https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ • Reserve Compute Resources for System Daemons - Kubernetes • https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
  87. References • CFS quotas can lead to unnecessary throttling ·

    Issue #67577 · kubernetes/kubernetes • https://github.com/kubernetes/kubernetes/issues/67577 • Throttling: New Developments in Application Performance with CPU Limits - Dave Chiluk, Indeed - YouTube • https://www.youtube.com/watch?v=UE7QX98-kO0 • Throttling CPU usage with Linux cgroups • http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html • hjacobs/kubernetes-failure-stories: Compilation of public failure/horror stories related to Kubernetes • https://github.com/hjacobs/kubernetes-failure-stories