Understanding CPU throttling in Kubernetes to improve application performance #k8sjp

d-kuro
June 13, 2020

Transcript

  1. Understanding CPU throttling
    in Kubernetes
    to improve application performance
    2020/06/13 KubeFest Tokyo 2020 @ponde_m

  2. I work at
    @ponde_m @d-kuro

  3. Introduction
    • There is a lot to say about Resource Management in Kubernetes,
    but this talk focuses on CPU
    • Memory is not covered

  4. Agenda
    • Resource Management in Kubernetes
    • How Kubernetes Resource Control is implemented
    • Checking the Throttling
    • CPU Throttling Impacts and Fixes

  6. What is Throttling?

  7. What is Throttling?
    • A word you have heard before but use only by feel
    • It sounds like something is being limited, but it is hard to put into words

  8. What is Throttling?
    • "A throttle is one of the mechanisms for controlling a fluid: a device that
    controls the flow rate by changing the cross-sectional area of the flow path.
    Its main component, the valve, is called a throttle valve, and the structure
    used to operate the valve is called a throttle lever, throttle pedal,
    gas pedal (US), throttle grip, and so on. The operating part itself is
    sometimes simply called the throttle."
    https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%AD%E3%83%83%E3%83%88%E3%83%AB

  10. What is Throttling?
    • So the theme of this talk can be rephrased as
    • "Understand CPU flow control to
    improve application performance"
    • or something along those lines

  11. What is Throttling?
    • Throttling CPU usage with Linux cgroups
    http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html

  12. Resource Management

  13. Resource Management
    • In Kubernetes you can specify the Resources
    to allocate to a Pod

  14. Requests
    • The minimum Resources allocated to a Pod

  15. Limits
    • The upper bound on the Resources allocated to a Pod
    • When a Pod reaches its Limits, its CPU is throttled

  16. Millicores
    • 1000 millicores == 1 Core
    • To use 2 Cores' worth of CPU
      • 2000m
    • To use ¼ of 1 Core
      • 250m
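
The unit conversion above can be sketched in a few lines. This is only an illustration: the function name `cpu_to_millicores` is hypothetical, and it handles just the two common spellings of a CPU quantity (a plain number of cores such as `2` or `0.25`, and a millicore value such as `250m`).

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity string such as "2", "0.25" or "250m" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Already expressed in millicores, e.g. "250m".
        return int(quantity[:-1])
    # Expressed in cores: 1 Core == 1000 millicores.
    return round(float(quantity) * 1000)


if __name__ == "__main__":
    for q in ("2", "250m", "0.25"):
        print(q, "->", cpu_to_millicores(q), "millicores")
```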

  17. Scheduling

  18. Scheduling
    • Pods are scheduled onto Nodes based on their Requests
    resources:
      requests:
        cpu: 1000m
      limits:
        cpu: 2000m
    [Figure: the Pod above is scheduled onto the Cores of a Node that hold other Pods with Requests of 1000m each]

  19. Scheduling
    • A Pod whose Requests are larger than the Node's Resources is not scheduled
    resources:
      requests:
        cpu: 1000m
      limits:
        cpu: 2000m
    resources:
      requests:
        cpu: 5000m
      limits:
        cpu: 6000m
    [Figure: both Pods are submitted for scheduling to a Node whose Cores already hold Pods with Requests of 1000m each]
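
The rule on the scheduling slides (only Requests count, Limits are ignored) can be illustrated with a rough sketch. This is not the real scheduler, which evaluates many more conditions; the function name `fits_on_node` and the 4000m Node size are assumptions made for the example.

```python
from typing import List


def fits_on_node(allocatable_m: int, scheduled_requests_m: List[int], new_request_m: int) -> bool:
    """Return True when the new Pod's CPU Requests fit into what is left on the Node.

    Only Requests are summed; Limits play no part in this decision.
    """
    return sum(scheduled_requests_m) + new_request_m <= allocatable_m


if __name__ == "__main__":
    node_allocatable = 4000          # hypothetical Node with 4 Cores allocatable
    running = [1000, 1000, 1000]     # three Pods, each with Requests of 1000m
    print(fits_on_node(node_allocatable, running, 1000))  # Requests 1000m
    print(fits_on_node(node_allocatable, running, 5000))  # Requests 5000m
```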

  20. Scheduling
    • Limits are not considered during Scheduling, so there is no
    guarantee that a Pod can use Resources up to its Limits
    • Limits can exceed the Resources available
    on the Node (overcommitted state)
    [Figure: a Node's Cores filled by four Pods with Requests of 1000m each, with Limits of 2000m]

  21. Overcommitted State
    • How does the system behave once it is in an Overcommitted State?
    • For CPU, each Pod gets the CPU covered by its CPU Requests and the rest is throttled
    • CPU is a compressible Resource
    https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md
    [Figure: the same Node; usage beyond Requests, up to Limits of 2000m, is marked "throttle!!"]

  22. Tips: Capacity and Allocatable
    • A Node's status has two fields, Capacity and Allocatable
    • Allocatable
      • Resources available for running Pods
    • Capacity
      • Resources available on the Node as a whole
    • Both can be checked with, for example, kubectl describe node

  23. Tips: Capacity and Allocatable
    • kube-reserved
      • Resources reserved for Node components such as kubelet
    • system-reserved
      • Resources reserved for the remaining system components
    • eviction-thresholds
      • Thresholds for Eviction

  24. Tips: Capacity and Allocatable
    [Figure: Node Capacity divided into kube-reserved, system-reserved, eviction-threshold, and Allocatable]
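
The relationship in the figure can be written as simple arithmetic: Allocatable is what remains of Capacity after the reservations are subtracted. The sketch below uses hypothetical numbers (a 4 Core Node with 100m reserved for each of kube-reserved and system-reserved); the function name is made up for illustration.

```python
def allocatable(capacity: int, kube_reserved: int = 0,
                system_reserved: int = 0, eviction_threshold: int = 0) -> int:
    """Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold."""
    remaining = capacity - kube_reserved - system_reserved - eviction_threshold
    # Never report a negative amount, even with an inconsistent configuration.
    return max(remaining, 0)


if __name__ == "__main__":
    # Hypothetical values in millicores.
    print(allocatable(4000, kube_reserved=100, system_reserved=100))
    # With nothing reserved, Allocatable equals Capacity.
    print(allocatable(4000))
```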

  25. Tips: Capacity and Allocatable
    • kube-reserved and system-reserved have
    no default values
    • The Scheduler places Pods without accounting for the Resources
    used by system components such as kubelet
    • The Node can become unstable, so setting them is recommended
    • Managed Kubernetes offerings from public cloud providers
    sometimes come with sensible values already set

  26. How Kubernetes
    Resource Control is implemented

  27. How Kubernetes
    Resource Control is implemented
    • Kubernetes uses cgroups to control Resource allocation
    • The details are examined using the Pod shown on the left of the slide

  28. CPU Requests

  29. CPU Requests
    • On the Node where the Pod was scheduled, move to the Pod's cgroup directory (shown on the slide)

  34. CPU Requests
    • CPU Requests are reflected in a file called cpu.shares
    • The numbers differ (250 vs. 256) because cgroups divide 1 CPU Core
    into 1024 while Kubernetes divides it into 1000
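
The 250 vs. 256 difference comes down to one multiplication. The sketch below reproduces that arithmetic; the function and constant names are illustrative, and the floor of 2 shares is an assumption about the smallest value a cgroup accepts rather than something stated on the slide.

```python
SHARES_PER_CPU = 1024    # cgroups split 1 Core into 1024 shares
MILLI_CPU_PER_CPU = 1000  # Kubernetes splits 1 Core into 1000 millicores
MIN_SHARES = 2           # assumed lower bound for cpu.shares


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert CPU Requests in millicores to a cpu.shares value."""
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    return max(shares, MIN_SHARES)


if __name__ == "__main__":
    print(milli_cpu_to_shares(250))   # 256
    print(milli_cpu_to_shares(1000))  # 1024
```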

  35. cpu.shares
    • A weight for distributing CPU
    • A relative value: the distribution is computed against the sum of all cpu.shares
    • When cpu.shares is the same for all containers,
    CPU Resources are distributed evenly
      cpu.shares 1024, 1024, 1024, 1024 → 25%, 25%, 25%, 25%
    • When cpu.shares differs between containers,
    CPU Resources are distributed in proportion to it
      cpu.shares 1536, 512, 1024, 1024 → 37.5%, 12.5%, 25%, 25%

  36. cpu.shares
    • Before: cpu.shares 1536, 512, 1024, 1024
      1536/4096 = 37.5%, 12.5%, 25%, 25%
    • Add a container with cpu.shares 2048
    • After: cpu.shares 1536, 512, 1024, 1024, 2048
      1536/6144 = 25%, approx. 8.3%, approx. 16.7%, approx. 16.7%, approx. 33.3%
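
The percentages on these two slides follow from dividing each container's cpu.shares by the total. A minimal sketch, using the share values from the slides (the function name is made up):

```python
from typing import List


def cpu_share_fractions(shares: List[int]) -> List[float]:
    """Return each container's fraction of CPU when all containers are fully busy."""
    total = sum(shares)
    return [s / total for s in shares]


if __name__ == "__main__":
    before = [1536, 512, 1024, 1024]
    after = before + [2048]  # a container with cpu.shares 2048 is added
    for label, values in (("before", before), ("after", after)):
        percents = [f"{f * 100:.1f}%" for f in cpu_share_fractions(values)]
        print(label, percents)
```

Because the value is relative, adding the fifth container lowers every existing container's fraction even though none of their cpu.shares changed.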

  37. cpu.shares
    • cpu.shares describes the distribution when every container tries to use CPU Resources at full load
    • The CPU distribution determined by cpu.shares is guaranteed
    • The CPU Resources stated in a Pod's CPU Requests are guaranteed
    • When some containers are idle, the CPU they leave free
    can be used by other containers
    • As long as it has not reached its CPU Limits, a Pod can
    make use of spare CPU Resources

  38. CPU Limits

  39. CPU Limits
    • CPU Limits are implemented using the mechanisms of
    CFS (Completely Fair Scheduler)
    • The settings are stored in cpu.cfs_period_us and cpu.cfs_quota_us

  40. cpu.cfs_period_us / cpu.cfs_quota_us
    • Unlike cpu.shares, which is used for Requests, the CPU allocation
    used for Limits is based on time periods
    • cpu.cfs_period_us
      • Defines the period; the default is 100000us (100ms)
    • cpu.cfs_quota_us
      • The total time the container may run within one period

  41. cpu.cfs_period_us / cpu.cfs_quota_us
    • CPU Limits are converted into cpu.cfs_quota_us
    • In the example on the slide, the CPU can be used for
    50ms out of each 100ms period

  42. cpu.cfs_period_us / cpu.cfs_quota_us
    • Specifying CPU Limits: 2000m results in
    cpu.cfs_period_us: 100000
    cpu.cfs_quota_us: 200000

  43. cpu.cfs_period_us / cpu.cfs_quota_us
    • When there are no CPU Limits, cpu.cfs_quota_us is set to -1
    • A value of -1 means unlimited
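
The conversion from CPU Limits to cpu.cfs_quota_us described on the preceding slides is a proportion of the period. The sketch below reproduces the numbers from the slides (2000m gives 200000, and 500m gives the 50ms quota); the function name and the use of `None` for "no Limits" are choices made for this example.

```python
from typing import Optional

DEFAULT_PERIOD_US = 100_000  # default cpu.cfs_period_us (100ms)


def limit_to_cfs_quota_us(limit_millicores: Optional[int],
                          period_us: int = DEFAULT_PERIOD_US) -> int:
    """Convert CPU Limits in millicores to a cpu.cfs_quota_us value."""
    if limit_millicores is None:
        # No CPU Limits: -1 means unlimited.
        return -1
    # 1000 millicores may run for one full period's worth of CPU time.
    return limit_millicores * period_us // 1000


if __name__ == "__main__":
    print(limit_to_cfs_quota_us(2000))  # 200000
    print(limit_to_cfs_quota_us(500))   # 50000
    print(limit_to_cfs_quota_us(None))  # -1
```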

  44. cpu.cfs_period_us / cpu.cfs_quota_us
    • A single-threaded application that needs 250ms of
    processing time to complete a task
    [Figure: timeline from 100ms to 400ms; the task runs for 250ms]

  46. cpu.cfs_period_us / cpu.cfs_quota_us
    • With cpu.cfs_period_us: 100000 and
    cpu.cfs_quota_us: 50000
    • Because of Throttling,
    the task takes 450ms to complete
    [Figure: timeline from 100ms to 400ms; in each 100ms period the task runs for 50ms and is then throttled]
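
The 450ms figure can be checked with a small simulation. It is a simplified model, not kernel behavior in full: it assumes a single thread, a task that starts exactly at a period boundary, and no competition from other containers. The function name is hypothetical.

```python
def wall_clock_ms(cpu_needed_ms: int, quota_ms: int, period_ms: int) -> int:
    """Wall-clock time for a single-threaded task under a CFS quota."""
    # One thread can run for at most the whole period, whatever the quota is.
    run_per_period = min(quota_ms, period_ms)
    elapsed = 0
    remaining = cpu_needed_ms
    while remaining > run_per_period:
        # Use up this period's quota, then stay throttled until the next period.
        remaining -= run_per_period
        elapsed += period_ms
    # The final slice finishes inside its period without waiting.
    return elapsed + remaining


if __name__ == "__main__":
    print(wall_clock_ms(250, quota_ms=50, period_ms=100))   # 450
    print(wall_clock_ms(250, quota_ms=100, period_ms=100))  # 250
```

Four periods each deliver 50ms of CPU (200ms of work in 400ms of wall-clock time), and the last 50ms runs at the start of the fifth period, giving 450ms.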

  47. Kernel Bugs

  48. Kernel Bugs
    • The Kernel had a bug that throttled containers even when CPU usage was low
    • A blog post by an Indeed engineer covers it in depth, from the details of the bug to the patch they submitted
    • Unthrottled: Fixing CPU Limits in the Cloud - Indeed Engineering Blog (Japanese edition)
    https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e3%81%ae%e8%a7%a3%e9%99%a4-%e6%9c%89%e5%8a%b9%e3%81%aa%e4%bf%ae%e6%ad%a3%e3%81%8c%e4%b8%8d%e5%85%b7%e5%90%88%e3%81%ae%e5%8e%9f%e5%9b%a0/

  49. Checking the Throttling

  50. Checking the Throttling
    • Check cpu.stat
      • nr_periods: number of periods that have elapsed
      • nr_throttled: number of those periods in which the container was throttled
      • throttled_time: total time throttled (ns)
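
The three counters can be turned into a single throttling ratio. The sketch below parses text in the `name value` layout of cpu.stat; the sample contents and function names are invented for illustration, and on a real Node the text would be read from the container's cgroup directory instead.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the "name value" lines of a cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: Dict[str, int]) -> float:
    """Fraction of elapsed periods in which the container was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Hypothetical sample; the values are made up.
SAMPLE_CPU_STAT = """nr_periods 1000
nr_throttled 250
throttled_time 12500000000
"""

if __name__ == "__main__":
    stats = parse_cpu_stat(SAMPLE_CPU_STAT)
    print(f"throttled in {throttled_ratio(stats):.0%} of periods")
    print(f"total throttled time: {stats['throttled_time'] / 1e9:.1f}s")
```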

  51. Checking the Throttling
    • Datadog provides a metric named
    kubernetes.cpu.cfs.throttled.seconds

  52. The Impact of CPU Throttling

  53. The Impact of CPU Throttling
    • It can worsen Error Rate and Response Time
    • github.com/hjacobs/kubernetes-failure-stories
      • A Repository that collects failure stories related to Kubernetes
      • Several of the stories involve CPU Throttling

  54. CPU Throttling of an Application Pod at Quipper
    [Graph: CPU Limits, CPU Requests, CPU throttled, and CPU Usage]

  55. Why did the CPU Throttling go away?
    • After a certain change was made,
    CPU Throttling disappeared

  56. Fix

  57. Fix
    • Removed the CPU Limits

  58. Fix
    [Graph: CPU Limits; the Limit was removed at the marked point, and CPU Throttling disappeared from then on]

  59. Remove CPU Limits
    • Once CPU Limits are set, a Pod cannot use more than that amount
    even when the CPU has spare Resources
    • Being unable to use idle CPU is
    generally a disadvantage for the application
    • CPU Limits are unnecessary unless you want to curb excessive usage
    • The amount specified in CPU Requests is guaranteed to be available

  60. Remove CPU Limits
    • At our company we maintain a document that describes the policy
    for setting Pod Resources
    • It states that CPU Limits are basically unnecessary
    • Removing CPU Limits is not yet enforced
    • CPU Limits have not been removed from every Pod, but
    we watch the CPU Throttling metrics and
    respond case by case whenever Throttling occurs

  62. Remove CPU Limits
    • Without CPU Limits, the QoS Class
    does not become Guaranteed

  63. QoS Class
    • A Pod priority that is taken into account at Eviction time
    • Guaranteed
      • Requests and Limits match
    • Burstable
      • Requests and Limits do not match
    • BestEffort
      • Neither Requests nor Limits are set
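
The three classes can be expressed as a small decision function. This is a simplified sketch for a single-container Pod, not the actual Kubernetes implementation: it assumes the real rule that Guaranteed needs both cpu and memory Limits with matching Requests, it compares quantities as plain strings (so "1" and "1000m" would count as different), and the function name is made up.

```python
from typing import Dict

RESOURCES = ("cpu", "memory")


def qos_class(requests: Dict[str, str], limits: Dict[str, str]) -> str:
    """Classify a single-container Pod as Guaranteed, Burstable or BestEffort."""
    if not requests and not limits:
        return "BestEffort"
    # A missing request defaults to the limit for the same resource.
    effective_requests = {**limits, **requests}
    if all(r in limits and effective_requests[r] == limits[r] for r in RESOURCES):
        return "Guaranteed"
    return "Burstable"


if __name__ == "__main__":
    full = {"cpu": "1", "memory": "1Gi"}
    print(qos_class(full, full))                # Guaranteed
    print(qos_class(full, {"memory": "1Gi"}))   # CPU Limits removed: Burstable
    print(qos_class({}, {}))                    # BestEffort
```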

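  64. QoS Class
    • At our company, having the QoS Class become Burstable as a result of
    removing CPU Limits has not caused any problems so far
    • Partly because there were hardly any Guaranteed Pods to begin with
    • Evictions do not happen that often in the first place
    • There are cases where it does cause problems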

  65. Static CPU Manager Policy
    • CPUs can be allocated exclusively to a Pod
    • It is enabled with the kubelet Option
    cpu-manager-policy=static
    • Conditions for exclusive use of CPUs
      • The Pod's QoS Class is Guaranteed
      • The Pod's CPU Requests is an integer value
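
The two conditions can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, and it takes the QoS Class as an already-determined string and the CPU Requests in millicores.

```python
def gets_exclusive_cpus(qos_class: str, cpu_request_millicores: int) -> bool:
    """Return True when a Pod meets both conditions for exclusive CPUs."""
    is_guaranteed = qos_class == "Guaranteed"
    # "Integer value" means a whole number of Cores: 1000m, 2000m, ...
    is_whole_cores = cpu_request_millicores >= 1000 and cpu_request_millicores % 1000 == 0
    return is_guaranteed and is_whole_cores


if __name__ == "__main__":
    print(gets_exclusive_cpus("Guaranteed", 1000))  # True
    print(gets_exclusive_cpus("Guaranteed", 1500))  # False: not a whole number of Cores
    print(gets_exclusive_cpus("Burstable", 1000))   # False: not Guaranteed
```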

  67. cpu-cfs-quota=false
    • Pass cpu-cfs-quota=false as a kubelet Option
    • CFS Quota is disabled, so Throttling caused by
    CPU Limits no longer occurs
    • CPU Limits can still be specified, so the QoS Class can be
    Guaranteed

  68. cpu-cfs-quota=false
    • Start kubelet with
    --cpu-manager-policy=static --cpu-cfs-quota=false
    (--kube-reserved and --system-reserved are specified separately)
    • Prepare a Pod whose QoS is Guaranteed and whose CPU Requests is
    an integer value
    • On a 4 Core Node, 1 Core is dedicated to an Nginx Pod

  69. cpu-cfs-quota=false
    • The Pod's QoS Class is Guaranteed and cpu.cfs_quota_us is -1 (unlimited)

  71. Static CPU Manager Policy
    • Checking the cpuset.cpus file confirms that
    the Nginx Pod has been set up to use a single CPU
    • The cpuset.cpus file in the cpuset directory lists
    the numbers of the CPUs the container can access

  72. Static CPU Manager Policy
    • cpuset.cpus = 2
    • The Pod can use only CPU 2
    [Figure: Cores CPU 0, CPU 1, CPU 2, CPU 3]

  73. Static CPU Manager Policy
    • Checking the cpuset.cpus file of some other Burstable Pod
    shows that only CPU 2 is excluded
    • It can use CPUs 0 to 1 and CPU 3
    (CPU 2, which the Nginx Pod uses, is excluded)

  74. Static CPU Manager Policy
    • cpuset.cpus = 0-1, 3
    • The Pod can use only CPUs 0, 1, and 3
    [Figure: Cores CPU 0, CPU 1, CPU 2, CPU 3]
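
Values such as `2` and `0-1, 3` use a compact list format of single CPU numbers and inclusive ranges. A minimal parser for that format, with a made-up function name, might look like this:

```python
from typing import List


def parse_cpuset(value: str) -> List[int]:
    """Expand a cpuset.cpus value such as "0-1,3" into a list of CPU numbers."""
    cpus = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # An inclusive range such as "0-1".
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


if __name__ == "__main__":
    print(parse_cpuset("2"))       # the Nginx Pod: [2]
    print(parse_cpuset("0-1, 3"))  # a Burstable Pod: [0, 1, 3]
```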

  75. cpu-cfs-quota=false
    • Note that because this is a kubelet Option,
    it is enabled or disabled per Node
    • It is a good idea to check which Pods are running on a Node before enabling it there

  76. Other Workarounds
    • Increase the Pod's CPU Limits
    • Reduce the number of Threads in the application

  77. Good Bye Throttling

  78. Conclusion

  79. Conclusion
    • CPU Throttling can hurt an application's performance and
    reliability
    • Monitor CPU Throttling
    • CPU Throttling can be addressed by removing CPU Limits or
    by disabling CFS Quota

  80. Thank You for Listening

  81. References
    • Understanding resource limits in kubernetes: cpu time
      • https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
    • CPU limits and aggressive throttling in Kubernetes - Omio Engineering - Medium
      • https://medium.com/omio-engineering/cpu-limits-and-aggressive-throttling-in-kubernetes-c5b20bd8a718
    • Understanding Linux Container Scheduling - Squarespace / Engineering
      • https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling

  82. References
    • Restricting the resources and privileges available to Docker containers (Trying out Docker's latest features, part 3) | Sakura no Knowledge (in Japanese)
      • https://knowledge.sakura.ad.jp/5118/
    • Resource requests and limits in OpenShift - nekop's blog (in Japanese)
      • https://nekop.hatenablog.com/entry/2017/12/20/182523
    • How to Evolve Kubernetes Resource Management Model
      • https://www.infoq.com/presentations/evolve-kubernetes-resource-manager/

  83. References
    • 3.2. cpu, Red Hat Enterprise Linux 6 | Red Hat Customer Portal
      • https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu
    • 3.4. cpuset, Red Hat Enterprise Linux 6 | Red Hat Customer Portal
      • https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuset
    • Chapter 4. Using CPU Manager, OpenShift Container Platform 4.1 | Red Hat Customer Portal
      • https://access.redhat.com/documentation/ja-jp/openshift_container_platform/4.1/html/scalability_and_performance/using-cpu-manager

  84. References
    • Unthrottled: Fixing CPU Limits in the Cloud - Indeed Engineering Blog (Japanese edition)
      • https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e8%a7%a3%e9%99%a4-%e3%82%af%e3%83%a9%e3%82%a6%e3%83%89%e3%81%ab%e3%81%8a%e3%81%91%e3%82%8b-cpu-%e3%81%ae%e5%88%b6%e9%99%90%e3%81%ae/
    • Unthrottled: How a Valid Fix Becomes a Regression - Indeed Engineering Blog (Japanese edition)
      • https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e3%81%ae%e8%a7%a3%e9%99%a4-%e6%9c%89%e5%8a%b9%e3%81%aa%e4%bf%ae%e6%ad%a3%e3%81%8c%e4%b8%8d%e5%85%b7%e5%90%88%e3%81%ae%e5%8e%9f%e5%9b%a0/

  85. References
    • Kubernetes best practices: Resource requests and limits | Google Cloud Blog
      • https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
    • community/resource-qos.md at master · kubernetes/community
      • https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md#qos-classes
    • community/resources.md at master · kubernetes/community
      • https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md

  86. References
    • Control CPU Management Policies on the Node - Kubernetes
      • https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/
    • Feature Highlight: CPU Manager - Kubernetes
      • https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/
    • kubelet - Kubernetes
      • https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
    • Reserve Compute Resources for System Daemons - Kubernetes
      • https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

  87. References
    • CFS quotas can lead to unnecessary throttling · Issue #67577 · kubernetes/kubernetes
      • https://github.com/kubernetes/kubernetes/issues/67577
    • Throttling: New Developments in Application Performance with CPU Limits - Dave Chiluk, Indeed - YouTube
      • https://www.youtube.com/watch?v=UE7QX98-kO0
    • Throttling CPU usage with Linux cgroups
      • http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html
    • hjacobs/kubernetes-failure-stories: Compilation of public failure/horror stories related to Kubernetes
      • https://github.com/hjacobs/kubernetes-failure-stories