Understanding CPU throttling in Kubernetes to improve application performance #k8sjp

d-kuro
June 13, 2020

Transcript

  1. Understanding CPU throttling
    in Kubernetes
    to improve application performance
    2020/06/13 KubeFest Tokyo 2020 @ponde_m

  2. I work at
    @ponde_m @d-kuro

  3. Introduction
    • There is a lot to say about Resource Management in Kubernetes,
    but this talk focuses on CPU
    • Memory is not covered

  4. Agenda
    • Resource Management in Kubernetes
    • How Kubernetes Resource Control is implemented
    • Checking the Throttling
    • CPU Throttling Impacts and Fixes

  6. What is Throttling?

  7. What is Throttling?
    • You have heard the term, but you use it only loosely
    • It sounds like something is being restricted, but it is hard to put into words

  8. What is Throttling?
    • A throttle is a mechanism for controlling a fluid: a device that regulates
    the flow rate by changing the cross-sectional area of the flow path. Its main
    component, the valve, is called a throttle valve. The structure used to operate
    the valve is called a throttle lever, throttle pedal, gas pedal (US), or
    throttle grip. The control itself is sometimes simply called the throttle.
    https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%AD%E3%83%83%E3%83%88%E3%83%AB

  10. What is Throttling?
    • In other words, the theme of this talk can be rephrased roughly as:
    • “Understanding CPU flow control
    to improve application performance”

  11. What is Throttling?
    • Throttling CPU usage with Linux cgroups
    http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html

  12. Resource Management

  13. Resource Management
    • In Kubernetes you can specify the Resources
    allocated to a Pod

  14. Requests
    • The minimum Resources allocated to a Pod

  15. Limits
    • The upper bound on the Resources allocated to a Pod
    • When the CPU Limit is reached, the CPU is throttled

  16. Millicores
    • 1000 millicores == 1 Core
    • To use 2 Cores' worth of CPU
        • 2000m
    • To use ¼ of 1 Core
        • 250m

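The millicore notation above can be sketched as a small conversion helper. This is a minimal illustration, not Kubernetes' own quantity parser; the function name `parse_cpu_millicores` is hypothetical and only the two forms shown on the slide (a plain core count and an `m` suffix) are handled.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "2" into millicores.

    Hypothetical helper for illustration; only plain numbers and the
    "m" (millicore) suffix are supported.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # "250m" is already expressed in millicores.
        return int(quantity[:-1])
    # A plain number is a count of cores: 1 Core == 1000 millicores.
    return round(float(quantity) * 1000)


if __name__ == "__main__":
    for q in ("2000m", "2", "250m", "0.25"):
        print(q, "->", parse_cpu_millicores(q), "millicores")
```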

  17. Scheduling
    • Pods are scheduled onto Nodes based on their Requests
    resources:
      requests:
        cpu: 1000m
      limits:
        cpu: 2000m
    [Figure: the Pod above being scheduled onto a Node whose Core blocks are labeled "Requests 1000m"]

  18. Scheduling
    • A Pod whose Requests exceed the Node's Resources is not scheduled
    resources:
      requests:
        cpu: 1000m
      limits:
        cpu: 2000m
    resources:
      requests:
        cpu: 5000m
      limits:
        cpu: 6000m
    [Figure: both Pods submitted for scheduling to a Node whose Core blocks are labeled "Requests 1000m"]

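The Requests-based placement described on the two Scheduling slides can be sketched as a simple fit check. This is a deliberately reduced model: the real scheduler weighs many other factors, the function name `fits_on_node` is hypothetical, and the 4000m Node size in the demo is an assumed value chosen for illustration.

```python
from typing import Sequence


def fits_on_node(allocatable_m: int,
                 scheduled_requests_m: Sequence[int],
                 pod_request_m: int) -> bool:
    """Return True if the Pod's CPU Requests fit in the Node's free Requests.

    Only Requests are summed; Limits play no part in the decision.
    """
    free_m = allocatable_m - sum(scheduled_requests_m)
    return pod_request_m <= free_m


if __name__ == "__main__":
    node_m = 4000                 # assumed Node size, not from the slides
    running = [1000, 1000, 1000]  # three Pods requesting 1000m each
    print(fits_on_node(node_m, running, 1000))  # requests: 1000m
    print(fits_on_node(node_m, running, 5000))  # requests: 5000m
```

Because Limits never enter the calculation, the sum of Limits on a Node can exceed what the Node actually has.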

  19. Scheduling
    • Limits are not considered during scheduling, so there is
    no guarantee that a Pod can use Resources up to its Limits
    • The total can exceed the Resources available on the Node
    (overcommitted state)
    [Figure: a Node whose Core blocks are labeled "Requests 1000m", with one Pod extending to "Limits 2000m"]

  20. Overcommitted State
    • How does a Node behave once it is in an overcommitted state?
    • For CPU, each Pod receives the CPU covered by its CPU Requests and the rest is throttled
    • CPU is a compressible Resource
    https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md
    [Figure: Core blocks labeled "Requests 1000m"; the portion up to "Limits 2000m" is marked "throttle!!"]

  21. Tips: Capacity and Allocatable
    • A Node's status has two fields: Capacity and Allocatable
    • Allocatable
        • Resources available for running Pods
    • Capacity
        • Resources available on the Node as a whole
    • Both can be checked with, for example, kubectl describe node

  23. Tips: Capacity and Allocatable
    • kube-reserved
        • Resources reserved for Node components such as the kubelet
    • system-reserved
        • Resources reserved for the remaining system components
    • eviction-thresholds
        • The thresholds for Eviction
    [Figure: Node Capacity divided into kube-reserved, system-reserved, eviction-threshold, and Allocatable]

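The relationship in the figure, where Allocatable is what remains of Node Capacity after the reserved portions are subtracted, can be written out as follows. The function name and the numbers in the demo are hypothetical. As far as I know, eviction thresholds are normally set for memory and storage rather than CPU, so that term defaults to 0 here; treat that as an assumption.

```python
def allocatable_millicores(capacity_m: int,
                           kube_reserved_m: int = 0,
                           system_reserved_m: int = 0,
                           eviction_threshold_m: int = 0) -> int:
    """Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold."""
    return capacity_m - kube_reserved_m - system_reserved_m - eviction_threshold_m


if __name__ == "__main__":
    # Illustrative values only: a 4-Core Node with 100m reserved twice.
    print(allocatable_millicores(4000, kube_reserved_m=100, system_reserved_m=100))
    # With no reservations, Allocatable equals Capacity.
    print(allocatable_millicores(4000))
```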

  24. Tips: Capacity and Allocatable
    • kube-reserved and system-reserved
    have no default values
    • Without them, the Scheduler schedules Pods without accounting for
    the Resources used by system components such as the kubelet
    • The Node may become unstable, so setting them is recommended
    • Managed Kubernetes offerings on public clouds
    sometimes configure sensible values for you

  25. How Kubernetes
    Resource Control is implemented

  26. How Kubernetes
    Resource Control is implemented
    • Kubernetes uses cgroups to control Resource allocation
    • The example Pod shown on the slide is used to walk through the details

  27. CPU Requests

  28. CPU Requests
    • On the Node where the Pod was scheduled, move to the directory shown on the slide

  32. CPU Requests

  33. CPU Requests
    • CPU Requests are reflected in a file named cpu.shares
    • The numbers differ (250 vs. 256) because cgroups divide 1 CPU Core
    into 1024 shares, whereas Kubernetes divides it into 1000 millicores

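The 250 to 256 conversion can be reproduced with integer arithmetic. This is a sketch of the scaling described on the slide, not the kubelet's source code. The lower bound of 2 shares reflects my understanding of the cgroup v1 minimum and should be read as an assumption, as should the function and constant names.

```python
SHARES_PER_CPU = 1024      # cgroup: 1 Core == 1024 shares
MILLI_CPU_PER_CPU = 1000   # Kubernetes: 1 Core == 1000 millicores
MIN_SHARES = 2             # assumed lower bound for cpu.shares


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Scale a CPU Request in millicores to a cpu.shares value."""
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    return max(shares, MIN_SHARES)


if __name__ == "__main__":
    for request in (250, 1000, 2000):
        print(f"{request}m -> cpu.shares {milli_cpu_to_shares(request)}")
```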

  34. cpu.shares
    • A weight for distributing CPU
    • A relative value: the distribution is computed against the sum of all cpu.shares
    • When cpu.shares is the same for every container,
    CPU Resources are distributed equally
        cpu.shares:  1024 | 1024 | 1024 | 1024
        CPU:         25%  | 25%  | 25%  | 25%
    • When cpu.shares differs between containers,
    CPU Resources are distributed in proportion
        cpu.shares:  1024 | 1024 | 1536  | 512
        CPU:         25%  | 25%  | 37.5% | 12.5%

  35. cpu.shares
    • Before
        cpu.shares:  1024 | 1024 | 1536              | 512
        CPU:         25%  | 25%  | 1536/4096 = 37.5% | 12.5%
    • After adding a container with cpu.shares 2048
        cpu.shares:  1024          | 1024          | 1536            | 512          | 2048
        CPU:         approx. 16.7% | approx. 16.7% | 1536/6144 = 25% | approx. 8.3% | approx. 33.3%

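The percentages on the two cpu.shares slides follow from dividing each container's value by the total. A minimal sketch, assuming every container is competing for CPU at the same time; the function name is hypothetical.

```python
from typing import List, Sequence


def cpu_fractions(shares: Sequence[int]) -> List[float]:
    """Fraction of CPU each container gets when all of them are busy."""
    total = sum(shares)
    return [s / total for s in shares]


if __name__ == "__main__":
    before = [1024, 1024, 1536, 512]
    after = before + [2048]  # add a container with cpu.shares 2048
    for label, shares in (("before", before), ("after", after)):
        percents = [round(f * 100, 1) for f in cpu_fractions(shares)]
        print(label, dict(zip(shares, percents)))
```

The same container with cpu.shares 1536 drops from 37.5% to 25% once the total grows from 4096 to 6144, which is why the value is relative rather than absolute.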

  36. cpu.shares
    • cpu.shares is the distribution applied when all containers try to use CPU at full capacity
    • The CPU portion determined by cpu.shares is guaranteed
        • The CPU specified in a Pod's CPU Requests is guaranteed
    • When some containers are idle, other containers
    can use the CPU they leave free
        • As long as a Pod has not reached its CPU Limits,
        it can make use of spare CPU

  37. CPU Limits
    • CPU Limits are implemented using a mechanism of
    the CFS (Completely Fair Scheduler)
    • The settings are written to cpu.cfs_period_us and cpu.cfs_quota_us

  38. cpu.cfs_period_us / cpu.cfs_quota_us
    • Unlike cpu.shares, which is used for Requests, the CPU allocation
    used for Limits is based on time periods
    • cpu.cfs_period_us
        • Defines the period; the default is 100000us (100ms)
    • cpu.cfs_quota_us
        • The total run time allowed within one period

  39. cpu.cfs_period_us / cpu.cfs_quota_us
    • CPU Limits are converted into cpu.cfs_quota_us
    • In the example on the slide, the container can use the CPU
    for 50ms out of each 100ms period

  40. cpu.cfs_period_us / cpu.cfs_quota_us
    • Specifying CPU Limits: 2000m results in
    cpu.cfs_period_us: 100000
    cpu.cfs_quota_us: 200000

  41. cpu.cfs_period_us / cpu.cfs_quota_us
    • When there are no CPU Limits, cpu.cfs_quota_us is set to -1
    • -1 means unlimited

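The mapping from CPU Limits to cpu.cfs_quota_us on the preceding slides can be sketched as below. This follows the arithmetic shown in the deck (2000m with a 100000us period gives 200000us, and no Limits gives -1). It is not the kubelet's code, and the function name is hypothetical.

```python
from typing import Optional

DEFAULT_PERIOD_US = 100_000  # cpu.cfs_period_us default (100ms)


def cpu_limit_to_quota_us(limit_millicores: Optional[int],
                          period_us: int = DEFAULT_PERIOD_US) -> int:
    """Translate a CPU Limit in millicores into a cpu.cfs_quota_us value.

    None stands for "no CPU Limits", which maps to -1 (unlimited).
    """
    if limit_millicores is None:
        return -1
    return limit_millicores * period_us // 1000


if __name__ == "__main__":
    for limit in (500, 2000, None):
        print(limit, "->", cpu_limit_to_quota_us(limit))
```

A quota larger than the period, as in the 2000m case, is possible because the run time of all threads across all Cores is added together within one period.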

  42. cpu.cfs_period_us / cpu.cfs_quota_us
    • A single-threaded application that needs 250ms of
    processing time to complete its work
    [Figure: timeline with a Time (ms) axis marked at 100ms, 200ms, 300ms, and 400ms, showing one 250ms run]

  43. cpu.cfs_period_us / cpu.cfs_quota_us
    • With cpu.cfs_period_us: 100000
    and cpu.cfs_quota_us: 50000
    [Figure: timeline with a Time (ms) axis marked at 100ms, 200ms, 300ms, and 400ms; in each 100ms period the application runs for 50ms and is then throttled]

  44. cpu.cfs_period_us / cpu.cfs_quota_us
    • With cpu.cfs_period_us: 100000
    and cpu.cfs_quota_us: 50000
    • Because of Throttling,
    completion takes 450ms
    [Figure: timeline with a Time (ms) axis marked at 100ms, 200ms, 300ms, and 400ms; in each 100ms period the application runs for 50ms and is then throttled]

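The 450ms figure can be checked with a small simulation of one thread running under a per-period quota. This is a simplified model of the timeline in the figure. It assumes the work starts exactly at a period boundary and that nothing else competes for the CPU. The function name is hypothetical.

```python
def completion_time_ms(work_ms: int, quota_ms: int, period_ms: int = 100) -> int:
    """Wall-clock time for a single thread that needs work_ms of CPU time.

    In each period the thread may run for at most quota_ms and is then
    throttled until the next period starts. A negative quota means unlimited.
    """
    if quota_ms == 0:
        raise ValueError("a quota of 0 would never let the thread run")
    if quota_ms < 0 or quota_ms >= period_ms:
        return work_ms  # one thread cannot exceed the period, so no throttling
    elapsed = 0
    remaining = work_ms
    while True:
        run = min(remaining, quota_ms)
        remaining -= run
        if remaining == 0:
            return elapsed + run
        elapsed += period_ms  # throttled for the rest of this period


if __name__ == "__main__":
    print(completion_time_ms(250, quota_ms=-1))  # no CPU Limits
    print(completion_time_ms(250, quota_ms=50))  # 50ms per 100ms period
```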

  45. Kernel Bugs
    • The kernel had a bug where containers were throttled even when CPU usage was low
    • A blog post by an Indeed engineer describes it in detail, from the analysis to the patch they submitted
    • Unthrottled: Fixing CPU Limits in the Cloud - Indeed Engineering Blog (Japanese edition)
    https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e3%81%ae%e8%a7%a3%e9%99%a4-%e6%9c%89%e5%8a%b9%e3%81%aa%e4%bf%ae%e6%ad%a3%e3%81%8c%e4%b8%8d%e5%85%b7%e5%90%88%e3%81%ae%e5%8e%9f%e5%9b%a0/

  46. Checking the Throttling

  47. Checking the Throttling
    • Check cpu.stat
    • nr_periods: the number of periods that have elapsed
    • nr_throttled: the number of those periods in which Throttling occurred
    • throttled_time: the total time spent throttled (ns)

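Reading the three cpu.stat fields can be sketched as follows. The sample text is made up for illustration and does not come from a real Node. The function names are hypothetical, and the file is assumed to hold one `key value` pair per line.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the "key value" lines of a cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: Dict[str, int]) -> float:
    """Share of elapsed periods in which the container was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Invented sample values, for illustration only.
SAMPLE = """nr_periods 1000
nr_throttled 250
throttled_time 12500000000
"""

if __name__ == "__main__":
    stats = parse_cpu_stat(SAMPLE)
    print("throttled ratio:", throttled_ratio(stats))
    print("throttled seconds:", stats["throttled_time"] / 1e9)  # ns -> s
```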

  48. Checking the Throttling
    • Datadog provides a metric named
    kubernetes.cpu.cfs.throttled.seconds

  49. The Impact of CPU Throttling

  50. The Impact of CPU Throttling
    • It can make Error Rate and Response Time worse
    • github.com/hjacobs/kubernetes-failure-stories
        • A Repository that collects failure stories related to Kubernetes
        • It includes several cases involving CPU Throttling

  51. CPU Throttling of an Application Pod at Quipper
    [Figure: graph showing CPU Limits, CPU Requests, CPU throttled, and CPU Usage]

  52. Why was the CPU Throttling gone?
    After a certain change was made,
    CPU Throttling disappeared

  53. Fix
    • The CPU Limits were removed

  54. Fix
    [Figure: graph with the CPU Limits line, annotated as follows]
    The Limit was removed here
    After that,
    CPU Throttling disappeared

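  55. Remove CPU Limits
    • Once CPU Limits are set, a container cannot use more than the limit
    even when spare CPU is available
    • Being unable to use idle CPU is
    generally a disadvantage for the application
    • CPU Limits are unnecessary unless you want to restrain excessive usage
    • The CPU specified in CPU Requests is guaranteed to be usable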

  56. Remove CPU Limits
    • At our company we maintain a document describing
    the policy for setting Pod Resources
    • It states that CPU Limits are basically unnecessary
    • Removing CPU Limits is not yet enforced
    • CPU Limits have not been removed from every Pod, but
    we watch the CPU Throttling metrics and
    respond whenever Throttling occurs

  58. Remove CPU Limits
    • Caveat: without CPU Limits, the QoS class
    cannot be Guaranteed

  59. QoS Class
    • The Pod priority that is taken into account during Eviction
    • Guaranteed
        • Requests and Limits are equal
    • Burstable
        • Requests and Limits differ
    • BestEffort
        • No Requests or Limits at all

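The three QoS classes can be sketched as a classification function. This is a simplified model for a single container. It compares quantity strings literally, so "1" and "1000m" would count as different. It treats a missing request as defaulting to the limit, which matches my understanding of Kubernetes behavior but should be read as an assumption. The function name is hypothetical.

```python
from typing import Dict


def qos_class(requests: Dict[str, str], limits: Dict[str, str]) -> str:
    """Classify a single-container Pod as Guaranteed, Burstable, or BestEffort."""
    if not requests and not limits:
        return "BestEffort"
    # Assumed behavior: a request that is not set defaults to the limit.
    effective_requests = {**limits, **requests}
    guaranteed = all(
        resource in limits and effective_requests[resource] == limits[resource]
        for resource in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    both = {"cpu": "1", "memory": "1Gi"}
    print(qos_class(both, both))                # Requests == Limits
    print(qos_class(both, {"memory": "1Gi"}))   # CPU Limits removed
    print(qos_class({}, {}))                    # nothing specified
```

The second case is the one discussed in this deck: removing only the CPU Limits moves a Pod from Guaranteed to Burstable.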

  60. QoS Class
    • At our company, Pods becoming Burstable after CPU Limits were removed
    has not caused any problems so far
        • Partly because almost no Pods were Guaranteed to begin with
        • Evictions rarely happen in the first place
    • There are, however, cases where it does cause problems

  61. Static CPU Manager Policy
    • CPUs can be allocated exclusively to a Pod
    • It is enabled with the kubelet option
    cpu-manager-policy=static
    • Conditions for using CPUs exclusively
        • The Pod's QoS Class is Guaranteed
        • The Pod's CPU Requests is an integer

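The two conditions for exclusive CPUs can be expressed as a small check. This is only a sketch of the rule as stated on the slide, with the CPU Requests given in millicores. The function name is hypothetical, and the real CPU Manager also has to find free CPUs on the Node.

```python
def exclusive_cpus(qos: str, cpu_request_millicores: int) -> int:
    """Number of CPUs a Pod could receive exclusively under the static policy.

    Returns 0 when the Pod would stay in the shared pool.
    """
    is_integer = cpu_request_millicores >= 1000 and cpu_request_millicores % 1000 == 0
    if qos == "Guaranteed" and is_integer:
        return cpu_request_millicores // 1000
    return 0


if __name__ == "__main__":
    print(exclusive_cpus("Guaranteed", 1000))  # integer request
    print(exclusive_cpus("Guaranteed", 1500))  # fractional request
    print(exclusive_cpus("Burstable", 2000))   # not Guaranteed
```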

  63. cpu-cfs-quota=false
    • Pass cpu-cfs-quota=false as a kubelet option
    • CFS Quota is disabled, so Throttling caused by CPU Limits
    no longer occurs
    • CPU Limits can still be specified, so the QoS Class can be
    Guaranteed

  64. cpu-cfs-quota=false
    • Start the kubelet with
    --cpu-manager-policy=static --cpu-cfs-quota=false
    (--kube-reserved and --system-reserved are specified separately)
    • Prepare a Pod whose QoS is Guaranteed and whose CPU Requests
    is an integer
    • Dedicate 1 Core of a 4-Core Node to the Nginx Pod

  65. cpu-cfs-quota=false
    • The Pod's QoS Class is Guaranteed and cpu.cfs_quota_us is -1 (unlimited)

  67. Static CPU Manager Policy
    • Checking the cpuset.cpus file confirms that
    the Nginx Pod is set to use a single CPU
    • In the cpuset directory, the cpuset.cpus file lists
    the numbers of the CPUs the container can access

  68. Static CPU Manager Policy
    [Figure: Cores CPU 0, CPU 1, CPU 2, CPU 3]
    cpuset.cpus = 2
    The Pod can use only CPU 2

  69. Static CPU Manager Policy
    • Checking the cpuset.cpus file of some other Burstable Pod shows that
    only CPU 2 is excluded
    • It can use CPUs 0-1 and 3
    (CPU 2, which the Nginx Pod uses, is excluded)

  70. Static CPU Manager Policy
    [Figure: Cores CPU 0, CPU 1, CPU 2, CPU 3]
    cpuset.cpus = 0-1, 3
    The Pod can use only CPUs 0, 1, and 3

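The cpuset.cpus values shown on these slides use a list format that mixes single numbers and ranges. A minimal parser for that format might look like this. The function name is hypothetical and no validation of malformed input is attempted.

```python
from typing import List


def parse_cpuset(spec: str) -> List[int]:
    """Expand a cpuset.cpus string such as "0-1, 3" into CPU numbers."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "0-1" includes both ends.
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


if __name__ == "__main__":
    exclusive = parse_cpuset("2")    # the Nginx Pod
    shared = parse_cpuset("0-1, 3")  # another Burstable Pod
    print(exclusive, shared)
    # Together the two sets cover the 4 Cores without overlapping.
    print(sorted(exclusive + shared))
```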

  71. cpu-cfs-quota=false
    • Note that because this is a kubelet option,
    it is enabled or disabled per Node
    • It is a good idea to check which Pods run on the Nodes where you enable it

  72. Other Workarounds
    • Increase the Pod's CPU Limits
    • Reduce the number of application Threads

  73. Good Bye Throttling

  74. Conclusion
    • CPU Throttling can hurt application performance and
    reliability
    • Monitor CPU Throttling
    • CPU Throttling can be mitigated by removing CPU Limits
    or by disabling CFS Quota

  75. Thank You for Listening
