Slide 1

Slide 1 text

Understanding CPU throttling in Kubernetes to improve application performance 2020/06/13 KubeFest Tokyo 2020 @ponde_m

Slide 2

Slide 2 text

I work at @ponde_m @d-kuro

Slide 3

Slide 3 text

Introduction • There is a lot to say about Resource Management in Kubernetes, but this talk focuses on CPU • Memory is not covered

Slide 4

Slide 4 text

Agenda • Resource Management in Kubernetes • How Kubernetes Resource Control is implemented • Checking the Throttling • CPU Throttling Impacts and Fixes

Slide 5

Slide 5 text

Understanding CPU throttling in Kubernetes to improve application performance

Slide 6

Slide 6 text

What is Throttling?

Slide 7

Slide 7 text

What is Throttling? • A term you have heard before but use only by feel • It sounds like something is being restricted, but it is hard to put into words

Slide 8

Slide 8 text

What is Throttling? • A throttle is one of the mechanisms for controlling a fluid: a device that controls the flow rate by changing the cross-sectional area of the flow path. Its main component, the valve, is called a throttle valve, and the structure used to operate the valve is called a throttle lever, throttle pedal, gas pedal (US), throttle grip, and so on. The operating part itself is sometimes simply called the throttle. https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%AD%E3%83%83%E3%83%88%E3%83%AB

Slide 10

Slide 10 text

What is Throttling? • So the theme of this talk can be rephrased as • "Understand CPU flow control to improve application performance" • That is roughly how I think of it

Slide 11

Slide 11 text

What is Throttling? • Throttling CPU usage with Linux cgroups http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html

Slide 12

Slide 12 text

Resource Management

Slide 13

Slide 13 text

Resource Management • In Kubernetes, you can specify the resources allocated to a Pod

Slide 14

Slide 14 text

Requests • The minimum resources allocated to a Pod

Slide 15

Slide 15 text

Limits • The upper bound on the resources allocated to a Pod • When the Limits are reached, the CPU is throttled

Slide 16

Slide 16 text

Millicores • 1000 millicores == 1 Core • To use 2 cores' worth of CPU • 2000m • To use ¼ of 1 core • 250m

Slide 17

Slide 17 text

Scheduling

Slide 18

Slide 18 text

Scheduling • Pods are scheduled onto Nodes based on their Requests [Figure: a Pod with resources: requests: cpu: 1000m, limits: cpu: 2000m is scheduled onto a Node's cores alongside other Pods with Requests 1000m each]

Slide 19

Slide 19 text

Scheduling • A Pod is not scheduled if its Requests are larger than the Node's resources [Figure: a Pod with resources: requests: cpu: 1000m, limits: cpu: 2000m is scheduled; a Pod with resources: requests: cpu: 5000m, limits: cpu: 6000m is not scheduled onto a Node whose cores hold Pods with Requests 1000m each]
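The scheduling rule on these slides can be sketched in Python as a simple fit check. This is only an illustration, not the scheduler's actual code: the function name `fits_on_node` and the 4000m Node size are assumptions made for the example. Note that Limits do not appear in the check at all.

```python
from typing import List


def fits_on_node(allocatable_m: int, scheduled_requests_m: List[int], new_request_m: int) -> bool:
    """Return True if the new Pod's CPU Requests fit into what is left on the Node.

    All values are in millicores. Limits are deliberately ignored,
    because only Requests are considered for scheduling.
    """
    remaining_m = allocatable_m - sum(scheduled_requests_m)
    return new_request_m <= remaining_m


# Hypothetical Node with 4000m allocatable, already running three Pods of 1000m each.
print(fits_on_node(4000, [1000, 1000, 1000], 1000))  # True
print(fits_on_node(4000, [1000, 1000, 1000], 5000))  # False
```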

Slide 20

Slide 20 text

Scheduling • Limits are not considered during scheduling, so there is no guarantee that a Pod can use resources up to its Limits • The Limits can exceed the resources available on the Node (overcommitted state) [Figure: a Node whose cores hold four Pods with Requests 1000m each, one of them with Limits 2000m]

Slide 21

Slide 21 text

Overcommitted State • How does the system behave once it is in an overcommitted state? • For CPU, each Pod gets the CPU covered by its CPU Requests and the rest is throttled • CPU is a compressible resource https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md [Figure: four Pods with Requests 1000m each on a Node's cores; the Pod with Limits 2000m is throttled]

Slide 22

Slide 22 text

Tips: Capacity and Allocatable • A Node's status has two fields, Capacity and Allocatable • Allocatable • Resources available for running Pods • Capacity • Resources available on the Node as a whole • You can check them with, for example, kubectl describe node

Slide 23

Slide 23 text

Tips: Capacity and Allocatable • kube-reserved • Resources reserved for Node components such as the kubelet • system-reserved • Resources reserved for the remaining system components • eviction-thresholds • Thresholds for eviction

Slide 24

Slide 24 text

Tips: Capacity and Allocatable • kube-reserved • Resources reserved for Node components such as the kubelet • system-reserved • Resources reserved for the remaining system components • eviction-thresholds • Thresholds for eviction [Figure: Node Capacity = kube-reserved + system-reserved + eviction-threshold + Allocatable]
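The relationship between Capacity and Allocatable on this slide can be written as a subtraction. The Python sketch below is only an illustration; the function name and the numbers are hypothetical and not taken from the talk.

```python
def allocatable(capacity: int, kube_reserved: int = 0,
                system_reserved: int = 0, eviction_threshold: int = 0) -> int:
    """Allocatable is what remains of Capacity after the reservations.

    All arguments use the same unit (for example millicores for CPU).
    """
    return capacity - kube_reserved - system_reserved - eviction_threshold


# Hypothetical 4-core Node with 100m reserved for kube and 100m for the system.
print(allocatable(4000, kube_reserved=100, system_reserved=100))  # 3800
```

With every reservation left at 0, Allocatable equals Capacity, which is why unset reservations leave no headroom for system components.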

Slide 25

Slide 25 text

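Tips: Capacity and Allocatable • kube-reserved and system-reserved have no default values • Without them, the Scheduler schedules Pods without accounting for the resources used by system components such as the kubelet • This can make the Node unstable, so setting them is recommended • Managed Kubernetes offerings on public clouds sometimes set them to sensible values for you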

Slide 26

Slide 26 text

How Kubernetes Resource Control is implemented

Slide 27

Slide 27 text

How Kubernetes Resource Control is implemented • Kubernetes uses cgroups to control resource allocation • We will look at the details using the Pod shown on the left

Slide 28

Slide 28 text

CPU Requests

Slide 29

Slide 29 text

CPU Requests • On the Node where the Pod was scheduled, move to the following directory

Slide 33

Slide 33 text

CPU Requests

Slide 34

Slide 34 text

CPU Requests • CPU Requests are reflected in a file called cpu.shares • The numbers differ (250 vs. 256) because cgroups divide 1 CPU core into 1024 shares, whereas Kubernetes divides it into 1000
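The 250 vs. 256 difference follows from rescaling millicores (1000 per core) to shares (1024 per core). The Python sketch below illustrates that arithmetic only. It is not the kubelet's code, and the lower bound of 2 shares is an assumption about the minimum value the kernel accepts.

```python
SHARES_PER_CPU = 1024      # cgroup: 1 core == 1024 shares
MILLI_CPU_PER_CPU = 1000   # Kubernetes: 1 core == 1000 millicores
MIN_SHARES = 2             # assumed lower bound for cpu.shares


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert CPU Requests in millicores to a cpu.shares value."""
    if milli_cpu <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU)


print(milli_cpu_to_shares(250))   # 256
print(milli_cpu_to_shares(1000))  # 1024
```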

Slide 35

Slide 35 text

cpu.shares • A weight for CPU distribution • A relative value, because the distribution is calculated against the sum of all cpu.shares • When cpu.shares is the same for all containers, CPU is distributed evenly • When cpu.shares differs between containers, CPU is distributed in proportion to the shares [Figure: four containers with cpu.shares 1024 each get 25% each; containers with cpu.shares 1536, 512, 1024, 1024 get 37.5%, 12.5%, 25%, 25%]

Slide 36

Slide 36 text

cpu.shares • Add a container with cpu.shares 2048 [Figure: before, containers with cpu.shares 1536, 512, 1024, 1024 get 1536/4096 = 37.5%, 12.5%, 25%, 25%; after adding the cpu.shares 2048 container they get 1536/6144 = 25%, approx. 8.3%, approx. 16.7%, approx. 16.7%, and the new container gets approx. 33.3%]
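The percentages above are each container's cpu.shares divided by the sum of all cpu.shares. A minimal Python sketch of that calculation, with a function name chosen for this example and assuming every container is trying to use as much CPU as it can:

```python
from typing import List


def cpu_fractions(shares: List[int]) -> List[float]:
    """Return each container's fraction of CPU under full contention."""
    total = sum(shares)
    return [s / total for s in shares]


before = cpu_fractions([1536, 512, 1024, 1024])
after = cpu_fractions([1536, 512, 1024, 1024, 2048])

print(before)                          # [0.375, 0.125, 0.25, 0.25]
print([round(f, 3) for f in after])    # [0.25, 0.083, 0.167, 0.167, 0.333]
```

Adding one container changes every other container's fraction, which is what makes cpu.shares a relative value.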

Slide 37

Slide 37 text

cpu.shares • cpu.shares defines the distribution when all containers try to use CPU at full capacity • The CPU distribution determined by cpu.shares is guaranteed • The CPU specified in a Pod's CPU Requests is guaranteed • When some containers are idle, other containers can use the CPU they leave free • As long as a Pod has not reached its CPU Limits, it can make good use of spare CPU

Slide 38

Slide 38 text

CPU Limits

Slide 39

Slide 39 text

CPU Limits • CPU Limits are implemented using the CFS (Completely Fair Scheduler) mechanism • The settings are written to cpu.cfs_period_us and cpu.cfs_quota_us

Slide 40

Slide 40 text

cpu.cfs_period_us / cpu.cfs_quota_us • Unlike cpu.shares, which is used for Requests, the CPU allocation used for Limits is based on time periods • cpu.cfs_period_us • Defines the period; the default is 100000us (100ms) • cpu.cfs_quota_us • The total time that can be executed within one period

Slide 41

Slide 41 text

cpu.cfs_period_us / cpu.cfs_quota_us • CPU Limits are converted to cpu.cfs_quota_us • In the example below, the CPU can be used for 50ms out of each 100ms period

Slide 42

Slide 42 text

cpu.cfs_period_us / cpu.cfs_quota_us • Specifying CPU Limits: 2000m results in cpu.cfs_period_us: 100000 and cpu.cfs_quota_us: 200000

Slide 43

Slide 43 text

cpu.cfs_period_us / cpu.cfs_quota_us • When there are no CPU Limits, cpu.cfs_quota_us is set to -1 • A value of -1 means unlimited
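The conversion from CPU Limits to cpu.cfs_quota_us described on the preceding slides can be sketched in Python as follows. This is an illustration of the arithmetic only, not the kubelet's code. Using `None` for "no limit" and the 1000us lower bound on the quota are assumptions made for this example.

```python
from typing import Optional

DEFAULT_PERIOD_US = 100_000  # default cpu.cfs_period_us (100ms)
MIN_QUOTA_US = 1_000         # assumed smallest quota the kernel accepts


def milli_cpu_to_quota_us(limit_milli_cpu: Optional[int],
                          period_us: int = DEFAULT_PERIOD_US) -> int:
    """Convert CPU Limits in millicores to a cpu.cfs_quota_us value.

    None stands for "no CPU Limits", which maps to -1 (unlimited).
    """
    if limit_milli_cpu is None:
        return -1
    return max(MIN_QUOTA_US, limit_milli_cpu * period_us // 1000)


print(milli_cpu_to_quota_us(2000))  # 200000
print(milli_cpu_to_quota_us(500))   # 50000
print(milli_cpu_to_quota_us(None))  # -1
```

A quota larger than the period, as with 2000m, is possible because the quota counts CPU time summed over all cores.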

Slide 44

Slide 44 text

cpu.cfs_period_us / cpu.cfs_quota_us • A single-threaded application that needs 250ms of processing time to complete [Figure: timeline in ms (100, 200, 300, 400) showing one uninterrupted 250ms run]

Slide 45

Slide 45 text

cpu.cfs_period_us / cpu.cfs_quota_us • With cpu.cfs_period_us: 100000 and cpu.cfs_quota_us: 50000 [Figure: timeline in ms (100, 200, 300, 400); in each 100ms period the application runs for 50ms and is throttled for the rest]

Slide 46

Slide 46 text

cpu.cfs_period_us / cpu.cfs_quota_us • With cpu.cfs_period_us: 100000 and cpu.cfs_quota_us: 50000 • Because of throttling, the application takes 450ms to complete [Figure: timeline in ms (100, 200, 300, 400); in each 100ms period the application runs for 50ms and is throttled for the rest]
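The 450ms figure can be reproduced with a small Python model. It rests on simplifying assumptions that are not from the talk: the thread is purely CPU-bound, it starts exactly at the beginning of a period, and nothing else runs in the cgroup. The function name is hypothetical.

```python
import math


def completion_time_ms(work_ms: int, period_ms: int = 100, quota_ms: int = 50) -> int:
    """Wall-clock time for a single-threaded, CPU-bound task under a CFS quota."""
    if work_ms <= 0:
        return 0
    if quota_ms < 0:  # -1 means unlimited
        return work_ms
    # A single thread cannot run for longer than the period itself.
    runnable_per_period = min(quota_ms, period_ms)
    # Periods that are used up completely before the final, partial one.
    full_periods = math.ceil(work_ms / runnable_per_period) - 1
    remaining_work = work_ms - full_periods * runnable_per_period
    return full_periods * period_ms + remaining_work


print(completion_time_ms(250, period_ms=100, quota_ms=50))  # 450
print(completion_time_ms(250, period_ms=100, quota_ms=-1))  # 250
```

The task runs 50ms in each of the first four periods (200ms of work, 400ms of wall time) and finishes its last 50ms at the start of the fifth.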

Slide 47

Slide 47 text

Kernel Bugs

Slide 48

Slide 48 text

Kernel Bugs • The kernel had a bug that caused containers to be throttled even when CPU usage was low • A blog post by engineers at Indeed covers it in detail, from the analysis through to submitting a patch • Unthrottled: Fixing CPU Limits in the Cloud - Indeed Engineering Blog https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e3%81%ae%e8%a7%a3%e9%99%a4-%e6%9c%89%e5%8a%b9%e3%81%aa%e4%bf%ae%e6%ad%a3%e3%81%8c%e4%b8%8d%e5%85%b7%e5%90%88%e3%81%ae%e5%8e%9f%e5%9b%a0/

Slide 49

Slide 49 text

Checking the Throttling

Slide 50

Slide 50 text

Checking the Throttling • Check cpu.stat • nr_periods: the number of periods that have elapsed • nr_throttled: the number of periods, out of nr_periods, in which throttling occurred • throttled_time: the total time throttled (ns)
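The three cpu.stat fields can be turned into a throttling ratio with a few lines of Python. This is a sketch: the sample values are made up for illustration, and on a real Node the text would be read from the container's cpu.stat file instead.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse 'key value' lines from a cgroup cpu.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: Dict[str, int]) -> float:
    """Fraction of elapsed periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Made-up sample contents of cpu.stat.
SAMPLE = """nr_periods 1000
nr_throttled 250
throttled_time 12500000000
"""

stats = parse_cpu_stat(SAMPLE)
print(throttled_ratio(stats))                # 0.25
print(stats["throttled_time"] / 1e9, "s")    # 12.5 s
```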

Slide 51

Slide 51 text

Checking the Throttling • Datadog provides a metric called kubernetes.cpu.cfs.throttled.seconds

Slide 52

Slide 52 text

The Impact of CPU Throttling

Slide 53

Slide 53 text

The Impact of CPU Throttling • It can worsen the error rate and response time • github.com/hjacobs/kubernetes-failure-stories • A repository that collects failure stories related to Kubernetes • Several of them involve CPU throttling

Slide 54

Slide 54 text

CPU Throttling the Application Pod in Quipper [Graph: CPU Limits, CPU Requests, CPU throttled, CPU Usage]

Slide 55

Slide 55 text

Why did the CPU Throttling go away? • After a certain change was made, CPU throttling disappeared

Slide 56

Slide 56 text

Fix

Slide 57

Slide 57 text

Fix • Removed the CPU Limits

Slide 58

Slide 58 text

Fix • The Limits were removed at this point; after that, CPU throttling disappeared [Graph: CPU Limits]

Slide 59

Slide 59 text

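Remove CPU Limits • Once CPU Limits are set, a Pod cannot use more than the limit even when the CPU has spare capacity • Being unable to use idle CPU is basically a disadvantage for the application • Unless you want to curb excessive usage, CPU Limits are unnecessary • The amount specified in CPU Requests is guaranteed to be usable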

Slide 60

Slide 60 text

Remove CPU Limits • At our company, we maintain a document describing the policy for setting Pod resources • It states that CPU Limits are basically unnecessary • We do not yet enforce the removal of CPU Limits • CPU Limits have not been removed from every Pod, but we watch the CPU throttling metrics and respond whenever throttling occurs

Slide 62

Slide 62 text

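Remove CPU Limits • Once CPU Limits are set, a Pod cannot use more than the limit even when the CPU has spare capacity • Being unable to use idle CPU is basically a disadvantage for the application • Unless you want to curb excessive usage, CPU Limits are unnecessary • The amount specified in CPU Requests is guaranteed to be usable • However, without CPU Limits the QoS class is not Guaranteed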

Slide 63

Slide 63 text

QoS Class • The Pod priority that is taken into account at eviction time • Guaranteed • Requests and Limits match • Burstable • Requests and Limits do not match • BestEffort • Neither Requests nor Limits are set
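The three classes can be sketched as a small Python function. This is a simplified illustration, not the Kubernetes implementation: containers are plain dictionaries invented for this example, quantities are compared as strings (so "1" and "1000m" would not match), and an omitted request is assumed to default to the corresponding limit.

```python
from typing import Dict, List

Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Classify a Pod from its containers' Requests and Limits (simplified)."""
    if not any(c.get("requests") or c.get("limits") for c in containers):
        return "BestEffort"
    for c in containers:
        limits = c.get("limits", {})
        requests = c.get("requests", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                return "Burstable"
            # Assumption: a missing request defaults to the limit.
            if requests.get(resource, limits[resource]) != limits[resource]:
                return "Burstable"
    return "Guaranteed"


matched = {"requests": {"cpu": "1", "memory": "1Gi"}, "limits": {"cpu": "1", "memory": "1Gi"}}
no_cpu_limit = {"requests": {"cpu": "1", "memory": "1Gi"}, "limits": {"memory": "1Gi"}}

print(qos_class([matched]))       # Guaranteed
print(qos_class([no_cpu_limit]))  # Burstable
print(qos_class([{}]))            # BestEffort
```

The second case is the one that matters for this talk: dropping only the CPU Limits moves a Pod from Guaranteed to Burstable.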

Slide 64

Slide 64 text

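QoS Class • At our company, the QoS class becoming Burstable after removing CPU Limits has not caused any problems so far • One reason is that we had almost no Guaranteed Pods to begin with • Evictions rarely happen in the first place • There are, however, cases where it is a problem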

Slide 65

Slide 65 text

Static CPU Manager Policy • CPUs can be allocated exclusively to a Pod • It can be enabled with the kubelet option cpu-manager-policy=static • Conditions for exclusive CPU use • The Pod's QoS class is Guaranteed • The Pod's CPU Requests are an integer value

Slide 67

Slide 67 text

cpu-cfs-quota=false • Pass cpu-cfs-quota=false as a kubelet option • The CFS quota is disabled, so throttling caused by CPU Limits no longer occurs • CPU Limits can still be specified, so the QoS class can be Guaranteed

Slide 68

Slide 68 text

cpu-cfs-quota=false • Start the kubelet with --cpu-manager-policy=static --cpu-cfs-quota=false (--kube-reserved and --system-reserved are specified separately) • Prepare a Pod whose QoS class is Guaranteed and whose CPU Requests are an integer value • Dedicate 1 core of a 4-core Node to the Nginx Pod

Slide 69

Slide 69 text

cpu-cfs-quota=false • The Pod's QoS class is Guaranteed and cpu.cfs_quota_us is -1 (unlimited)

Slide 70

Slide 70 text

Static CPU Manager Policy • Checking the cpuset.cpus file confirms that the Nginx Pod is set to use a single CPU

Slide 71

Slide 71 text

Static CPU Manager Policy • Checking the cpuset.cpus file confirms that the Nginx Pod is set to use a single CPU • In the cpuset directory, the cpuset.cpus file lists the numbers of the CPUs the container can access

Slide 72

Slide 72 text

Static CPU Manager Policy • cpuset.cpus = 2 • The Pod can use only CPU 2 [Figure: a Node's cores, CPU 0, CPU 1, CPU 2, CPU 3]

Slide 73

Slide 73 text

Static CPU Manager Policy • Checking the cpuset.cpus file of some other Burstable Pod shows that only CPU 2 is excluded • It can use CPUs 0 to 1 and CPU 3 (CPU 2, which the Nginx Pod uses, is excluded)

Slide 74

Slide 74 text

Static CPU Manager Policy • cpuset.cpus = 0-1, 3 • The Pod can use only CPUs 0, 1, and 3 [Figure: a Node's cores, CPU 0, CPU 1, CPU 2, CPU 3]
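The cpuset.cpus values on these slides ("2" and "0-1, 3") are a comma-separated list of CPU numbers and ranges. A minimal Python sketch for expanding such a value, with a function name chosen for this example:

```python
from typing import List


def parse_cpuset(spec: str) -> List[int]:
    """Expand a cpuset.cpus string such as '0-1,3' into a list of CPU numbers."""
    cpus: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


print(parse_cpuset("2"))       # [2]
print(parse_cpuset("0-1, 3"))  # [0, 1, 3]
```

Comparing the two results shows the exclusive allocation: CPU 2 appears only in the Nginx Pod's set.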

Slide 75

Slide 75 text

cpu-cfs-quota=false • Note that because this is a kubelet option, it is enabled or disabled per Node • It is a good idea to check which Pods are running on the Nodes where you enable it

Slide 76

Slide 76 text

Other Workarounds • Increase the Pod's CPU Limits • Reduce the number of threads in the application

Slide 77

Slide 77 text

Good Bye Throttling

Slide 78

Slide 78 text

Conclusion

Slide 79

Slide 79 text

Conclusion • CPU throttling can hurt application performance and reliability • Monitor CPU throttling • You can counter CPU throttling by removing CPU Limits or disabling the CFS quota

Slide 80

Slide 80 text

Thank You for Listening

Slide 81

Slide 81 text

References • Understanding resource limits in kubernetes: cpu time • https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b • CPU limits and aggressive throttling in Kubernetes - Omio Engineering - Medium • https://medium.com/omio-engineering/cpu-limits-and-aggressive-throttling-in-kubernetes-c5b20bd8a718 • Understanding Linux Container Scheduling - Squarespace / Engineering • https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling

Slide 82

Slide 82 text

References • Restricting the resources and privileges available to Docker containers (Trying out Docker's latest features: Part 3) | Sakura Knowledge • https://knowledge.sakura.ad.jp/5118/ • Resource requests and limits in OpenShift - nekop's blog • https://nekop.hatenablog.com/entry/2017/12/20/182523 • How to Evolve Kubernetes Resource Management Model • https://www.infoq.com/presentations/evolve-kubernetes-resource-manager/

Slide 83

Slide 83 text

References • 3.2. cpu Red Hat Enterprise Linux 6 | Red Hat Customer Portal • https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu • 3.4. cpuset Red Hat Enterprise Linux 6 | Red Hat Customer Portal • https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuset • Chapter 4. Using CPU Manager, OpenShift Container Platform 4.1 | Red Hat Customer Portal • https://access.redhat.com/documentation/ja-jp/openshift_container_platform/4.1/html/scalability_and_performance/using-cpu-manager

Slide 84

Slide 84 text

References • Unthrottled: Fixing CPU Limits in the Cloud - Indeed Engineering Blog • https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e8%a7%a3%e9%99%a4-%e3%82%af%e3%83%a9%e3%82%a6%e3%83%89%e3%81%ab%e3%81%8a%e3%81%91%e3%82%8b-cpu-%e3%81%ae%e5%88%b6%e9%99%90%e3%81%ae/ • Unthrottled: How a Valid Fix Becomes a Regression - Indeed Engineering Blog • https://jp.engineering.indeedblog.com/blog/2019/12/%e3%82%b9%e3%83%ad%e3%83%83%e3%83%88%e3%83%aa%e3%83%b3%e3%82%b0%e3%81%ae%e8%a7%a3%e9%99%a4-%e6%9c%89%e5%8a%b9%e3%81%aa%e4%bf%ae%e6%ad%a3%e3%81%8c%e4%b8%8d%e5%85%b7%e5%90%88%e3%81%ae%e5%8e%9f%e5%9b%a0/

Slide 85

Slide 85 text

References • Kubernetes best practices: Resource requests and limits | Google Cloud Blog • https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits • community/resource-qos.md at master · kubernetes/community • https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md#qos-classes • community/resources.md at master · kubernetes/community • https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md

Slide 86

Slide 86 text

References • Control CPU Management Policies on the Node - Kubernetes • https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/ • Feature Highlight: CPU Manager - Kubernetes • https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/ • kubelet - Kubernetes • https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ • Reserve Compute Resources for System Daemons - Kubernetes • https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

Slide 87

Slide 87 text

References • CFS quotas can lead to unnecessary throttling · Issue #67577 · kubernetes/kubernetes • https://github.com/kubernetes/kubernetes/issues/67577 • Throttling: New Developments in Application Performance with CPU Limits - Dave Chiluk, Indeed - YouTube • https://www.youtube.com/watch?v=UE7QX98-kO0 • Throttling CPU usage with Linux cgroups • http://kennystechtalk.blogspot.com/2015/04/throttling-cpu-usage-with-linux-cgroups.html • hjacobs/kubernetes-failure-stories: Compilation of public failure/horror stories related to Kubernetes • https://github.com/hjacobs/kubernetes-failure-stories