Slide 1

Slide 1 text

SAKURA Internet Inc. (C) Copyright 1996-2019 SAKURA Internet Inc
SAKURA Internet Research Center
A Survey on Applying Machine Learning to SRE
2019/03/27
Researcher: Yuuki Tsubouchi
Machine Learning Meetup KANSAI #4 LT
@yuuk1t / id:y_uuki

Slide 2

Slide 2 text

1. Site Reliability Engineering (SRE)

Slide 3

Slide 3 text

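What is Site Reliability Engineering?
- Reliability: the degree to which users can use a service comfortably
- An engineering discipline that aims to control the reliability of computer systems
- Rebuilds traditional system administration through software engineering
- Covers monitoring, incident response, change management, capacity planning, provisioning, efficiency and performance…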

Slide 4

Slide 4 text

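Motivation for applying machine learning
- We want to automate manual work with software
- However, it is hard to automate the sophisticated judgments that skilled practitioners make
- We therefore focus on intelligent methods such as machine learning and feedback control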

Slide 5

Slide 5 text

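What are sophisticated judgments?
- Deciding when, and by how much, to scale computing resources up or down as service load changes
- Deciding how many resources to allocate to a VM
- Judging anomalies after checking metrics such as CPU utilization
- Judging anomalies based on knowledge of the dependencies between distributed nodes
- Deciding which middleware configuration values work best
- etc.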

Slide 6

Slide 6 text

What we want to do intelligently
- Automatic control that guarantees reliability
- Efficient use of computing resources

Slide 7

Slide 7 text

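Where to start
- Before starting to study machine learning, I want a concrete picture of how it can be applied
- There are not many cases of intelligent mechanisms being adopted in SRE practice
- One example is the anomaly detection for roles feature of the server monitoring service Mackerel
- So I surveyed research papers to learn which methods are being used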

Slide 8

Slide 8 text

2. Applying Machine Learning to SRE

Slide 9

Slide 9 text

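Classification of representative applications
- Monitoring
- Autoscaling
- Dependencies in distributed systems
- Cloud resource control
- Automatic tuning of middleware configuration
(The figure marks part of this classification as "Today's scope".)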

Slide 10

Slide 10 text

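Server autoscaling
- [1]: PerfEnforce: a dynamic scaling engine for analytics with performance guarantees
- An engine that scales the number of DB servers during a query session on a Redshift-like OLAP system, using a controller to keep query times from exceeding the target
- Compares online learning (perceptron) as a predictive method with reinforcement learning (Q-learning) and feedback control (PI) as reactive methods
- In the evaluation, the perceptron gave good results
- As expected, the reactive methods are slow to respond to sudden changes
[1]: Figure 1. PerfEnforce deployment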
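
To make the reactive approach on this slide concrete, here is a minimal sketch of PI feedback control applied to cluster sizing. It is not the PerfEnforce implementation: the class name `PIScaler`, the gains, and the node limits are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIScaler:
    """Toy PI controller that picks a cluster size from query-time error.

    All defaults below are illustrative assumptions, not values from [1].
    """
    kp: float = 2.0        # proportional gain (assumed)
    ki: float = 1.0        # integral gain (assumed)
    min_nodes: int = 1
    max_nodes: int = 32
    integral: float = 0.0  # accumulated relative error

    def step(self, observed_sec: float, target_sec: float, base_nodes: int) -> int:
        # Relative error > 0 means the query ran slower than the target.
        error = (observed_sec - target_sec) / target_sec
        self.integral += error
        # PI law: baseline size plus proportional and integral corrections.
        raw = base_nodes + self.kp * error + self.ki * self.integral
        # Clamp to the allowed cluster size range.
        return max(self.min_nodes, min(self.max_nodes, round(raw)))
```

Calling `step` once per finished query with the observed and target times returns the next cluster size. Because the controller only reacts to errors that have already happened, it illustrates why reactive methods lag behind sudden load changes.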

Slide 11

Slide 11 text

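Cloud resource control
- [2]: Self-Adaptive and Self-Configured CPU Resource Provisioning for Virtualized Servers Using Kalman Filters
- Uses a Kalman filter to allocate CPU resources to VMs adaptively
- Problem: when the number of CPUs is decided statically, resources end up either insufficient or left over
- The controller tracks the VM's CPU utilization and, when it reaches a threshold, changes the number of CPUs according to the Kalman filter
[2]: Figure 1. Virtualized prototype and control system.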
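
The idea of tracking CPU usage with a Kalman filter can be sketched as follows. This is a simplified one-dimensional filter with a random-walk model, not the controller from [2]: the noise variances, the `headroom` factor, and the function `allocate_cores` are assumptions for illustration, and the threshold trigger mentioned on the slide is omitted.

```python
import math


class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model."""

    def __init__(self, x0: float, p0: float = 1.0, q: float = 0.01, r: float = 0.1):
        self.x = x0  # state estimate (CPU demand, in cores)
        self.p = p0  # variance of the estimate
        self.q = q   # process noise variance (assumed)
        self.r = r   # measurement noise variance (assumed)

    def update(self, z: float) -> float:
        # Predict: a random walk keeps the mean; uncertainty grows by q.
        p_pred = self.p + self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = p_pred / (p_pred + self.r)
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return self.x


def allocate_cores(kf: ScalarKalman, measured_cores: float,
                   headroom: float = 1.25, max_cores: int = 16) -> int:
    """Hypothetical policy: filtered demand times headroom, rounded up."""
    estimate = kf.update(measured_cores)
    return max(1, min(max_cores, math.ceil(estimate * headroom)))
```

Feeding each new usage sample to `allocate_cores` yields a core count that follows demand while smoothing out measurement noise, in contrast to a statically chosen CPU count.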

Slide 12

Slide 12 text

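Automatic tuning of middleware configuration
- [3]: Automatic Database Management System Tuning Through Large-Scale Machine Learning.
- Automatically tunes MySQL/Postgres configuration, reaching performance close to that of an expert's configuration
- Applies factor analysis to the metrics, then K-Means clustering, to extract the important ones
- Uses Lasso for feature selection of the configuration items most strongly correlated with overall system performance
- The tuner changes settings, measures actual performance, and settles on good values
[3]: Figure 4.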
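
The Lasso step on this slide can be illustrated with a small, dependency-free sketch. It covers only the feature-selection idea (the factor analysis and K-Means steps are left out) and is not the code from [3]. The function names and the knob names `knob_a` to `knob_c` are hypothetical, and the solver is plain coordinate descent on the objective (1/2n)·||y − Xw||² + alpha·||w||₁.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Fit Lasso weights by cyclic coordinate descent (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that excludes j itself.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


def rank_knobs(names, weights):
    """Return knobs with non-zero weight, largest absolute weight first."""
    kept = [(abs(w), name) for name, w in zip(names, weights) if w != 0.0]
    return [name for _, name in sorted(kept, reverse=True)]
```

Here each row of `X` would hold standardized knob settings from one benchmark run and `y` the measured performance. Knobs whose weights are driven to exactly zero by the L1 penalty are treated as unimportant, which narrows the set the tuner has to explore.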

Slide 13

Slide 13 text

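Survey papers
- [4]: A Control Theoretical View of Cloud Elasticity: Taxonomy, Survey and Challenges (2018)
  - A survey of research that applies control-theoretic methods to cloud elasticity
  - Centers on feedback control and fuzzy control rather than machine learning
- [5]: Adaptation in Cloud Resource Configuration: A Survey (2016)
  - A survey of research that applies adaptive methods to cloud resource configuration
  - Classifies approaches into heuristics, control theory, machine learning, and queueing theory
- [6]: Resource Management in Clouds: Survey and Research Challenges (2015)
- [7]: What Does Control Theory Bring to Systems Research? (2009)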

Slide 14

Slide 14 text

3. Summary

Slide 15

Slide 15 text

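Summary
- SRE is an engineering discipline that controls the reliability of computer systems
- The motivation given for applying intelligent methods, including machine learning, to SRE is automating the manual work of skilled practitioners
- The existing methods introduced were server autoscaling, cloud resource control, and automatic tuning of middleware configuration
- These methods work under ideal conditions; the open challenge is how to apply them to production environments, where many parameters, disturbances, and queues interact in complex ways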

Slide 16

Slide 16 text

https://www.slideshare.net/syou6162/mackerel-108429592

Slide 17

Slide 17 text

https://speakerdeck.com/rrreeeyyy/a-survey-of-anomaly-detection-methodologies-for-web-system

Slide 18

Slide 18 text

https://speakerdeck.com/tsurubee/ji-jie-xue-xi-tesahafalsefu-he-zhuang-tai-woba-wo-sitai

Slide 19

Slide 19 text

SRE as an application domain for machine learning and control engineering

Slide 20

Slide 20 text

References

Slide 21

Slide 21 text

References
- [1]: ORTIZ, Jennifer, et al. PerfEnforce: a dynamic scaling engine for analytics with performance guarantees. arXiv preprint arXiv:1605.09753, 2016.
- [2]: KALYVIANAKI, Evangelia; CHARALAMBOUS, Themistoklis; HAND, Steven. Self-adaptive and self-configured CPU resource provisioning for virtualized servers using Kalman filters. In: Proceedings of the 6th international conference on Autonomic computing. ACM, 2009. p. 117-126.
- [3]: VAN AKEN, Dana, et al. Automatic database management system tuning through large-scale machine learning. In: Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 2017. p. 1009-1024.

Slide 22

Slide 22 text

References
- [4]: ULLAH, Amjad, et al. A control theoretical view of cloud elasticity: taxonomy, survey and challenges. Cluster Computing, 2018, 21.4: 1735-1764.
- [5]: HUMMAIDA, Abdul R.; PATON, Norman W.; SAKELLARIOU, Rizos. Adaptation in cloud resource configuration: a survey. Journal of Cloud Computing, 2016, 5.1: 7.
- [6]: JENNINGS, Brendan; STADLER, Rolf. Resource management in clouds: Survey and research challenges. Journal of Network and Systems Management, 2015, 23.3: 567-619.
- [7]: ZHU, Xiaoyun, et al. What does control theory bring to systems research?. ACM SIGOPS Operating Systems Review, 2009, 43.1: 62-69.