SRE at XFLAG Studio / xflag-studio-sre

76th session: SRE大全 (SRE Compendium): XFLAG Studio edition

Isao Shimizu

August 24, 2017

Transcript

  1. SRE at XFLAG Studio
     Isao Shimizu (清水 勲) @isaoshimizu, SRE Group, Game Development Office, XFLAG Business Division
     hbstudy #76: SRE大全 (SRE Compendium): XFLAG Studio edition
     XFLAG STUDIO
  2. Self-introduction

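  3. About me
     • Isao SHIMIZU (清水 勲) @isaoshimizu
     • mixi, Inc. (株式会社ミクシィ), XFLAG Business Division, Game Development Office, SRE Group
     • About 8 years of contract development, in-house product development, and operations at a systems integrator
     • mixi, Inc.
       • 2011.8 onward: Application Operations Group, Operations Department. Operated the SNS.
       • Fedora 8 -> 17 update, systemd rollout, LXC rollout, and other improvements
       • 2014.4 onward: joined operations for Monster Strike.
       • 2015.8 onward: XFLAG Studio is founded.
       • 2016.7 onward: the SRE Group is established within XFLAG Studio.
     • Outside work: craft beer, playing an instrument (trombone), weekend cooking, games, and so on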
  4. About XFLAG Studio

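  5. XFLAG Studio
     • Smartphone games
       • Monster Strike (モンスターストライク, since 2013.10)
       • Monst Stadium (モンストスタジアム, since 2015.4)
       • Fight League (ファイトリーグ, since 2017.6)
     • Video
       • Monster Strike anime (モンストアニメ)
       • Distributed on YouTube (passed 200 million cumulative views worldwide on 2017.6.14)
       • A theatrical film was also released at the end of last year
     • XFLAG STORE SHIBUYA (permanent shop), XFLAG STORE (online store)
     • Other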
  6. About SRE

  7. SRE at Google
     SRE has historically done mostly operations work, but it also takes on the software engineer's role of replacing work that used to be done by hand with automation.
     https://landing.google.com/sre/interview/ben-treynor.html
     • "Fundamentally, it's what happens when you ask a software engineer to design an operations function." Ben Treynor Sloss, Vice President, Google Engineering, founder of Google SRE
     • "Traditional software engineers tend to focus on one particular system, and understand it in great depth. Software engineers in Site Reliability Engineering tend to spread their focus across a broad range of systems." Nida Farrukh, Site Reliability Engineer, Zurich
     • "Our work is like being part of the world's most intense pit crew. We change the tires of a race car as it's going 100 mph." Andrew Widdowson, Site Reliability Engineer, Mountain View
     • "SREs engineer services, instead of binaries. This is a shift in perspective that exploits unusual skills and creativity. SREs are specialists in making changes safely." John T. Reese, Site Reliability Engineer, San Francisco
     https://landing.google.com/sre/
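  8. SRE at other companies
     • Searching Wantedly job postings for "SRE" returns 60 hits
       • https://www.wantedly.com/search?t=projects&q=SRE
     • Retty: "Not much different from conventional operations"; "a developer productivity project"
       • http://itpro.nikkeibp.co.jp/atcl/column/14/346926/030600869/
     • freee: "The goal is to disband the infrastructure team"; "aiming for 99.9% availability"
       • http://itpro.nikkeibp.co.jp/atcl/column/14/346926/030600869/
     • Mercari: "Both operations work and the role of a software engineer are required"
       • http://tech.mercari.com/entry/2015/11/18/153421
     • Cybozu: "Don't fix roles too rigidly"; "sharpen software development skills"; "eliminate toil"
       • http://blog.cybozu.io/entry/2016/09/01/080000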
  9. SRE at XFLAG Studio

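  10. SRE as an organization at XFLAG Studio
      • The SRE Group is about one year old (started 2016.7)
      • 7 members (as of 2017.8)
      • Backgrounds vary
        • Roughly half joined as new graduates and half as mid-career hires
        • Each member has different strengths. Nobody is expected to be full-stack.
      • The group has no manager
        • Members find the work that needs doing themselves and carry it out proactively
        • Completing only assigned work does not earn recognition (and work is rarely assigned)
      • Work communication happens mainly in Slack
      • Work is published as GitHub Issues and Pull Requests, where discussion and review take place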
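  11. On-call rotation
      • Two people per shift, rotating weekly. A shift starts on Sunday and ends on Saturday (each member is on call about once a month)
      • Mainly first-line response
      • Use of PagerDuty
        • Multiple notification channels (phone, email, push, and so on)
        • If the on-call engineer does not notice a notification, it escalates automatically to members who are off rotation
      • While on call, stay online (so that alerts can be received) and be able to respond immediately when an alert fires
        • For example, avoid watching a movie or driving while on call
      • Required procedures are documented in the wiki. Tooling keeps the procedures simple.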
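Slide 11 describes a two-person shift that rotates weekly, running Sunday through Saturday. The rule can be sketched as below. This is only an illustration: the member names, the epoch date, and the "advance by two members each week" policy are assumptions, not the team's actual schedule, and in practice PagerDuty manages both the schedule and the escalation.

```python
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Sunday that starts the on-call week containing `day`."""
    # date.weekday() counts Monday as 0 and Sunday as 6.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def on_call_pair(day: date, members: list[str], epoch: date) -> tuple[str, str]:
    """Return the two members on call during the week containing `day`.

    Assumed policy: the pair advances by two positions every week,
    wrapping around the member list.
    """
    if len(members) < 2:
        raise ValueError("a two-person shift needs at least two members")
    weeks = (week_start(day) - week_start(epoch)).days // 7
    first = (2 * weeks) % len(members)
    return members[first], members[(first + 1) % len(members)]


# Hypothetical roster of seven engineers and an arbitrary starting Sunday.
MEMBERS = [f"sre{i}" for i in range(1, 8)]
EPOCH = date(2017, 8, 20)

if __name__ == "__main__":
    for offset in range(0, 28, 7):
        day = EPOCH + timedelta(days=offset)
        print(week_start(day), on_call_pair(day, MEMBERS, EPOCH))
```

With seven members and two per shift, each person is on call for two weeks out of every seven under this policy, which is in the same range as the "about once a month" pace on the slide.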
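  12. What SRE is involved in
      • Load countermeasures
        • Code improvements, DB splitting, reviews
        • Hardware selection, scaling out and up, tuning the kernel and middleware
      • Monitoring, on-call
        • Improving monitoring tools
        • Watching power consumption in addition to software metrics
      • Incident response
        • Cloud outages, hardware failures
      • Data import
        • Updating game data and resources
      • Deployment
        • Deploying code to staging and production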
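  13. What SRE is involved in
      • Website construction, CDN configuration
        • Building the CMS with Chef, configuring CloudFront
      • Maintenance
        • Version-upgrade maintenance
      • Development and CI environments
        • Providing development instances and CI tools (Jenkins and others)
      • Security measures
        • Requesting vulnerability assessments, updating vulnerable software, and so on
      • Tool development
        • Chatbots, CLI tools, and so on
      • Consultation
        • For example, advising on the load impact of new campaigns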
  14. The Monster Strike environment that SRE supports (latest version)

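  15. Infrastructure of Monster Strike (Japanese version)
      [Diagram]
      On-premises
      • DC1: App / Batch / DB / Memcached / Redis
      • DC2: DB / Memcached / Redis, Backup
      Cloud
      • GMO アプリクラウド (GMO App Cloud): App
      • AWS: App
      The link between DC1 and DC2 is 40 Gbps.
      About 1,400 servers in total. Steady state: 350 DB servers, around 10,000 App cores (varies with circumstances), and around 90 Memcached servers.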
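  16. On-premises environment
      • Multiple DCs are used, with datastore backups (replicas) placed across them
      • App/DB servers are 24 to 56 core machines (Xeon E5-2670v4); Memcached runs on 8-core machines
      • OS installation and reinstallation run remotely with Cobbler + Koan
        • The result is a server with a minimal configuration
      • High-load DB servers use ioMemory SX350 1.3TB and ioDrive (NVMe SSDs are also under consideration)
      • To lower the failure rate, essentially every server uses SSDs, and magnetic disk devices such as SAS drives are not used (SSDs still fail when they fail, of course)
      • Hardware RAID is not used, and software RAID is rarely used
      • SREs seldom work in the DC in person
      • Servers are placed so as to optimize power usage per rack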
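  17. Cloud environment
      • Connected to the on-premises environment over a dedicated line so that traffic stays private
      • Latency between each DC and the cloud matters
      • Clouds with thorough APIs and documentation (and sample code, if possible) are easy to work with
      • Both AWS and GMO アプリクラウド (GMO App Cloud) are operated through in-house tools built on their APIs
        • Sometimes 100 instances are launched at once and put into service
      • GMO アプリクラウド is currently used only for App servers (mainly the 40-core type at present)
      • The TURN servers used for multiplayer run on AWS
      • Development and staging environments are unified on AWS
      • The team is willing to use any cloud that meets its requirements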
  18. Architecture (simplified)
      [Diagram] Components shown: A10 Load Balancer, Unicorn, Fluentd, Redis, MariaDB, Memcached, Batch, Worker, Cron. Incoming traffic is labeled "API access".
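  19. CDN
      • Data resources
        • Akamai and CloudFront are used together
        • The usage ratio is controlled by deploying server config
        • The ratio is sometimes changed depending on the situation (very rarely)
        • The origin is Amazon S3
      • Website
        • Cached by CloudFront
        • The origin is Amazon EC2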
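Slide 19 says the split between Akamai and CloudFront is controlled by a ratio in deployed server config. One way such a ratio could be applied is a weighted random choice per request, sketched below. The config shape, the weights, and the hostnames are all hypothetical; the slide does not describe how the ratio is actually implemented.

```python
import random

# Hypothetical deployed config: relative weight per CDN.
CDN_WEIGHTS = {"akamai": 70, "cloudfront": 30}

# Placeholder hostnames, not the real endpoints.
CDN_HOSTS = {
    "akamai": "https://akamai.example.com",
    "cloudfront": "https://cloudfront.example.com",
}


def pick_cdn(weights: dict[str, int], rng=random) -> str:
    """Pick a CDN name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def resource_url(path: str, weights: dict[str, int] = CDN_WEIGHTS, rng=random) -> str:
    """Build the URL for a data resource on the chosen CDN."""
    return f"{CDN_HOSTS[pick_cdn(weights, rng)]}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(resource_url("/assets/stage_001.bin"))
```

Changing the ratio then only requires deploying a new `CDN_WEIGHTS` value; setting one weight to zero sends all traffic to the other CDN.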
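  20. Provisioning
      • Mostly Chef (Ansible in some places)
      • In-house apt repository
        • Built on S3 with Aptly
        • In-house tools are packaged as debs
      • Problems with Chef
        • Unmaintained cookbooks, a storm of deprecated warnings
        • Timing of Chef major version upgrades
        • Still looking for a good approach
        • The cost of migrating to a different provisioning tool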
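  21. Log forwarding
      • Fluentd
        • Forwards to Amazon S3, which serves as the source for analysis
        • Forwards to Elasticsearch, with Kibana as a lightweight log analysis tool
        • Migration to td-agent 3 (still waiting for a stable release)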
  22. What SRE has done so far (excerpt)

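  23. What SRE has done so far (excerpt)
      • An excerpt of work beyond routine operations and load countermeasures (details are covered in the next part and in the reference material)
      • DB sharding (with no service downtime)
      • Investigating and resolving a THP (Transparent Huge Page) problem
      • Improving IO by tuning the kernel and MySQL
      • Software updates and replacements
        • Memcached (1.4.37, using the modern option)
        • Ubuntu Server, Nginx, Elasticsearch, Kibana
        • Various hardware and instance types
      • References
        • https://www.slideshare.net/FumihiroIto/sre-78912803
        • https://speakerdeck.com/isaoshimizu/sregurupugadekitekofalseban-nian-jian-yatutekitakoto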
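Slide 23 mentions investigating a Transparent Huge Pages (THP) problem but does not say what the problem or the fix was. As a starting point for that kind of investigation, the sketch below reads the current THP mode from the standard Linux sysfs file, where the active mode is the bracketed token (for example `always [madvise] never`). The helper names are illustrative, and the code only inspects the setting; it does not change it.

```python
from pathlib import Path
from typing import Optional

# Standard sysfs location on Linux kernels built with THP support.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode, i.e. the token wrapped in square brackets."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Return the active THP mode, or None if the sysfs file is absent."""
    if not path.exists():
        return None
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

Running this across a fleet shows which hosts still have THP set to `always`, which is a common first check when database or cache servers show latency spikes.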
  24. Summary

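  25. Summary
      • XFLAG Studio's business keeps expanding day by day
      • SRE has a lot to do and many challenges to address
      • Do not put up walls around the work by saying "SRE is supposed to be like this"
        • Get actively involved in the business and contribute to it
        • Solve a wide range of problems through software engineering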
  26. We're hiring!!
      If you are interested in SRE at XFLAG Studio, feel free to get in touch.
      https://xflag.com/recruit/engineer/404.html
      Note: it is a coincidence that the file name is 404 (lol)

  27. Thank you!