SRE at XFLAG Studio / xflag-studio-sre

#76: SRE Compendium: XFLAG Studio Edition

Isao Shimizu

August 24, 2017

Transcript

  1. SRE at XFLAG Studio
    SRE Group, Game Development Office, XFLAG Business Division
    Isao Shimizu @isaoshimizu
    hbstudy #76: SRE Compendium: XFLAG Studio Edition
    XFLAG STUDIO

  2. Self-introduction

  3. About me
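    • Isao SHIMIZU @isaoshimizu
    • SRE Group, Game Development Office, XFLAG Business Division, mixi, Inc.
    • About 8 years at a systems integrator (SIer) doing contract development, in-house product development, and operations
    • mixi, Inc.
    • From Aug 2011: Application Operations Group, Operations Department, operating the SNS
    • Upgraded Fedora 8 to 17, introduced systemd and LXC, plus other improvements
    • From Apr 2014: Joined Monster Strike operations
    • Aug 2015: XFLAG Studio is founded
    • Jul 2016: SRE Group established within XFLAG Studio
    • Outside work: craft beer, playing the trombone, weekend cooking, games, and more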

  4. About XFLAG Studio

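  5. XFLAG Studio
    • Smartphone games
    • Monster Strike (Oct 2013 onward)
    • Monst Stadium (Apr 2015 onward)
    • Fight League (Jun 2017 onward)
    • Video
    • Monster Strike anime
    • Streamed on YouTube (passed 200 million cumulative views worldwide on June 14, 2017)
    • A theatrical film was also released at the end of last year
    • XFLAG STORE SHIBUYA (permanent physical store), XFLAG STORE (online store)
    • Other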

  6. About SRE

  7. SRE at Google
    SRE has historically been grounded in operations work, but it also takes on the software engineer's role of replacing manual work with automation.

    https://landing.google.com/sre/interview/ben-treynor.html
    "Fundamentally, it's what happens when you ask a software engineer to design an operations function." Ben Treynor Sloss, Vice President, Google Engineering, founder of Google SRE
    "Traditional software engineers tend to focus on one particular system, and understand it in great depth. Software engineers in Site Reliability Engineering tend to spread their focus across a broad range of systems." Nida Farrukh, Site Reliability Engineer, Zurich
    "Our work is like being part of the world's most intense pit crew. We change the tires of a race car as it's going 100 mph." Andrew Widdowson, Site Reliability Engineer, Mountain View
    "SREs engineer services, instead of binaries. This is a shift in perspective that exploits unusual skills and creativity. SREs are specialists in making changes safely." John T. Reese, Site Reliability Engineer, San Francisco
    https://landing.google.com/sre/

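  8. SRE at other companies
    • Searching Wantedly job postings for "SRE" returns 60 hits
    • https://www.wantedly.com/search?t=projects&q=SRE
    • Retty: "Not that different from traditional operations", "A development productivity project"
    • http://itpro.nikkeibp.co.jp/atcl/column/14/346926/030600869/
    • freee: "Our goal is to disband the infrastructure team", "Aim for 99.9% availability"
    • http://itpro.nikkeibp.co.jp/atcl/column/14/346926/030600869/
    • Mercari: "The job calls for both operations work and a software engineer's role"
    • http://tech.mercari.com/entry/2015/11/18/153421
    • Cybozu: "Don't fix roles too rigidly", "Hone software development skills", "Keep eliminating toil"
    • http://blog.cybozu.io/entry/2016/09/01/080000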

  9. SRE at XFLAG Studio

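  10. The SRE organization at XFLAG Studio
    • The SRE Group is about a year old (started July 2016)
    • 7 members (as of August 2017)
    • Diverse backgrounds
    • Roughly half new graduates and half mid-career hires
    • Everyone has their own areas of expertise; we don't require full-stack skills
    • No manager
    • Members find the work that needs doing on their own and carry it out proactively
    • Just completing assigned tasks won't earn a good evaluation (and tasks are rarely assigned)
    • Day-to-day communication happens mainly on Slack
    • Work is output as GitHub Issues/Pull Requests, where discussion and review take place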

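  11. On-call rotation
    • Two people per shift, rotating weekly, Sunday through Saturday (about once a month per person)
    • Primarily first response
    • Using PagerDuty
    • Multiple notification channels (phone, email, push, etc.)
    • If the on-call engineer doesn't notice a notification, it automatically escalates to people outside the rotation
    • While on call, stay online (so alerts can be received) and be ready to respond immediately when an alert fires
    • For example, avoid watching movies or driving while on call
    • Required procedures are documented on the wiki; tooling keeps the procedures simple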
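
The rotation rules above (two people, weekly shifts running Sunday to Saturday) can be sketched as a small roster function. This is a hypothetical illustration, not XFLAG's tooling: it assumes the roster simply advances two places through the member list each week and wraps around, and the member names and `rotation_start` date are made up. Escalation itself is handled by PagerDuty; `escalation_order` only shows the idea of falling back to people outside the current shift.

```python
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Sunday that begins the Sunday-to-Saturday week containing `day`."""
    # date.weekday() is Monday == 0 ... Sunday == 6
    return day - timedelta(days=(day.weekday() + 1) % 7)


def on_call_pair(day: date, members: list, rotation_start: date) -> tuple:
    """Return the two members on call on `day`.

    Hypothetical scheme: each week the roster advances two places through
    `members`, wrapping around at the end of the list.
    """
    if week_start(rotation_start) != rotation_start:
        raise ValueError("rotation_start must be a Sunday")
    weeks = (week_start(day) - rotation_start).days // 7
    first = (2 * weeks) % len(members)
    return members[first], members[(first + 1) % len(members)]


def escalation_order(pair: tuple, members: list) -> list:
    """Primary, secondary, then everyone outside the current shift."""
    return list(pair) + [m for m in members if m not in pair]


if __name__ == "__main__":
    team = ["a", "b", "c", "d", "e", "f", "g"]  # hypothetical 7-person team
    start = date(2017, 8, 6)  # a Sunday
    print(on_call_pair(date(2017, 8, 24), team, start))
```

For Thursday, August 24, 2017 this returns `('e', 'f')`. With seven members and two per week, each person comes up about every three and a half weeks, which lines up with the "about once a month" pace.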

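  12. What SRE works on
    • Load management
    • Code improvements, DB splitting, reviews
    • Hardware selection, scaling out/up, kernel and middleware tuning
    • Monitoring and on-call
    • Improving monitoring tools
    • Watching power consumption, not just software metrics
    • Incident response
    • Cloud outages, hardware failures
    • Data imports
    • Updating game data and resources
    • Deployment
    • Deploying code to staging and production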

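  13. What SRE works on
    • Website builds, CDN configuration
    • Building a CMS with Chef, configuring CloudFront
    • Maintenance
    • Version-upgrade maintenance
    • Development and CI environments
    • Setting up development instances and CI tools (Jenkins, etc.)
    • Security
    • Requesting vulnerability assessments, updating vulnerable software, etc.
    • Tool development
    • Chatbots, CLI tools, etc.
    • Consultations of all kinds
    • For example, load planning for new campaigns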

  14. The Monster Strike environment supported by SRE (latest version)

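  15. Infrastructure of the Japanese version of Monster Strike
    [Diagram: on-premises DC1 (App/Batch/DB/Memcached/Redis) and DC2 (DB/Memcached/Redis backup); cloud: GMO AppsCloud (App) and AWS (App)]
    • About 1,400 servers in total
    • 40 Gbps between DC1 and DC2
    • Steady state: about 350 DB servers, about 10,000 App cores (scaled up or down as needed), and about 90 Memcached servers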

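  16. On-premises environment
    • Multiple DCs, with datastore backups (replicas) placed across them
    • App/DB servers are 24-56 core machines (Xeon E5-2670v4); Memcached runs on 8-core machines
    • OS installs and reinstalls are run remotely with Cobbler + Koan
    • This yields a minimal server environment
    • High-load DB servers use ioMemory SX350 1.3TB and ioDrive (NVMe SSDs are also under consideration)
    • To reduce failure rates, essentially every server uses SSDs and no magnetic disks such as SAS (of course, SSDs still fail sometimes)
    • No hardware RAID; software RAID is rarely used
    • SREs rarely work on-site in the DCs themselves
    • We aim for server placement that optimizes power usage in each rack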
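
Placing servers so that each rack's power draw stays within budget is essentially a bin-packing problem. The deck does not say how XFLAG does it; the sketch below is a hypothetical greedy heuristic (worst-fit decreasing: place the hungriest server first, into the rack with the most remaining headroom) that spreads load across racks. Rack budgets, per-server wattages, and all names are assumed inputs for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rack:
    name: str
    budget_watts: float  # assumed per-rack power budget
    servers: list = field(default_factory=list)
    used_watts: float = 0.0

    def headroom(self) -> float:
        return self.budget_watts - self.used_watts


def place_servers(servers: dict, racks: list) -> dict:
    """Greedy worst-fit decreasing placement.

    `servers` maps server name -> expected power draw in watts.
    Returns server name -> rack name, and updates each Rack in place.
    """
    placement = {}
    # Largest consumers first, so big servers are not left without room.
    for name, watts in sorted(servers.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [r for r in racks if r.headroom() >= watts]
        if not candidates:
            raise ValueError(f"no rack has {watts} W of headroom for {name}")
        # The rack with the most headroom wins, which keeps racks balanced.
        rack = max(candidates, key=Rack.headroom)
        rack.servers.append(name)
        rack.used_watts += watts
        placement[name] = rack.name
    return placement


if __name__ == "__main__":
    racks = [Rack("A", 4000), Rack("B", 4000)]
    demand = {"db1": 800, "db2": 800, "app1": 500, "app2": 500, "app3": 500, "mc1": 200}
    print(place_servers(demand, racks))
    print([(r.name, r.used_watts) for r in racks])
```

With these made-up inputs rack A ends at 1,800 W and rack B at 1,500 W. Worst-fit is chosen over first-fit here because the goal is balance across racks rather than packing as few racks as possible.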

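  17. Cloud environment
    • Connected to the on-premises environment over dedicated lines so traffic stays private
    • Latency between each DC and the cloud matters
    • Clouds with solid APIs and documentation (ideally with sample code) are easier to work with
    • Both AWS and GMO AppsCloud are operated with in-house tools built on their APIs
    • We sometimes boot 100 servers at once and put them into service
    • GMO AppsCloud is currently used only for App servers (mainly the 40-core type)
    • The TURN servers used for multiplayer run on AWS
    • Development and staging environments are standardized on AWS
    • We're open to any cloud that meets our requirements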
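
The in-house tools that drive AWS and GMO AppsCloud are not shown in the deck. As a hedged sketch of the "boot 100 servers at once" pattern, the function below fans out calls to a caller-supplied `launch` function across a thread pool and collects failures so they can be retried before the servers go into service. `launch` is a hypothetical stand-in for whatever API call creates one instance; no real cloud SDK is assumed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def launch_fleet(
    count: int,
    launch: Callable[[int], str],
    max_workers: int = 20,
) -> Tuple[List[str], Dict[int, Exception]]:
    """Call `launch(i)` for i in range(count) concurrently.

    Returns (sorted IDs that launched, {index: exception} for failures).
    `launch` is a hypothetical wrapper around a create-instance API call.
    """
    ids: List[str] = []
    failures: Dict[int, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(launch, i): i for i in range(count)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                ids.append(future.result())
            except Exception as exc:  # record and keep going; retry later
                failures[index] = exc
    return sorted(ids), failures


if __name__ == "__main__":
    def fake_launch(i: int) -> str:
        # Stand-in for a real API call; pretend instance 7 hits a quota error.
        if i == 7:
            raise RuntimeError("quota exceeded")
        return f"app-{i:03d}"

    ok, failed = launch_fleet(100, fake_launch)
    print(len(ok), sorted(failed))
```

Capping `max_workers` bounds the number of simultaneous API calls, which is one simple way to stay under a provider's rate limits while still launching the whole batch quickly.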

  18. Architecture (simplified)
    [Diagram: API access, A10 Load Balancer, Unicorn, Fluentd, Redis, MariaDB, Memcached, and a Batch tier (Worker, Cron)]

  19. CDN
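    • Data resources
    • Akamai and CloudFront used together
    • The traffic ratio between them is controlled by deploying server config
    • The ratio is occasionally changed depending on the situation (very rarely)
    • Origin: Amazon S3
    • Websites
    • Cached on CloudFront
    • Origin: Amazon EC2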
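
The deck says the Akamai/CloudFront split is set by deploying server config but does not describe the mechanism. One hedged way such a ratio could be applied is weighted selection keyed on something stable, such as a user ID, so each client consistently lands on the same CDN while the overall split follows the configured weights. The weights dict, the CDN keys, and `pick_cdn` are all hypothetical.

```python
import hashlib
from typing import Dict


def pick_cdn(key: str, weights: Dict[str, int]) -> str:
    """Choose a CDN for `key` in proportion to integer `weights`.

    The same key always maps to the same CDN for a given weights config.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Stable hash -> bucket in [0, total)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Walk CDNs in a fixed order so every server computes the same answer.
    for name, weight in sorted(weights.items()):
        if bucket < weight:
            return name
        bucket -= weight
    raise AssertionError("unreachable: bucket always falls inside total")


if __name__ == "__main__":
    config = {"akamai": 70, "cloudfront": 30}  # hypothetical deployed ratio
    counts = {"akamai": 0, "cloudfront": 0}
    for user_id in range(10_000):
        counts[pick_cdn(f"user-{user_id}", config)] += 1
    print(counts)  # roughly a 70/30 split
```

Changing the ratio then only means deploying a new `config`; setting a weight to 0 drains that CDN entirely.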

  20. Provisioning
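    • Mainly Chef (Ansible in some places)
    • In-house apt repository
    • Built on S3 using Aptly
    • In-house tools packaged as .deb
    • Challenges with Chef
    • Unmaintained cookbooks, a flood of deprecation warnings
    • When to do Chef major-version upgrades
    • Still searching for a better approach
    • Cost of migrating to another provisioning tool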

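  21. Log forwarding
    • Fluentd
    • Forwarded to Amazon S3 and used as a source for analysis
    • Forwarded to Elasticsearch, with Kibana as a lightweight log analysis tool
    • Migrating to td-agent 3 (still waiting for a stable release)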

  22. What SRE has done (highlights)

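  23. What SRE has done (highlights)
    • Highlights of work beyond routine operations and load management (covered in detail in the next part and in the references)
    • DB sharding (with no service downtime)
    • Investigating and resolving THP (Transparent Huge Pages) issues
    • I/O improvements through kernel and MySQL tuning
    • Software updates and replacements
    • Memcached (1.4.37, using the modern option)
    • Ubuntu Server, Nginx, Elasticsearch, Kibana
    • Various hardware and instance types
    • References
    • https://www.slideshare.net/FumihiroIto/sre-78912803
    • https://speakerdeck.com/isaoshimizu/sregurupugadekitekofalseban-nian-jian-yatutekitakoto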
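
For the THP investigation listed above, a usual first step is checking which mode the kernel is in. On Linux the active mode appears in brackets in `/sys/kernel/mm/transparent_hugepage/enabled` (for example `always madvise [never]`). The helper below is a small sketch that parses that file; the deck does not say which setting XFLAG settled on, so it makes no recommendation. (Redis, for one, logs a warning at startup when THP is enabled.)

```python
import re
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode from sysfs text such as 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"unexpected THP setting: {text!r}")
    return match.group(1)


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the live setting, or return None if the file is absent
    (e.g. not Linux, or a kernel built without THP)."""
    try:
        return parse_thp_mode(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(current_thp_mode())
```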

  24. Summary

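  25. Summary
    • XFLAG Studio's business keeps expanding
    • SRE has plenty to do and many challenges to tackle
    • We don't put up walls around work on the grounds that "that's not what SRE does"
    • We engage actively with the business and contribute to it
    • We solve all kinds of problems through software engineering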

  26. We're hiring!!
    If you're interested in SRE at XFLAG Studio, feel free to get in touch
    https://xflag.com/recruit/engineer/404.html
    * The file name being 404 is pure coincidence (lol)

  27. Thank you!
