
Microservices and SRECon / #microserv

I gave a talk titled "Microservices and SRECon" at Microservices Meetup vol.2, held at FiNC on August 9.

Takumi Sakamoto

August 09, 2016

Transcript

  1. Microservices and SRECon ~ SRECon'16 Wrap Up ~ Takumi Sakamoto @takus

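  2. About me
     • Takumi Sakamoto (@takus)
     • SRE @ SmartNews, Inc.
     • Recent interest: OLAP data stores (especially druid.io)
     • Recent hobby: playing with my son (born in May) and reading parenting books

  3. I wrote a book on Slack
     • "Slack入門 [Streamlining Team Development with ChatOps]" is a book I'd want even first-time Slack users to read
     • The cute-looking "Slack入門 [Streamlining Team Development with ChatOps]" is just as lovable on the inside
     • Easy to follow! A recommended book with very thorough explanations, for everyone from Slack beginners to intermediate users who want to get more out of it - Slack入門
     • [Book review] "Slack入門: Streamlining Team Development with ChatOps"
     • Slack is an environment ~ Book review: "Slack入門: Streamlining Team Development with ChatOps" ~

  4. I contributed a post to the AWS blog: How SmartNews Built a Lambda Architecture on AWS to Analyze Customer Behavior and Recommend Content
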
  5. SmartNews
     • News discovery app for mobile
     • Algorithm-driven article selection
     • 18M+ downloads worldwide
     https://www.smartnews.com/

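  6. SmartNews and microservices
     • Not what you would call a microservices architecture (I think)
     • A news product and an ads product
     • Each contains an API, a data analysis platform, etc.
     • The main API is fairly large
     • See the references on the next slides for details

  7. Reference 1: The server technology behind SmartNews news delivery

  8. Reference 2: SmartNews TechNight vol5, SmartNews Ads: The Big Illustrated Guide

  9. SmartNews voluntary overseas travel incentive program
     • An internal program under which, once per half-year, the company covers airfare, transportation, lodging, communication costs, overseas travel insurance, and conference or academic meeting registration fees for overseas travel to visit the San Francisco or New York office or to attend a conference
     • Conferences not directly related to your day-to-day work are OK too
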
  10. SRECon
      • Conference for Site Reliability Engineers (SRE)
      • April 7-8, 2016 in Santa Clara, CA.
      • 600+ attendees
      https://www.usenix.org/conference/srecon16

  11. What is SRECon?
      • A conference for Site Reliability Engineers (SRE)
      • Held this year on April 7-8 in Santa Clara, California
      • About 600 attendees, mostly from within the United States

  12. Some of the sessions I attended
      • Netflix: 190 Countries and 5 CORE SREs
      • Panel: Who/What Is SRE?
      • Shaping Reality to Shape Outcomes: Making SRE Work with Uber Growth
      • nrrd 911 ic me: The Incident Commander Role
      • A Young Lady's Illustrated Primer to Technical Decision-Making
      • Continuous Deployment to Millions of Users 40 Times a Day
      • Finding the Order in Chaos
      • Performance Checklists for SREs
      • Doorman: Global Distributed Client Side Rate Limiting
      • Running Consul at Scale—Journey from RFC to Production
      • Panel: SRE Managers

  13. Search results for "Microservices" https://www.usenix.org/conference/srecon16/program

  14. Still, microservices are taken for granted
      ⬇ Blogs and slides from the speakers' companies ⬇
      • Netflix
        • MicroServices at Netflix - challenges of scale
      • Uber
        • Service-Oriented Architecture: Scaling Our Codebase As We Grow
      • Fastly
        • Microservices war stories

  15. How SREs engage with microservices (apologies if this overlaps with @kenjiszk's talk)

  16. Netflix: 190 Countries and 5 CORE SREs / USENIX SRECon'16

  17. Freedom & Responsibility

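  18. I asked how far the freedom goes
      Q. You say "Freedom", but how free is it really?
      • Basically, everything about each service is left to its development team
      • Every developer is given enough access privileges to break Netflix's systems
      Q. Doesn't the service get destroyed by unintended operations?
      • That is where the tools built by SRE become important
      • The tools are so convenient that developers do not go and pick other tools that are harder to use and more accident-prone
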
  19. Developers can run Ops(*) *If provided the tools and support

  20. Example: Spinnaker http://techblog.netflix.com/2015/11/global-continuous-delivery-with.html

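  21. Build internal tools as if you were building a product
      "Overall, these SRE-developed tools are full-fledged software engineering projects, distinct from one-off solutions and quick hacks, and the SREs who develop them have adopted a product-based mindset that takes both internal customers and a roadmap for future plans into account."
      Chapter 18. Software Engineering in SRE - Site Reliability Engineering

  22. Example: the SmartNews internal PaaS
      • Find a problem inside the company (the problem is often already out in the open)
      • Build a prototype (do not over-build; keep the MVP in mind)
      • Find your first customers (developers)
      • Put a roadmap in place and develop according to priorities
      • Dogfood it
      • Listen to your customers and follow their usage history

  23. Sometimes your customers evangelize it for you

  24. Days of asking myself questions (still far from good enough... orz)
      Is the xxx I introduced really easy to use?
      xxx = OSS tool / deployment system / SaaS
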
  25. Shaping Reality to Shape Outcomes: Making SRE Work with Uber Growth / USENIX SRECon'16

  26. the school of hard knocks: i have 99 problems, and reliability is 1

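  27. Example: failover. Product teams find it hard to set aside time for things that rarely happen

  28. Trigger it regularly so that it feels real

  29. Mechanisms that let good habits form naturally. A kind of gamification?
      • Chaos Engineering / failover tests
        • Break things regularly so that people stay aware of failures
        • People work out ways to keep things from breaking
      • Automatic rollback when a problem is detected during deployment
        • People think about and come to understand the part that failed
        • People work out how to pass the checks so that they can deploy
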
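The "automatic rollback when a problem is detected during deployment" idea from slide 29 can be sketched roughly as follows. This is a minimal illustration, not the mechanism described in the talk: `deploy_with_rollback`, `activate`, `is_healthy`, and `checks` are hypothetical names, and a real system would wait between health checks and use richer health signals.

```python
from typing import Callable


def deploy_with_rollback(
    new_version: str,
    current_version: str,
    activate: Callable[[str], None],   # hypothetical hook: switches traffic to a version
    is_healthy: Callable[[], bool],    # hypothetical hook: one post-deploy health check
    checks: int = 3,
) -> str:
    """Activate new_version; revert to current_version if any health check fails.

    Returns the version that is active when the function finishes.
    """
    activate(new_version)
    for _ in range(checks):
        if not is_healthy():
            # A failed check triggers the rollback without human intervention.
            activate(current_version)
            return current_version
    return new_version
```

Because the rollback is automatic, a failed deploy costs the developer little, yet they still have to understand the failing check before the next attempt succeeds, which is the habit-forming effect the slide points at.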
  30. nrrd 911 ic me: The Incident Commander Role / USENIX SRECon'16

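  31. Incident Command System
      • A standardized management system, developed in the United States, for use at disaster and incident scenes. Its defining feature is that the chain of command and the management methods are standardized. It was developed by fire services in the 1970s, its use gradually spread to other government agencies, and it became the de facto standard.
      Incident Command System / Wikipedia

  32. An example of ICS http://www.wikiwand.com/en/Incident_Command_System

  33. Simplified to fit the process. nrrd 911 ic me: The Incident Commander Role / USENIX SRECon 16
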
  34. nrrd 911 ic me: The Incident Commander Role / USENIX SRECon 16

  35. nrrd 911 ic me: The Incident Commander Role / USENIX SRECon 16

  36. nrrd 911 ic me: The Incident Commander Role / USENIX SRECon 16

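  37. What is good about it?
      • Each individual role is clearly defined
      • The flow of information is controlled
      • Decisions are made with the impact on the whole taken into account

  38. How do you get it adopted? ChatOps

  39. Incident occurs

  40. Status updates

  41. Communication with customers

  42. Response complete

  43. Preparing for the retrospective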
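
The ChatOps flow on slides 39 to 43 (incident occurs, status updates, customer communication, response complete, retrospective preparation) could be modeled by a chat bot along these lines. This is only a sketch under assumptions: the `Stage` and `Incident` names and the message format are hypothetical and are not taken from the talk or from any particular chat tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Stage names mirror slides 39-43; the enum itself is an illustration.
    OPENED = "incident opened"
    STATUS_UPDATE = "status update"
    CUSTOMER_COMMS = "customer communication"
    RESOLVED = "response complete"
    RETRO_PREP = "retrospective preparation"


@dataclass
class Incident:
    title: str
    commander: str  # the Incident Commander, so the role is explicit in every message
    timeline: list = field(default_factory=list)

    def record(self, stage: Stage, note: str) -> str:
        """Append an entry to the timeline and return the chat message to post."""
        self.timeline.append((stage, note))
        return f"[{stage.value}] {self.title} (IC: {self.commander}): {note}"

    def retro_summary(self) -> list:
        """Return the timeline as plain lines, ready to paste into a retrospective."""
        return [f"{stage.value}: {note}" for stage, note in self.timeline]
```

Routing every update through one recorder keeps the information flow in a single channel and leaves a timeline behind for the retrospective at no extra cost.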

  44. Summary

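  45. SRE should be an Enabler
      • Pave the way so that developers can run operations themselves
      • Get developers to keep reliability in mind
      • Make it an organization where good practices are followed naturally

  46. SRE should not be a Servant
      • Google SRE's 50% rule
      • Do not spend 50% or more of your time on ball-fetching chores, operations included
      • Are you fetching balls and feeling as if you "got work done"?
      • You may be losing the time you need to fundamentally improve something
      • Be properly aware of task priorities
      • Sometimes it is important to say No
      • Of course, things like assisting with incident response take priority
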
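As a small worked example of the 50% rule on slide 46, the check below flags a period in which chore-type work reaches half of the total time. The function names and the hour-based bookkeeping are assumptions made for illustration; the slide only states the 50% threshold.

```python
def toil_ratio(toil_hours: float, total_hours: float) -> float:
    """Fraction of working time spent on chore-type (toil) work."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


def breaks_fifty_percent_rule(toil_hours: float, total_hours: float,
                              limit: float = 0.5) -> bool:
    """True when toil reaches the limit (the slide says not to do 50% or more)."""
    return toil_ratio(toil_hours, total_hours) >= limit
```

For instance, 20 hours of chores in a 40-hour week gives a ratio of 0.5, which already breaks the rule as the slide words it, while 10 hours (0.25) does not.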
  47. Be an Enabler!!! https://www.wantedly.com/projects/48033