Case Studies in Improving Service Reliability at Hatena

dekokun
July 25, 2016

Slides from a talk given at SRE Tech Talks ( http://connpass.com/event/34825/ )

Transcript

 1. Case Studies in Improving Service Reliability at Hatena
  SRE Tech Talks
  Hatena Co., Ltd.
  id:dekokun

 2. Self-introduction

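 3. Self-introduction
  • id:dekokun
  • Infrastructure at Hatena (Tokyo)
  • Web operations engineer
  • In charge of infrastructure for various Hatena services
  • Hatena Blog and several others
  • Joined a little under a year ago
  • Wrote PHP and JS at my previous job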

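 4. Today's agenda
  • What is SRE, anyway?
  • How Hatena's web operations engineers and web application engineers cooperate
    • Reducing DB transfer volume
  • Improving service reliability through code and testing
    • Building a redundant Redis configuration on AWS
  • The kind of system we should aim for
    • Servers that start back up automatically no matter how often they go down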

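 5. Today's agenda
  • What is SRE, anyway?
  • How Hatena's web operations engineers and web application engineers cooperate
    • Reducing DB transfer volume
  • Improving service reliability through code and testing
    • Building a redundant Redis configuration on AWS
  • The kind of system we should aim for
    • Servers that start back up automatically no matter how often they go down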

 6. What is SRE, anyway?

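 7. What is SRE?
  • I was invited to speak at an event called SRE Tech Talks, but I did not really know what SRE was in the first place
  • My own job title is not "SRE" either
  • My impression: you set an SLA and take on challenges within it
  • My impression: you write a lot of code
  • Is it different from DevOps?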

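 8. What is SRE?
  • I have started reading Site Reliability Engineering
  • An in-house reading group has started
  • So far I have only read chapters 1, 2, and 4
  • In this talk I will briefly explain only the parts that seem relevant to it as I go
  • Below, I call it "the SRE book"
  Site Reliability Engineering: http://shop.oreilly.com/product/0636920041528.do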

 9. What is SRE?
  • Hatena has no job title called SRE, but we too do all sorts of things every day to improve the reliability of our services
  • Today I will introduce some of them

 10. Terminology

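 11. Terminology
  • Web operations engineer
    • What is usually called an infrastructure engineer
    • "Ops engineer" in these slides
  • Web application engineer
    • What is usually called a development engineer
    • "Dev engineer" in these slides
  • At Hatena the boundary between what the two do is somewhat blurry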

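 12. Today's agenda
  • What is SRE, anyway?
  • How Hatena's web operations engineers and web application engineers cooperate
    • Reducing DB transfer volume
  • Improving service reliability through code and testing
    • Building a redundant Redis configuration on AWS
  • The kind of system we should aim for
    • Servers that start back up automatically no matter how often they go down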

 13. How Ops engineers and Dev engineers cooperate
  Reducing DB transfer volume

 14. What is SRE? (recap)
  • "In SRE we bring this conflict to the fore, and then resolve it with the introduction of an error budget"
  • From "Site Reliability Engineering: How Google Runs Production Systems" (2016/3/23), Betsy Beyer, Chris Jones, Chapter 1

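 15. How Ops engineers are involved with services
  • According to the SRE book, non-SRE Ops tends to run into problems because its goals differ from Dev's
  • That does not seem to happen much at Hatena
  • I will describe how Ops engineers and Dev engineers are involved with services at Hatena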

 16. How Ops engineers are involved with services
  • An Ops engineer is assigned to each service
  [Figure: the web operations engineer team alongside each service (Hatena Blog, Hatena Bookmark, etc…)]

 17. How Ops engineers are involved with services
  • At Hatena the boundary between what the two do is somewhat blurry
  • Ops engineers send pull requests against service code
  • Dev engineers tinker with chef

 18. Cooperation case study:
  Reducing DB transfer volume

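 19. DB transfer reduction case study
  • One day we noticed that the transfer volume of the master DB (MySQL) was getting close to its limit
  • On some days, at peak, it looked like "another 10% and we blow through the limit"…
  • We needed to act
  [Figure: illustrative graph of transfer volume approaching the theoretical limit, annotated "Cutting it close…"]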

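 20. DB transfer reduction case study
  • CPU usage had a huge amount of headroom, so MySQL protocol compression looked promising and we tested it
  • Specified 'mysql_compression=1' in the DSN
  • Compression eats CPU, but it makes use of multiple cores, so no worries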
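
A minimal sketch of what "specifying `mysql_compression=1` in the DSN" can look like. `mysql_compression` is a connection attribute of Perl's DBD::mysql driver; the helper `build_dsn`, the database name, and the host name are hypothetical and only illustrate how the option is appended to the DSN string.

```python
def build_dsn(database, host, port=3306, compression=True):
    """Assemble a Perl DBI-style DSN string for DBD::mysql.

    The database and host values are placeholders supplied by the caller.
    """
    parts = [f"database={database}", f"host={host}", f"port={port}"]
    if compression:
        # The option named on the slide: ask the driver to compress
        # client/server traffic.
        parts.append("mysql_compression=1")
    return "dbi:mysql:" + ";".join(parts)


# Hypothetical database and host names.
print(build_dsn("blog", "db-master.example.internal"))
```

Keeping the option behind a flag makes it possible to switch compression per application server, which matters later in the story.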

 21. DB transfer reduction case study
  • The test results were excellent, so I opened a pull request and released it after Dev engineers reviewed it and made fixes

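 22. DB transfer reduction case study
  • Transfer volume dropped sharply (to about 1/4). Hooray
  • DB CPU usage rose sharply, but within an acceptable range
  [Figure: graphs showing the sharp drop in transfer volume and the sharp rise in CPU usage]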

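 23. DB transfer reduction case study
  • One night, while we were relaxing because transfer volume had dropped, CPU usage suddenly spiked and the DB nearly clogged up
  • A large number of DB threads were in the "Writing to net" state
  • Transfer volume had grown so much that compression was eating up the CPU and could not keep up
  • Compression was the cause, but turning it off on every application server would make us choke on transfer volume instead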
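
The trade-off on these slides (fewer bytes on the wire in exchange for CPU time) can be illustrated with zlib, the library MySQL's classic protocol compression is built on. This is only a rough sketch: the payload is made up, and the numbers it prints say nothing about the production workload described in the talk.

```python
import time
import zlib


def compress_stats(payload: bytes, level: int = 6):
    """Return (compressed_size / original_size, CPU seconds spent compressing)."""
    start = time.process_time()
    compressed = zlib.compress(payload, level)
    cpu_seconds = time.process_time() - start
    # Decompressing must give back the original bytes.
    assert zlib.decompress(compressed) == payload
    return len(compressed) / len(payload), cpu_seconds


# Made-up, highly repetitive rows standing in for a query result set.
rows = b"".join(
    b"%d,entry title,2016-07-25 12:00:00,published\n" % i for i in range(20000)
)
ratio, cpu = compress_stats(rows)
print(f"compressed to {ratio:.1%} of original size using {cpu:.4f}s of CPU")
```

The more bytes the server has to send, the more CPU time compression costs, which is how a surge in transfer volume can turn into a CPU problem.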

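 24. DB transfer reduction case study: first response
  • As a first response, we turned compression off on only some of the application servers and got out of trouble
  [Figure: application servers labeled compressed, compressed, uncompressed, uncompressed]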
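
The slide does not say how the servers were chosen. One way to split a fleet into "compressed" and "uncompressed" groups is a stable hash of the host name, sketched below; the function name, the percentage knob, and the host names are all assumptions made for illustration.

```python
import zlib


def compression_enabled(hostname: str, enabled_percent: int) -> bool:
    """Deterministically place a host in the 'compressed' group.

    crc32 gives each hostname a stable bucket in 0..99, so a host stays
    in the same group across restarts and deploys.
    """
    bucket = zlib.crc32(hostname.encode("utf-8")) % 100
    return bucket < enabled_percent


# Hypothetical host names.
hosts = [f"app{i:02d}.example.internal" for i in range(1, 9)]
for host in hosts:
    mode = "compressed" if compression_enabled(host, 50) else "uncompressed"
    print(host, mode)
```

Lowering `enabled_percent` shifts load from DB CPU to the network and raising it does the reverse, so the value can be tuned until neither resource is saturated.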

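 25. DB transfer reduction case study: first response
  • The next day the situation was still precarious, so we moved up an instance type for the time being and switched over the master DB, which brought some relief
  • Thank you, AWS and MHA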

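 26. DB transfer reduction case study: root-cause fix, investigation
  • A sudden rise in CPU usage needs investigating
  • Together with Dev engineers we looked for the cause using kibana and other tools, found that one endpoint could start consuming a great deal of DB transfer depending on user behavior, and shared this with the Dev engineers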
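
The kind of question asked during this investigation ("which endpoint is responsible for the DB transfer?") boils down to a group-by over request logs. A minimal sketch follows; the record fields `endpoint` and `db_bytes` and the sample values are invented for illustration and are not taken from the talk.

```python
from collections import Counter


def bytes_by_endpoint(records):
    """Sum bytes received from the DB per application endpoint."""
    totals = Counter()
    for record in records:
        totals[record["endpoint"]] += record["db_bytes"]
    # Largest consumers first.
    return totals.most_common()


# Invented sample records; field names and values are illustrative only.
sample = [
    {"endpoint": "/entry", "db_bytes": 1200},
    {"endpoint": "/archive", "db_bytes": 90000},
    {"endpoint": "/entry", "db_bytes": 800},
    {"endpoint": "/archive", "db_bytes": 110000},
]
print(bytes_by_endpoint(sample))
```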

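 27. DB transfer reduction case study: root-cause fix
  • By the following day the Dev engineers had released a fix, DB transfer volume went down, and all was well
  • After that, the Dev engineers shipped several more releases that reduced DB transfer volume further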

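 28. Ops engineers and Dev engineers
  • The SRE book also says something like "(traditional, non-SRE) Ops opposes things such as rapid deploys in order to protect site availability." However,
  • as we just saw, in an ordinary web service whose behavior depends on what users do, even from the standpoint of "protecting site availability," wouldn't the Ops side be doomed unless it also wants rapid change?
  • Of course, the growth of the service comes first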

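 29. Ops engineers and Dev engineers: other practices
  • At the weekly in-house study session that engineers attend, people present what they are working on, which helps mutual understanding
  • To reduce incident response outside business hours, we have also agreed on the following with the development teams
    • No releases on Fridays
    • No releases from the evening onward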
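
The two release rules are simple enough to encode as a check that a deploy tool could call. This is a sketch only: the slide says "evening" without giving a time, so the 17:00 cutoff and the function name are assumptions.

```python
from datetime import datetime

EVENING_CUTOFF_HOUR = 17  # assumed value; the slide only says "evening"


def release_allowed(now: datetime) -> bool:
    """Apply the two rules from the slide to a local timestamp."""
    if now.weekday() == 4:  # Monday is 0, so 4 is Friday
        return False
    if now.hour >= EVENING_CUTOFF_HOUR:
        return False
    return True


print(release_allowed(datetime(2016, 7, 25, 11, 0)))  # a Monday morning
```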

 30. Ops engineers and Dev engineers: summary
  • In this way, Ops engineers and Dev engineers at Hatena work together day to day on improving the reliability of our services

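 31. Today's agenda
  • What is SRE, anyway?
  • How Hatena's web operations engineers and web application engineers cooperate
    • Reducing DB transfer volume
  • Improving service reliability through code and testing
    • Building a redundant Redis configuration on AWS
  • The kind of system we should aim for
    • Servers that start back up automatically no matter how often they go down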

 32. Improving service reliability through code and testing
  Building a redundant Redis configuration in an AWS environment

 33. What is SRE? (recap)
  • "Google caps operational work for SREs at 50% of their time. Their remaining time should be spent using their coding skills on project work."
  • From "Site Reliability Engineering: How Google Runs Production Systems" (2016/3/23), Betsy Beyer, Chris Jones, Chapter 1

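 34. Building a redundant Redis configuration in an AWS environment
  • We do not spend anything like 50% of our time writing code!
  • But of course we do write code
  • This is a story about solving a problem while writing code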

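 35. Building a redundant Redis configuration in an AWS environment
  • Hatena runs its services both in data centers and on AWS
  • We already had a redundant Redis configuration in the data centers, but had not managed it on AWS
  • This is the story of building a redundant Redis configuration on AWS as well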

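 36. Redis redundancy in the data center
  [Figure: Redis (master) with keepalived at IP x.x.x.x and Redis (slave) with keepalived at IP y.y.y.y; replication between the two Redis servers; a health check on each node; VRRP w/ broadcast between the two keepalived instances; clients connect via VIP z.z.z.z]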

 37. Redis redundancy in the data center
  • Redundancy based on keepalived + VRRP
  • Applications connect to Redis via the VIP
  • If something goes wrong with Redis, the VIP moves
  • At the same time as the VIP moves, the slave is promoted to master and the master is demoted to slave
  [Figure: same topology as slide 36: Redis (master) and Redis (slave), each with keepalived and a health check, replication, VRRP w/ broadcast, clients connecting via VIP z.z.z.z]
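
The promotion and demotion step can be sketched as a mapping from the VRRP state a node has just entered to the Redis commands it should run. keepalived passes the new state (MASTER, BACKUP, or FAULT) to its notify scripts, and `SLAVEOF` is the Redis command for changing replication roles. The function name and the decision to do nothing on FAULT are assumptions, not the script Hatena uses.

```python
def redis_commands_for(state: str, peer_host: str, peer_port: int = 6379):
    """Map a VRRP state transition to the Redis commands to run locally."""
    if state == "MASTER":
        # This node now holds the VIP: stop replicating and accept writes.
        return ["SLAVEOF NO ONE"]
    if state == "BACKUP":
        # This node does not hold the VIP: replicate from the peer.
        return [f"SLAVEOF {peer_host} {peer_port}"]
    if state == "FAULT":
        # Local failure: leave Redis alone and let the peer take over.
        return []
    raise ValueError(f"unknown VRRP state: {state}")


# Hypothetical peer address.
print(redis_commands_for("BACKUP", "10.0.0.12"))
```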

 38. Problems in reproducing the data center configuration on AWS
  [Figure: same topology as slide 36, with "VRRP w/ broadcast" and "VIP: z.z.z.z" highlighted]

 39. Problems in reproducing the data center configuration on AWS: problem 1
  • Broadcast cannot be used on AWS
  [Figure: same topology as slide 36, with "VRRP w/ broadcast" highlighted]

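 40. Redis redundancy: unicast and syntax checking
  • IP broadcast/multicast cannot be used in AWS VPC, so VRRP has to run over unicast
  • The keepalived syntax checker used at Hatena does not support unicast, which is a relatively new feature
  • keepalived does not ship with a syntax checker, and reloading it with invalid syntax stops it from working properly, so it is too scary to use without a syntax checker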
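
For reference, unicast VRRP in keepalived is configured with the `unicast_src_ip` and `unicast_peer` keywords inside a `vrrp_instance` block. The sketch below renders such a block from Python; the helper name and all values passed to it are hypothetical, and a real configuration would carry further settings (notify scripts, health checks) that are omitted here.

```python
def render_vrrp_instance(name, interface, router_id, priority, src_ip, peer_ips):
    """Render a keepalived vrrp_instance block that uses unicast VRRP."""
    peers = "\n".join(f"        {ip}" for ip in peer_ips)
    return (
        f"vrrp_instance {name} {{\n"
        f"    state BACKUP\n"
        f"    interface {interface}\n"
        f"    virtual_router_id {router_id}\n"
        f"    priority {priority}\n"
        f"    unicast_src_ip {src_ip}\n"
        f"    unicast_peer {{\n"
        f"{peers}\n"
        f"    }}\n"
        f"}}\n"
    )


# Hypothetical instance name, interface, and addresses.
config = render_vrrp_instance("redis", "eth0", 51, 100, "10.0.0.11", ["10.0.0.12"])
print(config)
```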

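 41. Redis redundancy: unicast and syntax checking
  • The syntax checker Hatena had been using was written in Haskell
  • While I was struggling to set up a build environment so that I could write the Haskell code for unicast support, another Ops engineer, id:y_uuki, wrote a syntax checker in Go, which solved the problem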
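
To give a feel for why a pre-reload check is valuable, here is a toy check that only verifies that braces in a keepalived-style configuration balance. It is far less than a real syntax checker (it knows nothing about keywords or their arguments) and is not how the Haskell or Go tools mentioned above work; it is an illustration written for this transcript.

```python
def check_braces(config_text: str):
    """Return a list of error strings; an empty list means braces balance."""
    errors = []
    open_lines = []  # line numbers of '{' not yet closed
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        # keepalived treats '#' and '!' as comment starters.
        for marker in ("#", "!"):
            if marker in line:
                line = line.split(marker, 1)[0]
        for char in line:
            if char == "{":
                open_lines.append(lineno)
            elif char == "}":
                if open_lines:
                    open_lines.pop()
                else:
                    errors.append(f"line {lineno}: unexpected '}}'")
    errors.extend(f"line {n}: '{{' is never closed" for n in open_lines)
    return errors


print(check_braces("vrrp_instance redis {\n    unicast_peer {\n        10.0.0.12\n}\n"))
```

Running a check like this before every reload turns a silent runtime failure into an error message at deploy time.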

 42. Redis redundancy: unicast and syntax checking

 43. Problems in reproducing the data center configuration on AWS: problem 2
  • How do we move the VIP?
  [Figure: same topology as slide 36, with "VIP: z.z.z.z" highlighted]

 44. Redis redundancy: moving the VIP
  • Inside the data center, a VIP move is announced to the subnet via GARP, but that method cannot be used in AWS VPC
  • We move the VIP by moving an ENI
  • Here too we use a piece of Go software written by id:y_uuki: grabeni
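
Moving an ENI amounts to detaching it from the instance that currently holds it and attaching it to the new one. The sketch below shows that sequence against an object that behaves like a boto3 EC2 client (`describe_network_interfaces`, `detach_network_interface`, `attach_network_interface`). It is not grabeni's implementation; the function name, the polling loop, and its limits are assumptions.

```python
import time


def move_eni(ec2, eni_id, target_instance_id, device_index=1,
             poll_seconds=1.0, max_polls=30):
    """Detach an ENI from its current instance and attach it to the target.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    def describe():
        response = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])
        return response["NetworkInterfaces"][0]

    attachment = describe().get("Attachment")
    if attachment:
        if attachment.get("InstanceId") == target_instance_id:
            return  # already where we want it
        ec2.detach_network_interface(
            AttachmentId=attachment["AttachmentId"], Force=True
        )

    # Wait until the interface reports "available" before attaching it again.
    for _ in range(max_polls):
        if describe()["Status"] == "available":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"{eni_id} did not become available")

    ec2.attach_network_interface(
        NetworkInterfaceId=eni_id,
        InstanceId=target_instance_id,
        DeviceIndex=device_index,
    )
```

With boto3 installed and credentials configured, the client would come from `boto3.client("ec2")`; passing it in as an argument keeps the function easy to exercise with a stub.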

 45. Redis redundancy: moving the VIP

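 46. Redis redundancy: other fixes
  • While we were at it, we fixed various small problems found during testing
  • Write keepalived's current status to a file
  • Move keepalived from a MASTER/BACKUP configuration to BACKUP/BACKUP
  • Include the hostname in the subject of keepalived's status-change notification mail
  • Run config rewrite on Redis failover, so that the settings are not rolled back even if Redis restarts after the failover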

 47. Redis redundancy: summary
  • Through this assortment of software development and testing, we achieved a redundant Redis configuration on AWS

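 48. Today's agenda
  • What is SRE, anyway?
  • How Hatena's web operations engineers and web application engineers cooperate
    • Reducing DB transfer volume
  • Improving service reliability through code and testing
    • Building a redundant Redis configuration on AWS
  • The kind of system we should aim for
    • Servers that start back up automatically no matter how often they go down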

 49. The kind of system we should aim for
  Servers that start back up automatically no matter how often they go down

 50. What is SRE? (recap)
  • "Monitoring should never require a human to interpret any part of the alerting domain. Instead, software should do the interpreting, and humans should be notified only when they need to take action."
  • From "Site Reliability Engineering: How Google Runs Production Systems" (2016/3/23), Betsy Beyer, Chris Jones, Chapter 1

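 51. The kind of system we should aim for
  • People should concentrate on what people can do, and what machines can do should be left to machines
  • About a self-healing system that I found interesting after joining Hatena
  • There are of course various other automated systems, but this is the one I found most interesting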

 52. About Xen
  • In Hatena's data centers, physical servers are virtualized
  • Xen is the virtualization platform, running multiple OSes side by side on one physical server

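 53. About Xen (terminology)
  • DomU: used as what you would call an ordinary server
  • Dom0: manages Xen, including starting DomUs, and handles hardware access
  [Figure: one piece of hardware hosting Dom0 (Xen management), two DomU (web server) guests, and one DomU (DB server) guest]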

 54. The kind of system we should aim for: self-healing
  • A self-healing system built on top of Xen, widely used by our older services

 55. The kind of system we should aim for: self-healing
  • Dom0 monitors, over HTTP, the HTTP applications running on the DomUs (using monit)
  [Figure: the Xen host from slide 53, with Dom0 monitoring the two web server DomUs]

 56. The kind of system we should aim for: self-healing
  • When it detects that a DomU's HTTP server has died…
  [Figure: the Xen host from slide 53; Dom0 monitors one web server DomU and detects that the other has died]

 57. The kind of system we should aim for: self-healing
  • monit stops and starts that DomU, OS and all
  • xm destroy && xm create
  [Figure: the Xen host from slide 53; Dom0 monitors one web server DomU and stops/starts the other, OS and all]

 58. The kind of system we should aim for: self-healing
  • Recovery completes on its own
  [Figure: the Xen host from slide 53, with Dom0 monitoring both web server DomUs again]
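
The loop on slides 55 to 58 (probe over HTTP, detect death, `xm destroy && xm create`, resume monitoring) can be sketched in a few lines. At Hatena this job is done by monit running on Dom0; the Python below is only an illustration of the control flow, and the function names, the threshold of three consecutive failures, and the timeout are assumptions.

```python
import subprocess
import urllib.request


def http_alive(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False


def restart_domu(domain: str, config_path: str, run=subprocess.run):
    """Stop and start the guest as on the slide: xm destroy && xm create."""
    destroyed = run(["xm", "destroy", domain])
    if destroyed.returncode == 0:
        run(["xm", "create", config_path])


def watchdog_step(url, domain, config_path, failures, threshold=3,
                  probe=http_alive, restart=restart_domu):
    """Run one monitoring cycle and return the new consecutive-failure count."""
    if probe(url):
        return 0
    failures += 1
    if failures >= threshold:
        restart(domain, config_path)
        return 0
    return failures
```

A caller would invoke `watchdog_step` on a timer and carry the returned count into the next cycle. Requiring several consecutive failures before restarting avoids destroying a guest over a single slow response.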

 59. The kind of system we should aim for
  • What I said when I was moved to see a server go down under high load one day and recover automatically, over and over

 60. The kind of system we should aim for
  • Build an environment in which people can concentrate on what only people can do

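 61. Summary
  • I introduced some case studies from Hatena
  • We will keep building good services together with Dev engineers
  • We will keep pushing automation forward persistently
  • Toward a world where everything machines can do is left to machines…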

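 62. Finally
  • Hatena is actively hiring Ops engineers!
  • In both Tokyo and Kyoto!
  • I work in Tokyo together with Dev/Ops engineers in Kyoto!
  • If there is anything you would like to ask about today's talk, remote work between Tokyo and Kyoto, or anything else, please find me at the reception!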