Upgrade to Pro — share decks privately, control downloads, hide ads and more …

はてなでの サービス信頼性向上のための 取り組み事例

dekokun
July 25, 2016

はてなでの サービス信頼性向上のための 取り組み事例

SRE Tech Talks ( http://connpass.com/event/34825/ ) でお話した際の資料です

dekokun

July 25, 2016
Tweet

More Decks by dekokun

Other Decks in Technology

Transcript

  1. 4 • SREͬͯͦ΋ͦ΋ͳʹʁ • ͸ͯͳͷWebΦϖϨʔγϣϯΤϯδχΞͱ
 WebΞϓϦέʔγϣϯΤϯδχΞͷڠྗମ੍ • DBసૹྔ࡟ݮ΁ͷऔΓ૊Έ • ίʔυͱݕূʹΑͬͯαʔϏεͷ৴པੑΛ޲্ͤ͞Δ࿩

    • AWSʹ͓͚ΔRedisͷ৑௕ߏ੒ߏங • ໨ࢦ͢΂͖γεςϜͷ࢟ʹ͍ͭͯ • མͪͯ΋མͪͯ΋ࣗಈͰىಈͯ͘͠Δαʔό ຊ೔ͷൃද಺༰
  2. 5 • SREͬͯͦ΋ͦ΋ͳʹʁ • ͸ͯͳͷWebΦϖϨʔγϣϯΤϯδχΞͱ
 WebΞϓϦέʔγϣϯΤϯδχΞͷڠྗମ੍ • DBసૹྔ࡟ݮ΁ͷऔΓ૊Έ • ίʔυͱݕূʹΑͬͯαʔϏεͷ৴པੑΛ޲্ͤ͞Δ࿩

    • AWSʹ͓͚ΔRedisͷ৑௕ߏ੒ߏங • ໨ࢦ͢΂͖γεςϜͷ࢟ʹ͍ͭͯ • མͪͯ΋མͪͯ΋ࣗಈͰىಈͯ͘͠Δαʔό ຊ೔ͷൃද಺༰
  3. SREͱ͸ • Site Reliability EngineeringಡΈ࢝Ί·ͨ͠ • ࣾ಺ྠಡձ࢝·Γ·ͨ͠ • ·ͩchapter 1,2,4͔͠ಡΜͰ͍ͳ͍

    • ࠓճ͸ɺࢲͷൃදʹؔ܎͋Γͦ͏ͳ෦෼͚ͩ
 ܰ͘આ໌͓ͭͭ͠࿩Λ͠·͢ • ҎԼɺ”SREຊ”ͱݺͼ·͢ 8 Site Reliability Engineering: http://shop.oreilly.com/product/0636920041528.do
  4. ༻ޠղઆ • WebΦϖϨʔγϣϯΤϯδχΞ • ͍ΘΏΔΠϯϑϥ୲౰ • ͜ͷࢿྉͰ͸”OpsΤϯδχΞ” • WebΞϓϦέʔγϣϯΤϯδχΞ •

    ͍ΘΏΔ։ൃΤϯδχΞ • ͜ͷࢿྉͰ͸”DevΤϯδχΞ” • ͸ͯͳʹ͓͍ͯ྆ऀ͕ߦ͏ࣄͷڥ໨͸͋Δఔ౓ᐆດ 11
  5. 12 • SREͬͯͦ΋ͦ΋ͳʹʁ • ͸ͯͳͷWebΦϖϨʔγϣϯΤϯδχΞͱ
 WebΞϓϦέʔγϣϯΤϯδχΞͷڠྗମ੍ • DBసૹྔ࡟ݮ΁ͷऔΓ૊Έ • ίʔυͱݕূʹΑͬͯαʔϏεͷ৴པੑΛ޲্ͤ͞Δ࿩

    • AWSʹ͓͚ΔRedisͷ৑௕ߏ੒ߏங • ໨ࢦ͢΂͖γεςϜͷ࢟ʹ͍ͭͯ • མͪͯ΋མͪͯ΋ࣗಈͰىಈͯ͘͠Δαʔό ຊ೔ͷൃද಺༰
  6. SREͱ͸(࠶ܝ) • “In SRE we bring this conflict to the

    fore, and then resolve it with the introduction of an error budget” • “Site Reliability Engineering: How Google Runs Production Systems” 2016/3/23 Betsy Beyerɺ Chris Jones Chapter 1 ΑΓ 14
  7. 31 • SREͬͯͦ΋ͦ΋ͳʹʁ • ͸ͯͳͷWebΦϖϨʔγϣϯΤϯδχΞͱ
 WebΞϓϦέʔγϣϯΤϯδχΞͷڠྗମ੍ • DBసૹྔ࡟ݮ΁ͷऔΓ૊Έ • ίʔυͱݕূʹΑͬͯαʔϏεͷ৴པੑΛ޲্ͤ͞Δ࿩

    • AWSʹ͓͚ΔRedisͷ৑௕ߏ੒ߏங • ໨ࢦ͢΂͖γεςϜͷ࢟ʹ͍ͭͯ • མͪͯ΋མͪͯ΋ࣗಈͰىಈͯ͘͠Δαʔό ຊ೔ͷൃද಺༰
  8. SREͱ͸(࠶ܝ) • “Google caps operational work for SREs at 50%

    of their time. Their remaining time should be spent using their coding skills on project work.” • “Site Reliability Engineering: How Google Runs Production Systems” 2016/3/23 Betsy Beyerɺ Chris Jones Chapter 1 ΑΓ 33
  9. σʔληϯλʔͰͷRedisͷ৑௕ߏ੒ 36 Redis (master) keepalived IP: x.x.x.x Redis (slave) keepalived

    IP:y.y.y.y VRRP /w broadcast replication health check health check VIP: z.z.z.z VIPͰ઀ଓ
  10. σʔληϯλʔͰͷRedisͷ৑௕ߏ੒ • keepalived + VRRPʹΑΔ৑௕ߏ੒ • ΞϓϦέʔγϣϯ͔Β͸VIPͰRedisʹ઀ଓ • Redisʹ໰୊͕͋ͬͨ৔߹͸VIP͕Ҡಈ •

    VIPͷҠಈͱಉ࣌ʹɺslaveͷmasterঢ֨/masterͷslave ߱֨Λ࣮ࢪ 37 Redis (master) keepalived IP: x.x.x.x Redis (slave) keepalived IP:y.y.y.y VRRP /w broadcast replication health check VIP: z.z.z.z VIPͰ઀ଓ health check
  11. Redis (master) keepalived IP: x.x.x.x Redis (slave) keepalived IP:y.y.y.y VRRP

    /w broadcast replication health check health check VIP: z.z.z.z VIPͰ઀ଓ AWSͰσʔληϯλʔͱಉ༷ͷߏ੒ΛͱΔͨΊͷ໰୊ 38 VRRP /w broadcast VIP: z.z.z.z
  12. Redis (master) keepalived IP: x.x.x.x Redis (slave) keepalived IP:y.y.y.y VRRP

    /w broadcast replication health check health check VIP: z.z.z.z VIPͰ઀ଓ AWSͰσʔληϯλʔͱಉ༷ͷߏ੒ΛͱΔͨΊͷ໰୊఺ 1 39 VRRP /w broadcast AWSͰ͸ broadcast͕࢖͑ͳ͍
  13. Redis (master) keepalived IP: x.x.x.x Redis (slave) keepalived IP:y.y.y.y VRRP

    /w broadcast replication health check health check VIP: z.z.z.z VIPͰ઀ଓ AWSͰσʔληϯλʔͱಉ༷ͷߏ੒ΛͱΔͨΊͷ໰୊఺ 2 43 VIP: z.z.z.z VIPͷҠಈΛͲ͏͢Δʁ
  14. 46 • ͍ͭͰʹݕূதʹݟ͔༷ͭͬͨʑͳখ͍͞໰୊ΛຒΊΔ • keepalivedͷݱࡏͷstatusΛϑΝΠϧग़ྗ • keepalivedͷMASTER/BACKUPߏ੒͔Β
 BACKUP/BACKUPߏ੒΁ • keepalivedͷstatus੾Γସ͑௨஌ϝʔϧͷ


    λΠτϧʹϗετ໊͕ग़ྗ͞ΕΔΑ͏ʹ • Redisͷfailover࣌ʹconfig rewriteΛߦ͍ɺfailoverޙʹ Redis͕࠶ىಈͯ͠΋ઃఆ͕ר͖໭Βͳ͍Α͏ʹ Redisͷ৑௕ߏ੒ ͦͷଞ
  15. 48 • SREͬͯͦ΋ͦ΋ͳʹʁ • ͸ͯͳͷWebΦϖϨʔγϣϯΤϯδχΞͱ
 WebΞϓϦέʔγϣϯΤϯδχΞͷڠྗମ੍ • DBసૹྔ࡟ݮ΁ͷऔΓ૊Έ • ίʔυͱݕূʹΑͬͯαʔϏεͷ৴པੑΛ޲্ͤ͞Δ࿩

    • AWSʹ͓͚ΔRedisͷ৑௕ߏ੒ߏங • ໨ࢦ͢΂͖γεςϜͷ࢟ʹ͍ͭͯ • མͪͯ΋མͪͯ΋ࣗಈͰىಈͯ͘͠Δαʔό ຊ೔ͷൃද಺༰
  16. SREͱ͸(࠶ܝ) • “Monitoring should never require a human to interpret

    any part of the alerting domain. Instead, software should do the interpreting, and humans should be notified only when they need to take action. ” • “Site Reliability Engineering: How Google Runs Production Systems” 2016/3/23 Betsy Beyerɺ Chris Jones Chapter 1 ΑΓ 50
  17. ໨ࢦ͢΂͖γεςϜͷ࢟ ࣗ཯෮چ • monit͕ͦͷDomUΛOS͝ͱstop/start • xm destroy && xm create

    57 ؂ࢹ DomUΛOS͝ͱ stop/start ϋʔυ΢ΣΞ Dom0(Xenͷ؅ཧ) DomU(Webαʔό) DomU(Webαʔό) DomU(DBαʔό)