Case Studies in Improving Service Reliability at Hatena

dekokun
July 25, 2016

Slides from my talk at SRE Tech Talks ( http://connpass.com/event/34825/ ).

Transcript

  1. Case Studies in Improving Service Reliability at Hatena
    SRE Tech Talks
    Hatena Co., Ltd.
    id:dekokun

  2. About me

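  3. About me
    • id:dekokun
    • Infrastructure at Hatena, based in Tokyo
    • Web operations engineer
    • In charge of infrastructure for various Hatena services
    • Hatena Blog and several others
    • Joined a little under a year ago
    • Wrote PHP and JS at my previous job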

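  4. Today's agenda
    • What is SRE, anyway?
    • How Hatena's web operations engineers and web application engineers work together
      • Reducing DB transfer volume
    • Improving service reliability through code and verification
      • Building a redundant Redis configuration on AWS
    • The kind of system we should aim for
      • Servers that start back up automatically no matter how many times they go down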

  6. What is SRE, anyway?

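  7. What is SRE?
    • I was invited to speak at an event called SRE Tech Talks, but I did not really know in detail what SRE is
    • My own job title is not "SRE" either
    • My impression: you set an SLA and take on challenges within it
    • My impression: SREs write a lot of code
    • Is it different from DevOps?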

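  8. What is SRE?
    • Started reading Site Reliability Engineering
    • An in-house reading group has started
    • So far I have only read chapters 1, 2, and 4
    • In this talk I will briefly explain only the parts that seem relevant to my presentation as I go
    • Referred to below as "the SRE book"
    Site Reliability Engineering: http://shop.oreilly.com/product/0636920041528.do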

  9. What is SRE?
    • Hatena has no job title called SRE, but we too do many things every day to improve service reliability
    • Today I will introduce some of them

  10. Terminology

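  11. Terminology
    • Web operations engineer
      • What is usually called an infrastructure engineer
      • "Ops engineer" in these slides
    • Web application engineer
      • What is usually called a development engineer
      • "Dev engineer" in these slides
    • At Hatena, the boundary between what the two do is somewhat blurry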

  13. Cooperation between Ops engineers and Dev engineers
    Reducing DB transfer volume

  14. What is SRE? (revisited)
    • “In SRE we bring this conflict to the fore, and then resolve it with the introduction of an error budget”
    • From “Site Reliability Engineering: How Google Runs Production Systems” (2016/3/23), Betsy Beyer, Chris Jones, Chapter 1

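  15. How Ops engineers are involved with services
    • According to the SRE book, Ops teams that are not SRE tend to run into problems because their goals differ from Dev's
    • This does not seem to happen much at Hatena
    • Here is how Ops engineers and Dev engineers at Hatena are involved with services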

  16. How Ops engineers are involved with services
    [Diagram: the Web Operations Engineer team alongside each service (Hatena Blog, Hatena Bookmark, etc…)]
    • Each service has Ops engineers assigned to it

  17. How Ops engineers are involved with services
    • At Hatena, the boundary between what the two do is somewhat blurry
    • Ops engineers send pull requests against service code
    • Dev engineers make changes to chef

  18. Cooperation case study:
    Reducing DB transfer volume

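  19. Case study: reducing DB transfer volume
    • One day we noticed that the transfer volume of the master DB (MySQL) was getting close to its limit
    • On some days, at peak, it felt like "another 10% and we will blow past the limit"
    • Something had to be done
    [Figure (illustrative): transfer volume just under the theoretical limit, annotated "Cutting it close…"]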

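  20. Case study: reducing DB transfer volume
    • CPU usage had plenty of headroom, so MySQL protocol compression looked promising and we tested it
    • Specified 'mysql_compression=1' in the DSN
    • Compression costs CPU, but it makes use of multiple cores, so that was reassuring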
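
The trade-off on slide 20 (fewer bytes on the wire in exchange for CPU time) can be illustrated with a small sketch. It uses Python's `zlib`, the same compression family as MySQL's classic compressed protocol. The row format, the row count, and the helper name `compressed_ratio` are assumptions made up for illustration. They are not taken from the talk, and the ratio printed here says nothing about Hatena's real traffic.

```python
import zlib


def compressed_ratio(payload: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size for one payload."""
    if not payload:
        raise ValueError("payload must not be empty")
    return len(zlib.compress(payload, level)) / len(payload)


# Hypothetical result set: many rows sharing column names and similar values,
# which is the kind of repetition that compresses well.
rows = "".join(
    f'{{"entry_id": {i}, "title": "sample entry", "status": "public"}}\n'
    for i in range(1000)
).encode("utf-8")

ratio = compressed_ratio(rows)
print(f"original={len(rows)} bytes, compressed/original={ratio:.2f}")
```

Repetitive result sets shrink a lot, which is why the transfer graph can drop sharply, but every response now costs a compression pass on the database host.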

  21. Case study: reducing DB transfer volume
    • The test results were very good, so I opened a pull request and released it, with Ops engineers reviewing it and making fixes along the way

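  22. Case study: reducing DB transfer volume
    • Transfer volume dropped sharply (to about 1/4). Great news
    • DB CPU usage rose sharply, but stayed within an acceptable range
    [Graphs: transfer volume drops sharply; CPU usage rises sharply]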

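  23. Case study: reducing DB transfer volume
    • One night, when we had relaxed because transfer volume was down, CPU usage suddenly spiked and the DB nearly clogged
    • A large number of DB threads were in the "Writing to net" state
    • Transfer volume had grown so much that compression ate up the CPU and could not keep up
    • Compression was the cause, but turning it off on every application server would make us hit the transfer limit instead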

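  24. Case study: reducing DB transfer volume (first response)
    • As a first response, we turned compression off on only some of the application servers and escaped the immediate danger
    [Figure: application servers labeled compressed, compressed, uncompressed, uncompressed]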

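  25. Case study: reducing DB transfer volume (first response)
    • The next day the situation was still unpredictable, so we moved to a larger instance type and switched over the master DB, which bought us some relief
    • Thank you, AWS and MHA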

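  26. Case study: reducing DB transfer volume (root-cause fix: investigation)
    • A sudden rise in CPU usage needed investigating
    • Investigated the cause together with Dev engineers using kibana and other tools, found that one endpoint could consume a large amount of DB transfer volume depending on user behavior, and shared this with the Dev engineers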

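  27. Case study: reducing DB transfer volume (root-cause fix)
    • By the next day the Dev engineers had released a fix and DB transfer volume went down. A happy ending
    • After that, the Dev engineers shipped several more releases that reduced DB transfer volume further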

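  28. Ops engineers and Dev engineers
    • The SRE book also says something like "(traditional, non-SRE) Ops resists things such as rapid deploys in order to protect site availability." However,
    • As we just saw, in a typical web service whose behavior depends on what users do, even from the standpoint of "protecting site availability," wouldn't the Ops side die too unless it also wants rapid change?
    • Of course, service growth comes first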

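  29. Ops engineers and Dev engineers: other efforts
    • At a weekly in-house study session attended by engineers, people present what they work on, which helps mutual understanding
    • To reduce incident response outside business hours, we have also agreed on the following with the development teams
      • No releases on Fridays
      • No releases in the evening or later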
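
The release agreement on slide 29 is simple enough to encode as a guard in a deploy script. The following is a minimal sketch, not Hatena's tooling. The 17:00 cutoff for "evening" and the name `release_allowed` are assumptions, since the slides give no exact hour and name no tool.

```python
from datetime import datetime

# Assumption: "evening" starts at 17:00 local time; the slides give no exact hour.
EVENING_START_HOUR = 17
FRIDAY = 4  # datetime.weekday(): Monday is 0, so Friday is 4


def release_allowed(now: datetime) -> bool:
    """Return True if a release at local time `now` respects both rules."""
    if now.weekday() == FRIDAY:
        return False  # rule 1: no releases on Fridays
    if now.hour >= EVENING_START_HOUR:
        return False  # rule 2: no releases in the evening or later
    return True
```

A deploy script could call `release_allowed(datetime.now())` and refuse to proceed, or require an explicit override, when it returns `False`.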

  30. Ops engineers and Dev engineers: summary
    • In this way, Ops engineers and Dev engineers at Hatena work together day to day to improve service reliability

  32. Improving service reliability through code and verification
    Building a redundant Redis configuration on AWS

  33. What is SRE? (revisited)
    • “Google caps operational work for SREs at 50% of their time. Their remaining time should be spent using their coding skills on project work.”
    • From “Site Reliability Engineering: How Google Runs Production Systems” (2016/3/23), Betsy Beyer, Chris Jones, Chapter 1

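  34. Building a redundant Redis configuration on AWS
    • We do not spend anywhere near 50% of our time writing code!
    • But of course we do write code
    • This is a story about solving a problem while writing code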

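  35. Building a redundant Redis configuration on AWS
    • Hatena runs its services in both its own data centers and AWS
    • We already ran Redis redundantly in the data centers, but had not been able to do so on AWS
    • This is how we built a redundant Redis configuration on AWS as well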

  36. Redis redundancy in the data center
    [Diagram]
    • Redis (master) + keepalived, IP: x.x.x.x
    • Redis (slave) + keepalived, IP: y.y.y.y
    • master to slave: replication
    • keepalived to keepalived: VRRP w/ broadcast, with a health check on each node
    • Clients connect via the VIP: z.z.z.z

  37. Redis redundancy in the data center
    • Redundancy via keepalived + VRRP
    • Applications connect to Redis through the VIP
    • If Redis has a problem, the VIP moves
    • At the same time as the VIP moves, the slave is promoted to master and the master is demoted to slave
    [Same diagram as slide 36]

  38. Problems in using the same configuration as the data center on AWS
    [Same diagram as slide 36, highlighting: VRRP w/ broadcast; VIP: z.z.z.z]

  39. Problems in using the same configuration as the data center on AWS: problem 1
    [Same diagram as slide 36, highlighting: VRRP w/ broadcast]
    • Broadcast cannot be used on AWS

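  40. Redis redundancy: unicast and syntax checking
    • AWS VPC does not support IP broadcast/multicast, so VRRP has to run over unicast
    • The keepalived syntax checker used at Hatena did not support unicast, a relatively new feature
    • keepalived does not ship with a syntax checker, and reloading with invalid syntax makes it stop working properly, so it is too scary to use without one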
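
To make the idea of checking before reload concrete, here is a deliberately tiny sketch. It only verifies that braces balance in a keepalived-style config that uses the unicast directives (`unicast_src_ip`, `unicast_peer`). It is not the Haskell or Go checker mentioned in the talk and does not validate keywords. The sample addresses, the instance name, and the function name `check_braces` are made up for illustration.

```python
from typing import List

# Illustrative config only; the addresses and names are not from the talk.
SAMPLE_CONFIG = """\
vrrp_instance REDIS {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    unicast_src_ip 10.0.0.10
    unicast_peer {
        10.0.0.11
    }
}
"""


def check_braces(config_text: str) -> List[str]:
    """Return error messages; an empty list means the braces balance."""
    errors: List[str] = []
    depth = 0
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        # keepalived treats both '#' and '!' as comment starters.
        line = raw.split("#", 1)[0].split("!", 1)[0]
        for ch in line:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:
                    errors.append(f"line {lineno}: unexpected '}}'")
                    depth = 0
    if depth > 0:
        errors.append(f"{depth} block(s) not closed")
    return errors
```

A real checker has to understand every directive, which is why support for a newer feature such as unicast needed actual development work rather than a quick patch.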

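  41. Redis redundancy: unicast and syntax checking
    • The syntax checker Hatena had been using was written in Haskell
    • While I was struggling to set up a build environment so I could write the Haskell code for unicast support, another Dev engineer, id:y_uuki, wrote a syntax checker in Go, which solved the problem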

  42. Redis redundancy: unicast and syntax checking

  43. Problems in using the same configuration as the data center on AWS: problem 2
    [Same diagram as slide 36, highlighting: VIP: z.z.z.z]
    • How do we move the VIP?

  44. Redis redundancy: moving the VIP
    • In the data center, a VIP move is announced to the subnet via GARP, but that does not work in AWS VPC
    • VIP moves are implemented by moving an ENI
    • Here too we use grabeni, a tool written in Go by id:y_uuki
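
Moving an ENI amounts to a detach followed by an attach on the new instance. The sketch below shows that sequence against an EC2 client object passed in by the caller. It is not how grabeni is implemented. The function name `move_eni`, the device index default, and the timeout values are assumptions. The three client methods are the boto3 EC2 client calls of the same names.

```python
import time


def move_eni(ec2, eni_id: str, target_instance_id: str,
             device_index: int = 1, timeout_s: float = 30.0,
             poll_s: float = 1.0) -> None:
    """Detach the ENI from its current instance (if any) and attach it to the target."""

    def describe() -> dict:
        resp = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])
        return resp["NetworkInterfaces"][0]

    attachment = describe().get("Attachment")
    if attachment:
        if attachment.get("InstanceId") == target_instance_id:
            return  # already attached where we want it
        # Force detach, on the assumption that the old holder may be unhealthy.
        ec2.detach_network_interface(
            AttachmentId=attachment["AttachmentId"], Force=True
        )

    # Wait until the ENI is free before attaching it elsewhere.
    deadline = time.monotonic() + timeout_s
    while describe()["Status"] != "available":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{eni_id} did not become available")
        time.sleep(poll_s)

    ec2.attach_network_interface(
        NetworkInterfaceId=eni_id,
        InstanceId=target_instance_id,
        DeviceIndex=device_index,
    )
```

With boto3 installed and credentials configured, `ec2` would be `boto3.client("ec2")`. In a keepalived setup, a call like this would typically be triggered from a state-change notify script, which is an assumption about wiring rather than something the slides spell out.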

  45. Redis redundancy: moving the VIP

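  46. Redis redundancy: other fixes
    • While we were at it, we also fixed various small problems found during verification
      • Write keepalived's current status to a file
      • Switch keepalived from a MASTER/BACKUP configuration to BACKUP/BACKUP
      • Include the host name in the subject of keepalived's status-change notification mail
      • Run config rewrite on Redis failover, so that the settings do not roll back if Redis restarts after the failover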
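
The last item on slide 46 can be sketched as two small helpers: change the replication role, then persist it with `CONFIG REWRITE` so a later restart does not come back in the old role. The helper names are hypothetical. The `client` argument is assumed to behave like a redis-py `Redis` object, whose `slaveof()` with no arguments sends `SLAVEOF NO ONE`.

```python
def promote_and_persist(client) -> None:
    """Promote a slave to master and write the new role to redis.conf."""
    client.slaveof()         # no arguments: SLAVEOF NO ONE
    # CONFIG REWRITE only works if Redis was started with a config file.
    client.config_rewrite()


def demote_and_persist(client, master_host: str, master_port: int) -> None:
    """Make this node a slave of the given master and persist that setting."""
    client.slaveof(master_host, master_port)
    client.config_rewrite()
```

Without the rewrite step, the role change lives only in memory, so a Redis restart after failover would reload the role from the old config file.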

  47. Redis redundancy: summary
    • Through this software development and verification work, we achieved a redundant Redis configuration on AWS

  49. The kind of system we should aim for
    Servers that start back up automatically no matter how many times they go down

  50. What is SRE? (revisited)
    • “Monitoring should never require a human to interpret any part of the alerting domain. Instead, software should do the interpreting, and humans should be notified only when they need to take action.”
    • From “Site Reliability Engineering: How Google Runs Production Systems” (2016/3/23), Betsy Beyer, Chris Jones, Chapter 1

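  51. The kind of system we should aim for
    • Humans should focus on what humans can do, and leave to machines what machines can do
    • About an autonomous recovery system I found interesting after joining Hatena
    • There are of course many other automated systems, but this is the one I found most interesting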

  52. About Xen
    • Hatena's data centers virtualize their physical servers
    • Xen is the virtualization platform, running multiple OSes in parallel on a single physical server

  53. About Xen (terminology)
    • DomU: used as what you would call an ordinary server
    • Dom0: manages Xen, including starting DomUs, and handles hardware access
    [Diagram: hardware running Dom0 (Xen management), DomU (web server) ×2, DomU (DB server)]

  54. The kind of system we should aim for: autonomous recovery
    • An autonomous recovery system built on Xen, widely used by our older services

  55. The kind of system we should aim for: autonomous recovery
    • Dom0 monitors, over HTTP, the HTTP applications running on the DomUs (using monit)
    [Diagram: Dom0 monitoring both web-server DomUs]

  56. The kind of system we should aim for: autonomous recovery
    • When it detects that a DomU's HTTP server has died…
    [Diagram: Dom0 monitoring one DomU and detecting that the other has died]

  57. The kind of system we should aim for: autonomous recovery
    • monit stops and starts that DomU, OS and all
      • xm destroy && xm create
    [Diagram: Dom0 stopping and starting the failed DomU, OS and all]
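
The loop described on slides 55 to 57 (check over HTTP, and on failure destroy and recreate the DomU) can be sketched in a few lines. In the talk this job is done by monit on Dom0. The Python below is only a stand-in to show the control flow, and every name, URL, and path passed to it is hypothetical. `xm destroy` and `xm create` are the commands quoted on the slide.

```python
import http.client
import subprocess
import urllib.request


def http_alive(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and connection resets all land here.
        return False


def restart_domu(domain: str, config_path: str, run=subprocess.run) -> None:
    """Mirror `xm destroy && xm create`: create runs only if destroy succeeded."""
    run(["xm", "destroy", domain], check=True)
    run(["xm", "create", config_path], check=True)


def heal_once(url: str, domain: str, config_path: str,
              is_alive=http_alive, restart=restart_domu) -> bool:
    """Check one DomU once; return True if a restart was triggered."""
    if is_alive(url):
        return False
    restart(domain, config_path)
    return True
```

Calling `heal_once` on a timer for each DomU gives the behavior on the slides: nobody is paged, and the server simply comes back.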

  58. The kind of system we should aim for: autonomous recovery
    • Recovery completes on its own
    [Diagram: Dom0 monitoring both web-server DomUs again]

  59. The kind of system we should aim for
    • A remark I made when I was moved to see servers going down under high load one day and recovering automatically each time

  60. The kind of system we should aim for
    • Keep building an environment where humans can focus on what only humans can do

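  61. Summary
    • I introduced some case studies from Hatena
    • We will keep building good services together with Dev engineers
    • We will keep pushing automation forward persistently
    • Toward a world where everything machines can do is left to machines…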

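  62. Finally
    • Hatena is actively hiring Ops engineers!
    • In both Tokyo and Kyoto!
    • I work in Tokyo together with Dev/Ops engineers in Kyoto!
    • If you have questions about today's talk, remote work between Tokyo and Kyoto, or anything else, please find me at the after-party!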