Slide 1

Slide 1 text

Copyright © 2009-2018 eureka, inc. All rights reserved. Takuya Onda / eureka, Inc. 2018-09-07 Builderscon Tokyo What Is Web Service Quality? Three Answers Learned from Alert Hell, Monitoring Failures, and Service Level Objective Design

Slide 2

Slide 2 text

Introduction ■ Takuya Onda – eureka, Inc. – SRE team Head

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

About Us - IAC/Match Group

Slide 5

Slide 5 text

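Agenda ■ Technology trends around web application monitoring ■ The purpose of monitoring and the challenges in the field ■ Failures at eureka and how we recovered ■ Step-by-step renewal of monitoring, with real examples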

Slide 6

Slide 6 text

Technology Trends Around Web Application Monitoring

Slide 7

Slide 7 text

The rise of the public cloud ■ Powerful machine resources can be procured faster and more easily ■ Architectures that assume servers are disposable

Slide 8

Slide 8 text

Richer monitoring tools ■ SaaS server monitoring services ■ A growing range of integrations

Slide 9

Slide 9 text

DevOps / SRE ■ Demand for both high development productivity and stable operations ■ Culture, Automation, Lean, Measurement, Sharing

Slide 10

Slide 10 text

Growing system complexity ■ Microservices, SPAs, and an increasing variety of devices ■ A need to monitor non-functional requirements

Slide 11

Slide 11 text

Has this ever happened to you?

Slide 12

Slide 12 text

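Common alert experiences ■ "Status is increasing" – It shows up at this time every day, doesn't it... – Manager: "Is this alert OK?" ■ "DB connection count has exceeded xxx" – Uh, what am I supposed to do? It's peak time right now! – A spike? Go into maintenance? Just watch it for now...? ■ "An XXX error has occurred" – Old-timer: "You can ignore this one, it's OK" – Recent hire: "...!?!?"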

Slide 13

Slide 13 text

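Common alert experiences ■ We cannot tell whether something is really abnormal ■ Alerts do not lead to action ■ We cannot tell whether we are heading in the right direction (what is quality?)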

Slide 14

Slide 14 text

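Why do we monitor systems? ■ We want to meet the performance that satisfies users ■ To do that, we want to detect system anomalies immediately ■ We want to respond to anomalies right away and minimize how long they last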

Slide 15

Slide 15 text

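Why do we monitor systems? ■ We want to meet the performance that satisfies users ■ To do that, we want to detect system anomalies immediately ■ We want to respond to anomalies right away and minimize how long they last The value lies in delivering the quality users want. Monitoring is a means to that end.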

Slide 16

Slide 16 text

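Treat accidents as normal (from the SRE Workbook) ■ Moving early matters for keeping the cost of failure down ■ The shorter the MTTR (mean time to recovery), the lighter the burden on developers ■ The later a problem is found, the harder it is to fix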
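
As a rough illustration of the MTTR figure on this slide, the sketch below averages the time from the start of an incident to recovery. The `Incident` type and the timestamps are made up for the example and do not come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record: when the failure began and when service recovered."""
    started_at: datetime
    recovered_at: datetime


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Average of (recovered_at - started_at) across all incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    total = sum((i.recovered_at - i.started_at for i in incidents), timedelta())
    return total / len(incidents)


# Illustrative data only.
INCIDENTS = [
    Incident(datetime(2018, 9, 1, 10, 0), datetime(2018, 9, 1, 10, 30)),
    Incident(datetime(2018, 9, 3, 22, 15), datetime(2018, 9, 3, 22, 25)),
    Incident(datetime(2018, 9, 5, 2, 0), datetime(2018, 9, 5, 2, 20)),
]

if __name__ == "__main__":
    print(mean_time_to_recovery(INCIDENTS))
```

Tracking this number over time is one way to check whether earlier detection and faster response are actually paying off.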

Slide 17

Slide 17 text

Monitoring failures at eureka: examples

Slide 18

Slide 18 text

Monitoring failures at eureka ■ Alerts do not lead to action ■ We cannot judge whether something needs handling right now

Slide 19

Slide 19 text

Alerts do not lead to action ■ 1: It is working for now ■ 2: Always abnormal ■ 3: Nothing can be done

Slide 20

Slide 20 text

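Alerts do not lead to action ■ 1: It is working for now ■ 2: Always abnormal ■ 3: Nothing can be done • A DB whose load average gets tight at peak time • Dynamo capacity that is nearly used up • A mail delivery server whose CPU is pegged every hour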

Slide 21

Slide 21 text

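Alerts do not lead to action ■ 1: It is working for now ■ 2: Always abnormal ■ 3: Nothing can be done • A job queue that clogs up every day • Application error logs that are always streaming • Errors that occur on every deploy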

Slide 22

Slide 22 text

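Alerts do not lead to action ■ 1: It is working for now ■ 2: Always abnormal ■ 3: Nothing can be done • 5xx errors are rising, but what is this? (wait and see) • Saturation in storage systems • SPOF servers whose configuration cannot be reproduced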

Slide 23

Slide 23 text

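We cannot judge whether to act now ■ Does this actually affect users? ■ How big is the impact? ■ Can we even quantify it? ■ Is it a problem if we do not fix this permanently? ■ Does it take priority over business initiatives?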

Slide 24

Slide 24 text

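It affects the mood inside the company too "They tell us to watch the alerts, but what's the point?" "Is our system really OK??" "Nobody cares anyway, so whatever" "Recurrence prevention never changes anything"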

Slide 25

Slide 25 text

How we tackled it

Slide 26

Slide 26 text

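Policy ■ 1: Set quantitative targets ■ 2: Alert = immediate action ■ 3: Manage the whole thing simply ■ 4: Build an architecture that is easy to monitor ■ 5: Distinguish anomaly detection from performance targets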

Slide 27

Slide 27 text

1: Set quantitative targets ■ Define service level indicators (SLIs) ■ Set service level objectives (SLOs) ■ Use the SLO as the basis for alert thresholds and task priority

Slide 28

Slide 28 text

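1: Set quantitative targets ■ Define service level indicators (SLIs) ■ Set service level objectives (SLOs) ■ Use the SLO as the basis for alert thresholds and task priority • SLI = successful requests / total requests • SLO = SLI > 99.95% (window: 1 week)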
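
The SLI and SLO definitions on this slide can be sketched in a few lines. This is a minimal sketch assuming the 99.95 figure is a percentage and that request counts for the one-week window are already available; the function names and the sample counts are hypothetical.

```python
SLO_TARGET_PERCENT = 99.95  # target from the slide, evaluated over a one-week window


def sli_percent(successful: int, total: int) -> float:
    """SLI as a percentage: successful requests / total requests."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successful <= total:
        raise ValueError("successful must be between 0 and total")
    return 100.0 * successful / total


def meets_slo(successful: int, total: int, target: float = SLO_TARGET_PERCENT) -> bool:
    """True when the SLI for the window is strictly above the target."""
    return sli_percent(successful, total) > target


if __name__ == "__main__":
    # Hypothetical weekly counts, not figures from the talk.
    print(sli_percent(999_600, 1_000_000), meets_slo(999_600, 1_000_000))
```

Because the check is a single comparison, the same function can drive both an alert threshold and a weekly review, which is the role the slide gives the SLO.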

Slide 29

Slide 29 text

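2: Alert = immediate action ■ Only fire alerts for things that truly need action ■ Move toward an architecture where an immediate response is possible • Things that cause an SLO miss • Things that take time to handle (storage systems, etc.)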

Slide 30

Slide 30 text

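2: Alert = an event that needs immediate action ■ Only fire alerts for things that truly need action ■ Move toward an architecture where an immediate response is possible • Server resources can be added or replaced immediately • LB / API / Batch / DB / Cache / etc.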

Slide 31

Slide 31 text

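3: Manage the whole thing simply ■ Unify tools ■ Centralize quality definitions and thresholds • Narrow down the tools used for monitoring • Manage monitoring configuration as code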
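
One way to picture "monitoring configuration as code" is a list of monitor definitions kept in version control and validated before they are applied. The sketch below is tool-agnostic and is not the API of Datadog or any other product; the `Monitor` fields, metric names, and thresholds are hypothetical, and the four `kind` values are borrowed from the Threshold / Rate / Change / Anomaly list on the next slide.

```python
import json
from dataclasses import asdict, dataclass

KINDS = {"threshold", "rate", "change", "anomaly"}


@dataclass(frozen=True)
class Monitor:
    """Hypothetical, tool-agnostic monitor definition."""
    name: str
    metric: str
    kind: str
    threshold: float
    window_minutes: int


# Illustrative definitions only.
MONITORS = [
    Monitor("api-5xx-ratio", "api.requests.5xx_ratio", "threshold", 0.05, 1),
    Monitor("api-latency-change", "api.latency.p95", "change", 50.0, 60),
]


def validate(monitors: list[Monitor]) -> list[str]:
    """Return human-readable problems; an empty list means the set is consistent."""
    errors = []
    seen = set()
    for m in monitors:
        if m.name in seen:
            errors.append(f"duplicate name: {m.name}")
        seen.add(m.name)
        if m.kind not in KINDS:
            errors.append(f"unknown kind for {m.name}: {m.kind}")
        if m.window_minutes <= 0:
            errors.append(f"non-positive window for {m.name}")
    return errors


def render(monitors: list[Monitor]) -> str:
    """Serialize the definitions so they can be reviewed, diffed, and applied."""
    return json.dumps([asdict(m) for m in monitors], indent=2, sort_keys=True)


if __name__ == "__main__":
    print(validate(MONITORS))
    print(render(MONITORS))
```

Keeping definitions in one reviewed file is what makes thresholds and quality definitions centralized rather than scattered across tool UIs.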

Slide 32

Slide 32 text

3: Manage the whole thing simply ■ Unify tools ■ Centralize quality definitions and thresholds • Unify SLIs / SLOs • Threshold / Rate / Change / Anomaly

Slide 33

Slide 33 text

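4: Build an architecture that is easy to monitor ■ Servers are easy to replace ■ Stop insisting on building everything in-house ■ Do not let operational effort (monitoring cost) grow in proportion to the business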

Slide 34

Slide 34 text

4-1: Easy to monitor ~ servers are easy to replace ■ Discard a faulty server right away – Images built from code are ready to deploy at any time Scheduling Rotate API Worker

Slide 35

Slide 35 text

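4-2: Easy to monitor ~ stop building everything in-house ■ Stand on the shoulders of a giant (AWS) and reduce what has to be monitored – Scaling out / up is easy & no failover management needed S3 Aurora Dynamo ElastiCache SQS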

Slide 36

Slide 36 text

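4-3: Easy to monitor ~ do not let monitoring cost grow with the business ■ Same provisioning process and technology stack – Do not introduce variation in the technical setup Pairs JP Pairs GL Capacity = LL Capacity = M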

Slide 37

Slide 37 text

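5: Distinguish anomaly detection from performance targets ■ Use separate durations for anomaly detection and for observing target attainment ■ Distinguish "cannot use the service" from "the service is hard to use" • Anomaly: detection through alerts (within 1min) • Target: periodic review (within 1week)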
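
The two durations on this slide can be sketched by evaluating the same per-minute request counts in two ways: a one-minute check that drives alerts and a one-week check that drives the periodic review. This is a minimal sketch; the `MinuteBucket` type, the 0.5 failure-ratio alert threshold, and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinuteBucket:
    """Hypothetical per-minute request counts."""
    total: int
    failed: int


def failure_ratio(buckets: list[MinuteBucket]) -> float:
    """Failed requests divided by total requests across the given buckets."""
    total = sum(b.total for b in buckets)
    failed = sum(b.failed for b in buckets)
    return failed / total if total else 0.0


def should_alert(latest: MinuteBucket, max_failure_ratio: float = 0.5) -> bool:
    """Anomaly check over a 1-minute window: is the service effectively unusable now?"""
    return failure_ratio([latest]) >= max_failure_ratio


def weekly_target_met(week: list[MinuteBucket], slo_percent: float = 99.95) -> bool:
    """Target check over a 1-week window: did the success percentage stay above the SLO?"""
    return 100.0 * (1.0 - failure_ratio(week)) > slo_percent


if __name__ == "__main__":
    outage = MinuteBucket(total=1000, failed=600)
    week = [MinuteBucket(1000, 0)] * 10079 + [outage]
    print(should_alert(outage), weekly_target_met(week))
```

A short outage trips the one-minute alert while the weekly target can still be met, and a small constant error rate does the opposite, which is why the two checks are kept apart.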

Slide 38

Slide 38 text

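5: Distinguish anomaly detection from performance targets ■ Use separate durations for anomaly detection and for observing target attainment ■ Distinguish "cannot use the service" from "the service is hard to use" • Cannot use: unable to log in / search, etc. • Hard to use: the service feels slow / heavy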

Slide 39

Slide 39 text

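Policy (recap) ■ 1: Set quantitative targets ■ 2: Alert = immediate action ■ 3: Manage the whole thing simply ■ 4: Build an architecture that is easy to monitor ■ 5: Distinguish anomaly detection from performance targets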

Slide 40

Slide 40 text

We renewed our monitoring and architecture

Slide 41

Slide 41 text

Renewing monitoring

Slide 42

Slide 42 text

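Evolving monitoring and response in stages ■ Anomalies can be seen ■ Anomalies can be detected ■ Anomalies can be handled ■ Anomalies repair themselves automatically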

Slide 43

Slide 43 text

Anomalies can be seen ■ Datadog ■ StackDriver Logging

Slide 44

Slide 44 text

Metrics Aggregate AWS Integration Host-based monitoring & integration with Datadog

Slide 45

Slide 45 text

Log visualization and report generation with StackDriver Logging Metrics / Log Monitoring Alert Slack call you Log Aggregation Sync to DWH Generate Performance Report Hosting Report Dev & SRE

Slide 46

Slide 46 text

Log visualization and report generation with StackDriver Logging Metrics / Log Monitoring Alert Slack call you Log Aggregation Sync to DWH Generate Performance Report Hosting Report Dev & SRE • Alerting & immediate response • Window: 1 minute

Slide 47

Slide 47 text

Log visualization and report generation with StackDriver Logging Metrics / Log Monitoring Alert Slack call you Log Aggregation Sync to DWH Generate Performance Report Hosting Report Dev & SRE • Performance review • Window: 1 week

Slide 48

Slide 48 text

Log visualization and report generation with StackDriver Logging Metrics / Log Monitoring Alert Slack call you Log Aggregation Sync to DWH Generate Performance Report Hosting Report Dev & SRE • Performance review • Window: 1 week Report excerpt (partial): latency histogram

Slide 49

Slide 49 text

Log visualization and report generation with StackDriver Logging Metrics / Log Monitoring Alert Slack call you Log Aggregation Sync to DWH Generate Performance Report Hosting Report Dev & SRE • Performance review • Window: 1 week Report excerpt (partial): request volume trend by REST endpoint

Slide 50

Slide 50 text

Anomalies can be detected ■ Alerts from external monitoring (service monitoring) ■ Alerts from resource monitoring ■ Alerts from performance ■ Alerts from log monitoring

Slide 51

Slide 51 text

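Anomalies can be detected ■ Alerts from external monitoring (service monitoring) ■ Alerts from resource monitoring ■ Alerts from performance ■ Alerts from log monitoring • Needs an immediate response • Monitor SSL along with it (it takes time to fix)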

Slide 52

Slide 52 text

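Anomalies can be detected ■ Alerts from external monitoring (service monitoring) ■ Alerts from resource monitoring ■ Alerts from performance ■ Alerts from log monitoring • Do not watch the stateless layer (discard it instead) • Do watch storage systems (they take time to handle)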

Slide 53

Slide 53 text

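Anomalies can be detected ■ Alerts from external monitoring (service monitoring) ■ Alerts from resource monitoring ■ Alerts from performance ■ Alerts from log monitoring • Latency: rate of change versus the previous week or the previous hour • Request failures: look at SLO x Status Code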
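
The latency rule on this slide compares the current value with a baseline from the previous hour or the previous week. The sketch below flags a regression when the increase against either baseline exceeds a limit; the 50% limit and the sample latencies are assumptions for illustration only.

```python
def change_rate_percent(current: float, baseline: float) -> float:
    """Relative change of current against baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (current - baseline) / baseline


def latency_regressed(
    current_ms: float,
    previous_hour_ms: float,
    previous_week_ms: float,
    max_increase_percent: float = 50.0,
) -> bool:
    """True when latency rose by more than the limit against either baseline."""
    return any(
        change_rate_percent(current_ms, baseline) > max_increase_percent
        for baseline in (previous_hour_ms, previous_week_ms)
    )


if __name__ == "__main__":
    # Hypothetical p95 latencies in milliseconds.
    print(change_rate_percent(150.0, 100.0))
    print(latency_regressed(180.0, 100.0, 170.0))
```

Comparing against the same hour of the previous week absorbs daily traffic patterns, so a fixed latency threshold that fires at every peak is not needed.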

Slide 54

Slide 54 text

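Anomalies can be detected ■ Alerts from external monitoring (service monitoring) ■ Alerts from resource monitoring ■ Alerts from performance ■ Alerts from log monitoring • Rate of change versus the previous week or the previous hour • For logs that flow constantly, anomaly detection is a good fit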
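
For logs that flow constantly, a simple stand-in for anomaly detection is a z-score test on the log count per interval. This is a generic statistical sketch, not the algorithm used by Datadog or StackDriver; the history values and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, pstdev


def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """True when latest deviates from the historical mean by more than z_threshold sigmas."""
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any different value counts as anomalous.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


if __name__ == "__main__":
    # Hypothetical error-log counts per minute.
    counts = [100, 102, 98, 101, 99, 100, 103, 97]
    print(is_anomalous(counts, 101), is_anomalous(counts, 130))
```

The steady background of about 100 lines per minute never alerts, while a jump to 130 does, which is the behavior a fixed "any error log" alert cannot provide.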

Slide 55

Slide 55 text

Anomalies can be handled ■ Make it easy to add, replace, and roll back resources ■ Lower the cost of investigation when an incident occurs

Slide 56

Slide 56 text

Make it easy to add, replace, and roll back resources Scale Out / Discard Scale Up Add Shard Scale Out Scale Up Vertical Split

Slide 57

Slide 57 text

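Make it easy to add, replace, and roll back resources Scale Out / Discard Scale Up Add Shard Scale Out Scale Up Vertical Split • Resources can be added easily and with no lead time • Make it easy to isolate faulty components • Storage systems are the crux. Rehearse in advance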

Slide 58

Slide 58 text

Anomalies repair themselves automatically ■ Auto-healing ■ Injecting chaos

Slide 59

Slide 59 text

Anomalies repair themselves automatically ■ Auto-healing ■ Injecting chaos • Isolating faulty hosts and failing over • Storage systems are the crux • Regular evacuation drills

Slide 60

Slide 60 text

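Anomalies repair themselves automatically ■ Auto-healing ■ Injecting chaos • Deliberately injecting failures • Forced testing of auto-healing, made part of the routine • The fight is only just beginning...!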
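
The pairing of chaos injection and auto-healing can be pictured with a toy simulation: one step deliberately breaks a host, and the healing step discards unhealthy hosts and replaces them with fresh ones. Everything here is an in-memory stand-in with hypothetical names; a real setup would rely on the platform's own health checks and instance replacement.

```python
import random
from dataclasses import dataclass
from itertools import count


@dataclass
class Host:
    name: str
    healthy: bool = True


class Pool:
    """Toy host pool that keeps a fixed number of hosts."""

    def __init__(self, size: int) -> None:
        self._ids = count(1)
        self.hosts = [self._new_host() for _ in range(size)]

    def _new_host(self) -> Host:
        return Host(f"host-{next(self._ids)}")

    def inject_failure(self, rng: random.Random) -> Host:
        """Chaos step: deliberately break one randomly chosen healthy host."""
        victim = rng.choice([h for h in self.hosts if h.healthy])
        victim.healthy = False
        return victim

    def heal(self) -> list[str]:
        """Auto-healing step: discard unhealthy hosts and add one fresh host per removal."""
        removed = [h.name for h in self.hosts if not h.healthy]
        survivors = [h for h in self.hosts if h.healthy]
        self.hosts = survivors + [self._new_host() for _ in removed]
        return removed


if __name__ == "__main__":
    pool = Pool(3)
    victim = pool.inject_failure(random.Random(0))
    print("broke", victim.name, "replaced", pool.heal())
```

Running the injection step on a schedule is what turns auto-healing from an untested assumption into something exercised routinely.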

Slide 61

Slide 61 text

Summary

Slide 62

Slide 62 text

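Today's talk ■ Technology trends around web application monitoring ■ The purpose of monitoring and the challenges in the field ■ Failures at eureka and how we recovered ■ Step-by-step evolution of monitoring, with real examples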

Slide 63

Slide 63 text

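Summary ■ The purpose of monitoring is to minimize MTTR and meet quality requirements ■ An alert is meaningless if you cannot act on it ■ Set target values for service quality ■ Aim for an architecture that needs little monitoring ■ Alerts + periodic quality checks, done by Dev and Ops together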

Slide 64

Slide 64 text

Summary ■ Systems grow and become more complex every year ■ There is so much we want to do! ■ eureka is hiring members for its SRE team!

Slide 65

Slide 65 text

CONFIDENTIAL Thank you :)

Slide 66

Slide 66 text

Any Questions??