Improving LOWYA's Reliability with New Relic

by Kazuma Kohara (Vega Corporation)

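This deck was presented at NRUG (New Relic User Group) SRE Chapter Vol.1, under the theme "Our SRE and New Relic".

From the many reliability initiatives that Vega Corporation's SRE team has carried out, it covers the following, together with how we use New Relic in each:

- Observability for Security
- Load mitigation with GraphQL Caching, built through the AWS Prototyping Program
- SLO Review from a CUJ perspective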

kazumax55

May 13, 2022
Transcript

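  1. Incident response: why does it matter? When a problem is discovered, we need to investigate quickly and pin down the root cause.
    How do we get there? We need a SIEM (Security Information and Event Management): "Amazon Detective" and "SIEM on Amazon OpenSearch".
    Operational flow: use them to investigate whenever something goes wrong (security incidents, operational mistakes, etc.).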
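  2. Security hardening: summary. Bring observability to security, too.
    Compliance with security standards: AWS Security Hub. Threat detection: Amazon GuardDuty.
    Enable a SIEM so you are prepared when something happens: Amazon Detective and SIEM on Amazon OpenSearch.
    Other recommendations: DDoS protection: AWS WAF, AWS Shield Advanced & Automatic application layer DDoS mitigation. Cost anomaly detection: AWS Cost Anomaly Detection. Vulnerability scanning: Amazon Inspector.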
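  3. Problems that occurred on LOWYA Day, and how we handled them
    Problem 1: Aurora connection exhaustion, caused by a large scale-out of the API. → Corrected the max_connections value by removing the LEAST function from LEAST({DBInstanceClassMemory/9531392},5000).
    Problem 2: Email delivery delays. Orders came in faster than the order-confirmation emails could be delivered, so delivery fell behind. → Resolved by scaling out the email delivery workers.
    Problem 3: Duplicate orders. Because the order-confirmation emails were delayed, some customers placed the same order more than once. → Resolved along with Problem 2, since the email delay was the cause.
    Problem 4: Card payment threshold. A flood of orders triggered errors saying the payments-per-second threshold had been exceeded. → Asked the payment provider to raise the per-second payment limit.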
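The Problem 1 fix depends on how the Aurora parameter formula evaluates. As far as we can tell, `LEAST({DBInstanceClassMemory/9531392},5000)` matches the default `max_connections` formula AWS documents for Aurora PostgreSQL, and the `LEAST` caps the result at 5000 however large the instance is. A minimal Python sketch of the arithmetic, assuming integer division and a hypothetical `DBInstanceClassMemory` of 64 GiB (the value RDS actually substitutes is somewhat below the instance's nominal RAM):

```python
# Sketch of how the Aurora parameter formula from Problem 1 evaluates.
# Assumptions: integer (floor) division; the memory value below is hypothetical.

BYTES_PER_CONNECTION = 9531392  # divisor taken from the slide's formula
CAP = 5000                      # upper bound imposed by LEAST(...)


def max_connections_with_least(db_instance_class_memory: int) -> int:
    """LEAST({DBInstanceClassMemory/9531392},5000): never exceeds the cap."""
    return min(db_instance_class_memory // BYTES_PER_CONNECTION, CAP)


def max_connections_without_least(db_instance_class_memory: int) -> int:
    """{DBInstanceClassMemory/9531392}: the formula with LEAST removed."""
    return db_instance_class_memory // BYTES_PER_CONNECTION


if __name__ == "__main__":
    memory = 64 * 1024**3  # hypothetical DBInstanceClassMemory: 64 GiB in bytes
    print(max_connections_with_least(memory))     # → 5000
    print(max_connections_without_least(memory))  # → 7209
```

Removing the cap lets the limit grow with instance memory. Each PostgreSQL connection still consumes memory of its own, so the uncapped value is only as safe as the headroom left on the instance.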
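  4. GraphQL Caching: Good
    Cost reduction: calls that used to happen on every page view drop to once per cache period. Fewer requests → scale-in of API tasks → lower cost.
    Latency reduction: responses are returned from an edge that is close on the network, without the origin having to process them (hundreds of msec -> tens of msec).
    Easier cache management: TTL and invalidation can be managed centrally on the CDN side.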
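The "once per cache period" effect can be illustrated with a toy TTL cache. This Python sketch is not how any CDN is implemented; `TTLCache`, the 60-second TTL and the one-view-per-second traffic are hypothetical, and it only counts how often the origin would be hit.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy edge cache: serves a stored response until its TTL expires."""

    def __init__(self, ttl_seconds: float, fetch_origin: Callable[[str], str]):
        self.ttl = ttl_seconds
        self.fetch_origin = fetch_origin
        self.entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, body)
        self.origin_calls = 0

    def get(self, key: str, now: float) -> str:
        entry = self.entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit: the origin is not touched
        self.origin_calls += 1  # miss or expired entry: go to the origin
        body = self.fetch_origin(key)
        self.entries[key] = (now + self.ttl, body)
        return body


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60, fetch_origin=lambda key: f"response for {key}")
    # Hypothetical traffic: one page view per second for 10 minutes, same query.
    for second in range(600):
        cache.get("topPageQuery", now=float(second))
    print(cache.origin_calls)  # → 10 origin calls instead of 600
```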
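  5. GraphQL Caching: Bad
    GraphQL sends every request as a POST to a single path, which makes it hard to cache. Because there is only one path, the cache cannot be partially purged.
    Care is needed to prevent data leakage.
    Little information is available: Akamai, Cloudflare, GraphCDN, Fastly. One write-up describes caching GraphQL requests with Cloudflare Workers to return them in 30 ms; another case was built with Cloudflare Workers (similar to Lambda@Edge) + a KVS.
    Verdict: Good > Bad, with no fatal drawbacks. We have started a serious evaluation toward implementation.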
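Because every GraphQL request is a POST to the same path, a cache in front of it has to build its key from the request body rather than the URL. The Python sketch below shows one common approach, hashing the query together with its variables. It is not the Cloudflare Workers API; the helper names, the `Authorization` check and the naive mutation test are assumptions. Refusing to cache requests that carry credentials is one simple guard against the leakage risk on the slide.

```python
import hashlib
import json
from typing import Any, Dict, Mapping, Optional


def graphql_cache_key(body: Dict[str, Any]) -> str:
    """Derive a cache key from the POST body instead of the (single) URL path."""
    canonical = json.dumps(
        {
            "query": body.get("query", ""),
            "variables": body.get("variables") or {},
            "operationName": body.get("operationName"),
        },
        sort_keys=True,  # the client's JSON key order must not change the key
        separators=(",", ":"),
    )
    return "gql:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cacheable_key(headers: Mapping[str, str], body: Dict[str, Any]) -> Optional[str]:
    """Return a cache key, or None when the request must go to the origin."""
    # Leak guard (assumption): never serve shared cache entries to authenticated users.
    if any(name.lower() == "authorization" for name in headers):
        return None
    # Naive check (assumption): cache queries only; mutations always reach the origin.
    if body.get("query", "").lstrip().startswith("mutation"):
        return None
    return graphql_cache_key(body)
```

A hashed key does nothing for the partial-purge problem: purging still needs some grouping that you track separately from the key itself.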
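  6. How do you set SLOs, and how do you review them??
    First set SLOs, then refine them gradually. For example, pick up the system's golden signals in New Relic: average API response time within X s, page load time within Y s, error rate within Z%, availability … etc. But then it becomes unclear how to actually review the SLOs.
    Averaging over everything buries the important metrics and errors. Does looking at an average of everything even make sense in the first place?
    So what should we do? Rethink SLOs from the ground up from the perspective of CUJs (Critical User Journeys)!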
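A small worked example of how an all-endpoint average hides a failing journey. The endpoint names, the CUJ flag and every number below are hypothetical, chosen only to make the effect visible.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EndpointStats:
    name: str
    requests: int
    errors: int
    in_cuj: bool  # does this endpoint belong to a Critical User Journey?


def error_rate(stats: List[EndpointStats]) -> float:
    """Request-weighted error rate over the given endpoints."""
    total = sum(s.requests for s in stats)
    return sum(s.errors for s in stats) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical traffic: a busy, healthy endpoint and a small, failing order endpoint.
    stats = [
        EndpointStats("GET /recommendations", requests=98_000, errors=98, in_cuj=False),
        EndpointStats("POST /orders", requests=2_000, errors=100, in_cuj=True),
    ]
    overall = error_rate(stats)
    cuj_only = error_rate([s for s in stats if s.in_cuj])
    print(f"overall: {overall:.3%}, order CUJ: {cuj_only:.3%}")
    # → overall: 0.198%, order CUJ: 5.000%
```

The overall figure looks comfortably inside a typical target while one in twenty orders fails, which is exactly the problem the slide describes.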
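  7. Measuring CUJ-based SLOs with New Relic
    Backend: in APM, identify the transactions executed in the target CUJ and look at Apdex, latency, error rate, etc. REST API: Key Transactions / GraphQL: filter with NRQL.
    Frontend: in Browser, look at Core Web Vitals (CWV), error rate, etc. for the paths called in the CUJ.
    Both back end and front end: ignore errors that do not affect the CUJ (important). Decide by checking Triage > Errors Inbox and TransactionError events (e.g. credit card payment errors caused by the user).
    Availability: in Synthetics, script the CUJ's sequence of steps and watch latency and status.
    Infrastructure: no need to worry about it much (fine as long as it does not affect the SLOs).
    Set these SLIs up as alerts/SLOs.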
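The backend rule above (count only transactions on the CUJ, then treat user-caused errors such as a declined card as non-failures) amounts to a filtered success ratio. Here it is as a Python sketch over exported transaction records; the transaction names and `CardDeclinedError` are hypothetical, and inside New Relic the same filtering would be written in NRQL rather than Python.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Transaction:
    name: str
    error_class: Optional[str] = None  # None means the transaction succeeded


# Hypothetical CUJ definition and ignore-list.
CHECKOUT_CUJ: Set[str] = {
    "WebTransaction/GraphQL/addToCart",
    "WebTransaction/GraphQL/placeOrder",
}
IGNORED_ERRORS: Set[str] = {"CardDeclinedError"}  # user-caused, not a service failure


def cuj_success_ratio(txns: Iterable[Transaction]) -> float:
    """Share of CUJ transactions that did not fail for a reason the service owns."""
    total = good = 0
    for t in txns:
        if t.name not in CHECKOUT_CUJ:
            continue  # outside the journey: does not count toward this SLI
        total += 1
        if t.error_class is None or t.error_class in IGNORED_ERRORS:
            good += 1
    # Design choice: no CUJ traffic in the window is reported as fully good.
    return good / total if total else 1.0
```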
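  8. New Relic's new dashboards: Workload views are great!
    Create Workload > throw in every related entity! > done. It automatically builds a sensible dashboard for you.
    For anything missing, create a separate dashboard and add it to the Workload, and it gets linked. Troubleshooting goes smoothly, too.
    SLOs were also redefined with Service Level Management. Lightweight! Easy to read!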
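  9. SLO Review: summary
    We ran the SLO Review from a CUJ perspective: decide the CUJs first, then work from them. Ignore errors unrelated to the CUJs.
    SRE does not decide the SLOs on its own: so that the Dev teams treat SLOs as their own concern rather than someone else's, they set them, and SRE supports measurement and decision-making.
    Workload views are great. They are easy, so please give them a try.
    What we want to do next with New Relic: build out business dashboards. Use funnel analysis, histograms, etc. to inform SLO Reviews (e.g. how many seconds should the target be?). Sales/order counts (we want to express the error budget as lost revenue rather than downtime).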
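Restating the error budget as lost revenue is, at its core, arithmetic on top of the usual downtime budget. A hedged Python sketch: the 99.9% target, the 30-day window, the order rate and the average order value are made-up inputs, and treating every budgeted minute as losing all orders is a deliberately pessimistic simplification.

```python
def downtime_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Classic error budget: minutes of failure the SLO allows per window."""
    return (1.0 - slo) * window_days * 24 * 60


def revenue_budget(slo: float, orders_per_minute: float, avg_order_value: float,
                   window_days: int = 30) -> float:
    """The same budget restated as revenue lost if every budgeted minute
    dropped all orders (pessimistic simplification, hypothetical inputs)."""
    return downtime_budget_minutes(slo, window_days) * orders_per_minute * avg_order_value


if __name__ == "__main__":
    print(round(downtime_budget_minutes(0.999), 1))  # → 43.2
    print(round(revenue_budget(0.999, orders_per_minute=5, avg_order_value=20_000)))  # → 4320000
```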