Improving LOWYA's Reliability with New Relic

Improving LOWYA's Reliability with New Relic, by Kazuma Kohara, Vega Corporation

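This deck was presented at NRUG (New Relic User Group) SRE Chapter Vol.1, under the theme "Our SRE and New Relic".

From the many reliability initiatives carried out by the SRE team at Vega Corporation, it introduces the following, together with examples of how New Relic was used:

- Observability for Security
- GraphQL Caching as a load countermeasure, realized through the AWS Prototyping Program
- SLO Review from a CUJ perspective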

kazumax55

May 13, 2022

Transcript

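  1. Incident response. Why does it matter? So that when a problem comes to light, we can investigate quickly and identify the cause. What does that take? A SIEM (Security Information and Event Management): "Amazon Detective" and "SIEM on Amazon OpenSearch". Operational flow: used for investigation whenever a problem occurs (security incidents, operational mistakes, etc.).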
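  2. Strengthening security: summary. Bring observability to security as well. Compliance with security standards: AWS Security Hub. Threat detection: Amazon GuardDuty. Enable a SIEM in preparation for when something happens: Amazon Detective and SIEM on Amazon OpenSearch. Other recommendations. DDoS protection: AWS WAF, AWS Shield Advanced and automatic application layer DDoS mitigation. Cost anomaly detection: AWS Cost Anomaly Detection. Vulnerability scanning: Amazon Inspector.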
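  3. Problems that occurred on LOWYA Day and how they were handled. Problem 1: Aurora connection exhaustion. Caused by a large scale-out of the API. → Changed the max_connections value by removing the LEAST function from LEAST({DBInstanceClassMemory/9531392},5000). Problem 2: delayed email delivery. Orders came in faster than the order-confirmation emails could be delivered, so delivery was delayed. → Resolved by scaling out the email delivery workers. Problem 3: duplicate orders. Because order-confirmation emails were delayed, some customers placed the same order more than once. → Resolved as problems 2 and 3 were cleared. Problem 4: card payment threshold. Orders flooded in and an error reported that the payments-per-second threshold had been exceeded. → Asked the payment provider to raise the payments-per-second limit.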
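  4. GraphQL Caching: Good. Cost reduction: what used to be called on every page view drops to one call per cache period. Fewer requests → API tasks scale in → lower cost. Latency reduction: an edge that is close in network terms can return the response without involving the origin (hundreds of msec -> tens of msec). Easy cache management: TTL and invalidation can be managed centrally on the CDN side.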
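  5. GraphQL Caching: Bad. In GraphQL every request is a POST to a single path, which makes caching difficult. With only one path, the cache cannot be partially purged. Care is needed to guard against data exposure. Little information is available: Akamai, Cloudflare, GraphCDN, Fastly. One write-up describes caching GraphQL requests with Cloudflare Workers so that responses return in 30ms, implemented with Cloudflare Workers (something like Lambda@Edge) plus a KVS. Good > Bad: there are no fatal drawbacks, so a full-scale evaluation toward implementation has begun.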
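  6. How should SLOs be set and revised? First set the SLOs, then revise them gradually. For example, pick up the golden signals of the system in question with New Relic: average API response time within X seconds, page display time within Y seconds, error rate within Z%, availability … etc. Then it becomes unclear how the SLOs should be revised. With averages over everything, important indicators and errors get buried. Is there any point in looking at an average of everything in the first place? So what should be done? Rethink it from the ground up from the CUJ (Critical User Journey) perspective!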
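  7. Measuring SLOs from a CUJ perspective with New Relic. Backend: in APM, identify the transactions executed in the target CUJ and look at Apdex, latency, error rate, etc. (REST API: Key Transaction / GraphQL: narrow down with NRQL). Frontend: in Browser, look at CWV, error rate, etc. for the paths called in the CUJ. Common to back and front: ignore errors that do not affect the CUJ (important); decide by looking at Triage > Errors Inbox and TransactionError events (e.g. credit card payment errors caused by the user). Availability: define the CUJ's sequence of actions as a script in Synthetics and look at latency and status. Infrastructure: no need to worry much (fine as long as the SLO is not affected). Set these SLIs as Alerts/SLOs.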
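  8. New Relic's new dashboard: Workload views are great! Create Workload > throw in all the related entities! > done. It turns them into a nice dashboard on its own. Build individual dashboards only for the missing parts and add them to the Workload, and it links them for you. Troubleshooting goes smoothly too. SLOs were also redefined using Service Level Management. Light! Easy to read!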
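  9. SLO Review: summary. Carry out the SLO Review from a CUJ perspective: decide the CUJs first and reason from them; ignore errors unrelated to a CUJ. SRE should not decide the SLOs alone: so that SLOs are treated as the team's own concern rather than someone else's, have the Dev team decide them, with SRE supporting measurement and the decision. Workload views are great; they are easy, so please give them a try. What we want to do with New Relic next: build business dashboards. We want to use funnel analysis, histograms, etc. in SLO Reviews (for example, how many seconds should the target be?). Sales / order counts (we want the error budget to be expressed as lost revenue rather than downtime).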
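
Slide 3 mentions removing the LEAST function from the max_connections expression. The arithmetic behind that change can be sketched as follows. The divisor 9531392 and the cap 5000 come from the slide; the memory sizes are illustrative only, the helper name is hypothetical, and integer (floor) division is an assumption about how the expression is evaluated. The real DBInstanceClassMemory value is also somewhat lower than the nominal instance memory.

```python
# Divisor and cap are taken from the slide; everything else is illustrative.
BYTES_PER_CONNECTION = 9531392
DEFAULT_CAP = 5000
GIB = 1024 ** 3


def max_connections(db_instance_class_memory: int, capped: bool = True) -> int:
    """Evaluate the slide's expression with or without the LEAST(..., 5000) cap."""
    by_memory = db_instance_class_memory // BYTES_PER_CONNECTION  # assumed floor division
    return min(by_memory, DEFAULT_CAP) if capped else by_memory


for mem_gib in (16, 64, 256):  # hypothetical memory sizes
    mem = mem_gib * GIB
    print(mem_gib, max_connections(mem), max_connections(mem, capped=False))
```

Under these assumptions the cap only matters once memory divided by 9531392 exceeds 5000. Below that point, removing LEAST changes nothing.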
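
Slides 4 and 5 note that GraphQL requests are all POSTs to a single path, so a URL cannot serve as the cache key. A minimal sketch of one common workaround, assuming the key is derived by hashing the query and variables, is shown below. The class `EdgeCache`, the function names, and the `authorized` flag are hypothetical and are not taken from the deck or from any CDN's API. An in-memory dict stands in for an edge key-value store.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Tuple


def cache_key(query: str, variables: dict) -> str:
    """Hash the request body, because the URL path is the same for every request."""
    payload = json.dumps({"query": query, "variables": variables}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EdgeCache:
    """In-memory stand-in for an edge key-value store with a TTL (hypothetical)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}

    def handle(self, query: str, variables: dict, origin, authorized: bool = False) -> str:
        if authorized:
            # Never cache per-user responses: one simple guard against exposure.
            return origin(query, variables)
        key = cache_key(query, variables)
        now = self.clock()
        hit = self.store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: answer without touching the origin
        body = origin(query, variables)
        self.store[key] = (now + self.ttl, body)
        return body


# Demo with a fake origin and a controllable clock.
now = [0.0]
origin_calls = []


def fake_origin(query, variables):
    origin_calls.append(query)
    return "body-%d" % len(origin_calls)


cache = EdgeCache(ttl_seconds=60, clock=lambda: now[0])
q = "query Product($id: ID!) { product(id: $id) { name price } }"
first = cache.handle(q, {"id": "1"}, fake_origin)
second = cache.handle(q, {"id": "1"}, fake_origin)  # served from the cache
now[0] = 61.0
third = cache.handle(q, {"id": "1"}, fake_origin)   # TTL expired, origin is called again
print(first, second, third, len(origin_calls))
```

The sketch invalidates by TTL only. Partial purging, which slide 5 lists as a difficulty, would need extra bookkeeping that is not shown here.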
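
Slide 7 describes measuring SLIs only for the transactions in a CUJ and ignoring errors that do not affect it, such as card payment errors caused by the user. The idea can be illustrated with a small calculation over hypothetical transaction records. The record shape, the transaction names, the error class names, and the latency target are all assumptions made for illustration. This is not New Relic's data model or API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Transaction:
    name: str
    duration_ms: float
    error_class: Optional[str] = None


def cuj_sli(
    transactions: Iterable[Transaction],
    cuj_names: Set[str],
    ignored_errors: Set[str],
    latency_target_ms: float,
) -> Optional[float]:
    """Share of CUJ transactions that were fast enough and had no relevant error."""
    in_scope = [t for t in transactions if t.name in cuj_names]
    if not in_scope:
        return None
    good = 0
    for t in in_scope:
        # An error only counts as a failure if it is not on the ignore list.
        failed = t.error_class is not None and t.error_class not in ignored_errors
        if not failed and t.duration_ms <= latency_target_ms:
            good += 1
    return good / len(in_scope)


sample = [
    Transaction("checkout", 200.0),
    Transaction("checkout", 300.0, "CardDeclined"),   # user-caused, ignored
    Transaction("checkout", 250.0, "InternalError"),  # counts against the SLI
    Transaction("checkout", 1200.0),                  # too slow
    Transaction("search", 5000.0),                    # outside this CUJ
]
sli = cuj_sli(sample, {"checkout"}, {"CardDeclined"}, 1000.0)
print(sli)
```

The slow `search` transaction does not affect the result, which is the point of scoping the SLI to a CUJ rather than averaging over everything.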