3-shake SRE Tech Talk #10: Touching on LLM O11y (LLMのO11yに触れる)

abnoumaru
August 23, 2024

Transcript

  1. • Affiliation
    • 3-shake Inc. (株式会社スリーシェイク)
    • Sreake Division, Group Leader
    • Interests
    • Operations / SRE / O11y
    • Slides
    • speakerdeck.com/abnoumaru
    • About me
    • abnoumaru.com
    2024/08/23 3-shake SRE Tech Talk #10
  2. LLM O11y is being covered on a wide range of sites
    • 2023/09/15 LLM Monitoring and Observability — A Summary of Techniques and Approaches for Responsible AI
    • 2023/09/28 Observability for Large Language Models - Understanding & Improving Your Use of LLMs
    • 2024/02/26 Techniques and approaches for monitoring large language models on AWS
    • 2024/03/28 The LLM stack brings a different set of metrics than your team usually tracks. In this Makers episode, co-host Janakiram MSV identifies the new "golden signals."
    • 2024/05/22 Snowflake Announces Agreement to Acquire TruEra AI Observability Platform to Bring LLM and ML Observability to the AI Data Cloud
    • 2024/05/27 Mastering LLM Monitoring and Observability: A Comprehensive Guide for 2024
    • 2024/06/04 An Introduction to Observability for LLM-based applications using OpenTelemetry
    • 2024/06/24 LLM Observability: Azure OpenAI
    • 2024/07/18 A complete guide to LLM observability with OpenTelemetry and Grafana Cloud
  3. Observability
    • From Observability Engineering [3]
    • "A measure of how well you can understand and explain any state your system can get into, no matter how novel or bizarre."
    • @nwiizo's observability guidance [4]
    • Gives systematic access to primary sources
    [3] Charity Majors et al.; 大谷 和紀, 山口 能迪 (trans.), "オブザーバビリティ・エンジニアリング" (Observability Engineering), O'Reilly Japan, 2023.
    [4] https://speakerdeck.com/nwiizo/ke-guan-ce-xing-kaitansu
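  4. Cost
    • As with cloud services, we want to catch budget overruns and anomalous values
    • Get a feel for the characteristics of the service you use
    • ex. GPT-4o: $5.000 / 1M input tokens (whether that is cheap or expensive depends on your own sense of scale)
    • ex. If output is priced higher than input, you can cut cost by being deliberate about output
    • A usage dashboard that is as close to real time as possible is reassuring
    • Each LLM O11y tool can record token counts per request
    • You can build your own metrics
    • Not O11y as such, but there are also techniques that throttle traffic by token volume [6][7]
    [6] https://qiita.com/ipppppei/items/8ee4e693e2aea768c3a9
    [7] https://docs.konghq.com/hub/kong-inc/ai-rate-limiting-advanced/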
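The per-request token counts mentioned on this slide can be turned into a cost metric with simple arithmetic. The sketch below is illustrative only: the input price is the slide's GPT-4o example, while the output price, the `Pricing` class, and the function name are assumptions introduced here, so check the provider's current price list before relying on any number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price per one million tokens, in USD."""
    input_usd_per_million: float
    output_usd_per_million: float


def request_cost_usd(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimate the cost of one request from its recorded token counts."""
    total = (input_tokens * pricing.input_usd_per_million
             + output_tokens * pricing.output_usd_per_million)
    return total / 1_000_000


# Input price: the slide's example. Output price: a placeholder assumption.
EXAMPLE_PRICING = Pricing(input_usd_per_million=5.0, output_usd_per_million=15.0)

if __name__ == "__main__":
    cost = request_cost_usd(input_tokens=2_000, output_tokens=500, pricing=EXAMPLE_PRICING)
    print(f"estimated cost: ${cost:.4f}")
```

Emitting this value as a custom metric per request is one way to feed the near-real-time usage dashboard the slide asks for.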
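  5. Latency
    • The required speed depends on the use case
    • ex. Situations where the user understands that an LLM is generating the response
    • There is a limit to how good the experience can be, but millisecond-level optimization is not always necessary
    • ex. Services that respond by voice are expected to keep up a certain speed
    • They involve Speech-to-Text / Text-to-Speech processing
    • Transcription and speech synthesis can be done in many ways (real time? batch? emotional read-aloud?)
    • A blog post discussing how to reduce transcription latency [8]
    • A fast response ≠ an accurate response
    • From an O11y standpoint, you need to be prepared to observe where an unacceptable delay occurred and pinpoint what to improve
    [8] https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/guidebook-to-reduce-latency-for-azure-speech-to-text-stt-and/ba-p/4208289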
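To find where a delay occurred, each stage of the pipeline needs its own timing. The following is a minimal standard-library sketch of that idea; the `StageTimer` class and the stage names are hypothetical, and in practice a tracing tool's spans would play this role.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage of a request."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timer can be tested deterministically
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised an exception.
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Return the name of the stage that took the most time so far."""
        return max(self.durations, key=self.durations.get)


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("speech_to_text"):
        time.sleep(0.01)  # stand-in for real work
    with timer.stage("llm"):
        time.sleep(0.03)
    with timer.stage("text_to_speech"):
        time.sleep(0.01)
    print(timer.durations, "slowest:", timer.slowest())
```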
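  6. Rate limits
    • Once you exceed the limit, you can no longer provide the service
    • For example, OpenAI measures limits along five dimensions [9]
    • Requests/minute, requests/day, tokens/minute, tokens/day, images/minute
    • Limits differ by model and tier
    • There is also a usage cap that depends on how much you have paid and how many days have passed since a successful payment (max $50,000/month)
    • Not O11y as such, but the official docs recommend techniques for avoiding errors [9]
    • Random exponential backoff
    • Set max_tokens to match the expected size of the response
    • Use the Batch API when an immediate response is not needed
    [9] https://platform.openai.com/docs/guides/rate-limits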
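Random exponential backoff, the first recommendation on this slide, can be sketched with the standard library alone. `RateLimitError` and `call_with_backoff` are hypothetical names used for illustration; a real client library raises its own exception type, and the delay parameters here are arbitrary assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on a rate-limit response."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call func(), retrying on RateLimitError with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Double the ceiling on every attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: wait between 50% and 100% of the ceiling so that
            # many clients do not retry at the same instant.
            sleep(ceiling * (0.5 + rng() / 2))
```

Counting how often the retry path is taken is itself a useful signal: a rising retry count shows that a limit is being approached before requests start to fail outright.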
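  7. What counts as the expected response?
    • Is it responding at an appropriate speed? (covered earlier)
    • Is it hallucinating?
    • Is it replying in the same language as the input?
    • Is it responding with the content the application is meant to provide?
    • Is it giving malicious responses (or is the user asking malicious questions)?
    • Is it giving responses that hurt the user?
    • Is it giving responses that damage trust in the company providing the service?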
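  8. ≒ Is it possible to evaluate responses?
    • My understanding is that fully automated, perfect evaluation is still difficult
    • Some characteristics require qualitative judgment
    • Whether a response is good or bad varies by user, for example on the emotional side
    • Research on evaluating hallucination [10]
    • Published in June this year (2024), it proposes a general detection method
    • Detection methods like this are still at the research stage
    • My understanding is that the accuracy of automated evaluation is expected to keep improving
    [10] https://www.nature.com/articles/s41586-024-07421-0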
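  9. Aim for a state you can understand and explain, using a variety of methods
    • An example of evaluating an internal document search bot built with RAG [11]
    • Prepare a CSV of pairs: an "expected question" and a "URL expected to appear in the response"
    • Send requests from a local machine based on the CSV and evaluate whether the intended value (the URL) is included
    • Evaluating RAG (Retrieval-Augmented Generation)
    • Rely on new services while verifying their accuracy
    • Datadog LLM Observability, introduced next, has features for evaluating the expected response
    • Question relevance / language match / negative sentiment / abusive language / presence of prompt injection
    • Some features are in beta as of this talk, so verify their accuracy as you use them
    • ex. Quality check metrics (a screenshot of the note is included later)
    • Not billed as of this talk
    [11] https://blog.studysapuri.jp/entry/2024/07/17/feedback-cycle-practice-through-simplified-assessment-of-rags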
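The CSV-based check described on this slide can be sketched as follows. This is not the referenced blog's implementation: the column names `question` and `expected_url`, the `evaluate_bot` function, and the `ask` callable (which stands in for a request to the bot) are all assumptions made for illustration.

```python
import csv
import io


def evaluate_bot(csv_text, ask):
    """Run every expected question through ask() and check for the expected URL.

    csv_text: CSV with a header row containing 'question' and 'expected_url'.
    ask: callable taking a question string and returning the bot's answer text.
    Returns (hit_rate, per_question_results).
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        answer = ask(row["question"])
        results.append({
            "question": row["question"],
            "expected_url": row["expected_url"],
            # Pass when the expected URL appears anywhere in the answer.
            "passed": row["expected_url"] in answer,
        })
    hit_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return hit_rate, results
```

Running this after every prompt or retrieval change gives a cheap regression signal, even though substring matching says nothing about the quality of the surrounding answer text.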
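  10. Covering experience and accuracy by means other than instrumentation also matters
    • Design the UI/UX so that users can tolerate LLM latency
    • For text, use loading indicators and streamed generation; for voice, use back-channel responses and fillers
    • Let humans judge
    • Ask the user "Is this response correct?" and let them decide
    • Use user feedback such as surveys (in machine translation, having humans score the output is called human evaluation)
    • How much is work actually made more efficient? Try measuring it with a stopwatch [12]
    • Restrict the input
    • Restrict the form
    • Offer users a single, limited function
    [12] https://speakerdeck.com/nrryuya/jian-wei-igaxu-sarenaikesudellmwoshi-utameniha-at-genai-playground-meetup-number-01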
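  11. Settings screen
    • Topic
    • Evaluation
    • Quality
    • Security and Safety
    • Notes
    • This screen becomes reachable once you have sent traces
    • Configured per application
    • Everything is off by default
    • In the screenshots that follow, some items were switched on in a hurry partway through
    • (Hmm, answering an English question in Japanese doesn't get flagged...?)
    • (It was off!)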
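  12. Evaluation
    • Failure to Answer
    • Evaluates whether an appropriate, or satisfactory, answer was given to the user's question
    • Language Mismatch
    • Evaluates whether the answer was given in the language the user asked in
    • Sentiment (Input/Output)
    • Evaluates the overall mood of the conversation, including user satisfaction, sentiment trends, and emotional reactions
    • Topic Relevancy
    • Evaluates whether the conversation stays on the topics the LLM application is intended for
    • Toxicity
    • Evaluates whether the conversation contains harmful or inappropriate content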
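  13. Security and Safety
    • Prompt Injection
    • Some users try to manipulate the LLM's responses or the direction of the conversation
    • Identifies unauthorized or malicious insertions into the prompt
    • Datadog Sensitive Data Scanner
    • Automatically identifies and masks sensitive information as inputs and outputs are sent to Datadog
    • Default rules are provided
    • Make sure to apply the settings you need using the rule library and custom rules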
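As a rough picture of what a custom masking rule does, the sketch below redacts email addresses from a prompt or response. It is not Datadog Sensitive Data Scanner and does not use its rule format; the regular expression, the placeholder string, and the function name are assumptions chosen for illustration, and the pattern is deliberately simple rather than RFC-complete.

```python
import re

# Simplified email pattern: an assumption for illustration, not a complete validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, placeholder: str = "[REDACTED_EMAIL]") -> str:
    """Replace every email-like substring with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


if __name__ == "__main__":
    print(mask_emails("Please reply to alice@example.com by Friday."))
```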
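  14. Impressions
    • I felt it helps identify the contributing factors when investigating defects and improvement points
    • ex.
    • Dig into the relevant span when an unintended response occurs
    • Check whether there are patterns specific to each model
    • Identify and investigate problematic responses reported by users
    • Being able to monitor tokens and latency is reassuring
    • Responses can also be evaluated and reviewed in traces and dashboards
    • It wasn't real traffic, but the information I wanted to detect was detected
    • The evaluation logic is a black box and still in beta, so of course you still need to judge the information yourself
    • The same feeling applies when using machine translation or ChatGPT itself, not only LLM Observability
    • It looks feasible to define SLOs on response success and latency and start iterating on watching the service's state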
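An SLO on response success and latency, as suggested in the last point, could start from a calculation like the one below. The `Request` record, the latency threshold, and the target are hypothetical; real values would come from traces and from what the service's users actually need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    succeeded: bool
    latency_seconds: float


def sli(requests, latency_threshold_seconds):
    """Fraction of requests that succeeded within the latency threshold."""
    if not requests:
        return 1.0  # no traffic: treat the objective as met
    good = sum(
        1 for r in requests
        if r.succeeded and r.latency_seconds <= latency_threshold_seconds
    )
    return good / len(requests)


def error_budget_remaining(sli_value, slo_target):
    """Share of the error budget still unspent (negative once the SLO is violated).

    Assumes slo_target is strictly below 1.0.
    """
    budget = 1.0 - slo_target
    return (budget - (1.0 - sli_value)) / budget
```

For example, with a threshold of 5 seconds and a target of 0.99, a window in which 99.5% of requests were good leaves roughly half of the error budget unspent.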
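  15. Summary
    • Information on LLM O11y started to appear around May and June
    • Metrics such as cost, latency, and rate limits matter just as they do for cloud and SaaS
    • My understanding is that how to evaluate LLM responses will keep evolving
    • Aim, with some ingenuity, for a state where you can understand and explain what happened in your own service
    • Covering experience and accuracy by means other than instrumentation also matters
    • I tried out Datadog LLM Observability
    • I'd like to hear everyone's expectations for LLM O11y and ideas for what might work well (let's talk at the after-party)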