IVRy Engineer Year-End LT Meetup 2024: The Front Line of LLM Monitoring
Hiroyuki Moriya
December 11, 2024
Transcript
confidencial The Front Line of LLM Monitoring. IVRy Engineer LT Meetup, 2024/12/11, Moriya Hiroyuki
My name is Kudo Shinichi, high school detective.
After I started working as an AI engineer, I witnessed a startup making a lot of money with LLMs.
"A little bit of development and I can make a fortune!" I realized, so I founded a company and did nothing but deliver PoC projects to clients.
I failed to notice the rate limits and the worsening latency creeping up behind me.
Before I knew it, customers were canceling my product one after another, and the company had gone bankrupt...
Today I will talk about what Kudo Shinichi could have done to avoid an ending like this.
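Self-introduction
Moriya Hiroyuki, AI engineer. Joined IVRy in 2024/08.
Background as a SWE, machine learning engineer, and similar roles.
Joined IVRy because it looked like a service where LLMs would become the core.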
LLM-based AI dialogue at IVRy
End users and the LLM exchange messages in real time over WebSocket.
LLM Fallback
We built a fallback mechanism on the premise that multiple LLMs are in use. Requests are routed based on API status, rate limits, and data constraints ([…] constraints).
"You just need to monitor it!"
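The routing idea on this slide can be illustrated with a small sketch. This is not IVRy's implementation: the `Provider` class, the per-minute request budget, and the `ProviderUnavailable` exception are all hypothetical names chosen for illustration, and only API status and rate limits are modeled (data constraints are left out).

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


class ProviderUnavailable(Exception):
    """Hypothetical error a provider call raises when the API is down or throttled."""


@dataclass
class Provider:
    name: str                          # hypothetical label, e.g. "provider-a"
    call: Callable[[str], str]         # sends a prompt, returns the completion text
    healthy: bool = True               # last known API status
    max_requests_per_minute: int = 60  # assumed rate-limit budget
    _timestamps: List[float] = field(default_factory=list)

    def has_capacity(self, now: float) -> bool:
        # Keep only requests from the last 60 seconds, then compare to the budget.
        self._timestamps = [t for t in self._timestamps if now - t < 60.0]
        return len(self._timestamps) < self.max_requests_per_minute

    def record_request(self, now: float) -> None:
        self._timestamps.append(now)


def complete_with_fallback(providers: List[Provider], prompt: str) -> str:
    """Try providers in priority order, skipping unhealthy or rate-limited ones."""
    for provider in providers:
        now = time.monotonic()
        if not provider.healthy or not provider.has_capacity(now):
            continue
        provider.record_request(now)
        try:
            return provider.call(prompt)
        except ProviderUnavailable:
            # Mark the provider as down; a real system would re-check status later.
            provider.healthy = False
    raise RuntimeError("no LLM provider available")
```

Because a request can silently land on the second or third provider, the latency an end user sees depends on which provider answered, which is why monitoring comes next.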
Method 1: Datadog LLM Observability
A feature specialized for LLM monitoring that Datadog is actively developing. It can capture latency, tokens, prompts, and more.
"Now I can monitor OpenAI's latency!"
Huh, that's odd...
Our product implements a fallback mechanism, yet we can only monitor OpenAI's latency.
Method 2: OpenLIT (OpenTelemetry)
A tool specialized for LLM monitoring that follows the OpenTelemetry standard. It can monitor a variety of LLMs.
"Now I can monitor the latency of all sorts of models!"
Huh, that's odd...
We use many different models, so we need to measure latency per provider, yet we only get the latency of LiteLLM as a whole.
Method 3: Datadog Inferred Services
A mechanism built into Datadog that monitors requests going out of the app.
"Now I can monitor every model we use through LiteLLM!"
Huh, that's odd...
We do not necessarily use just one model each on Gemini and OpenAI, yet we cannot get the latency of individual models.
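What the three attempts leave uncovered is latency broken down by both provider and model. As a rough illustration of that breakdown, here is a standard-library sketch that does not use Datadog, OpenLIT, or LiteLLM; the `measure` and `summary` helpers and the in-memory store are assumptions made for this example, and a real setup would export the samples to a metrics backend instead.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean
from typing import Dict, Iterator, List, Tuple

# Latency samples in seconds, keyed by (provider, model).
_latencies: Dict[Tuple[str, str], List[float]] = defaultdict(list)


@contextmanager
def measure(provider: str, model: str) -> Iterator[None]:
    """Time one LLM request and file it under its provider and model."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record even when the request raises, so slow failures stay visible.
        _latencies[(provider, model)].append(time.perf_counter() - start)


def summary() -> Dict[Tuple[str, str], Dict[str, float]]:
    """Return the request count and mean latency per (provider, model)."""
    return {
        key: {"count": float(len(samples)), "mean_s": mean(samples)}
        for key, samples in _latencies.items()
    }
```

Wrapping each individual attempt inside a fallback chain with `measure(provider, model)`, rather than wrapping the whole chain once, is what keeps a slow model from being hidden behind an aggregate number.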
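Summary
LLM monitoring is still very much a work in progress, and there is little established know-how.
As we master AI and LLMs and build them into products, we have to blaze the trail ourselves.
Let's work on AI monitoring together!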