[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Transcript
Evals for LLMApps/Agents
Jehandad Kamal
Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique)
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!