Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
[mercari GEARS 2025] Evals for LLMApps/Agents
Search
mercari
PRO
November 14, 2025
Technology
100
0
Share
[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
PRO
November 14, 2025
More Decks by mercari
See All by mercari
[DevDojo] Getting Started with BI: Looker Essentials - 2025
mercari
PRO
0
89
[DevDojo] Introduction to LLMs & AI Agents - 2025
mercari
PRO
0
130
[mercari GEARS 2025] Techniques for Reliable Code Generation Using AI Agents
mercari
PRO
0
250
[mercari GEARS 2025] Foundations of AI - The Invisible Forces Driving Product Innovation
mercari
PRO
0
220
[mercari GEARS 2025] Building Foundation for Mercari’s Global Expansion
mercari
PRO
1
390
[mercari GEARS 2025] The Past, Present, and Future of Anti-Phishing Measures at Mercari
mercari
PRO
0
120
[mercari GEARS 2025] Backend Standardization with MCP
mercari
PRO
0
170
[mercari GEARS 2025] Transforming customer engagement with Google Customer Engagement Suite
mercari
PRO
0
280
[mercari GEARS 2025] PJ Aurora’s Vision and Automated UI Quality Evaluation Agents
mercari
PRO
0
260
Other Decks in Technology
See All in Technology
需要創出(Chatwork)×供給(BPaaS) フライホイールとMoat 実行能力の最適配置とAI戦略
kubell_hr
0
1.7k
『生成AI時代のクレデンシャルとパーミッション設計 — Claude Code を起点に』の執筆企画
takuros
2
1.9k
AIが自律的に働く時代へ Amazon Quick で実現するAIエージェント紹介
koheiyoshikawa
0
160
EMから幅を広げるために最近挑戦していること / Recent challenges I'm undertaking to expand my horizons beyond EM
hiro_torii
1
170
コミュニティ・勉強会を作るのは目的じゃない
ohmori_yusuke
0
280
[Oracle TechNight#99] 生成AI時代のAI/ML入門 ~ AIとオラクルデータベースの関係 (前半)
oracle4engineer
PRO
1
150
雑談は、センサーだった
bitkey
PRO
2
150
小さいVue.jsを30分で作る
hal_spidernight
0
130
GitHub Copilot CLI と VS Code Agent Mode の使い分け
tomokusaba
0
120
20260423_執筆の工夫と裏側 技術書の企画から刊行まで / From the planning to the publication of technical book
nash_efp
3
690
Shipping AI Agents — Lessons from Production
vvatanabe
0
300
AndroidアプリとCopilot Studioの統合
nakasho
0
190
Featured
See All Featured
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
140
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
38
2.8k
Docker and Python
trallard
47
3.8k
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
2
1.4k
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
520
Max Prin - Stacking Signals: How International SEO Comes Together (And Falls Apart)
techseoconnect
PRO
0
150
For a Future-Friendly Web
brad_frost
183
10k
Tips & Tricks on How to Get Your First Job In Tech
honzajavorek
1
500
Marketing to machines
jonoalderson
1
5.2k
Producing Creativity
orderedlist
PRO
348
40k
AI: The stuff that nobody shows you
jnunemaker
PRO
6
610
The World Runs on Bad Software
bkeepers
PRO
72
12k
Transcript
Evals for LLMApps/Agents Jehandad Kamal Mercari / Engineer @ AI/LLM
Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals ( by technique ) ai.mercari.com
Types of Evals (by perspective )
Improved SDLC for Agents
Thank You!