[mercari GEARS 2025] Evals for LLMApps/Agents
mercari · November 14, 2025 · Technology
More Decks by mercari
[DevDojo] Getting Started with BI: Looker Essentials - 2025
[DevDojo] Introduction to LLMs & AI Agents - 2025
[mercari GEARS 2025] Techniques for Reliable Code Generation Using AI Agents
[mercari GEARS 2025] Foundations of AI - The Invisible Forces Driving Product Innovation
[mercari GEARS 2025] Building Foundation for Mercari’s Global Expansion
[mercari GEARS 2025] The Past, Present, and Future of Anti-Phishing Measures at Mercari
[mercari GEARS 2025] Backend Standardization with MCP
[mercari GEARS 2025] Transforming customer engagement with Google Customer Engagement Suite
[mercari GEARS 2025] PJ Aurora’s Vision and Automated UI Quality Evaluation Agents
Other Decks in Technology
QA Knowledge as Assets with AI Agents & GitHub (tknw_hitsuji)
Organizational Design and Mechanisms That Keep a QA Organization Working at a Scale-up Company: Driving Bottom-up and Top-down Together (tarappo)
Spin-out Course 04: Routine Processing (overflowinc)
Kiro Meetup #7: Kiro Updates (2025/12/15 to 2026/3/20) (katzueno)
The Path from Understanding the Current State to Design, Learned from Architecture Modernization (satohjohn)
Support Required of QA Engineers for Generative AI (goyoki)
Organizational Design and Mechanisms That Keep a QA Organization Working at a Scale-up Company: Driving Bottom-up and Top-down Together (qa)
Spin-out Course 03: CLAUDE-MD and SKILL-MD (overflowinc)
A Four-Year Retrospective on Adopting a Modular Monolith: Architecture and Organizational Interaction (nazonohito51)
Communicating the Strategic Value of Quality to Executive Leadership #jassttokyo (kyonmm)
FastMCP OAuth Proxy with Cognito (hironobuiga)
Zero Data Loss Autonomous Recovery Service: Service Overview (oracle4engineer)
Featured
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO (greggifford)
Test your architecture with Archunit (thirion)
Visual Storytelling: How to be a Superhuman Communicator (reverentgeek)
Darren the Foodie - Storyboard (khoart)
Making the Leap to Tech Lead (cromwellryan)
Fight the Zombie Pattern Library - RWD Summit 2016 (marcelosomers)
Become a Pro (speakerdeck)
Groundhog Day: Seeking Process in Gaming for Health (codingconduct)
The Mindset for Success: Future Career Progression (greggifford)
Designing Dashboards & Data Visualisations in Web Apps (destraynor)
Information Architects: The Missing Link in Design Systems (soysaucechin)
Navigating Team Friction (lara)
Transcript
Evals for LLMApps/Agents
Jehandad Kamal
Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents
ai.mercari.com
Types of Evals (by technique)
Types of Evals (by perspective)
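The transcript keeps only the slide titles, so the specific eval types the talk covers are not recoverable here. As an illustrative sketch only (not the speaker's method), one widely used kind of eval, a deterministic code-based check, can be wired into a tiny offline harness that runs an app over fixed test cases and reports a pass rate. `EvalCase`, `run_evals`, `pass_rate`, `fake_agent`, and the 40-character title limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One test case: an input prompt plus a deterministic check on the output."""
    name: str
    prompt: str
    check: Callable[[str], bool]


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the agent and record pass/fail per case name."""
    return {case.name: case.check(agent(case.prompt)) for case in cases}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty suite)."""
    return sum(results.values()) / len(results) if results else 0.0


def fake_agent(prompt: str) -> str:
    """Hypothetical stand-in for an LLM app; a real one would call a model."""
    if "title" in prompt:
        return "Vintage denim jacket, size M"
    return ""


cases = [
    EvalCase("non_empty_title", "Suggest a title for: denim jacket",
             lambda out: len(out.strip()) > 0),
    # Hypothetical length limit, chosen only for illustration.
    EvalCase("short_title", "Suggest a title for: denim jacket",
             lambda out: len(out) <= 40),
    EvalCase("handles_empty_request", "",
             lambda out: out == ""),
]

results = run_evals(fake_agent, cases)
print(results, f"pass rate = {pass_rate(results):.2f}")
```

In a development loop, a run like this could be repeated after each prompt or model change and its pass rate compared with the previous run before shipping. Model-graded or human judgments could slot in as other `check` functions with the same `str -> bool` shape.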
Improved SDLC for Agents
Thank You!