[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents. Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique)
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!