[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Transcript
Evals for LLMApps/Agents
Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique) ai.mercari.com
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!