[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Transcript
Evals for LLMApps/Agents
Jehandad Kamal
Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique)
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!