Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
[mercari GEARS 2025] Evals for LLMApps/Agents
Search
mercari
PRO
November 14, 2025
Technology
0
65
[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
PRO
November 14, 2025
Tweet
Share
More Decks by mercari
See All by mercari
[DevDojo] Getting Started with BI: Looker Essentials - 2025
mercari
PRO
0
26
[DevDojo] Introduction to LLMs & AI Agents - 2025
mercari
PRO
0
37
[mercari GEARS 2025] Techniques for Reliable Code Generation Using AI Agents
mercari
PRO
0
67
[mercari GEARS 2025] Foundations of AI - The Invisible Forces Driving Product Innovation
mercari
PRO
0
100
[mercari GEARS 2025] Building Foundation for Mercari’s Global Expansion
mercari
PRO
1
260
[mercari GEARS 2025] The Past, Present, and Future of Anti-Phishing Measures at Mercari
mercari
PRO
0
66
[mercari GEARS 2025] Backend Standardization with MCP
mercari
PRO
0
85
[mercari GEARS 2025] Transforming customer engagement with Google Customer Engagement Suite
mercari
PRO
0
120
[mercari GEARS 2025] PJ Aurora’s Vision and Automated UI Quality Evaluation Agents
mercari
PRO
0
88
Other Decks in Technology
See All in Technology
MySQLとPostgreSQLのコレーション / Collation of MySQL and PostgreSQL
tmtms
1
260
MariaDB Connector/C のcaching_sha2_passwordプラグインの仕様について
boro1234
0
230
コンテキスト情報を活用し個社最適化されたAI Agentを実現する4つのポイント
kworkdev
PRO
1
1.5k
re:Invent 2025 ふりかえり 生成AI版
takaakikakei
1
220
業務のトイルをバスターせよ 〜AI時代の生存戦略〜
staka121
PRO
2
210
AI駆動開発の実践とその未来
eltociear
0
130
.NET 10の概要
tomokusaba
0
120
モダンデータスタック (MDS) の話とデータ分析が起こすビジネス変革
sutotakeshi
0
510
マイクロサービスへの5年間 ぶっちゃけ何をしてどうなったか
joker1007
14
6.3k
[デモです] NotebookLM で作ったスライドの例
kongmingstrap
0
160
2025年 開発生産「可能」性向上報告 サイロ解消からチームが能動性を獲得するまで/ 20251216 Naoki Takahashi
shift_evolve
PRO
1
200
AWS re:Invent 2025で見たGrafana最新機能の紹介
hamadakoji
0
410
Featured
See All Featured
Reflections from 52 weeks, 52 projects
jeffersonlam
355
21k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.8k
Rebuilding a faster, lazier Slack
samanthasiow
85
9.3k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
61k
Producing Creativity
orderedlist
PRO
348
40k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
37
2.6k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.8k
Why Our Code Smells
bkeepers
PRO
340
57k
Side Projects
sachag
455
43k
jQuery: Nuts, Bolts and Bling
dougneiner
65
8.3k
Testing 201, or: Great Expectations
jmmastey
46
7.8k
Transcript
Evals for LLMApps/Agents Jehandad Kamal Mercari / Engineer @ AI/LLM
Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals ( by technique ) ai.mercari.com
Types of Evals (by perspective )
Improved SDLC for Agents
Thank You!