[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Transcript
Evals for LLMApps/Agents | Jehandad Kamal | Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique) ai.mercari.com
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!