[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents. Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique) ai.mercari.com
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!