[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents. Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique) ai.mercari.com
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!