[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents. Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique)
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!