LINE Developers Taiwan
September 23, 2024
Technology
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
Event: iThome Hello World Dev Conference
Speaker: Dan Chen
More Decks by LINE Developers Taiwan
#Rookie’s Adventure: A 0 to 1 Dev Journey
LINE 購物幕後推手 / The Driving Force Behind LINE Shopping
從校園到職場 我的實習旅程 / From Campus to Workplace: My Internship Journey
探索數據未來 / Exploring the Future of Data
MLE 的修煉之路 / The MLE's Path of Training
LINE 實習分享 & 國際黑客松參賽分享 / LINE Internship Sharing & International Hackathon Experience
在 GCP 運用 Parse 全家餐管理那堆 AI 應用的資料 / Using the Parse "Family Meal" on GCP to Manage All That AI Application Data
40歲的我會給20歲的自己,關於軟體開發的7個建議 / 7 Software Development Tips My 40-Year-Old Self Would Give My 20-Year-Old Self
從零到一:轉碼仔的實習攻略 / From Zero to One: An Internship Guide for Career Switchers
Other Decks in Technology
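チェックツールを導入したけど使ってもらえなかった話 #GAADjp / We Introduced a Checking Tool, but Nobody Used It (lycorptech_jp)
激動の一年を通じて見えてきた「技術でリードする」ということ / What "Leading with Technology" Means, as Seen Through a Turbulent Year (ktr_0731)
名単体テスト 禁断の傀儡(モック) / The Great Unit Test and the Forbidden Puppet, the Mock (iwamot)
NAB Show 2025 動画技術関連レポート / NAB Show 2025 Report (cyberagentdevelopers)
VitePress & MCPでアプリ仕様のオープン化に挑戦する / Opening Up App Specifications with VitePress & MCP (hal_spidernight)
テスト設計、逆から読むとおもしろい──仕様にない“望ましさ”の逆設計 / Test Design Is Fun When Read Backwards: Reverse-Designing the "Desirability" the Spec Leaves Out (mhlyc)
4月15日の AZ 障害をテクサポの中の人目線で振り返ってみる / Looking Back at the April 15 AZ Outage from a Tech Support Insider's Point of View (kazzpapa3)
Опыт использования Nessie в Азбуке Вкуса / Experience Using Nessie at Azbuka Vkusa (emeremyanina1234)
事業と組織から目を逸らずに技術でリードする / Leading with Technology Without Looking Away from the Business and the Organization (ogugu9)
インラインRBSコメントに鯛pe checkersもニッコリ / Inline RBS Comments Make Type Checkers Smile (sansantech)
10年もののアプリケーションを運用・開発するアプリケーションエンジニアのDatadog活用術 / How an Application Engineer Running a 10-Year-Old Application Makes Use of Datadog (miyamu)
AWS_MCP_Servers入門.pdf / Introduction to AWS_MCP_Servers (naotoiso)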
Featured
Adopting Sorbet at Scale (ufuk)
[RailsConf 2023 Opening Keynote] The Magic of Rails (eileencodes)
Unsuck your backbone (ammeep)
VelocityConf: Rendering Performance Case Studies (addyosmani)
Building a Scalable Design System with Sketch (lauravandoore)
Sharpening the Axe: The Primacy of Toolmaking (bcantrill)
Music & Morning Musume (bryan)
10 Git Anti Patterns You Should be Aware of (lemiorhan)
Designing for humans not robots (tammielis)
Easily Structure & Communicate Ideas using Wireframe (afnizarnur)
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything (marktimemedia)
Measuring & Analyzing Core Web Vitals (bluesmoon)
Transcript
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
EC Data Dev / Data Scientist: Dan Chen
Dan: LINE Taiwan EC Dev, Data Scientist
Work Experience
Side Project
CONTENT
01 What is Trustworthy
02 Evaluation Framework
03 Offline & Online Evaluation
04 LLM on Recommendation
05 Q&A
01 What is Trustworthy: why it’s so important
Elements of trustworthiness (diagram centered on "Trustworthy")
Four perspectives on trustworthy recommendation: Data Preparation, Data Representation, Recommendation Generation, Performance Evaluation
02 Evaluation Framework: how to correctly evaluate AI
Two-stage recommendation system. Brickmaster: scalable, scenario-wise, KPI-oriented, trustworthy
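The deck does not show Brickmaster's internals. As a rough illustration only, a two-stage system built around a two-tower model (the model named in the talk title) can be sketched like this: a user tower and an item tower map features into a shared space, a cheap dot-product retrieval keeps the top-k items, and a heavier ranker re-scores just those candidates. The linear towers, the `popularity` signal, and the 0.7/0.3 blend are hypothetical placeholders, not the speaker's model.

```python
import numpy as np

def tower(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One tower: a linear projection plus L2 normalization (stand-in for a learned MLP)."""
    emb = features @ weights
    return emb / np.linalg.norm(emb, axis=-1, keepdims=True)

def retrieve(user_emb: np.ndarray, item_embs: np.ndarray, k: int) -> np.ndarray:
    """Stage 1 (retrieval): dot-product score against every item, keep the top-k, best first."""
    scores = item_embs @ user_emb
    top_k = np.argpartition(-scores, k - 1)[:k]   # unordered top-k without a full sort
    return top_k[np.argsort(-scores[top_k])]      # order the k candidates by score

def rerank(candidates: np.ndarray, ranking_score, n: int) -> np.ndarray:
    """Stage 2 (ranking): re-score only the retrieved candidates with a heavier model."""
    scores = np.array([ranking_score(int(i)) for i in candidates])
    return candidates[np.argsort(-scores)][:n]

# Toy data; every shape and signal below is hypothetical.
rng = np.random.default_rng(0)
n_items, feat_dim, emb_dim = 1000, 16, 8
w_user = rng.standard_normal((feat_dim, emb_dim))
w_item = rng.standard_normal((feat_dim, emb_dim))
user_emb = tower(rng.standard_normal(feat_dim), w_user)
item_embs = tower(rng.standard_normal((n_items, feat_dim)), w_item)
popularity = rng.random(n_items)  # hypothetical business signal used only by the ranker

candidates = retrieve(user_emb, item_embs, k=50)
final = rerank(candidates, lambda i: 0.7 * float(item_embs[i] @ user_emb) + 0.3 * popularity[i], n=10)
print(final)
```

Splitting the work this way keeps the expensive ranker off the full catalog; in production the brute-force dot product in `retrieve` would usually be replaced by an approximate nearest-neighbor index.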
Evaluation Framework (1/2): how to truly and comprehensively understand performance
03 Offline & Online Evaluation: how to correctly evaluate AI
Offline Evaluation: the key to showing how your algorithms can contribute to your business
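The deck does not list which offline metrics it uses. A minimal sketch, assuming binary relevance (a held-out purchase or click counts as relevant) and the common ranking metrics precision@k, recall@k, and NDCG@k, looks like this; the users, items, and k are made up for illustration.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Share of the top-k recommendations the user actually interacted with."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Share of the user's relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: hits near the top of the list count more."""
    dcg = sum(1.0 / math.log2(pos + 2) for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def evaluate(rankings, ground_truth, k):
    """Average each metric over users who have at least one held-out interaction."""
    users = [u for u, items in ground_truth.items() if items]
    metrics = {"precision": precision_at_k, "recall": recall_at_k, "ndcg": ndcg_at_k}
    return {
        f"{name}@{k}": sum(fn(rankings[u], ground_truth[u], k) for u in users) / len(users)
        for name, fn in metrics.items()
    }

# Hypothetical held-out interactions and model rankings for two users.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z", "w"]}
ground_truth = {"u1": {"a", "c"}, "u2": {"w"}}
print(evaluate(rankings, ground_truth, k=3))
```

Reporting several metrics side by side matters because they can disagree: recall rewards finding relevant items anywhere in the top-k, while NDCG also rewards placing them near the top.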
Online Evaluation: the key to showing how your algorithms can contribute to your business
A/B test: the key to showing how your algorithms can contribute to your business
Avoiding pitfalls in practice: What if the experiment isn't significant? Sample ratio mismatch? Novelty effect?
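Each of the three pitfalls on this slide can be checked with a few lines of standard statistics. The sketch below is a hedged illustration, not the speaker's procedure: a chi-square test for sample ratio mismatch (SRM), a pooled two-proportion z-test for conversion rate, and a crude novelty check that compares early and late lift. All counts are hypothetical.

```python
import math

def srm_check(n_control, n_treatment, expected_control_share=0.5):
    """Sample ratio mismatch: chi-square goodness-of-fit test on assignment counts (1 dof)."""
    total = n_control + n_treatment
    exp_c = total * expected_control_share
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 dof
    return chi2, p_value

def two_proportion_ztest(conv_c, n_c, conv_t, n_t):
    """Two-sided pooled z-test for a difference in conversion rates."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return p_t - p_c, z, p_value

def novelty_check(daily_control, daily_treatment):
    """Mean relative lift in the first vs second half of the run; a shrinking lift hints at novelty."""
    lifts = [(t - c) / c for c, t in zip(daily_control, daily_treatment)]
    half = len(lifts) // 2
    early = sum(lifts[:half]) / half
    late = sum(lifts[half:]) / (len(lifts) - half)
    return early, late

# Hypothetical numbers: check the split first, then the metric, then the trend.
chi2, srm_p = srm_check(50_000, 48_000)
diff, z, p = two_proportion_ztest(1_000, 20_000, 1_100, 20_000)
early, late = novelty_check([0.05] * 4, [0.06, 0.06, 0.05, 0.05])
print(f"SRM p={srm_p:.2g}  lift={diff:.4f} z={z:.2f} p={p:.3f}  early={early:.2f} late={late:.2f}")
```

A common rule of thumb is to stop and debug the assignment when the SRM p-value falls below roughly 0.001, because the lift estimate cannot be trusted until the split is fixed.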
Case: EC shop recommendation
04 LLM on Recommendation
Recommendation with LLM: feature engineering through text embedding generation
How to evaluate embeddings (probing): RankMe / α-ReQ metrics
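The deck names RankMe and α-ReQ but does not show how they are computed. The sketch below follows the commonly published definitions as we understand them, simplified: RankMe is the exponential of the entropy of the normalized singular values of an embedding matrix (close to the full dimension when embeddings are spread out, close to 1 when they collapse), and the α-ReQ-style score is the power-law decay rate α fitted to the covariance eigenspectrum by plain least squares on a log-log scale. In the talk's setting `z` would presumably hold LLM text embeddings of products; here it is synthetic.

```python
import numpy as np

def rankme(z: np.ndarray, eps: float = 1e-7) -> float:
    """RankMe: exp of the entropy of the normalized singular values of Z (n_samples x dim)."""
    s = np.linalg.svd(z, compute_uv=False)
    p = s / (np.sum(s) + eps) + eps  # singular values as a probability-like distribution
    return float(np.exp(-np.sum(p * np.log(p))))

def alpha_req(z: np.ndarray) -> float:
    """Simplified alpha-ReQ-style score: power-law decay rate of the covariance eigenspectrum."""
    zc = z - z.mean(axis=0, keepdims=True)
    cov = zc.T @ zc / (z.shape[0] - 1)
    eig = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first
    eig = eig[eig > 1e-12]               # drop numerically zero directions
    ranks = np.arange(1, eig.size + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(eig), 1)  # log(lambda_i) = -alpha * log(i) + c
    return float(-slope)

# Synthetic embeddings (hypothetical): healthy, fully collapsed, and power-law decaying.
rng = np.random.default_rng(0)
healthy = rng.standard_normal((20_000, 32))
collapsed = np.outer(rng.standard_normal(2_000), rng.standard_normal(32))  # rank 1
decaying = healthy * np.arange(1, 33) ** -1.0  # column variances ~ i^-2, so alpha ~ 2
print(rankme(healthy), rankme(collapsed), alpha_req(decaying))
```

Neither score needs labels, which is why they are attractive for probing embedding quality before any downstream recommendation model is trained.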
05 Conclusion: evaluation & challenges
Conclusion: Business Value
OpenAI, Claude, Gemini / XGBoost or open source
Source: https://zh.wikipedia.org/zh-tw/%E7%BE%8E%E5%9C%8B%E9%9A%8A%E9%95%B72%EF%BC%9A%E9%85%B7%E5%AF%92%E6%88%B0%E5%A3%AB
Source: https://images.app.goo.gl/HCygtJVtoPaU2KgX6
Conclusion & Challenges
1. Data quality
2. Multi-metric evaluation
3. Conducting A/B test experiments
4. Human perception evaluation
Q&A
Contact information (LinkedIn: Dan Chen)