LINE Developers Taiwan
September 23, 2024
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
Event: iThome Hello World Dev Conference
Speaker: Dan Chen
Transcript
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
EC Data Dev / Data Scientists, Dan Chen
Dan, LINE Taiwan EC Dev, Data Scientist: Work Experience, Side Project
CONTENT
01 What is Trustworthy
02 Evaluation Framework
03 Offline & Online Evaluation
04 LLM on Recommendation
05 Q&A
01 What is Trustworthy: Why it's so important
Elements of Trustworthy
Four Perspectives: Trustworthy Recommendation
Data Preparation, Data Representation, Recommendation Generation, Performance Evaluation
02 Evaluation Framework: How to Correctly Evaluate AI
Two-Stage Recommendation System, Brickmaster: Scalable, Scenario-wise, KPI-Oriented, Trustworthy
Evaluation Framework (1/2): How to truly and comprehensively understand performance
03 Offline & Online Evaluation: How to Correctly Evaluate AI
Offline Evaluation: Key points for showing how your algorithms contribute to your business
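The slide text does not name the offline metrics that were used. As an illustration only, the sketch below computes two ranking metrics that are commonly reported in offline evaluation of recommenders, Recall@K and NDCG@K with binary relevance. The function names and the sample item IDs are hypothetical and are not taken from the talk.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Normalized discounted cumulative gain with binary relevance."""
    # Gain of 1 for each relevant item, discounted by log2(position + 1).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking places all relevant items first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: four recommended shops, two of which the user clicked.
ranked_shops = ["a", "b", "c", "d"]
clicked_shops = {"a", "c"}
recall_2 = recall_at_k(ranked_shops, clicked_shops, k=2)
ndcg_4 = ndcg_at_k(ranked_shops, clicked_shops, k=4)
```

Reporting several such metrics side by side is one way to avoid over-reading any single offline number before an online test.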
Online Evaluation: Key points for showing how your algorithms contribute to your business
A/B Test: Key points for showing how your algorithms contribute to your business
Avoid pitfalls in practice: What if the experiment isn't significant? Sample ratio mismatch? Novelty effect?
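Two of the pitfalls named on this slide can be checked with standard statistics. The sketch below is a minimal illustration, not the speaker's tooling: a chi-square goodness-of-fit test for sample ratio mismatch (SRM) and a two-sided two-proportion z-test for significance. The function names, the 50/50 default split, and the traffic numbers are all assumptions made for the example.

```python
import math


def srm_p_value(n_control: int, n_treatment: int,
                expected_treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 degree of freedom) for SRM.

    A very small p-value suggests the observed traffic split deviates from
    the planned split, so the experiment results should not be trusted.
    """
    total = n_control + n_treatment
    exp_treatment = total * expected_treatment_share
    exp_control = total - exp_treatment
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def two_proportion_p_value(conv_control: int, n_control: int,
                           conv_treatment: int, n_treatment: int) -> float:
    """Two-sided z-test for a difference in conversion rate (pooled variance)."""
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if se == 0:
        return 1.0  # No variation at all, so no evidence of a difference.
    z = (conv_treatment / n_treatment - conv_control / n_control) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical traffic: planned 50/50 split, observed 5000 vs 5300 users.
srm_p = srm_p_value(5000, 5300)
# Hypothetical conversions: 100/1000 in control vs 150/1000 in treatment.
lift_p = two_proportion_p_value(100, 1000, 150, 1000)
```

Running the SRM check before reading any lift is a common ordering, since a mismatched split usually points to an assignment or logging bug. The novelty effect is not covered by either test; it is typically examined by comparing the effect over time or by user tenure.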
Case – EC Shop recommendation
04 LLM On Recommendation
Recommendation with LLM. Feature Engineering: text embedding generation. How to evaluate embeddings (probing): RankMe / α-ReQ metrics
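The slide names the two probing metrics without defining them. For reference, the definitions below follow the published formulations as commonly stated. The notation is an assumption of this note rather than something from the deck: Z is the N × K matrix of embeddings, σ_k are its singular values, λ_i are the eigenvalues of the embedding covariance sorted in decreasing order, and ε is a small constant for numerical stability.

```latex
% RankMe: effective rank of the embedding matrix Z, computed as the
% exponential of the entropy of its normalized singular values.
\[
  p_k = \frac{\sigma_k(Z)}{\lVert \sigma(Z) \rVert_1} + \epsilon,
  \qquad
  \mathrm{RankMe}(Z) = \exp\!\Bigl(-\sum_{k=1}^{\min(N,K)} p_k \log p_k\Bigr)
\]

% alpha-ReQ: decay coefficient of a power law fitted to the eigenspectrum
% of the embedding covariance, i.e. the negative slope of
% log(lambda_i) against log(i).
\[
  \lambda_i \propto i^{-\alpha}
\]
```

RankMe ranges from 1 to min(N, K), and a higher value indicates that the embeddings spread over more dimensions. For α-ReQ, values of α close to 1 have been reported to correlate with better downstream performance. Both metrics need only the embeddings themselves, with no labels.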
05 Conclusion: Evaluate & Challenge
Conclusion: Business Value
OpenAI, Claude, Gemini, XGBoost, or open source
Source: https://zh.wikipedia.org/zh-tw/%E7%BE%8E%E5%9C%8B%E9%9A%8A%E9%95%B72%EF%BC%9A%E9%85%B7%E5%AF%92%E6%88%B0%E5%A3%AB
Source: https://images.app.goo.gl/HCygtJVtoPaU2KgX6
Conclusion & Challenge
1. Data Quality
2. Multi-metric evaluation
3. Conduct A/B test experiments
4. Human perception evaluation
Q&A
Contact information (LinkedIn: Dan Chen)