Slide 1

Slide 1 text

Unleashing the Power of NLP: Innovations from LINE Data Dev
Danny Lo
2023/03/03

Slide 2

Slide 2 text

Danny Lo
Data Dev, TECH FRESH
• NTU CSIE
• Research Assistant @MSLAB
• LINE TECH FRESH @LINE Data Dev

Slide 3

Slide 3 text

What Data Dev works on
LINE Family Services:
• LINE SHOPPING • LINE SPOT • LINE MUSIC • LINE Sticker • LINE VOOM • LINE Reward • Fact Checker • LINE HELP TW • LINE Travel • LINE TODAY
Data Dev:
• NLP • Knowledge Graph • Uplift Modeling • NER • Classifier • Duplication Detector • Auto completion • Keyword Extraction • Related Search • Text Generation • User Tagging • Data Analytics • Recommendation • CLV

Slide 4

Slide 4 text

Data Dev team composition

Data Engineer
RESPONSIBILITY
• Build and optimize data pipeline architecture
• Assemble large, complex data sets that meet requirements
SKILL
Big data infra, SQL, ETL, message queuing

Data Analyst
RESPONSIBILITY
• Interpret data, analyze results using statistical techniques
• Identify, analyze, and interpret trends or patterns in complex data sets
SKILL
Statistics, Data Visualization, Business Knowledge

Data Scientist
RESPONSIBILITY
• Select appropriate datasets and data representation methods
• Research and implement appropriate ML algorithms
SKILL
Machine learning, deep learning, CV, NLP, Speech

ML Svc Engineer
RESPONSIBILITY
• Build and scale machine learning infrastructure
• Monitor model performance
SKILL
System infrastructure design, DevOps

Slide 5

Slide 5 text

NLP Application Scenarios

Slide 6

Slide 6 text

Scenario 1: Ad Review
※ Source: Google Images
• False advertising?
• Claims of medical efficacy?
• Exaggerated claims?

Slide 7

Slide 7 text

Building an NLP pipeline
• Data preparation
• Model training
• Model evaluation
• Model validation
• Model analysis
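The first three stages above (data preparation, model training, model evaluation) can be sketched for the ad-review scenario as follows. This is only an illustrative sketch: the slides do not say which model is used, so a small multinomial Naive Bayes classifier stands in for it, and the whitespace tokenizer, the labels `violation` / `ok`, and the toy ad texts are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Data preparation (assumed): lowercase and split on whitespace.
    # Real ad text would need a proper tokenizer.
    return text.lower().split()


def train(examples):
    # Model training: collect per-label token counts for multinomial Naive Bayes.
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, token_counts, vocab


def predict(model, text):
    # Pick the label with the highest log-probability (add-one smoothing).
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(model, examples):
    # Model evaluation: plain accuracy on held-out examples.
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


# Hypothetical toy data, invented for this sketch only.
TRAIN = [
    ("cures all disease instantly", "violation"),
    ("guaranteed miracle weight loss", "violation"),
    ("miracle cream cures wrinkles", "violation"),
    ("new summer shoes on sale", "ok"),
    ("fresh coffee delivered weekly", "ok"),
    ("sale on kitchen appliances", "ok"),
]
HELD_OUT = [
    ("miracle pill cures fatigue", "violation"),
    ("weekly sale on shoes", "ok"),
]

model = train(TRAIN)
accuracy = evaluate(model, HELD_OUT)
```

Model validation and model analysis (for example, inspecting which ads are misclassified) would follow the same pattern on separate data and are left out of the sketch.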

Slide 8

Slide 8 text

Follow-up work
• Testing
• Deployment
• Monitoring
• Upgrading

Slide 9

Slide 9 text

Scenario 2: SmartText

Slide 10

Slide 10 text

SmartText – What if we could…

Slide 11

Slide 11 text

SmartText – How do we build an ML pipeline?
Roles involved: DS, DE, MLE, DA, PM, Biz
• Data preparation
• EDA
• Feature engineering
• Model build
• Hyper-parameter tuning
• Evaluation
• Error analysis
• Scaling
• Performance
• Model decay
• Data drift
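The "Data drift" item can be made concrete with a simple distribution check. The slides do not name a drift metric; the sketch below uses the Population Stability Index (PSI) as one common choice, and the function name `psi`, the bin count, and the sample data are assumptions for illustration.

```python
import math


def psi(expected, actual, bins=5, eps=1e-6):
    """Population Stability Index between a reference sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = histogram(expected), histogram(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical feature values: training-time reference vs. a shifted live sample.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

no_drift = psi(reference, reference)
drift = psi(reference, shifted)
```

A PSI of zero means the two binned distributions match; a value above roughly 0.2 is often treated as a shift worth investigating, though that threshold is a rule of thumb rather than anything stated in the talk.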

Slide 12

Slide 12 text

Scenario 3: Click-Through Rate Prediction
※ Source: https://paperswithcode.com/task/click-through-rate-prediction
• Predict the probability that a user clicks on an ad
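As a rough illustration of what predicting a click probability means, the sketch below fits a logistic regression with batch gradient descent. The talk does not specify the model behind its approaches, so this is only a baseline; the two binary features (`ad_matches_interest`, `is_repeat_impression`) and the click log are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_ctr(weights, bias, features):
    # Estimated probability that the user clicks the ad.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def log_loss(weights, bias, rows):
    # Average binary cross-entropy over (features, clicked) rows.
    total = 0.0
    for features, clicked in rows:
        p = min(max(predict_ctr(weights, bias, features), 1e-12), 1 - 1e-12)
        total -= clicked * math.log(p) + (1 - clicked) * math.log(1 - p)
    return total / len(rows)


def train(rows, epochs=200, lr=0.5):
    # Batch gradient descent on the log loss, starting from zero weights.
    n = len(rows[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for features, clicked in rows:
            err = predict_ctr(weights, bias, features) - clicked
            for j, x in enumerate(features):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Hypothetical log: ([ad_matches_interest, is_repeat_impression], clicked)
ROWS = [
    ([1, 0], 1), ([1, 0], 1), ([1, 1], 0), ([1, 0], 1),
    ([0, 0], 0), ([0, 1], 0), ([0, 0], 0), ([0, 0], 1),
]

loss_before = log_loss([0.0, 0.0], 0.0, ROWS)
weights, bias = train(ROWS)
loss_after = log_loss(weights, bias, ROWS)
```

After training, `predict_ctr(weights, bias, [1, 0])` gives a higher click probability than `predict_ctr(weights, bias, [0, 1])`, which matches the pattern in the toy log.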

Slide 13

Slide 13 text

Approach 1
※ Source: Flaticon

Slide 14

Slide 14 text

Why is the model performing worse than expected?

Slide 15

Slide 15 text

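User Behavior Understanding
※ Source: https://www.gopaisa.com/news/wp-content/uploads/2018/08/User-Data-Leak.jpg
• Users generate tens of thousands of records every day, including browsing history, ad views, and search history.
• These records are used to build a model that generates a vector for each user; this vector reflects the user's interests and preferences.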
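One simple way to picture a per-user vector is to average the embeddings of the items a user interacted with and compare the result with candidate ads. This is a deliberately simplified stand-in, since the talk says a model is trained on the behavior data but does not describe it; the 3-dimensional `ITEM_VECTORS` table and the event names are hypothetical.

```python
import math

# Hypothetical item embeddings; a real system would learn these from behavior logs.
ITEM_VECTORS = {
    "sports_news": [1.0, 0.0, 0.0],
    "running_shoes_ad": [0.9, 0.1, 0.0],
    "recipe_search": [0.0, 1.0, 0.0],
    "cookware_ad": [0.1, 0.9, 0.0],
}


def user_embedding(events):
    # Average the vectors of the items the user browsed, viewed, or searched for.
    vectors = [ITEM_VECTORS[e] for e in events if e in ITEM_VECTORS]
    if not vectors:
        return [0.0] * 3  # no known history: fall back to a zero vector
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a, b):
    # Cosine similarity; returns 0.0 when either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# A user who mostly reads sports news and searched for a recipe once.
user = user_embedding(["sports_news", "sports_news", "recipe_search"])
shoes_score = cosine(user, ITEM_VECTORS["running_shoes_ad"])
cookware_score = cosine(user, ITEM_VECTORS["cookware_ad"])
```

Here the sports-heavy history puts the user vector closer to the running-shoes ad than to the cookware ad, which is the kind of interest signal a downstream click model could consume as an input feature.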

Slide 16

Slide 16 text

Approach 2: User Embedding
※ Source: Flaticon

Slide 17

Slide 17 text

How does LINE Data Dev differ from a university lab?
1. Goals
2. Data sources
3. Model requirements
4. Team composition

Slide 18

Slide 18 text

Daily life as an intern

Slide 19

Slide 19 text

Q&A

Slide 20

Slide 20 text

THANK YOU