Slide 1

Slide 1 text

LINE Taiwan Data Dev Team Introduction Alice Lin

Slide 2

Slide 2 text

林昱辰 (Alice Lin)
Data Dev TECH FRESH, 2021.09 - Now
• B07, Department of Information Management, National Taiwan University
• ML/DL, NLP, Web Service
• Travel, badminton, photography

Slide 3

Slide 3 text

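Data Engineering, Data Science, and Data Analysis: what is each of them?
• Data Engineering: data collection, data cleaning, data warehousing, data pipelines
• Data Science: machine learning, deep learning, model development and optimization
• Data Analysis: data-driven operations, A/B testing, business insights, report building
© LINE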

Slide 4

Slide 4 text

Roles, Skills and Responsibility
Data Engineer (Pipeline)
• Responsibility: Build and optimize data pipeline architecture. Assemble large, complex data sets that meet requirements.
• Skill: Big data infra, SQL, ETL, message queuing
Data Analyst (Biz)
• Responsibility: Interpret data and analyze results using statistical techniques. Identify, analyze, and interpret trends or patterns in complex data sets.
• Skill: Statistics, Data Visualization, Business Knowledge
Data Scientist (Model)
• Responsibility: Select appropriate datasets and data representation methods. Research and implement appropriate ML algorithms.
• Skill: Machine learning, deep learning, CV, NLP, Speech
ML Svc Engineer (Service)
• Responsibility: Build and scale machine learning infrastructure. Monitor model performance.
• Skill: System infrastructure design, DevOps

Slide 5

Slide 5 text

LINE TW Data Dev
• Supports & Evangelism
• DS/ML Applications
• Data Governance
• Common Platforms

Slide 6

Slide 6 text

Data Governance / Common Platforms
IU (with Data Governance)
• Datalake/Datachain migration to IU
• IU Portal and IU Web Renewal
• MID download reduction plan
MLU/Jutopia
• Standard machine learning platform
• MLOps
CLOVA AI
• CLOVA AI solutions
Deeppocket/PicCell/…
• Standard model serving platform
• LINE AI Platform
• And more …

Slide 7

Slide 7 text

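DS/ML Applications
MarTech application: Customer Lifetime Value (CLV) prediction
Machine learning models trained on historical user behavior and transaction data help each service find its most valuable customers.
RFM
• Recency
• Frequency
• Monetary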
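As a rough illustration of the three RFM quantities, the sketch below computes them from a small transaction log. The data, the `rfm_table` helper, and the `as_of` reference date are all hypothetical and are not taken from the talk. Recency is assumed to mean days since the last purchase, frequency the number of purchases, and monetary the total amount spent.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("u1", date(2022, 1, 5), 120.0),
    ("u1", date(2022, 3, 20), 80.0),
    ("u2", date(2021, 11, 2), 300.0),
    ("u3", date(2022, 3, 28), 40.0),
    ("u3", date(2022, 3, 30), 60.0),
    ("u3", date(2022, 2, 14), 25.0),
]


def rfm_table(rows, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in rows:
        # Track the latest purchase date, the purchase count, and total spend.
        last_day, count, total = table.get(customer, (day, 0, 0.0))
        table[customer] = (max(last_day, day), count + 1, total + amount)
    return {
        customer: ((as_of - last_day).days, count, total)
        for customer, (last_day, count, total) in table.items()
    }


rfm = rfm_table(transactions, as_of=date(2022, 4, 1))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

In this toy data, `u3` bought most recently and most often, while `u2` spent the most in a single purchase long ago. Real RFM pipelines usually go on to bucket each dimension into scores.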

Slide 8

Slide 8 text

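MarTech application: Customer Lifetime Value (CLV) prediction
Machine learning models trained on historical user behavior and transaction data help each service find its most valuable customers.
• RFM Model: measures historical value
• CLV Model: predicts future value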
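One common way to turn "historical value" into a "future value" prediction task is to split each customer's history at a cutoff date. The sketch below shows only that data-preparation step, under assumptions that are not from the talk: the transaction data, the `build_training_rows` helper, and the choice of "spend after the cutoff" as the label are all hypothetical. Any regression model could then be trained on the resulting feature and label pairs.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("u1", date(2022, 1, 5), 120.0),
    ("u1", date(2022, 3, 20), 80.0),
    ("u1", date(2022, 4, 10), 50.0),
    ("u2", date(2021, 11, 2), 300.0),
    ("u3", date(2022, 3, 30), 60.0),
    ("u3", date(2022, 5, 1), 90.0),
    ("u3", date(2022, 5, 15), 10.0),
]


def build_training_rows(rows, cutoff):
    """Split each customer's history at `cutoff`.

    Features (recency_days, frequency, monetary) use only purchases made
    before the cutoff; the label is the total spend on or after the cutoff.
    Customers with no purchase before the cutoff are left out.
    """
    history, future = {}, {}
    for customer, day, amount in rows:
        if day < cutoff:
            last_day, count, total = history.get(customer, (day, 0, 0.0))
            history[customer] = (max(last_day, day), count + 1, total + amount)
        else:
            future[customer] = future.get(customer, 0.0) + amount
    return {
        customer: {
            "features": ((cutoff - last_day).days, count, total),
            "label": future.get(customer, 0.0),
        }
        for customer, (last_day, count, total) in history.items()
    }


training_rows = build_training_rows(transactions, cutoff=date(2022, 4, 1))
for customer, row in sorted(training_rows.items()):
    print(customer, row["features"], row["label"])
```

Keeping the features strictly before the cutoff avoids leaking future purchases into the model's inputs.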

Slide 9

Slide 9 text

NLP-enabled Applications (DS/ML Applications)

Slide 10

Slide 10 text

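NLP services platform
Text is an important way of presenting content. Automated natural language processing lets systems process and understand text quickly and at scale, which in turn enables value-added applications.
Because of the characteristics of the Chinese language, the NLP techniques it requires differ in places. To use development resources efficiently, we implemented common NLP tasks as a set of services and integrated them into a single general-purpose platform.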

Slide 11

Slide 11 text

Concept: NLPaaS
As-is
• Prepare labeled data
• Develop model
• Training
• Test model
• Deploy
• Integrate with services
To-be
• Prepare labeled data
• Train model with NLPaaS
• Integrate with services

Slide 12

Slide 12 text

NLPaaS NLP services
• Classification
• Similarity
• Article

Slide 13

Slide 13 text

Can the model be explained?
• Engineer: Why did the model make this prediction?
• User: Should I trust the results the model produces?

Slide 14

Slide 14 text

Explainable AI - Let’s open the black box!
Diagram labels: Data Validation Performance Model Explainable Model Interpretation

Slide 15

Slide 15 text

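SHAP (SHapley Additive exPlanations)
• Shapley values were originally proposed by Lloyd Shapley, a leading game theorist and professor at the University of California, Los Angeles (UCLA), as a way to compute the contribution of an individual player.
• SHAP is the method proposed in this paper for explaining machine learning models. Its core idea is to compute how much each feature affects the output (its Shapley value).
Download the paper: https://www.researchgate.net/publication/317062430_A_Unified_Approach_to_Interpreting_Model_Predictions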
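To make "the contribution of each feature" concrete, here is a minimal sketch that computes exact Shapley values by averaging each feature's marginal contribution over every ordering. The toy `model`, `sample`, and `baseline` are hypothetical. Brute-force enumeration like this is practical only for a handful of features, and it is not a description of how the shap library computes its values internally.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering in which the players can join the coalition."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        coalition = frozenset()
        for player in ordering:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical toy model with an interaction term; feature 2 is ignored.
def model(x):
    return 10 * x[0] + 5 * x[1] + 4 * x[0] * x[1]


sample = [1, 1, 1]    # the input being explained
baseline = [0, 0, 0]  # reference input standing in for "feature absent"


def value(coalition):
    """Model output when only the features in `coalition` take the sample's
    values and all other features stay at the baseline."""
    x = [sample[i] if i in coalition else baseline[i] for i in range(len(sample))]
    return model(x)


phi = shapley_values(value, players=[0, 1, 2])
print(phi)  # {0: 12.0, 1: 7.0, 2: 0.0}
```

The values sum to `model(sample) - model(baseline)`, which is 19 here. The interaction term is split evenly between features 0 and 1, and the unused feature 2 gets zero.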

Slide 16

Slide 16 text

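Explaining the LINE Travel TW article classifier
• The classifier decides which of the following categories an article belongs to (multiclass classifier):
['遊記', '住宿', '旅遊知識', '購物', '景點', '美食']
(travelogue, accommodation, travel knowledge, shopping, attractions, food)
• Pick one sample and look at the visualization for its “住宿” (accommodation) label. Plot legend: positive effect / negative effect.

    import shap

    # `classifier` (the trained article classifier) and `data` (the input
    # articles) are not shown on the slide.
    # define our explainer
    explainer = shap.Explainer(classifier)
    # input one sample
    shap_values = explainer(data[:1])
    # text plot of the shap values
    shap.plots.text(shap_values[0])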

Slide 17

Slide 17 text

What did I learn…
• Coworking: Scrum, Coding Style, Communication Skill
• Time Management: Work-School-Life Balance
• Keep Learning: Data Study Group, Internal Hackathon

Slide 18

Slide 18 text

Thank you & Welcome to join us!