LINE Dev Meetup 16 - Data Dev Team

LINE Taiwan Data Dev Team Introduction by Alice Lin @ LINE Developers Meetup 16

Event: https://linegroup.kktix.cc/events/20220324

LINE Developers Taiwan

March 24, 2022

Transcript

  1. LINE Taiwan Data Dev Team Introduction Alice Lin

  2. 林昱辰 (Alice Lin), Data Dev, TECH FRESH, 2021.09 - Now

    • B07, Department of Information Management, National Taiwan University
    • ML/DL, NLP, Web Service
    • Travel, badminton, photography
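  3. What are Data Engineering, Data Science, and Data Analysis?

    • Data Engineering: data collection, data cleaning, data warehousing, data pipelines
    • Data Science: machine learning, deep learning, model development and optimization
    • Data Analysis: data operations, A/B testing, business insights, report building
    © LINE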
  4. Roles, Skills and Responsibility

    Data Engineer (Pipeline)
    • Responsibility: build and optimize data pipeline architecture; assemble large, complex data sets that meet requirements
    • Skill: big data infra, SQL, ETL, message queuing
    Data Analyst (Biz)
    • Responsibility: interpret data and analyze results using statistical techniques; identify, analyze, and interpret trends or patterns in complex data sets
    • Skill: statistics, data visualization, business knowledge
    Data Scientist (Model)
    • Responsibility: select appropriate datasets and data representation methods; research and implement appropriate ML algorithms
    • Skill: machine learning, deep learning, CV, NLP, speech
    ML Svc Engineer (Service)
    • Responsibility: build and scale machine learning infrastructure; monitor model performance
    • Skill: system infrastructure design, DevOps
  5. LINE TW Data Dev

    • Supports & Evangelism
    • DS/ML Applications
    • Data Governance
    • Common Platforms
  6. Data Governance / Common Platforms

    IU (with Data Governance)
    • Datalake/Datachain migration to IU
    • IU Portal and IU Web Renewal
    • MID download reduction plan
    MLU/Jutopia
    • Standard machine learning platform
    • MLOps
    CLOVA AI
    • CLOVA AI solutions
    Deeppocket/PicCell/…
    • Standard model serving platform
    • LINE AI Platform
    • And more …
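  7. MarTech application: Customer Lifetime Value (CLV) prediction (DS/ML Applications)

    Using historical user behavior and transaction data, machine learning models help services identify their most valuable customers.
    RFM: • Recency • Frequency • Monetary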
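  8. MarTech application: Customer Lifetime Value (CLV) prediction (DS/ML Applications)

    Using historical user behavior and transaction data, machine learning models help services identify their most valuable customers.
    • RFM Model: measures historical value
    • CLV Model: predicts future value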
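The Recency, Frequency, and Monetary features named on the slides above can be sketched in a few lines of Python. This is only an illustration of the general RFM idea, not the team's actual pipeline: the function name `rfm_table`, the `(customer_id, purchase_date, amount)` record layout, and the sample transactions are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Summarize (customer_id, purchase_date, amount) records per customer.

    Hypothetical helper: the record layout is an assumption for this sketch.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # Recency is based on the most recent purchase date.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1      # number of purchases
        monetary[customer] += amount  # total spend
    return {
        customer: {
            "recency_days": (today - last_purchase[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
        }
        for customer in last_purchase
    }


# Made-up sample data, for illustration only.
transactions = [
    ("u1", date(2022, 3, 1), 120.0),
    ("u1", date(2022, 3, 20), 80.0),
    ("u2", date(2022, 1, 5), 300.0),
]
rfm = rfm_table(transactions, today=date(2022, 3, 24))
print(rfm["u1"])
```

A table like this only describes past behavior. A CLV model, as the slide puts it, would go one step further and predict future value, for example by using such features as model inputs.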
  9. NLP-enabled Applications (DS/ML Applications)

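  10. NLP services platform (DS/ML Applications)

    Text is an important way of presenting content. Automated natural language processing lets systems process and understand text quickly and at scale, which in turn enables value-added applications.
    Because of the characteristics of Chinese, the natural language techniques it requires differ in places. To use development resources efficiently, we implemented common natural language tasks as a set of services and integrated them into one general-purpose platform.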
  11. Concept: NLPaaS (DS/ML Applications)

    As-is: Prepare labeled data • Develop model • Test model • Training • Deploy • Integrate with services
    To-be: Prepare labeled data • Train model with NLPaaS • Integrate with services
  12. NLPaaS NLP services (DS/ML Applications)

    • Classification
    • Similarity
    • Article
  13. Can the model be explainable?

    • Engineer: Why did the model make this prediction?
    • User: Should I trust the results the model produces?
  14. Explainable AI - Let’s open the black box!

    Data Validation Performance Model Explainable Model Interpretation
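  15. SHAP (SHapley Additive exPlanations)

    • Shapley values were originally proposed by game theorist Lloyd Shapley, a professor at the University of California, Los Angeles (UCLA), to compute the contribution of an individual player.
    • SHAP is the method this paper proposes for explaining machine learning models. Its core idea is to compute how much each feature affects the output (its Shapley value).
    Download the paper: https://www.researchgate.net/publication/317062430_A_Unified_Approach_to_Interpreting_Model_Predictions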
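To make the "contribution of each player" idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny cooperative game by enumerating every coalition. It illustrates the classical definition only and is not how the `shap` library computes its estimates. The feature names `x1`, `x2`, `x3` and the payoff function `toy_value` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: x1 adds 2, x2 adds 3, both together add 1 more."""
    score = 0.0
    if "x1" in coalition:
        score += 2.0
    if "x2" in coalition:
        score += 3.0
    if "x1" in coalition and "x2" in coalition:
        score += 1.0
    return score


phi = shapley_values(["x1", "x2", "x3"], toy_value)
print(phi)
```

In this toy game the interaction bonus is split evenly between `x1` and `x2`, `x3` receives nothing because it never changes the payoff, and the values sum to the payoff of the full coalition. Enumeration is exponential in the number of players, so it is only practical for very small games.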
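  16. Explaining the LINE Travel TW article classifier

    • The classifier decides which of the following categories an article belongs to (multiclass classifier):
      ['遊記', '住宿', '旅遊知識', '購物', '景點', '美食']
      (travelogue, accommodation, travel knowledge, shopping, attractions, food)
    • Pick one sample and look at its visualization for the "住宿" (accommodation) label. The plot marks tokens with a positive effect and tokens with a negative effect.

      import shap

      # `classifier` (the trained article classifier) and `data` (the input
      # articles) are assumed to be defined elsewhere
      # define our explainer
      explainer = shap.Explainer(classifier)
      # input one sample
      shap_values = explainer(data[:1])
      # text plot of the shap values
      shap.plots.text(shap_values[0])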
  17. What did I learn…

    Coworking • Time Management • Data Study Group • Internal Hackathon • Keep Learning • Scrum • Coding Style • Work-School-Life-Balance • Communication Skill
  18. Thank you & Welcome to join us!