Unleashing the Power of NLP: Innovations from LINE Data Dev

Event: Taipei Medical University (臺北醫學大學) company visit
Speaker: Danny Lo

LINE Developers Taiwan

March 03, 2023

Transcript

  1. Unleashing the Power of NLP:
    Innovations from LINE Data Dev
    Danny Lo
    2023/03/03

  2. Danny Lo
    Data Dev, TECH FRESH
    • NTU CSIE
    • Research Assistant @MSLAB
    • LINE TECH FRESH @LINE Data Dev

  3. What Data Dev works on
    Data Dev
    LINE Family Services: LINE SHOPPING, LINE SPOT, LINE MUSIC, LINE Sticker, LINE VOOM, LINE Reward, Fact Checker, LINE HELP TW, LINE Travel, LINE TODAY
    NLP, Knowledge Graph, Uplift Modeling, NER, Classifier, Duplication Detector, Auto completion, Keyword Extraction, Related Search, Text Generation, User Tagging, Data Analytics, Recommendation, CLV

  4. Data Dev team composition
    • Data Engineer
      Responsibility: Build and optimize data pipeline architecture; assemble large, complex data sets that meet requirements
      Skill: Big data infra, SQL, ETL, message queuing
    • Data Analyst
      Responsibility: Interpret data, analyze results using statistical techniques; identify, analyze, and interpret trends or patterns in complex data sets
      Skill: Statistics, Data Visualization, Business Knowledge
    • Data Scientist
      Responsibility: Select appropriate datasets and data representation methods; research and implement appropriate ML algorithms
      Skill: Machine learning, deep learning, CV, NLP, Speech
    • ML Svc Engineer
      Responsibility: Build and scale machine learning infrastructure; monitor model performance
      Skill: System infrastructure design, DevOps

  5. NLP use cases

  6. Use case 1: Ad review
    ※ Source: Google Images
    False advertising?
    Claims of medical efficacy?
    Exaggerated claims?

  7. Building the NLP pipeline
    Data preparation → Model training → Model evaluation → Model validation → Model analysis
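To make these stages concrete for the ad-review use case, here is a minimal Python sketch. It is not LINE's implementation: the toy ads, the binary label (1 = needs review), the regex tokenizer, and the choice of a multinomial Naive Bayes classifier are all illustrative assumptions. It covers data preparation, training, evaluation, and a simple error analysis (listing misclassified ads); a separate validation split is omitted for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data: (ad text, label); 1 = needs review (efficacy or exaggerated claims)
TRAIN = [
    ("cures diabetes in seven days", 1),
    ("guaranteed weight loss miracle pill", 1),
    ("100 percent effective cure for hair loss", 1),
    ("miracle cream removes wrinkles instantly", 1),
    ("new spring collection now in stock", 0),
    ("free shipping on orders over 500", 0),
    ("limited edition stickers available today", 0),
    ("book your hotel for the holiday weekend", 0),
]
TEST = [
    ("miracle cure for back pain", 1),
    ("summer collection free shipping", 0),
]


def tokenize(text):
    """Data preparation: lowercase and keep runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_nb(examples, alpha=1.0):
    """Model training: multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    total_docs = sum(class_docs.values())
    model = {"vocab": vocab, "alpha": alpha, "classes": {}}
    for label, n_docs in class_docs.items():
        model["classes"][label] = {
            "log_prior": math.log(n_docs / total_docs),
            "counts": word_counts[label],
            "total": sum(word_counts[label].values()),
        }
    return model


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    v = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, stats in model["classes"].items():
        score = stats["log_prior"]
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            score += math.log((stats["counts"][tok] + alpha) / (stats["total"] + alpha * v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(model, examples):
    """Model evaluation and analysis: accuracy plus the misclassified examples."""
    errors = [(text, y) for text, y in examples if predict(model, text) != y]
    return 1 - len(errors) / len(examples), errors


model = train_nb(TRAIN)
accuracy, errors = evaluate(model, TEST)
print(f"accuracy={accuracy:.2f}, errors={errors}")
```

For Chinese-language ads, the regex tokenizer would have to be replaced by a word segmenter or character n-grams; Naive Bayes is used here only because it fits in a few lines.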

  8. Follow-up work
    ※1 Testing → ※2 Deployment → ※3 Monitoring → ※4 Upgrading

  9. Use case 2: SmartText

  10. SmartText: What if we could…

  11. SmartText: How do we build the ML pipeline?
    Stages and concerns: Data preparation, EDA, Feature engineering, Model build, Hyper-parameter tuning, Evaluation, Error analysis, Scaling, Performance, Model decay, Data drift
    Roles involved: PM, Biz, DA (Data Analyst), DE (Data Engineer), DS (Data Scientist), MLE (ML Engineer)
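The slide lists data drift and model decay among the concerns once the pipeline runs in production. One common way to quantify drift, assumed here rather than taken from the deck, is the Population Stability Index (PSI), which compares the binned distribution of a feature or model score in new traffic against a reference sample. The equal-width binning, the `eps` floor, and the example scores are illustrative choices.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a new sample.

    Bin edges are equal-width over the reference sample's range.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip out-of-range values into the edge bins
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    p_ref = proportions(expected)
    p_new = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_ref, p_new))


# Hypothetical model scores: last month's reference vs. this week's traffic
reference_scores = [i / 100 for i in range(100)]
shifted_scores = [v + 0.5 for v in reference_scores]
print(psi(reference_scores, reference_scores), psi(reference_scores, shifted_scores))
```

A frequently quoted rule of thumb treats PSI above roughly 0.2 as a significant shift, but the alert threshold should be tuned per feature.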

  12. Use case 3: Click-Through Rate Prediction
    ※ Source: https://paperswithcode.com/task/click-through-rate-prediction
    • Predict the probability that a user clicks on an ad
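CTR prediction is usually framed as binary classification: estimate P(click | user, ad) and fit it with log loss. The sketch below assumes the simplest such model, logistic regression over one-hot features trained by full-batch gradient descent; it is not necessarily the model used in the talk, and the feature names (`ad=...`, `user=...`) and impressions are hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1) without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_ctr(weights, bias, features):
    """P(click | features) = sigmoid(bias + sum of weight * value over active features)."""
    return sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in features.items()))


def train(examples, steps=1000, lr=0.5):
    """Fit logistic regression by full-batch gradient descent on the mean log loss."""
    weights, bias = {}, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w, grad_b = {}, 0.0
        for features, clicked in examples:
            err = predict_ctr(weights, bias, features) - clicked  # d(log loss)/d(score)
            grad_b += err / n
            for f, v in features.items():
                grad_w[f] = grad_w.get(f, 0.0) + err * v / n
        bias -= lr * grad_b
        for f, g in grad_w.items():
            weights[f] = weights.get(f, 0.0) - lr * g
    return weights, bias


# Hypothetical impressions: one-hot ad category and user segment; label 1 = clicked
impressions = [
    ({"ad=sticker": 1, "user=teen": 1}, 1),
    ({"ad=sticker": 1, "user=teen": 1}, 1),
    ({"ad=sticker": 1, "user=teen": 1}, 0),
    ({"ad=sticker": 1, "user=adult": 1}, 1),
    ({"ad=sticker": 1, "user=adult": 1}, 0),
    ({"ad=travel": 1, "user=teen": 1}, 1),
    ({"ad=travel": 1, "user=teen": 1}, 0),
    ({"ad=travel": 1, "user=adult": 1}, 1),
    ({"ad=travel": 1, "user=adult": 1}, 0),
    ({"ad=travel": 1, "user=adult": 1}, 0),
]
w, b = train(impressions)
print(round(predict_ctr(w, b, {"ad=sticker": 1, "user=teen": 1}), 3))
```

Because this toy data has no interaction between ad category and user segment, the additive model recovers the empirical click rate of each cell (2 clicks out of 3 for sticker ads shown to teens).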

  13. Approach 1
    ※ Source: Flaticon

  14. Why isn't the model performing as expected?

  15. User Behavior Understanding
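    ※ Source: https://www.gopaisa.com/news/wp-content/uploads/2018/08/User-Data-Leak.jpg
    • Users generate tens of thousands of records every day, including browsing history, ad views, and search history.
    • We use this data to build a model that generates a vector for each user; the vector reflects that user's interests and preferences.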
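The deck does not say which model produces the user vector. A simple baseline, assumed here, is a weighted average of pre-trained embeddings of the items a user interacted with; `ITEM_VECTORS`, `EVENT_WEIGHTS`, and the 3-dimensional vectors are hypothetical. The resulting user vector can be compared with an ad vector via cosine similarity or passed to a CTR model as extra features.

```python
import math

# Hypothetical pre-trained item embeddings (articles, ads, search terms)
ITEM_VECTORS = {
    "article:travel_japan": [0.9, 0.1, 0.0],
    "ad:hotel_deal": [0.8, 0.2, 0.1],
    "search:ramen": [0.6, 0.0, 0.4],
    "ad:sticker_pack": [0.0, 0.9, 0.2],
}

# Hypothetical weights: how strongly each behavior type signals interest
EVENT_WEIGHTS = {"browse": 1.0, "ad_view": 0.5, "search": 2.0}


def user_embedding(events, dim=3):
    """Weighted average of the vectors of items the user interacted with."""
    total = [0.0] * dim
    weight_sum = 0.0
    for event_type, item_id in events:
        vec = ITEM_VECTORS.get(item_id)
        if vec is None:
            continue  # skip items without an embedding
        w = EVENT_WEIGHTS.get(event_type, 1.0)
        total = [t + w * x for t, x in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


history = [
    ("browse", "article:travel_japan"),
    ("search", "search:ramen"),
    ("ad_view", "ad:hotel_deal"),
]
u = user_embedding(history)
print(u)
print(cosine(u, ITEM_VECTORS["ad:hotel_deal"]), cosine(u, ITEM_VECTORS["ad:sticker_pack"]))
```

Averaging ignores the order of events, so it cannot capture recency; weighting recent events more heavily is one possible refinement.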

  16. Approach 2
    ※ Source: Flaticon
    User Embedding

  17. How LINE Data Dev differs from a university lab
    1. Goals
    2. Data sources
    3. Model requirements
    4. Team composition

  18. Daily life as an intern
