How Machine Learning helps LINE Fact Checker - Now and Future

Huai-Chien Hung (Jim Horng)
LINE Taiwan Today Team Software/Data Engineer
https://linedevday.linecorp.com/jp/2019/sessions/S1-08

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay How Machine Learning Helps LINE Fact Checker - Now and Future > Huai-Chien Hung (Jim Horng) > LINE Taiwan Today Team Software/Data Engineer
  2. Agenda > The Need for Fact Checker > Overview of ML Components > Overview of ML System > Challenges and Future Work
  3. Severity of Fake Messages (source: Digital Society Project)
  4. Fact Checker OA, Dashboard > OA - Query > OA - Report > Dashboard
  5. Agenda > The Need for Fact Checker > Overview of ML Components > Overview of ML System > Challenges and Future Work
  6. AI Knows Everything?
  7. How ML Helps? Human + AI = Collaboration
  8. How ML Helps? Coverage of Fake/Real Messages > Verified Messages > Similar Messages > Total Messages > ML Near-Duplication > ML Classification
  9. Near-Duplication - Use Cases > Verified Fake Message: "footage on Captain's Instagram Stories showed them wearing wedding rings on their both hands, which proves Captain America and Captain Marvel get married in Las Vegas"
    Query | Result | Type
    Captain America and Captain Marvel get married in Las Vegas | True | Partial
    The wedding in Las Vegas is hosted by Captain America and Captain Marvel couple | True | Semantically
    Ironman and Black Widow get married in Los Angeles | False | Syntactically
  10. Near-Duplication - Flow > Long Text → Full Match • faster and more trustworthy > Short Text → Partial Match + Fuzzy Tolerance • 20% of user queries are partial texts of original messages (e.g. a sentence or the topic of an article)
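
The routing on slide 10 can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the length cutoff `LONG_TEXT_MIN_CHARS`, the tolerance `FUZZY_TOLERANCE`, and all function names are hypothetical, character-level `difflib` matching is swapped in for whatever partial matcher is actually used, and exact comparison is only a stand-in for the BERT-based full match.

```python
from difflib import SequenceMatcher

LONG_TEXT_MIN_CHARS = 80  # hypothetical cutoff between "long" and "short" text
FUZZY_TOLERANCE = 0.2     # hypothetical: accept up to 20% character mismatch


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not matter."""
    return " ".join(text.lower().split())


def partial_fuzzy_score(query: str, message: str) -> float:
    """Best similarity between the query and any same-length window of message."""
    q, m = normalize(query), normalize(message)
    if not q or len(q) > len(m):
        return SequenceMatcher(None, q, m, autojunk=False).ratio()
    best = 0.0
    for start in range(len(m) - len(q) + 1):
        window = m[start:start + len(q)]
        best = max(best, SequenceMatcher(None, q, window, autojunk=False).ratio())
    return best


def is_near_duplicate(query: str, message: str) -> bool:
    """Route long queries to full match and short ones to partial + fuzzy match."""
    if len(normalize(query)) >= LONG_TEXT_MIN_CHARS:
        # Stand-in for the full-match path (exact match after normalization).
        return normalize(query) == normalize(message)
    return partial_fuzzy_score(query, message) >= 1.0 - FUZZY_TOLERANCE


VERIFIED = (
    "footage on Captain's Instagram Stories showed them wearing wedding rings "
    "on their both hands, which proves Captain America and Captain Marvel "
    "get married in Las Vegas"
)
```

With the verified message from slide 9, `is_near_duplicate("Captain America and Captain Marvel get married in Las Vegas", VERIFIED)` takes the short-text path and matches as a partial text, while the Ironman query does not reach the tolerance.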
  11. Full Match - BERT based (source: https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270) > Has a Chinese pre-trained model > Can extract sentence vectors from Upstream > Can capture semantics based on context
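
Once sentence vectors are available, a full match can be decided by comparing vectors. The sketch below assumes the vectors have already been extracted from a BERT model; the three-dimensional toy vectors, the `threshold` value, and the function names are hypothetical and only show the comparison step, using cosine similarity as one common choice.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query_vec: Sequence[float],
    verified: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical acceptance threshold
) -> Optional[Tuple[str, float]]:
    """Return (message_id, score) of the closest verified message, or None."""
    best: Optional[Tuple[str, float]] = None
    for message_id, vec in verified.items():
        score = cosine_similarity(query_vec, vec)
        if best is None or score > best[1]:
            best = (message_id, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


# Toy stand-ins for sentence vectors of two verified messages.
verified_vectors = {"m1": [0.9, 0.1, 0.0], "m2": [0.0, 0.2, 0.9]}
match = best_match([0.8, 0.2, 0.0], verified_vectors)
```

Here `match` identifies `"m1"`, whereas a query vector that is far from both verified vectors returns `None`.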
  12. Message Classification > "They were in Vegas for the BillBoard Music Awards but, a few hours later, footage on Captain's Instagram Stories showed them wearing wedding rings on their both hands, which proves Captain America and Captain Marvel get married in Las Vegas" > Topic probability: traffic: 3.7%, life: 38%, art: 49%, health: 2.54%, others: 1.6%, sport: 4.3%, education: 0.7%, law: 0.16%
  13. Message Classification > Fine-Tuned (Accuracy) vs. Pre-Trained (Speed) > BERT + NN Layer
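
The topic probabilities on slide 12 sum to 100%, which is consistent with a softmax over the output of the NN layer. As a hedged sketch of that last step only: the logits below are back-computed from the slide's percentages purely for illustration (the real model outputs are not given), and `softmax` is a plain-Python helper, not part of any particular framework.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn per-topic scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {topic: math.exp(value - peak) for topic, value in logits.items()}
    total = sum(exps.values())
    return {topic: e / total for topic, e in exps.items()}


# Percentages shown on slide 12.
SLIDE_PERCENTAGES = {
    "traffic": 3.7, "life": 38.0, "art": 49.0, "health": 2.54,
    "others": 1.6, "sport": 4.3, "education": 0.7, "law": 0.16,
}

# Hypothetical logits: log-probabilities recovered from the slide.
logits = {topic: math.log(p / 100.0) for topic, p in SLIDE_PERCENTAGES.items()}

probs = softmax(logits)
top_topic = max(probs, key=probs.get)
```

Taking the highest-probability entry gives `art` as the predicted topic for the example message.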
  14. Agenda > The Need for Fact Checker > Overview of ML Components > Overview of ML System > Challenges and Future Work
  15. ML System Under the Hood > Scheduling, Orchestration > Serving > Training > Data Ingestion > Async Execution > Model Management > Model Deployment > Index-based Vector Search (ANN)
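
Slide 15 lists index-based vector search (ANN) without naming an algorithm or library. To show the general idea of an index that narrows candidates before exact scoring, here is a small sketch using random-hyperplane locality-sensitive hashing, which is swapped in as an illustration and is an assumption, as are the class name and all parameters.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSHIndex:
    """Toy ANN index: vectors sharing a sign signature land in one bucket."""

    def __init__(self, dim: int, num_planes: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_planes)]
        self.buckets: Dict[Tuple[bool, ...], List[Tuple[str, List[float]]]] = (
            defaultdict(list))

    def _signature(self, vec: Sequence[float]) -> Tuple[bool, ...]:
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0
                     for plane in self.planes)

    def add(self, item_id: str, vec: Sequence[float]) -> None:
        self.buckets[self._signature(vec)].append((item_id, list(vec)))

    def search(self, vec: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Only the query's own bucket is scanned, so results are approximate.
        candidates = self.buckets.get(self._signature(vec), [])
        scored = sorted(((cosine(vec, v), item_id) for item_id, v in candidates),
                        reverse=True)
        return [(item_id, score) for score, item_id in scored[:top_k]]


index = HyperplaneLSHIndex(dim=3)
index.add("a", [1.0, 0.0, 0.0])
index.add("b", [0.0, 1.0, 0.0])
hits = index.search([1.0, 0.0, 0.0])
```

The trade-off is the usual one for ANN: a query only compares against vectors in its bucket, which is fast but can miss a true neighbor that hashed elsewhere.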
  16. Agenda > The Need for Fact Checker > Overview of ML Components > Overview of ML System > Challenges and Future Work
  17. Identify Message with Image > VGG16 (ConvNet Configuration D) > The convolutional network extracts image features and can capture objects and shapes > Similarity: 65%, 46%, 82%
  18. Duplicated Reported Messages > Report Message > Check by: Cache / Search Engine > If new (by Near-Duplication): Store as New > Store as: Cache / Search Engine > Training > Housekeeping: Find Duplicates by Near-Duplication and Merge
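
One possible reading of the flow on slide 18 is: check the cache, fall back to a near-duplicate search, store the report as new only if nothing matches, and periodically merge duplicates. The sketch below follows that reading under stated assumptions: `ReportStore`, the in-memory dictionaries standing in for the cache and search engine, the `difflib` similarity, and `NEAR_DUP_THRESHOLD` are all hypothetical.

```python
import hashlib
from difflib import SequenceMatcher
from typing import Dict, Optional

NEAR_DUP_THRESHOLD = 0.85  # hypothetical similarity needed to merge reports


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def _similar(a: str, b: str) -> bool:
    ratio = SequenceMatcher(None, _normalize(a), _normalize(b), autojunk=False).ratio()
    return ratio >= NEAR_DUP_THRESHOLD


class ReportStore:
    def __init__(self) -> None:
        self.cache: Dict[str, int] = {}      # exact-text hash -> message id
        self.messages: Dict[int, dict] = {}  # message id -> text and report count
        self._next_id = 1

    def _find_near_duplicate(self, text: str) -> Optional[int]:
        for msg_id, record in self.messages.items():
            if _similar(text, record["text"]):
                return msg_id
        return None

    def report(self, text: str) -> int:
        key = hashlib.sha256(_normalize(text).encode("utf-8")).hexdigest()
        msg_id = self.cache.get(key)                  # 1. check by cache
        if msg_id is None:
            msg_id = self._find_near_duplicate(text)  # 2. check by search
            if msg_id is None:                        # 3. store as new
                msg_id = self._next_id
                self._next_id += 1
                self.messages[msg_id] = {"text": text, "reports": 0}
            self.cache[key] = msg_id
        self.messages[msg_id]["reports"] += 1
        return msg_id

    def housekeeping(self) -> int:
        """Merge stored messages that near-duplicate an earlier one."""
        merged = 0
        ids = sorted(self.messages)
        for i, keep in enumerate(ids):
            if keep not in self.messages:
                continue
            for other in ids[i + 1:]:
                if other in self.messages and _similar(
                        self.messages[keep]["text"], self.messages[other]["text"]):
                    self.messages[keep]["reports"] += self.messages.pop(other)["reports"]
                    for key, value in self.cache.items():
                        if value == other:
                            self.cache[key] = keep
                    merged += 1
        return merged
```

Reporting the same message three times with small variations increments one record's report count instead of creating three records, which keeps duplicated reports from inflating the set of messages to verify.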
  19. Knowledge Graph Model > Inference > Fusion with other models > Knowledge-graph triples
  20. Thank you