
Small Mislabeling is fine: How LINE catches label noise

Jaewook Kang (Marco)
NAVER Chatbot Model
https://linedevday.linecorp.com/jp/2019/sessions/S1-22

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay. Small Mislabeling Is Fine: How LINE Catches Label Noise > Jaewook Kang (Marco) > NAVER Chatbot Model
  2. [Image: the "chihuahua or muffin" grid; some tiles are labeled "Dog", others "?"] Source: https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/
  3. [Image: the same grid, now separating clean "Dog" labels from noisy "Dog" labels] Source: https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/
  4. Impact of Label Noise on Model Training, without label noise: input X goes through the feature extractor and classifier to a prediction score, which the optimizer compares against label Y = "Dog". Sally's features (1) brown skin, (2) black eyes match the label, so the loss is low.
  5. Impact of Label Noise on Model Training, with label noise: the same pipeline, but the input is a muffin mislabeled as "Dog" (a noisy dog). The extracted features (1) brown skin, (2) black eyes, (3) chocolate balls, (4) raisins, (5) sugar clash with the label, so the loss is high.
  6. Active Learning for Label Cleaning [Konyushkova’2017], [Yoo’2019]: noisy label data goes through noise-label filtering by model inference, then relabeling by a human, yielding relabeled data.
  7. Today’s Talk: How does LINE correct label noise? - Label-correcting AutoML: PICO - Application to the FAQ/chat dataset
  8. Idea • Problem: a white duck is mislabeled. Can you look around and fix it? [Image: ducks labeled as the same class, each tagged "yellow!"]
  9. Idea • Problem: a white duck is mislabeled. Can you look around and fix it? [Image: the same ducks, now split into the well-labeled and the mislabeled (label noise)]
  10. Split-Train-Check Algorithm. Split: the data (class-1-labeled, class-1-mislabeled, and class-2-labeled examples) is divided into a train set and a valid set.
  11. Split-Train-Check Algorithm. Split: • Train set: the reference dataset for correction • Valid set: the checking targets (their labels are detached).
  12. Split-Train-Check Algorithm. Train: • Train the checker model on the train set.
  13. Split-Train-Check Algorithm. Check: • Check the labels of the valid set with the trained checker.
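A minimal Python sketch of one Split-Train-Check branch (slides 10-13), using a scikit-learn logistic regression as a stand-in for the real checker model; all function and variable names here are illustrative, not from the talk:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def split_train_check(X, y, valid_frac=0.3, seed=0):
    """One Split-Train-Check branch: train a checker on one split of the
    data, then re-predict ("check") the labels of the held-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_valid = int(len(X) * valid_frac)
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]

    # Train: fit the checker on the reference (train) split only.
    checker = LogisticRegression(max_iter=1000)
    checker.fit(X[train_idx], y[train_idx])

    # Check: class probabilities for the detached-label valid split.
    # Columns follow checker.classes_, assumed here to be 0..num_classes-1.
    probs = checker.predict_proba(X[valid_idx])
    return valid_idx, probs
```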
  14. MultiSplit: prepare a variety of versions of the split (the branches). Vote: collect the results from each "Split-Train-Check" branch for the label update.
  15. MultiSplit-Train-Check-Vote: MultiSplit the original dataset into k branches and train k checkers, then check; every data segment is checked n times.
  16. MultiSplit-Train-Check-Vote: the same pipeline, followed by a vote over the k checkers' results and a label update; every data segment is checked n times.
  17. MultiSplit-Train-Check-Vote Algorithm. Majority Vote: • The most naïve method • For the checking target, white : yellow = 3 : 0; every checker says "you are white," so the label becomes white.
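A minimal sketch of the majority vote over the branch results (slide 17), the naïve baseline that PICO replaces with the probabilistic scheme on the following slides:

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: the class labels assigned to one checking target by
    the checkers that saw it. Returns the most frequent label."""
    label, _ = Counter(predictions).most_common(1)[0]
    return label

# Example: three checkers say "white", none say "yellow".
print(majority_vote(["white", "white", "white"]))  # -> "white"
```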
  18. PICO: Probabilistic Iterative COrrection. Instead of hard votes, each checker returns class probabilities for the target, e.g. white: 0.7, yellow: 0.3; white: 0.85, yellow: 0.15; white: 0.62, yellow: 0.38.
  19. PICO’s Three Keys: • Bayesian combining of the branch results (Thomas Bayes, 1701-1761) • Hidden Markov modeling of the labeling history over iterations (Andrey A. Markov, 1856-1922) • Iterative probabilistic correction for the label update (Robert G. Gallager, 1931-present).
  20. 1) Soft-Voting of Branch Results: for data point q, take the n related check results from the branches ne(q) that check it and combine them into a joint likelihood, p(q | C_{t+1}) ∝ ∏_{k ∈ ne(q)} check_k(q).
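A sketch of this soft vote as a product of the per-branch class-probability vectors; the array shapes and names are assumptions:

```python
import numpy as np

def joint_likelihood(checker_probs):
    """checker_probs: (n, num_classes) array whose row k is check_k(q),
    the class-probability vector branch k assigned to data point q.
    Returns the (unnormalized) joint likelihood p(q | C_{t+1})."""
    return np.prod(checker_probs, axis=0)

# Example: two checkers, both leaning towards class 0 ("white").
probs = np.array([[0.85, 0.15],
                  [0.62, 0.38]])
print(joint_likelihood(probs))  # -> [0.527 0.057]
```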
  21. 2) Labeling Prior Update: the prior is obtained from the previous posterior through the class-transition probability, as in a hidden Markov model: p(C_t | q) = Σ_{C_{t-1}} p(C_t | C_{t-1}) p(C_{t-1} | q). This prior is then combined with the joint likelihood p(q | C_{t+1}).
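A sketch of the hidden-Markov-style prior update, assuming the previous posterior and the class-transition matrix are plain NumPy arrays:

```python
import numpy as np

def update_prior(prev_posterior, transition):
    """prev_posterior: (num_classes,) vector p(C_{t-1} | q).
    transition: (num_classes, num_classes) matrix with
    transition[i, j] = p(C_t = j | C_{t-1} = i).
    Returns the prior p(C_t | q), summed over C_{t-1}."""
    return prev_posterior @ transition

# Example: a previous posterior and a mildly "sticky" transition matrix.
prev = np.array([0.6, 0.4])
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(update_prior(prev, T))  # -> [0.62 0.38]
```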
  22. 3) Compute Posterior by Bayes’ Rule: the posterior is the prior times the joint likelihood, normalized: p(C_{t+1} | q) = (1/Z) p(C_t | q) × ∏_{k ∈ ne(q)} check_k(q).
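A sketch of the Bayes-rule posterior, reusing the prior and joint likelihood computed in the two sketches above:

```python
import numpy as np

def posterior(prior, likelihood):
    """Bayes' rule: posterior is prior times joint likelihood, with the
    normalizer Z chosen so the class probabilities sum to one."""
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

# Example values taken from the previous two sketches.
prior = np.array([0.62, 0.38])
likelihood = np.array([0.527, 0.057])
print(posterior(prior, likelihood))  # -> roughly [0.94 0.06]
```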
  23. 4) Label Update: the new label is the most probable class under the posterior, Ĉ_{t+1} = argmax p(C_{t+1} | q), and the class-transition probability is re-estimated by averaging over the data, p(C_{t+1} | C_t) ≈ (1/N) Σ_{∀q} p(C_{t+1} | C_t, q).
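A sketch of the label update and of re-estimating the class-transition probability by averaging the posteriors of the data points in each previous class; the fallback for classes with no data is an assumption:

```python
import numpy as np

def update_labels_and_transition(posteriors, prev_labels, num_classes):
    """posteriors: (N, num_classes) array of p(C_{t+1} | q) per data point.
    prev_labels: (N,) array of integer labels C_t from the last iteration.
    Returns the new hard labels and the matrix p(C_{t+1} | C_t)."""
    new_labels = posteriors.argmax(axis=1)

    transition = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        mask = prev_labels == c
        if mask.any():
            transition[c] = posteriors[mask].mean(axis=0)
        else:
            transition[c, c] = 1.0  # no evidence: keep the class as-is
    return new_labels, transition
```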
  24. 5) Passing the Probability Map to the Next Iteration: after the label update Ĉ_{t+1} and the class-transition update, the k checkers are retrained on the updated labels. PICO is iterative!
  25. 0) Next Iteration Starts! The k checkers retrained on the updated labels become the trained checkers for the next iteration, which repeats the label update and class-transition update. PICO is iterative!
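Putting the steps together, a rough sketch of the iterative PICO loop, reusing split_train_check and update_labels_and_transition from the sketches above; the number of branches, the iteration count, and the control flow are illustrative assumptions, not LINE's implementation:

```python
import numpy as np

def pico(X, y, num_classes, k=5, iterations=3, seed=0):
    """Probabilistic Iterative COrrection, roughly following slides 20-25:
    multisplit, train/check, soft vote, HMM prior, Bayes posterior,
    label update, then retrain the checkers and repeat."""
    labels = y.copy()
    transition = np.eye(num_classes)           # start by trusting the labels
    posteriors = np.eye(num_classes)[labels]   # one-hot initial posteriors

    for _ in range(iterations):
        likelihoods = np.ones((len(X), num_classes))
        for branch in range(k):
            # One Split-Train-Check branch per checker (earlier sketch).
            valid_idx, probs = split_train_check(X, labels, seed=seed + branch)
            likelihoods[valid_idx] *= probs     # soft vote: product of checks

        priors = posteriors @ transition        # HMM prior update
        posteriors = priors * likelihoods       # Bayes' rule ...
        posteriors /= posteriors.sum(axis=1, keepdims=True)  # ... normalized

        labels, transition = update_labels_and_transition(
            posteriors, labels, num_classes)    # label + transition update
    return labels
```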
  26. PICO Architecture: a data splitter feeds the original dataset into Split-Train-Check branches (Branch 0, Branch 1, …, Branch N), each training its own checker; their outputs produce the relabeled dataset. PICO is iterative!
  27. PICO Implementation: the same structure, with the branch outputs producing the updated dataset. PICO is iterative!
  28. FAQ Chatbot • Provides a quality service by limiting the conversation domain • Query-intent classification methods show good performance: a neural intent classifier takes an input query such as "I don't know my LINE password, what should I do?", embeds it (word embedding or sentence embedding), and maps it to an intent such as password, LINE MUSIC, LINE stickers, or LINE Pay.
  29. FAQ chatbot service for LINE use (LINE KantanHelp FAQ) • Approx. 90,000 examples / 1,000 intent classes • Checker model: classifier-BERT • Characteristics: only the "classifier" is trained for the checkers in PICO; class imbalance; query-intent mapping ambiguity; some noisy queries.
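A minimal sketch of a checker that trains only a classifier head on top of a frozen BERT encoder, in the spirit of slide 29; the model name, pooling choice, and linear head are assumptions, not the talk's classifier-BERT:

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class BertChecker(nn.Module):
    """Frozen BERT sentence encoder plus a trainable linear intent classifier."""
    def __init__(self, num_intents, model_name="bert-base-multilingual-cased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.encoder = AutoModel.from_pretrained(model_name)
        for p in self.encoder.parameters():     # train the classifier only
            p.requires_grad = False
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_intents)

    def forward(self, queries):
        batch = self.tokenizer(queries, padding=True, truncation=True,
                               return_tensors="pt")
        with torch.no_grad():
            hidden = self.encoder(**batch).last_hidden_state[:, 0]  # [CLS] token
        return self.classifier(hidden)           # intent logits
```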
  30. LINE KantanHelp FAQ, top-1 accuracy (single classifier-BERT model, 5,000-example validation set): Baseline 71.48%; Rule-based preprocessing 72.43% (+0.95%); PICO-based preprocessing 74.45% (+2.02% over rule-based, +2.97% over the baseline).
  31. Small Mislabeling is fine! • AI projects without a data strategy are unlikely to succeed. • An automatic AI-data cleaning pipeline is a strong competitive advantage. • CLOVA/LINE Chatbot Builder improves service quality by automatically cleaning data through PICO. • The PICO architecture can be applied to other datasets.
  32. References • [Han’2018] B. Han et al., “Co-teaching: Robust training of deep neural networks with extremely noisy labels,” in NeurIPS, pp. 8536-8546, 2018. • [Konyushkova’2017] K. Konyushkova et al., “Learning active learning from real and synthetic data,” in NIPS, 2017. • [Jiang’2018] L. Jiang et al., “MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels,” in ICML, pp. 2309-2318, 2018. • [Bengio’2009] Y. Bengio et al., “Curriculum learning,” in ICML, pp. 41-48, 2009. • [Yoo’2019] D. Yoo et al., “Learning loss for active learning,” in CVPR, 2019.