Small Mislabeling is fine: How LINE catches label noise

Jaewook Kang (Marco)
NAVER Chatbot Model
https://linedevday.linecorp.com/jp/2019/sessions/S1-22

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay: Small Mislabeling Is Fine: How LINE Catches Label Noise > Jaewook Kang (Marco) > NAVER Chatbot Model
  2. What is Label Noise?

  3. Dog, Dog, Dog, Dog?, Dog?, Dog? [Image: the chihuahua-or-muffin quiz.] https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/
  4. Dog, Dog, Dog, Dog?, Dog?, Dog? [Same images, now separated into Clean Dog vs. Noisy Dog.] https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/
  5. “Unintentional” mislabeling that incorrectly describes data within the same category

  6. Label Noise is everywhere!

  7. What’s wrong with Label Noise?

  8. Impact of Label Noise on Model Training: Without Label Noise

    [Diagram: Input X → Feature extractor → Classifier → Prediction Score, compared against Label Y (“Dog”) by the Optimizer. Sally’s features: 1) brown skin, 2) black eyes → low loss.]
  9. Impact of Label Noise on Model Training: With Label Noise

    [Diagram: the same pipeline with a noisy “Dog” label. Sally’s features now mix in 1) brown skin, 2) black eyes, 3) chocolate balls, 4) raisins, 5) sugar → high loss.]
  10. Label noise makes the model’s feature extraction difficult

  11. This difficult feature extraction degrades the model’s performance

  12. https://www.trustinsights.ai/blog/2019/08/5-ways-your-ai-projects-fail-part-3-data-related-ai-failures/ http://www.ciokorea.com/news/127988 https://medium.com/@kanaugust/ai

  13. Then, how can we resolve the Label Noise problem?

  14. Trained Model = Method(Data, Model Structure)

  15. Approach 1: Model Structure: a Label-Noise-Robust Model

  16. Heavy Model == Heavy Serving + Training Cost. Global warming is coming!
  17. Approach 2-1: Curriculum Learning [Y. Bengio: 2009]

  18. Approach 2-2: MentorNet [Lu Jiang, et al.: 2018], [Bo Han, et al.: 2019]
  19. Heavy training cost + additional data. Global warming is coming!

  20. Approach 3: Data Cleaning Method

  21. Active Learning for Label Cleaning [Konyushkova’2017], [Yoo’2019]

    [Pipeline: Noisy Label Data → Noise Label Filtering (Model Inference) → ReLabeling by Human → ReLabeled Data.]
  22. How can we remove Label Noise without human assistance?

  23. Our Plan: [Noisy Label Data → Model Inference → Relabeling by Model → ReLabeled Data.]
  24. Today’s Talk: How does LINE correct the Label Noise?

    - Label-correcting AutoML: PICO
    - Application to FAQ/Chat dataset
  25. Label-correcting AutoML: PICO

  26. Idea. Problem: a white duck is mislabeled. Can you look around and fix it? [Image: five ducks labeled as the same class, each saying “yellow!”]
  27. Idea. Problem: a white duck is mislabeled. Can you look around and fix it? [Same image, now annotated: the mislabeled duck (Label Noise) vs. the well-labeled ducks, all labeled as the same class, each saying “yellow!”]
  28. How to let AI do the same task?

  29. Split – Train – Check Algorithm

  30. Split - Train - Check Algorithm. Split: [the original dataset (Class1-Labeled, Class1-Mislabeled, Class2-Labeled) is divided into a Train Set and a Valid Set.]
  31. Split - Train - Check Algorithm. Split: • Train set: reference dataset for the correction • Valid set: checking targets (labels detached from the targets)
  32. Split - Train - Check Algorithm. Train: • Train the “Checker” model on the Train Set
  33. Split - Train - Check Algorithm. Check: • The trained Checker checks the labels of the Valid Set
  34. Split - Train - Check Algorithm. Mislabeling correction! [The mislabeled duck (“Yellow?”) is corrected to “White!”]
  35. Split - Train - Check Algorithm. [Diagram: Orig. Dataset → Split → Train → Check → Label Update.]
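A minimal sketch of one Split-Train-Check pass, assuming a scikit-learn-style checker; the stand-in LogisticRegression checker, the 50/50 split, and the naive in-place label update are illustrative assumptions, not the talk's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def split_train_check(X, y, rng):
    """One Split-Train-Check pass: train a checker on one half of the
    data and let it re-predict the labels of the held-out half."""
    idx = rng.permutation(len(y))
    train_idx, valid_idx = idx[: len(y) // 2], idx[len(y) // 2 :]

    # Train: fit the checker model on the train split (reference data).
    checker = LogisticRegression(max_iter=1000)  # stand-in checker model
    checker.fit(X[train_idx], y[train_idx])

    # Check: predict labels for the valid split, ignoring its own labels.
    return valid_idx, checker.predict(X[valid_idx])

# Label update for the checked half (only half the data gets checked):
# valid_idx, suggested = split_train_check(X, y, np.random.default_rng(0))
# y[valid_idx] = suggested
```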
  36. The current method cannot check the entire dataset

  37. Split - Train - Check Algorithm. [Diagram repeated: Orig. Dataset → Split → Train → Check → Label Update; only the valid half is checked.]
  38. Is there any way to completely check the entire dataset?

  39. MultiSplit – Train – Check – Vote

  40. MultiSplit: prepare a variety of versions of the split branches. Vote: collect the results from each “Split-Train-Check” branch for the label update
  41. MultiSplit – Train – Check – Vote: [Orig. Dataset]
  42. MultiSplit – Train – Check – Vote: MultiSplit [the Orig. Dataset is split into k branches]
  43. MultiSplit – Train – Check – Vote: Train [the first branch trains its checker]
  44. MultiSplit – Train – Check – Vote: Train [a second branch trains its checker]
  45. MultiSplit – Train – Check – Vote: Train [a third branch trains its checker]
  46. MultiSplit – Train – Check – Vote: Train [… k branches → k trained checkers]
  47. MultiSplit – Train – Check – Vote: Check [each checker checks its valid segments; a single segment is checked n times]
  48. MultiSplit – Train – Check – Vote: Vote & update [the n check results per segment are combined into a label update; see the sketch below]
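A sketch of the MultiSplit stage under the same assumptions as the earlier snippet: k independent random splits, so every sample lands in the checked (valid) half of several branches and accumulates multiple check results; the vote that combines them comes next.

```python
from collections import defaultdict

def multisplit_train_check(X, y, k, rng):
    """Run k Split-Train-Check branches and collect, per sample index,
    the label suggested by every checker that saw it as a target."""
    checks = defaultdict(list)
    for _ in range(k):  # each branch uses its own random split
        valid_idx, suggested = split_train_check(X, y, rng)
        for i, label in zip(valid_idx, suggested):
            checks[i].append(label)
    # With random 50/50 splits, each sample is checked ~k/2 times.
    return checks
```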
  49. MultiSplit – Train – Check – Vote: Vote & update. HOW?
  50. How to Vote?

  51. MultiSplit – Train – Check – Vote: Majority Vote

  52. MultiSplit – Train – Check – Vote: Majority Vote: • The most naïve method • white : yellow = 3 : 0 [Image: the checkers tell the checking target “You are white”, “Sure, you are white”, “You must be white”.]
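The majority vote in the naive form this slide describes, sketched on top of the hypothetical `checks` map from the snippet above: each checked sample takes whichever label most of its checkers suggested (white : yellow = 3 : 0 → white).

```python
from collections import Counter

def majority_vote(y, checks):
    """Relabel each checked sample with its most frequent checker suggestion."""
    y = y.copy()
    for i, suggestions in checks.items():
        y[i] = Counter(suggestions).most_common(1)[0][0]
    return y
```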
  53. Utilize the fact that the checker results are classification scores (probabilities)

  54. PICO: Probabilistic Iterative COrrection. [The checkers now report soft scores, e.g. W: 0.7, Y: 0.3; W: 0.85, Y: 0.15; W: 0.62, Y: 0.38.]
  55. PICO’s Three Keys: • Bayesian combining of the branch results (Thomas Bayes, 1701-1761) • Hidden Markov modeling of the labeling history over iterations (Andrey A. Markov, 1856-1922) • Iterative probabilistic correction for the label update (Robert G. Gallager, 1931-present)
  56. 0) PICO Vote starts! [The k trained checkers stand ready.]
  57. 1) Soft-Voting of Branch Results: take the n related check results for a data point q from the k checkers and form the joint likelihood

    $$p(q \mid C_{t+1}) \propto \prod_{k \in \mathrm{ne}(q)} \mathrm{check}_k(q)$$
  58. 2) Labeling Prior Update: the prior is propagated from the previous posterior through the class transition probability (Hidden Markov Model)

    $$p(C_t \mid q) = \sum_{C_{t-1}} p(C_t \mid C_{t-1})\, p(C_{t-1} \mid q)$$
  59. 3) Compute Posterior by Bayes’ Rule: combine the prior with the joint likelihood

    $$p(C_{t+1} \mid q) = \frac{1}{Z}\, p(C_t \mid q) \prod_{k \in \mathrm{ne}(q)} \mathrm{check}_k(q)$$
  60. 4) Label Update: take the hard label from the posterior and refresh the class transition probability

    $$\hat{C}_{t+1} = \arg\max\, p(C_{t+1} \mid q), \qquad p(C_{t+1} \mid C_t) \approx \frac{1}{N} \sum_{\forall q} p(C_{t+1} \mid C_t, q)$$
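A sketch of one PICO vote for a single data point q, following the four steps above; the checker outputs are assumed to be class-probability vectors, the transition matrix uses the convention trans_prob[i, j] = p(C_t = j | C_{t-1} = i), and Z is simply the sum over classes. This is an illustrative reading of the slides, not LINE's production code.

```python
import numpy as np

def pico_vote_step(check_probs, prev_posterior, trans_prob):
    """One PICO update for a data point q.
    check_probs    : (n, C) class-probability outputs of the n checkers in ne(q)
    prev_posterior : (C,)   p(C_{t-1} | q) from the previous iteration
    trans_prob     : (C, C) p(C_t = j | C_{t-1} = i) stored in trans_prob[i, j]
    """
    # 1) Joint likelihood: product of the checker scores, done in log space.
    log_lik = np.sum(np.log(check_probs + 1e-12), axis=0)

    # 2) Prior update by the HMM:
    #    p(C_t | q) = sum_{C_{t-1}} p(C_t | C_{t-1}) p(C_{t-1} | q)
    prior = trans_prob.T @ prev_posterior

    # 3) Posterior by Bayes' rule, normalized by Z = sum over classes.
    unnorm = prior * np.exp(log_lik - log_lik.max())
    posterior = unnorm / unnorm.sum()

    # 4) Label update: the hard label is the argmax of the posterior.
    return int(np.argmax(posterior)), posterior
```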
  61. 5) Passing the Probability Map to the Next Iteration: the posterior is carried over and the k checkers are retrained on the updated labels $\hat{C}_{t+1}$. PICO is iterative!
  62. 0) The Next Iteration starts! The k checkers, retrained on the updated labels, check again, starting from the carried-over posterior and class transition probabilities. PICO is iterative!
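And a sketch of the outer loop that makes PICO iterative, reusing the helpers above; `multisplit_train_check_soft` is a hypothetical soft-score variant of the earlier `multisplit_train_check` (returning class-probability vectors instead of hard labels), and the near-identity initialization of the transition matrix is an assumption.

```python
import numpy as np

def pico(X, y, k, num_iters, num_classes, rng):
    """Illustrative PICO outer loop: vote, relabel, retrain, repeat."""
    posterior = np.full((len(y), num_classes), 1.0 / num_classes)
    trans_prob = 0.9 * np.eye(num_classes) + 0.1 / num_classes  # assumed init

    for _ in range(num_iters):
        prev_labels = y.copy()
        # Retrain the k checkers on the current labels; collect soft checks.
        soft_checks = multisplit_train_check_soft(X, y, k, rng)  # hypothetical
        for i, probs in soft_checks.items():
            y[i], posterior[i] = pico_vote_step(
                np.asarray(probs), posterior[i], trans_prob
            )
        # Refresh the class transition probabilities:
        # p(C_{t+1} | C_t) ~ (1/N) sum_q p(C_{t+1} | C_t, q), approximated here
        # by averaging posteriors grouped by each sample's previous hard label.
        for c in range(num_classes):
            mask = prev_labels == c
            if mask.any():
                trans_prob[c] = posterior[mask].mean(axis=0)
    return y
```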
  63. Gradually remove label noise through iterative probabilistic voting!

  64. All in One!

  65. PICO Architecture: [Diagram: the original dataset → Data Splitter → Split-Train-Check branches (Branch 0, Branch 1, … Branch N, each training a Checker) → relabeled dataset. PICO is iterative!]
  66. PICO Implementation: [the same diagram; the output is the updated dataset, which feeds the next iteration.]
  67. Our Plan: [Noisy Label Data → Model Inference → Relabeling by Model → ReLabeled Data.]
  68. Our Plan became PICO! [Noisy Label Data → PICO (Split-Train-Check branches) → ReLabeled Data.]
  69. Application to FAQ/Chat dataset!

  70. FAQ Chatbot: • Providing a qualified service by limiting the conversation domain • Query-Intent classification methods show good performance. [Diagram: input query “LINEのパスワードがわからないのでどうしたらいいですか?” (“I don’t know my LINE password, what should I do?”) → word embedding / sentence embedding → Neural Intent Classifier → intents such as password, LINE Music, LINE stickers, LINE Pay.]
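As a concrete reading of this pipeline, a minimal sketch of query-intent classification with Hugging Face transformers; the local model path `./intent-classifier` and the tokenizer checkpoint are assumptions for illustration, not the talk's actual setup.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
# Hypothetical fine-tuned intent classifier saved locally.
model = BertForSequenceClassification.from_pretrained("./intent-classifier")

query = "LINEのパスワードがわからないのでどうしたらいいですか?"  # "I don't know my LINE password, what should I do?"
inputs = tokenizer(query, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
intent_id = int(logits.argmax(dim=-1))  # index into the intent classes (password, LINE Pay, ...)
```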
  71. FAQ chatbot service for LINE use: LINE KantanHelp FAQ • Approx. 90,000 data points / 1,000 intent classes • Checker model: classifier-BERT; only the “classifier” is trained for the checkers in PICO (sketched below) • Characteristics: class imbalance problem, query-intent mapping ambiguity problem, some noisy queries
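One hedged reading of training only the “classifier” for the checkers is to freeze the BERT encoder and update just the classification head, sketched here with Hugging Face transformers; the multilingual checkpoint and the learning rate are illustrative choices, since the talk only says “classifier-BERT”.

```python
import torch
from transformers import BertForSequenceClassification

# Illustrative checkpoint; the exact BERT variant is not named in the talk.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=1000,  # ~1,000 intent classes in the KantanHelp FAQ
)

# Freeze the BERT encoder so each PICO checker retrains cheaply.
for param in model.bert.parameters():
    param.requires_grad = False

# Optimize only the classification head (assumed learning rate).
optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)
```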
  72. LINE KantanHelp FAQ, Top-1 accuracy (classifier-BERT, single model, 5,000-query validation set): Baseline 71.48% → Rule-based preprocessing 72.43% (+0.95%) → PICO-based preprocessing 74.45% (+2.02%; +2.97% gain over the baseline)
  73. Take Home Message!

  74. Small Mislabeling is fine! • AI projects without a data strategy are unlikely to succeed • An automatic AI data-cleaning pipeline is very competitive • CLOVA/LINE Chatbot Builder improves service quality by automatically purifying data through PICO • The PICO architecture can be applied to other datasets
  75. Human-free data cleaning can improve your AI performance

  76. Thank you!

  77. References
    • [Bo Han, et al.: 2019] Han, Bo et al., “Co-teaching: Robust training of deep neural networks with extremely noisy labels,” in NIPS, pp. 8536-8546, 2018.
    • [Konyushkova’2017] K. Konyushkova et al., “Learning active learning from real and synthetic data,” in NIPS, 2017.
    • [Lu Jiang, et al.: 2018] Jiang, Lu et al., “MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels,” in ICML, pp. 2309-2318, 2018.
    • [Y. Bengio: 2009] Bengio, Yoshua et al., “Curriculum learning,” in ICML, pp. 41-48, 2009.
    • [Yoo’2019] D. Yoo et al., “Learning loss for active learning,” in CVPR, 2019.