Small Mislabeling is fine: How LINE catches label noise

Jaewook Kang (Marco)
NAVER Chatbot Model
https://linedevday.linecorp.com/jp/2019/sessions/S1-22

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay: Small Mislabeling Is Fine: How LINE Catches Label Noise > Jaewook Kang (Marco) > NAVER Chatbot Model
  2. What is Label Noise?

  3. Dog, Dog, Dog, Dog?, Dog?, Dog? [Image: the chihuahua-or-muffin quiz.] https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/
  4. Dog, Dog, Dog, Dog?, Dog?, Dog? [Same images, now separated into Clean Dog vs. Noisy Dog.] https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/
  5. “Unintentional” mislabeling that incorrectly describes data within the same category

  6. Label Noise is everywhere!

  7. What’s wrong with Label Noise?

  8. Impact of Label Noise on Model Training: Without Label Noise

    [Diagram: Input X → Feature extractor → Classifier → Prediction Score, compared against Label Y (“Dog”) by the Optimizer. Sally’s features: 1) brown skin, 2) black eyes → low loss.]
  9. Impact of Label Noise on Model Training: With Label Noise

    [Diagram: the same pipeline with a noisy “Dog” label. Sally’s features now mix in 1) brown skin, 2) black eyes, 3) chocolate balls, 4) raisins, 5) sugar → high loss.]
  10. Label noise makes the model’s feature extraction difficult

  11. This difficult feature extraction degrades the model’s performance

  12. https://www.trustinsights.ai/blog/2019/08/5-ways-your-ai-projects-fail-part-3-data-related-ai-failures/ http://www.ciokorea.com/news/127988 https://medium.com/@kanaugust/ai

  13. Then, how can we resolve the Label Noise problem?

  14. Trained Model = Method(Data, Model Structure)

  15. Approach 1: Model Structure: a Label-Noise-Robust Model

  16. Heavy Model == Heavy Serving + Training Cost. Global warming is coming!
  17. Approach 2-1: Curriculum Learning [Y. Bengio: 2009]

  18. Approach 2-2: MentorNet [Lu Jiang, et al.: 2018], [Bo Han, et al.: 2019]
  19. Heavy training cost + additional data. Global warming is coming!

  20. Approach 3: Data Cleaning Method

  21. Active Learning for Label Cleaning [Konyushkova’2017], [Yoo’2019]

    [Pipeline: Noisy Label Data → Noise Label Filtering (Model Inference) → ReLabeling by Human → ReLabeled Data.]
  22. How can we remove Label Noise without human assistance?

  23. Our Plan: [Noisy Label Data → Model Inference → Relabeling by Model → ReLabeled Data.]
  24. Today’s Talk: How does LINE correct the Label Noise?

    - Label-correcting AutoML: PICO
    - Application to FAQ/Chat dataset
  25. Label-correcting AutoML: PICO

  26. Idea. Problem: a white duck is mislabeled. Can you look around and fix it? [Image: five ducks labeled as the same class, each saying “yellow!”]
  27. Idea. Problem: a white duck is mislabeled. Can you look around and fix it? [Same image, now annotated: the mislabeled duck (Label Noise) vs. the well-labeled ducks, all labeled as the same class, each saying “yellow!”]
  28. How to let AI do the same task?

  29. Split – Train – Check Algorithm

  30. Split - Train - Check Algorithm. Split: [the original dataset (Class1-Labeled, Class1-Mislabeled, Class2-Labeled) is divided into a Train Set and a Valid Set.]
  31. Split - Train - Check Algorithm. Split: • Train set: reference dataset for the correction • Valid set: checking targets (labels detached from the targets)
  32. Split - Train - Check Algorithm. Train: • Train the “Checker” model on the Train Set
  33. Split - Train - Check Algorithm. Check: • The trained Checker checks the labels of the Valid Set
  34. Split - Train - Check Algorithm. Mislabeling correction! [The mislabeled duck (“Yellow?”) is corrected to “White!”]
  35. Split - Train - Check Algorithm. [Diagram: Orig. Dataset → Split → Train → Check → Label Update.]
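A minimal sketch of one Split-Train-Check pass, assuming a scikit-learn-style checker; the stand-in LogisticRegression checker, the 50/50 split, and the naive in-place label update are illustrative assumptions, not the talk's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def split_train_check(X, y, rng):
    """One Split-Train-Check pass: train a checker on one half of the
    data and let it re-predict the labels of the held-out half."""
    idx = rng.permutation(len(y))
    train_idx, valid_idx = idx[: len(y) // 2], idx[len(y) // 2 :]

    # Train: fit the checker model on the train split (reference data).
    checker = LogisticRegression(max_iter=1000)  # stand-in checker model
    checker.fit(X[train_idx], y[train_idx])

    # Check: predict labels for the valid split, ignoring its own labels.
    return valid_idx, checker.predict(X[valid_idx])

# Label update for the checked half (only half the data gets checked):
# valid_idx, suggested = split_train_check(X, y, np.random.default_rng(0))
# y[valid_idx] = suggested
```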
  36. The current method cannot check the entire dataset

  37. Split - Train - Check Algorithm. [Diagram repeated: Orig. Dataset → Split → Train → Check → Label Update; only the valid half is checked.]
  38. Is there any way to completely check the entire dataset?

  39. MultiSplit – Train – Check – Vote

  40. MultiSplit: prepare a variety of versions of the split branches. Vote: collect the results from each “Split-Train-Check” branch for the label update
  41. MultiSplit – Train – Check – Vote: [Orig. Dataset]
  42. MultiSplit – Train – Check – Vote: MultiSplit [the Orig. Dataset is split into k branches]
  43. MultiSplit – Train – Check – Vote: Train [the first branch trains its checker]
  44. MultiSplit – Train – Check – Vote: Train [a second branch trains its checker]
  45. MultiSplit – Train – Check – Vote: Train [a third branch trains its checker]
  46. MultiSplit – Train – Check – Vote: Train [… k branches → k trained checkers]
  47. MultiSplit – Train – Check – Vote: Check [each checker checks its valid segments; a single segment is checked n times]
  48. MultiSplit – Train – Check – Vote: Vote & update [the n check results per segment are combined into a label update; see the sketch below]
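A sketch of the MultiSplit stage under the same assumptions as the earlier snippet: k independent random splits, so every sample lands in the checked (valid) half of several branches and accumulates multiple check results; the vote that combines them comes next.

```python
from collections import defaultdict

def multisplit_train_check(X, y, k, rng):
    """Run k Split-Train-Check branches and collect, per sample index,
    the label suggested by every checker that saw it as a target."""
    checks = defaultdict(list)
    for _ in range(k):  # each branch uses its own random split
        valid_idx, suggested = split_train_check(X, y, rng)
        for i, label in zip(valid_idx, suggested):
            checks[i].append(label)
    # With random 50/50 splits, each sample is checked ~k/2 times.
    return checks
```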
  49. MultiSplit – Train – Check – Vote: Vote & update. HOW?
  50. How to Vote?

  51. MultiSplit – Train – Check – Vote: Majority Vote

  52. MultiSplit – Train – Check – Vote: Majority Vote: • The most naïve method • white : yellow = 3 : 0 [Image: the checkers tell the checking target “You are white”, “Sure, you are white”, “You must be white”.]
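The majority vote in the naive form this slide describes, sketched on top of the hypothetical `checks` map from the snippet above: each checked sample takes whichever label most of its checkers suggested (white : yellow = 3 : 0 → white).

```python
from collections import Counter

def majority_vote(y, checks):
    """Relabel each checked sample with its most frequent checker suggestion."""
    y = y.copy()
    for i, suggestions in checks.items():
        y[i] = Counter(suggestions).most_common(1)[0][0]
    return y
```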
  53. Utilize the fact that the checker results are classification scores (probabilities)

  54. PICO: Probabilistic Iterative COrrection. [The checkers now report soft scores, e.g. W: 0.7, Y: 0.3; W: 0.85, Y: 0.15; W: 0.62, Y: 0.38.]
  55. PICO’s Three Keys: • Bayesian combining of the branch results (Thomas Bayes, 1701-1761) • Hidden Markov modeling of the labeling history over iterations (Andrey A. Markov, 1856-1922) • Iterative probabilistic correction for the label update (Robert G. Gallager, 1931-present)
  56. 0) PICO Vote starts! [The k trained checkers stand ready.]
  57. 1) Soft-Voting of Branch Results: take the n related check results for a data point q from the k checkers and form the joint likelihood

    $$p(q \mid C_{t+1}) \propto \prod_{k \in \mathrm{ne}(q)} \mathrm{check}_k(q)$$
  58. 2) Labeling Prior Update: the prior is propagated from the previous posterior through the class transition probability (Hidden Markov Model)

    $$p(C_t \mid q) = \sum_{C_{t-1}} p(C_t \mid C_{t-1})\, p(C_{t-1} \mid q)$$
  59. 3) Compute Posterior by Bayes’ Rule: combine the prior with the joint likelihood

    $$p(C_{t+1} \mid q) = \frac{1}{Z}\, p(C_t \mid q) \prod_{k \in \mathrm{ne}(q)} \mathrm{check}_k(q)$$
  60. 4) Label Update: take the hard label from the posterior and refresh the class transition probability

    $$\hat{C}_{t+1} = \arg\max\, p(C_{t+1} \mid q), \qquad p(C_{t+1} \mid C_t) \approx \frac{1}{N} \sum_{\forall q} p(C_{t+1} \mid C_t, q)$$
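A sketch of one PICO vote for a single data point q, following the four steps above; the checker outputs are assumed to be class-probability vectors, the transition matrix uses the convention trans_prob[i, j] = p(C_t = j | C_{t-1} = i), and Z is simply the sum over classes. This is an illustrative reading of the slides, not LINE's production code.

```python
import numpy as np

def pico_vote_step(check_probs, prev_posterior, trans_prob):
    """One PICO update for a data point q.
    check_probs    : (n, C) class-probability outputs of the n checkers in ne(q)
    prev_posterior : (C,)   p(C_{t-1} | q) from the previous iteration
    trans_prob     : (C, C) p(C_t = j | C_{t-1} = i) stored in trans_prob[i, j]
    """
    # 1) Joint likelihood: product of the checker scores, done in log space.
    log_lik = np.sum(np.log(check_probs + 1e-12), axis=0)

    # 2) Prior update by the HMM:
    #    p(C_t | q) = sum_{C_{t-1}} p(C_t | C_{t-1}) p(C_{t-1} | q)
    prior = trans_prob.T @ prev_posterior

    # 3) Posterior by Bayes' rule, normalized by Z = sum over classes.
    unnorm = prior * np.exp(log_lik - log_lik.max())
    posterior = unnorm / unnorm.sum()

    # 4) Label update: the hard label is the argmax of the posterior.
    return int(np.argmax(posterior)), posterior
```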
  61. 5) Passing the Probability Map to the Next Iteration: the posterior is carried over and the k checkers are retrained on the updated labels $\hat{C}_{t+1}$. PICO is iterative!
  62. 0) The Next Iteration starts! The k checkers, retrained on the updated labels, check again, starting from the carried-over posterior and class transition probabilities. PICO is iterative!
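And a sketch of the outer loop that makes PICO iterative, reusing the helpers above; `multisplit_train_check_soft` is a hypothetical soft-score variant of the earlier `multisplit_train_check` (returning class-probability vectors instead of hard labels), and the near-identity initialization of the transition matrix is an assumption.

```python
import numpy as np

def pico(X, y, k, num_iters, num_classes, rng):
    """Illustrative PICO outer loop: vote, relabel, retrain, repeat."""
    posterior = np.full((len(y), num_classes), 1.0 / num_classes)
    trans_prob = 0.9 * np.eye(num_classes) + 0.1 / num_classes  # assumed init

    for _ in range(num_iters):
        prev_labels = y.copy()
        # Retrain the k checkers on the current labels; collect soft checks.
        soft_checks = multisplit_train_check_soft(X, y, k, rng)  # hypothetical
        for i, probs in soft_checks.items():
            y[i], posterior[i] = pico_vote_step(
                np.asarray(probs), posterior[i], trans_prob
            )
        # Refresh the class transition probabilities:
        # p(C_{t+1} | C_t) ~ (1/N) sum_q p(C_{t+1} | C_t, q), approximated here
        # by averaging posteriors grouped by each sample's previous hard label.
        for c in range(num_classes):
            mask = prev_labels == c
            if mask.any():
                trans_prob[c] = posterior[mask].mean(axis=0)
    return y
```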
  63. Gradually remove label noise through iterative probabilistic voting!

  64. All in One!

  65. PICO Architecture: [Diagram: the original dataset → Data Splitter → Split-Train-Check branches (Branch 0, Branch 1, … Branch N, each training a Checker) → relabeled dataset. PICO is iterative!]
  66. PICO Implementation: [the same diagram; the output is the updated dataset, which feeds the next iteration.]
  67. Our Plan: [Noisy Label Data → Model Inference → Relabeling by Model → ReLabeled Data.]
  68. Our Plan became PICO! [Noisy Label Data → PICO (Split-Train-Check branches) → ReLabeled Data.]
  69. Application to FAQ/Chat dataset!

  70. FAQ Chatbot: • Providing a qualified service by limiting the conversation domain • Query-Intent classification methods show good performance. [Diagram: input query “LINEのパスワードがわからないのでどうしたらいいですか?” (“I don’t know my LINE password, what should I do?”) → word embedding / sentence embedding → Neural Intent Classifier → intents such as password, LINE Music, LINE stickers, LINE Pay.]
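As a concrete reading of this pipeline, a minimal sketch of query-intent classification with Hugging Face transformers; the local model path `./intent-classifier` and the tokenizer checkpoint are assumptions for illustration, not the talk's actual setup.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
# Hypothetical fine-tuned intent classifier saved locally.
model = BertForSequenceClassification.from_pretrained("./intent-classifier")

query = "LINEのパスワードがわからないのでどうしたらいいですか?"  # "I don't know my LINE password, what should I do?"
inputs = tokenizer(query, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
intent_id = int(logits.argmax(dim=-1))  # index into the intent classes (password, LINE Pay, ...)
```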
  71. FAQ chatbot service for LINE use: LINE KantanHelp FAQ • Approx. 90,000 data points / 1,000 intent classes • Checker model: classifier-BERT; only the “classifier” is trained for the checkers in PICO (sketched below) • Characteristics: class imbalance problem, query-intent mapping ambiguity problem, some noisy queries
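One hedged reading of training only the “classifier” for the checkers is to freeze the BERT encoder and update just the classification head, sketched here with Hugging Face transformers; the multilingual checkpoint and the learning rate are illustrative choices, since the talk only says “classifier-BERT”.

```python
import torch
from transformers import BertForSequenceClassification

# Illustrative checkpoint; the exact BERT variant is not named in the talk.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=1000,  # ~1,000 intent classes in the KantanHelp FAQ
)

# Freeze the BERT encoder so each PICO checker retrains cheaply.
for param in model.bert.parameters():
    param.requires_grad = False

# Optimize only the classification head (assumed learning rate).
optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)
```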
  72. LINE KantanHelp FAQ, Top-1 accuracy (classifier-BERT, single model, 5,000-query validation set): Baseline 71.48% → Rule-based preprocessing 72.43% (+0.95%) → PICO-based preprocessing 74.45% (+2.02%; +2.97% gain over the baseline)
  73. Take Home Message!

  74. Small Mislabeling is fine! • AI projects without a data strategy are unlikely to succeed • An automatic AI data-cleaning pipeline is very competitive • CLOVA/LINE Chatbot Builder improves service quality by automatically purifying data through PICO • The PICO architecture can be applied to other datasets
  75. Human-free data cleaning can improve your AI performance

  76. Thank you!

  77. References
    • [Bo Han, et al.: 2019] Han, Bo et al., “Co-teaching: Robust training of deep neural networks with extremely noisy labels,” in NIPS, pp. 8536-8546, 2018.
    • [Konyushkova’2017] K. Konyushkova et al., “Learning active learning from real and synthetic data,” in NIPS, 2017.
    • [Lu Jiang, et al.: 2018] Jiang, Lu et al., “MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels,” in ICML, pp. 2309-2318, 2018.
    • [Y. Bengio: 2009] Bengio, Yoshua et al., “Curriculum learning,” in ICML, pp. 41-48, 2009.
    • [Yoo’2019] D. Yoo et al., “Learning loss for active learning,” in CVPR, 2019.