Slide 1

Slide 1 text

Bengali.AI Handwritten Grapheme Classification: Competition Summary and Explanation of Top Solutions @tereka114

Slide 2

Slide 2 text

Self Introduction • tereka(@tereka114) • https://www.kaggle.com/tereka • 6 years • Computer Vision • Kaggle Competitions Master • Home Credit Default Risk 2nd • IEEE's Signal Processing Society 10th

Slide 3

Slide 3 text

Goal • Learn the top solutions and the key factors behind winning • Take the ideas from this competition and apply them to other competitions

Slide 4

Slide 4 text

Competition Overview • Predict the components of Bengali handwritten grapheme images • Grapheme root (168 classes) • Vowel diacritic (11 classes) • Consonant diacritic (7 classes) • The training data contains 1295 distinct graphemes • But the test data includes unseen graphemes • Code Competition • Run-time limit: 9 hours (CPU) / 2 hours (GPU)

Slide 5

Slide 5 text

Metrics • Macro Average Recall
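As a rough illustration of the metric, the sketch below computes macro-averaged recall per component and then combines the three scores. The 2:1:1 weighting (grapheme root counted double) is how I recall the competition's evaluation page describing it; it is not stated on the slide, so treat it as an assumption. The function names and dictionary keys are hypothetical.

```python
import numpy as np

def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for cls in np.unique(y_true):
        mask = y_true == cls
        # Recall for this class: fraction of its samples predicted correctly.
        recalls.append(np.mean(y_pred[mask] == cls))
    return float(np.mean(recalls))

def competition_score(true, pred, weights=(2.0, 1.0, 1.0)):
    """Weighted average of the three macro recalls (weights are an assumption)."""
    keys = ("grapheme_root", "vowel_diacritic", "consonant_diacritic")
    scores = [macro_recall(true[k], pred[k]) for k in keys]
    return float(np.average(scores, weights=weights))
```

Because every class contributes equally to macro recall, rare classes matter as much as frequent ones, which is one reason mistakes on rare or unseen graphemes are costly.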

Slide 6

Slide 6 text

Sample Images https://www.kaggle.com/kaushal2896/bengali-graphemes-starter-eda-multi-output-cnn

Slide 7

Slide 7 text

Distributions • Vowel diacritic • Consonant diacritic: labels 3 and 6 have few samples • Grapheme root https://www.kaggle.com/pestipeti/bengali-quick-eda

Slide 8

Slide 8 text

Private Leaderboard • Huge shake-up

Slide 9

Slide 9 text

Why the huge shake-up? • Predicting unseen graphemes is important. • My guess is that the public leaderboard contains only a few unseen graphemes, while the private leaderboard contains many. • Handling them is the key to a good solution.

Slide 10

Slide 10 text

Interesting points of the competition • How to separate unseen and seen patterns • ArcFace / BCE / individual models • Dealing with unseen graphemes • How did the top competitors predict unseen graphemes? • Is font generation better?

Slide 11

Slide 11 text

Top Solution Summaries • Unseen and seen models • ArcFace • Binary Cross Entropy • Model • EfficientNet / SE-ResNeXt • One model with 3 targets, or 3 individual models (1 target per model) • Data Augmentation • FMix • Character generation • CycleGAN • From fonts

Slide 12

Slide 12 text

5th Place Solution https://www.kaggle.com/c/bengaliai-cv19/discussion/136129 • Model • SE-ResNeXt50 + Head • Loss • Grapheme root loss (CE) + consonant loss (MLBC) + vowel loss (CE) • Grapheme loss: ArcCos + CE • Separate consonant components ※ MLBC: Multi-Label Binary Cross Entropy
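To make the combination of losses concrete, here is a minimal NumPy sketch that sums cross entropy for the grapheme root and vowel heads with a multi-label binary cross entropy for the consonant head. It assumes the three terms are simply added with equal weight and that the consonant label has already been turned into a multi-hot vector of components; the slide gives neither the weighting nor the component mapping, and the ArcCos term is left out. All function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross entropy for integer class labels."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def multilabel_bce(logits, targets):
    """Mean binary cross entropy with logits for multi-hot targets."""
    x = np.asarray(logits, dtype=float)
    t = np.asarray(targets, dtype=float)
    # Stable form of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))].
    loss = np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x)))
    return float(loss.mean())

def total_loss(root_logits, root_y, vowel_logits, vowel_y, cons_logits, cons_multi_hot):
    """Equal-weight sum of the three head losses (the weighting is an assumption)."""
    return (softmax_cross_entropy(root_logits, root_y)
            + softmax_cross_entropy(vowel_logits, vowel_y)
            + multilabel_bce(cons_logits, cons_multi_hot))
```

Treating the consonant diacritic as a set of independent binary components lets the model score each component separately rather than memorizing whole combinations.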

Slide 13

Slide 13 text

4th Place Solution https://www.kaggle.com/c/bengaliai-cv19/discussion/136982 • ArcFace Model • InceptionResNetV2, SE-ResNeXt101 • Target: 1295 grapheme classes • Cutout • Unseen Model • 3 heads for vowel and consonant, 1 head for grapheme root • Cross Entropy • Mixup
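For readers unfamiliar with ArcFace, the following is a small NumPy sketch of how its logits are formed: embeddings and class weights are L2-normalized, an angular margin is added to the target-class angle, and the cosine is rescaled. The `scale` and `margin` values are common defaults that I picked for illustration, not the values used by this team, and the sketch omits the special handling that full implementations apply when the angle plus margin exceeds pi.

```python
import numpy as np

def arcface_logits(embeddings, class_weights, labels, scale=30.0, margin=0.5):
    """Return scale * cos(theta + margin) for target classes, scale * cos(theta) otherwise."""
    emb = np.asarray(embeddings, dtype=float)
    w = np.asarray(class_weights, dtype=float)
    labels = np.asarray(labels)
    # Cosine similarity between unit-length embeddings and unit-length class weights.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    w = w / np.linalg.norm(w, axis=1, keepdims=True)
    cos = np.clip(emb @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    # Add the angular margin only to each sample's target class.
    theta[np.arange(len(labels)), labels] += margin
    return scale * np.cos(theta)
```

These logits would then go into an ordinary softmax cross entropy; the margin forces embeddings of the same grapheme to cluster more tightly, which is what makes the embeddings useful for spotting unseen graphemes.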

Slide 14

Slide 14 text

3rd Place Solution https://www.kaggle.com/c/bengaliai-cv19/discussion/135982 • Unseen/Grapheme • Use ArcFace and cosine similarity • Unseen • Predict individual components • Seen • Predict three components • Model • pc-softmax • Blending/Weighted average
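One way to read "ArcFace and cosine similarity" is as a nearest-class-center check: if a test embedding is not close to any training grapheme's center, treat it as unseen. The sketch below shows that idea under this assumption; the threshold value and all function names are hypothetical, and the team's actual procedure may differ.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors along the last axis to unit length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def class_centers(embeddings, labels):
    """Mean unit embedding per class, re-normalized to unit length."""
    embeddings = l2_normalize(embeddings)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centers = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, l2_normalize(centers)

def is_unseen(test_embeddings, centers, threshold=0.5):
    """Flag samples whose best cosine similarity to any class center is below threshold."""
    sims = l2_normalize(test_embeddings) @ centers.T
    return sims.max(axis=1) < threshold
```

Samples flagged as unseen would go to the model that predicts the components individually, and the rest to the seen-grapheme model.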

Slide 15

Slide 15 text

2nd Place Solution • Model • SE-ResNeXt 50 & 101 • FMix worked better than CutMix • https://arxiv.org/abs/2002.12047 • Individual prediction (3 targets) • Unseen • Train a model on font data • Blending https://www.kaggle.com/c/bengaliai-cv19/discussion/135966

Slide 16

Slide 16 text

FMix • In this competition, FMix was the best-performing mixed-sample data augmentation
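The core idea of FMix is to mix two images with a binary mask obtained by thresholding low-frequency noise sampled in Fourier space. The sketch below is a simplified NumPy approximation of that idea for single-channel images, not the official implementation from the paper; the `decay` default and the function names are my own assumptions.

```python
import numpy as np

def fmix_mask(height, width, lam, decay=3.0, rng=None):
    """Binary mask with a fraction lam of ones, shaped by low-frequency noise."""
    rng = np.random.default_rng() if rng is None else rng
    # Frequency magnitude of every Fourier coefficient.
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    freq = np.maximum(freq, 1.0 / max(height, width))  # avoid dividing by zero at DC
    # Complex Gaussian noise, attenuated at high frequencies.
    noise = rng.standard_normal((height, width)) + 1j * rng.standard_normal((height, width))
    grey = np.real(np.fft.ifft2(noise / freq ** decay))
    # Set the k largest pixels to 1 so the mask covers a fraction lam of the image.
    k = int(round(lam * height * width))
    mask = np.zeros(height * width, dtype=np.float32)
    mask[np.argsort(grey.ravel())[::-1][:k]] = 1.0
    return mask.reshape(height, width)

def fmix(image_a, image_b, lam, rng=None):
    """Mix two same-sized single-channel images with an FMix-style mask."""
    mask = fmix_mask(*image_a.shape[:2], lam, rng=rng)
    return mask * image_a + (1.0 - mask) * image_b, mask
```

During training, the loss is usually mixed with the same proportion, for example `lam * loss(pred, label_a) + (1 - lam) * loss(pred, label_b)`, as in other mixed-sample augmentations.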

Slide 17

Slide 17 text

1st Place Solution • Out-of-Distribution Model • Predicts 1295 classes (graphemes) • If all class predictions are low, the sample is judged to be an unseen class. • Seen Model • EfficientNet-B7 based, 14784 classes (Grapheme * Consonant * Vowel) • Unseen Model • CycleGAN (font and image) + EfficientNet-B0 (2 font models) https://www.kaggle.com/c/bengaliai-cv19/discussion/135984
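The routing between the seen and unseen models can be sketched as a confidence threshold on the out-of-distribution model's output. The example below assumes softmax probabilities and a hypothetical threshold of 0.5; the slide only says that a sample is judged unseen when all class predictions are low, so the exact scoring function and cut-off are assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def route_predictions(ood_logits, seen_pred, unseen_pred, threshold=0.5):
    """Use the unseen model's prediction wherever the OOD model is not confident.

    seen_pred and unseen_pred hold (root, vowel, consonant) labels per sample.
    """
    confidence = softmax(ood_logits).max(axis=1)
    unseen = confidence < threshold  # no grapheme class is confidently predicted
    final = np.where(unseen[:, None], np.asarray(unseen_pred), np.asarray(seen_pred))
    return final, unseen
```

For example, a sample whose grapheme probabilities are spread evenly over the classes would be routed to the font-based unseen model, while a sample with one dominant class keeps the seen model's prediction.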

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

CycleGAN Training https://www.kaggle.com/c/bengaliai-cv19/discussion/135984 • Train on font images and handwritten images • Using 2 TTF files is better than 1 (a TTF file is a font) • Convert the original image to a font image and predict with EfficientNet-B0 • Get a narrow portion of the handwriting. • Abstract character structure. • Details are in the interview article. • https://medium.com/kaggle-blog/top-marks-for-student-kaggler-in-bengali-ai-a-winners-interview-with-linsho-kaku-dd321b324c74

Slide 20

Slide 20 text

Cycle GAN

Slide 21

Slide 21 text

What I learned • Unseen prediction is very important in this competition. • ArcFace / Binary Cross Entropy • Character generation can yield generalized features • CycleGAN can extract abstract features, which is impressive. • Generating characters from fonts also helps generalization. • Mixed-sample augmentation (like FMix) can give better results

Slide 22

Slide 22 text

Thank you for listening