Bengali.AI Handwritten Grapheme Classification Explanation of Summary and Solutions

Kaggle: https://www.kaggle.com/c/bengaliai-cv19
YouTube: https://www.youtube.com/watch?v=5zrWsxaIRPg&feature=youtu.be

tereka114

March 15, 2022

Transcript

  1. Self Introduction • tereka (@tereka114) • https://www.kaggle.com/tereka • 6 years • Computer Vision • Kaggle Competitions Master • Home Credit Default Risk: 2nd place • IEEE's Signal Processing Society: 10th place
  2. Goal • Understand the top solutions and the key factors behind winning • Learn the ideas used in this competition and apply them to other competitions
  3. Competition Overview • Predict the components of Bengali handwriting images • Grapheme root (168 classes) • Vowel diacritic (11 classes) • Consonant diacritic (7 classes) • The training data contains 1295 grapheme classes • But the test data contains unseen graphemes • Code competition • Runtime limit: 9 hours (CPU) / 2 hours (GPU)
  4. Distributions • Vowel diacritic • Consonant diacritic (labels 3 and 6 are rare) • Grapheme root • https://www.kaggle.com/pestipeti/bengali-quick-eda
  5. Why was there a huge shake-up? • Predicting unseen graphemes is important. • My guess is that the public leaderboard contained only a few unseen graphemes, while the private leaderboard contained many. • Handling them is the key to a good solution.
  6. Interesting points of the competition • How to split seen and unseen patterns. • ArcFace / BCE / individual models versus unseen graphemes • How did the top competitors predict unseen graphemes? • Is font generation better?
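One way to approach the seen/unseen split question from slide 6 is to hold out whole graphemes, so that validation contains component combinations that never appear in training. The following is only a minimal sketch of that idea, not any competitor's actual method; the function name `split_seen_unseen`, the `(root, vowel, consonant)` tuple layout, and the holdout fraction are assumptions made for illustration.

```python
import random

def split_seen_unseen(labels, holdout_fraction=0.1, seed=0):
    """Split sample indices so that some graphemes appear only in validation.

    labels: list of (root, vowel, consonant) tuples, one per image.
    Returns (train_idx, unseen_val_idx).
    """
    graphemes = sorted(set(labels))          # unique component combinations
    rng = random.Random(seed)
    rng.shuffle(graphemes)
    n_holdout = max(1, int(len(graphemes) * holdout_fraction))
    held_out = set(graphemes[:n_holdout])    # these become "unseen" graphemes

    train_idx = [i for i, g in enumerate(labels) if g not in held_out]
    unseen_idx = [i for i, g in enumerate(labels) if g in held_out]
    return train_idx, unseen_idx

# Toy data: 30 distinct graphemes, 4 images each (hypothetical labels).
labels = [(r, v, c) for r in range(5) for v in range(3) for c in range(2)] * 4
train_idx, unseen_idx = split_seen_unseen(labels, holdout_fraction=0.2)
train_graphemes = {labels[i] for i in train_idx}
unseen_graphemes = {labels[i] for i in unseen_idx}
```

In practice one would probably also check that every individual root, vowel, and consonant class in the held-out graphemes still occurs somewhere in the training split; the sketch does not enforce that.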
  7. Top Solution Summaries • Unseen and seen models: ArcFace, Binary Cross Entropy • Model: EfficientNet / SE-ResNeXt; 3 targets in one model, or 1 target per model • Data augmentation: FMix • Character generation: CycleGAN, from fonts
  8. 5th Place Solution https://www.kaggle.com/c/bengaliai-cv19/discussion/136129 • Model: SE-ResNeXt50 + Head • Loss: Grapheme Root (CE) + Consonant Loss (MLBC) + Vowel Loss (CE) • Grapheme Loss: ArcCos + CE • Separate consonant components ※ MLBC: Multi-Label Binary Cross Entropy
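The loss combination listed on slide 8 (cross entropy for the grapheme root and vowel, multi-label binary cross entropy for the consonant) can be sketched for a single sample as below. This is an illustration in plain Python rather than the 5th place team's code: the equal weighting of the three terms, the function names, and the multi-hot encoding of consonant components are all assumptions, and the ArcCos grapheme term is left out.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross entropy for one sample, computed in a stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def multilabel_bce(logits, targets):
    """Mean binary cross entropy over independent sigmoid outputs.

    targets is a multi-hot list (0 or 1 per consonant component).
    """
    total = 0.0
    for x, t in zip(logits, targets):
        # Stable form of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))]
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

def combined_loss(root_logits, root_target,
                  vowel_logits, vowel_target,
                  cons_logits, cons_targets):
    """Sum of the three per-head losses (equal weights are an assumption)."""
    return (cross_entropy(root_logits, root_target)
            + multilabel_bce(cons_logits, cons_targets)
            + cross_entropy(vowel_logits, vowel_target))
```

Treating the consonant head as multi-label means each component gets its own sigmoid output, so a combination of components that never occurred in training can still be expressed at prediction time.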
  9. 4th Place Solution https://www.kaggle.com/c/bengaliai-cv19/discussion/136982 • ArcFace model: InceptionResNetV2, SE-ResNeXt101; target is the 1295 grapheme classes; Cutout • Unseen model: 3 heads for vowel and consonant, 1 head for grapheme root; Cross Entropy; Mixup
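For readers unfamiliar with the ArcFace head mentioned on slide 9, the core idea is to add an angular margin to the target class before scaling the cosine logits. The sketch below shows that step for one sample; it is a simplified illustration, not the 4th place team's implementation. The `scale` and `margin` values are illustrative assumptions, and clamping the angle at pi is a simplification of how real implementations handle large angles.

```python
import math

def arcface_logits(cosines, target, scale=30.0, margin=0.5):
    """Apply an additive angular margin to the target class only.

    cosines: cosine similarity between the embedding and each class weight.
    Returns scaled logits to feed into a softmax cross entropy.
    """
    logits = []
    for k, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))           # guard against rounding errors
        if k == target:
            theta = math.acos(c)
            # Simplification: clamp so the angle never exceeds pi.
            c = math.cos(min(theta + margin, math.pi))
        logits.append(scale * c)
    return logits

# The target logit is pushed down, so the model must separate classes further.
example = arcface_logits([0.8, 0.1], target=0)
```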
  10. 3rd Place Solution https://www.kaggle.com/c/bengaliai-cv19/discussion/135982 • Seen/unseen grapheme detection: ArcFace and cosine similarity • Unseen: predict the individual components • Seen: predict the three components • Model: pc-softmax • Blending / weighted average
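The seen/unseen decision on slide 10 relies on cosine similarity between embeddings. A minimal sketch of one way such a check could work is shown below, assuming each seen grapheme has a reference embedding (a "class center") and that an image counts as seen when its best similarity passes a threshold. The names `is_seen` and `class_centers` and the threshold value are hypothetical; the slide does not give the 3rd place team's exact rule.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_seen(embedding, class_centers, threshold=0.5):
    """Return True if the embedding is close to any known grapheme center."""
    best = max(cosine_similarity(embedding, c) for c in class_centers)
    return best >= threshold

# Toy 2-D embeddings: two seen grapheme centers (hypothetical values).
centers = [[1.0, 0.0], [0.0, 1.0]]
near_known = is_seen([0.9, 0.1], centers)     # close to the first center
far_from_all = is_seen([-1.0, -1.0], centers) # points away from both centers
```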
  11. 2nd Place Solution https://www.kaggle.com/c/bengaliai-cv19/discussion/135966 • Model: SE-ResNeXt 50 & 101 • FMix works better than CutMix (https://arxiv.org/abs/2002.12047) • Individual prediction (3 targets) • Unseen: train a model on font data • Blending
  12. 1st Place Solution https://www.kaggle.com/c/bengaliai-cv19/discussion/135984 • Out-of-distribution model: predicts the 1295 grapheme classes; if all class predictions are low, the image is judged to be an unseen class • Seen model: EfficientNet-B7 based, 14784 classes (Grapheme * Consonant * Vowel) • Unseen model: CycleGAN (font and image) + EfficientNet-B0 (2 font models)
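The routing rule on slide 12 ("if all class predictions are low, it is judged unseen") can be sketched as follows. This is a hedged illustration rather than the winner's code: the function name `route_prediction`, the callables standing in for the seen and unseen models, and the threshold of 0.5 are assumptions.

```python
def route_prediction(ood_probs, seen_predict, unseen_predict, threshold=0.5):
    """Pick which model answers for one image.

    ood_probs: per-class scores from the out-of-distribution model
               (one score per known grapheme).
    seen_predict / unseen_predict: zero-argument callables that return the
               (root, vowel, consonant) prediction of the respective model.
    """
    # "All class predictions are low" is the same as "the maximum is low".
    if max(ood_probs) < threshold:
        return unseen_predict()
    return seen_predict()

# Toy usage with stand-in models that return fixed labels.
low_confidence = route_prediction([0.1, 0.2, 0.05],
                                  lambda: ("seen", 1, 2, 3),
                                  lambda: ("unseen", 4, 5, 6))
high_confidence = route_prediction([0.9, 0.05, 0.01],
                                   lambda: ("seen", 1, 2, 3),
                                   lambda: ("unseen", 4, 5, 6))
```

Passing the models as callables means the more expensive model only runs for the images routed to it, which matters under the competition's runtime limit.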
  13. CycleGAN Training https://www.kaggle.com/c/bengaliai-cv19/discussion/135984 • Train on font images and handwritten images • 2 ttf files work better than 1 ttf file (a ttf file is a font) • Convert the original image to a font image and predict with EfficientNet-B0 • Get a narrow portion of the handwriting. • Abstract character structure. • Details are in the interview article: https://medium.com/kaggle-blog/top-marks-for-student-kaggler-in-bengali-ai-a-winners-interview-with-linsho-kaku-dd321b324c74
  14. What I learned • Unseen prediction is very important in this competition. • ArcFace / Binary Cross Entropy • Character generation can yield generalized features. • CycleGAN can extract abstract features, which is impressive. • Character generation from fonts can also improve generalization. • Mixed-sample augmentation (like FMix) can give better results.