7 classes
- 137 x 236 x 1

Test Images
- About 200,000 images
- 137 x 236 x 1
- 4 parquets

Copyright 2020 @ Maxwell_110

Bengali Model Pipeline

Customized SE-ResNet 50
- `NOT` pretrained
- Iterative Stratified 5 folds
- 137 x 236 x 1 Input Image Size
- Images divided by 255
- Adam
- 3 Stage Learning
  1. 13 - 18 CyclicLR (Triangle, 8 epoch/cycle, 1e-3 ~ 1e-4), Xentropy
  2. LRonP (36 epoch, 3 pat, 6 ES, 1e-5), Reduced Focal Loss
  3. LRonP (36 epoch, 3 pat, 6 ES, 1e-5), Xentropy
- Batch Size: 64
- Augmentation: Width Shift (20%), Erosion, CutOut (holes=8, 3 types of Size), GridMask (rotate=15deg)
- CutMixUp: CutMix (p=1/3, alpha=0.5) / MixUp (p=1/3, alpha=0.2)

Inception ResNet V2
- Pretrained
- Iterative Stratified 5 folds (same folds as Maxwell)
- 180 x 180 x 3 Input Image Size
- Images divided by 255
- Adam
- 2 Stage Cyclic Learning (4 epoch/cycle, 5e-5 ~ 2e-3)
  1. Weighted Reduced Focal Loss, 40 epoch, weight = 1 / observation counts
  2. LRonPlateau (1e-5), Xentropy
- Batch Size: 64
- Augmentation: Rotate (8deg), Zoom (1.2), height/width shift (15%), CutOut (holes=20, max_h=25, max_w=40)
- MixUp: MixUp (p=1, alpha=0.4)

Prediction 1 (Maxwell) * 0.75 + Prediction 2 (Nejumi) * 0.25 = Blended Prediction
- Input images: 137 x 236
- Prediction 2: Resize to (180, 180)
- Public : 0.9871 / 78th
- Private : 0.9557 / 6th

Submission with Post Processing
- Multiply each predicted probability by a coefficient
- Coefficients for each label (186 types)
- Use NM solver to calculate optimal coefficients
    grapheme_root       : [c_g_1, c_g_2, ..., c_g_168]
    vowel_diacritic     : [c_v_1, c_v_2, ..., c_v_11]
    consonant_diacritic : [c_c_1, c_c_2, ..., c_c_7]
- Apply correction coefficients to blended predictions

Inference Limitations:
- Inference on kernel
- GPU inference time <= 2 hours
- Memory Limit <= 13 GB

Our Resources
- Maxwell: TITAN RTX, Geforce 1080Ti x 2, GCP
- Nejumi: TITAN RTX, Geforce 1080Ti x 2, Vast.ai ( https://vast.ai/ ) *1
*1 Nejumi is a cloud addict

Prediction 1
- CV (w/ pp) : 0.9888
- Public : 0.9864
- Private : 0.9527 (12th)

Prediction 2
- CV (w/ pp) : 0.9845
- Public : 0.9810
- Private : 0.9449 (19th)

[Figure: model architectures. Diagram labels, in extracted order]
- Input (1ch), GeM 2D, 512 fc, Grapheme 168 nodes, Vowel 11 nodes, Consonant 7 nodes, SoftMax
- Add (+ + +), 1280 fc, 512 fc, 512 fc, 512 fc
- Customized SE-ResNet 50 Block (3 x 3 bottom kernel)
- Inception ResNet V2 Block
- Input (1ch), Input (3ch), Convert to 3 ch image, Separable Conv 2D, BN, ReLU, GAP 2D, Grapheme 168 nodes, Vowel 11 nodes, Consonant 7 nodes, SoftMax
- Image size: 180 x 180
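
The triangular cyclic schedule listed for the first training stage can be sketched in plain Python. This is only an illustration: it reads the shorthand `1e-3~-4` as a range between 1e-3 and 1e-4, and the function name `triangular_lr` and the choice to start each cycle at the lower bound are assumptions, not details taken from the slides.

```python
def triangular_lr(epoch, base_lr=1e-4, max_lr=1e-3, epochs_per_cycle=8):
    """Learning rate for a given epoch under a triangular cyclic schedule."""
    half = epochs_per_cycle / 2
    pos = epoch % epochs_per_cycle        # position inside the current cycle
    frac = 1 - abs(pos - half) / half     # 0 at the cycle ends, 1 at the midpoint
    return base_lr + (max_lr - base_lr) * frac

# Two full cycles of 8 epochs each.
schedule = [triangular_lr(e) for e in range(16)]
```

The rate climbs linearly for the first half of each cycle and falls back for the second half; passing `base_lr=5e-5, max_lr=2e-3, epochs_per_cycle=4` would give the range quoted for the Inception ResNet V2 model.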
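
The blend and the coefficient correction described above amount to a weighted average followed by a per-class multiplication. A minimal NumPy sketch, assuming each model outputs an array of shape (n_samples, n_classes) per head; the function names and the toy numbers are hypothetical, and 3 classes stand in for the real 168 / 11 / 7.

```python
import numpy as np

def blend(pred_1, pred_2, w1=0.75, w2=0.25):
    """Weighted average of two (n_samples, n_classes) probability arrays."""
    return w1 * pred_1 + w2 * pred_2

def apply_coefficients(probs, coefs):
    """Multiply each class probability by its coefficient and pick the best class."""
    return np.argmax(probs * coefs, axis=1)

# Toy example with 3 classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.2, 0.6, 0.2], [0.1, 0.3, 0.6]])
blended = blend(p1, p2)

coefs = np.array([1.0, 1.0, 2.0])   # hypothetical coefficients favouring class 2
labels = apply_coefficients(blended, coefs)
```

In this toy case the second sample switches from class 1 to class 2 once the coefficients are applied, which is the kind of correction toward under-predicted classes that the post-processing step is aiming for.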
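
One way the coefficient search could look, assuming "NM solver" refers to the Nelder-Mead simplex method and assuming macro-averaged recall on out-of-fold predictions as the objective (the slides state neither). The sketch uses `scipy.optimize.minimize` with `method="Nelder-Mead"`; `macro_recall`, `fit_coefficients`, and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def macro_recall(y_true, y_pred, n_classes):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = [np.mean(y_pred[y_true == c] == c)
               for c in range(n_classes) if np.any(y_true == c)]
    return float(np.mean(recalls))

def fit_coefficients(probs, y_true):
    """Search for per-class multipliers that maximise macro recall."""
    n_classes = probs.shape[1]

    def loss(coefs):
        y_pred = np.argmax(probs * coefs, axis=1)
        return -macro_recall(y_true, y_pred, n_classes)

    # Start from all-ones, i.e. the uncorrected predictions.
    result = minimize(loss, x0=np.ones(n_classes), method="Nelder-Mead")
    return result.x

# Synthetic 3-class data with an imbalanced label distribution.
rng = np.random.default_rng(0)
y_true = rng.choice(3, size=300, p=[0.6, 0.3, 0.1])
probs = rng.dirichlet(np.ones(3), size=300)
probs[np.arange(300), y_true] += 0.3            # nudge the true class upward
probs /= probs.sum(axis=1, keepdims=True)

coefs = fit_coefficients(probs, y_true)
```

Because the search starts at all-ones and Nelder-Mead keeps its best vertex, the fitted coefficients never score worse than the uncorrected predictions on the data they were fitted to. With 168 grapheme classes the search space is large, so fitting each of the three heads separately would be a natural choice, though that too is an assumption.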