Slide 2
Train Images
- 200,840 images
- 168 / 11 / 7 classes (grapheme_root / vowel_diacritic / consonant_diacritic)
- 137 x 236 x 1
Test Images
- About 200,000 images
- 137 x 236 x 1
- 4 parquets
Copyright 2020 @ Maxwell_110
Bengali Model Pipeline
Customized SE-ResNet 50
- `NOT` pretrained
- Iterative Stratified 5 folds
- 137 x 236 x 1 Input Image Size
- Images divided by 255
- Adam
- 3 Stage Learning
1. 13 - 18 CyclicLR (Triangle, 8 epochs/cycle, 1e-3 ~ 1e-4), Xentropy
2. LR on Plateau (36 epochs, patience 3, early stopping 6, 1e-5), Reduced Focal Loss
3. LR on Plateau (36 epochs, patience 3, early stopping 6, 1e-5), Xentropy
- Batch Size: 64
- Augmentation:
Width Shift (20%), Erosion, CutOut (holes=8, 3 types of Size),
GridMask (rotate=15deg)
- CutMixUp
CutMix (p=1/3, alpha=0.5) / MixUp (p=1/3, alpha=0.2)
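The CutMixUp step above can be sketched roughly as follows. This is a minimal NumPy sketch, not the authors' code: the function name `cutmixup`, the per-batch (rather than per-sample) draw, and the single one-hot label array are assumptions made for illustration. The real model has three label heads, which would each be mixed with the same `lam`.

```python
import numpy as np


def cutmixup(images, labels, rng, p_cutmix=1 / 3, p_mixup=1 / 3,
             alpha_cutmix=0.5, alpha_mixup=0.2):
    """Apply CutMix, MixUp, or nothing to one batch (hypothetical helper).

    images: (N, H, W, C) float array, labels: (N, K) one-hot array.
    """
    n, h, w = images.shape[:3]
    perm = rng.permutation(n)  # partner sample for every image
    u = rng.random()
    if u < p_cutmix:
        lam = rng.beta(alpha_cutmix, alpha_cutmix)
        # Box whose area is roughly (1 - lam) of the image.
        cut_h = int(h * np.sqrt(1.0 - lam))
        cut_w = int(w * np.sqrt(1.0 - lam))
        cy, cx = rng.integers(h), rng.integers(w)
        y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
        x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
        mixed = images.copy()
        mixed[:, y0:y1, x0:x1] = images[perm, y0:y1, x0:x1]
        # Recompute lam from the box that was actually pasted.
        lam = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    elif u < p_cutmix + p_mixup:
        lam = rng.beta(alpha_mixup, alpha_mixup)
        mixed = lam * images + (1.0 - lam) * images[perm]
    else:
        return images, labels  # remaining 1/3: batch left unchanged
    return mixed, lam * labels + (1.0 - lam) * labels[perm]
```

With p = 1/3 for each branch, about one third of the batches would pass through unmixed.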
Inception ResNet V2
- Pretrained
- Iterative Stratified 5 folds ( same folds as Maxwell )
- 180 x 180 x 3 Input Image Size
- Images divided by 255
- Adam
- 2 Stage Cyclic Learning (4 epoch/cycle, 5e-5 ~ 2e-3)
1. Weighted Reduced Focal Loss, 40 epoch,
weight = 1 / observation counts
2. LRonPlateau (1e-5), Xentropy
- Batch Size: 64
- Augmentation:
Rotate (8deg), Zoom (1.2), height/width shift (15%),
CutOut (holes=20, max_h=25, max_w=40)
- MixUp
MixUp (p=1, alpha=0.4)
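The cyclic learning rate listed above (4 epoch/cycle, 5e-5 ~ 2e-3) could look like the sketch below. The triangular shape is an assumption borrowed from the SE-ResNet 50 entry, since the slide does not state the shape for this model, and the function name `triangular_lr` is hypothetical. Whether the rate was updated per epoch or per iteration is not stated either; a fractional epoch index covers both cases.

```python
def triangular_lr(epoch, base_lr=5e-5, max_lr=2e-3, epochs_per_cycle=4):
    """Triangular cyclic learning rate for a (possibly fractional) epoch."""
    half = epochs_per_cycle / 2.0
    pos = epoch % epochs_per_cycle        # position inside the current cycle
    frac = 1.0 - abs(pos - half) / half   # 0 at cycle ends, 1 at the middle
    return base_lr + (max_lr - base_lr) * frac


# Learning rate at the start of each of the first 8 epochs (two cycles).
schedule = [triangular_lr(e) for e in range(8)]
```

The rate starts at `base_lr`, peaks at `max_lr` halfway through each cycle, and returns to `base_lr` at the cycle boundary.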
- Resize to (180, 180) before Prediction 2 (Nejumi)
Blended Prediction = Prediction 1 (Maxwell) * 0.75 + Prediction 2 (Nejumi) * 0.25
Public : 0.9871 / 78th
Private : 0.9557 / 6th
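The 0.75 / 0.25 blend amounts to a weighted average of the two models' class probabilities. A minimal sketch, assuming both predictions are arrays of shape (n_samples, n_classes) for the same label head; the function name `blend` and the toy numbers are illustrative only:

```python
import numpy as np


def blend(pred_1, pred_2, w_1=0.75, w_2=0.25):
    """Weighted average of two probability arrays of identical shape."""
    return w_1 * np.asarray(pred_1) + w_2 * np.asarray(pred_2)


# Toy probabilities for one sample and three classes (made-up values).
p1 = np.array([[0.7, 0.2, 0.1]])   # stands in for Prediction 1
p2 = np.array([[0.3, 0.6, 0.1]])   # stands in for Prediction 2
blended = blend(p1, p2)
```

Because the weights sum to 1, each blended row is still a valid probability distribution.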
Submission with Post Processing
- Multiply each predicted probability by a per-label coefficient
- One coefficient per label (186 labels in total)
- Use a Nelder-Mead (NM) solver to find the optimal coefficients
grapheme_root : [c_g_1, c_g_2, ..., c_g_168]
vowel_diacritic : [c_v_1, c_v_2, ..., c_v_11]
consonant_diacritic : [c_c_1, c_c_2, ..., c_c_7]
Apply correction coefficients to blended predictions
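The coefficient search could be sketched as below with SciPy's Nelder-Mead solver. Several points are assumptions not stated on the slide: that the objective is macro-averaged recall, that the coefficients are fitted on held-out (out-of-fold) predictions, and that each label head is fitted separately. The helper names and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def macro_recall(y_true, y_pred, n_classes):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = [
        np.mean(y_pred[y_true == c] == c)
        for c in range(n_classes)
        if np.any(y_true == c)
    ]
    return float(np.mean(recalls))


def fit_coefficients(probs, y_true):
    """Search one multiplicative coefficient per class with Nelder-Mead."""
    n_classes = probs.shape[1]

    def loss(coef):
        y_pred = np.argmax(probs * coef, axis=1)
        return -macro_recall(y_true, y_pred, n_classes)

    result = minimize(loss, x0=np.ones(n_classes), method="Nelder-Mead")
    return result.x


# Toy data (hypothetical): 3 imbalanced classes with noisy probabilities.
rng = np.random.default_rng(0)
prior = np.array([0.7, 0.2, 0.1])
y_true = rng.choice(3, size=600, p=prior)
logits = rng.normal(size=(600, 3)) + 1.5 * np.eye(3)[y_true] + np.log(prior)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

coef = fit_coefficients(probs, y_true)
before = macro_recall(y_true, probs.argmax(axis=1), 3)
after = macro_recall(y_true, (probs * coef).argmax(axis=1), 3)
```

Nelder-Mead needs no gradients, which suits a piecewise-constant objective like recall after argmax. The score on the fitting data cannot get worse than the all-ones starting point, but any gain may not carry over to unseen data.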
Inference Limitations:
- Inference on kernel
- GPU inference time <= 2 hours
- Memory Limit <= 13 GB
Our Resources
Maxwell: TITAN RTX, Geforce 1080Ti x 2, GCP
Nejumi: TITAN RTX, Geforce 1080Ti x 2,
Vast.ai ( https://vast.ai/ )*1
*1 Nejumi is a cloud addict
Prediction 1
CV (w/ pp) : 0.9888
Public : 0.9864
Private : 0.9527 (12th)
Input (1ch)
GeM 2D
512 fc
Grapheme: 168 nodes
Vowel: 11 nodes
Consonant: 7 nodes
SoftMax
Add
1280 fc 512 fc
512 fc 512 fc
Customized SE-ResNet 50 Block ( 3 x 3 bottom kernel )
Inception ResNet V2 Block
Input (1ch)
Input (3ch)
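The "GeM 2D" node in the diagram refers to generalized-mean pooling, shown here next to the global average pooling ("GAP 2D") used by the other model. A minimal NumPy sketch, assuming channels-last feature maps and a fixed exponent p = 3; the slide gives neither the value of p nor whether it was trainable, and the function name `gem_pool_2d` is hypothetical.

```python
import numpy as np


def gem_pool_2d(features, p=3.0, eps=1e-6):
    """Generalized-mean pooling over the spatial axes.

    features: (N, H, W, C) array -> returns an (N, C) array.
    p = 1 reduces to global average pooling; large p approaches max pooling.
    """
    clipped = np.clip(features, eps, None)  # keep values positive for the power
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)


# Toy feature map (made-up values) to compare the pooling variants.
rng = np.random.default_rng(0)
fmap = rng.uniform(0.1, 1.0, size=(2, 5, 8, 4))
gem = gem_pool_2d(fmap)           # between the mean and the max per channel
gap = gem_pool_2d(fmap, p=1.0)    # same as fmap.mean(axis=(1, 2))
```

For p > 1 the result weights strong activations more heavily than plain averaging does.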
Prediction 2
CV (w/ pp) : 0.9845
Public : 0.9810
Private : 0.9449 (19th)
Convert to 3 ch image
Separable Conv 2D
BN
ReLU
GAP 2D
Grapheme: 168 nodes
Vowel: 11 nodes
Consonant: 7 nodes
SoftMax