
Kaggle Bengali.AI 6th place solution

Maxwell
March 17, 2020

Higher resolution model pipeline picture

Transcript

  1. Bengali.AI
    Handwritten Grapheme Classification
    ইঁদুর এবং ভালুক

  2. Train Images
    - 200,840 images
    - 168 / 11 / 7 classes
    - 137 x 236 x 1
    Test Images
    - About 200,000 images
    - 137 x 236 x 1
    - 4 parquets
    Copyright 2020 @ Maxwell_110
    Bengali Model Pipeline
    Customized SE-ResNet 50
    - `NOT` pretrained
    - Iterative Stratified 5 folds
    - 137 x 236 x 1 Input Image Size
    - Images divided by 255
    - Adam
    - 3 Stage Learning
    1. 13 - 18 CyclicLR (Triangle, 8 epoch/cycle, 1e-3 ~ 1e-4), Xentropy
    2. LRonPlateau (36 epoch, patience 3, early stopping 6, 1e-5), Reduced Focal Loss
    3. LRonPlateau (36 epoch, patience 3, early stopping 6, 1e-5), Xentropy
    - Batch Size: 64
    - Augmentation:
    Width Shift (20%), Erosion, CutOut (holes=8, 3 types of Size),
    GridMask (rotate=15deg)
    - CutMixUp
    CutMix (p=1/3, alpha=0.5) / MixUp (p=1/3, alpha=0.2)
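The CutMixUp setting above can be read as: with probability 1/3 apply CutMix, with probability 1/3 apply MixUp, otherwise leave the batch unchanged. The following is a minimal NumPy sketch under that reading. The function name `cutmixup`, the per-batch (rather than per-sample) choice, the channels-last layout, and the single one-hot label array are assumptions for illustration; the slide does not show the authors' implementation, and the real model has three label heads.

```python
import numpy as np

def cutmixup(images, labels, rng, p_cutmix=1/3, p_mixup=1/3,
             alpha_cutmix=0.5, alpha_mixup=0.2):
    """Randomly apply CutMix, MixUp, or nothing to one batch.

    images: (N, H, W, C) float array, labels: (N, K) one-hot array.
    Returns the (possibly mixed) images and soft labels.
    """
    n, h, w = images.shape[:3]
    perm = rng.permutation(n)          # partner sample for each image
    u = rng.random()
    if u < p_cutmix:
        # CutMix: paste a rectangle from the partner image.
        lam = rng.beta(alpha_cutmix, alpha_cutmix)
        cut_h = int(h * np.sqrt(1.0 - lam))
        cut_w = int(w * np.sqrt(1.0 - lam))
        cy, cx = rng.integers(h), rng.integers(w)
        y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
        x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
        mixed = images.copy()
        mixed[:, y0:y1, x0:x1] = images[perm, y0:y1, x0:x1]
        # Recompute lam from the area actually pasted after clipping.
        lam = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    elif u < p_cutmix + p_mixup:
        # MixUp: convex combination of the two images.
        lam = rng.beta(alpha_mixup, alpha_mixup)
        mixed = lam * images + (1.0 - lam) * images[perm]
    else:
        return images, labels
    return mixed, lam * labels + (1.0 - lam) * labels[perm]
```

For the three-headed model, the same `perm` and `lam` would be applied to each of the grapheme, vowel, and consonant label arrays.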
    Inception ResNet V2
    - Pretrained
    - Iterative Stratified 5 folds ( same folds as Maxwell )
    - 180 x 180 x 3 Input Image Size
    - Images divided by 255
    - Adam
    - 2 Stage Cyclic Learning (4 epoch/cycle, 5e-5 ~ 2e-3)
    1. Weighted Reduced Focal Loss, 40 epoch,
    weight = 1 / observation counts
    2. LRonPlateau (1e-5), Xentropy
    - Batch Size: 64
    - Augmentation:
    Rotate (8deg), Zoom (1.2), height/width shift (15%),
    CutOut (holes=20, max_h=25, max_w=40)
    - MixUp
    MixUp (p=1, alpha=0.4)
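Both models use cyclic learning rates (8 epoch/cycle between 1e-4 and 1e-3 for SE-ResNet 50, 4 epoch/cycle between 5e-5 and 2e-3 for Inception ResNet V2). A minimal sketch of a triangular cycle follows. The function name `triangular_lr`, the per-epoch stepping, and starting each cycle at the lower bound are assumptions; the slide names the triangle shape only for the first model.

```python
def triangular_lr(epoch, cycle_len, lr_min, lr_max):
    """Learning rate for a 0-based epoch under a triangular cyclic schedule.

    The rate rises linearly from lr_min to lr_max over the first half of
    each cycle and falls back to lr_min over the second half.
    """
    half = cycle_len / 2.0
    pos = epoch % cycle_len
    frac = 1.0 - abs(pos - half) / half   # 0 at cycle start, 1 at mid-cycle
    return lr_min + (lr_max - lr_min) * frac

# Two cycles with the SE-ResNet 50 settings, ten with the Inception settings.
schedule_se_resnet = [triangular_lr(e, 8, 1e-4, 1e-3) for e in range(16)]
schedule_inception = [triangular_lr(e, 4, 5e-5, 2e-3) for e in range(40)]
```

In practice many frameworks step the cyclic rate per batch rather than per epoch; the same function works if `epoch` and `cycle_len` are counted in iterations.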
    Input image: 137 x 236
    Blended Prediction = Prediction 1 (Maxwell) * 0.75 + Prediction 2 (Nejumi) * 0.25
    - Prediction 2 input: Resize to (180, 180)
    Public : 0.9871 / 78th
    Private : 0.9557 / 6th
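The blend is a fixed weighted average of the two models' class probabilities. A minimal sketch with the slide's weights is given below. The function name `blend` and the dummy 11-class probabilities are illustrative assumptions; in the real pipeline the average would be taken separately for each of the three heads.

```python
import numpy as np

def blend(pred_1, pred_2, w_1=0.75, w_2=0.25):
    """Weighted average of two (n_samples, n_classes) probability arrays."""
    pred_1 = np.asarray(pred_1, dtype=float)
    pred_2 = np.asarray(pred_2, dtype=float)
    if pred_1.shape != pred_2.shape:
        raise ValueError("predictions must have the same shape")
    return w_1 * pred_1 + w_2 * pred_2

# Dummy softmax outputs for the 11-class vowel head (illustrative values only).
rng = np.random.default_rng(0)
p1 = rng.dirichlet(np.ones(11), size=4)   # stands in for Prediction 1
p2 = rng.dirichlet(np.ones(11), size=4)   # stands in for Prediction 2
blended = blend(p1, p2)
labels = blended.argmax(axis=1)           # predicted class per sample
```

Because the weights sum to 1, each blended row remains a valid probability distribution.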
    Submission with Post Processing
    - Multiply each predicted probability by a coefficient
    - One coefficient per label (186 types)
    - Use NM (Nelder-Mead) solver to calculate optimal coefficients
    grapheme_root : [c_g_1, c_g_2, ..., c_g_168]
    vowel_diacritic : [c_v_1, c_v_2, ..., c_v_11]
    consonant_diacritic : [c_c_1, c_c_2, ..., c_c_7]
    Apply correction coefficients to blended predictions
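The post-processing step can be sketched as fitting one multiplicative coefficient per class with a Nelder-Mead search, then taking the argmax of the rescaled probabilities. The sketch below assumes macro-averaged recall as the objective, uses SciPy's `minimize(method="Nelder-Mead")`, and runs on synthetic 7-class data standing in for consonant_diacritic. The helper names, the objective, and the data are assumptions, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def macro_recall(y_true, y_pred, n_classes):
    """Unweighted mean of per-class recall over classes present in y_true."""
    recalls = [(y_pred[y_true == c] == c).mean()
               for c in range(n_classes) if (y_true == c).any()]
    return float(np.mean(recalls))

def fit_coefficients(probs, y_true, maxiter=2000):
    """Search for per-class coefficients that maximize macro recall."""
    n_classes = probs.shape[1]

    def loss(coef):
        y_pred = (probs * coef).argmax(axis=1)
        return -macro_recall(y_true, y_pred, n_classes)

    result = minimize(loss, x0=np.ones(n_classes), method="Nelder-Mead",
                      options={"maxiter": maxiter})
    return result.x

# Synthetic, imbalanced 7-class data (illustrative only).
rng = np.random.default_rng(0)
prior = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02])
y_true = rng.choice(7, size=2000, p=prior)
logits = rng.normal(size=(2000, 7)) + np.log(prior)  # bias toward frequent classes
logits[np.arange(2000), y_true] += 2.0               # signal for the true class
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

coef = fit_coefficients(probs, y_true)
base = macro_recall(y_true, probs.argmax(axis=1), 7)
tuned = macro_recall(y_true, (probs * coef).argmax(axis=1), 7)
```

Coefficients fit this way would normally be estimated on held-out (for example out-of-fold) predictions and then applied unchanged to the test-set blend, since fitting them on the data being scored would overstate the gain.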
    Inference Limitations:
    - Inference on kernel
    - GPU inference time <= 2 hours
    - Memory Limit <= 13 GB
    Our Resources
    Maxwell: TITAN RTX, GeForce 1080Ti x 2, GCP
    Nejumi: TITAN RTX, GeForce 1080Ti x 2,
    Vast.ai ( https://vast.ai/ )*1
    *1 Nejumi is a cloud addict
    [Figure: architecture diagrams of the two models]
    Prediction 1 (Customized SE-ResNet 50)
    - CV (w/ pp) : 0.9888
    - Public : 0.9864
    - Private : 0.9527 (12th)
    - Diagram labels: Input (1ch), Customized SE-ResNet 50 Block ( 3 x 3 bottom kernel ),
      GeM 2D, fc layers (1280 fc, 512 fc), Add, SoftMax
    - Outputs: Grapheme 168 nodes, Vowel 11 nodes, Consonant 7 nodes
    Prediction 2 (Inception ResNet V2)
    - CV (w/ pp) : 0.9845
    - Public : 0.9810
    - Private : 0.9449 (19th)
    - Diagram labels: Input (1ch), Convert to 3 ch image, Input (3ch, 180 x 180),
      Inception ResNet V2 Block, Separable Conv 2D, BN, ReLU, GAP 2D, SoftMax
    - Outputs: Grapheme 168 nodes, Vowel 11 nodes, Consonant 7 nodes
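The SE-ResNet 50 model pools its feature maps with GeM 2D (generalized mean pooling), while the Inception ResNet V2 model uses GAP 2D (global average pooling). A minimal NumPy sketch of GeM follows. The channels-last layout, the exponent `p=3.0`, and the clipping constant `eps` are assumptions; the slide does not state the exponent or whether it was trainable.

```python
import numpy as np

def gem_pool_2d(features, p=3.0, eps=1e-6):
    """Generalized mean pooling over the spatial axes.

    features: (N, H, W, C) array of activations.
    Returns an (N, C) array. With p=1 this reduces to global average
    pooling on positive inputs; as p grows it approaches max pooling.
    """
    clipped = np.clip(features, eps, None)   # keep the power well defined
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)

# Illustrative feature map: batch of 2, 5 x 8 spatial grid, 16 channels.
rng = np.random.default_rng(0)
feature_map = rng.random((2, 5, 8, 16)) + 0.1
pooled = gem_pool_2d(feature_map)
```

The pooled value for each channel always lies between that channel's spatial mean and its spatial maximum, which is why GeM is often described as interpolating between average and max pooling.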
