
iMet 7th place solution & my approach to image data competition

phalanx
July 13, 2019

Transcript

  1. iMet 7th place solution
    &
    My approach to image data competition
    Kaggle Tokyo Meetup #6
    @phalanx
    13 July, 2019

  2. Self Introduction
    • Name: phalanx
    • Data Scientist at DeNA (from 17 July)
    • Machine Learning: 1.5 years
    • Kaggle: 1 year
    • Kaggle Master
    • TGS 1st place
    • iMet 7th place
    • Petfinder 17th place
    • HPA 36th place
    @ZFPhalanx

  3. Agenda
    • iMet 7th place solution
    • Overview
    • Our solution
    • 1st place solution
    • Summary
    • My approach to image data competition
    • Machine resource
    • Pipeline
    • Approach
    • Day 1
    • Day 2 ~ 2 months
    • 2 months ~ last day

  4. iMet 7th place solution

  5. Competition Overview
    • Competition Name: iMet Collection 2019 – FGVC6
    • Data: artwork from Metropolitan Museum
    • Task: Multi Label Image Classification
    • Metric: mean F2 score (calculated per sample; see the sketch below)
    • 2 stage competition
    • Kernels-only competition
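
The metric is an F-beta score with beta = 2, averaged over samples. A minimal numpy sketch of how it can be computed from binarized predictions; the handling of empty prediction/label rows is an assumption, not taken from the official evaluation code:

```python
import numpy as np

def mean_f2(y_true: np.ndarray, y_pred: np.ndarray, beta: float = 2.0) -> float:
    """y_true, y_pred: binary arrays of shape (n_samples, n_classes)."""
    tp = (y_true * y_pred).sum(axis=1)
    precision = tp / np.clip(y_pred.sum(axis=1), 1, None)
    recall = tp / np.clip(y_true.sum(axis=1), 1, None)
    # F-beta with beta = 2 weights recall higher than precision
    f2 = (1 + beta ** 2) * precision * recall / np.clip(beta ** 2 * precision + recall, 1e-8, None)
    return float(f2.mean())  # averaged over samples, not over classes
```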

  6. Sample images
    • Culture: american; Tag: generals, men, historical figures
    • Tag: landscapes
    • Culture: Italian, padua; Tag: lamps, male nudes, trumpets

  7. Result
    • Stage 1: 10th place
    • Stage 2: 7th place

  8. Data Description
    • Data: artwork from Metropolitan Museum
    • Data size
    • Train data: 109,237
    • 1st stage test data: 7,500
    • 2nd stage test data: 38,801
    • Target: culture, tag
    • Culture: 398 classes
    • Tag: 705 classes
    • Annotation
    • One annotator
    • Adviser from the Met

  9. Challenges
    • Label noise
    • Class imbalance
    • Long image

  10. Label noise
    • Many images lack annotations
    • Over 100,000 images, 1103 labels
    • Only one annotator, no verification step
    Ground truth: Tag: landscapes
    Probable truth: Tag: landscapes, trees, bodies of water, houses

  11. Class imbalance
    Chenyang Zhang, et al., The iMet Collection 2019 Challenge Dataset, arXiv preprint arXiv:1906.00901, 2019

  12. Long image
    • Example: a 7531x300 image resized to 320x320
    • Labels: Culture: Italian; Tag: utilitarian objects
    • After resizing: Culture: Italian? Tag: utilitarian objects?

  13. Solution

  14. Solution
    1st stage: train models

  15. 1st stage: train models
    • Train multiple models
    • SE-ResNeXt101, resnet34, PNasNet-5, etc...
    • Different models have different capabilities on different labels
    • Ensembling is effective
    • Different image size
    • 224x224, 320x320, 336x336, 350x350, 416x416
    • Optimizer: Adam
    • Augmentation
    • RandomResizedCrop, Horizontal Flip, Random Erasing
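
A minimal torchvision sketch of the listed augmentations; the crop scale, probabilities, and normalization stats are common defaults and assumptions, not values stated in the slides:

```python
from torchvision import transforms

# Train-time augmentation: RandomResizedCrop, horizontal flip, Random Erasing
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(320, scale=(0.7, 1.0)),   # 320x320 is one of the sizes used
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.5),                        # operates on tensors, so placed last
])
```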

  16. 1st stage: train models
    • Separate the labels into culture and tag
    • Useful features differ between culture and tag (see the sketch below)
    • Applied this method to SE-ResNeXt101/50 (considering inference time)
    • Improved LB by +0.01
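
One reading of "separate the labels into culture and tag" is one classifier per label group. A rough sketch using torchvision's resnet34 as a stand-in backbone (the solution used SE-ResNeXt101/50); `culture_idx` and `tag_idx` are hypothetical index arrays for the 398 culture and 705 tag columns:

```python
import torch.nn as nn
from torchvision import models

def make_classifier(n_classes: int) -> nn.Module:
    # torchvision resnet34 as a stand-in; the solution used SE-ResNeXt101/50
    model = models.resnet34(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, n_classes)
    return model

culture_model = make_classifier(398)   # trained only on the culture columns of the targets
tag_model = make_classifier(705)       # trained only on the tag columns
# at inference, concatenate the two sigmoid outputs back into the full 1103-dim label order
```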

  17. 1st stage: train models
    • Visualize attention of the culture/tag models (Grad-CAM)
    • Image labels: Culture: american; Tag: generals, men, historical figures
    • The culture model attends to "american" cues, the tag model to "men"

  18. 1st stage: train models
    • Visualize attention of the culture/tag models (Grad-CAM)
    • Image labels: Culture: british; Tag: bodies of water, ships
    • The culture model attends to "british" cues, the tag model to the ships

  19. 1st stage: train models
    • Visualize attention of the culture/tag models (Grad-CAM)
    • Image labels: Culture: japan; Tag: inscriptions, men
    • The culture model attends to "japan" cues, the tag model to "men"

  20. Knowledge Distillation (didn’t work)
    • Purpose: decrease inference time, improve performance
    • Attention transfer (activation-based method; see the sketch below)
    • Teacher: SE-ResNeXt101
    • Student: resnet34
    Sergey Zagoruyko, Nikos Komodakis, Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, In ICLR, 2017
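
A sketch of the activation-based attention transfer loss from the cited paper: spatial attention maps are the channel-wise sum of squared activations, L2-normalized, and the student is penalized for deviating from the teacher's maps. Which layers to match and the loss weight are assumptions:

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) activations -> flattened, L2-normalized spatial attention map."""
    am = feat.pow(2).sum(dim=1).flatten(1)        # sum of squared activations over channels
    return F.normalize(am, p=2, dim=1)

def attention_transfer_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    # the two feature maps must share spatial size (interpolate one of them if not)
    return (attention_map(student_feat) - attention_map(teacher_feat)).pow(2).mean()

# total loss = task loss + beta * sum of AT losses over the matched layers
at = attention_transfer_loss(torch.rand(4, 512, 10, 10), torch.rand(4, 2048, 10, 10))
```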

  21. Class imbalance
    • Loss function
    • Focal Loss + lovasz-hinge
    • A bit better than binary cross-entropy (see the focal loss sketch below)
    • Drop labels with frequency less than 20 (didn't work)
    • The metric is calculated per sample
    • These rare classes don't affect the score
    • No improvement, the score stayed the same
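
A minimal multi-label focal loss sketch built on BCE-with-logits; gamma and alpha are the common defaults rather than values stated in the slides, and the lovasz-hinge term is omitted:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Multi-label focal loss; logits and targets are (B, n_classes)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob of the correct label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t).pow(gamma) * bce).mean()

loss = focal_loss(torch.randn(8, 1103), torch.randint(0, 2, (8, 1103)).float())
```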

  22. Long image
    • I didn’t deal with it
    • CV is almost the same as norma images
    data CV
    Not long images 0.6462
    Only long images 0.6456
    All images 0.6460

  23. 2nd stage: stacking with CNN

  24. 2nd stage: Stacking with CNN
    • CNN
    • Kernel_size: (h, w) = (3, 1)
    • Learn correlation between models
    • Dense
    • Learn correlation between classes (see the sketch below)
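
A rough sketch of such a stacker: a Conv2d with kernel (3, 1) slides over the model axis of a (models x classes) prediction matrix, and a final Linear layer mixes classes. The filter count, pooling, and head are assumptions; as the next slide shows, the same idea can also be applied separately to the culture and tag slices:

```python
import torch
import torch.nn as nn

class StackingCNN(nn.Module):
    """Hypothetical sketch of the 2nd-stage stacker: a Conv2d with kernel (3, 1)
    mixes predictions across models, then a Linear layer mixes across classes."""

    def __init__(self, n_models: int, n_classes: int, n_filters: int = 8):
        super().__init__()
        # (B, 1, n_models, n_classes) -> (B, n_filters, n_models - 2, n_classes)
        self.conv = nn.Conv2d(1, n_filters, kernel_size=(3, 1))
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.AdaptiveAvgPool2d((1, n_classes))       # collapse the model axis
        self.fc = nn.Linear(n_filters * n_classes, n_classes)  # class correlations

    def forward(self, x):                 # x: (B, n_models, n_classes) probabilities
        x = self.relu(self.conv(x.unsqueeze(1)))
        x = self.pool(x).flatten(1)
        return self.fc(x)                 # multi-label logits

# e.g. stack out-of-fold predictions of 5 models over the 1103 labels
stacker = StackingCNN(n_models=5, n_classes=1103)
logits = stacker(torch.rand(8, 5, 1103))
```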

  25. Stacking with CNN
    • Separate the labels into culture/tag
    • Learn correlations between classes within culture/tag
    • Also train a CNN on all labels
    • Learn correlations between culture and tag

  26. Stacking with CNN
    • Stacking result (before team merge)
      method                              Public LB
      Weighted average                    0.649
      Stacking: separate culture/tag ①    0.654
      Stacking: all labels ②              0.647
      Average ① + ②                       0.658
    • After team merge
    • Added 2 predictions: Public LB 0.664, Private LB 0.659 (7th place)

  27. 1st place solution
    Konstantin Gavrilchik
    Kaggle Master
    Airbus Ship: 1st
    IMaterialist: 3rd
    • Model
    • SENet154, PNasNet-5, SEResNeXt101
    • 1st stage: training the zoo
    • Focal loss, batch size: 1000-1500 (gradient accumulation 10-20 times; see the sketch below)
    • 2nd stage: filtering predictions
    • Drop train images with very high error
    • Hard negative mining (5%)
    • 3rd stage: pseudo labeling
    • 4th stage: culture and tags separately
    • Train a model from the pretrained weights on only the 705 tag classes
    • Switch focal loss -> BCE
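
Batch sizes of 1000-1500 are reached with gradient accumulation: run 10-20 small forward/backward passes and step the optimizer once. A generic sketch with placeholder model and data (none of these objects come from the actual solution):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1103)                      # placeholder model and data
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()
loader = [(torch.rand(64, 10), torch.randint(0, 2, (64, 1103)).float()) for _ in range(32)]

accumulation_steps = 16                          # effective batch = 64 * 16 = 1024

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accumulation_steps   # scale so gradients average correctly
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```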

  28. 1st place solution
    Konstantin Gavrilchik
    Kaggle Master
    Airbus Ship: 1st
    IMaterialist: 3rd
    • Model
    • SENet154, PNasNet-5, SEResNeXt101
    • 5th stage: second level model
    • Create a binary classification dataset (len(data) * 1103 rows)
    • Predict whether each class applies to each image
    • Train LightGBM on the features below
    • Probabilities from each model, sum/division/multiplication of each pair/triple
    • mean/median/std/max/min over the models
    • Brightness/colorfulness of each image
    • Max side size, binary flag (height greater than width or not)
    • ImageNet predictions
    • Postprocessing: different thresholds for culture and tag (see the sketch below)
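
A sketch of per-group thresholding: search one threshold for the culture columns and another for the tag columns by maximizing mean F2 on validation predictions. It reuses the mean_f2 helper sketched earlier; the arrays and the column layout are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
val_probs = rng.random((1000, 1103))                         # placeholder validation predictions
val_true = (rng.random((1000, 1103)) > 0.99).astype(int)     # placeholder validation labels
culture_idx, tag_idx = np.arange(0, 398), np.arange(398, 1103)   # assumed column layout

def best_threshold(probs, y_true, grid=np.arange(0.05, 0.50, 0.01)):
    """Pick the threshold that maximizes mean F2 on a validation slice."""
    scores = [mean_f2(y_true, (probs > t).astype(int)) for t in grid]
    return float(grid[int(np.argmax(scores))])

t_culture = best_threshold(val_probs[:, culture_idx], val_true[:, culture_idx])
t_tag = best_threshold(val_probs[:, tag_idx], val_true[:, tag_idx])
# apply t_culture / t_tag to the corresponding columns of the test predictions
```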

  29. Summary
    • iMet Collection 2019 – FGVC6
    • Multi task image classification
    • Our method
    • Train multiple models
    • Stacking with CNN
    • Possible improvements
    • Pseudo labeling
    • Hard example mining

  30. My approach
    to image data competition

  31. Machine resource
    • GPU: 1080ti*3
    • RAM: 64GB
    • Kaggle Kernels (P100 * 1, 4 instances)

  32. Pipeline

  33. First day
    • Read ‘Rules’, ‘Overview’, ‘Evaluation’, ‘Data’, ‘Timeline’
    • EDA (70%; see the sketch below)
    • mean, std, min, max per channel in RGB and HSV (train + test)
    • mean, std, min, max of height and width (train + test)
    • Extract train targets
    • Check the images visually
    • Create a baseline (30%)
    • Model: small model (resnet18, 34) -> huge model (resnet101, 152)
    • Optimizer: Adam (lr: 3e-3 ~ 1e-4)
    • Image_size: (mean/2, mean/2), (mean, mean), (mean*2, mean*2)
    • Augmentation: none
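
A minimal sketch of the day-1 EDA statistics with OpenCV and numpy; the file pattern and the subsampling are assumptions:

```python
import glob
import cv2
import numpy as np

stats = {"rgb": [], "hsv": [], "h": [], "w": []}
for path in glob.glob("train/*.png")[:2000]:          # assumed layout; subsample for speed
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    stats["rgb"].append(img.reshape(-1, 3).mean(axis=0))   # per-image channel means
    stats["hsv"].append(hsv.reshape(-1, 3).mean(axis=0))
    stats["h"].append(img.shape[0])
    stats["w"].append(img.shape[1])

for key, values in stats.items():
    arr = np.array(values)
    print(key, "mean:", arr.mean(axis=0), "std:", arr.std(axis=0),
          "min:", arr.min(axis=0), "max:", arr.max(axis=0))
```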

  34. After training baseline model
    • Check your model's predictions and the gap between CV and LB
    • Visualize attention (Grad-CAM; see the sketch below)
    • It is helpful for modeling
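
A compact Grad-CAM sketch using forward/backward hooks, with torchvision's resnet34 as a stand-in model and its last block as the target layer (both assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_class, target_layer):
    """Return a (H, W) heatmap in [0, 1] for one class of a single image."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model.eval()
    logits = model(image)                               # image: (1, 3, H, W)
    model.zero_grad()
    logits[0, target_class].backward()
    h1.remove(); h2.remove()
    fmap, grad = feats[0], grads[0]                     # both (1, C, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)       # channel importance
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()

model = models.resnet34(pretrained=False)               # stand-in model
heatmap = grad_cam(model, torch.rand(1, 3, 320, 320),
                   target_class=0, target_layer=model.layer4[-1])
```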

  35. 2nd day ~ 2 months
    • Paper survey (30%)
    • Famous papers
    • Papers related to the task domain
    • EDA (30%)
    • Extract data based on training results
    • Coding (30%)
    • Create/fix your pipeline
    • Implement papers
    • Fix the baseline model (augmentation, modeling, etc.)
    • Others (10%)
    • Survey related competitions
    • Read 'Kernels' and 'Discussion'

  36. 2nd day ~ 2 months
    • Use time efficiently

  37. 2 months ~ last day
    • Paper survey (30% -> 10%)
    • Famous papers
    • Papers related to the task domain
    • EDA (30%)
    • Extract data based on training results
    • Coding (30% -> 50%)
    • Create/fix your pipeline
    • Implement papers
    • Fix the model (augmentation, modeling, etc.)
    • Prepare the model ensemble

  38. Thank you!
    Questions?
