iMet 7th place solution & my approach to image data competition

phalanx
July 13, 2019

Transcript

  1. iMet 7th place solution & My approach to image data competition • Kaggle Tokyo Meetup #6 • @phalanx • 13 July, 2019
  2. Self Introduction • Name: phalanx • Data Scientist at DeNA (17 July~) • Machine Learning: 1.5 years • Kaggle: 1 year • Kaggle Master • TGS 1st place • iMet 7th place • Petfinder 17th place • HPA 36th place • @ZFPhalanx
  3. Agenda • iMet 7th place solution • Overview • Our solution • 1st place solution • Summary • My approach to image data competition • Machine resource • pipeline • Approach • Day 1 • Day 2 ~ 2 month • 2 month ~ last day
  4. iMet 7th place solution

  5. Competition Overview • Competition Name: iMet Collection 2019 – FGVC6 • Data: artwork from Metropolitan Museum • Task: Multi Label Image Classification • Metric: mean F2 score (calculated per sample) • 2 stage competition • Kernels-only competition
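The per-sample metric can be sketched in a few lines of plain Python. This is only an illustration of a sample-wise F2 score, not the competition's official scoring code; the toy label ids and the choice to score an image with no true positives as 0 are assumptions.

```python
def f2_per_sample(true_labels, pred_labels, beta=2.0):
    """F-beta for one image; both arguments are sets of label ids."""
    tp = len(true_labels & pred_labels)
    if tp == 0:
        # Assumption: an image with no correct prediction scores 0.
        return 0.0
    precision = tp / len(pred_labels)
    recall = tp / len(true_labels)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f2(all_true, all_pred):
    """Average the per-image F2 scores over the whole dataset."""
    scores = [f2_per_sample(t, p) for t, p in zip(all_true, all_pred)]
    return sum(scores) / len(scores)


# Toy example with hypothetical label ids.
y_true = [{1, 2, 3}, {10}]
y_pred = [{1, 2}, {10, 11}]
print(round(mean_f2(y_true, y_pred), 4))
```

Because beta is 2, recall counts more than precision, so predicting one label too many costs less than missing one.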
  6. Sample images • Culture: american; Tag: generals, men, historical figures • Tag: landscapes • Culture: Italian, padua; Tag: lamps, male nudes, Trumpets
  7. Result • Stage 1: 10th place • Stage 2: 7th place
  8. Data Description • Data: artwork from Metropolitan Museum • Data size • Train data: 109,237 • 1st stage test data: 7,500 • 2nd stage test data: 38,801 • Target: culture, tag • Culture: 398 classes • Tag: 705 classes • Annotation • One annotator • Adviser from the Met • Sample image: Culture: american; Tag: generals, men, historical figures
  9. Challenges • Label noise • Class imbalance • Long image

  10. Label noise • Many images lack annotations • Over 100,000 images, 1103 labels • Only one annotator, no verification step • Sample image: ground truth Tag: landscapes; probable true Tag: landscapes, Trees, bodies of water, houses
  11. Class imbalance • Chenyang Zhang, et al., The iMet Collection 2019 Challenge Dataset, arXiv preprint arXiv:1906.00901, 2019
  12. Long image • Original 7531x300: Culture: Italian; Tag: utilitarian objects • After resize to 320x320: Culture: Italian? Tag: utilitarian objects?
  13. Solution

  14. Solution 1st stage: train models

  15. 1st stage: train models • Train multiple models • SE-ResNeXt101, resnet34, PNasNet-5, etc... • Different models are strong on different labels • Ensemble is effective • Different image sizes • 224x224, 320x320, 336x336, 350x350, 416x416 • Optimizer: Adam • Augmentation • RandomResizedCrop, Horizontal Flip, Random Erasing
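Of the augmentations listed, Random Erasing is the least self-explanatory, so here is a minimal sketch on a 2-D list standing in for a single-channel image. The area range, aspect-ratio range, fill value and retry count are illustrative assumptions, not the settings used in the solution, and a real pipeline would use a library implementation on tensors.

```python
import math
import random


def random_erase(image, rng, area_range=(0.02, 0.4), min_aspect=0.3, fill=0):
    """Return a copy of a 2-D image (list of rows) with one random box filled."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(10):  # retry until the sampled box fits inside the image
        area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(min_aspect, 1 / min_aspect)
        box_h = int(round(math.sqrt(area * aspect)))
        box_w = int(round(math.sqrt(area / aspect)))
        if 0 < box_h <= height and 0 < box_w <= width:
            top = rng.randint(0, height - box_h)
            left = rng.randint(0, width - box_w)
            for y in range(top, top + box_h):
                for x in range(left, left + box_w):
                    out[y][x] = fill
            return out
    return out  # no valid box was found; the copy is returned unchanged


image = [[1] * 32 for _ in range(32)]
erased = random_erase(image, random.Random(0))
print(sum(row.count(0) for row in erased), "pixels erased")
```

Erasing a random box forces the model to rely on more than one region of the artwork, which is the usual motivation for this augmentation.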
  16. 1st stage: train models • Separate labels into culture and tag • Features differ between culture and tag • Applied this method to SE-ResNeXt101/50 only (considering inference time) • Improved LB by +0.01
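A minimal sketch of the label split, so that one model can be trained on culture classes and another on tag classes. The class counts come from the data description; the assumption that culture ids come first (0..397) and tag ids follow (398..1102) is mine, as are the function names.

```python
N_CULTURE = 398  # culture classes (from the data description)
N_TAG = 705      # tag classes (from the data description)


def split_labels(label_ids):
    """Split one image's label ids into culture ids and re-indexed tag ids.

    Assumption: ids 0..397 are cultures and ids 398..1102 are tags.
    """
    culture = [i for i in label_ids if i < N_CULTURE]
    tag = [i - N_CULTURE for i in label_ids if i >= N_CULTURE]
    return culture, tag


def merge_predictions(culture_probs, tag_probs):
    """Concatenate the two models' outputs back into one 1103-long vector."""
    assert len(culture_probs) == N_CULTURE and len(tag_probs) == N_TAG
    return list(culture_probs) + list(tag_probs)


print(split_labels([5, 400, 1102]))
```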
  17. 1st stage: train models • Visualize attention in culture/tag model (Grad-CAM) • Input image: Culture: american; Tag: generals, men, historical figures • culture model feature: american • tag model feature: men
  18. 1st stage: train models • Visualize attention in culture/tag model (Grad-CAM) • Input image: Culture: british; Tag: bodies of water, ships • culture model feature: british • tag model feature: ships
  19. 1st stage: train models • Visualize attention in culture/tag model (Grad-CAM) • Input image: Culture: japan; Tag: inscriptions, men • culture model feature: japan • tag model feature: men
  20. Knowledge Distillation (didn’t work) • Purpose: decrease inference time, improve performance • Attention transfer (Activation-Based method) • Teacher: SE-ResNeXt101 • Student: resnet34 • Sergey Zagoruyko, Nikos Komodakis, Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, In ICLR, 2017
  21. Class imbalance • Loss function • Focal Loss + lovasz-hinge • A bit better than binary cross entropy • Drop labels with frequency less than 20 (didn’t work) • Metric is calculated per sample • These classes don’t affect the score • No improvement; the score stayed the same
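To make the loss comparison concrete, here is a sketch of the binary focal loss next to binary cross entropy for a single (image, class) pair. It covers only the focal part; the Lovasz-hinge term from the slide is not shown. The value gamma=2 and the omission of class weighting are assumptions, not the settings used in the solution.

```python
import math


def binary_cross_entropy(prob, target, eps=1e-7):
    """BCE for one (sample, class) pair; target is 0 or 1."""
    prob = min(max(prob, eps), 1.0 - eps)
    p_t = prob if target == 1 else 1.0 - prob
    return -math.log(p_t)


def binary_focal_loss(prob, target, gamma=2.0, eps=1e-7):
    """Focal loss: BCE scaled by (1 - p_t) ** gamma to down-weight easy pairs."""
    prob = min(max(prob, eps), 1.0 - eps)
    p_t = prob if target == 1 else 1.0 - prob
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# An easy positive (p=0.9) is down-weighted far more than a hard one (p=0.1).
for p in (0.9, 0.1):
    ratio = binary_focal_loss(p, 1) / binary_cross_entropy(p, 1)
    print(p, round(ratio, 4))
```

With 1103 labels and only a few positives per image, most (image, class) pairs are easy negatives, which is the situation focal loss is designed to de-emphasize.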
  22. Long image • I didn’t deal with it • CV is almost the same as on normal images • CV by data subset: Not long images 0.6462; Only long images 0.6456; All images 0.6460 • Figure: resized long image (Culture: Italian? Tag: utilitarian objects?)
  23. 2nd stage: stacking with CNN

  24. 2nd stage: Stacking with CNN • CNN • Kernel_size: (h, w) = (3, 1) • Learn correlation between models • Dense • Learn correlation between classes
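The two roles named above can be illustrated without any deep learning library. The sketch below assumes the first-stage predictions for one image are arranged as a grid with one row per model and one column per class, so a (3, 1) kernel mixes three adjacent models for a single class, and a dense layer afterwards mixes classes. The weights are hand-set for illustration; in the actual stacker they would be learned, and the layout is my assumption.

```python
def conv_over_models(preds, kernel):
    """Slide a (len(kernel), 1) filter down the model axis.

    preds[m][c] is model m's probability for class c.
    Each output value mixes adjacent models for one class at a time.
    """
    k = len(kernel)
    n_models, n_classes = len(preds), len(preds[0])
    out = []
    for m in range(n_models - k + 1):
        row = []
        for c in range(n_classes):
            row.append(sum(kernel[i] * preds[m + i][c] for i in range(k)))
        out.append(row)
    return out


def dense(features, weights, bias):
    """Fully connected layer: every output class sees every input feature."""
    return [
        sum(w * x for w, x in zip(w_row, features)) + b
        for w_row, b in zip(weights, bias)
    ]


# Three hypothetical models, two classes.
preds = [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]
mixed = conv_over_models(preds, kernel=[1 / 3, 1 / 3, 1 / 3])
logits = dense(mixed[0], weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(mixed, logits)
```

With equal kernel weights the convolution reduces to a plain average of the three models; learned weights let the stacker trust some models more than others.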
  25. Stacking with CNN • Separate labels into culture/tag • Learn correlation between classes within culture/tag • Train CNN with all labels • Learn correlation between culture and tag
  26. Stacking with CNN • Stacking result (before team merge), Public LB by method: Weighted average 0.649; Stacking, separate culture/tag (①) 0.654; Stacking, all labels (②) 0.647; Average of ①+② 0.658 • After team merge • Add 2 predictions: Public LB 0.664, Private LB 0.659 (7th place)
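The "weighted average" baseline and the final averaging of two stackers are the same operation with different weights. A minimal sketch follows; the probabilities, the weights and the 0.5 threshold are placeholders, not values from the solution.

```python
def weighted_average(prob_lists, weights):
    """Blend several models' probability vectors for one image."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*prob_lists)
    ]


def to_labels(probs, threshold=0.5):
    """Keep every class whose blended probability reaches the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]


stack_separate = [0.2, 0.9]  # hypothetical output of stacker (1)
stack_all = [0.6, 0.5]       # hypothetical output of stacker (2)
blended = weighted_average([stack_separate, stack_all], weights=[1, 1])
print(blended, to_labels(blended))
```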
  27. 1st place solution • Konstantin Gavrilchik (Kaggle Master; Airbus Ship: 1st; IMaterialist: 3rd) • Model • SENet154, PNasNet-5, SEResNeXt101 • 1st stage: training the zoo • Focal loss, batch size: 1000-1500 (accumulation 10-20 times) • 2nd stage: filtering prediction • Drop images from train with very high error • Hard negative mining (5%) • 3rd stage: pseudo labeling • 4th stage: culture and tags separately • Train model with pretrained weights for only tag classes (705) • Switch focal loss -> bce
  28. 1st place solution • Konstantin Gavrilchik (Kaggle Master; Airbus Ship: 1st; IMaterialist: 3rd) • Model • SENet154, PNasNet-5, SEResNeXt101 • 5th stage: second level model • Create binary classification dataset (len(data)*1103) • Predict whether each class relates to each image • Train lightGBM with the features below • Probabilities of each model, sum/division/multiplication of each pair/triple • mean/median/std/max/min of each model • Brightness/colorness of each image • Max side size, binary flag (height greater than width or not) • ImageNet predictions • Postprocessing: different thresholds for culture and tag
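The binary second-level dataset has one row per (image, class) pair, which is where len(data)*1103 comes from. The sketch below builds such rows from first-stage probabilities using only the standard library. It shows one possible reading of the feature list: here the statistics are taken across models for each pair, which is my assumption, and the pairwise, image-level and ImageNet features are left out.

```python
import statistics


def second_level_rows(model_probs):
    """Build one feature row per (image, class) pair.

    model_probs[m][i][c] is model m's probability that image i has class c.
    """
    n_models = len(model_probs)
    n_images = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    rows = []
    for i in range(n_images):
        for c in range(n_classes):
            p = [model_probs[m][i][c] for m in range(n_models)]
            rows.append({
                "image": i,
                "class": c,
                "probs": p,
                "mean": statistics.mean(p),
                "median": statistics.median(p),
                "std": statistics.pstdev(p),
                "max": max(p),
                "min": min(p),
            })
    return rows


# Two hypothetical models, two images, three classes.
model_probs = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.3, 0.2, 0.1], [0.6, 0.5, 0.4]],
]
rows = second_level_rows(model_probs)
print(len(rows), rows[0])
```

Each row would then get a 0/1 target (does this image carry this class?) and be fed to a gradient-boosting model such as lightGBM.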
  29. Summary • iMet Collection 2019 – FGVC6 • Multi task image classification • Our method • Train multiple models • Stacking with CNN • Possible improvements • Pseudo labeling • Hard example mining
  30. My approach to image data competition

  31. Machine resource • GPU: 1080ti*3 • RAM: 64GB • Kaggle Kernel (P100*1, 4 instances)
  32. pipeline

  33. First day • Read ‘Rules’, ‘Overview’, ‘Evaluation’, ‘Data’, ‘Timeline’ • EDA (70%) • mean, std, min, max per channel in RGB, HSV (train+test) • mean, std, min, max of height, width (train+test) • Extract train targets • Check the images visually • Create baseline (30%) • Model: small model (resnet18, 34) -> huge model (resnet101, 152) • Optimizer: adam (lr: 3e-3~1e-4) • Image_size: (mean/2, mean/2), (mean, mean), (mean*2, mean*2) • Augmentation: no augmentation
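The size statistics and the three candidate input sizes can be sketched as follows. The slide does not say which "mean" the image size is derived from; averaging the mean height and mean width into one side length is my assumption, and the sample sizes are made up.

```python
import statistics


def size_summary(values):
    """mean, std, min, max of a list of heights or widths."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def candidate_image_sizes(heights, widths):
    """Return (mean/2, mean/2), (mean, mean), (mean*2, mean*2) as integers.

    Assumption: "mean" is the average of the mean height and the mean width.
    """
    mean_side = (statistics.mean(heights) + statistics.mean(widths)) / 2
    return [(int(round(mean_side * s)),) * 2 for s in (0.5, 1.0, 2.0)]


heights = [300, 340]  # hypothetical image heights
widths = [300, 340]   # hypothetical image widths
print(size_summary(heights))
print(candidate_image_sizes(heights, widths))
```

Running the same summary on train and test separately is a quick way to spot a distribution shift in image sizes before any training.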
  34. After training baseline model • Check your model prediction, gap between CV and LB • Visualize attention • It is helpful for modeling
  35. 2nd day ~ 2month • Journal survey (30%) • Famous papers • Papers related to task domain • EDA (30%) • Extract data considering training result • Coding (30%) • Create/fix your pipeline • Implement methods from papers • Fix baseline model (augmentation, modeling, etc...) • Others (10%) • Survey related competitions • Read ‘kernel’, ‘discussion’
  36. 2nd day ~ 2month • Use time efficiently

  37. 2month ~ last day • Journal survey (30% -> 10%) • Famous papers • Papers related to task domain • EDA (30%) • Extract data considering training result • Coding (30% -> 50%) • Create/fix your pipeline • Implement methods from papers • Fix model (augmentation, modeling, etc...) • Prepare model ensemble
  38. Thank you! Questions?