Slide 1

Slide 1 text

iMet 7th place solution & My approach to image data competition Kaggle Tokyo Meetup #6 @phalanx 13 July, 2019

Slide 2

Slide 2 text

Self Introduction • Name: phalanx • Data Scientist at DeNA (17 July~) • Machine Learning: 1.5 years • Kaggle: 1 year • Kaggle Master • TGS 1st place • iMet 7th place • Petfinder 17th place • HPA 36th place @ZFPhalanx

Slide 3

Slide 3 text

Agenda • iMet 7th place solution • Overview • Our solution • 1st place solution • Summary • My approach to image data competition • Machine resources • Pipeline • Approach • Day 1 • Day 2 ~ 2 months • 2 months ~ last day

Slide 4

Slide 4 text

iMet 7th place solution

Slide 5

Slide 5 text

Competition Overview • Competition Name: iMet Collection 2019 – FGVC6 • Data: artwork from the Metropolitan Museum • Task: multi-label image classification • Metric: mean F2 score (calculated per sample) • Two-stage competition • Kernels-only competition
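The per-sample mean F2 metric can be sketched as follows; `mean_f2_score` and the binary indicator-matrix layout are illustrative choices, not from the competition kit:

```python
import numpy as np

def mean_f2_score(y_true, y_pred):
    """F2 score per sample, averaged over samples.
    beta=2 weights recall higher than precision."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    beta2 = 4.0  # beta^2 with beta = 2
    tp = (y_true & y_pred).sum(axis=1)
    precision = tp / np.maximum(y_pred.sum(axis=1), 1)
    recall = tp / np.maximum(y_true.sum(axis=1), 1)
    f2 = (1 + beta2) * precision * recall / np.maximum(beta2 * precision + recall, 1e-12)
    return f2.mean()
```

Because the score is averaged per sample, a handful of very rare classes has little influence on the overall metric, which matters for the class-imbalance discussion later.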

Slide 6

Slide 6 text

Sample images • Culture: american; Tag: generals, men, historical figures • Tag: landscapes • Culture: Italian, padua; Tag: lamps, male nudes, trumpets

Slide 7

Slide 7 text

Result • Stage 1: 10th place • Stage 2: 7th place

Slide 8

Slide 8 text

Data Description • Data: artwork from the Metropolitan Museum • Data size • Train data: 109,237 • 1st stage test data: 7,500 • 2nd stage test data: 38,801 • Target: culture, tag • Culture: 398 classes • Tag: 705 classes • Annotation • One annotator • Adviser from the Met • (Example: Culture: american; Tag: generals, men, historical figures)

Slide 9

Slide 9 text

Challenges • Label noise • Class imbalance • Long image

Slide 10

Slide 10 text

Label noise • Many images lack annotations • Over 100,000 images, 1103 labels • Only one annotator, no verification step • Ground truth: Tag: landscapes • Probable truth: Tag: landscapes, trees, bodies of water, houses

Slide 11

Slide 11 text

Class imbalance Chenyang Zhang, et al., The iMet Collection 2019 Challenge Dataset, arXiv preprint arXiv:1906.00901, 2019

Slide 12

Slide 12 text

Long image • 7531x300, resized to 320x320 • Original: Culture: Italian; Tag: utilitarian objects • After resize: Culture: Italian? Tag: utilitarian objects?

Slide 13

Slide 13 text

Solution

Slide 14

Slide 14 text

Solution 1st stage: train models

Slide 15

Slide 15 text

1st stage: train models • Train multiple models • SE-ResNeXt101, ResNet34, PNASNet-5, etc. • Different models have different capabilities on different labels • Ensembling is effective • Different image sizes • 224x224, 320x320, 336x336, 350x350, 416x416 • Optimizer: Adam • Augmentation • RandomResizedCrop, horizontal flip, Random Erasing
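Of the augmentations above, Random Erasing is the least standard; a minimal from-scratch sketch in NumPy (the function name and the `area_frac` range are illustrative, not the exact settings used):

```python
import numpy as np

def random_erasing(img, p=0.5, area_frac=(0.02, 0.2), rng=None):
    """Random Erasing: with probability p, overwrite a random
    rectangle of the image with uniform noise.
    img: HxWxC float array in [0, 1]; returns a modified copy."""
    if rng is None:
        rng = np.random.default_rng()
    img = img.copy()
    if rng.random() > p:
        return img
    h, w, c = img.shape
    frac = rng.uniform(*area_frac)          # fraction of the image area to erase
    eh = max(1, int(h * np.sqrt(frac)))
    ew = max(1, int(w * np.sqrt(frac)))
    y = rng.integers(0, h - eh + 1)
    x = rng.integers(0, w - ew + 1)
    img[y:y + eh, x:x + ew] = rng.random((eh, ew, c))
    return img
```

In practice a library implementation (e.g. `torchvision.transforms.RandomErasing`) would be used; the sketch only shows the mechanism.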

Slide 16

Slide 16 text

1st stage: train models • Separate labels into culture and tag • Features differ between culture and tag • Applied this method to SE-ResNeXt101/50 (considering inference time) • Improved LB by +0.01
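The culture/tag split can be implemented as two heads on a shared backbone. A minimal PyTorch sketch, where the tiny conv stack is a stand-in for the real SE-ResNeXt backbone and all names are illustrative:

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Shared backbone with separate culture/tag heads:
    398 culture classes and 705 tag classes, as in iMet."""
    def __init__(self, n_culture=398, n_tag=705):
        super().__init__()
        # Toy backbone; in the solution this would be SE-ResNeXt101/50.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.culture_head = nn.Linear(16, n_culture)
        self.tag_head = nn.Linear(16, n_tag)

    def forward(self, x):
        feat = self.backbone(x)
        return self.culture_head(feat), self.tag_head(feat)
```

Each head sees the same features but learns its own decision boundaries, which is the point of the split.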

Slide 17

Slide 17 text

1st stage: train models • Visualize attention of the culture/tag models (Grad-CAM) • Input: Culture: american; Tag: generals, men, historical figures • Culture model attends to 'american' features; tag model attends to 'men'

Slide 18

Slide 18 text

1st stage: train models • Visualize attention of the culture/tag models (Grad-CAM) • Input: Culture: british; Tag: bodies of water, ships • Culture model attends to 'british' features; tag model attends to 'ships'

Slide 19

Slide 19 text

1st stage: train models • Visualize attention of the culture/tag models (Grad-CAM) • Input: Culture: japan; Tag: inscriptions, men • Culture model attends to 'japan' features; tag model attends to 'men'

Slide 20

Slide 20 text

Knowledge Distillation (didn't work) • Purpose: decrease inference time, improve performance • Attention transfer (activation-based method) • Teacher: SE-ResNeXt101 • Student: ResNet34 • Sergey Zagoruyko, Nikos Komodakis, Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, in ICLR, 2017
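Activation-based attention transfer matches spatial attention maps between teacher and student feature maps. A minimal sketch of the loss (function names are illustrative; the paper sums squared channels, which after normalization is equivalent to the mean used here):

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Activation-based attention: mean of squared channel
    activations, flattened and L2-normalized per sample."""
    a = feat.pow(2).mean(dim=1)              # B x H x W
    return F.normalize(a.flatten(1), dim=1)  # B x (H*W), unit norm

def attention_transfer_loss(student_feat, teacher_feat):
    """Squared L2 distance between student and teacher attention
    maps; spatial sizes must match (interpolate beforehand if not)."""
    diff = attention_map(student_feat) - attention_map(teacher_feat)
    return diff.pow(2).sum(dim=1).mean()
```

This term would be added (with a weight) to the student's classification loss at one or more intermediate layers.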

Slide 21

Slide 21 text

Class imbalance • Loss function • Focal loss + Lovász hinge • A bit better than binary cross-entropy • Drop labels with frequency less than 20 (didn't work) • Metric is calculated per sample • These classes don't affect the score • No improvement; the score stayed the same
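Of the two loss terms, focal loss is the simpler to sketch (the Lovász hinge part is omitted here). It down-weights easy examples by a factor of (1 - p_t)^gamma, so the many easy negatives of rare classes dominate less; `focal_loss` and its defaults are illustrative:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss for multi-label classification.
    gamma=0 reduces exactly to binary cross-entropy."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true label
    return ((1 - p_t) ** gamma * bce).mean()
```

With gamma > 0 confidently-classified labels contribute almost nothing, focusing gradient on hard or rare labels.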

Slide 22

Slide 22 text

Long image • I didn't deal with it • CV is almost the same as on normal images • CV: not long images 0.6462; only long images 0.6456; all images 0.6460

Slide 23

Slide 23 text

2nd stage: stacking with CNN

Slide 24

Slide 24 text

2nd stage: Stacking with CNN • CNN • Kernel size: (h, w) = (3, 1) • Learns correlation between models • Dense • Learns correlation between classes

Slide 25

Slide 25 text

Stacking with CNN • Separate labels into culture/tag • Learn correlation between classes within culture/tag • Learn a CNN with all labels • Learn correlation between culture and tag

Slide 26

Slide 26 text

Stacking with CNN • Stacking results (before team merge) • Weighted average: Public LB 0.649 • Stacking, separate culture/tag (①): 0.654 • Stacking, all labels (②): 0.647 • Average of ① + ②: 0.658 • After team merge • Added 2 predictions: Public LB 0.664, Private LB 0.659 (7th place)

Slide 27

Slide 27 text

1st place solution • Konstantin Gavrilchik, Kaggle Master (Airbus Ship: 1st, iMaterialist: 3rd) • Model • SENet154, PNASNet-5, SE-ResNeXt101 • 1st stage: training the zoo • Focal loss, batch size 1000-1500 (gradient accumulation 10-20 times) • 2nd stage: filtering predictions • Drop images from train with very high error • Hard negative mining (5%) • 3rd stage: pseudo-labeling • 4th stage: culture and tags separately • Train model with pretrained weights for only the tag classes (705) • Switch focal loss -> BCE
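The "batch size 1000-1500 via accumulation" trick works by summing gradients over many mini-batches before a single optimizer step. A minimal sketch under assumed names (`train_step_accumulated`, `batches` as an iterable of (x, y) pairs):

```python
import torch

def train_step_accumulated(model, optimizer, loss_fn, batches, accum_steps):
    """Simulate a large batch: accumulate gradients over accum_steps
    mini-batches, then apply one optimizer update. Dividing each loss
    by accum_steps makes the accumulated gradient an average, exactly
    as if the mini-batches had been one big batch."""
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()  # gradients add up across calls
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

This trades wall-clock time for the optimization behavior of a batch far larger than GPU memory allows.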

Slide 28

Slide 28 text

1st place solution • 5th stage: second-level model • Create a binary classification dataset (len(data)*1103) • Predict whether each class relates to each image • Train LightGBM with the following features • Probabilities of each model; sum/division/multiplication of each pair/triple • Mean/median/std/max/min of each model • Brightness/colorfulness of each image • Max side size, binary flag (height greater than width or not) • ImageNet predictions • Postprocessing: different thresholds for culture and tag

Slide 29

Slide 29 text

Summary • iMet Collection 2019 – FGVC6 • Multi-label image classification • Our method • Train multiple models • Stacking with CNN • Possible improvements • Pseudo-labeling • Hard example mining

Slide 30

Slide 30 text

My approach to image data competition

Slide 31

Slide 31 text

Machine resources • GPU: 1080 Ti x3 • RAM: 64 GB • Kaggle Kernels (P100 x1, 4 instances)

Slide 32

Slide 32 text

Pipeline

Slide 33

Slide 33 text

First day • Read 'Rules', 'Overview', 'Evaluation', 'Data', 'Timeline' • EDA (70%) • Mean, std, min, max per channel in RGB, HSV (train+test) • Mean, std, min, max of height, width (train+test) • Extract train targets • Check the images visually • Create a baseline (30%) • Model: small model (ResNet18, 34) -> huge model (ResNet101, 152) • Optimizer: Adam (lr: 3e-3~1e-4) • Image size: (mean/2, mean/2), (mean, mean), (mean*2, mean*2) • Augmentation: no augmentation
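The per-channel statistics pass above can be sketched in a few lines of NumPy; `channel_stats` is an illustrative name, and in a real pipeline the images would be streamed from disk rather than held in a list:

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean/std/min/max over a set of HxWx3 images
    (sizes may differ): the first-day EDA pass, run on train+test.
    The mean/std feed normalization; min/max catch range surprises."""
    pixels = np.concatenate([img.reshape(-1, 3) for img in images], axis=0)
    return {
        "mean": pixels.mean(axis=0),
        "std": pixels.std(axis=0),
        "min": pixels.min(axis=0),
        "max": pixels.max(axis=0),
    }
```

The same shape of summary (mean, std, min, max of heights and widths) guides the image-size choices listed above.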

Slide 34

Slide 34 text

After training the baseline model • Check your model's predictions and the gap between CV and LB • Visualize attention • It is helpful for modeling

Slide 35

Slide 35 text

2nd day ~ 2 months • Paper survey (30%) • Famous papers • Papers related to the task domain • EDA (30%) • Extract data considering training results • Coding (30%) • Create/fix your pipeline • Implement papers • Fix the baseline model (augmentation, modeling, etc.) • Others (10%) • Survey related competitions • Read 'Kernels', 'Discussion'

Slide 36

Slide 36 text

2nd day ~ 2 months • Use time efficiently

Slide 37

Slide 37 text

2 months ~ last day • Paper survey (30% -> 10%) • Famous papers • Papers related to the task domain • EDA (30%) • Extract data considering training results • Coding (30% -> 50%) • Create/fix your pipeline • Implement papers • Fix the model (augmentation, modeling, etc.) • Prepare model ensemble

Slide 38

Slide 38 text

Thank you! Questions?