iMet 7th place solution & my approach to image data competition

iMet 7th place solution & My approach to image data competition Kaggle Tokyo Meetup #6 @phalanx 13 July, 2019

Self Introduction • Name: phalanx • Data Scientist at DeNA (17 July~) • Machine Learning: 1.5 years • Kaggle: 1 year • Kaggle Master • TGS 1st place • iMet 7th place • Petfinder 17th place • HPA 36th place • @ZFPhalanx

Agenda • iMet 7th place solution • Overview • Our
solution • 1st place solution • Summary • My approach to image data competition • Machine resource • pipeline • Approach • Day 1 • Day 2 ~ 2 month • 2 month ~ last day

iMet 7th place solution

Competition Overview • Competition Name: iMet Collection 2019 – FGVC6 • Data: artwork from Metropolitan Museum • Task: Multi Label Image Classification • Metric: mean F2 score (calculated per sample, then averaged) • 2 stage competition • Kernels-only competition
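The per-sample metric can be sketched as follows. This is a minimal numpy illustration of a sample-wise F2 score, not the official competition scorer; the function name `mean_f2` and the toy arrays are assumptions made for the example.

```python
import numpy as np

def mean_f2(y_true, y_pred, beta=2.0, eps=1e-12):
    """F-beta computed for each sample (row), then averaged over samples.

    y_true, y_pred: binary arrays of shape (n_samples, n_labels).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=1)    # correctly predicted labels
    fp = (~y_true & y_pred).sum(axis=1)   # predicted but not annotated
    fn = (y_true & ~y_pred).sum(axis=1)   # annotated but not predicted
    b2 = beta ** 2
    # With beta = 2, missed labels (fn) cost four times as much as extra ones (fp).
    f = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps)
    return float(f.mean())

# Toy example: 2 samples, 4 labels.
y_true = np.array([[1, 1, 0, 0], [0, 1, 0, 0]])
y_pred = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])
score = mean_f2(y_true, y_pred)  # sample scores are 0.5 and 1.0
```

Because recall is weighted more heavily than precision, predicting a few extra labels per image tends to be cheaper than missing one.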

Sample images • Culture: american; Tag: generals, men, historical figures • Tag: landscapes • Culture: Italian, padua; Tag: lamps, male nudes, Trumpets

Result • Stage 1: 10th place • Stage 2: 7th
place

Data Description • Data: artwork from Metropolitan Museum • Data size • Train data: 109,237 • 1st stage test data: 7,500 • 2nd stage test data: 38,801 • Target: culture, tag • Culture: 398 classes • Tag: 705 classes • Annotation • One annotator • Adviser from the Met • Example image: Culture: american; Tag: generals, men, historical figures

Challenges • Label noise • Class imbalance • Long image

Label noise • Many images lack annotations • Over 100,000 images, 1103 labels • Only one annotator, no verification step • Example image: ground truth Tag: landscapes; probable truth Tag: landscapes, Trees, bodies of water, houses

Class imbalance Chenyang Zhang, et al., The iMet Collection 2019
Challenge Dataset, arXiv preprint arXiv:1906.00901, 2019

Long image • Original 7531x300: Culture: Italian; Tag: utilitarian objects • After resize to 320x320: Culture: Italian? Tag: utilitarian objects?

Solution

Solution 1st stage: train models

1st stage: train models • Train multiple models • SE-ResNeXt101, resnet34, PNasNet-5, etc... • Different models have different capabilities on different labels • Ensemble is effective • Different image sizes • 224x224, 320x320, 336x336, 350x350, 416x416 • Optimizer: Adam • Augmentation • RandomResizedCrop, Horizontal Flip, Random Erasing
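Of the listed augmentations, Random Erasing is the least self-explanatory, so here is a rough numpy sketch of the idea: overwrite one random rectangle of the image with noise. The function name, the area and aspect-ratio ranges, and the retry count are illustrative assumptions, not the settings used in this solution.

```python
import numpy as np

def random_erase(img, rng, area_frac=(0.02, 0.4), aspect=(0.3, 3.3), max_tries=10):
    """Return a copy of an (H, W, C) uint8 image with one random rectangle replaced by noise."""
    h, w = img.shape[:2]
    out = img.copy()
    for _ in range(max_tries):
        area = rng.uniform(*area_frac) * h * w   # target erased area in pixels
        ratio = rng.uniform(*aspect)             # height / width of the rectangle
        eh = int(round(np.sqrt(area * ratio)))
        ew = int(round(np.sqrt(area / ratio)))
        if 0 < eh < h and 0 < ew < w:            # retry if the box does not fit
            top = int(rng.integers(0, h - eh + 1))
            left = int(rng.integers(0, w - ew + 1))
            noise = rng.integers(0, 256, size=(eh, ew) + img.shape[2:], dtype=np.uint8)
            out[top:top + eh, left:left + ew] = noise
            return out
    return out  # no fitting box was found; image is returned unchanged

rng = np.random.default_rng(0)
image = np.zeros((32, 32, 3), dtype=np.uint8)    # stand-in for a training image
erased = random_erase(image, rng)
```

In a real pipeline the same effect would usually come from the augmentation library already in use; the sketch only shows what the transform does to the pixels.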

1st stage: train models • Separate labels into culture and tag • Features differ between culture and tag • Apply this method to SE-ResNeXt101/50 (considering inference time) • Improves LB by +0.01
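The label separation can be sketched on the target matrix as below. The class counts (398 culture, 705 tag) come from the data description; the assumption that culture columns come first in the 1103-wide target matrix, and the helper names, are hypothetical.

```python
import numpy as np

N_CULTURE, N_TAG = 398, 705  # class counts from the data description

def split_labels(y):
    """Split an (n_samples, 1103) target matrix into culture and tag parts.

    Assumes culture columns come first; adjust the slicing if the label
    ordering in your label file differs.
    """
    return y[:, :N_CULTURE], y[:, N_CULTURE:]

def merge_predictions(p_culture, p_tag):
    """Concatenate the outputs of the culture model and the tag model."""
    return np.concatenate([p_culture, p_tag], axis=1)

y = np.zeros((4, N_CULTURE + N_TAG), dtype=np.float32)
y[0, 10] = 1.0    # a culture label
y[0, 500] = 1.0   # a tag label (column 500 - 398 = 102 in the tag matrix)
y_culture, y_tag = split_labels(y)
merged = merge_predictions(y_culture, y_tag)
```

Each model is then trained on its own slice, and the two prediction matrices are concatenated back before thresholding.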

1st stage: train models • Visualize attention in culture/tag model (Grad-CAM) • Input image: Culture: american; Tag: generals, men, historical figures • culture model feature: american • tag model feature: men

Grad-CAM example • Input image: Culture: british; Tag: bodies of water, ships • culture model feature: british • tag model feature: ships

Grad-CAM example • Input image: Culture: japan; Tag: inscriptions, men • culture model feature: japan • tag model feature: men

Knowledge Distillation (didn’t work) • Purpose: decrease inference time, improve performance • Attention transfer (activation-based method) • Teacher: SE-ResNeXt101 • Student: resnet34 • Reference: Sergey Zagoruyko, Nikos Komodakis, Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, In ICLR, 2017

Class imbalance • Loss function • Focal Loss + lovasz-hinge • A bit better than binary cross entropy • Drop labels with frequency less than 20 (didn’t work) • Metric is calculated per sample • These classes don’t affect the score • No improvement, score is the same
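For reference, the focal loss part can be sketched in numpy as below. This is a generic binary focal loss on logits; `gamma=2.0` is an assumed default rather than the value used in this solution, and the lovasz-hinge term is omitted.

```python
import numpy as np

def binary_focal_loss(logits, targets, gamma=2.0):
    """Mean binary focal loss over all (sample, class) entries.

    logits, targets: arrays of the same shape; targets are 0/1.
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    # Numerically stable binary cross entropy with logits.
    bce = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    p_t = np.exp(-bce)  # probability the model assigns to the true class
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified entries.
    return float(((1.0 - p_t) ** gamma * bce).mean())

logits = np.array([[4.0, -3.0, 0.0], [-2.0, 1.0, -5.0]])
targets = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
focal = binary_focal_loss(logits, targets)
plain_bce = binary_focal_loss(logits, targets, gamma=0.0)  # gamma = 0 reduces to BCE
```

With over a thousand classes and only a few positives per image, most entries are easy negatives, which is the situation the down-weighting factor targets.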

Long image • I didn’t deal with it • CV is almost the same as for normal images
data                CV
Not long images     0.6462
Only long images    0.6456
All images          0.6460

2nd stage: stacking with CNN

2nd stage: Stacking with CNN • CNN • Kernel_size: (h, w) = (3, 1) • Learn correlation between models • Dense • Learn correlation between classes
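One way to read the (3, 1) kernel is sketched below in plain numpy: the first-stage predictions for an image are stacked into a (n_models, n_classes) matrix, the kernel slides along the model axis only, and a dense layer then mixes classes. The layer sizes, the untrained weights, and the function names are assumptions for illustration; the actual stacking network is not specified beyond the slide.

```python
import numpy as np

def conv_over_models(preds, kernel):
    """Slide a (k, 1) kernel down the model axis of a (n_models, n_classes) matrix.

    Width 1 means every class column is filtered independently, so this
    layer only mixes information between models, not between classes.
    """
    k = len(kernel)
    n_models, n_classes = preds.shape
    out = np.empty((n_models - k + 1, n_classes))
    for i in range(n_models - k + 1):
        out[i] = (preds[i:i + k] * kernel[:, None]).sum(axis=0)
    return out

def dense_over_classes(features, weight, bias):
    """Fully connected layer on the flattened features; this step mixes classes."""
    logits = features.reshape(-1) @ weight + bias
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
n_models, n_classes = 5, 4                     # toy sizes (hypothetical)
preds = rng.random((n_models, n_classes))      # stand-in first-stage probabilities
kernel = np.full(3, 1.0 / 3.0)                 # untrained (3, 1) kernel
features = conv_over_models(preds, kernel)     # shape (3, 4)
weight = rng.normal(size=(features.size, n_classes)) * 0.1
bias = np.zeros(n_classes)
stacked = dense_over_classes(features, weight, bias)  # shape (4,)
```

In practice both layers would be trained on out-of-fold predictions in a deep learning framework; the sketch only shows which axis each layer operates on.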

Stacking with CNN • Separate labels into culture/tag • Learn correlation between classes within culture/tag • Train CNN with all labels • Learn correlation between culture and tag

Stacking with CNN • Stacking result (before team merge)
method                              Public LB
Weighted average                    0.649
Stacking: separate culture/tag ①    0.654
Stacking: all label ②               0.647
Average ①+②                         0.658
• After team merge • Add 2 predictions: Public LB 0.664, Private LB 0.659 (7th place)

1st place solution • Konstantin Gavrilchik • Kaggle Master • Airbus Ship: 1st • IMaterialist: 3rd • Model • SENet154, PNasNet-5, SEResNeXt101 • 1st stage: training the zoo • Focal loss, batch size: 1000-1500 (accumulation 10-20 times) • 2nd stage: filtering prediction • Drop images from train with very high error • Hard negative mining (5%) • 3rd stage: pseudo labeling • 4th stage: culture and tags separately • Train model with pretrained weights for only tag classes (705) • Switch focal loss -> bce

1st place solution • Konstantin Gavrilchik • Kaggle Master • Airbus Ship: 1st • IMaterialist: 3rd • Model • SENet154, PNasNet-5, SEResNeXt101 • 5th stage: second level model • Create binary classification dataset (len(data)*1103) • Predict whether each class relates to each image • Train LightGBM with the features below • Probabilities of each model, sum/division/multiplication of each pair/triple • mean/median/std/max/min of each model • Brightness/colorness of each image • Max side size, binary flag (height more than width or not) • ImageNet predictions • Postprocessing: different threshold for culture and tag
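The "len(data)*1103" dataset can be pictured as one row per (image, class) pair. The numpy sketch below builds such a table from first-level probabilities; the function name, the feature subset, and reading the summary statistics as across-model statistics per pair are assumptions, and the LightGBM training step itself is left out.

```python
import numpy as np

def build_second_level(probs, y=None):
    """Turn first-level predictions into a binary classification table.

    probs: (n_models, n_images, n_classes) predicted probabilities.
    y:     optional (n_images, n_classes) 0/1 targets.
    Returns X of shape (n_images * n_classes, n_models + 5) and a flat target.
    """
    n_models = probs.shape[0]
    # Row index = image * n_classes + class; one column per first-level model.
    per_model = probs.reshape(n_models, -1).T
    stats = np.stack(
        [
            per_model.mean(axis=1),
            np.median(per_model, axis=1),
            per_model.std(axis=1),
            per_model.max(axis=1),
            per_model.min(axis=1),
        ],
        axis=1,
    )
    X = np.hstack([per_model, stats])
    target = None if y is None else np.asarray(y).reshape(-1)
    return X, target

rng = np.random.default_rng(0)
probs = rng.random((3, 2, 4))                    # 3 models, 2 images, 4 classes (toy sizes)
y = (rng.random((2, 4)) > 0.5).astype(int)
X, target = build_second_level(probs, y)         # X: (8, 8), target: (8,)
```

Image-level features such as brightness or max side size would be repeated on every row belonging to the same image before fitting the gradient boosting model.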

Summary • iMet Collection 2019 – FGVC6 • Multi-label image classification • Our method • Train multiple models • Stacking with CNN • Possible improvements • Pseudo labeling • Hard example mining

My approach to image data competition

Machine resource • GPU: 1080ti*3 • RAM: 64GB • Kaggle Kernel (P100*1, 4 instances)

pipeline

First day • Read ‘Rules’, ‘Overview’, ‘Evaluation’, ‘Data’, ‘Timeline’ • EDA (70%) • mean, std, min, max per channel in RGB, HSV (train+test) • mean, std, min, max of height, width (train+test) • Extract train targets • Check the images visually • Create baseline (30%) • Model: small model (resnet18, 34) -> huge model (resnet101, 152) • Optimizer: adam (lr: 3e-3~1e-4) • Image_size: (mean/2, mean/2), (mean, mean), (mean*2, mean*2) • Augmentation: no augmentation
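The first-day statistics can be sketched as below for images already loaded as numpy arrays. Only RGB is covered here (HSV would need a colour conversion first); the helper names and the synthetic images are assumptions for the example.

```python
import numpy as np

def channel_stats(images):
    """mean/std/min/max per colour channel over all pixels of all images.

    images: list of (H, W, 3) uint8 arrays; sizes may differ between images.
    """
    pixels = np.concatenate([img.reshape(-1, 3) for img in images]).astype(np.float64)
    return {
        "mean": pixels.mean(axis=0),
        "std": pixels.std(axis=0),
        "min": pixels.min(axis=0),
        "max": pixels.max(axis=0),
    }

def size_stats(images):
    """mean/std/min/max of (height, width) over the image set."""
    sizes = np.array([img.shape[:2] for img in images], dtype=np.float64)
    return {
        "mean": sizes.mean(axis=0),
        "std": sizes.std(axis=0),
        "min": sizes.min(axis=0),
        "max": sizes.max(axis=0),
    }

# Two synthetic images stand in for train+test data.
images = [
    np.zeros((2, 3, 3), dtype=np.uint8),
    np.full((4, 1, 3), 255, dtype=np.uint8),
]
colour = channel_stats(images)   # per-channel statistics
shape = size_stats(images)       # statistics of (height, width)
```

The mean height and width from `size_stats` are what the baseline image sizes (mean/2, mean, mean*2) would be derived from.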

After training baseline model • Check your model prediction, gap
between CV and LB • Visualize attention • It is helpful for modeling

2nd day ~ 2month • Journal survey(30%) • Famous paper
• Paper related to task domain • EDA(30%) • Extract data considering training result • Coding(30%) • Create/fix your pipeline • Implement journal • Fix baseline model(augmentation, modeling, etc...) • Others(10%) • Survey related competition • Read ‘kernel’, ‘discussion’

2nd day ~ 2month • Use time efficiently

2month ~ last day • Journal survey (30% -> 10%) • Famous paper • Paper related to task domain • EDA (30%) • Extract data considering training result • Coding (30% -> 50%) • Create/fix your pipeline • Implement journal • Fix model (augmentation, modeling, etc...) • Prepare model ensemble

Thank you! Questions?
