Aptos2019 48th Solution

@icebee__ @mocobt 48th Solution: a competition retrospective, our home-made pipeline, and other odds and ends. APTOS 2019 Blindness Detection

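Self-introduction
@icebee__
• Researcher in the finance sector
• Contribution this time: making experiments more efficient by building the pipeline
@mocobt
• R&D in CG
• Contribution this time: findings from the literature survey and EDA
We take part in Kaggle as a fixed team!! Reference: https://mocobt.hatenablog.com/entry/2019/07/14/013922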

APTOS 2019 in one page
Input: fundus image
Output: subjective grade from 0 (healthy) to 4 (dangerous)
A competition on classifying the severity of diabetic retinopathy.
Reference: https://www.kaggle.com/c/aptos2019-blindness-detection https://en.wikipedia.org/wiki/Ophthalmoscopy

APTOS results
...Why is the team called "Mikan" (mandarin orange)?
Private: quite a shake-up (Top 2%)
Public: finally got there on the day of the deadline (Top 8%)

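Where "Mikan" comes from
Note: the picture is for illustration only. From here on we use pictures of mandarin oranges out of consideration for privacy.
The fundus images started to look like mandarin oranges. They look tasty... I want to eat one... Are we losing it? (We quietly set it as the team name.)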

Agenda
• Our Solution
• Streamlining experiments

First half: Our Solution

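Problems & Approaches
• Solve it as classification, or as regression followed by thresholding? → Regress, then optimize the thresholds
• Unclear how to ensemble → Average the regression outputs, and average the thresholds too
• Absurdly imbalanced labels → Pad the training set with data from the same task
• Image size and brightness all over the place → Handle it with our Original Crop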

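Regress, then optimize the thresholds (problem: classification, or regression followed by thresholding?)
In the end, regression + thresholding was the stronger option.
Threshold optimization method: https://www.kaggle.com/abhishek/optimizer-for-quadratic-weighted-kappa
• Classification: one probability per class. Example: probabilities 0.1, 0.4, 0.3, 0.15, 0.05 for Classes 0 to 4 give Class 1.
• Regression + thresholding: one value on the 0.0 to 4.0 scale, cut at thresholds. Example: with thresholds 0.8, 1.2, 2.6, 3.2, an output of 1.4 gives Class 2.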
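A minimal sketch of the thresholding step and the competition metric, to make the example above concrete. The function names are hypothetical and this is not the linked kernel's optimizer; it only shows how a regression output is mapped to a grade and how quadratic weighted kappa (QWK) would score the result.

```python
import numpy as np


def to_class(values, thresholds):
    """Map continuous regression outputs to integer grades 0..len(thresholds)."""
    return np.digitize(values, np.sort(thresholds))


def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Quadratic weighted kappa between two integer label vectors."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Confusion matrix of observed (true, predicted) pairs.
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    # Penalty grows with the squared distance between grades.
    grid = np.arange(n_classes)
    weights = (grid[:, None] - grid[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected confusion matrix if predictions were independent of the truth.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Thresholds and output taken from the slide's example.
slide_thresholds = [0.8, 1.2, 2.6, 3.2]
slide_grade = to_class([1.4], slide_thresholds)
```

An optimizer would then search over the four thresholds to maximize `quadratic_weighted_kappa` on validation data.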

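Average the regression outputs, and average the thresholds too (problem: unclear how to ensemble)
Run inference with each model, then classify using the mean prediction and the mean thresholds over all models.

Predictions: Model 1 = 1.54, Model 2 = 2.16, ..., Model n = 1.87; mean prediction = 1.93

                 Threshold 1  Threshold 2  Threshold 3  Threshold 4
Model 1          0.90         1.62         2.64         3.31
Model 2          0.82         1.53         2.36         3.39
...
Model n          0.76         1.77         2.75         3.46
Mean thresholds  0.85         1.89         2.55         3.33

Result: Class 2
The type of ensemble that fixes a class per model first and then averages did not improve the score.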
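The averaging scheme described above can be sketched as follows. This is an illustration under assumptions, not the team's code: `ensemble_classify` is a hypothetical name, and only the three rows printed on the slide are used, so the means differ slightly from the slide's values, which also include the elided models.

```python
import numpy as np


def ensemble_classify(model_preds, model_thresholds):
    """Average predictions and thresholds over models, then threshold once.

    model_preds:      array of shape (n_models, n_samples)
    model_thresholds: array of shape (n_models, n_thresholds)
    """
    mean_pred = np.mean(model_preds, axis=0)
    mean_thresholds = np.mean(model_thresholds, axis=0)
    return np.digitize(mean_pred, np.sort(mean_thresholds))


# The three models shown on the slide, one sample each.
preds = np.array([[1.54], [2.16], [1.87]])
thresholds = np.array([
    [0.90, 1.62, 2.64, 3.31],
    [0.82, 1.53, 2.36, 3.39],
    [0.76, 1.77, 2.75, 3.46],
])
grade = ensemble_classify(preds, thresholds)
```

Thresholding once after averaging keeps the continuous information from every model, which a per-model vote would discard.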

Traditional EDA of APTOS Data (problem: absurdly imbalanced labels)
• Class 0 makes up more than half of the data
• Class 4 is overwhelmingly rare

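Pad the training set with data from the same task (problem: absurdly imbalanced labels)
We used part of the data from a similar competition held in 2015 (hereafter DRD 2015).
Reference: https://www.kaggle.com/c/diabetic-retinopathy-detection
Resized version: https://www.kaggle.com/benjaminwarner/resized-2015-2019-blindness-detection-images
How we used it
• Pattern 1: use all DRD 2015 images except Class 0
• Pattern 2: add as many DRD 2015 images as possible while preserving the per-class ratio of APTOS
Other handling of the training data
• Removed all duplicate images from APTOS. Reference: https://www.kaggle.com/maxwell110/duplicated-list-csv-ﬁle (thank you, Maxwell)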
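One way to read Pattern 2 is: find the largest multiple of the APTOS class distribution that the external pool can supply. The sketch below follows that reading, which is an assumption; the function name and all counts are made up for illustration and are not the real APTOS or DRD 2015 statistics.

```python
def ratio_preserving_counts(target_counts, pool_counts):
    """Largest per-class sample counts from the pool that keep the target ratio.

    target_counts: images per class in the target set (defines the ratio)
    pool_counts:   images per class available in the external set
    Assumes every class has at least one image in target_counts.
    """
    # The scarcest class (relative to the target ratio) limits the scale.
    scale = min(pool_counts[c] / target_counts[c] for c in target_counts)
    return {c: int(scale * target_counts[c]) for c in target_counts}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
target = {0: 50, 1: 10, 2: 30, 3: 5, 4: 5}
pool = {0: 1000, 1: 100, 2: 150, 3: 40, 4: 30}
extra = ratio_preserving_counts(target, pool)
```

Here Class 2 is the limiting class (150 / 30 = 5), so every class is sampled at five times its target count.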

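Traditional EDA of APTOS Data (problem: image size and brightness all over the place)
• Class 0 makes up more than half of the data
• Class 4 is overwhelmingly rare
• Image sizes are all over the place...
• Is there a leak...?
• Far too many 480x640 images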

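Handle it with our Original Crop (problem: image size and brightness all over the place)
Problems with the APTOS fundus images
• The shape is not stable (circular, or oddly zoomed in)
• Image size and brightness are all over the place
Ben's Edge Crop
• Input: fundus image
• Output: an image with consistent size, brightness, and (to some extent) shape
• Assumption: the fundus is roughly circular
• Pros: mostly solves the inconsistency problem
• Cons: a little slow
Steps (Step 1 is the input, Step 5 the output)
• Edge detection (Canny method)
• Take straight lines several times: from the segment joining two points on the circle, derive a line that passes through the circle's center
• Compute the circle center from the intersections of those lines
• Get the radius from the center and the edge
• Place the image at the center and resize
• Unify the brightness with Ben's Crop (1st Solution @ DRD 2015)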
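The geometric core of the steps above (a chord's perpendicular bisector passes through the circle's center) can be sketched with NumPy alone. This is an illustration under assumptions, not the team's implementation: the function names are hypothetical, the edge points are synthetic rather than taken from a Canny edge map, and the intersection is solved by least squares over all bisectors at once.

```python
import numpy as np


def circle_center_from_chords(chord_starts, chord_ends):
    """Estimate a circle's center from chords whose endpoints lie on the circle.

    Each chord (a, b) gives one linear constraint on the center c:
    (b - a) . c = (b - a) . (a + b) / 2, i.e. c lies on the chord's
    perpendicular bisector. Several chords are solved by least squares.
    """
    a = np.asarray(chord_starts, dtype=float)
    b = np.asarray(chord_ends, dtype=float)
    direction = b - a
    midpoint = (a + b) / 2.0
    rhs = np.sum(direction * midpoint, axis=1)
    center, *_ = np.linalg.lstsq(direction, rhs, rcond=None)
    return center


def mean_radius(center, edge_points):
    """Average distance from the center to the given edge points."""
    pts = np.asarray(edge_points, dtype=float)
    return float(np.mean(np.linalg.norm(pts - center, axis=1)))


# Synthetic edge points on a circle with center (100, 80) and radius 50.
true_center = np.array([100.0, 80.0])
angles_a = np.array([0.0, 1.0, 2.0])
angles_b = np.array([2.5, 4.0, 5.0])
starts = true_center + 50.0 * np.stack([np.cos(angles_a), np.sin(angles_a)], axis=1)
ends = true_center + 50.0 * np.stack([np.cos(angles_b), np.sin(angles_b)], axis=1)

center = circle_center_from_chords(starts, ends)
radius = mean_radius(center, np.vstack([starts, ends]))
```

With real images the chord endpoints would be sampled from the detected edge pixels, and the crop would then be the square of side `2 * radius` around `center`, resized to the target size.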

Our Total Solution: Mean Average Blending (@icebee__ @mocobt)
Private: 48th / 2943, LB: 0.9259 (Top 2%)
Public: 230th / 2943, LB: 0.8078 (Top 8%)
Link (most stable Kernel): https://www.kaggle.com/muhakabartay/bump-to-0-800-e-2ln-more-efficient

Training data
• APTOS (duplicates removed)
• DRD 2015: remove label 0
• DRD 2015: preserve the label ratio of 2019

Preprocessing
• Ben's Edge Crop (Size: 320), normalization with ImageNet stats
• Ben's Edge Crop (Size: 256), normalization with ImageNet stats
• fast.ai's transform (Resize: squish, Padding: reflection), only resizing (Size: 256)

Training (regression)
• EfficientNet-b4 (pretrained), batch size 16, Adam (lr: 0.001, weight_decay: 1e-5), MSELoss, augmentation: RandomRotation and Random HorizontalFlip, threshold optimization
• EfficientNet-b4 (pretrained), batch size 32, Adam (lr: 0.001, weight_decay: 1e-5), MSELoss, augmentation: RandomRotation and Random HorizontalFlip, threshold optimization
• EfficientNet-b5 (pretrained), batch size 64, Adam (lr: 0.001, weight_decay: 1e-5), QWK (early stopping: MSELoss), augmentation: fast.ai defaults and Random VerticalFlip, threshold optimization

Validation
• Stratified 1-fold validation
• Stratified 5-fold validation
• Random split validation

Prediction
• Fold: 5, TTA: 5 (RandomRotate, RandomHorizontalFlip)
• Fold: 1, TTA: 10 (RandomRotate, RandomHorizontalFlip)
• Fold: 1, TTA: None, Weight: 5

Single-model scores
• Private LB: 0.923899, Public LB: 0.806519
• Private LB: 0.917967, Public LB: 0.790956
• Private LB: 0.910…, Public LB: 0.792…

Remaining figures from the diagram: 25, 10, 5

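Things we tried that did not work
• Pre-training on DRD 2015 → training on APTOS: no improvement, perhaps because our validation was poor
• Using the Messidor Database: no improvement, perhaps because the label definitions differ
• Building a model that identifies Class 0 and using it in another model's predictions: on Public it often misjudged Classes 1, 2, and 3 as Class 0, so it did not help much. It does seem to have helped on Private, though.
• Applying a weighted MSE loss (weighted toward the rare classes): almost no improvement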
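For reference, a weighted MSE of the kind mentioned above can be sketched as follows. The slide only says the weights were tilted toward the rare classes; the inverse-frequency weighting used here is an assumption, and the function names are hypothetical.

```python
import numpy as np


def inverse_frequency_weights(labels, n_classes=5):
    """Per-class weights proportional to 1 / class count (assumed scheme)."""
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=n_classes)
    # Classes that never occur get the weight of a single-sample class.
    return counts.sum() / (n_classes * np.maximum(counts, 1))


def weighted_mse(pred, target, class_weights):
    """MSE in which each sample is weighted by the weight of its true class."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    sample_weights = class_weights[target.astype(int)]
    return float(np.sum(sample_weights * (pred - target) ** 2) / np.sum(sample_weights))


# Toy labels: three Class 0 samples and one Class 4 sample.
labels = [0, 0, 0, 4]
weights = inverse_frequency_weights(labels)
# The only error is on the rare Class 4 sample (predicted 3 instead of 4).
loss = weighted_mse([0, 0, 0, 3], labels, weights)
```

In this toy case the plain MSE would be 0.25, while the weighted version doubles it because the single error falls on the rare class.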

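Things we could not try
• Batch size > 32 or image size > 320: not enough memory. That said, batch size 64 ran without trouble on a P100 with fast.ai... The top teams used larger images. We had given up on implementing it as impossible, which we regret...
• Other models such as Inception, SEResNeXt, DenseNet165: EfficientNet kept us fully occupied... (We did try DenseNet, but it did not do well.)
• EDA with visualization methods such as GradCAM: the implementation was not ready in time... We really should have tried this... (Reference: Nakama>ω</'s blog)
• Pseudo labeling: we forgot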

Second half: Streamlining experiments

Streamlining experiments
• Automation with a pipeline
• Experiment management

Pipeline overview (automation with a pipeline)
• Preprocessing: Config + Train → Preprocessed (✓ saved as caches)
• Training: Config + Preprocessed → Log (✓ main log ✓ learning curve ✓ confusion matrix) and Model (✓ PyTorch model ✓ threshold data)
• Prediction: Config + Test + Model → Submit
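The "save as caches" step in the preprocessing stage could look roughly like the sketch below: the cache file is keyed by a hash of the config, so the same settings never trigger a second preprocessing run. All names here are hypothetical and the real repository may organize this differently.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def config_key(config):
    """Stable short hash of a config dict (key order does not matter)."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest()[:12]


def cached(cache_dir, config, compute):
    """Return compute(config), reusing a pickle cache when one exists."""
    path = Path(cache_dir) / f"{config_key(config)}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute(config)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


# Stand-in for a real preprocessing function; records how often it runs.
calls = []


def preprocess(config):
    calls.append(config["size"])
    return {"crop": config["crop"], "size": config["size"]}


with tempfile.TemporaryDirectory() as tmp:
    first = cached(tmp, {"crop": "edge", "size": 320}, preprocess)
    second = cached(tmp, {"size": 320, "crop": "edge"}, preprocess)
```

The second call hits the cache even though the keys are given in a different order, because the hash is computed over the sorted config.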

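Pipeline summary (automation with a pipeline)
• Even with the limits on Kernel usage, training was almost unaffected. Late in the competition we were spinning up GCP instances one after another.
• The "Kernel won't start... what is going on..." problem at submission time was, unsurprisingly, something we could not fix.
• Uploading Datasets from the pipeline via the API cut down the manual work. The Kaggle API documentation is lacking, so this took a fair amount of effort. If there is demand, we will write it up as an article.
• However, going overboard on the pipeline will trip you up (speaking from experience): implementation cost keeps piling up, and the actual competition work tends to get neglected.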
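A rough sketch of the upload step, assuming the official `kaggle` command-line tool and its `dataset-metadata.json` convention. The helper names, the owner/slug values, and the license choice are placeholders, and the command is only built here, not executed; running it would need Kaggle credentials.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(folder, owner, slug, title):
    """Write the dataset-metadata.json file that the Kaggle CLI reads."""
    meta = {
        "title": title,
        "id": f"{owner}/{slug}",
        "licenses": [{"name": "CC0-1.0"}],  # placeholder license
    }
    path = Path(folder) / "dataset-metadata.json"
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return path


def upload_command(folder, message, first_upload=False):
    """Build the CLI call: create a new Dataset or push a new version."""
    if first_upload:
        return ["kaggle", "datasets", "create", "-p", str(folder)]
    return ["kaggle", "datasets", "version", "-p", str(folder), "-m", message]


with tempfile.TemporaryDirectory() as tmp:
    # "someuser" and "exp001-models" are made-up identifiers.
    meta_path = write_metadata(tmp, "someuser", "exp001-models", "exp001 models")
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    cmd = upload_command(tmp, "exp001: add fold 0 weights")
    cmd_tail = cmd[-2:]
```

A pipeline would pass the returned list to `subprocess.run(cmd, check=True)` after training finishes, so that model weights and threshold data are available to the submission Kernel.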

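Experiment management with Slack (experiment management)
...I see.. I don't really get it...
We also made a Channel for logs, but it was hard to edit and hard to look back over past results.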

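The ultimate experiment management ~ Grand Excel Master ~ (experiment management)
Columns in the sheet:
• Experiment ID: used to manage Kaggle Datasets
• LB & CV
• Hypothesis & implementation details & result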

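Lessons learned from the experiment log (experiment management)
Columns in the sheet: Experiment ID (used to manage Kaggle Datasets), LB & CV, hypothesis & implementation details & result
The way we formed hypotheses was unclear
• Only hypotheses based on findings we had seen elsewhere
• We could not form hypotheses based on our own EDA
Lots of blank cells
• We did not reflect on the results
• We only looked at the confusion matrix about 5 days before the deadline
• Well... we did get the medal, so...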

What it actually cost this time
• 35,214 yen
• Treatment for a strained lower back (injured right after the competition ended): 53,030 yen

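Summary
• We somehow got a silver medal in APTOS
• The Original Crop took us reasonably high up the leaderboard
• Image size and Pseudo Labeling were the big differences between us and the top teams
• Differences in architecture are not that important an issue... we think
• A pipeline is handy for streamlining experiments, and so is the Kaggle API (though a bit dubious)
• Excel makes it easy to look back over results
• A strained lower back is very painful, both physically and financially
(Bonus) Repository and retrospective blog posts
• https://github.com/icebee16/kaggle_APTOS2019
• http://icebee.hatenablog.com/entry/2019/09/10/221351
• https://mocobt.hatenablog.com/entry/2019/09/09/013658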

References
Papers on the same task as the competition
• [Sayres et al. Ophthalmology 2019, Volume 126, Issue 4, Pages 552-564]
• [Krause et al. Ophthalmology 2018, Volume 125, Issue 8, Pages 1264-1272]
• [Poplin et al. Nature Biomedical Engineering 2018]
• [Gulshan et al. JAMA, The Journal of American Medical Association 2016, 316(22)]
Survey Papers
• [Qureshi et al. Symmetry 2019, 11(6), 749]
• [Erfuth et al. Progress in Retinal and Eye Research 2018, Volume 67, Pages 1-29]
• [Fenner et al. Ophthalmology and Therapy 2018, Volume 7, Issue 2, Pages 333-346]
• [Almotiri et al. Applied Sciences 2018, 8(2), 155]
