
APTOS 2019 28th place solution

Maxwell
September 09, 2019


APTOS 2019 Blindness Detection
https://www.kaggle.com/c/aptos2019-blindness-detection/overview

Team 2 AI Ophthalmologists 28th place solution

twitter: https://twitter.com/Maxwell_110


Transcript

  1. APTOS 2019 Blindness Detection Detect diabetic retinopathy to stop blindness

    before it's too late 2 AI Ophthalmologists ( Maxwell + tereka )
  2. 1. Competition Overview 2. Result 3. Model Pipeline 4. What

    we did ( worked/Not worked ) 5. Postmortem
  3. 1. Competition Overview 2. Result 3. Model Pipeline 4. What

    we did ( worked/Not worked ) 5. Postmortem
  4. Asia Pacific Tele-Ophthalmology Society 2019. Detect diabetic retinopathy to stop

    blindness before it's too late.  Build a model to identify diabetic retinopathy automatically  Aravind Eye Hospital ( Madurai, Tamil Nadu ) technicians travel to rural areas to capture images  Shortage of highly trained doctors in rural areas of India to review the images and provide a diagnosis
  5. Localized Various Findings of Diabetic Retinopathy

  6. Data  train : 3,662 public : 1,928 private :

    ~ 11,000  png format images  target label 0 : No DR 1 : Mild 2 : Moderate 3 : Severe 4 : Proliferative DR
  7. Data Issues 1. Few Images  External data ( DRD, IDRiD ) allowed 2.

    Target Imbalance 3. Target Distribution  Train and public test differ  How about private ? 4. Duplicated Images 5. Various Image Sizes  Train and test differ 6. Label Inconsistency  Different labels from each ophthalmologist ( Slide figures: duplicated images, target distributions, image size distributions. Credit: https://www.kaggle.com/currypurin/image-shape-distribution-previous-and-present )
  8. https://youtu.be/oOeZ7IgEN4o?t=145 Label Inconsistency ( Ophthalmologists are inconsistent )

  9. Evaluation • Quadratic Weighted Kappa

    κ = 1 − ( Σ_{i,j} w_{i,j} O_{i,j} ) / ( Σ_{i,j} w_{i,j} E_{i,j} ), w_{i,j} = ( i − j )² / ( N − 1 )², where O is the observed confusion matrix, E is the expected matrix under independence, and N is the number of classes.  Confusion Matrix Based Metric  Label Prediction ( Threshold Optimization )  Unstable  Target Distribution Dependent  Depends on the label distribution  Unstable like a chi² test ( Slide figures: worked 3-class examples of actual vs. predicted confusion matrices and the quadratic weight matrix. )
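The competition metric above can be computed directly; a minimal sketch using scikit-learn, whose Cohen's kappa with quadratic weights is equivalent to QWK (the labels here are dummy examples, not competition data):

```python
# Quadratic Weighted Kappa: Cohen's kappa with quadratic weights.
from sklearn.metrics import cohen_kappa_score

y_true = [0, 1, 2, 3, 4, 2, 1, 0]  # illustrative grade labels
y_pred = [0, 1, 2, 2, 4, 2, 0, 0]  # predictions with a few off-by-one errors

qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(round(qwk, 4))  # between 0 and 1; 1.0 only for perfect agreement
```

Because the weights grow quadratically with the distance between true and predicted grade, a single 0-vs-4 confusion hurts the score far more than several off-by-one errors.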
  10. 1. Competition Overview 2. Result 3. Model Pipeline 4. What

    we did ( worked/Not worked ) 5. Postmortem
  11. Our Final Selection Submission 2 ( 3 models blended ) Public LB :

    0.825 / 65th Private LB : 0.927 / 28th CV : 0.9363 on all APTOS, 0.7542 on higher-grade APTOS Submission 1 ( 2 models blended ) Public LB : 0.832 / 38th Private LB : 0.925 / 47th - 60th ? CV : 0.9359 on all APTOS, 0.7557 on higher-grade APTOS
  12. Hard to get a consistent correlation Our Submission History

  13. Spearman Correlation between public and private https://github.com/Greenwind1/shakeshakeshake/blob/master/shakeshake.csv APTOS 2019 :

    0.883 VSB Power Line : 0.656 LANL Earthquake : 0.607 Malware : 0.916
  14. 1. Competition Overview 2. Result 3. Model Pipeline 4. What

    we did ( worked/Not worked ) 5. Postmortem
  15. Model Pipeline ( Local CV 0.936 on APTOS train, Public LB 0.832 / 38th, Private LB 0.927 / 28th )

    Model 1 : SE-ResNeXt 50 ( Ordinal Regression, pre-trained on ImageNet ) - Preprocessing : remove black background, resize to 320 x 320, RGB brightness normalization ( ImageNet base ) - Augmentation : HorizontalFlip, Brightness, Contrast, RGBShift, Scale, Rotate, RandomErasing - Pre-training on DRD 2015 + IDRiD, then fine tuning on APTOS : stratified 5 fold, BCE loss, early stopping with BCE, Adam 1e-4 - TTA ( 3 times ) : HorizontalFlip, Brightness, Contrast, RGBShift, Scale, Rotate - Prediction encoding to a scalar Models 2 / 3 : EfficientNet B2 / B3 ( Regression, pre-trained on ImageNet ) - Preprocessing : remove black background, CLAHE ( adaptive histogram equalization ), resize to 260 / 300 - Grade balanced sampling : sampling rate changed with disease grade ( 1 : 2 : 2 : 2 : 2 ) - Augmentation : HorizontalFlip, Brightness, Contrast, RGBShift, Scale, Rotate, Shear - Training : stratified 5 fold, input normalized with BN, clipped MSE ( CMSE ), early stopping with CMSE, Adam 1e-3 / 5e-4, using all data - TTA ( 3 / 3 times ) : Brightness, Contrast, RGBShift, Scale, Rotate Ensemble : weighted blending, QWK optimization with Nelder-Mead  Blending coefficients : SE-ResNeXt 50 : 0.469, EfficientNet B2 : 0.273, EfficientNet B3 : 0.258 Copyright 2019 @ Maxwell_110
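The final weighted-blending step can be sketched as follows: optimize the three blending coefficients to maximize QWK with Nelder-Mead, as the slide describes. This is a minimal sketch with synthetic predictions (random noisy labels), not the team's actual code:

```python
# Weighted blending of regression outputs, with coefficients tuned by
# Nelder-Mead to maximize Quadratic Weighted Kappa.
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=200)
# three dummy "model outputs": noisy versions of the true grade
preds = [y_true + rng.normal(0, s, size=200) for s in (0.5, 0.7, 0.9)]

def neg_qwk(w):
    # keep weights positive and normalized, then round the blend to grades
    w = np.abs(w) + 1e-9
    blend = sum(wi * p for wi, p in zip(w, preds)) / w.sum()
    labels = np.clip(np.rint(blend), 0, 4).astype(int)
    return -cohen_kappa_score(y_true, labels, weights="quadratic")

res = minimize(neg_qwk, x0=[1.0, 1.0, 1.0], method="Nelder-Mead")
weights = np.abs(res.x) / np.abs(res.x).sum()
print(weights, -res.fun)  # normalized coefficients and the blended QWK
```

Nelder-Mead is a natural choice here because QWK of rounded predictions is a step function of the weights, so gradient-based optimizers cannot be used directly.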
  16. Our Strategy  QWK depends on the label distribution  The distributions

    of train and public test are different  Build 2 models trained with different label distributions ( train-like and public-like )  Blend them to get a robust prediction Assumption : Private ≈ Train ( private LB QWK ≈ train QWK )
  17. 1. Competition Overview 2. Result 3. Model Pipeline 4. What

    we did ( worked/Not worked ) 5. Postmortem
  18. 4 – 1. Ordinal Regression Q. Is one-hot encoding appropriate

    for these labels ? 0 : No DR = [ 1, 0, 0 ] 1 : Mild = [ 0, 1, 0 ] 2 : Moderate = [ 0, 0, 1 ] Cao et al., 2019, Rank-consistent Ordinal Regression Neural Networks
  19. A. No. The classes are NOT independent one-hot vectors

    but rank values : 0 : No DR < 1 : Mild < 2 : Moderate. When the diagnosis is 2, the eye also contains the diagnosis-1 disease or its equivalent.
  20. Naive Classification : FC layer + Soft-Max, multi-class loss over

    one-hot targets [ 1, 0, 0 ], [ 0, 1, 0 ], [ 0, 0, 1 ]. Rank-Consistent Ordinal Regression : FC layer + multi-label BCE loss over cumulative targets [ 0, 0 ], [ 1, 0 ], [ 1, 1 ]; the rank is the sum of the binary outputs : 0 + 0 = 0, 1 + 0 = 1, 1 + 1 = 2.
  21. Theory  K : the number of

    classes ( K = 5 : 0 No DR, 1 Mild, 2 Moderate, 3 Severe, 4 Proliferative DR )  x : retina image features ( input of the CNN )  y ∈ { 0, 1, …, K − 1 } : rank label ⟹ extended into K − 1 binary labels y^(1), …, y^(K−1), with y^(k) = 1{ y ≥ k } ex ) y = 3 ⟹ [ y^(1), y^(2), y^(3), y^(4) ] = [ 1, 1, 1, 0 ]  Output function of the CNN : P_k = P( y ≥ k ) = σ( g( x, W ) + b_k )  The K − 1 binary tasks share the same weight parameters W, but have independent bias units b_k  Predicted rank : q = Σ_{k=1}^{K−1} 1{ P_k > 0.5 } ex ) [ P_1, P_2, P_3, P_4 ] = [ 0.9, 0.8, 0.3, 0.1 ] ⟹ q = 1 + 1 + 0 + 0 = 2  Condition : b_1 ≥ b_2 ≥ … ≥ b_{K−1}, required for the ordinal information and rank monotonicity ( P_1 ≥ P_2 ≥ … ≥ P_{K−1} ) ⟹ Let's prove it in the next slide !
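The encoding and decoding above can be sketched in a few lines of NumPy (the probabilities in the example are illustrative, standing in for sigmoid outputs of the CNN):

```python
# Rank-consistent ordinal encoding/decoding (Cao et al., 2019 style).
# A grade y in {0..K-1} becomes K-1 cumulative binary targets y^(k) = 1{y >= k};
# the predicted rank is the number of sigmoid outputs above 0.5.
import numpy as np

K = 5  # DR grades 0..4

def encode(y):
    # e.g. y = 3 -> [1, 1, 1, 0]
    return (y > np.arange(K - 1)).astype(int)

def predict_rank(probs):
    # probs[k] ~ P(y >= k+1); rank = sum of binary decisions
    return int(np.sum(np.asarray(probs) > 0.5))

print(encode(3))                           # [1 1 1 0]
print(predict_rank([0.9, 0.8, 0.3, 0.1]))  # 2
```

Note that `predict_rank` simply counts threshold crossings; it only decodes a meaningful rank when the probabilities are monotonically non-increasing, which is exactly what the bias ordering condition guarantees.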
  22. Proof by contradiction ( using the loss function and its

    optimal solution ( W*, b* ) ) Theorem : b_1 ≥ b_2 ≥ … ≥ b_{K−1} is satisfied by any optimal solution ( W*, b* ). W* : optimal CNN weights on the train data, b* : optimal biases of the final layer on the train data.  Binary cross entropy as the loss function : L( W, b ) = − Σ_i Σ_{k=1}^{K−1} [ y_i^(k) log σ( g( x_i, W ) + b_k ) + ( 1 − y_i^(k) ) log( 1 − σ( g( x_i, W ) + b_k ) ) ]  Sufficient condition : P_1 ≥ P_2 ≥ … ≥ P_{K−1} ⇐ b_1* ≥ b_2* ≥ … ≥ b_{K−1}*  Proof sketch : suppose ( W*, b* ) is optimal but b_k* < b_{k+1}* for some k. Split the examples into A = { i : y_i^(k) = y_i^(k+1) = 1 }, B = { i : y_i^(k) = y_i^(k+1) = 0 }, C = { i : y_i^(k) = 1, y_i^(k+1) = 0 } ( A ∪ B ∪ C covers all examples, since y^(k) = 0, y^(k+1) = 1 is impossible by construction ). Denote, for each example, δ = log σ( g( x ) + b_{k+1} ) − log σ( g( x ) + b_k ) > 0 and δ' = log( 1 − σ( g( x ) + b_k ) ) − log( 1 − σ( g( x ) + b_{k+1} ) ) > 0. Replacing b_k with b_{k+1} changes the loss by Δ_1 = − Σ_A δ + Σ_B δ' − Σ_C δ, and replacing b_{k+1} with b_k changes it by Δ_2 = Σ_A δ − Σ_B δ' − Σ_C δ'. Then Δ_1 + Δ_2 = − Σ_C ( δ + δ' ) < 0 ⟹ Δ_1 < 0 or Δ_2 < 0 ⟹ a more optimal solution exists ( replace the biases to satisfy b_1* ≥ … ≥ b_{K−1}* ) : contradiction !
  23. The condition P_1 ≥ P_2 ≥ … ≥ P_{K−1} is satisfied at the optimal

    solution ( W*, b* ). W* : optimal CNN weights on the train data, b* : optimal biases of the final layer on the train data. With mini-batch training and OOF prediction ( over ~ 14,000 examples ), this assumption is violated a little. e.g ) [ p1, p2, p3, p4 ] = [ 0.90, 0.48, 0.52, 0.30 ] Hard encoding with threshold 0.5 ⟹ diagnosis 3 !?
  24. Prediction Encoding ( Soft Encoding ) To blend with naïve

    regression models, we need to encode the K − 1 probabilities into one scalar value : rank = Σ_{k=1}^{K−1} p_k ( 0 ≤ rank ≤ K − 1 ) e.g ) [ p1, p2, p3, p4 ] = [ 0.90, 0.48, 0.52, 0.30 ] Hard encoding with threshold 0.5 ⟹ diagnosis 3 Soft encoding ⟹ 0.90 + 0.48 + 0.52 + 0.30 = 2.20
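The difference between the two encodings on the slide's own example can be shown directly (here hard encoding takes the largest index above the 0.5 threshold, which is one way the monotonicity violation produces "diagnosis 3"):

```python
# Hard vs. soft encoding of the K-1 ordinal probabilities.
probs = [0.90, 0.48, 0.52, 0.30]  # monotonicity violated: 0.48 < 0.52

above = [k + 1 for k, p in enumerate(probs) if p > 0.5]
hard = max(above) if above else 0  # largest k with p_k > 0.5
soft = sum(probs)                  # expected rank, a smooth scalar

print(hard, round(soft, 2))  # 3 vs. 2.2
```

The soft value is robust to small monotonicity violations and, being a scalar, blends naturally with the regression models' outputs.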
  25. 4 – 2. Others a. EfficientNet https://arxiv.org/abs/1905.11946 b. MixUp and

    CutMix https://arxiv.org/abs/1710.09412 https://arxiv.org/abs/1905.04899 c. Ben's preprocessing d. Statistics values ( mean, std, quantile, ... ) implemented as a `Lambda` layer e. Pseudo labeling f. RAdam https://arxiv.org/abs/1908.03265 fig1. diagnosis 1 fig2. diagnosis 2 MixUp ( = fig1 * 0.5 + fig2 * 0.5 ) ⟹ diagnosis 1.5 ? Maybe not... I think it's more than 2, due to the non-linear transformation. CutMix ( patches of fig1 and fig2 combined ) ⟹ diagnosis 1.7 ? My guess is 2.0 or so... ( only on private )
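The MixUp operation mentioned above is just a convex combination of two images and their labels; a minimal sketch with synthetic images standing in for the two fundus photos (array shapes and the Beta parameter are illustrative assumptions):

```python
# MixUp: x = lam * x1 + (1 - lam) * x2, y = lam * y1 + (1 - lam) * y2.
import numpy as np

rng = np.random.default_rng(0)
img1, y1 = rng.random((320, 320, 3)), 1.0  # stands in for a diagnosis-1 image
img2, y2 = rng.random((320, 320, 3)), 2.0  # stands in for a diagnosis-2 image

lam = float(rng.beta(0.4, 0.4))            # mixing ratio from a Beta distribution
mixed = lam * img1 + (1 - lam) * img2
target = lam * y1 + (1 - lam) * y2         # interpolated regression target
print(mixed.shape, round(target, 3))
```

The slide's caveat applies to the label side: the linearly interpolated target (here between 1.0 and 2.0) may understate the true severity of a mixed image, since disease grading is not a linear function of visible findings.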
  26. 1. Competition Overview 2. Result 3. Model Pipeline 4. What

    we did ( worked/Not worked ) 5. Postmortem
  27. We failed to train more complicated models : EfficientNet B4, B5, SE-ResNeXt

    101, ...  Was the batch size too small ?  Gradient accumulation did not work. Did the BN layers behave badly ? More GPU memory will let us verify this !
  28.  24 GB GDDR6  1770 MHz  576 Tensor Cores

     4608 CUDA cores
  29. Thank you ! Any Questions ?