Humpback Whale Identification Challenge (a.k.a. the whale competition): retrospective

Slide 1

Slide 1 text

Whale competition retrospective, or: a one-week challenge @yu4u

Slide 2

Slide 2 text

All talk, no action ①

Slide 3

Slide 3 text

All talk, no action ②

Slide 4

Slide 4 text

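First submission on the team merger deadline (2/21)
• Went straight in with a custom model for the first submission and it flopped
• I had already been talking with an acquaintance from my previous job about teaming up, so they let me merge even though my score was microscopic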

Slide 5

Slide 5 text

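• Heard that recent metric learning methods can apparently be trained as ordinary classification
• Quickly tried an ArcFace-based method and got decent accuracy!
• Recommended reading: "Modern deep metric learning methods: SphereFace, CosFace, ArcFace" https://qiita.com/yu4u/items/078054dfb5592cbb80cc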

Slide 6

Slide 6 text

Main topic

Slide 7

Slide 7 text

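Summary
• The top teams each solved the problem with quite different approaches
• Each approach is simple in itself, and even a single model can reach high accuracy
• Ensembling different models also helps; all in all, arguably a very good competition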

Slide 8

Slide 8 text

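Whale competition overview
• train/test: cute images of 🐳 tails
  – aligned and cropped to some extent
• Each train image comes with a whale_id
• One of the whale_id values is "new_whale", meaning the individual has not been identified
• The task is to predict the correct whale_id for each test image
• The metric is MAP@5; "new_whale" is the correct answer for just under 30% of the 🐳
[Figure: example train and test images]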

Slide 9

Slide 9 text

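Whale competition overview
• train/test: cute images of 🐳 tails
  – aligned and cropped to some extent
• Each train image comes with a whale_id
• One of the whale_id values is "new_whale", meaning the individual has not been identified
• The task is to predict the correct whale_id for each test image
• The metric is MAP@5; "new_whale" is the correct answer for just under 30% of the 🐳
train: 25361 images (unique id: 5004)
  new_whale 9664
  w_23a388d 73
  w_9b5109b 65
  w_9c506f6 62
  …
test: 7960 images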
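
For reference, a minimal sketch of the MAP@5 metric. It assumes one correct whale_id per test image, in which case the average precision for an image reduces to 1/rank of the correct ID within the top 5. The function names are made up for illustration.

```python
def average_precision_at_5(true_id, predicted_ids):
    """1/rank of the first correct prediction within the top 5, else 0.0."""
    for rank, predicted in enumerate(predicted_ids[:5], start=1):
        if predicted == true_id:
            return 1.0 / rank
    return 0.0


def map_at_5(true_ids, predictions):
    """Mean of the per-image average precision over all test images."""
    scores = [average_precision_at_5(t, p) for t, p in zip(true_ids, predictions)]
    return sum(scores) / len(scores)
```

Placing "new_whale" one rank too low costs a large part of the score for that image, which is why the new_whale threshold matters so much in the solutions that follow.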

Slide 10

Slide 10 text

Traps
• Some images labeled new_whale show 🐳 that do have a whale_id
• Some 🐳 are the same individual but carry different whale_ids (lots of them)

Slide 11

Slide 11 text

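Whale competition overview
• Task
  – As a problem, it is the same as face recognition
  – In fact, it is also the same as the Google Landmark Recognition Challenge
• Possible approaches
  – Metric learning (the de facto standard in face recognition; used in the landmark challenge)
  – Solve it as classification (new_whale is the difficulty)
  – Solve it with local feature matching (used in the landmark challenge)
What actually happened?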

Slide 12

Slide 12 text

God-tier kernel ①
• 🐳 tail detector
• In face recognition, face detection always comes first as preprocessing
• It helps regardless of the approach
https://www.kaggle.com/martinpiotte/bounding-box-model
There is also a mask kernel: https://www.kaggle.com/c/humpback-whale-identification/discussion/78453

Slide 13

Slide 13 text

God-tier kernel ②
https://www.kaggle.com/seesee/siamese-pretrained-0-822

Slide 14

Slide 14 text

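God-tier kernel ②
• The kernel that got everyone saying "SiameseNet, SiameseNet"
• A SiameseNet normally does metric learning with a contrastive loss
  – training is hard and unstable
• The SiameseNet in this kernel takes two images as input and outputs whether they show the same 🐳
  – it is classification, so training is easy
  – accuracy is probably higher this way as well
[Diagram, the usual SiameseNet + contrastive loss: two 🐳 images → weight-shared CNNs → feature vectors → d² → contrastive loss]
[Diagram, the kernel's SiameseNet: two 🐳 images → weight-shared CNNs → x1, x2 → x1 + x2, x1 * x2, |x1 - x2|, |x1 - x2|² (all pairwise operations) → CNN → 0~1 → binary crossentropy; designed so that f(x1, x2) = f(x2, x1)]
https://www.kaggle.com/seesee/siamese-pretrained-0-822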

Slide 15

Slide 15 text

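God-tier kernel ②
• The kernel that got everyone saying "SiameseNet, SiameseNet"
• A SiameseNet normally does metric learning with a contrastive loss
  – training is hard and unstable
• The SiameseNet in this kernel takes two images as input and outputs whether they show the same 🐳
  – it is classification, so training is easy
  – accuracy is probably higher this way as well
[Diagram, the usual SiameseNet + contrastive loss: two 🐳 images → weight-shared CNNs → feature vectors → d² → contrastive loss]
[Diagram, the kernel's SiameseNet: two 🐳 images → weight-shared CNNs (feature extraction network) → x1, x2 → x1 + x2, x1 * x2, |x1 - x2|, |x1 - x2|² (all pairwise operations) → CNN (classification network) → 0~1 → binary crossentropy; designed so that f(x1, x2) = f(x2, x1)]
https://www.kaggle.com/seesee/siamese-pretrained-0-822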
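
A minimal sketch of why the head satisfies f(x1, x2) = f(x2, x1): each of the four pairwise operations listed on the slide is unchanged when x1 and x2 are swapped, so anything computed from them is symmetric too. Plain Python lists stand in for feature vectors here; the function name is hypothetical and this is not the kernel's actual code.

```python
def pairwise_features(x1, x2):
    """Four element-wise features, each invariant to swapping x1 and x2."""
    return [
        [a + b for a, b in zip(x1, x2)],         # x1 + x2
        [a * b for a, b in zip(x1, x2)],         # x1 * x2
        [abs(a - b) for a, b in zip(x1, x2)],    # |x1 - x2|
        [(a - b) ** 2 for a, b in zip(x1, x2)],  # |x1 - x2|^2
    ]
```

Because the input to the classification network is identical for (x1, x2) and (x2, x1), no extra augmentation is needed to teach the network that pair order is irrelevant.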

Slide 16

Slide 16 text

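Training god-tier kernel ②
1. Extract features from the train 🐳
   – forward pass of the feature extraction network
2. Extract positive pairs
3. Extract negative pairs
   a. Compute the score (× -1) between all 🐳 feature vectors and use it as the cost matrix (forward pass of the classification network over all C(number of images, 2) pairs; feasible because the classification network is light)
   b. Set the entries of the cost matrix for the same 🐳 to infinity, and the diagonal too
   c. Solve the linear assignment problem (LAP) on the cost matrix to get a list of low-cost pairs, i.e. build pairs that are different 🐳 yet score high
      Pairs that have been used get their cost set to infinity. Reused for 5 epochs
4. Train the whole network on the positive and negative pairs
Notes:
• Within the same 🐳, make sure an image is never paired with itself
• At the beginning, random numbers are added to the cost matrix to go easy on the network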

Slide 17

Slide 17 text

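Training god-tier kernel ②
1. Extract features from the train 🐳
   – forward pass of the feature extraction network
2. Extract positive pairs
3. Extract negative pairs
   a. Compute the score (× -1) between all 🐳 feature vectors and use it as the cost matrix (forward pass of the classification network over all C(number of images, 2) pairs; feasible because the classification network is light)
   b. Set the entries of the cost matrix for the same 🐳 to infinity, and the diagonal too
   c. Solve the linear assignment problem (LAP) on the cost matrix to get a list of low-cost pairs, i.e. build pairs that are different 🐳 yet score high
      Pairs that have been used get their cost set to infinity. Reused for 5 epochs
4. Train the whole network on the positive and negative pairs
Notes:
• Within the same 🐳, make sure an image is never paired with itself
• At the beginning, random numbers are added to the cost matrix to go easy on the network
[Diagram: a single CNN that takes both 🐳 images and outputs 0~1. Something like this would be far too heavy]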
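
Step 3 can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: the assignment is solved exactly by enumerating permutations, which only works for a handful of images, whereas a real run would need a proper LAP solver (for example `scipy.optimize.linear_sum_assignment`). All function names are hypothetical, and marking used pairs symmetrically is my own choice.

```python
import itertools
import math


def build_cost_matrix(scores, whale_ids):
    """Cost = -score; same-whale pairs (including the diagonal) are forbidden."""
    n = len(whale_ids)
    return [
        [math.inf if whale_ids[i] == whale_ids[j] else -scores[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def solve_lap_bruteforce(cost):
    """Exact minimum-cost assignment by enumeration (tiny inputs only)."""
    n = len(cost)
    best_perm, best_total = None, math.inf
    for perm in itertools.permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    if best_perm is None:
        raise ValueError("no feasible assignment left")
    return list(enumerate(best_perm))


def mark_used(cost, pairs):
    """Forbid pairs that were already used (both directions, by assumption)."""
    for i, j in pairs:
        cost[i][j] = math.inf
        cost[j][i] = math.inf
```

Solving, marking the result as used, and solving again yields progressively easier negatives, so the same cost matrix can serve several epochs before the features need to be recomputed.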

Slide 18

Slide 18 text

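Inference with god-tier kernel ②
1. Extract features from the train 🐳 (forward pass of the feature extraction network)
2. Extract features from the test 🐳 (forward pass of the feature extraction network)
3. Compute the test 🐳 vs. train 🐳 scores (forward pass of the classification network)
4. for each test 🐳: add the 🐳 IDs of the train 🐳 to the answer in descending order of score
   However, once the score is at or below the threshold, add new_whale if it is not in the answer yet
Taking the mean per whale_id might work better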
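
Step 4 might look like the sketch below. The names `predict_top5` and `NEW_WHALE` are hypothetical, the input is assumed to be a list of (score, whale_id) tuples for one test image against every train image, and the final fallback that appends new_whale to a short list is my own addition rather than something stated on the slide.

```python
NEW_WHALE = "new_whale"


def predict_top5(scored_train, threshold):
    """Build up to 5 answers for one test image from (score, whale_id) pairs."""
    answer = []
    for score, whale_id in sorted(scored_train, key=lambda pair: pair[0], reverse=True):
        if len(answer) == 5:
            break
        # Once scores fall to the threshold, new_whale outranks what remains.
        if score <= threshold and NEW_WHALE not in answer:
            answer.append(NEW_WHALE)
            if len(answer) == 5:
                break
        if whale_id not in answer:
            answer.append(whale_id)
    # Assumption: fill a short list with new_whale if it never got inserted.
    if NEW_WHALE not in answer and len(answer) < 5:
        answer.append(NEW_WHALE)
    return answer
```

Raising the threshold moves new_whale toward the front of every answer, so the threshold effectively sets the share of test images predicted as new_whale at rank 1.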

Slide 19

Slide 19 text

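1st Solution
• Flip the 5004 classes to get 10008 classes, and do binary classification for each of them
• At test time, the flipped image is fed in as well and the outputs are averaged (the corresponding class is known)
[Diagram labels: 🐳 input, 512x256, BBOX, RGB + mask; global average pooling; pooling along the channel direction; BCE + lovasz_loss]
https://www.kaggle.com/c/humpback-whale-identification/discussion/82366
https://github.com/earhian/Humpback-Whale-Identification-1st-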
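
A rough sketch of the flip trick, under an assumed class layout: class c for an original image and class c + NUM_IDS for its horizontally flipped version. The slide only states that the corresponding class is known, so this particular index layout and the function names are assumptions made for illustration.

```python
NUM_IDS = 5004  # number of known whale IDs in train


def flipped_label(label, is_flipped, num_ids=NUM_IDS):
    """A flipped image of whale `label` is treated as a separate class."""
    return label + num_ids if is_flipped else label


def flip_tta_average(probs_original, probs_flipped):
    """Average the score of class c on the original image with the score of
    class c + num_ids on the flipped image. Both inputs have 2 * num_ids entries."""
    num_ids = len(probs_original) // 2
    return [
        (probs_original[c] + probs_flipped[c + num_ids]) / 2.0
        for c in range(num_ids)
    ]
```

Treating the mirrored tail as a different class doubles the training signal without teaching the network that a tail and its mirror image are the same individual.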

Slide 20

Slide 20 text

3rd Solution
• Train original bbox regressor (5 fold CV and trained 5 models)
• 320x320 input, DenseNet121 + ArcFace (s=65, m=0.5), weight decay 0.0005, dropout 0.5
• Augmentation: average blur, motion blur; add, multiply, grayscale; scale, translate, shear, rotate; align (single bbox) or no-align
• Inference
  – train: for each 🐳 image, compute feature vectors using the 5 BBOXes, then average further per 🐳 ID
  – test: for each 🐳 image, compute feature vectors using the 5 BBOXes and compare against the above
https://www.kaggle.com/c/humpback-whale-identification/discussion/82484
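
For readers unfamiliar with the s and m parameters: ArcFace adds an angular margin m to the angle between the feature and the target-class weight, then scales all cosines by s before softmax cross entropy. The sketch below works on a plain list of cosine similarities and omits the special handling that real implementations apply when θ + m exceeds π; the function name is hypothetical.

```python
import math


def arcface_logits(cosines, target, s=65.0, m=0.5):
    """Scaled logits with an additive angular margin on the target class."""
    logits = []
    for class_index, cos_theta in enumerate(cosines):
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if class_index == target:
            cos_theta = math.cos(math.acos(cos_theta) + m)
        logits.append(s * cos_theta)
    return logits
```

The margin makes the target logit smaller during training, so the network has to pull same-ID features closer together than plain softmax would require.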

Slide 21

Slide 21 text

Lingering regrets

Slide 22

Slide 22 text

4th Solution
• Brute force over all pairs with SIFT + RANSAC!
  1. Loop through all test/train pairs
  2. Match keypoints using faiss
  3. Double homography filtering of keypoints (LMedS followed by RANSAC)
  4. xgboost prediction to validate homography matrix
  5. if # of matches > threshold, then use prediction
• The top-1 result is computed with the above; top-2 ~ 5 are computed with a SiameseNet
• Callouts on the slide: this was also done in the Landmark competition; full-resolution 🐳; normalized with CLAHE (Contrast Limited Adaptive Histogram Equalization); tail segmentation with UNet
https://www.kaggle.com/c/humpback-whale-identification/discussion/82356

Slide 23

Slide 23 text

5th Solution
• SiameseNet (DenseNet121 backbone)
• Original BBOX regressor
• Augmentation: shear, rotation, flipping, contrast, Gaussian noise, blurring, color augmentations, greying, random crops
• LAP is solved on sub-blocks; the sub-blocks are generated randomly each time
• 4-fold stratified cross validation + 15-model ensemble
• pseudo label -> update folds (e.g. LB 0.938 -> LB 0.950 -> LB 0.965, etc.)
• Stacking (not much effect)
https://www.kaggle.com/c/humpback-whale-identification/discussion/82352
https://weiminwang.blog/2019/03/01/whale-identification-5th-place-approach-using-siamese-networks-with-adversarial-training/ (about half of it explains the kernel it was based on)

Slide 24

Slide 24 text

7th Solution
• SE-ResNeXt-50 -> global concat (max, avg) pool -> BN -> Dropout -> Linear -> ReLU -> BN -> Dropout -> clf (5004)
• 4 head classification
• use bbox
• center loss, ring loss, GeM pooling
• verification by local features (Hessian-AffNet + HardNet) https://github.com/ducha-aiki/mods-light-zmq
• Various backbones were tried, but fine-tuning the network that a teammate had trained with metric learning (SE-ResNeXt-50) worked best
• new_whale is inserted to softmaxed predictions with constant threshold, which is set on validation set by bruteforce search in range from 0 to 0.95.
• Since it does something akin to metric learning, maybe that is why a softmax threshold was enough?
https://www.kaggle.com/c/humpback-whale-identification/discussion/82352
https://github.com/ducha-aiki/whale-identification-2018

Slide 25

Slide 25 text

7th Solution
https://www.kaggle.com/c/humpback-whale-identification/discussion/82352

Slide 26

Slide 26 text

7th Solution
• Metric-learning-based approach
  – training on RGB images: 256x256, 384x384, 448x448, 360x720
  – Augmentations: random erasing, affine transformations (scale, translation, shear), brightness/contrast
  – Models: resnet34, resnet50, resnet101, densenet121, densenet162, seresnext50 (backbone architectures that we've tried), followed by GeM pooling layer + L2 + multiplier
  – Loss: hard triplet loss
• Not used in the actual submission; used as the base network for the classification-based method
https://www.kaggle.com/c/humpback-whale-identification/discussion/82502
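
A minimal sketch of the "GeM pooling layer + L2 + multiplier" head, assuming the usual definition of generalized-mean pooling: per channel, the p-th root of the mean of the activations raised to p. The values p = 3 and multiplier = 10 are placeholders chosen for illustration, not taken from the write-up, and the function names are hypothetical.

```python
import math


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """feature_map: list of channels, each a flat list of spatial activations."""
    pooled = []
    for channel in feature_map:
        clamped = [max(value, eps) for value in channel]  # keep the power well defined
        mean_power = sum(value ** p for value in clamped) / len(clamped)
        pooled.append(mean_power ** (1.0 / p))
    return pooled


def l2_normalize_and_scale(vector, multiplier=10.0):
    """Project onto the unit sphere, then scale by a constant multiplier."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [multiplier * value / norm for value in vector]
```

With p = 1 this is average pooling, and as p grows it approaches max pooling, so p controls how strongly the descriptor focuses on the most activated locations.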

Slide 27

Slide 27 text

9th Solution
• Summary: Adam, Cosine with restarts, CosFace, ArcFace, High-resolution images, Weighted sampling, new_whale distillation, Pseudo labeled test, Resnet34, BNInception, Densenet121, AutoAugment, CoordConv, GAPNet
• 1024x1024 resnet34, 512x152 BNInception, 640x640 DenseNet121
• CosFace: s=32, m=0.35. ArcFace: m1=1.0, m2=0.4, m3=0.15
• Augmentation: Horizontal Flip, Rotate with 16 degree limit, ShiftScaleRotate with 16 degree limit, RandomBrightnessContrast, RandomGamma, Blur, Perspective transform: tile left, right and corner, Shear, MotionBlur, GridDistortion, ElasticTransform, Cutout
• CosFace + ArcFace
https://www.kaggle.com/c/humpback-whale-identification/discussion/82427
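
The three ArcFace parameters look like the combined-margin form cos(m1·θ + m2) − m3 used in InsightFace-style implementations; that reading is an assumption, since the slide does not spell out the formula. Under it, CosFace is the special case m1 = 1, m2 = 0, m3 = m. A small sketch for the target-class cosine, with a hypothetical function name:

```python
import math


def combined_margin(cos_theta, m1=1.0, m2=0.4, m3=0.15):
    """Target-class cosine after applying cos(m1 * theta + m2) - m3."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return math.cos(m1 * theta + m2) - m3


def cosface_margin(cos_theta, m=0.35):
    """CosFace as a special case: only the additive cosine margin is used."""
    return combined_margin(cos_theta, m1=1.0, m2=0.0, m3=m)
```

In both cases the result is multiplied by the scale s (32 for the CosFace setting above) before it replaces the target logit in softmax cross entropy.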

Slide 28

Slide 28 text

10th Solution (SiameseNet part)
• Summary
  – Siamese architecture
  – Metric learning featuring brand-new CVPR 2019 method (will be published soon)
  – Classification on features
  – Large blend for new whale/not new whale binary classification
• Tricks
  – Flip augmentation for both positive and negative pairs
  – ResNet-18, ResNet-34, SE-ResNeXt-50, ResNet-50, image size: 299->384->512
  – 0.929 LB -> ensemble 0.940
https://www.kaggle.com/c/humpback-whale-identification/discussion/82430

Slide 29

Slide 29 text

10th Solution
(Metric learning part) Another solution will be explained later in detail by @asanakoy. In two words, it is metric learning with multiple branches and margin loss, trained on multiple resolution crops using bboxes, grayscale and RGB input images. He also used his brand-new method from CVPR which allowed for 1-2% score boost. (Or so it seems.)
(Classification part) concat features from branch models and train classification model
(Post processing) took their TOP-4 predictions for each whale. Then, for all of our models, we took their predictions on these set of classes. We used a blend of LogReg, SVM, several KNN models, and LightGBM to solve a binary classification problem.
https://www.kaggle.com/c/humpback-whale-identification/discussion/82430

Slide 30

Slide 30 text

15th Solution
• At the beginning, we using pure softmax to classification 5005 class. The best result we obtain is around 0.86X using seresnext50.
• Then we resort to sphereface. To use sphereface, we abandon new whales, which means we only use around 19K images. This gives us 0.920 using seresnext-50 (multi-layer fusion, 384*384), 0.919 using resnext50 (multi-layer fusion, 384*384).
• We also tried arcface, which gives us 0.911 using seresnext-50 (multi-layer fusion, 384*384).
https://www.kaggle.com/c/humpback-whale-identification/discussion/82361

Slide 31

Slide 31 text

My Solution ①
• 768x256 🐳 (BBOX), resnext101_32x4d backbone, ArcFace
• Known 🐳 only; during training, duplicate 🐳 IDs are merged into one
• 🐳 with 10 or fewer images are oversampled so that they have at least 10 images
• Augmentation: grayscale, Gaussian noise, Gaussian blur, rotation, shear, piecewise affine, color shift, contrast, crop
• The cosine similarity of train 🐳 vs. test 🐳 is averaged over images with the same ID
[Diagram: 768x256 input → ResNeXt101 → 24x8 → bn, avepool(4) → 6x2 → flatten, dropout → 24576 → FC → 512 (feature vector) → FC → 5004 → ArcFace → cross entropy]
private LB: 0.92239, public LB: 0.91156
NO VALIDATION SET ;D due to time constraint
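
A minimal sketch of two steps from this slide: oversampling rare IDs up to a minimum count, and averaging the cosine similarity per whale ID. The function names and the repeat-in-order oversampling strategy are assumptions made for illustration; the slide only states the target of at least 10 images.

```python
import math
from collections import defaultdict


def oversample(images_by_id, min_count=10):
    """Repeat the images of rare whales until each ID has >= min_count entries."""
    result = {}
    for whale_id, images in images_by_id.items():
        repeats = -(-min_count // len(images))  # ceiling division
        result[whale_id] = (images * repeats)[:max(min_count, len(images))]
    return result


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity_per_id(test_vec, train_vecs, train_ids):
    """Average the test-vs-train cosine similarity over images sharing an ID."""
    sums, counts = defaultdict(float), defaultdict(int)
    for vec, whale_id in zip(train_vecs, train_ids):
        sums[whale_id] += cosine(test_vec, vec)
        counts[whale_id] += 1
    return {whale_id: sums[whale_id] / counts[whale_id] for whale_id in sums}
```

Calling `mean_similarity_per_id` for every test image gives one row each of a test image vs. whale ID score matrix, which is a convenient common format for combining different models.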

Slide 32

Slide 32 text

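My Solution ②
• Ensemble with 512x512 SiameseNet model
• The test image vs. train image matrix is converted into a test image vs. 🐳 ID matrix
• TTA: the bounding box is used at its original scale plus 2 more scales
• Ensemble: $P = \sum_{i=1}^{12} w_i P_i^{\alpha}$, combining the score matrices $P_1, P_2, P_3, \dots, P_{10}, P_{11}, P_{12}$
  – each $P_i$ is a matrix of test 🐳 images (7960) vs. train 🐳 "IDs" (5004); individual values are in 0~1
  – α is in 0~1; as it approaches 0 the result becomes voting-like, and 1 is an ordinary weighted average
  – personally, I set it to 0.5 to begin with
• Cut at a threshold, insert new_whale, and write the submission file
• Thresholds are basically untuned
Individual model scores:
  768x256 ArcFace: private 0.92239, public 0.91156
  512x512 ArcFace: private 0.92981, public 0.91558
  other models: private 0.90242, public 0.88183; private 0.89706, public 0.86712; …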
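
The ensemble rule P = Σ w_i P_i^α can be sketched as below, with nested Python lists standing in for the test image vs. whale ID score matrices. The function name is hypothetical and the weights in the usage note are placeholders, not the values used in the competition.

```python
def power_ensemble(matrices, weights, alpha=0.5):
    """Weighted sum of element-wise powers: P = sum_i w_i * P_i ** alpha."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [
            sum(w * matrix[r][c] ** alpha for w, matrix in zip(weights, matrices))
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

For example, `power_ensemble([[[0.25]], [[1.0]]], [0.5, 0.5], alpha=0.5)` combines scores 0.25 and 1.0 into 0.75, whereas alpha=1.0 gives the plain weighted average 0.625. A smaller alpha compresses the gap between confident and unconfident scores, so each model contributes closer to a vote.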

Slide 33

Slide 33 text

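Milestones
• 2/21: first submission with a custom model, microscopic score
• 2/22: learned about ArcFace
• 2/24: 448x448 model 0.786
• 2/26: 768x256 model 0.879
• 2/27: 768x256 model 0.887
• 2/28: 768x256 model 0.910
• 2/28, dead of night: 3 models finished, ensemble implemented; private: 0.95448, public: 0.94632
• Won the ensemble lottery in 3 submissions thanks to psychic hyperparameter tuning
• Score-based ensembling, and ensembling of completely different models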