Find hard-to-classify examples and focus training on them
• Large margin local embedding [28]
• Class rectification loss [27]
• Both originate in deep metric learning research, e.g. face recognition
• Here we introduce three methods that are easy to implement:
  • Two-phase learning
  • Mean false error loss
  • Focal loss
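Of the three, focal loss [26] is the easiest to sketch. It down-weights easy examples with a factor (1 - p_t)^γ so that hard (often minority-class) examples dominate the gradient. A minimal NumPy sketch of the binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t) (function and parameter names are illustrative, not from any library):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al. [26]).

    p: predicted probability of the positive class, y: 0/1 labels.
    The (1 - p_t)**gamma factor shrinks the loss of well-classified
    examples, so training focuses on hard examples; gamma=0 recovers
    (alpha-weighted) cross-entropy.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)            # numerical stability
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy, confidently-correct example contributes almost nothing,
# while a hard example keeps a large loss:
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.10]), np.array([1]))
```

The α_t term additionally re-weights the two classes, which is how the loss also addresses plain class imbalance on top of the hard-example focus.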
[1] Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning (Guillaume Lemaitre et al., 2016)
    A scikit-learn-compatible Python library for sampling and other imbalanced-data techniques.
・Review papers
[2] Learning from Imbalanced Data (Haibo He and Edward A. Garcia, 2008)
    A review of non-deep countermeasures for imbalanced data; much of the first half of this material is based on it.
[3] Survey on deep learning with class imbalance (Justin M. Johnson and Taghi M. Khoshgoftaar, 2019)
    A review of deep-learning countermeasures for imbalanced data; much of the second half of this material is based on it.
[4] Learning from class-imbalanced data: Review of methods and applications (Guo Haixiang et al., 2017)
[5] Learning from imbalanced data: open challenges and future directions (B. Krawczyk, 2016)
・Non-deep methods
[6] ADASYN: Adaptive synthetic sampling approach for imbalanced learning (Haibo He et al., 2008)
[7] An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme (Jingjun Bi and Chongsheng Zhang, 2018)
[8] An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics (Victoria López et al., 2013)
    Evaluates accuracy on many datasets across data-level, algorithm-level, and hybrid methods; ensembles perform best. Also discusses factors other than class imbalance that degrade performance (noise, class overlap, etc.).
[9] Clustering-based undersampling in class-imbalanced data (Wei-Chao Lin et al., 2017)
[10] Confusion-Matrix-Based Kernel Logistic Regression for Imbalanced Data Classification (Miho Ohsaki et al., 2017)
[11] Experimental Perspectives on Learning from Imbalanced Data (Jason Van Hulse et al., 2007)
    Overall, random under-sampling performed best in this paper, but it concludes that the best method depends on the algorithm and the evaluation metric.
[12] KRNN: k Rare-class Nearest Neighbour classification (Xiuzhen Zhang et al., 2017)
[13] Learning from Imbalanced Data in Presence of Noisy and Borderline Examples (Krystyna Napierała et al., 2010)
[14] Multiclass Imbalance Problems: Analysis and Potential Solutions (Shuo Wang and Xin Yao, 2012)
[15] SMOTE: Synthetic Minority Over-sampling Technique (N. V. Chawla et al., 2002)
[16] SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering (José A. Sáez et al., 2015)
[17] Under-sampling class imbalanced datasets by combining clustering analysis and instance selection (Chih-Fong Tsai et al., 2019)
[18] Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric (Sabri Boughorbel et al., 2017)
[19] Index of Balanced Accuracy: A Performance Measure for Skewed Class Distributions (V. García et al., 2009)
・Deep methods
[20] A systematic study of the class imbalance problem in convolutional neural networks (Mateusz Buda et al., 2017)
    Examines the effect of class imbalance on CNN image classification; this paper concludes that over-sampling works best.
[21] Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data (Salman H. Khan et al., 2015)
    Learns the cost matrix itself as a training parameter. Note that the loss function in this paper is used somewhat unusually: it adjusts the model's final output during training.
[22] Cost-Sensitive Learning with Neural Networks (Matjaž Kukar and Igor Kononenko, 1998)
[23] Deep Learning for Imbalance Data Classification using Class Expert Generative Adversarial Network (Fanny and Tjeng Wawan Cenggoro, 2018)
[24] Dynamic sampling in convolutional neural networks for imbalanced data classification (Samira Pouyanfar et al., 2018)
[25] Effective data generation for imbalanced learning using conditional generative adversarial networks (Georgios Douzas and Fernando Bacao, 2018)
    Trains a GAN conditioned on class labels and uses the trained generator for over-sampling.
[26] Focal Loss for Dense Object Detection (Tsung-Yi Lin et al., 2017)
[27] Imbalanced Deep Learning by Minority Class Incremental Rectification (Qi Dong et al., 2018)
    Related to deep metric learning: finds hard positives and hard negatives in the minority classes and trains with a triplet-loss-like loss.
[28] Learning Deep Representation for Imbalanced Classification (Chen Huang et al., 2016)
    A predecessor of [27]. Defines the loss over quintuplets (groups of five), so the metric learning also accounts for intra-class data structure (intra-class clusters).
[29] Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning (Hansang Lee et al., 2016)
[30] Training deep neural networks on imbalanced data sets (Shoujin Wang et al., 2016)
    Proposes Mean False Error (MFE) and Mean Squared False Error (MSFE), objective functions that average the losses of the minority and majority classes separately.
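The per-class averaging behind MFE/MSFE [30] is simple enough to sketch. A minimal binary-case sketch with squared error (names are illustrative; [30] formulates the errors on network outputs in the same per-class fashion):

```python
import numpy as np

def mfe_loss(p, y):
    """Mean False Error (Wang et al. [30]) for binary labels y in {0, 1}.

    FPE: mean squared error over the negative (majority) class.
    FNE: mean squared error over the positive (minority) class.
    Averaging each class separately before summing keeps the
    minority class's contribution from being drowned out.
    """
    err = (y - p) ** 2
    fpe = err[y == 0].mean()   # false-positive error
    fne = err[y == 1].mean()   # false-negative error
    return fpe + fne           # MSFE would use fpe**2 + fne**2
```

With a plain mean over all samples, 990 majority examples would swamp 10 minority examples; here each class contributes exactly one term to the objective regardless of its size.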