Learning from imbalanced data

Toni Pizà
January 25, 2017

Lightning Talk / PyData Mallorca, January 2017 at Parc Bit

Transcript

  1. Class-imbalance problem

    One of the classes is strongly underrepresented. Examples: quality control, fraud / network intrusion detection, detection of oil spills, medical diagnosis, customer churn.
  2. Class-imbalance problem

    Most classification algorithms assume balanced distributions. Difficulties learning the concepts related to the minority class. Different cost of misclassification. Accuracy paradox.
  3. Approaches to the problem

    Methods at algorithm level (cost function based). Methods at data level (sampling based).
  4. Approaches to the problem

    Methods at algorithm level (cost function based). Methods at data level (sampling based): under sampling, over sampling, ensemble methods (sampling + boosting). Goal: change the distribution of the imbalanced data set to give the learner balanced data and improve the detection rate of the minority class.
  5. References

    · Learning from Imbalanced Data: http://www.cs.utah.edu/~piyush/teaching/ImbalancedLearning.pdf
    · SMOTE: Synthetic Minority Over-sampling Technique: https://www.jair.org/media/953/live-953-2037-jair.pdf
    · On the Class Imbalance Problem: http://sci2s.ugr.es/keel/pdf/specific/congreso/guo_on_2008.pdf
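
The slides name the accuracy paradox and the sampling-based methods without showing either, so here is a minimal sketch of both in plain Python. It is an illustration under stated assumptions, not code from the talk: the toy data set (95 majority examples, 5 minority) is made up, and the helper names `accuracy`, `recall`, `random_oversample`, and `random_undersample` are hypothetical. The sampling shown is the simplest random variant, not SMOTE.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Toy data (made up for illustration): 95 majority (label 0), 5 minority (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

# Accuracy paradox: a "classifier" that always predicts the majority class
# scores high accuracy while never detecting the minority class.
always_majority = [0] * len(y)
print("accuracy:", accuracy(y, always_majority))
print("minority recall:", recall(y, always_majority, positive=1))

# Data-level methods: rebalance the training set before fitting a learner.
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print("after over sampling:", Counter(y_over))
print("after under sampling:", Counter(y_under))
```

Random over sampling keeps every original example but repeats minority ones, which can encourage overfitting to the duplicates. Random under sampling discards majority examples and with them potentially useful information. Resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.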