• There is class imbalance
  ◦ The majority class dominates the dataset
  ◦ Even a small level of imbalance makes a difference
• When does the learning process suffer due to an imbalanced data set?
  ◦ When the minority classes are more important or cannot be sacrificed
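To see why plain accuracy hides the problem, here is a minimal Python sketch (the 99:1 class ratio is an assumption for illustration): a classifier that always predicts the majority class scores 99% accuracy while recalling none of the minority.

    import numpy as np

    y = np.array([0] * 990 + [1] * 10)      # 99% majority (0), 1% minority (1)
    y_pred = np.zeros_like(y)               # "always predict majority" baseline

    accuracy = (y_pred == y).mean()                   # 0.99 -- looks excellent
    minority_recall = (y_pred[y == 1] == 1).mean()    # 0.0  -- minority fully sacrificed
    print(accuracy, minority_recall)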
• Undersampling
  ◦ Decreases the number of majority-class examples
    ▪ May lose useful information
• Oversampling
  ◦ Increases the number of minority-class examples
    ▪ Risk of overfitting (a sketch of both follows)
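A minimal numpy sketch of both baseline strategies (the 0/1 labels and the function names are assumptions; libraries such as imbalanced-learn provide ready-made versions):

    import numpy as np

    def random_undersample(X, y, majority=0, rng=None):
        # Drop random majority rows until classes balance (may lose information).
        rng = rng or np.random.default_rng()
        maj = np.flatnonzero(y == majority)
        mino = np.flatnonzero(y != majority)
        keep = rng.choice(maj, size=len(mino), replace=False)
        idx = np.concatenate([keep, mino])
        return X[idx], y[idx]

    def random_oversample(X, y, minority=1, rng=None):
        # Duplicate random minority rows until classes balance
        # (exact copies -> overfitting risk).
        rng = rng or np.random.default_rng()
        mino = np.flatnonzero(y == minority)
        maj = np.flatnonzero(y != minority)
        extra = rng.choice(mino, size=len(maj) - len(mino), replace=True)
        idx = np.concatenate([np.arange(len(y)), extra])
        return X[idx], y[idx]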
• Remove majority-class examples such that the more informative examples are kept
• One-sided sampling
  ◦ The initial subset contains all the examples
  ◦ Randomly remove an example from the subset
  ◦ Construct a 1-NN classifier on the remaining subset to classify the removed example; if it is misclassified, the example was informative and is put back (see the sketch after this list)
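One plausible reading of these steps as code, a sketch rather than the canonical one-sided selection algorithm; the trial count, loop structure, and helper names are my assumptions:

    import numpy as np

    def one_nn_label(X_train, y_train, x):
        # Label of the single nearest neighbour under Euclidean distance.
        d = np.linalg.norm(X_train - x, axis=1)
        return y_train[np.argmin(d)]

    def one_sided_sample(X, y, n_trials=200, rng=None):
        rng = rng or np.random.default_rng()
        keep = np.ones(len(y), dtype=bool)      # initial subset: all examples
        for _ in range(n_trials):
            candidates = np.flatnonzero(keep)
            if len(candidates) <= 1:
                break
            i = rng.choice(candidates)
            keep[i] = False                     # tentatively remove one example
            rest = np.flatnonzero(keep)
            if one_nn_label(X[rest], y[rest], X[i]) != y[i]:
                keep[i] = True                  # informative example: put it back
        return X[keep], y[keep]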
• Generate synthetic examples instead of exact copies to reduce the risk of overfitting
• SMOTE
  ◦ Select a random minority-class example
  ◦ Select one of its nearest neighbors
  ◦ Create a new example by interpolating between the two
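The interpolation step as a minimal numpy sketch (k = 5 and the function name are assumptions): the published SMOTE recipe creates new = x + λ·(neighbor − x) with λ drawn uniformly from [0, 1].

    import numpy as np

    def smote(X_min, n_new, k=5, rng=None):
        # For each synthetic point: pick a random minority example, pick one of
        # its k nearest minority neighbours, interpolate on the segment between them.
        rng = rng or np.random.default_rng()
        out = np.empty((n_new, X_min.shape[1]))
        for t in range(n_new):
            i = rng.integers(len(X_min))
            d = np.linalg.norm(X_min - X_min[i], axis=1)
            nbrs = np.argsort(d)[1:k + 1]       # skip index 0 (the example itself)
            j = rng.choice(nbrs)
            lam = rng.random()                  # position on the segment, in [0, 1)
            out[t] = X_min[i] + lam * (X_min[j] - X_min[i])
        return out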
• Algorithm
  ◦ Generate synthetic examples in each iteration
  ◦ Train a classifier on the new data
  ◦ Compute the loss of the new classifier
  ◦ Update the example weights (as in AdaBoost)
• Advantage: adds more diversity to the data (a compressed sketch of the loop follows)
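A compressed sketch of this boosting loop, assuming 0/1 labels with 1 as the minority class, a decision stump as the weak learner, and the hypothetical smote helper from the previous block:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def smote_boost(X, y, n_rounds=10, n_syn=50, rng=None):
        rng = rng or np.random.default_rng()
        w = np.full(len(y), 1.0 / len(y))       # AdaBoost example weights
        models, alphas = [], []
        for _ in range(n_rounds):
            # 1. Generate synthetic minority examples for this round.
            X_syn = smote(X[y == 1], n_syn, rng=rng)
            X_t = np.vstack([X, X_syn])
            y_t = np.concatenate([y, np.ones(n_syn, dtype=y.dtype)])
            w_t = np.concatenate([w, np.full(n_syn, w[y == 1].mean())])
            w_t /= w_t.sum()
            # 2. Train a weak classifier on the augmented data.
            clf = DecisionTreeClassifier(max_depth=1).fit(X_t, y_t, sample_weight=w_t)
            models.append(clf)
            # 3. Weighted loss of the new classifier on the original examples.
            miss = clf.predict(X) != y
            err = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)
            alphas.append(alpha)
            # 4. AdaBoost update: upweight misclassified examples.
            w *= np.exp(alpha * np.where(miss, 1.0, -1.0))
            w /= w.sum()
        return models, alphas

The final prediction would combine the weak classifiers weighted by their alpha values, exactly as in standard AdaBoost.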