Reviving Failed Classifiers with Random Forests

Slides are a work in progress and may not contain all references and attributions.

Emaad Manzoor

May 17, 2013

Transcript

  1. 2.

    Derived from Antonio Criminisi, Jamie Shotton, and Ender Konukoglu, Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, Foundations and Trends in Computer Graphics and Vision, 2012.
  2. 3.

    Why Random Forests?
    - Very good empirical performance
    - Fast to train and test on large data
    - Can deal with a huge number of features
    - Naturally multi-class
    - Noise-resistant, generalizes well
    - Non-parametric
    - Easy to use: just two hyperparameters
  3. 4.

    Why Random Forests?
    - Xbox Kinect: fast body-part recognition
    - Kaggle Malicious URL Classification: 1 million features, 1 million unlabeled samples, 50,000 labeled samples
    - Kaggle Air Quality Prediction
    - Kaggle Travel Time Prediction
    - …
  4. 9.

    What are Decision Trees?
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    F | 1 | Cheap | Medium | Train
    F | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    M | 0 | Standard | Medium | Train
    F | 1 | Standard | Medium | Train
    F | 1 | Expensive | High | Car
    M | 2 | Expensive | Medium | Car
    F | 2 | Expensive | High | Car
  5. 10.

    What are Decision Trees?
    [Decision tree diagram: split on Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → split on Gender — Male → Bus; Female → split on Car Ownership — 0 → Bus; 1 → Train]
  6. 11.

    What are Decision Trees?
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 1 | Standard | High | ?
    M | 0 | Cheap | Medium | ?
    F | 1 | Cheap | High | ?
  7. 12.

    What are Decision Trees?
    [Decision tree diagram: split on Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → split on Gender — Male → Bus; Female → split on Car Ownership — 0 → Bus; 1 → Train]
  8. 13.

    What are Decision Trees?
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 1 | Standard | High | Train
    M | 0 | Cheap | Medium | Bus
    F | 1 | Cheap | High | Train
  9. 14.

    Measuring Node Impurity
    $\text{Gini Index} = 1 - \sum_j p_j^2$
    $\text{Entropy} = -\sum_j p_j \log_2(p_j)$
  10. 15.

    Measuring Node Impurity
    Transportation Mode: Bus, Bus, Train, Bus, Bus, Train, Train, Car, Car, Car
    $P(\text{Bus}) = \tfrac{4}{10},\; P(\text{Train}) = \tfrac{3}{10},\; P(\text{Car}) = \tfrac{3}{10}$
    $\text{Entropy} = -\sum_j p_j \log_2(p_j) = -\left(\tfrac{4}{10}\log_2\tfrac{4}{10} + \tfrac{3}{10}\log_2\tfrac{3}{10} + \tfrac{3}{10}\log_2\tfrac{3}{10}\right) \approx 1.571$
  11. 16.

    Measuring Node Impurity
    Transportation Mode: Bus, Bus, Train, Bus, Bus, Train, Train, Car, Car, Car
    $P(\text{Bus}) = \tfrac{4}{10},\; P(\text{Train}) = \tfrac{3}{10},\; P(\text{Car}) = \tfrac{3}{10}$
    $\text{Gini Index} = 1 - \sum_j p_j^2 = 1 - \left(\tfrac{16}{100} + \tfrac{9}{100} + \tfrac{9}{100}\right) = 0.660$
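The two impurity measures above are easy to compute directly. A minimal Python sketch (function names are my own) that reproduces the slide's values for the class counts 4 Bus, 3 Train, 3 Car:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given raw class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index of a class distribution given raw class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# 4 Bus, 3 Train, 3 Car
print(round(entropy([4, 3, 3]), 3))  # 1.571
print(round(gini([4, 3, 3]), 3))     # 0.66
```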
  12. 19.

    How do we split?
    Split on Travel Cost/Km:
    Cheap → Bus, Bus, Train, Bus, Bus (Entropy = 0.722, Gini Index = 0.320)
    Standard → Train, Train (Entropy = 0.0, Gini Index = 0.0)
    Expensive → Car, Car, Car (Entropy = 0.0, Gini Index = 0.0)
  13. 21.

    How do we split?
    Parent node: Bus, Bus, Train, Bus, Bus, Train, Train, Car, Car, Car (Entropy = 1.571, Gini Index = 0.660)
    Children after splitting on Travel Cost/Km: Cheap (Entropy = 0.722, Gini Index = 0.320), Standard (Entropy = 0.0), Expensive (Entropy = 0.0)
    $\text{InfoGain} = 1.571 - \left(\tfrac{5}{10} \times 0.722 + \tfrac{2}{10} \times 0 + \tfrac{3}{10} \times 0\right) = 1.210$
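The information gain above is just the parent's entropy minus the weighted entropies of the children. A quick check in Python (variable names are my own):

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given raw class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

parent = entropy([4, 3, 3])   # Bus/Train/Car at the parent node
cheap = entropy([4, 1])       # 4 Bus, 1 Train
standard = entropy([2])       # 2 Train (pure)
expensive = entropy([3])      # 3 Car (pure)

# Weight each child's entropy by its share of the samples.
info_gain = parent - (5/10 * cheap + 2/10 * standard + 3/10 * expensive)
print(round(info_gain, 3))  # 1.21
```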
  14. 22.

    How do we train? Greedy: At each node, split on

    the feature giving the maximum information gain
  15. 23.

    How do we train?
    Gain Measure | Car Ownership | Gender | Travel Cost/Km | Income Level
    Entropy | 0.534 | 0.125 | 1.210 | 0.695
    Gini Index | 0.207 | 0.060 | 0.500 | 0.293
  16. 24.

    How do we train?
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    F | 1 | Cheap | Medium | Train
    F | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    M | 0 | Standard | Medium | Train
    F | 1 | Standard | Medium | Train
    F | 1 | Expensive | High | Car
    M | 2 | Expensive | Medium | Car
    F | 2 | Expensive | High | Car
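The greedy rule can be made concrete: compute the information gain of splitting on every feature, then take the maximum. A sketch (data encoding and helper names are my own) that reproduces the entropy-gain row of the table above:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """Entropy gain of splitting `rows` on `feature`."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(r[feature] for r in rows):
        branch = [l for r, l in zip(rows, labels) if r[feature] == value]
        gain -= len(branch) / n * entropy(branch)
    return gain

# The ten training instances from the slides.
rows = [
    {"Gender": "M", "CarOwner": 0, "TravelCost": "Cheap", "Income": "Low"},
    {"Gender": "M", "CarOwner": 1, "TravelCost": "Cheap", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 1, "TravelCost": "Cheap", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 0, "TravelCost": "Cheap", "Income": "Low"},
    {"Gender": "M", "CarOwner": 1, "TravelCost": "Cheap", "Income": "Medium"},
    {"Gender": "M", "CarOwner": 0, "TravelCost": "Standard", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 1, "TravelCost": "Standard", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 1, "TravelCost": "Expensive", "Income": "High"},
    {"Gender": "M", "CarOwner": 2, "TravelCost": "Expensive", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 2, "TravelCost": "Expensive", "Income": "High"},
]
labels = ["Bus", "Bus", "Train", "Bus", "Bus", "Train", "Train", "Car", "Car", "Car"]

gains = {f: info_gain(rows, labels, f) for f in rows[0]}
best = max(gains, key=gains.get)
print(best)  # TravelCost — the feature the greedy rule splits on first
```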
  17. 26.

    How do we train? Second Iteration Since expensive and standard

    travel costs have led to pure classes, we don't need those data instances any longer.
  18. 27.

    How do we train?
    [Partial tree: split on Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → to be split further]
    Remaining (Cheap) instances:
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    F | 1 | Cheap | Medium | Train
    F | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
  19. 28.

    How do we train?
    [Partial tree: Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → Gender — Male → Bus; Female → further training]
  20. 29.

    How do we train?
    [Final tree: Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → Gender — Male → Bus; Female → Car Ownership — 0 → Bus; 1 → Train]
  21. 30.

    What are the issues? The greedy approach tends to overfit; some form of regularization, such as pruning or limiting tree depth, is required.
  22. 31.

    What are the issues? Low bias, extremely high variance; perturbing

    your training data even a little results in a completely different tree.
  23. 35.

    Random Forests
    - High performance without parameter tuning
    - Can be used for classification, regression & clustering
    - Just two parameters, but performance is not very sensitive to them, and the rules of thumb work very well
    - Built-in measure of generalization error (Out-Of-Bag Error)
    - Fast in training and prediction, embarrassingly parallel
    - Intrinsically multi-class
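These properties are easy to see in scikit-learn, whose RandomForestClassifier exposes the two knobs mentioned above (number of trees, features tried per split) along with the out-of-bag estimate. A minimal sketch on a toy dataset (the iris data is my choice, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# The two main hyperparameters: number of trees, and features tried per split.
# max_features="sqrt" is the usual rule of thumb for classification.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    oob_score=True,   # built-in generalization estimate from the bootstrap
    n_jobs=-1,        # trees are independent, so training parallelizes
    random_state=0,
)
forest.fit(X, y)
print(forest.oob_score_)  # out-of-bag accuracy
```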
  24. 36.
  25. 38.

    What is bagging? Construct multiple bootstrap training sets by sampling

    with replacement. Reduces the variance of the ensemble. Gives us a smoother decision boundary.
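A bootstrap training set is just n draws with replacement from the n training indices. A small numpy sketch; on average each bootstrap set contains about 63.2% (1 − 1/e) of the distinct original instances, leaving the rest "out of bag":

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of training instances

# One bootstrap training set: n indices drawn with replacement.
bootstrap = rng.choice(n, size=n, replace=True)

unique_fraction = len(np.unique(bootstrap)) / n
print(round(unique_fraction, 3))  # close to 1 - 1/e ≈ 0.632
```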
  26. 39.

    What are random splits? Pick a random subset of the

    features at each node, and a random subset of thresholds.
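A random split search at a single node might look like this sketch (pure numpy; data and function names are my own): score only a random subset of the features, and for each, a few random thresholds, rather than every possibility.

```python
import numpy as np

def gini(y):
    """Gini index of an array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1 - np.sum(p ** 2)

def random_split(X, y, n_features, n_thresholds, rng):
    """Best (feature, threshold) among a random subset of candidates."""
    best = (None, None, np.inf)
    for f in rng.choice(X.shape[1], size=n_features, replace=False):
        for t in rng.choice(X[:, f], size=n_thresholds, replace=True):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # Weighted Gini impurity of the two children.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = (X[:, 3] > 0).astype(int)  # only feature 3 actually matters

feature, threshold, score = random_split(X, y, n_features=4, n_thresholds=10, rng=rng)
```

Each tree sees a different random candidate set, which is what decorrelates the trees in the forest.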
  27. 44.

    How random? How many features and thresholds to try?
    - One: extremely random
    - Few: fast training, may underfit, may go too deep
    - Many: slower training, may overfit
  28. 45.

    How random? When do you stop growing a tree?
    - Maximum depth
    - Minimum entropy gain
    - Delta class distribution
    - Pruning
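In scikit-learn, two of these stopping criteria map directly onto constructor parameters of the tree learners: max_depth for a maximum depth, and min_impurity_decrease for a minimum gain threshold. A minimal sketch (the iris data is my choice, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stop growing at depth 3, and don't split unless impurity drops enough.
tree = DecisionTreeClassifier(
    max_depth=3,
    min_impurity_decrease=0.01,
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth())  # never exceeds max_depth
```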
  29. 47.