Reviving Failed Classifiers with Random Forests

Slides are a work in progress and may not contain all references and attributions.

Emaad Manzoor

May 17, 2013

Transcript

  1. 2.

    Derived from Antonio Criminisi, Jamie Shotton, and Ender Konukoglu, Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, Foundations and Trends in Computer Graphics and Vision, 2012.
  2. 3.

    Why Random Forests?
    - Very good empirical performance
    - Fast to train and test on large data
    - Can deal with a huge number of features
    - Naturally multi-class
    - Noise-resistant, generalizes well
    - Non-parametric
    - Easy to use: just two hyperparameters
  3. 4.

    Why Random Forests?
    - Xbox Kinect: fast body-part recognition
    - Kaggle Malicious URL Classification: 1 million features, 1 million unlabeled samples, 50,000 labeled samples
    - Kaggle Air Quality Prediction
    - Kaggle Travel Time Prediction
    - …
  4. 9.

    What are Decision Trees?
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    F | 1 | Cheap | Medium | Train
    F | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    M | 0 | Standard | Medium | Train
    F | 1 | Standard | Medium | Train
    F | 1 | Expensive | High | Car
    M | 2 | Expensive | Medium | Car
    F | 2 | Expensive | High | Car
  5. 10.

    What are Decision Trees?
    [Decision tree diagram: split on Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → split on Gender — Male → Bus; Female → split on Car Ownership — 0 → Bus; 1 → Train]
  6. 11.

    What are Decision Trees?
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 1 | Standard | High | ?
    M | 0 | Cheap | Medium | ?
    F | 1 | Cheap | High | ?
  7. 12.

    What are Decision Trees?
    [Decision tree diagram: split on Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → split on Gender — Male → Bus; Female → split on Car Ownership — 0 → Bus; 1 → Train]
  8. 13.

    What are Decision Trees?
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 1 | Standard | High | Train
    M | 0 | Cheap | Medium | Bus
    F | 1 | Cheap | High | Train
  9. 14.

    Measuring Node Impurity
    $\text{Gini Index} = 1 - \sum_j p_j^2$
    $\text{Entropy} = -\sum_j p_j \log_2(p_j)$
  10. 15.

    Measuring Node Impurity
    Transportation Mode: Bus, Bus, Train, Bus, Bus, Train, Train, Car, Car, Car
    $P(\text{Bus}) = \tfrac{4}{10},\; P(\text{Train}) = \tfrac{3}{10},\; P(\text{Car}) = \tfrac{3}{10}$
    $\text{Entropy} = -\sum_j p_j \log_2(p_j) = -\left(\tfrac{4}{10}\log_2\tfrac{4}{10} + \tfrac{3}{10}\log_2\tfrac{3}{10} + \tfrac{3}{10}\log_2\tfrac{3}{10}\right) \approx 1.571$
  11. 16.

    Measuring Node Impurity
    Transportation Mode: Bus, Bus, Train, Bus, Bus, Train, Train, Car, Car, Car
    $P(\text{Bus}) = \tfrac{4}{10},\; P(\text{Train}) = \tfrac{3}{10},\; P(\text{Car}) = \tfrac{3}{10}$
    $\text{Gini Index} = 1 - \sum_j p_j^2 = 1 - \left(\tfrac{16}{100} + \tfrac{9}{100} + \tfrac{9}{100}\right) = 0.660$
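The two impurity measures above are easy to compute directly. A minimal Python sketch (function names are my own) that reproduces the slide's values for the class counts 4 Bus, 3 Train, 3 Car:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given raw class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index of a class distribution given raw class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# 4 Bus, 3 Train, 3 Car
print(round(entropy([4, 3, 3]), 3))  # 1.571
print(round(gini([4, 3, 3]), 3))     # 0.66
```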
  12. 19.

    How do we split?
    Split on Travel Cost/Km:
    Cheap → Bus, Bus, Train, Bus, Bus (Entropy = 0.722, Gini Index = 0.320)
    Standard → Train, Train (Entropy = 0.0, Gini Index = 0.0)
    Expensive → Car, Car, Car (Entropy = 0.0, Gini Index = 0.0)
  13. 21.

    How do we split?
    Parent node: Bus, Bus, Train, Bus, Bus, Train, Train, Car, Car, Car (Entropy = 1.571, Gini Index = 0.660)
    Children after splitting on Travel Cost/Km: Cheap (Entropy = 0.722, Gini Index = 0.320), Standard (Entropy = 0.0), Expensive (Entropy = 0.0)
    $\text{InfoGain} = 1.571 - \left(\tfrac{5}{10} \times 0.722 + \tfrac{2}{10} \times 0 + \tfrac{3}{10} \times 0\right) = 1.210$
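The information gain above is just the parent's entropy minus the weighted entropies of the children. A quick check in Python (variable names are my own):

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given raw class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

parent = entropy([4, 3, 3])   # Bus/Train/Car at the parent node
cheap = entropy([4, 1])       # 4 Bus, 1 Train
standard = entropy([2])       # 2 Train (pure)
expensive = entropy([3])      # 3 Car (pure)

# Weight each child's entropy by its share of the samples.
info_gain = parent - (5/10 * cheap + 2/10 * standard + 3/10 * expensive)
print(round(info_gain, 3))  # 1.21
```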
  14. 22.

    How do we train? Greedy: At each node, split on

    the feature giving the maximum information gain
  15. 23.

    How do we train?
    Gain Measure | Car Ownership | Gender | Travel Cost/Km | Income Level
    Entropy | 0.534 | 0.125 | 1.210 | 0.695
    Gini Index | 0.207 | 0.060 | 0.500 | 0.293
  16. 24.

    How do we train?
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    F | 1 | Cheap | Medium | Train
    F | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    M | 0 | Standard | Medium | Train
    F | 1 | Standard | Medium | Train
    F | 1 | Expensive | High | Car
    M | 2 | Expensive | Medium | Car
    F | 2 | Expensive | High | Car
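The greedy rule can be made concrete: compute the information gain of splitting on every feature, then take the maximum. A sketch (data encoding and helper names are my own) that reproduces the entropy-gain row of the table above:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """Entropy gain of splitting `rows` on `feature`."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(r[feature] for r in rows):
        branch = [l for r, l in zip(rows, labels) if r[feature] == value]
        gain -= len(branch) / n * entropy(branch)
    return gain

# The ten training instances from the slides.
rows = [
    {"Gender": "M", "CarOwner": 0, "TravelCost": "Cheap", "Income": "Low"},
    {"Gender": "M", "CarOwner": 1, "TravelCost": "Cheap", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 1, "TravelCost": "Cheap", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 0, "TravelCost": "Cheap", "Income": "Low"},
    {"Gender": "M", "CarOwner": 1, "TravelCost": "Cheap", "Income": "Medium"},
    {"Gender": "M", "CarOwner": 0, "TravelCost": "Standard", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 1, "TravelCost": "Standard", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 1, "TravelCost": "Expensive", "Income": "High"},
    {"Gender": "M", "CarOwner": 2, "TravelCost": "Expensive", "Income": "Medium"},
    {"Gender": "F", "CarOwner": 2, "TravelCost": "Expensive", "Income": "High"},
]
labels = ["Bus", "Bus", "Train", "Bus", "Bus", "Train", "Train", "Car", "Car", "Car"]

gains = {f: info_gain(rows, labels, f) for f in rows[0]}
best = max(gains, key=gains.get)
print(best)  # TravelCost — the feature the greedy rule splits on first
```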
  17. 26.

    How do we train? Second Iteration Since expensive and standard

    travel costs have led to pure classes, we don't need those data instances any longer.
  18. 27.

    How do we train?
    [Partial tree: split on Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → to be split further]
    Remaining (Cheap) instances:
    Gender | Car Owner? | Travel Cost/Km | Income Level | Transportation Mode
    M | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
    F | 1 | Cheap | Medium | Train
    F | 0 | Cheap | Low | Bus
    M | 1 | Cheap | Medium | Bus
  19. 28.

    How do we train?
    [Partial tree: Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → Gender — Male → Bus; Female → further training]
  20. 29.

    How do we train?
    [Final tree: Travel Cost/Km — Expensive → Car; Standard → Train; Cheap → Gender — Male → Bus; Female → Car Ownership — 0 → Bus; 1 → Train]
  21. 30.

    What are the issues? The greedy approach tends to overfit; some form of regularization, such as pruning or limiting tree depth, is required.
  22. 31.

    What are the issues? Low bias, extremely high variance; perturbing

    your training data even a little results in a completely different tree.
  23. 35.

    Random Forests
    - High performance without parameter tuning
    - Can be used for classification, regression & clustering
    - Just two parameters, but performance is not very sensitive to them, and the rules of thumb work very well
    - Built-in measure of generalization error (Out-Of-Bag Error)
    - Fast in training and prediction, embarrassingly parallel
    - Intrinsically multi-class
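These properties are easy to see in scikit-learn, whose RandomForestClassifier exposes the two knobs mentioned above (number of trees, features tried per split) along with the out-of-bag estimate. A minimal sketch on a toy dataset (the iris data is my choice, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# The two main hyperparameters: number of trees, and features tried per split.
# max_features="sqrt" is the usual rule of thumb for classification.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    oob_score=True,   # built-in generalization estimate from the bootstrap
    n_jobs=-1,        # trees are independent, so training parallelizes
    random_state=0,
)
forest.fit(X, y)
print(forest.oob_score_)  # out-of-bag accuracy
```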
  24. 36.
  25. 38.

    What is bagging? Construct multiple bootstrap training sets by sampling

    with replacement. Reduces the variance of the ensemble. Gives us a smoother decision boundary.
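A bootstrap training set is just n draws with replacement from the n training indices. A small numpy sketch; on average each bootstrap set contains about 63.2% (1 − 1/e) of the distinct original instances, leaving the rest "out of bag":

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of training instances

# One bootstrap training set: n indices drawn with replacement.
bootstrap = rng.choice(n, size=n, replace=True)

unique_fraction = len(np.unique(bootstrap)) / n
print(round(unique_fraction, 3))  # close to 1 - 1/e ≈ 0.632
```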
  26. 39.

    What are random splits? Pick a random subset of the

    features at each node, and a random subset of thresholds.
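A random split search at a single node might look like this sketch (pure numpy; data and function names are my own): score only a random subset of the features, and for each, a few random thresholds, rather than every possibility.

```python
import numpy as np

def gini(y):
    """Gini index of an array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1 - np.sum(p ** 2)

def random_split(X, y, n_features, n_thresholds, rng):
    """Best (feature, threshold) among a random subset of candidates."""
    best = (None, None, np.inf)
    for f in rng.choice(X.shape[1], size=n_features, replace=False):
        for t in rng.choice(X[:, f], size=n_thresholds, replace=True):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # Weighted Gini impurity of the two children.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = (X[:, 3] > 0).astype(int)  # only feature 3 actually matters

feature, threshold, score = random_split(X, y, n_features=4, n_thresholds=10, rng=rng)
```

Each tree sees a different random candidate set, which is what decorrelates the trees in the forest.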
  27. 44.

    How random? How many features and thresholds to try?
    - One: extremely random
    - Few: fast training, may underfit, may go too deep
    - Many: slower training, may overfit
  28. 45.

    How random? When do you stop growing a tree?
    - Maximum depth
    - Minimum entropy gain
    - Delta class distribution
    - Pruning
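In scikit-learn, two of these stopping criteria map directly onto constructor parameters of the tree learners: max_depth for a maximum depth, and min_impurity_decrease for a minimum gain threshold. A minimal sketch (the iris data is my choice, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stop growing at depth 3, and don't split unless impurity drops enough.
tree = DecisionTreeClassifier(
    max_depth=3,
    min_impurity_decrease=0.01,
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth())  # never exceeds max_depth
```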
  29. 47.