Slide 3
Related Works
Wrapper Methods
Find a subset of features F ⊂ X that provides minimal loss with a classifier C(X, y, F).
Typically achieve a smaller loss on a dataset than embedded and filter-based feature selection
methods; however, F may vary depending on the choice of the classifier.
examples: SVM-RFE, distributed wrappers (small loss, but high computational complexity)
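As an illustration, a minimal SVM-RFE sketch (assuming scikit-learn is available; the synthetic data, linear kernel, and subset size of 5 are illustrative choices, not from the slide):

```python
# Wrapper selection via SVM-RFE: the subset F is found by repeatedly
# fitting the classifier C and discarding the weakest features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Hypothetical synthetic data standing in for (X, y)
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Because RFE refits the SVM at every step, the selected F
# depends on the choice of classifier, as noted above.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5)
selector.fit(X, y)
print("Selected subset F:", selector.get_support(indices=True))
```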
Embedded Methods
Optimize the parameters of the classifier and perform feature selection at the same time.
$$\theta^* = \arg\min_{\theta \in \Theta} \left\{ \mathbb{E}[\ell(\theta, D)] + \Omega(\theta) \right\} = \arg\min_{\theta \in \Theta} \left\| y - X^\top \theta \right\|_2^2 + \lambda \left\| \theta \right\|_1$$
examples: LASSO, Elastic-net, streamwise feature selection, online feature selection
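For instance, a minimal LASSO sketch (assuming scikit-learn; its Lasso objective matches the one above up to a 1/(2n) scaling, with alpha playing the role of λ; the data and alpha value are illustrative):

```python
# Embedded selection with LASSO: the l1 penalty drives some
# coefficients exactly to zero, so fitting the model and
# selecting features happen in a single optimization.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Hypothetical synthetic data standing in for (X, y)
X, y = make_regression(n_samples=200, n_features=20,
                       n_informative=5, noise=1.0, random_state=0)

model = Lasso(alpha=1.0).fit(X, y)      # alpha ~ lambda (illustrative value)
selected = np.flatnonzero(model.coef_)  # features with nonzero weight
print("Selected subset F:", selected)
```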
Filter Methods
Find a subset of features F ⊂ X that maximizes a function J(X) that is not tied to the classification
loss (classifier independent).
Generally faster than wrapper and embedded methods, but we cannot assume that F will produce a
minimal loss.
examples: RELIEF, mRMR, conditional likelihood maximization, submodular optimization
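As a simple illustration, a minimal filter sketch (assuming scikit-learn; mutual information between each feature and the labels stands in here for J as one common classifier-independent score, e.g., the quantity mRMR builds on, rather than any specific method from the list above):

```python
# Filter selection: score each feature with a classifier-independent
# criterion J and keep the top-ranked ones; no classifier is trained.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Hypothetical synthetic data standing in for (X, y)
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

scores = mutual_info_classif(X, y, random_state=0)  # J estimated per feature
top5 = np.argsort(scores)[::-1][:5]  # keep the 5 highest-scoring features
print("Selected subset F:", top5)
```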