
VS in FDA Short Course I

Jeff Goldsmith
March 22, 2017


Transcript

  1. Outline of short course
     • Review variable selection methods
     • Scalar-on-function regression
     • Function-on-scalar regression
     • Other models and approaches
     Throughout, we will emphasize methods and code.
  2. VS in linear models
     • X is an n x p matrix
     • p is large (often >> n)
     • OLS estimates are highly variable or unidentifiable
     • Variable selection methods remove unnecessary predictors from the model
     • Focus on automated approaches, rather than model building
     • Emphasis is often on prediction accuracy, though post-selection methods for inference exist
     Model: y = Xβ + ε
  3. Methods for VS
     • Shrinkage penalties
       • Lasso
       • SCAD
       • MCP
     • Bayesian variable selection
       • Spike and slab priors
       • Shrinkage priors
     • Other methods
  4. Penalty-based approaches
     • Minimize, with respect to β,
       (y - Xβ)ᵀ(y - Xβ) + Σ_{k=1}^{p} p_λ(|β_k|)
     • Find the solution for fixed λ
     • Find the best solution across values of λ, often via cross validation
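This workflow can be sketched in a few lines: fit the lasso over a grid of tuning parameters and pick λ by cross validation. A minimal sketch assuming scikit-learn; the data below are simulated purely for illustration and are not from the course materials.

```python
# Penalty-based variable selection: lasso path + 5-fold cross validation.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 100, 200                       # p >> n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                        # sparse truth: 5 active predictors
y = X @ beta + rng.normal(size=n)

fit = LassoCV(cv=5).fit(X, y)         # CV over an automatic grid of lambdas
coef = fit.coef_
n_selected = np.count_nonzero(coef)   # most coefficients are set exactly to zero
```

With p much larger than n, the cross-validated lasso returns a sparse coefficient vector, which is the behavior the slides describe.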
  5. Available penalties
     • The most common penalty is the lasso: p_λ(β) = λ|β|
     • Compared to ridge, the lasso penalizes the magnitude of coefficients and encourages sparsity
     [Figure from ISLR]
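The sparsity the lasso induces is explicit in its one-dimensional closed form: under a simplified orthonormal-design setting (an assumption for illustration, not part of the slides), the lasso solution is soft thresholding.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution for one coefficient under an orthonormal design:
    argmin_b 0.5 * (z - b)**2 + lam * |b| = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Small inputs are set exactly to zero; larger ones are shrunk toward zero.
soft_threshold(3.0, 1.0)   # returns 2.0
soft_threshold(0.5, 1.0)   # returns 0.0
```

Ridge, by contrast, rescales coefficients but never sets them exactly to zero, which is the contrast the slide draws.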
  6. Available penalties
     • The lasso penalty has a constant, non-zero derivative, so even large coefficients are shrunk
     • Recent alternatives have sought to address this
       • Smoothly clipped absolute deviation penalty (SCAD)
       • Minimax concave penalty (MCP)
     • For example, solutions under the SCAD penalty are
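The SCAD penalty itself is simple to write down: it matches the lasso near zero, transitions quadratically, and is flat beyond a·λ, so large coefficients incur no additional shrinkage. A numpy sketch of the penalty from Fan and Li (2001), with their suggested default a = 3.7.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: lasso-like for |b| <= lam, quadratic for
    lam < |b| <= a*lam, constant ((a + 1) * lam**2 / 2) beyond a*lam."""
    b = np.abs(beta)
    flat = (a + 1) * lam**2 / 2
    quad = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    return np.where(b <= lam, lam * b, np.where(b <= a * lam, quad, flat))
```

Because the penalty is constant past a·λ, its derivative there is zero, which is exactly how SCAD avoids shrinking large coefficients.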
  7. Bayesian VS
     • The most common approach is the “spike and slab” approach
     • One narrow prior and one wide prior, with a latent binary indicator separating predictors between the two
     • Often the spike is a point mass at zero, but recent work has used a narrow but continuous prior
     • Recently, shrinkage priors (e.g. the horseshoe prior; the Dirichlet-Laplace prior) have become popular
       • Good computational and (sometimes) theoretical properties
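The spike-and-slab structure can be sketched directly as a prior draw. A hypothetical numpy illustration using a continuous spike rather than a point mass; the inclusion probability and variances are made-up values, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 50
pi = 0.1                                  # prior inclusion probability (illustrative)
gamma = rng.binomial(1, pi, size=p)       # latent binary indicators
slab = rng.normal(0.0, 10.0, size=p)      # wide "slab" prior for included predictors
spike = rng.normal(0.0, 0.01, size=p)     # narrow continuous "spike" near zero
beta = np.where(gamma == 1, slab, spike)  # each coefficient drawn from one component
```

Posterior computation would then update γ and β jointly (e.g. by Gibbs sampling or the EM approach of Rockova and George, 2014); the sketch above only shows the prior structure.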
  8. Emphasizing penalties
     • For this short course, we’ll emphasize the penalty-based approach
       • Very popular outside FDA
       • Very accessible code and packages
  9. Group VS
     • The preceding has focused on shrinking individual covariates to zero
     • We’ll shortly see that shrinking groups of variables to zero is useful for our purposes today
     • To that end, group variable selection methods are needed:
       (y - Xβ)ᵀ(y - Xβ) + Σ_{g=1}^{G} p_λ(||β_g||)
     • Group versions of each major penalty exist
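Group penalties act on the norm of a whole coefficient block, so the corresponding update zeroes out an entire group at once. A minimal numpy sketch of the group-lasso proximal step, included for illustration and not taken from the course materials.

```python
import numpy as np

def group_soft_threshold(z, lam):
    """Proximal operator of the group-lasso penalty lam * ||z||_2:
    sets the whole group to zero when its norm is below lam,
    otherwise shrinks the group radially toward zero."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1 - lam / norm) * z
```

This is the group analogue of scalar soft thresholding: the decision to keep or drop is made for the block as a whole, which is what makes group penalties useful for functional predictors represented by several basis coefficients.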
  10. Composite penalties
      • Composite penalties will come up in passing, but will not be a major emphasis of today’s material
      • Briefly, multiple penalties can be used in the same minimization problem:
        (y - Xβ)ᵀ(y - Xβ) + Σ_{k=1}^{p} p_{1,λ₁}(|β_k|) + Σ_{k=1}^{p} p_{2,λ₂}(|β_k|)
  11. Key references
      • Fan and Li (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. JASA.
      • Zhang (2010). Nearly Unbiased Variable Selection Under Minimax Concave Penalty. Annals of Statistics.
      • Rockova and George (2014). EMVS: The EM Approach to Bayesian Variable Selection. JASA.