
VS in FDA Short Course I

Jeff Goldsmith
March 22, 2017

Transcript

  1. Outline of short course
    • Review variable selection methods
    • Scalar-on-function regression
    • Function-on-scalar regression
    • Other models and approaches
    Throughout, we will emphasize methods and code.
  2. VS in linear models
    • Model: $y = X\beta + \epsilon$
    • X is an n x p matrix
    • p is large (often p >> n)
    • OLS estimates are highly variable or unidentifiable
    • Variable selection methods remove unnecessary predictors from the model
    • Focus on automated approaches, rather than model building
    • Emphasis is often on prediction accuracy, though post-selection methods for inference exist
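To make "unidentifiable" concrete, here is a minimal sketch with p > n. The design matrix and coefficient vectors are made-up numbers chosen for illustration; the helper `fitted` is a hypothetical name, not part of any package.

```python
# Toy design with n = 2 observations and p = 3 predictors (hypothetical numbers).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]

def fitted(X, beta):
    """Return the fitted values X @ beta using plain Python lists."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

beta_a = [1.0, 1.0, 0.0]   # uses the first two predictors
beta_b = [0.0, 0.0, 1.0]   # uses only the third predictor

fit_a = fitted(X, beta_a)
fit_b = fitted(X, beta_b)

# Two different coefficient vectors reproduce the same fitted values,
# so least squares alone cannot distinguish between them.
print(fit_a, fit_b)
```

Because the third column is the sum of the first two, any response fitted by one vector is fitted equally well by the other, which is the situation variable selection methods are meant to resolve.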
  3. Methods for VS
    • Shrinkage penalties
      • Lasso
      • SCAD
      • MCP
    • Bayesian variable selection
      • Spike and slab priors
      • Shrinkage priors
    • Other methods
  4. Penalty-based approaches
    • Minimize $(y - X\beta)^T (y - X\beta) + \sum_{k=1}^{p} p_\lambda(|\beta_k|)$ with respect to $\beta$
    • Find the solution for fixed $\lambda$
    • Find the best solution across values of $\lambda$, often via cross-validation
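The two steps above (solve for a fixed λ, then compare solutions across a grid of λ values) can be sketched as follows. This is an illustrative sketch only: it assumes the lasso penalty p_λ(|β_k|) = λ|β_k|, uses plain coordinate descent on the unscaled objective shown above, and runs on simulated data. The function names (`lasso_cd`, `cv_error`), the λ grid, and the 4-fold split are all assumptions made for the example, not the course's code.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (y - Xb)^T (y - Xb) + lam * sum_k |b_k|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_ss = [sum(X[i][k] ** 2 for i in range(n)) for k in range(p)]
    for _ in range(n_sweeps):
        for k in range(p):
            if col_ss[k] == 0.0:
                continue
            # Inner product of column k with the partial residual (predictor k removed).
            rho = sum(X[i][k] * resid[i] for i in range(n)) + col_ss[k] * beta[k]
            # Without a 1/2 on the squared error, the threshold is lam / 2.
            new_bk = soft_threshold(rho, lam / 2.0) / col_ss[k]
            delta = new_bk - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][k] * delta
                beta[k] = new_bk
    return beta

def sq_error(X, y, beta):
    """Sum of squared prediction errors for coefficients beta."""
    return sum((yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
               for row, yi in zip(X, y))

def cv_error(X, y, lam, n_folds=4):
    """Mean squared prediction error from a simple interleaved K-fold split."""
    n = len(X)
    total = 0.0
    for f in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == f]
        train_idx = [i for i in range(n) if i % n_folds != f]
        beta = lasso_cd([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
        total += sq_error([X[i] for i in test_idx], [y[i] for i in test_idx], beta)
    return total / n

# Simulated data (hypothetical): only the first two predictors matter.
random.seed(1)
n, p = 40, 8
true_beta = [3.0, -2.0] + [0.0] * (p - 2)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0, 1) for row in X]

lambdas = [0.0, 1.0, 5.0, 20.0, 80.0, 1000.0]   # illustrative grid
cv = {lam: cv_error(X, y, lam) for lam in lambdas}
best_lam = min(cv, key=cv.get)
beta_hat = lasso_cd(X, y, best_lam)
print(best_lam, [round(b, 2) for b in beta_hat])
```

In practice one would use an established package with a finer λ grid and warm starts; the point here is only the structure of "fit at fixed λ, then choose λ by cross-validation".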
  5. Available penalties
    • Most common penalty is the lasso: $p_\lambda(\beta) = \lambda |\beta|$
    • Compared to ridge, the lasso penalizes the absolute magnitude of coefficients rather than their squares, which encourages sparsity
    • [Figure from ISLR]
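The contrast between lasso and ridge is easiest to see in one dimension. The sketch below assumes an orthonormal design (X^T X = I), so that the objective separates into independent problems of the form (z - b)^2 + penalty, where z is the OLS estimate of a single coefficient. The function names are hypothetical, and the ridge penalty is taken to be λb², which is an assumption since the slide only gives the lasso form.

```python
def lasso_solution(z, lam):
    """Minimizer of (z - b)**2 + lam * |b|: soft-thresholding at lam / 2."""
    shrunk = abs(z) - lam / 2.0
    if shrunk <= 0.0:
        return 0.0                      # small estimates are set exactly to zero
    return shrunk if z > 0 else -shrunk

def ridge_solution(z, lam):
    """Minimizer of (z - b)**2 + lam * b**2: proportional shrinkage, never exactly zero."""
    return z / (1.0 + lam)

lam = 1.0
for z in [0.2, 0.4, 1.0, 3.0]:
    print(z, lasso_solution(z, lam), ridge_solution(z, lam))
```

Under these assumptions the lasso sets any estimate with |z| ≤ λ/2 exactly to zero, while ridge only rescales it, which is the sparsity property referred to above.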
  6. Available penalties
    • The lasso penalty has a non-zero derivative for every non-zero coefficient, so even large coefficients are shrunk
    • Recent alternatives have sought to address this
      • Smoothly clipped absolute deviation penalty (SCAD)
      • Minimax concave penalty (MCP)
    • For example, solutions under SCAD penalty are
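As an illustration of how SCAD and MCP leave large coefficients alone, the sketch below implements the standard univariate thresholding rules. It assumes the scaling commonly used in that literature, (1/2)(z - b)^2 + p_λ(|b|), which differs from the unscaled objective earlier in the deck by a factor of 2 in λ. The SCAD rule is the one given by Fan and Li (2001), who suggest a = 3.7; the MCP value γ = 3.0 is an arbitrary illustrative choice, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso rule: shrink every estimate toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def scad_threshold(z, lam, a=3.7):
    """SCAD rule for (1/2)(z - b)**2 + SCAD penalty, valid for a > 2."""
    sign = 1.0 if z >= 0 else -1.0
    if abs(z) <= 2.0 * lam:
        return soft_threshold(z, lam)                      # behaves like the lasso near zero
    if abs(z) <= a * lam:
        return ((a - 1.0) * z - sign * a * lam) / (a - 2.0)  # shrinkage tapers off
    return z                                               # large estimates are not shrunk

def mcp_threshold(z, lam, gamma=3.0):
    """MCP rule for (1/2)(z - b)**2 + MCP penalty, valid for gamma > 1."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z                                               # large estimates are not shrunk

lam = 1.0
for z in [0.5, 1.5, 3.0, 5.0]:
    print(z, soft_threshold(z, lam), scad_threshold(z, lam), mcp_threshold(z, lam))
```

For z = 5.0 and λ = 1.0 the lasso rule still subtracts λ, whereas both SCAD and MCP return z unchanged.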
  7. Bayesian VS
    • Most common approach is the “Spike and Slab” approach
      • One narrow prior and one wide prior, with a latent binary indicator separating predictors between the two
      • Often the spike is a point-mass at zero, but recent work has used a narrow but continuous prior
    • Recently, shrinkage priors (e.g. Horseshoe prior; Dirichlet-Laplace prior) have become popular
      • Good computational and (sometimes) theoretical properties
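To illustrate the role of the latent binary indicator, here is a minimal sketch of the continuous-spike version, assuming both the spike and the slab are zero-mean normal priors. Given a coefficient value, Bayes' rule gives the probability that it belongs to the slab. The variances, the prior inclusion probability `theta`, and the function names are hypothetical choices for the example; a full analysis would embed this step in an MCMC or EM algorithm.

```python
import math

def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance var."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def inclusion_probability(beta, theta=0.5, spike_var=0.01, slab_var=10.0):
    """P(indicator = 1 | beta) under a two-component normal mixture prior.

    theta is the prior probability that a coefficient comes from the slab.
    """
    slab = theta * normal_pdf(beta, slab_var)
    spike = (1.0 - theta) * normal_pdf(beta, spike_var)
    return slab / (slab + spike)

for beta in [0.0, 0.1, 0.5, 2.0]:
    print(beta, round(inclusion_probability(beta), 4))
```

Coefficients near zero are assigned to the narrow spike with high probability, while clearly non-zero coefficients are assigned to the wide slab.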
  8. Emphasizing penalties
    • For this short course, we’ll emphasize the penalty-based approach
      • Very popular outside FDA
      • Very accessible code and packages
  9. Group VS
    • The preceding has focused on shrinking individual covariates to zero
    • We’ll shortly see that shrinking groups of variables to zero is useful for our purposes today
    • To that end, group variable selection methods are needed: $(y - X\beta)^T (y - X\beta) + \sum_{g=1}^{G} p_\lambda(\|\beta_g\|)$
    • Group versions of each major penalty exist
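A small sketch of what "shrinking a group to zero" means, assuming the group lasso penalty p_λ(‖β_g‖) = λ‖β_g‖ with the Euclidean norm and an orthonormal design within the group. Under those assumptions the group problem ‖z - b‖² + λ‖b‖ has the closed-form solution below, where z holds the OLS estimates for the group. The function name is hypothetical.

```python
import math

def group_soft_threshold(z, lam):
    """Minimizer of ||z - b||^2 + lam * ||b|| over the vector b.

    The whole group is set to zero when ||z|| <= lam / 2; otherwise every
    element is scaled by the same factor, so the group enters or leaves together.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam / 2.0:
        return [0.0] * len(z)
    scale = 1.0 - lam / (2.0 * norm)
    return [scale * v for v in z]

print(group_soft_threshold([3.0, 4.0], 2.0))   # large group: shrunk but kept
print(group_soft_threshold([0.3, 0.4], 2.0))   # small group: removed entirely
```

Unlike coordinate-wise soft-thresholding, the decision here depends on the norm of the whole group, so individual elements are never zeroed on their own.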
  10. Composite penalties
    • Composite penalties will come up in passing, but will not be a major emphasis of today’s material
    • Briefly, multiple penalties can be used in the same minimization problem: $(y - X\beta)^T (y - X\beta) + \sum_{k=1}^{p} p_{1,\lambda_1}(|\beta_k|) + \sum_{k=1}^{p} p_{2,\lambda_2}(|\beta_k|)$
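One familiar instance of a composite penalty is the elastic net, which combines a lasso term with a ridge term. The slide does not name it, so it is used here purely as an illustration. The sketch assumes an orthonormal design, so each coefficient solves (z - b)² + λ₁|b| + λ₂b² on its own, with z the OLS estimate; the function name is hypothetical.

```python
def elastic_net_solution(z, lam1, lam2):
    """Minimizer of (z - b)**2 + lam1 * |b| + lam2 * b**2.

    The lasso part soft-thresholds at lam1 / 2 and the ridge part then
    rescales the result by 1 / (1 + lam2).
    """
    shrunk = abs(z) - lam1 / 2.0
    if shrunk <= 0.0:
        return 0.0
    signed = shrunk if z > 0 else -shrunk
    return signed / (1.0 + lam2)

print(elastic_net_solution(3.0, 1.0, 1.0))   # both penalties active
print(elastic_net_solution(3.0, 1.0, 0.0))   # lam2 = 0 recovers the lasso solution
print(elastic_net_solution(0.4, 1.0, 1.0))   # small estimate is set to zero
```

Each penalty contributes its own effect: the first term produces exact zeros and the second adds proportional shrinkage.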
  11. Key references
    • Fan and Li (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. JASA.
    • Zhang (2010). Nearly Unbiased Variable Selection Under Minimax Concave Penalty. Annals of Statistics.
    • Rockova and George (2014). EMVS: The EM Approach to Bayesian Variable Selection. JASA.