
VS in FDA Short Course III


Jeff Goldsmith

March 23, 2017

Transcript

  1. 2 Linear FoSR
     • Functional response $y_i(t)$
     • Scalar predictor $x_i$
     • The functional coefficient is of interest
     • Linear model
     • Most common approach
     $y_i(t) = \beta_0(t) + \sum_{l=1}^{p} x_{il}\,\beta_l(t) + \epsilon_i(t)$
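As a concrete illustration not on the slide, here is a minimal numpy sketch that simulates data from this linear FoSR model on a common grid; the coefficient functions, sample sizes, and noise level are all made-up choices for illustration.

```python
import numpy as np

# Simulate y_i(t) = beta_0(t) + sum_l x_il * beta_l(t) + eps_i(t) on a common grid
rng = np.random.default_rng(0)

n, p, T = 100, 3, 50                      # subjects, scalar predictors, grid points
t = np.linspace(0, 1, T)                  # common observation grid

beta0 = np.sin(2 * np.pi * t)             # hypothetical intercept function
betas = np.vstack([np.cos(2 * np.pi * t), # hypothetical coefficient functions beta_l(t)
                   t ** 2,
                   np.zeros(T)])          # one null coefficient (a candidate for selection)

X = rng.normal(size=(n, p))               # scalar predictors x_il
eps = 0.2 * rng.normal(size=(n, T))       # iid errors, for simplicity

Y = beta0 + X @ betas + eps               # row i holds y_i(t) evaluated on the grid
print(Y.shape)                            # (100, 50)
```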
  2. 3 Basis expansion
     • The functional coefficient is usually expanded in terms of a basis
     • Several basis options are possible
       • FPC
       • Splines (my preference)
       • Wavelets
       • Fourier
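For the spline option, a small sketch of how a basis matrix $\Phi$ might be built; the number of basis functions, degree, and knot placement are arbitrary choices, and scipy's BSpline is used purely as one possible implementation.

```python
import numpy as np
from scipy.interpolate import BSpline

# Cubic B-spline basis: K basis functions evaluated on a grid of T points in [0, 1]
T, K, degree = 50, 10, 3
t = np.linspace(0, 1, T)

# Knot sequence with repeated boundary knots and equally spaced interior knots
n_interior = K - degree - 1
interior = np.linspace(0, 1, n_interior + 2)[1:-1]
knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])

# Evaluate each basis function on the grid: Phi is T x K
Phi = np.column_stack([
    BSpline(knots, np.eye(K)[k], degree)(t) for k in range(K)
])
print(Phi.shape)   # (50, 10)
```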
  3. 4 Basis expansion
     • For response data on a common finite grid, the model can be expressed
     $Y = X B \Phi^T + E$
     • $Y$ is the matrix of row-stacked responses
     • $X$ is the usual design matrix
     • $\Phi$ is the matrix of basis functions evaluated over the common grid
     • $B$ is the matrix of basis coefficients
     • $E$ is the matrix of row-stacked errors
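A minimal closed-form least-squares sketch for this matrix formulation (not the course code); it assumes X, Y, and Phi have already been built, e.g. as in the sketches above, with any intercept handled by a column of ones in X.

```python
import numpy as np

def fit_fosr_ols(X, Y, Phi):
    """Least-squares estimate of B in Y = X B Phi^T + E (common grid)."""
    tmp = np.linalg.solve(X.T @ X, X.T @ Y @ Phi)    # (X'X)^{-1} X' Y Phi, shape p x K
    return np.linalg.solve(Phi.T @ Phi, tmp.T).T     # post-multiply by (Phi'Phi)^{-1}

# Estimated coefficient functions on the grid are the rows of B_hat @ Phi.T:
# beta_hat = fit_fosr_ols(X, Y, Phi) @ Phi.T
```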
  4. 5 Recast model
     • By vectorizing the response and the linear predictor, we obtain the equivalent model formulation
     $\text{vec}(Y^T) = (X \otimes \Phi)\,\text{vec}(B^T) + \text{vec}(E^T)$
     • $\text{vec}(\cdot)$ concatenates the columns of the matrix argument
     • $\otimes$ is the Kronecker product
     • This reformulates function-on-scalar regression as a usual least-squares problem
     • Goal is to estimate the columns of $B$ or, equivalently, the elements of $\text{vec}(B)$
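A quick numerical check of this identity in the noise-free case, using numpy's kron; the dimensions are arbitrary.

```python
import numpy as np

# Verify vec(Y^T) = (X kron Phi) vec(B^T) when Y = X B Phi^T (no error term)
rng = np.random.default_rng(1)
n, p, T, K = 6, 3, 8, 4
X = rng.normal(size=(n, p))
Phi = rng.normal(size=(T, K))
B = rng.normal(size=(p, K))

Y = X @ B @ Phi.T

vec = lambda M: M.flatten(order="F")      # column-stacking vec()
lhs = vec(Y.T)
rhs = np.kron(X, Phi) @ vec(B.T)
print(np.allclose(lhs, rhs))              # True
```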
  5. 6 Variable selection
     • No limit on the size of $X$; the number of predictors can be quite large
     • Such cases necessitate variable selection in this context
     • As in scalar-on-function regression, variable selection here means $\beta_l(t) = 0 \;\; \forall t$
     • Again, can be accomplished through group variable selection
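One way to implement group selection here is a group lasso over the blocks of basis coefficients, so that zeroing a whole block gives $\beta_l(t) = 0$ for all $t$. The proximal-gradient sketch below illustrates that general idea; it is not the SCAD or LASSO procedures of the papers cited later, and the function name, step size, and iteration count are placeholder choices.

```python
import numpy as np

def group_lasso_fosr(X, Y, Phi, lam, n_iter=500):
    """ISTA-style group lasso on the vectorized FoSR problem.
    Each group is the K basis coefficients of one scalar predictor."""
    n, p = X.shape
    T, K = Phi.shape
    Z = np.kron(X, Phi)                       # design matrix for vec(Y^T)
    y = Y.T.flatten(order="F")                # vec(Y^T), column-stacked
    b = np.zeros(p * K)
    step = 1.0 / np.linalg.norm(Z, 2) ** 2    # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = Z.T @ (Z @ b - y)              # gradient of 0.5 * ||Zb - y||^2
        b = b - step * grad
        for l in range(p):                    # block soft-thresholding, one group per predictor
            blk = slice(l * K, (l + 1) * K)
            nrm = np.linalg.norm(b[blk])
            b[blk] = 0.0 if nrm == 0 else max(0.0, 1 - step * lam / nrm) * b[blk]
    return b.reshape(p, K)                    # row l holds the basis coefficients of beta_l(t)
```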
  6. 8 Sparse or incomplete data
     • The preceding assumed that all curves are observed over the same domain, but this is not always the case
     • Could smooth or interpolate, but this isn’t my preferred solution
     • The Kronecker product representation is essentially a convenience
     • For a subject $i$ observed over times $t_{ij}$ one can instead use
     $y_i(t_{ij}) = \left( x_i \otimes \phi(t_{ij}) \right) \text{vec}(B^T) + \epsilon_i(t_{ij})$
     • Stacking the elements of this model produces a formulation similar to the previous model, but uses subject-specific expansions
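A sketch of how the subject-specific rows might be stacked; `eval_basis` is a hypothetical helper returning the K basis values $\phi(t)$ at a single time point, and the data structures (per-subject lists of observation times and responses) are assumptions.

```python
import numpy as np

def stack_irregular(X, times, responses, eval_basis):
    """Build the row-stacked design and response for irregularly observed curves.
    Each observation (i, j) contributes the design row x_i kron phi(t_ij)."""
    rows, ys = [], []
    for i, (t_i, y_i) in enumerate(zip(times, responses)):     # subject i's own grid
        for t_ij, y_ij in zip(t_i, y_i):
            rows.append(np.kron(X[i], eval_basis(t_ij)))        # length p*K, matches vec(B^T)
            ys.append(y_ij)
    return np.vstack(rows), np.array(ys)                        # pass to least squares / group lasso
```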
  7. 9 Correlated errors
     • Errors $\epsilon_i(t)$ are correlated within a subject, but variable selection methods assume independent errors
     • Three approaches:
       • Ignore this issue
       • Use GLS in place of OLS by “pre-whitening” the left and right sides of the matrix formulation of the model:
         • I.e. define $Y^* = Y (L^{-1})^T$ where $\Sigma = L L^T$ is the error covariance matrix, and similarly modify the RHS
       • Jointly model the coefficient vector and the residual covariance
         • Easiest in a Bayesian setting
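A minimal sketch of the pre-whitening step, assuming a common grid and a known (or previously estimated) T x T error covariance $\Sigma$.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def prewhiten(Y, Phi, Sigma):
    """GLS-style pre-whitening of Y = X B Phi^T + E using Sigma = L L^T."""
    L = cholesky(Sigma, lower=True)                     # Sigma = L L^T
    Y_star = solve_triangular(L, Y.T, lower=True).T     # Y (L^{-1})^T
    Phi_star = solve_triangular(L, Phi, lower=True)     # L^{-1} Phi keeps the form Y* = X B Phi*^T + E*
    return Y_star, Phi_star                             # rows of E* are (approximately) uncorrelated
```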
  8. 10 Smoothness constraints
     • The preceding does not include smoothness constraints on estimated coefficients
     • Such constraints often take the form of a penalty $\lambda_l \int \left[ \beta_l''(t) \right]^2 dt$
     • Can be expressed in terms of a ridge penalty on the basis coefficients
     • Here, this would require the use of composite penalties and additional computational burden
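A sketch of one common way to impose this: approximate the second-derivative penalty with a second-difference matrix on the basis coefficients (P-spline style), with a single shared lambda for simplicity. This is an illustrative shortcut, not necessarily the composite-penalty approach mentioned on the slide.

```python
import numpy as np

def fit_fosr_penalized(X, Y, Phi, lam):
    """Ridge-type roughness penalty on each predictor's basis coefficients."""
    n, p = X.shape
    T, K = Phi.shape
    D = np.diff(np.eye(K), n=2, axis=0)               # (K-2) x K second-difference matrix
    P = D.T @ D                                       # penalty applied to each row of B
    Z = np.kron(X, Phi)
    y = Y.T.flatten(order="F")                        # vec(Y^T)
    A = Z.T @ Z + lam * np.kron(np.eye(p), P)         # penalized normal equations
    b = np.linalg.solve(A, Z.T @ y)
    return b.reshape(p, K)                            # row l holds the penalized coefficients of beta_l(t)
```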
  9. 11 Key references
     • Wang, Chen, and Li (2007). Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics.
     • Chen, Goldsmith, and Ogden (2016). Variable Selection in Function-on-Scalar Regression. Stat.
     • Barber, Reimherr, and Schill (Submitted). The Function-on-Scalar LASSO with Applications to Longitudinal GWAS.
     • Parodi and Reimherr (Submitted). FLAME: Simultaneous variable selection and smoothing for high-dimensional function-on-scalar regression.