
Risk Estimation for the Group Lasso

SPARS'13, EPFL, Lausanne, July 2013

Samuel Vaiter

July 08, 2013

Transcript

  1. Risk Estimation for the Group Lasso (and its degrees of freedom).
    Samuel VAITER, CNRS, Université Paris-Dauphine, France. Joint work with
    C. DELEDALLE, G. PEYRÉ, J. FADILI and C. DOSSAL.
  2.–5. Linear Inverse Problems (built up over four slides). Forward model:
    Y = Φx0 + W, where x0 ∈ ℝ^N is the unknown signal, Φ is a linear
    operator, W is white Gaussian noise, and Y are the observations.
    Realization: y = Φx0 + w. Objective: recover x0 from y. How?
    Construct an estimator x(y).
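As a concrete illustration (not part of the slides), the forward model Y = Φx0 + W can be simulated in a few lines; all dimensions and values below are arbitrary choices of mine.

```python
import numpy as np

# Minimal simulation of the forward model y = Phi x0 + w.
rng = np.random.default_rng(0)
Q, N = 20, 40                                     # observations, signal length
Phi = rng.standard_normal((Q, N)) / np.sqrt(Q)    # linear operator
x0 = np.zeros(N)
x0[:4] = [1.0, -2.0, 0.5, 3.0]                    # sparse unknown signal
sigma = 0.1
w = sigma * rng.standard_normal(Q)                # white Gaussian noise draw
y = Phi @ x0 + w                                  # one observed realization
```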
  6.–8. Variational Regularization. x_λ(y) ∈ argmin over x ∈ ℝ^N of
    (1/2)‖y − Φx‖² + λJ(x), i.e. fidelity + regularity. Objective: find λ
    such that x_λ(y) ≈ x0. How? Risk estimation. Notation:
    μ_λ(y) = Φx_λ(y), μ0 = Φx0.
  9. Risk Estimation. Objective: minimize R_A(λ) over λ ∈ ℝ₊, where
    R_A(λ) = E_W ‖Aμ0 − Aμ_λ(Y)‖². Prediction risk: A = Id gives R_Id(λ).
    Projection risk: A = Φ*(ΦΦ*)⁺, so that AΦ = Φ*(ΦΦ*)⁺Φ = P_{Ker(Φ)⊥}.
  10.–11. (Generalized) Stein Unbiased Risk Estimation. Sensitivity
    analysis: if μ is weakly differentiable, then
    μ(y + ε) = μ(y) + ∂μ(y)·ε + O(‖ε‖²). (Generalized) Stein Unbiased Risk
    Estimator: GSURE_A(y) = ‖Ay − Aμ(y)‖² − σ² tr(AA*) + 2σ² df̂_A(y),
    where df̂_A(y) = tr(A ∂μ(y) A*). Lemma (Stein '81; Eldar '09;
    V. et al. '12): E_W(GSURE_A(Y)) = E_W ‖Aμ0 − Aμ(Y)‖².
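For intuition, the lemma can be sanity-checked numerically in its simplest classical special case, A = Φ = Id with soft-thresholding, where Stein's formula gives df̂(y) = number of entries above the threshold. This is a sketch of mine, not the talk's general setting; all parameters are arbitrary.

```python
import numpy as np

# Monte Carlo check that E[SURE] matches E||mu(Y) - mu0||^2 for
# soft-thresholding denoising (A = Phi = Id, classical Stein case).
rng = np.random.default_rng(1)
N, sigma, lam = 2000, 1.0, 1.5
mu0 = np.concatenate([np.full(100, 5.0), np.zeros(N - 100)])

def soft(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

sures, errs = [], []
for _ in range(200):
    y = mu0 + sigma * rng.standard_normal(N)
    mu = soft(y, lam)
    df_hat = np.count_nonzero(np.abs(y) > lam)    # Stein's df for soft-thresh
    sures.append(np.sum((y - mu) ** 2) - N * sigma**2 + 2 * sigma**2 * df_hat)
    errs.append(np.sum((mu - mu0) ** 2))

print(np.mean(sures), np.mean(errs))   # the two averages should be close
```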
  12.–14. Lasso and Group Lasso. Lasso: supp(x) = {i : x_i ≠ 0} and
    ‖x‖₁ = Σ_{i=1}^N |x_i|, e.g. ‖x‖₁ = |x₁| + |x₂| + |x₃|
    [Dossal et al., 2012]. Group Lasso: for a set of blocks B,
    ‖x‖_B = Σ_{b∈B} ‖x_b‖ and supp_B(x) = {b : x_b ≠ 0}, e.g.
    ‖x‖_B = √(x₁² + x₂²) + |x₃|. Extension to J(x) = Σ_b ‖L_b(x)‖_p,
    where the L_b : ℝ^N → ℝ^{|b|} are linear operators.
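As an aside (not from the slides), the variational problem of slide 6 with J = ‖·‖_B can be minimized by generic proximal gradient descent, since the proximal map of t‖·‖_B is blockwise soft-thresholding. A minimal NumPy sketch, with all names and values my own:

```python
import numpy as np

def block_soft_threshold(z, groups, t):
    # prox of t*||.||_B: shrink each block toward 0, zeroing weak blocks
    out = np.zeros_like(z)
    for b in groups:
        n = np.linalg.norm(z[b])
        if n > t:
            out[b] = (1 - t / n) * z[b]
    return out

def group_lasso(y, Phi, groups, lam, n_iter=500):
    # proximal gradient (ISTA) on 1/2 ||y - Phi x||^2 + lam * ||x||_B
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2       # 1/L, L = ||Phi||_2^2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = block_soft_threshold(x - step * (Phi.T @ (Phi @ x - y)),
                                 groups, lam * step)
    return x

# Denoising example (Phi = Id): the weak block {3,4} is zeroed out
x = group_lasso(np.array([3.0, 4.0, 0.1, 0.1]), np.eye(4),
                [[0, 1], [2, 3]], lam=1.0)
print(x)   # -> approximately [2.4, 3.2, 0, 0]
```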
  15.–17. Local Variations. x(y) ∈ argmin over x ∈ ℝ^N of
    (1/2)‖y − Φx‖² + λJ(x), with μ(y) = Φx(y). Theorem: y ↦ μ(y) is
    Lipschitz (hence weakly differentiable). What's next? Compute ∂μ(y) on
    a set of full measure.
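The Lipschitz behavior can be illustrated numerically (this is an illustration of mine, not the proof) in the denoising case Φ = Id, where μ(y) is the group soft-thresholding map, a proximal operator and hence nonexpansive:

```python
import numpy as np

# Check ||mu(y1) - mu(y2)|| <= ||y1 - y2|| on random pairs (Phi = Id case).
rng = np.random.default_rng(3)

def mu(y, groups, lam):
    x = np.zeros_like(y)
    for b in groups:
        n = np.linalg.norm(y[b])
        if n > lam:
            x[b] = (1 - lam / n) * y[b]
    return x

groups, lam = [[0, 1, 2], [3, 4]], 0.7
ratios = []
for _ in range(1000):
    y1, y2 = rng.standard_normal(5), rng.standard_normal(5)
    num = np.linalg.norm(mu(y1, groups, lam) - mu(y2, groups, lam))
    ratios.append(num / np.linalg.norm(y1 - y2))
print(max(ratios) <= 1 + 1e-12)   # -> True
```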
  18. Local Support Constancy. Lemma: supp_B(x(ȳ)) = supp_B(x(y)) for ȳ
    close to y ∉ H. Transition space: H = ∪_I ∪_{b∉I} H_{I,b}, with
    H_{I,b} = bd π_y {(y, x_I) : ‖Φ_b*(y − Φ_I x_I)‖ = λ and
    Φ_I*(Φ_I x_I − y) + λ N(x_I) = 0}, where N is the normalization
    operator N(x) = (x_g / ‖x_g‖)_{g∈I}.
  19.–20. Structure of the Transition Space. [Figure: the transition space
    H in the plane for ‖·‖₁ and for ‖·‖_B, with regions labeled by support
    size.] Lemma: H has zero measure (more precisely, it is a
    semi-algebraic set).
  21.–23. Differential Computation. x(ȳ) ∈ argmin over x ∈ ℝ^N of
    (1/2)‖ȳ − Φx‖² + λJ(x). First-order condition on I = supp_B(x(y)):
    Γ(x_I(ȳ), ȳ) = Φ_I*(Φ_I x_I(ȳ) − ȳ) + λ N(x_I(ȳ)) = 0. Define
    Δ_I(x_I) = ∂_{x_I} Γ(x_I, ȳ) = Φ_I*Φ_I + λ ∂N(x_I), where ∂N(x_I)
    acts on each block b as P_{x_b⊥} / ‖x_b‖. Lemma: there exists a
    solution x(y) with Δ_I(x_I(y)) invertible and I = supp_B(x(y)).
    Corollary (implicit function theorem): for y ∉ H,
    ∂μ(y) = Φ_I Δ_I(x(y))⁻¹ Φ_I*.
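In the special case where every block is a singleton (the plain lasso), P_{x_b⊥} is zero in one dimension, so the corollary's Jacobian reduces to ∂μ(y) = Φ_I (Φ_I*Φ_I)⁻¹ Φ_I*, the orthogonal projector onto span(Φ_I), whose trace is |I|. A quick numerical sketch of this reduction (the matrix and support below are my own):

```python
import numpy as np

# Singleton-block case: dmu(y) is the projector onto span(Phi_I).
rng = np.random.default_rng(2)
Q, N = 30, 10
Phi = rng.standard_normal((Q, N))
I = [1, 4, 7]                                  # hypothetical support
Phi_I = Phi[:, I]
J = Phi_I @ np.linalg.inv(Phi_I.T @ Phi_I) @ Phi_I.T
print(round(np.trace(J), 6))                   # -> 3.0, i.e. |I|
```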
  24. Degrees of Freedom. Main Theorem: GSURE_A(y) = ‖Ay − Aμ(y)‖²
    − σ² tr(AA*) + 2σ² tr(A Φ_I Δ_I(x⋆)⁻¹ Φ_I* A*) is an unbiased estimate
    of the A-risk, where I = supp_B(x⋆) and x⋆ is any solution such that
    Δ_I(x⋆_I) is invertible.
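The trace term can be cross-checked numerically in the denoising case Φ = A = Id, where x(y) is blockwise soft-thresholding and (treating the blockwise form Δ_b = Id + λ P_{x_b⊥}/‖x_b‖ as an assumption of this sketch) the theorem's trace should equal the trace of the Jacobian of y ↦ μ(y). All values below are my own:

```python
import numpy as np

# Compare the trace formula with a finite-difference Jacobian trace.
lam = 1.0
y = np.array([3.0, 4.0, 2.0, 0.2, 0.1])
groups = [[0, 1], [2], [3, 4]]

def bst(y, groups, lam):
    # blockwise soft-thresholding = the group-lasso solution when Phi = Id
    x = np.zeros_like(y)
    for b in groups:
        n = np.linalg.norm(y[b])
        if n > lam:
            x[b] = (1 - lam / n) * y[b]
    return x

x = bst(y, groups, lam)
df = 0.0                       # sum over active blocks of tr(Delta_b^{-1})
for b in groups:
    xb, n = x[b], np.linalg.norm(x[b])
    if n > 0:
        P = np.eye(len(b)) - np.outer(xb, xb) / n**2   # projector on x_b^perp
        df += np.trace(np.linalg.inv(np.eye(len(b)) + lam * P / n))

eps, tr_fd = 1e-6, 0.0         # finite-difference estimate of tr(dmu(y))
for i in range(len(y)):
    e = np.zeros(len(y)); e[i] = eps
    tr_fd += (bst(y + e, groups, lam) - x)[i] / eps

print(round(df, 4), round(tr_fd, 4))   # the two traces should agree
```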
  25. In Practice: Tomography. Φ: sub-sampled Radon transform, 16
    measures; J(x) = ‖x‖_B. [Figure: Φ⁺y, the reconstruction x_{λopt}(y),
    and the ground truth x0.]
  26. Conclusion. The pipeline: sensitivity analysis → risk estimation →
    parameter selection. Open problems: fast algorithms to compute
    df̂_A(y); extension to piecewise regular gauges.