
ACTIVE SUBSPACES: Emerging Ideas for Dimension Reduction in Parameter Studies

Applied Math Colloquium at CU Boulder, December 4, 2015, and Princeton Program in Applied and Computational Math Colloquium, December 14, 2015.

Paul Constantine

December 04, 2015

Transcript

  1. ACTIVE SUBSPACES: Emerging Ideas for Dimension Reduction in Parameter Studies.
     PAUL CONSTANTINE, Ben L. Fryrear Assistant Professor, Applied Mathematics & Statistics,
     Colorado School of Mines. activesubspaces.org, @DrPaulynomial. Thanks to: David Gleich
     (Purdue, CS) and Qiqi Wang (MIT, AeroAstro). This material is based upon work supported
     by the U.S. Department of Energy Office of Science, Office of Advanced Scientific
     Computing Research, Applied Mathematics program under Award Number DE-SC-0011077.
     DISCLAIMER: These slides are meant to complement the oral presentation. Use out of
     context at your own risk.
  2. Tell me about your models. • Do you work with a computer simulation? • What are the
     inputs? What are the outputs? • What is the question you want to answer?
  3. Applications of $f(x)$: Uncertainty quantification for hypersonic scramjets (with
     G. Iaccarino, J. Larsson, M. Emory). Sensitivity analysis in hydrological models (with
     R. Maxwell, J. Jefferson, J. Gilbert). Shape optimization in aerospace vehicles (with
     J. Alonso, T. Lukaczyk).
  4. More applications of $f(x)$: Sensitivity analysis in solar cell models (with
     B. Zaharatos, M. Campanelli). Sensitivity analysis and reduced order models in HIV
     modeling (with T. Loudon, S. Pankavich). Annular combustor models (with M. Bauerheim,
     S. Moreau, F. Nicoud). Calibration of an atmospheric reentry vehicle model (with
     P. Congedo, T. Magin).
  5. Parameter studies: APPROXIMATION, $\tilde{f}(x) \approx f(x)$; OPTIMIZATION,
     $\mathrm{minimize}_x \; f(x)$; INTEGRATION, $\int f(x)\, \rho \, dx$.
  6. With 10 points per dimension and 1 second per evaluation:

     Dimension | Points    | Time
     ----------|-----------|------------------------------------------------
     1         | 10        | 10 sec
     2         | 100       | ~1.6 min
     3         | 1,000     | ~16 min
     4         | 10,000    | ~2.7 hours
     5         | 100,000   | ~1.1 days
     6         | 1,000,000 | ~1.6 weeks
     …         | …         | …
     20        | 1e20      | ~3 trillion years (240x the age of the universe)

     REDUCED-ORDER MODELS
  7. [Same cost table as slide 6.] BETTER DESIGNS
  8. [Same cost table as slide 6.] DIMENSION REDUCTION
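A quick back-of-the-envelope check of the table above, as a minimal Python sketch; the constants are just the table's own assumptions (10 points per dimension, 1 second per evaluation):

```python
# Cost of a tensor-product parameter study: 10 points per dimension,
# one second per model evaluation.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25

for d in [1, 2, 3, 4, 5, 6, 20]:
    evals = 10.0 ** d                      # grid points = 10^d
    years = evals / SECONDS_PER_YEAR       # at 1 second per evaluation
    print(f"dim {d:2d}: {evals:.0e} evaluations, {years:.2e} years")

# dim 20 gives ~3.2e12 years, roughly 240x the ~1.38e10-year age of the universe.
```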
  9. There are many structure-exploiting methods for high-dimensional approximation.
     STRUCTURE: $f(x) \approx \sum_{k=1}^{p} a_k \phi_k(x)$, $\|a\|_0 \ll p$.
     METHODS: compressed sensing, …
     STRUCTURE: $f(x) \approx f_1(x_1) + \cdots + f_m(x_m)$.
     METHODS: sparse grids, hyperbolic crosses, HDMR / ANOVA, …
     STRUCTURE: $f(x) \approx \sum_{k=1}^{r} f_k(x_1) \cdots f_k(x_m)$.
     METHODS: low-rank methods, separation of variables, TT, ACA, PGD, …
  10. Define the active subspace. Consider a function and its gradient vector:
     $f = f(x)$, $x \in \mathbb{R}^m$, $\nabla f(x) \in \mathbb{R}^m$, with a weight
     function $\rho : \mathbb{R}^m \to \mathbb{R}_+$. The average outer product of the
     gradient and its eigendecomposition:
     $C = \int \nabla f \, \nabla f^T \, \rho \, dx = W \Lambda W^T$.
     Partition the eigendecomposition:
     $\Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}$,
     $W = [\, W_1 \;\; W_2 \,]$, $W_1 \in \mathbb{R}^{m \times n}$.
     Rotate and separate the coordinates:
     $x = W W^T x = W_1 W_1^T x + W_2 W_2^T x = W_1 y + W_2 z$,
     where $y$ are the active variables and $z$ are the inactive variables.
  11. The eigenpairs identify perturbations that change the function more, on average:
     $\lambda_i = \int (\nabla f^T w_i)^2 \, \rho \, dx$, $i = 1, \dots, m$.
     LEMMA: $\int (\nabla_y f)^T (\nabla_y f) \, \rho \, dx = \lambda_1 + \cdots + \lambda_n$.
     LEMMA: $\int (\nabla_z f)^T (\nabla_z f) \, \rho \, dx = \lambda_{n+1} + \cdots + \lambda_m$.
  12. Discover the active subspace with random sampling. Draw samples $x_j \sim \rho$.
     Compute $f_j = f(x_j)$ and $\nabla f_j = \nabla f(x_j)$. Approximate with Monte Carlo:
     $C \approx \frac{1}{N} \sum_{j=1}^{N} \nabla f_j \, \nabla f_j^T = \hat{W} \hat{\Lambda} \hat{W}^T$.
     This is equivalent to the SVD of the samples of the gradient:
     $\frac{1}{\sqrt{N}} [\, \nabla f_1 \; \cdots \; \nabla f_N \,] = \hat{W} \sqrt{\hat{\Lambda}} \, \hat{V}^T$.
     Called an active subspace method in T. Russi's 2010 Ph.D. thesis, Uncertainty
     Quantification with Experimental Data in Complex System Models.
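A minimal NumPy sketch of this sampling procedure (illustrative only, not the activesubspaces.org library API; the toy function and all names are assumptions):

```python
import numpy as np

def estimate_active_subspace(grads):
    """Eigenpairs of C ~ (1/N) sum_j grad_j grad_j^T via an SVD of the gradients.

    grads: (N, m) array whose j-th row is the gradient of f at x_j, x_j ~ rho.
    """
    N = grads.shape[0]
    # SVD of the scaled gradient matrix avoids forming C explicitly;
    # the eigenvalue estimates are the squared singular values.
    _, sv, Vt = np.linalg.svd(grads / np.sqrt(N), full_matrices=False)
    return sv**2, Vt.T

# Toy check: f(x) = sin(w^T x) has a one-dimensional active subspace span{w}.
rng = np.random.default_rng(0)
m, N = 10, 500
w = rng.standard_normal(m)
X = rng.standard_normal((N, m))              # x_j ~ standard Gaussian rho
grads = np.cos(X @ w)[:, None] * w[None, :]  # gradient of f at x_j is cos(w^T x_j) * w
lam, W_hat = estimate_active_subspace(grads)
print(lam[:3])                               # one dominant eigenvalue
print(np.abs(W_hat[:, 0] @ w) / np.linalg.norm(w))  # ~1: first eigenvector aligns with w
```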
  13. How many gradient samples? Using Gittens and Tropp (2011):
     $N = \Omega\!\left( \frac{L^2 \lambda_1}{\lambda_k^2 \, \varepsilon^2} \log(m) \right)
     \;\Rightarrow\; |\lambda_k - \hat{\lambda}_k| \le \varepsilon \, \lambda_k$
     (with high probability), where $L^2$ bounds the gradient norm squared, $\varepsilon$ is
     the relative accuracy, and $m$ is the dimension.
  14. How many gradient samples? Gittens and Tropp (2011), Golub and Van Loan (1996),
     Stewart (1973):
     $N = \Omega\!\left( \frac{L^2}{\lambda_1 \varepsilon^2} \log(m) \right)
     \;\Rightarrow\; \mathrm{dist}(W_1, \hat{W}_1) \le \frac{4 \lambda_1 \varepsilon}{\lambda_n - \lambda_{n+1}}$
     (with high probability), where $L^2$ bounds the gradient norm squared, $\varepsilon$ is
     the relative accuracy, $m$ is the dimension, and $\lambda_n - \lambda_{n+1}$ is the
     spectral gap.
  15. Let's be abundantly clear about the problem we are trying to solve.
     ✖ Low-rank approximation of the collection of gradients:
     $\frac{1}{\sqrt{N}} [\, \nabla f_1 \; \cdots \; \nabla f_N \,] \approx \hat{W}_1 \sqrt{\hat{\Lambda}_1} \, \hat{V}_1^T$.
     ✖ Low-dimensional linear approximation of the gradient:
     $\nabla f(x) \approx \hat{W}_1 \, a(x)$.
     ✔ Approximate a function of many variables by a function of a few linear combinations
     of the variables: $f(x) \approx g\big( \hat{W}_1^T x \big)$.
  16. APPROXIMATION: $f(x) \approx g\big( \hat{W}_1^T x \big)$.
  17. $f(x) \approx g\big( \hat{W}_1^T x \big)$. How do you construct g? What is the
     approximation error? What is the effect of the approximate eigenvectors?
  18. How do you construct g? We know: $f(x) = f(W_1 y + W_2 z)$. How about:
     $g(y) = f(W_1 y)$? So that: $f(x) \approx g(W_1^T x) = f(W_1 W_1^T x)$. But remember:
     $\lambda_i = \int (\nabla f^T w_i)^2 \, \rho \, dx$, so small eigenvalues mean the
     function varies little along the inactive directions only on average, not identically.
     [Figure: contours of a two-variable example on $[-1, 1]^2$.]
  19. [Repeat of the previous slide.]
  20. Exploit active subspaces for response surfaces with conditional averaging.
     Define the conditional expectation:
     $g(y) = \int f(W_1 y + W_2 z) \, \rho(z \mid y) \, dz$, so that $f(x) \approx g(W_1^T x)$.
     THEOREM:
     $\left( \int \big( f(x) - g(W_1^T x) \big)^2 \rho \, dx \right)^{1/2} \le C_P \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2}$.
     Define the Monte Carlo approximation:
     $\hat{g}(y) = \frac{1}{N} \sum_{i=1}^{N} f(W_1 y + W_2 z_i)$, $z_i \sim \rho(z \mid y)$.
     THEOREM:
     $\left( \int \big( f(x) - \hat{g}(W_1^T x) \big)^2 \rho \, dx \right)^{1/2} \le C_P \left( 1 + N^{-1/2} \right) \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2}$.
  21. " = dist(W 1, ˆ W 1) = W 1W

    T 1 ˆ W 1 ˆ W T 1 ✓Z ⇣ f( x ) g( ˆ W T 1 x ) ⌘2 ⇢ d x ◆1 2  CP ⇣ " ( 1 + · · · + n)1 2 + ( n+1 + · · · + m)1 2 ⌘ Subspace error Eigenvalues for active variables Eigenvalues for inactive variables Define the subspace error: THEOREM Exploit active subspaces for response surfaces with conditional averaging.
  22. 1. Choose points in the domain of g. 2. Estimate the conditional average at each
     point. 3. Construct the approximation* in n < m dimensions. (*Don't interpolate! The
     estimated averages carry Monte Carlo noise.) A minimal sketch follows.
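A minimal sketch of the three steps under the Gaussian weight, with an illustrative toy model and an assumed-known W; the toy setup and all names are assumptions, not the author's code:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 1
# Illustrative setup: an orthogonal W = [W1 W2] whose first column spans the
# (assumed known) active direction of a toy model f.
v = np.zeros(m); v[0], v[1] = 1.0, 0.5
Q, _ = np.linalg.qr(np.column_stack([v, rng.standard_normal((m, m - 1))]))
W1, W2 = Q[:, :n], Q[:, n:]
f = lambda x: np.sin(v @ x)                  # depends on x only through v^T x

def g_hat(y, N=20):
    """Step 2: Monte Carlo estimate of g(y) = E[ f(W1 y + W2 z) | y ]."""
    Z = rng.standard_normal((N, m - n))      # z_i ~ rho(z | y), Gaussian case
    return np.mean([f(W1 @ y + W2 @ z) for z in Z])

# Step 1: choose points in the one-dimensional domain of g.
ys = np.linspace(-3.0, 3.0, 21)
# Step 2: noisy conditional averages at each point.
gs = np.array([g_hat(np.array([y])) for y in ys])
# Step 3: a least-squares polynomial fit -- a smoother, not an interpolant,
# because each g_hat(y_i) carries Monte Carlo noise.
coeffs = np.polyfit(ys, gs, deg=5)
print(np.polyval(coeffs, 0.0))               # evaluate the response surface
```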
  23. The parameterized PDE on a domain $\mathcal{D}$ with boundaries $\Gamma_1, \Gamma_2$:
     $-\nabla \cdot (a \nabla u) = 1$, $s \in \mathcal{D}$; $u = 0$, $s \in \Gamma_1$;
     $n \cdot a \nabla u = 0$, $s \in \Gamma_2$.
     The coefficients:
     $\log(a(s, x)) = \sum_{k=1}^{m} \sqrt{\theta_k} \, \phi_k(s) \, x_k$ with
     $\mathrm{Cov}(s_1, s_2) = \sigma^2 \exp\!\left( -\frac{\| s_1 - s_2 \|_1}{2\ell} \right)$:
     a 100-term KL expansion with Gaussian r.v.'s (rough fields).
     The quantity of interest: $f(x) = \int_{\Gamma_2} u(s, x) \, ds$, the spatial average
     over the Neumann boundary.
  24. $f(x)$: the PDE solution's spatial average along the Neumann boundary. $\rho(x)$: the
     standard Gaussian density. $\nabla f(x)$: the gradient computed with adjoint equations.
  25. [Figure: eigenvalues of $\frac{1}{N} \sum_{j=1}^{N} \nabla f_j \nabla f_j^T$ versus
     index, with estimates (Est) and bootstrap intervals (BI), for the long length scale
     $\ell = 1$ and the short length scale $\ell = 0.01$.]
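The BI curves are bootstrap intervals on the eigenvalue estimates. A hedged sketch of one way such intervals can be computed by resampling the gradient samples (the interface and the percentile choice are assumptions):

```python
import numpy as np

def bootstrap_eig_intervals(grads, n_boot=200, rng=np.random.default_rng(2)):
    """95% bootstrap intervals for the eigenvalues of (1/N) sum_j grad_j grad_j^T.

    grads: (N, m) gradient samples, assuming N >= m.
    """
    N, m = grads.shape
    boot = np.empty((n_boot, m))
    for b in range(n_boot):
        idx = rng.integers(0, N, size=N)   # resample the gradients with replacement
        _, sv, _ = np.linalg.svd(grads[idx] / np.sqrt(N), full_matrices=False)
        boot[b] = sv**2                    # eigenvalue estimates for this replicate
    return np.percentile(boot, [2.5, 97.5], axis=0)
```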
  26. [Figure: estimates of the subspace error $\varepsilon = \mathrm{dist}(W_1, \hat{W}_1)$
     versus subspace dimension, with estimates (Est) and bootstrap intervals (BI), for the
     long length scale $\ell = 1$ and the short length scale $\ell = 0.01$.]
  27. [Figure: components of the first eigenvector of
     $\frac{1}{N} \sum_{j=1}^{N} \nabla f_j \nabla f_j^T$ versus parameter index, for the
     long length scale $\ell = 1$ and the short length scale $\ell = 0.01$.]
  28. Remember the goal: $f(x) \approx g(\hat{w}_1^T x)$. [Figure: sufficient summary plots
     of $f(x_j)$ versus $\hat{w}_1^T x_j$ for the long length scale $\ell = 1$ and the short
     length scale $\ell = 0.01$.]
  29. One slide on the Bayesian setup. Additive Gaussian noise model:
     $d = m(x) + e$, $e \sim N(0, \sigma^2 I)$.
     Standard Gaussian prior: $\rho_{\mathrm{pr}}(x) = (2\pi)^{-m/2} \exp(-x^T x / 2)$.
     Likelihood function from the noise model:
     $\rho_{\mathrm{lik}}(x, d) = \exp\!\left( -\| d - m(x) \|^2 / 2\sigma^2 \right)$.
     Bayes' theorem gives the posterior:
     $\rho_{\mathrm{pos}}(x) = c_{\mathrm{pos}}^{-1} \, \rho_{\mathrm{lik}}(x, d) \, \rho_{\mathrm{pr}}(x)$.
  30. Two things define the active subspace. 1. A differentiable, scalar-valued function
     $f(x)$: here the negative log-likelihood, or data misfit,
     $f(x) = \| d - m(x) \|^2 / 2\sigma^2$, with gradient
     $\nabla f(x) = -\frac{1}{\sigma^2} \nabla m(x)^T \big( d - m(x) \big)$.
     2. A weight function for averaging $\rho(x)$: here the standard Gaussian prior density,
     $\rho(x) = \rho_{\mathrm{pr}}(x)$.
  31. Approximate the posterior by exploiting the active subspace. Recall the posterior:
     $\rho_{\mathrm{pos}}(x) = c_{\mathrm{pos}}^{-1} \exp(-f(x)) \, \rho_{\mathrm{pr}}(x)$.
     The Gaussian prior is separable:
     $\rho_{\mathrm{pr}}(x) = \rho_{\mathrm{pr}}(W_1 y + W_2 z) = \rho_{\mathrm{pr}}(y, z) = \rho_{\mathrm{pr}}(y) \, \rho_{\mathrm{pr}}(z)$.
     And here's how we approximate it:
     $\rho_{\mathrm{pos}}(x) \approx \pi(x) = c_{\pi}^{-1} \exp\!\big( -g(W_1^T x) \big) \, \rho_{\mathrm{pr}}(x) = c_{\pi}^{-1} \exp(-g(y)) \, \rho_{\mathrm{pr}}(y) \, \rho_{\mathrm{pr}}(z)$.
  32. Approximate the posterior by exploiting the active subspace:
     $\rho_{\mathrm{pos}}(x) \approx c_{\pi}^{-1} \exp(-g(y)) \, \rho_{\mathrm{pr}}(y) \, \rho_{\mathrm{pr}}(z)$,
     with MCMC in the n < m dimensions of $y$ and independent sampling in the m - n
     dimensions of $z$.
  33. What is the approximation error?
     $\rho_{\mathrm{pos}}(x) \approx c_{\pi}^{-1} \exp(-g(y)) \, \rho_{\mathrm{pr}}(y) \, \rho_{\mathrm{pr}}(z)$.
  34. The approximation error depends on the eigenvalues for the inactive variables.
     THEOREM (Hellinger distance between posterior and approximation):
     $\left( \int \big( \rho_{\mathrm{pos}}^{1/2} - \pi^{1/2} \big)^2 dx \right)^{1/2} \le C \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2}$.
  35. An active subspace-exploiting MCMC. Given $y_k$ and $g(y_k)$:
     1. Draw $y_k'$ from a symmetric proposal centered at $y_k$.
     2. Compute the approximate misfit with Monte Carlo:
     $g(y_k') \approx \frac{1}{N} \sum_{j=1}^{N} f(W_1 y_k' + W_2 z_j)$, $z_j \sim \rho_{\mathrm{pr}}(z)$.
     3. Compute the acceptance ratio
     $\gamma = \frac{\exp(-g(y_k')) \, \rho_{\mathrm{pr}}(y_k')}{\exp(-g(y_k)) \, \rho_{\mathrm{pr}}(y_k)}$.
     4. Draw $t$ from a uniform distribution, and set $y_{k+1} = y_k'$ if $t \le \gamma$,
     and $y_{k+1} = y_k$ otherwise.
     For each $y_k$ from the Markov chain, reconstruct full-space samples:
     $x_{k,\ell} = W_1 y_k + W_2 z_{k,\ell}$, $z_{k,\ell} \sim \rho_{\mathrm{pr}}(z)$.
     A sketch follows.
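A minimal sketch of steps 1 through 4 as a Metropolis-Hastings loop on the active variables, working on the log scale for numerical stability; the toy misfit and all names are illustrative assumptions:

```python
import numpy as np

def as_metropolis(g_hat, y0, n_steps, prop_var, rng=np.random.default_rng(3)):
    """Metropolis-Hastings on the active variables y, per steps 1-4 above.

    Target: pi(y) proportional to exp(-g(y)) * rho_pr(y), standard Gaussian prior.
    On the log scale, the acceptance ratio gamma becomes a difference.
    """
    log_prior = lambda y: -0.5 * (y @ y)       # standard Gaussian, up to a constant
    ys = [np.asarray(y0, dtype=float)]
    log_p = -g_hat(ys[0]) + log_prior(ys[0])
    for _ in range(n_steps - 1):
        # Step 1: symmetric Gaussian proposal centered at the current state.
        y_prop = ys[-1] + np.sqrt(prop_var) * rng.standard_normal(ys[-1].size)
        # Step 2 happens inside g_hat (e.g., a Monte Carlo misfit estimate).
        log_p_prop = -g_hat(y_prop) + log_prior(y_prop)
        # Steps 3-4: accept with probability min(1, gamma).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            ys.append(y_prop); log_p = log_p_prop
        else:
            ys.append(ys[-1])
    return np.array(ys)

# Toy misfit standing in for the Monte Carlo estimate of g:
g_toy = lambda y: 0.5 * (y[0] - 1.0) ** 2
chain = as_metropolis(g_toy, np.zeros(2), 5000, 0.3)
print(chain.mean(axis=0))   # y1 pulled toward the "data"; y2 stays near the prior
```

Full-space posterior samples then come from x = W1 @ y_k + W2 @ z with z drawn independently from the prior, as on the slide.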
  36. Invert for the subsurface with measurements on the surface. • 2nd-order finite
     differences • 100x100 mesh • automatic differentiation. [Figure: the computational
     domain $\mathcal{D}$ with 1-D "sensors" along the surface.]
  37. Invert for the subsurface with measurements on the surface (same setup as the
     previous slide). [Figures: eigenvalues and subspace error for this problem.]
  38. MCMC mixes better in lower dimensions. Standard Metropolis-Hastings (100d):
     500k steps / 500k forward models, proposal variance 0.1, acceptance ~60%.
     Active-subspace accelerated (2d): 50k steps / 500k forward models, proposal variance
     0.3, acceptance ~62%. [Figures: trace plots of the two chains, and autocorrelation
     functions for Vanilla, AS2 with proposal variance 0.1, and AS2 with proposal
     variance 0.3.]
  39. Working in low dimensions introduces some bias. [Figures: centered posterior mean and
     centered posterior variance versus parameter index.]
  40. The 100d chain shows evidence of a 2d active subspace. [Figure: pairwise scatter
     plots of the active variables $y_1, \dots, y_4$ from the 100d chain.]
  41. Integrate in the active variables:
     $\int f(x) \, \rho \, dx
     = \int \underbrace{\left( \int f(W_1 y + W_2 z) \, \rho(z \mid y) \, dz \right)}_{=\, g(y)} \rho(y) \, dy
     = \int g(y) \, \rho(y) \, dy
     \approx \sum_{i=1}^{N} g(y_i) \, w_i
     \approx \sum_{i=1}^{N} \hat{g}(y_i) \, w_i$:
     a quadrature rule in the active variables, with Monte Carlo in the inactive variables.
     A sketch follows.
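A minimal sketch for n = 1 active variable: Gauss-Hermite quadrature (rescaled for the standard Gaussian weight) in y, Monte Carlo over z; the toy f and W are assumptions:

```python
import numpy as np

def integrate_active(f, W1, W2, n_quad=10, n_mc=50, rng=np.random.default_rng(4)):
    """Estimate the integral of f against a standard Gaussian rho, n = 1 active
    variable: Gauss-Hermite quadrature in y, Monte Carlo over the inactive z."""
    t, w = np.polynomial.hermite.hermgauss(n_quad)
    y_nodes, weights = np.sqrt(2.0) * t, w / np.sqrt(np.pi)  # rescale to N(0, 1)
    total = 0.0
    for y, wt in zip(y_nodes, weights):
        Z = rng.standard_normal((n_mc, W2.shape[1]))
        g_hat = np.mean([f(W1[:, 0] * y + W2 @ z) for z in Z])  # inner Monte Carlo
        total += wt * g_hat
    return total

# Toy check: f(x) = (w1^T x)^2 with unit w1 integrates to 1 against N(0, I).
m = 10
Q, _ = np.linalg.qr(np.random.default_rng(5).standard_normal((m, m)))
W1, W2 = Q[:, :1], Q[:, 1:]
print(integrate_active(lambda x: (W1[:, 0] @ x) ** 2, W1, W2))  # ~1.0
```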
  42. There are more questions than answers. • Exploiting symmetries? • Nested node sets?
     • Asymptotics? • Error analysis?
  43. • How do active subspaces relate to [insert method]? • What if I don't have
     gradients? • What kinds of models does this work on? • Can you show some applications?
     • What about multiple quantities of interest? • Tell me about hypercubes. • How new is
     all this? QUESTIONS? PAUL CONSTANTINE, Ben L. Fryrear Assistant Professor, Colorado
     School of Mines. activesubspaces.org, @DrPaulynomial