Active Subspaces: Emerging Ideas in Dimension Reduction for Computational Science Models

Talk at INFORMS2017 in a session on Uncertainty Quantification and Computer Experiments.

Paul Constantine

October 23, 2017

Transcript

  1. ACTIVE SUBSPACES: Emerging ideas for dimension reduction in computational science models

    PAUL CONSTANTINE, Assistant Professor, Department of Computer Science, University of Colorado, Boulder
    activesubspaces.org, @DrPaulynomial
    SLIDES AVAILABLE UPON REQUEST
    DISCLAIMER: These slides are meant to complement the oral presentation. Use out of context at your own risk.
    Thanks to: Qiqi Wang (MIT, AeroAstro), David Gleich (Purdue, CS), Rachel Ward (UT Austin, Math), Armin Eftekhari (Alan Turing Institute), Jeff Hokanson (CU Boulder, CS)
  2. What is dimension reduction?

    Functional of interest: a real-valued functional of the PDE solution $u(s, t; x)$
    PDE solution: $u \in W^{k,2} \subset L^2$
    Space: $s \in \Omega \subset \mathbb{R}^3$
    Time: $t \in [0, T] \subset \mathbb{R}$
    Parameters: $x \in X \subseteq \mathbb{R}^m$
    spatial / temporal: classical physics, mechanics, applied mathematics
    parameter: principal component analysis, Karhunen-Loève
    state: model reduction (POD, reduced basis, empirical interpolation, …); see, e.g., Benner, Cohen, Ohlberger, and Willcox (SIAM, 2017)
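  3. What is dimension reduction?

    Functional of interest: a real-valued functional of the PDE solution $u(s, t; x)$, with $x \in X \subseteq \mathbb{R}^m$, $t \in [0, T] \subset \mathbb{R}$, $s \in \Omega \subset \mathbb{R}^3$, $u \in W^{k,2} \subset L^2$
    Map from parameters to quantity of interest: $x \mapsto f(x)$
    How can we exploit the map from parameters to quantity-of-interest to reduce the parameter dimension?
    ASSUME PARAMETERS ARE INDEPENDENT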
  4. Hypersonic scramjet models: Constantine, Emory, Larsson, and Iaccarino (2015)

    Aerospace design: Lukaczyk, Palacios, Alonso, and Constantine (2014)
    Integrated hydrologic models: Jefferson, Gilbert, Constantine, and Maxwell (2015)
    Solar cell models: Constantine, Zaharatos, and Campanelli (2015)
    Magnetohydrodynamics models: Glaws, Constantine, Shadid, and Wildey (2017)
    Ebola transmission models: Diaz, Constantine, Kalmbach, Jones, and Pankavich (arXiv, 2016)
    Lithium ion battery model: Constantine and Doostan (2017)
    Automobile design: Othmer, Lukaczyk, Constantine, and Alonso (2016)
    Each model is a map $f(x)$.
  5. How many dimensions is high dimensions?

  6. APPROXIMATION: $\tilde{f}(x) \approx f(x)$

    OPTIMIZATION: $\min_x f(x)$
    INTEGRATION: $\int f(x)\, dx$
  7. Number of parameters (the dimension), number of model runs (at 10 points per dimension), and time for parameter study (at 1 second per run):

    Parameters   Model runs   Time
    1            10           10 sec
    2            100          ~ 1.6 min
    3            1,000        ~ 16 min
    4            10,000       ~ 2.7 hours
    5            100,000      ~ 1.1 days
    6            1,000,000    ~ 1.6 weeks
    …            …            …
    20           1e20         3 trillion years (240x age of the universe)

    REDUCED-ORDER MODELS or PARALLEL PROCESSING
  8. Same parameter-study table as slide 7 (10 points per dimension, 1 second per run; 20 parameters need 1e20 runs, about 3 trillion years).

    BETTER DESIGNS or ADAPTIVE SAMPLING
  9. Same parameter-study table as slide 7 (10 points per dimension, 1 second per run; 20 parameters need 1e20 runs, about 3 trillion years).

    CURSED BY DIMENSIONALITY?
    REDUCE THE DIMENSION!
    But how?
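The arithmetic behind the table can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `parameter_study_cost` and the constant `SECONDS_PER_YEAR` are made up here, and the 10 points per dimension and 1 second per run are the assumptions stated on the slide.

```python
def parameter_study_cost(dim, points_per_dim=10, seconds_per_run=1.0):
    """Return (number of runs, total seconds) for a full tensor-grid study."""
    runs = points_per_dim ** dim
    return runs, runs * seconds_per_run

SECONDS_PER_YEAR = 365.25 * 24 * 3600

for dim in (1, 2, 3, 4, 5, 6, 20):
    runs, seconds = parameter_study_cost(dim)
    print(f"m={dim:2d}  runs={runs:.0e}  "
          f"hours={seconds / 3600:.3g}  years={seconds / SECONDS_PER_YEAR:.3g}")
```

The cost grows as 10^m, so at m = 20 the total is on the order of 10^12 years, in line with the last row of the table.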
  10. Structure-exploiting methods

    STRUCTURE: $f(x) \approx \sum_{k=1}^{r} f_{k,1}(x_1) \cdots f_{k,m}(x_m)$
    METHODS: Separation of variables [Beylkin & Mohlenkamp (2005)], Tensor-train [Oseledets (2011)], Adaptive cross approximation [Bebendorf (2011)], Proper generalized decomposition [Chinesta et al. (2011)], …

    STRUCTURE: $f(x) \approx \sum_{k=1}^{p} a_k \psi_k(x)$, $\|a\|_0 \ll p$
    METHODS: Compressed sensing [Donoho (2006), Candès & Wakin (2008)], …

    STRUCTURE: $f(x) \approx f_1(x_1) + \cdots + f_m(x_m)$
    METHODS: Sparse grids [Bungartz & Griebel (2004)], HDMR [Sobol (2003)], ANOVA [Hoeffding (1948)], …
  11. “Even more understanding is lost if we consider each thing we can do to data only in terms of some set of very restrictive assumptions under which that thing is best possible---assumptions we know we CANNOT check in practice.”

    “Many algorithms … aim to diminish the ‘curse of dimensionality.’ Such algorithms take advantage of special properties of the functions being treated, such as alignment with the axes, but their authors do not always emphasize this aspect of their methods.”
  12. www.youtube.com/watch?v=mJvKzjT6lmY

  13. Design a jet nozzle under uncertainty (DARPA SEQUOIA project)

    10-parameter engine performance model (see animation at https://youtu.be/Fek2HstkFVc)
  14. Ridge approximations

    $f(x) \approx g(U^T x)$, where $U^T : \mathbb{R}^m \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}$
    Constantine, Eftekhari, Hokanson, and Ward (2017)
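A small numerical illustration of what a ridge function is may help here. The sketch below builds an exact ridge function with a made-up profile `g` and a random orthonormal `U` (both are assumptions for illustration, not taken from the talk) and checks that moving the input in any direction orthogonal to the columns of `U` does not change the output.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 2

# Hypothetical U (m x n) with orthonormal columns, from a reduced QR factorization.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))

def g(y):
    """Hypothetical low-dimensional profile g: R^n -> R."""
    return np.sin(y[0]) + y[1] ** 2

def f(x):
    """Exact ridge function f(x) = g(U^T x)."""
    return g(U.T @ x)

x = rng.standard_normal(m)
v = rng.standard_normal(m)
v_perp = v - U @ (U.T @ v)        # component of v orthogonal to range(U)

change = abs(f(x + v_perp) - f(x))
print(change)                     # zero up to rounding error
```

In this example f is constant along the m - n = 8 directions orthogonal to the columns of `U`, which is the structure a ridge approximation tries to exploit.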
  15. Ridge approximations: some relevant literature

    $f(x) \approx g(U^T x)$
    Approximation theory: Pinkus (2015), Diaconis and Shahshahani (1984)
    Compressed sensing: Fornasier et al. (2012), Cohen et al. (2012), Tyagi and Cevher (2014)
    Statistical regression: Friedman and Stuetzle (1981), Xia et al. (2002)
    Uncertainty quantification: Tipireddy and Ghanem (2014); Lei et al. (2015); Stoyanov and Webster (2015); Tripathy, Bilionis, and Gonzalez (2016); Li, Lin, and Li (2016); …
  16. Ridge approximations

    $f(x) \approx g(U^T x)$
    What is U?
    What is g?
    What is the approximation error?
    Constantine, Eftekhari, Hokanson, and Ward (2017)
  17. Define the active subspace

    The function, its gradient vector, and a given weight function:
    $f = f(x)$, $x \in \mathbb{R}^m$, $\nabla f(x) \in \mathbb{R}^m$, $\rho : \mathbb{R}^m \to \mathbb{R}_+$
    The average outer product of the gradient and its eigendecomposition:
    $C = \int \nabla f(x)\, \nabla f(x)^T \rho(x)\, dx = W \Lambda W^T$
    Partition the eigendecomposition:
    $\Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}$, $W = \begin{bmatrix} W_1 & W_2 \end{bmatrix}$, $W_1 \in \mathbb{R}^{m \times n}$
    Rotate and separate the coordinates:
    $x = W W^T x = W_1 W_1^T x + W_2 W_2^T x = W_1 y + W_2 z$, with active variables $y$ and inactive variables $z$
    Constantine, Dow, and Wang (2014)
    Some relevant literature:
    Statistical regression: Samarov (1993), Hristache et al. (2001)
    Machine learning: Mukherjee, Wu, and Xiao (2010); Fukumizu and Leng (2014)
    Detection and estimation theory: van Trees (2001)
  18. Define the active subspace

    The function, its gradient vector, and a given weight function:
    $f = f(x)$, $x \in \mathbb{R}^m$, $\nabla f(x) \in \mathbb{R}^m$, $\rho : \mathbb{R}^m \to \mathbb{R}_+$
    The average outer product of the gradient and its eigendecomposition:
    $C = \int \nabla f(x)\, \nabla f(x)^T \rho(x)\, dx = W \Lambda W^T$
    Eigenvalues measure ridge structure with eigenvectors:
    $\lambda_i = \int \left( w_i^T \nabla f(x) \right)^2 \rho(x)\, dx$, $i = 1, \dots, m$
    Each eigenvalue $\lambda_i$ is the average, squared, directional derivative along the eigenvector $w_i$.
    Constantine, Dow, and Wang (2014)
  19. Estimate the active subspace with Monte Carlo

    (1) Draw samples: $x_j \sim \rho(x)$
    (2) Compute: $f_j = f(x_j)$ and $\nabla f_j = \nabla f(x_j)$
    (3) Approximate with Monte Carlo, and compute eigendecomposition:
    $C \approx \frac{1}{N} \sum_{j=1}^{N} \nabla f_j \nabla f_j^T = \hat{W} \hat{\Lambda} \hat{W}^T$
    Equivalent to SVD of samples of the gradient:
    $\frac{1}{\sqrt{N}} \begin{bmatrix} \nabla f_1 & \cdots & \nabla f_N \end{bmatrix} = \hat{W} \sqrt{\hat{\Lambda}}\, \hat{V}^T$
    Called an active subspace method in T. Russi’s 2010 Ph.D. thesis, Uncertainty Quantification with Experimental Data in Complex System Models
    Constantine, Dow, and Wang (2014), Constantine and Gleich (2015, arXiv)
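The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the speaker's implementation: the helper name `estimate_active_subspace`, the test function f(x) = exp(a^T x), and the uniform density on [-1, 1]^m are all assumptions chosen so that the true active subspace is known to be span{a}.

```python
import numpy as np

def estimate_active_subspace(grads):
    """Eigendecomposition of the Monte Carlo estimate of C.

    grads: (N, m) array whose rows are gradient samples at x_j ~ rho.
    Returns eigenvalues in descending order and eigenvectors as columns.
    """
    N = grads.shape[0]
    C_hat = grads.T @ grads / N                 # (1/N) sum_j grad_j grad_j^T
    eigvals, eigvecs = np.linalg.eigh(C_hat)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical test problem: f(x) = exp(a^T x), so grad f(x) = a * exp(a^T x).
rng = np.random.default_rng(0)
m, N = 5, 200
a = np.array([0.7, 0.5, 0.3, 0.2, 0.1])
X = rng.uniform(-1.0, 1.0, size=(N, m))         # step (1): x_j ~ uniform
grads = np.exp(X @ a)[:, None] * a[None, :]     # step (2): rows are grad f(x_j)

eigvals, W_hat = estimate_active_subspace(grads)  # step (3)

# Equivalent route: singular values of the scaled matrix of gradient samples.
_, svals, _ = np.linalg.svd(grads.T / np.sqrt(N), full_matrices=False)
```

For this test function only the first eigenvalue is nonzero (up to rounding) and the first column of `W_hat` is parallel to `a`. The squared singular values match the eigenvalues, which is the SVD equivalence stated on the slide.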
  20. Remember the problem to solve

    Low-rank approximation of the collection of gradients:
    $\frac{1}{\sqrt{N}} \begin{bmatrix} \nabla f_1 & \cdots & \nabla f_N \end{bmatrix} \approx \hat{W}_1 \sqrt{\hat{\Lambda}_1}\, \hat{V}_1^T$
    Low-dimensional linear approximation of the gradient:
    $\mathrm{span}(\hat{W}_1) \approx \{ \nabla f(x) : x \in \mathrm{supp}\, \rho(x) \}$
    Approximate a function of many variables by a function of a few linear combinations of the variables:
    $f(x) \approx g(\hat{W}_1^T x)$
    ✔ ✖ ✖
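  21. The eigenvalues measure the approximation error

    $\| f(x) - \mu(W_1^T x) \|_{L^2(\rho)} \le C \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2}$
    Here $\mu$ is the conditional average, $W_1$ spans the active subspace, $C$ is the Poincaré constant, and $\lambda_{n+1}, \dots, \lambda_m$ are the eigenvalues associated with the inactive subspace.
    Constantine, Dow, and Wang (2014)
    But is that the smallest error?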
  22. Ridge approximations: What is U?

    $f(x) \approx g(U^T x)$
    Define the error function, where $\mu$ is the best approximation:
    $R(U) = \frac{1}{2} \int \left( f(x) - \mu(U^T x) \right)^2 \rho(x)\, dx$
    Minimize the error:
    $\min_U R(U)$ subject to $U \in G(n, m)$, the Grassmann manifold of n-dimensional subspaces
    Constantine, Eftekhari, Hokanson, and Ward (2017)
  23. Estimate the optimal subspace with discrete least squares

    (1) Draw samples: $x_j \sim \rho(x)$
    (2) Compute: $f_j = f(x_j)$
    (3) Minimize the misfit, minimizing over polynomials and subspaces:
    $\min_{g \in \mathbb{P}^p(\mathbb{R}^n),\; U \in G(n,m)} \sum_{j=1}^{N} \left( f_j - g(U^T x_j) \right)^2$
    Constantine, Eftekhari, Hokanson, and Ward (2017), Hokanson and Constantine (2017, arXiv)
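To make the misfit minimization concrete, here is a deliberately simple sketch for m = 2 and n = 1. It is not the algorithm from the cited papers: it swaps in a brute-force grid search over the angle that parametrizes one-dimensional subspaces of R^2, and for each candidate direction it solves the linear least-squares problem for the polynomial coefficients. The test function, sample size, and polynomial degree are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, degree = 100, 3

# Hypothetical data: f(x) = y^3 + y with y = u_true^T x, sampled on [-1, 1]^2.
u_true = np.array([np.cos(0.6), np.sin(0.6)])
X = rng.uniform(-1.0, 1.0, size=(N, 2))         # step (1): draw samples
y_true = X @ u_true
f_vals = y_true ** 3 + y_true                   # step (2): evaluate f

def misfit(theta):
    """Sum of squared residuals of the best degree-p polynomial in y = u^T x."""
    u = np.array([np.cos(theta), np.sin(theta)])
    y = X @ u
    V = np.vander(y, degree + 1)                # columns y^3, y^2, y, 1
    coef, *_ = np.linalg.lstsq(V, f_vals, rcond=None)
    return np.sum((f_vals - V @ coef) ** 2)

# Step (3): one-dimensional subspaces of R^2 are indexed by an angle in [0, pi].
thetas = np.linspace(0.0, np.pi, 721)
best_theta = thetas[np.argmin([misfit(t) for t in thetas])]
u_hat = np.array([np.cos(best_theta), np.sin(best_theta)])
```

The grid search only works because the subspace here is described by a single angle. For larger m and n the problem is a nonconvex optimization over the Grassmann manifold, which needs a dedicated optimizer rather than enumeration.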
  24. Assessing ridge or near-ridge structure

    Derivative-based ideas: eigenvalues reveal structure, eigenvectors give directions
    $\int \nabla f(x)\, \nabla f(x)^T \rho(x)\, dx$: Active subspaces [Constantine et al. (2014), Russi (2010)], Gradient outer product [Mukherjee et al. (2010)], Outer product of gradient [Hristache et al. (2001)]
    $\int \nabla^2 f(x)\, \rho(x)\, dx$: Principal Hessian directions [Li (1992)], Likelihood-informed subspaces [Cui et al. (2014)]
    See Samarov’s average derivative functionals [Samarov (1993)]
    Ideas for approximating these without gradients: finite differences [Constantine & Gleich (2015), Lewis et al. (2016)], polynomial approximations [Yang et al. (2016), Tipireddy & Ghanem (2014)], kernel approximations [Fukumizu & Leng (2014)]
  25. Assessing ridge or near-ridge structure

    Sufficient dimension reduction ideas: eigenvalues reveal structure, eigenvectors give directions
    Sliced inverse regression [Li (1991)]: $E\left[ E[x \mid f]\, E[x \mid f]^T \right]$
    Sliced average variance estimation [Cook & Weisberg (1991)]: $E\left[ (I - \mathrm{Cov}[x \mid f])^2 \right]$
    Contour regression [Li et al. (2005)]: $E\left[ (x_1 - x_2)(x_1 - x_2)^T \,\middle|\, |f(x_1) - f(x_2)| \le \epsilon \right]$
    These are population metrics. Data produces sample estimates.
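As one example of turning a population metric into a sample estimate, the sketch below estimates the sliced inverse regression matrix from data. It follows the textbook recipe (standardize the inputs, slice the samples by sorted output, average within slices) under assumptions of my own choosing: the function name, the number of slices, the Gaussian inputs, and the monotone test function exp(a^T x) are all illustrative.

```python
import numpy as np

def sliced_inverse_regression(X, f_vals, n_slices=10):
    """Sample estimate of E[ E[z|f] E[z|f]^T ] for standardized inputs z.

    Returns eigenvalues (descending) and unit-norm directions in the
    original x coordinates, one direction per column.
    """
    N, m = X.shape
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))   # Cov[x] = L L^T
    Z = np.linalg.solve(L, Xc.T).T                     # z = L^{-1} (x - mean)
    M = np.zeros((m, m))
    for idx in np.array_split(np.argsort(f_vals), n_slices):
        z_bar = Z[idx].mean(axis=0)                    # slice mean of z
        M += (len(idx) / N) * np.outer(z_bar, z_bar)
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    dirs = np.linalg.solve(L.T, eigvecs[:, order])     # back to x coordinates
    return eigvals[order], dirs / np.linalg.norm(dirs, axis=0)

# Hypothetical example: Gaussian inputs and a monotone ridge function.
rng = np.random.default_rng(2)
a = np.array([1.0, 2.0, 0.0, 0.0]) / np.sqrt(5.0)
X = rng.standard_normal((2000, 4))
f_vals = np.exp(X @ a)
sir_vals, sir_dirs = sliced_inverse_regression(X, f_vals)
```

With enough samples the leading direction `sir_dirs[:, 0]` lines up with `a` (up to sign). Sliced inverse regression is known to miss directions along which the function is symmetric, which is one reason the slide lists several alternative metrics.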
  26. Assessing ridge or near-ridge structure

    Optimization ideas: optimum residual suggests structure, optimizer gives directions
    $\min_{g,\, U} \| f(x) - g(U^T x) \|$: Ridge approximation [Constantine et al. (2017)], Minimum average variance estimation [Xia et al. (2002)], Gaussian processes [Vivarelli & Williams (1999), Tripathy et al. (2016)]
    $\min_{g_i,\, u_i} \left\| f(x) - \sum_i g_i(u_i^T x) \right\|$: Projection pursuit regression [Friedman & Stuetzle (1981), Huber (1985)]
    $\max_U E\left[ \| P_U\, \mathrm{Cov}[x \mid f]\, P_U \|_* \right]$: Likelihood-based sufficient dimension reduction [Cook & Forzani (2009)]
    All nonconvex optimizations. Some on Grassmann manifold of subspaces.
  27. Why I like ridge structure

    (1) Exploitable
        + for dimension reduction, not just cheap surrogate
    (2) Insights
        + which variables are important
    (3) Discoverable / checkable
        + eigenvalues
        + non-residual metrics: $E[\, \mathrm{Var}[f \mid U^T x] \,]$
        + plots in 1 and 2d
  28. Do real world models have such structure?

  29. Evidence of active subspaces across science and engineering models

    Hypersonic scramjet models: Constantine, Emory, Larsson, and Iaccarino (2015)
  30. Evidence of active subspaces across science and engineering models

    Integrated jet nozzle models: Alonso, Eldred, Constantine, Duraisamy, Farhat, Iaccarino, and Jakeman (2017)
  31. Evidence of active subspaces across science and engineering models

    Integrated hydrologic models: Jefferson, Gilbert, Constantine, and Maxwell (2015)
  32. Evidence of active subspaces across science and engineering models

    Aerospace vehicle geometries: Lukaczyk, Constantine, Palacios, and Alonso (2014)
    [Figures: Lift vs. Active Variable 1; Drag vs. Active Variable 1]
  33. Evidence of active subspaces across science and engineering models

    In-host HIV dynamical models (T-cell count): Loudon and Pankavich (2016)
  34. Evidence of active subspaces across science and engineering models

    Solar cell circuit models: Constantine, Zaharatos, and Campanelli (2015)
    [Figure: P max (watts) vs. Active Variable 1]
  35. Evidence of active subspaces across science and engineering models

    Atmospheric reentry vehicle model: Cortesi, Constantine, Magin, and Congedo (2017)
    [Figures: Stagnation heat flux $q_{st}$ vs. $\hat{w}_q^T x$; Stagnation pressure $p_{st}$ vs. $\hat{w}_p^T x$]
  36. Evidence of active subspaces across science and engineering models

    Magnetohydrodynamics generator model: Glaws, Constantine, Shadid, and Wildey (2017)
    [Figures: Average velocity $f(x)$ vs. $w_1^T x$; Induced magnetic field $f(x)$ vs. $w_1^T x$]
  37. Evidence of active subspaces across science and engineering models

    Lithium ion battery model: Constantine and Doostan (2017)
    [Figures: Voltage [V] vs. $w^T x$; Capacity [mAh·cm^-2] vs. $w^T x$]
  38. Evidence of active subspaces across science and engineering models

    Automobile geometries: Othmer, Lukaczyk, Constantine, and Alonso (2016)
  39. Evidence of active subspaces across science and engineering models

    $-\nabla \cdot (a \nabla u) = 1$, $s \in D$; $u = 0$, $s \in \Gamma_1$; $n \cdot a \nabla u = 0$, $s \in \Gamma_2$
    Constantine, Dow, and Wang (2014)
    [Figures: input field and solution for short and long correlation lengths; quantity of interest for the long length scale and the short length scale]
  40. f( x ) Jupyter notebooks: github.com/paulcon/as-data-sets

  41. www.siam.org/meetings/dr17/ http://www.siam.org/journals/juq/juq_special.php SUBMISSIONS DUE NOVEMBER 1!!!

  42. QUESTIONS?

    How do the eigenspace-based active subspaces relate to the optimization-based subspaces?
    Why do all those models exhibit similar structure?
    What if my model doesn’t fit your setup? (no gradients, multiple outputs, correlated inputs, …)
    PAUL CONSTANTINE, Assistant Professor, University of Colorado Boulder
    activesubspaces.org, @DrPaulynomial
    Active Subspaces, SIAM (2015)