
Polynomial uncertainty quantification in the age of AI

A perspective on polynomial uncertainty quantification. Presented at VKI.

Pranay Seshadri

October 21, 2018

Transcript

  1. Polynomial uncertainty quantification in the age of A.I. Pranay Seshadri

    University of Cambridge & The Alan Turing Institute Sint-Genesius-Rode, Belgium October 17th, 2018
  2. Today’s polynomial uncertainty quantification landscape ▶ Most companies have polynomial

    chaos tools that go from a user-defined problem to computing statistics. This may include running relevant computational physics modules*. ▶ Some codes have adaptive capabilities to ensure that statistics provided are suitably converged. ▶ In the context of design, such codes and workflows are useful to assess performance of components under uncertainty. *For those that don’t, recommend www.effective-quadratures.org 2
  3. So, what needs to be done? ▶ Better sampling strategies:

    Motivated by the need to integrate parsimoniously, we need quadrature rules for increasingly high-dimensional problems. ▶ Research into correlations: the standard adage that "inputs are assumed independent" does not always hold. This is particularly apparent when our model evaluations are replaced by sensor data. ▶ Automation: we need high-TRL products that streamline the UQ workflow, resulting in automated reports and conversational assistants. – What is the error in my uncertainty quantification study? – Which of my parameters has the greatest contribution to the uncertainty? – If I could run a few more evaluations, where should I run them? 3
  4. Presentation of material ▶ All results, plots and code shown

    in this presentation can be found online: https://github.com/Effective-Quadratures/VKI-Talk ▶ You need Python 2.7 and Effective Quadratures to run the codes. 4
  5. Sampling strategies ▶ Generating quadrature rules is an active research

    area. ▶ Desire to maximize the degree of exactness in a parsimonious way. ▶ Seek alternatives to the more traditional tensor and sparse grids. ▶ Compressed sensing methods assume sparsity in the polynomial basis, an attribute that may be absent from our problems. ▶ Least squares methods do not make any such assumptions, and solution strategies are well-known. A pictorial review 6
  6. Least squares for polynomial uncertainty quantification (e.g., Hampton & Doostan

    2015, Tao, Narayan & Xiu 2016) ▶ Problem: minimize $\|Ax - b\|_2^2$, where $A$ is a Vandermonde-type matrix and $b$ holds weighted function evaluations. ▶ Define $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^{m}$ such that $m \geq n$. ▶ As long as $A^T A$ is not singular, the coefficients $x$ can be computed. ▶ The textbook procedure is to use either the normal equations or a QR factorization. ▶ Now, let's define these quantities. A pictorial review 7
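As a minimal sketch of both textbook routes (not one of the deck's scripts; it assumes $A$ and $b$ have already been assembled):

    import numpy as np

    def least_squares_coefficients(A, b):
        # QR route: numerically preferred over the normal equations.
        Q, R = np.linalg.qr(A)
        x = np.linalg.solve(R, Q.T.dot(b))
        # Normal equations route, valid whenever A^T A is nonsingular:
        # x = np.linalg.solve(A.T.dot(A), A.T.dot(b))
        return x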
  7. Least squares for polynomial uncertainty quantification Definitions of A and

    b: $A(i,j) = \psi_j(\zeta_i)\sqrt{\omega_i}$, and $b(i) = \sqrt{\omega_i}\, f(\zeta_i)$, where we define
    $\zeta$ : a parameter on $\mathbb{R}$ characterized by a probability density function;
    $\rho(\zeta)$ : the probability density function, $\int_{\mathbb{R}} \rho(\zeta)\, d\zeta = 1$; (1)
    $\psi_j(\zeta)$ : a $j$th-order orthogonal polynomial w.r.t. $\rho(\zeta)$, $\int_{\mathbb{R}} \psi_p(\zeta)\, \psi_q(\zeta)\, \rho(\zeta)\, d\zeta = \delta_{pq}$. (2)
    A pictorial review 8
  8. Least squares for polynomial uncertainty quantification Definitions of A and

    b: $A(i,j) = \psi_j(\zeta_i)\sqrt{\omega_i}$, and $b(i) = \sqrt{\omega_i}\, f(\zeta_i)$, where we define
    $\zeta_i$ : discretized values associated with $\zeta$, termed quadrature points;
    $\omega_i$ : discretized values associated with $\rho$, termed quadrature weights;
    $f(\zeta_i)$ : an expensive model evaluation at quadrature point $\zeta_i$.
    A pictorial review 9
  9. Least squares for polynomial uncertainty quantification Definitions of A and

    b: $A(i,j) = \psi_j(\zeta_i)\sqrt{\omega_i}$, and $b(i) = \sqrt{\omega_i}\, f(\zeta_i)$. ▶ As an example, consider the case where $A \in \mathbb{R}^{5 \times 5}$. [Figure: the entries of this $5 \times 5$ matrix $A$.] (3) A pictorial review 10
  10. The Gram matrix ▶ Before we investigate the least squares

    problem further, it will be useful to study the Gram matrix $G := A^T A$. (4) ▶ Each element of the Gram matrix is given by $$G(p, q) = \sum_{i=0}^{m-1} \psi_p(\zeta_i)\, \psi_q(\zeta_i)\, \omega_i \approx \delta_{pq}. \quad (5)$$ A pictorial review 11
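As a concrete sketch of the last few slides (not one of the deck's scripts): assume a uniform density on $[-1, 1]$, so the $\psi_j$ are orthonormal Legendre polynomials, and take a 5-point Gauss-Legendre rule.

    import numpy as np
    from numpy.polynomial.legendre import leggauss, legval

    points, weights = leggauss(5)
    weights = weights / 2.0   # normalize so weights sum to 1 (rho = 1/2 on [-1, 1])

    def psi(j, z):
        # Legendre polynomial of degree j, scaled to be orthonormal w.r.t. rho.
        c = np.zeros(j + 1)
        c[j] = 1.0
        return legval(z, c) * np.sqrt(2.0 * j + 1.0)

    A = np.array([[psi(j, points[i]) * np.sqrt(weights[i])
                   for j in range(5)] for i in range(5)])
    f = lambda z: np.exp(z)            # a stand-in model (assumption)
    b = np.sqrt(weights) * f(points)

    G = A.T.dot(A)                     # Gram matrix; here G = I to machine precision
    print(np.max(np.abs(G - np.eye(5))))

The Gram matrix is exactly the identity here because a 5-point Gauss rule integrates the degree-8 products $\psi_p \psi_q$ exactly.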
  11. The Gram matrix Consider the quadrature rule with order = 4†

    †python file-1.py A pictorial review 12
  12. The Gram matrix Consider the quadrature rule with order = 10*

    *python file-2.py A pictorial review 13
  13. The Gram matrix Consider the quadrature rule with order = 50*

    *python file-3.py A pictorial review 14
  14. The Gram matrix ▶ Visualizing the Gram matrix also gives

    us a sense of the degree of exactness associated with a quadrature rule. ▶ Consider the case where we set the polynomial order p = 20 and m = 11.* *python file-4.py A pictorial review 15
  15. The Gram matrix ▶ The degree of exactness associated with

    an $m$-point Gauss quadrature rule is $2m - 1$. (6) ▶ We are trying to integrate 22nd (and higher) degree polynomials with 11 Gauss quadrature points, but the highest degree permissible is 21! $$G(10, 10) = \sum_{i=0}^{11-1} \psi_{10}(\zeta_i)\, \psi_{10}(\zeta_i)\, \omega_i = 1, \quad (7)$$ but clearly $$G(11, 11) = \sum_{i=0}^{11-1} \psi_{11}(\zeta_i)\, \psi_{11}(\zeta_i)\, \omega_i \neq 1, \quad (8)$$ resulting in the Gram matrix shown before. ▶ The high-level idea is that we want to minimize $\|G - I\|$, where $\|\cdot\|$ is the spectral norm of a matrix, or equivalently $\kappa(G)$. A pictorial review 16
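A sketch of this breakdown, reusing the orthonormal Legendre setup from the earlier sketch:

    import numpy as np
    from numpy.polynomial.legendre import leggauss, legval

    points, weights = leggauss(11)   # degree of exactness: 2 * 11 - 1 = 21
    weights = weights / 2.0

    def psi(j, z):
        c = np.zeros(j + 1)
        c[j] = 1.0
        return legval(z, c) * np.sqrt(2.0 * j + 1.0)

    # psi_10^2 has degree 20 <= 21, so the rule integrates it exactly:
    print(np.sum(psi(10, points) ** 2 * weights))   # ~1.0
    # psi_11^2 has degree 22 > 21, so the rule fails:
    print(np.sum(psi(11, points) ** 2 * weights))   # clearly not 1.0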
  16. Multivariate extensions ▶ Recall our least squares problem: minimize

    $\|Ax - b\|_2^2$. ▶ We now define the multivariate version of $A$: $A(i,j) = \psi_j(\zeta_i)\sqrt{\omega_i}$, where $j \in \mathcal{J}$, $i = 0, \ldots, m-1$. ▶ $\zeta = \left(\zeta^{(1)}, \ldots, \zeta^{(d)}\right)$ is the $d$-dimensional parameter; ▶ $\psi_j(\zeta)$ is the multivariate orthogonal polynomial, written as a product of its composite univariate polynomials: $$\psi_j(\zeta) = \prod_{k=1}^{d} \psi^{(k)}_{j_k}\left(\zeta^{(k)}\right), \quad j = (j_1, \ldots, j_d) \in \mathbb{N}^d. \quad (9)$$ ▶ $\rho$ is the joint density, written as a product of its constituent marginals. A pictorial review 17
  17. Multivariate polynomial basis (e.g., Blatman & Sudret 2011, Trefethen 2017)

    ▶ A plethora of options*, e.g., a total order basis is far more compact, $\sim n^d / d!$, than a tensor (maximal) order basis, $\sim n^d$. *python file-basis.py A pictorial review 18
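To make the cardinalities concrete, a small sketch that enumerates both index sets:

    from itertools import product

    def total_order(d, n):
        # Multi-indices with |j| <= n; cardinality ~ n^d / d! for large n.
        return [j for j in product(range(n + 1), repeat=d) if sum(j) <= n]

    def tensor_order(d, n):
        # Full tensor (maximal order) set; cardinality (n + 1)^d.
        return list(product(range(n + 1), repeat=d))

    print(len(total_order(3, 4)))    # 35
    print(len(tensor_order(3, 4)))   # 125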
  18. Tensor grid quadrature rules (e.g., Xiu & Karniadakis 2002, Gerstner

    & Griebel 2003) ▶ Multivariate extensions of univariate rules exist in the form of tensor product grids, at the cost of $m^d$ function evaluations: $$\int_{\mathbb{R}^d} f(\zeta)\, \rho(\zeta)\, d\zeta \approx \left(Q^{m_1}_q \otimes \cdots \otimes Q^{m_d}_q\right)(f) = \sum_{j_1=1}^{m_1} \cdots \sum_{j_d=1}^{m_d} f\left(\zeta_{j_1}, \ldots, \zeta_{j_d}\right) \omega_{j_1} \cdots \omega_{j_d} = \sum_{j=1}^{m} f(\zeta_j)\, \omega_j. \quad (10)$$ ▶ The notation $Q^{m_i}_q$ denotes a linear operation of the univariate quadrature rule applied to $f$ along direction $i$; the subscript $q$ stands for quadrature. A pictorial review 19
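A sketch of a tensor grid rule for the uniform density on $[-1, 1]^d$, built from univariate Gauss-Legendre rules (not one of the deck's scripts):

    import numpy as np
    from itertools import product
    from numpy.polynomial.legendre import leggauss

    def tensor_grid(orders):
        # Tensor product of univariate Gauss-Legendre rules; m^d point cost.
        rules = [leggauss(m) for m in orders]
        points = np.array(list(product(*[r[0] for r in rules])))
        weights = np.array([np.prod(w)
                            for w in product(*[r[1] / 2.0 for r in rules])])
        return points, weights

    points, weights = tensor_grid([5, 5])
    f = lambda z: np.exp(3.0 * z[:, 0] + z[:, 1])
    print(np.dot(f(points), weights))   # approximates the integral of f against rho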
  19. Sparse grid quadrature rules (e.g., Constantine, Eldred & Phipps 2012,

    Gerstner & Griebel 1998) ▶ Sparse grids offer moderate computational attenuation to this problem. One can think of a sparse grid as a linear combination of select anisotropic tensor grids: $$\int_{\mathbb{R}^d} f(\zeta)\, \rho(\zeta)\, d\zeta \approx \sum_{r \in \mathcal{K}} \alpha(r) \left(Q^{m_1}_q \otimes \cdots \otimes Q^{m_d}_q\right)(f). \quad (11)$$ ▶ Here, for a given level $l$ (a variable that controls the density of points and the highest order of the polynomial in each direction), the multi-index set $\mathcal{K}$ and coefficients $\alpha(r)$ are given by $$\mathcal{K} = \left\{ r \in \mathbb{N}^d : l + 1 \leq |r| \leq l + d \right\} \quad (12)$$ and $$\alpha(r) = (-1)^{d + l - |r|} \binom{d - 1}{d + l - |r|}. \quad (13)$$ A pictorial review 20
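A sketch that enumerates $\mathcal{K}$ and the combination coefficients $\alpha(r)$, assuming (as is conventional for sparse grid levels) that the index components start at 1:

    from itertools import product
    from scipy.special import comb

    def smolyak_indices(d, l):
        # K = { r : l + 1 <= |r| <= l + d }, with coefficients
        # alpha(r) = (-1)^(d + l - |r|) * C(d - 1, d + l - |r|).
        K = []
        for r in product(range(1, l + d + 1), repeat=d):
            if l + 1 <= sum(r) <= l + d:
                k = d + l - sum(r)
                K.append((r, (-1) ** k * comb(d - 1, k, exact=True)))
        return K

    for r, alpha in smolyak_indices(2, 2):
        print(r, alpha)   # tensor grids with |r| = 3 get -1, |r| = 4 get +1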
  20. Comparing tensor and sparse grids ▶ To motivate the use

    of sparse grids, and to realize their limitations, consider $$f(\zeta) = \exp(3\zeta_1 + \zeta_2), \quad \text{where } \zeta \in [-1, 1]^2, \quad (14)$$ and $\rho(\zeta)$ is the uniform distribution over the domain. Figure: Parameter space points (left) from an isotropic tensor grid (maximum order 35) and the polynomial coefficients (right) with a tensor order basis of equivalent order. Total of 1296 function evaluations. A pictorial review 21
  21. Comparing tensor and sparse grids ▶ Repeating the same problem

    with a sparse grid†. Figure: Parameter space points (left) from a sparse grid (linear growth rule, l = 12) and the associated polynomial coefficients (right). Total of 1015 function evaluations. †python tensor-sparse.py A pictorial review 22
  22. Comparing tensor and sparse grids ▶ Now with another sparse

    grid, but with an exponential growth rule‡. Figure: Parameter space points (left) from a sparse grid (exponential growth rule, l = 12) and the associated polynomial coefficients (right). Total of 667 function evaluations. ‡python tensor-sparse.py A pictorial review 23
  23. Summary and objectives moving forward ▶ It is important to

    note that using tensor or sparse grid quadrature rules§ within least squares yields exactly the same coefficients as pseudospectral projection, i.e., $$x = \left(A^T A\right)^{-1} A^T b = A^T b, \quad (15)$$ since such rules render $A^T A = I$. ▶ Assuming $A$ and $b$ are known, solving this is trivial! ▶ However, the true challenge lies in selecting multivariate quadrature points and weights $\{(\zeta_i, \omega_i)\}_{i=0}^{m-1}$ such that $m$ is as close to $n$ as possible. ▶ We will explore different strategies for generating $A$ and techniques for subsampling it. §Care must be taken when using sparse grids; for details see Constantine et al. 2012, CMAME. A pictorial review 24
  24. Outline A pictorial review New quadrature strategies Getting ready for

    CFD 2.0 Things to know New quadrature strategies 25
  25. Classic Monte-Carlo samples ▶ Undoubtedly, the simplest sampling strategy for

    polynomial least squares is to generate random Monte Carlo samples from the joint density $\rho(\zeta)$ in $\mathbb{R}^d$. ▶ A practical sampling heuristic for well-conditioned matrices is $m \geq n^2$ on total order and hyperbolic cross spaces¶. ▶ But surely better strategies must exist? We break our problem into two sub-problems: 1. Selecting a set of samples 2. Subselecting samples via optimization ¶See Migliorati, Nobile, von Schwerin, Tempone, 2014. FCM New quadrature strategies 26
  26. Christoffel samples (e.g., Narayan, Jakeman & Tao 2017.) ▶ Christoffel

    samples significantly outperform standard Monte Carlo (in general). Define the diagonal of the orthogonal projection kernel $$K_n(\zeta) := \sum_{j=1}^{n} \psi_j^2(\zeta), \quad (16)$$ where the subscript $n$ denotes $\mathrm{card}(\mathcal{J})$. ▶ If the domain is bounded with a continuous $\rho$, then we know that $$\lim_{n \to \infty} \frac{n}{K_n(\zeta)} = \frac{\rho(\zeta)}{\nu(\zeta)}, \quad (17)$$ where $\nu(\zeta)$ is the weighted pluripotential equilibrium measure. ▶ This boils down to sampling from $\nu(\zeta)$ and modifying the weight function by a constant. New quadrature strategies 27
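A univariate sketch of the idea on $[-1, 1]$, where the equilibrium measure is the arcsine (Chebyshev) distribution: sample from it rather than from the uniform density itself, and re-weight by the density ratio $\rho / \nu$. (This is a simplified stand-in for the deck's file-5.py.)

    import numpy as np
    from numpy.polynomial.legendre import legval

    def weighted_vandermonde(z, w, n):
        # A[i, j] = psi_j(z_i) * sqrt(w_i), with orthonormal Legendre psi_j.
        cols = []
        for j in range(n):
            c = np.zeros(j + 1)
            c[j] = 1.0
            cols.append(legval(z, c) * np.sqrt(2.0 * j + 1.0))
        return np.array(cols).T * np.sqrt(w)[:, None]

    np.random.seed(0)
    n, m = 51, 500

    # Monte Carlo: sample from the uniform density; weights 1/m.
    z_mc = np.random.uniform(-1.0, 1.0, m)
    A_mc = weighted_vandermonde(z_mc, np.ones(m) / m, n)

    # Arcsine sampling with importance weights rho / nu (endpoints clipped).
    z_cs = np.clip(np.cos(np.pi * np.random.uniform(0.0, 1.0, m)),
                   -1.0 + 1e-12, 1.0 - 1e-12)
    nu = 1.0 / (np.pi * np.sqrt(1.0 - z_cs ** 2))   # arcsine density
    A_cs = weighted_vandermonde(z_cs, 0.5 / (m * nu), n)

    print(np.linalg.cond(A_mc.T.dot(A_mc)))   # large
    print(np.linalg.cond(A_cs.T.dot(A_cs)))   # orders of magnitude smaller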
  27. Monte Carlo sampling ▶ Set n = 51 and m

    = 500 and the sampling density to be the uniform distribution. ▶ This yields κ (G) = 4054668*. *python file-5.py New quadrature strategies 28
  28. Christoffel sampling ▶ Set n = 51 and m =

    200 and the sampling density to be the arcsine distribution. ▶ This yields κ (G) = 11.5*. *python file-5.py New quadrature strategies 29
  29. Monte Carlo vs. Christoffel samples ▶ Set d = 3,

    J to be an isotropic total order basis with a maximum degree of 4. Figure: κ (G) = 8.95 for Monte Carlo (left) and κ (G) = 2.45 for Christoffel samples (right). New quadrature strategies 30
  30. The Gram matrix ▶ In fact as m → ∞

    we are guaranteed that $\kappa(G) \to 1$, but this is hardly practical! ▶ Let's consider $m = 500$ and $n = 11$; the Gram matrix is shown below with $\kappa(G) = 1.618$. ▶ Now, what if we could subselect rows of $A$, such that $\kappa(\tilde{G})$ is as close to 1 as possible? New quadrature strategies 31
  31. Optimization strategies ▶ It will be convenient to interpret our

    objective as identifying a suitable submatrix of $A \in \mathbb{R}^{m \times n}$ by selecting only $k \ll m$ rows. Ideally, we would like $k$ to be as close to $n$ as possible. Formally, we write this submatrix as $$\hat{A}_z = \frac{1}{\tau} \sum_{i=1}^{m} z_i\, e\, a_i^T, \quad \text{where } \mathbf{1}^T z = k, \quad \tau = \sum_{i=1}^{m} z_i \omega_i, \quad \text{and } z_i \in \{0, 1\}. \quad (18)$$ ▶ Here $a_i^T$ represents the rows of $A$, and $e \in \mathbb{R}^k$ and $\mathbf{1} \in \mathbb{R}^m$ are vectors of ones. ▶ The normalization factor $\tau$ is introduced to ensure the weights associated with the subselected quadrature rule sum to one. New quadrature strategies 32
  32. Optimization objectives ▶ Consider some sample objectives Case Objective P1:

    Case | Objective
    P1 | minimize$_z$ $\sigma_{\max}(\hat{A}_z)$
    P2 | maximize$_z$ $\sigma_{\min}(\hat{A}_z)$
    P3 | minimize$_z$ $\kappa(\hat{A}_z)$
    P4 | maximize$_z$ $\mathrm{vol}(\hat{A}_z) = \prod_{i=1}^{k} \sigma_i(\hat{A}_z)$
    P5 | minimize$_z$ $\|\hat{A}_z^T \hat{A}_z - I\|_2$
    ▶ P1 to P4 are provably NP-hard*. ▶ There is theory to suggest that maximizing the determinant (P4) and minimizing the condition number (P3) are likely to yield equivalent solutions†. *See Çivril and Magdon-Ismail 2009. TCS. †Guo, Narayan, Yan, Zhou, 2018. SIAM JSC. New quadrature strategies 33
  33. Optimization solutions: Convex relaxation (e.g., Joshi & Boyd 2009, Seshadri,

    Iaccarino & Ghisu 2018.) ▶ Motivated by the sensor selection problem. ▶ Can be recast as a determinant maximization problem, where the objective is a concave function in $z$ with binary constraints on its entries. ▶ Applied to our problem, this results in $$\underset{z \in \mathbb{R}^{m}}{\text{minimize}} \; -\log\det\left(\frac{1}{\tau} \sum_{i=1}^{m} z_i\, a_i a_i^T\right) - \lambda \sum_{i=1}^{m} \left(\log(z_i) + \log(1 - z_i)\right) \quad \text{subject to } \mathbf{1}^T z = k \text{ and } 0 \leq z_i \leq 1, \; i = 1, \ldots, m. \quad (19)$$ ▶ Here $\lambda$ is a positive constant used to control the quality of the approximation. ▶ Can be solved by Newton's method (complexity $\mathcal{O}(m^3)$)! New quadrature strategies 34
  34. Optimization solutions: Pivoted QR factorization (e.g., Bos, De March, Sommariva

    & Vianello 2010, Seshadri, Narayan & Mahadevan 2018) ▶ The pivoted QR factorization is given by $$A^T P = Q \begin{pmatrix} R_1 & R_2 \end{pmatrix}, \quad (20)$$ where $Q \in \mathbb{R}^{n \times n}$ and $R_1 \in \mathbb{R}^{n \times n}$. (21) ▶ The permutation matrix $P \in \mathbb{R}^{m \times m}$ is based on a pivoting strategy that seeks to either maximize $\sigma_{\min}(R_1)$ or minimize $\sigma_{\max}(R_2)$. ▶ Both these strategies are in some cases equivalent; they greedily maximize the diagonal entries of $R_1$ and serve as heuristics for achieving objectives P1-P4.‡ ‡See Björck 2016 New quadrature strategies 35
  35. Sample pivoted QR implementation (as a runnable Python sketch of the deck's pseudocode, using scipy):

    import numpy as np
    from scipy.linalg import qr

    m, n = A.shape                     # A: the weighted Vandermonde matrix
    # Pivoted QR on the transpose of A; the pivot order ranks the rows of A.
    Q, R, p = qr(A.T, pivoting=True)
    pivots = p[:n]                     # keep the first n pivots
    Az = A[pivots, :]                  # the subselected rows of A
    New quadrature strategies 36
  36. Revisiting the Gram matrix ▶ Recall, for the last Gram

    matrix we studied, m = 500, n = 11 and κ (G) = 1.618. ▶ We deploy QR column pivoting and pick the first 11 pivot indices of A, to yield Az . ▶ κ (Gz) = 1.106! New quadrature strategies 37
  37. Revisiting the Gram matrix ▶ What’s interesting is that these

    points closely resemble the Gauss-Legendre points! New quadrature strategies 38
  38. Higher dimensions ▶ As we increase the dimensionality, the QR

    column pivoting heuristic is able to offer a set of well-conditioned matrices; shown here for $d = 5$ with $\mathcal{J}$ a total order basis with a maximal degree of 4. ▶ Note: we can also add more samples to decrease $\kappa(G)$. New quadrature strategies 39
  40. Scale up in dimension ▶ In practice, we are interested

    in generating quadrature rules on total degree and hyperbolic bases, leveraging the anisotropic structure of the physical problem. ▶ Interpolation and approximation even with total order bases encounter the pervasive challenges of uniqueness and existence (unisolvency). ▶ Key question: how do we interpolate / approximate with these non-tensorial bases, and where should the points be distributed? New quadrature strategies 41
  41. New quadrature strategies (e.g., Tao, Narayan, & Xiu 2016, Seshadri,

    Narayan & Mahadevan 2018) A few recipes (a sketch of the first follows below): ▶ Randomized quadratures: randomly select $\propto n \log(n)$ samples from a tensor grid of the chosen order, then use QR column pivoting to prune the number of samples down to $n$. ▶ Effectively subsampled quadratures: apply QR column pivoting to all samples in the tensor product grid, and then select the points corresponding to the top $n$ pivots. ▶ Christoffel least squares: randomly select $\propto n \log(n)$ samples from the equilibrium measure, then use QR column pivoting to prune the number of samples down to $n$. New quadrature strategies 42
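A sketch of the randomized quadrature recipe, assuming A_tensor is the weighted Vandermonde matrix evaluated on a full tensor grid (the oversampling constant c is an arbitrary choice here, not from the deck):

    import numpy as np
    from scipy.linalg import qr

    def randomized_quadrature_rows(A_tensor, n, c=1.5):
        # Randomly keep ~ c * n * log(n) rows, then prune to n via pivoted QR.
        m = A_tensor.shape[0]
        k = min(m, int(np.ceil(c * n * np.log(n))))
        keep = np.random.choice(m, size=k, replace=False)
        _, _, p = qr(A_tensor[keep, :].T, pivoting=True)
        return keep[p[:n]]   # row indices = the subsampled quadrature points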
  42. Effectively subsampled quadratures A bivariate example with a total degree

    basis, $\zeta_1 \in [-2, 2]$ and $\zeta_2 \in [-2, 2.5]$, for $$f(\zeta_1, \zeta_2) = \zeta_1 - 2\zeta_1^2 \zeta_2 - 3\zeta_1^4 + 88.3. \quad (22)$$ We use Legendre polynomials in both directions and subsample points from a 4th order Gauss-Legendre quadrature tensor grid. Figure: Comparison of the tensor grid (left) with effectively subsampled quadratures (right). The former uses 25 points while the latter uses 12. New quadrature strategies 43
  43. Outline A pictorial review New quadrature strategies Getting ready for

    CFD 2.0 Things to know Getting ready for CFD 2.0 44
  44. CFD 2.0 ▶ Recall, the central equation of this talk:

    minimize $\|Ax - b\|_2^2$. ▶ We have so far assumed that $b$ is fixed. ▶ In CFD 2.0, $b$ is no longer a deterministic vector, as each function call will have $$f(\zeta_i) = f_i \pm f_i', \quad (23)$$ with variability characterized by an interval or distribution*. ▶ Naively, we can assume that the variation in $b \sim \mathcal{N}(\mu_b, \Sigma_b)$ and propagate it through. *Owing to epistemic uncertainty work by Iaccarino, Duraisamy, Gorle, Emory and others Getting ready for CFD 2.0 45
  45. Getting ready for CFD 2.0 ▶ Can derive uncertainty in

    the coefficients $x$: $$\mu_x = P\mu_b \quad \text{and} \quad \Sigma_x = P\Sigma_b P^T, \quad (24)$$ where $P$ is the pseudoinverse of $A$. ▶ These can be used to extract confidence intervals, i.e., polynomial approximation posterior variances*. *python file-new-cfd.py Getting ready for CFD 2.0 46
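A minimal sketch of this propagation step (a simplified stand-in for file-new-cfd.py), assuming $\mu_b$ and $\Sigma_b$ are known:

    import numpy as np

    def coefficient_statistics(A, mu_b, Sigma_b):
        # x = P b with P the pseudoinverse of A, so a Gaussian b maps to a
        # Gaussian x with the moments below.
        P = np.linalg.pinv(A)
        mu_x = P.dot(mu_b)
        Sigma_x = P.dot(Sigma_b).dot(P.T)
        return mu_x, Sigma_x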
  46. What’s next? ▶ Focus of this talk has been: minimize

    $\|Ax - b\|_2^2$. ▶ But there are other approaches for estimating $x$, such as LASSO, basis pursuit, etc. ▶ Regardless of how we compute $x$, estimating moments is trivial: $$\mathbb{E}[f] = x_0 \quad \text{and} \quad \sigma^2[f] = \sum_{i=1}^{n-1} x_i^2. \quad (25)$$ ▶ So what else can we do? Things to know 48
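With an orthonormal basis ($\psi_0 = 1$), the moment formulas are one-liners; a sketch:

    import numpy as np

    def moments(x):
        # The mean is the zeroth coefficient; the variance is the sum of
        # squares of the remaining coefficients.
        x = np.asarray(x, dtype=float)
        return x[0], np.sum(x[1:] ** 2)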
  47. Things to know ▶ Estimate Sobol’ indices ▶ Discover and

    exploit dimension reducing subspaces ▶ Study impact of correlations in inputs ▶ Monte Carlo filtering ▶ Estimate gradients Things to know 49
  48. Things to know: Polynomials can aid dimension reduction (e.g., Hokanson

    & Constantine 2018, Seshadri et al. 2018, Yuchi et al. 2018.) ▶ Can use polynomials to find dimension reducing subspaces; can also construct polynomials on dimension reducing subspaces. ▶ Efficiency demonstrated on CFD simulations with $\zeta \in \mathbb{R}^{25}$ and $m = 250$*. *python active.py Things to know 50
  49. Things to know: Estimating Sobol' indices is straightforward! (e.g., Sudret

    2008.) ▶ One can use polynomial surrogates to compute first order (and total order) Sobol' indices†. †python active.py Things to know 51
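A sketch of the first-order computation from polynomial coefficients (following Sudret 2008), assuming index_set[j] is the multi-index of coefficient x[j]:

    import numpy as np

    def first_order_sobol(x, index_set, d):
        # S_k collects the variance carried by coefficients whose multi-index
        # is active only in dimension k, normalized by the total variance.
        x = np.asarray(x, dtype=float)
        variance = sum(x[j] ** 2 for j, mi in enumerate(index_set) if any(mi))
        S = np.zeros(d)
        for j, mi in enumerate(index_set):
            active = [k for k in range(d) if mi[k] > 0]
            if len(active) == 1:
                S[active[0]] += x[j] ** 2
        return S / variance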