
Polynomial uncertainty quantification in the age of AI

A perspective on polynomial uncertainty quantification. Presented at VKI.

Pranay Seshadri

October 21, 2018

Transcript

  1. Polynomial uncertainty quantification in the age of A.I. Pranay Seshadri

    University of Cambridge & The Alan Turing Institute Sint-Genesius-Rode, Belgium October 17th, 2018
  2. Today’s polynomial uncertainty quantification landscape ▶ Most companies have polynomial

    chaos tools that go from a user-defined problem to computing statistics. This may include running relevant computational physics modules*. ▶ Some codes have adaptive capabilities to ensure that statistics provided are suitably converged. ▶ In the context of design, such codes and workflows are useful to assess performance of components under uncertainty. *For those that don’t, recommend www.effective-quadratures.org 2
  3. So, what needs to be done? ▶ Better sampling strategies:

    Motivated by the need to integrate parsimoniously, we need quadrature rules for increasingly high-dimensional problems. ▶ Research into correlations: the standard adage that "inputs are assumed independent" does not always hold. This is particularly apparent when our model evaluations are replaced by sensor data. ▶ Automation: we need high-TRL products that streamline the UQ workflow, resulting in automated reports and conversational assistants. – What is the error in my uncertainty quantification study? – Which of my parameters has the greatest contribution to the uncertainty? – If I could run a few more evaluations, where should I run them? 3
  4. Presentation of material ▶ All results, plots and code shown

    in this presentation can be found online: https://github.com/Effective-Quadratures/VKI-Talk ▶ You need Python 2.7 and Effective Quadratures to run the codes. 4
  5. Sampling strategies ▶ Generating quadrature rules is an active research

    area. ▶ Desire to maximize the degree of exactness in a parsimonious way. ▶ Seek alternatives to the more traditional tensor and sparse grids. ▶ Compressed sensing methods assume sparsity in the polynomial basis, an attribute that may be absent from our problems. ▶ Least squares methods do not make any such assumptions, and solution strategies are well-known. A pictorial review 6
  6. Least squares for polynomial uncertainty quantification (e.g., Hampton & Doostan

    2015, Tao, Narayan & Xiu 2016) ▶ Problem: minimize $\|Ax - b\|_2^2$, where $A$ is a Vandermonde-type matrix and $b$ holds weighted function evaluations. ▶ Define $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^{m}$ such that $m \geq n$. ▶ As long as $A^T A$ is not singular, the coefficients $x$ can be computed. ▶ The textbook procedure is to use either the normal equations or a QR factorization. ▶ Now, let's define these quantities. A pictorial review 7
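As a minimal sketch of both textbook routes (not one of the deck's scripts; it assumes $A$ and $b$ have already been assembled):

    import numpy as np

    def least_squares_coefficients(A, b):
        # QR route: numerically preferred over the normal equations.
        Q, R = np.linalg.qr(A)
        x = np.linalg.solve(R, Q.T.dot(b))
        # Normal equations route, valid whenever A^T A is nonsingular:
        # x = np.linalg.solve(A.T.dot(A), A.T.dot(b))
        return x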
  7. Least squares for polynomial uncertainty quantification Definitions of A and

    b: $A(i,j) = \psi_j(\zeta_i)\sqrt{\omega_i}$, and $b(i) = \sqrt{\omega_i}\, f(\zeta_i)$, where we define
    $\zeta$ : a parameter on $\mathbb{R}$ characterized by a probability density function;
    $\rho(\zeta)$ : the probability density function, $\int_{\mathbb{R}} \rho(\zeta)\, d\zeta = 1$; (1)
    $\psi_j(\zeta)$ : a $j$th-order orthogonal polynomial w.r.t. $\rho(\zeta)$, $\int_{\mathbb{R}} \psi_p(\zeta)\, \psi_q(\zeta)\, \rho(\zeta)\, d\zeta = \delta_{pq}$. (2)
    A pictorial review 8
  8. Least squares for polynomial uncertainty quantification Definitions of A and

    b: $A(i,j) = \psi_j(\zeta_i)\sqrt{\omega_i}$, and $b(i) = \sqrt{\omega_i}\, f(\zeta_i)$, where we define
    $\zeta_i$ : discretized values associated with $\zeta$, termed quadrature points;
    $\omega_i$ : discretized values associated with $\rho$, termed quadrature weights;
    $f(\zeta_i)$ : an expensive model evaluation at quadrature point $\zeta_i$.
    A pictorial review 9
  9. Least squares for polynomial uncertainty quantification Definitions of A and

    b: $A(i,j) = \psi_j(\zeta_i)\sqrt{\omega_i}$, and $b(i) = \sqrt{\omega_i}\, f(\zeta_i)$. ▶ As an example, consider the case where $A \in \mathbb{R}^{5 \times 5}$. [Figure: the entries of this $5 \times 5$ matrix $A$.] (3) A pictorial review 10
  10. The Gram matrix ▶ Before we investigate the least squares

    problem further, it will be useful to study the Gram matrix $G := A^T A$. (4) ▶ Each element of the Gram matrix is given by $$G(p, q) = \sum_{i=0}^{m-1} \psi_p(\zeta_i)\, \psi_q(\zeta_i)\, \omega_i \approx \delta_{pq}. \quad (5)$$ A pictorial review 11
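As a concrete sketch of the last few slides (not one of the deck's scripts): assume a uniform density on $[-1, 1]$, so the $\psi_j$ are orthonormal Legendre polynomials, and take a 5-point Gauss-Legendre rule.

    import numpy as np
    from numpy.polynomial.legendre import leggauss, legval

    points, weights = leggauss(5)
    weights = weights / 2.0   # normalize so weights sum to 1 (rho = 1/2 on [-1, 1])

    def psi(j, z):
        # Legendre polynomial of degree j, scaled to be orthonormal w.r.t. rho.
        c = np.zeros(j + 1)
        c[j] = 1.0
        return legval(z, c) * np.sqrt(2.0 * j + 1.0)

    A = np.array([[psi(j, points[i]) * np.sqrt(weights[i])
                   for j in range(5)] for i in range(5)])
    f = lambda z: np.exp(z)            # a stand-in model (assumption)
    b = np.sqrt(weights) * f(points)

    G = A.T.dot(A)                     # Gram matrix; here G = I to machine precision
    print(np.max(np.abs(G - np.eye(5))))

The Gram matrix is exactly the identity here because a 5-point Gauss rule integrates the degree-8 products $\psi_p \psi_q$ exactly.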
  11. The Gram matrix Consider the quadrature rule with order = 4†

    †python file-1.py A pictorial review 12
  12. The Gram matrix Consider the quadrature rule with order = 10*

    *python file-2.py A pictorial review 13
  13. The Gram matrix Consider the quadrature rule with order = 50*

    *python file-3.py A pictorial review 14
  14. The Gram matrix ▶ Visualizing the Gram matrix also gives

    us a sense of the degree of exactness associated with a quadrature rule. ▶ Consider the case where we set the polynomial order p = 20 and m = 11.* *python file-4.py A pictorial review 15
  15. The Gram matrix ▶ The degree of exactness associated with

    an $m$-point Gauss quadrature rule is $2m - 1$. (6) ▶ We are trying to integrate 22nd (and higher) degree polynomials with 11 Gauss quadrature points, but the highest degree permissible is 21! $$G(10, 10) = \sum_{i=0}^{11-1} \psi_{10}(\zeta_i)\, \psi_{10}(\zeta_i)\, \omega_i = 1, \quad (7)$$ but clearly $$G(11, 11) = \sum_{i=0}^{11-1} \psi_{11}(\zeta_i)\, \psi_{11}(\zeta_i)\, \omega_i \neq 1, \quad (8)$$ resulting in the Gram matrix shown before. ▶ The high-level idea is that we want to minimize $\|G - I\|$, where $\|\cdot\|$ is the spectral norm of a matrix, or equivalently $\kappa(G)$. A pictorial review 16
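A sketch of this breakdown, reusing the orthonormal Legendre setup from the earlier sketch:

    import numpy as np
    from numpy.polynomial.legendre import leggauss, legval

    points, weights = leggauss(11)   # degree of exactness: 2 * 11 - 1 = 21
    weights = weights / 2.0

    def psi(j, z):
        c = np.zeros(j + 1)
        c[j] = 1.0
        return legval(z, c) * np.sqrt(2.0 * j + 1.0)

    # psi_10^2 has degree 20 <= 21, so the rule integrates it exactly:
    print(np.sum(psi(10, points) ** 2 * weights))   # ~1.0
    # psi_11^2 has degree 22 > 21, so the rule fails:
    print(np.sum(psi(11, points) ** 2 * weights))   # clearly not 1.0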
  16. Multivariate extensions ▶ Recall our least squares problem: minimize

    $\|Ax - b\|_2^2$. ▶ We now define the multivariate version of $A$: $A(i,j) = \psi_j(\zeta_i)\sqrt{\omega_i}$, where $j \in \mathcal{J}$, $i = 0, \ldots, m-1$. ▶ $\zeta = \left(\zeta^{(1)}, \ldots, \zeta^{(d)}\right)$ is the $d$-dimensional parameter; ▶ $\psi_j(\zeta)$ is the multivariate orthogonal polynomial, written as a product of its composite univariate polynomials: $$\psi_j(\zeta) = \prod_{k=1}^{d} \psi^{(k)}_{j_k}\left(\zeta^{(k)}\right), \quad j = (j_1, \ldots, j_d) \in \mathbb{N}^d. \quad (9)$$ ▶ $\rho$ is the joint density, written as a product of its constituent marginals. A pictorial review 17
  17. Multivariate polynomial basis (e.g., Blatman & Sudret 2011, Trefethen 2017)

    ▶ A plethora of options*, e.g., a total order basis is far more compact, $\sim n^d / d!$, than a tensor (maximal) order basis, $\sim n^d$. *python file-basis.py A pictorial review 18
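To make the cardinalities concrete, a small sketch that enumerates both index sets:

    from itertools import product

    def total_order(d, n):
        # Multi-indices with |j| <= n; cardinality ~ n^d / d! for large n.
        return [j for j in product(range(n + 1), repeat=d) if sum(j) <= n]

    def tensor_order(d, n):
        # Full tensor (maximal order) set; cardinality (n + 1)^d.
        return list(product(range(n + 1), repeat=d))

    print(len(total_order(3, 4)))    # 35
    print(len(tensor_order(3, 4)))   # 125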
  18. Tensor grid quadrature rules (e.g., Xiu & Karniadakis 2002, Gerstner

    & Griebel 2003) ▶ Multivariate extensions of univariate rules exist in the form of tensor product grids, at the cost of $m^d$ function evaluations: $$\int_{\mathbb{R}^d} f(\zeta)\, \rho(\zeta)\, d\zeta \approx \left(Q^{m_1}_q \otimes \cdots \otimes Q^{m_d}_q\right)(f) = \sum_{j_1=1}^{m_1} \cdots \sum_{j_d=1}^{m_d} f\left(\zeta_{j_1}, \ldots, \zeta_{j_d}\right) \omega_{j_1} \cdots \omega_{j_d} = \sum_{j=1}^{m} f(\zeta_j)\, \omega_j. \quad (10)$$ ▶ The notation $Q^{m_i}_q$ denotes a linear operation of the univariate quadrature rule applied to $f$ along direction $i$; the subscript $q$ stands for quadrature. A pictorial review 19
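A sketch of a tensor grid rule for the uniform density on $[-1, 1]^d$, built from univariate Gauss-Legendre rules (not one of the deck's scripts):

    import numpy as np
    from itertools import product
    from numpy.polynomial.legendre import leggauss

    def tensor_grid(orders):
        # Tensor product of univariate Gauss-Legendre rules; m^d point cost.
        rules = [leggauss(m) for m in orders]
        points = np.array(list(product(*[r[0] for r in rules])))
        weights = np.array([np.prod(w)
                            for w in product(*[r[1] / 2.0 for r in rules])])
        return points, weights

    points, weights = tensor_grid([5, 5])
    f = lambda z: np.exp(3.0 * z[:, 0] + z[:, 1])
    print(np.dot(f(points), weights))   # approximates the integral of f against rho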
  19. Sparse grid quadrature rules (e.g., Constantine, Eldred & Phipps 2012,

    Gerstner & Griebel 1998) ▶ Sparse grids offer moderate computational attenuation to this problem. One can think of a sparse grid as a linear combination of select anisotropic tensor grids: $$\int_{\mathbb{R}^d} f(\zeta)\, \rho(\zeta)\, d\zeta \approx \sum_{r \in \mathcal{K}} \alpha(r) \left(Q^{m_1}_q \otimes \cdots \otimes Q^{m_d}_q\right)(f). \quad (11)$$ ▶ Here, for a given level $l$ (a variable that controls the density of points and the highest order of the polynomial in each direction), the multi-index set $\mathcal{K}$ and coefficients $\alpha(r)$ are given by $$\mathcal{K} = \left\{ r \in \mathbb{N}^d : l + 1 \leq |r| \leq l + d \right\} \quad (12)$$ and $$\alpha(r) = (-1)^{d + l - |r|} \binom{d - 1}{d + l - |r|}. \quad (13)$$ A pictorial review 20
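A sketch that enumerates $\mathcal{K}$ and the combination coefficients $\alpha(r)$, assuming (as is conventional for sparse grid levels) that the index components start at 1:

    from itertools import product
    from scipy.special import comb

    def smolyak_indices(d, l):
        # K = { r : l + 1 <= |r| <= l + d }, with coefficients
        # alpha(r) = (-1)^(d + l - |r|) * C(d - 1, d + l - |r|).
        K = []
        for r in product(range(1, l + d + 1), repeat=d):
            if l + 1 <= sum(r) <= l + d:
                k = d + l - sum(r)
                K.append((r, (-1) ** k * comb(d - 1, k, exact=True)))
        return K

    for r, alpha in smolyak_indices(2, 2):
        print(r, alpha)   # tensor grids with |r| = 3 get -1, |r| = 4 get +1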
  20. Comparing tensor and sparse grids ▶ To motivate the use

    of sparse grids, and to realize their limitations, consider $$f(\zeta) = \exp(3\zeta_1 + \zeta_2), \quad \text{where } \zeta \in [-1, 1]^2, \quad (14)$$ and $\rho(\zeta)$ is the uniform distribution over the domain. Figure: Parameter space points (left) from an isotropic tensor grid (maximum order 35) and the polynomial coefficients (right) with a tensor order basis of equivalent order. Total of 1296 function evaluations. A pictorial review 21
  21. Comparing tensor and sparse grids ▶ Repeating the same problem

    with a sparse grid†. Figure: Parameter space points (left) from a sparse grid (linear growth rule, l = 12) and the associated polynomial coefficients (right). Total of 1015 function evaluations. †python tensor-sparse.py A pictorial review 22
  22. Comparing tensor and sparse grids ▶ Now with another sparse

    grid, but with an exponential growth rule‡. Figure: Parameter space points (left) from a sparse grid (exponential growth rule, l = 12) and the associated polynomial coefficients (right). Total of 667 function evaluations. ‡python tensor-sparse.py A pictorial review 23
  23. Summary and objectives moving forward ▶ It is important to

    note that using tensor or sparse grid quadrature rules§ within least squares yields exactly the same coefficients as pseudospectral projection, i.e., $$x = \left(A^T A\right)^{-1} A^T b = A^T b, \quad (15)$$ since such rules render $A^T A = I$. ▶ Assuming $A$ and $b$ are known, solving this is trivial! ▶ However, the true challenge lies in selecting multivariate quadrature points and weights $\{(\zeta_i, \omega_i)\}_{i=0}^{m-1}$ such that $m$ is as close to $n$ as possible. ▶ We will explore different strategies for generating $A$ and techniques for subsampling it. §Care must be taken when using sparse grids; for details see Constantine et al. 2012, CMAME. A pictorial review 24
  24. Outline A pictorial review New quadrature strategies Getting ready for

    CFD 2.0 Things to know New quadrature strategies 25
  25. Classic Monte-Carlo samples ▶ Undoubtedly, the simplest sampling strategy for

    polynomial least squares is to generate random Monte Carlo samples from the joint density $\rho(\zeta)$ in $\mathbb{R}^d$. ▶ A practical sampling heuristic for well-conditioned matrices is $m \geq n^2$ on total order and hyperbolic cross spaces¶. ▶ But surely better strategies must exist? We break our problem into two sub-problems: 1. Selecting a set of samples 2. Subselecting samples via optimization ¶See Migliorati, Nobile, von Schwerin, Tempone, 2014. FCM New quadrature strategies 26
  26. Christoffel samples (e.g., Narayan, Jakeman & Tao 2017.) ▶ Christoffel

    samples significantly outperform standard Monte Carlo (in general). Define the diagonal of the orthogonal projection kernel $$K_n(\zeta) := \sum_{j=1}^{n} \psi_j^2(\zeta), \quad (16)$$ where the subscript $n$ denotes $\mathrm{card}(\mathcal{J})$. ▶ If the domain is bounded with a continuous $\rho$, then we know that $$\lim_{n \to \infty} \frac{n}{K_n(\zeta)} = \frac{\rho(\zeta)}{\nu(\zeta)}, \quad (17)$$ where $\nu(\zeta)$ is the weighted pluripotential equilibrium measure. ▶ This boils down to sampling from $\nu(\zeta)$ and modifying the weight function by a constant. New quadrature strategies 27
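A univariate sketch of the idea on $[-1, 1]$, where the equilibrium measure is the arcsine (Chebyshev) distribution: sample from it rather than from the uniform density itself, and re-weight by the density ratio $\rho / \nu$. (This is a simplified stand-in for the deck's file-5.py.)

    import numpy as np
    from numpy.polynomial.legendre import legval

    def weighted_vandermonde(z, w, n):
        # A[i, j] = psi_j(z_i) * sqrt(w_i), with orthonormal Legendre psi_j.
        cols = []
        for j in range(n):
            c = np.zeros(j + 1)
            c[j] = 1.0
            cols.append(legval(z, c) * np.sqrt(2.0 * j + 1.0))
        return np.array(cols).T * np.sqrt(w)[:, None]

    np.random.seed(0)
    n, m = 51, 500

    # Monte Carlo: sample from the uniform density; weights 1/m.
    z_mc = np.random.uniform(-1.0, 1.0, m)
    A_mc = weighted_vandermonde(z_mc, np.ones(m) / m, n)

    # Arcsine sampling with importance weights rho / nu (endpoints clipped).
    z_cs = np.clip(np.cos(np.pi * np.random.uniform(0.0, 1.0, m)),
                   -1.0 + 1e-12, 1.0 - 1e-12)
    nu = 1.0 / (np.pi * np.sqrt(1.0 - z_cs ** 2))   # arcsine density
    A_cs = weighted_vandermonde(z_cs, 0.5 / (m * nu), n)

    print(np.linalg.cond(A_mc.T.dot(A_mc)))   # large
    print(np.linalg.cond(A_cs.T.dot(A_cs)))   # orders of magnitude smaller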
  27. Monte Carlo sampling ▶ Set n = 51 and m

    = 500 and the sampling density to be the uniform distribution. ▶ This yields κ (G) = 4054668*. *python file-5.py New quadrature strategies 28
  28. Christoffel sampling ▶ Set n = 51 and m =

    200 and the sampling density to be the arcsine distribution. ▶ This yields κ (G) = 11.5*. *python file-5.py New quadrature strategies 29
  29. Monte Carlo vs. Christoffel samples ▶ Set d = 3,

    J to be an isotropic total order basis with a maximum degree of 4. Figure: κ (G) = 8.95 for Monte Carlo (left) and κ (G) = 2.45 for Christoffel samples (right). New quadrature strategies 30
  30. The Gram matrix ▶ In fact as m → ∞

    we are guaranteed that $\kappa(G) \to 1$, but this is hardly practical! ▶ Let's consider $m = 500$ and $n = 11$; the Gram matrix is shown below with $\kappa(G) = 1.618$. ▶ Now, what if we could subselect rows of $A$, such that $\kappa(\tilde{G})$ is as close to 1 as possible? New quadrature strategies 31
  31. Optimization strategies ▶ It will be convenient to interpret our

    objective as identifying a suitable submatrix of $A \in \mathbb{R}^{m \times n}$ by selecting only $k \ll m$ rows. Ideally, we would like $k$ to be as close to $n$ as possible. Formally, we write this submatrix as $$\hat{A}_z = \frac{1}{\tau} \sum_{i=1}^{m} z_i\, e\, a_i^T, \quad \text{where } \mathbf{1}^T z = k, \quad \tau = \sum_{i=1}^{m} z_i \omega_i, \quad \text{and } z_i \in \{0, 1\}. \quad (18)$$ ▶ Here $a_i^T$ represents the rows of $A$, and $e \in \mathbb{R}^k$ and $\mathbf{1} \in \mathbb{R}^m$ are vectors of ones. ▶ The normalization factor $\tau$ is introduced to ensure the weights associated with the subselected quadrature rule sum to one. New quadrature strategies 32
  32. Optimization objectives ▶ Consider some sample objectives Case Objective P1:

    Case | Objective
    P1 | minimize$_z$ $\sigma_{\max}(\hat{A}_z)$
    P2 | maximize$_z$ $\sigma_{\min}(\hat{A}_z)$
    P3 | minimize$_z$ $\kappa(\hat{A}_z)$
    P4 | maximize$_z$ $\mathrm{vol}(\hat{A}_z) = \prod_{i=1}^{k} \sigma_i(\hat{A}_z)$
    P5 | minimize$_z$ $\|\hat{A}_z^T \hat{A}_z - I\|_2$
    ▶ P1 to P4 are provably NP-hard*. ▶ There is theory to suggest that maximizing the determinant (P4) and minimizing the condition number (P3) are likely to yield equivalent solutions†. *See Çivril and Magdon-Ismail 2009. TCS. †Guo, Narayan, Yan, Zhou, 2018. SIAM JSC. New quadrature strategies 33
  33. Optimization solutions: Convex relaxation (e.g., Joshi & Boyd 2009, Seshadri,

    Iaccarino & Ghisu 2018.) ▶ Motivated by the sensor selection problem. ▶ Can be recast as a determinant maximization problem, where the objective is a concave function in $z$ with binary constraints on its entries. ▶ Applied to our problem, this results in $$\underset{z \in \mathbb{R}^{m}}{\text{minimize}} \; -\log\det\left(\frac{1}{\tau} \sum_{i=1}^{m} z_i\, a_i a_i^T\right) - \lambda \sum_{i=1}^{m} \left(\log(z_i) + \log(1 - z_i)\right) \quad \text{subject to } \mathbf{1}^T z = k \text{ and } 0 \leq z_i \leq 1, \; i = 1, \ldots, m. \quad (19)$$ ▶ Here $\lambda$ is a positive constant used to control the quality of the approximation. ▶ Can be solved by Newton's method (complexity $\mathcal{O}(m^3)$)! New quadrature strategies 34
  34. Optimization solutions: Pivoted QR factorization (e.g., Bos, De March, Sommariva

    & Vianello 2010, Seshadri, Narayan & Mahadevan 2018) ▶ The pivoted QR factorization is given by $$A^T P = Q \begin{pmatrix} R_1 & R_2 \end{pmatrix}, \quad (20)$$ where $Q \in \mathbb{R}^{n \times n}$ and $R_1 \in \mathbb{R}^{n \times n}$. (21) ▶ The permutation matrix $P \in \mathbb{R}^{m \times m}$ is based on a pivoting strategy that seeks to either maximize $\sigma_{\min}(R_1)$ or minimize $\sigma_{\max}(R_2)$. ▶ Both these strategies are in some cases equivalent; they greedily maximize the diagonal entries of $R_1$ and serve as heuristics for achieving objectives P1-P4.‡ ‡See Björck 2016 New quadrature strategies 35
  35. Sample pivoted QR implementation (as a runnable Python sketch of the deck's pseudocode, using scipy):

    import numpy as np
    from scipy.linalg import qr

    m, n = A.shape                     # A: the weighted Vandermonde matrix
    # Pivoted QR on the transpose of A; the pivot order ranks the rows of A.
    Q, R, p = qr(A.T, pivoting=True)
    pivots = p[:n]                     # keep the first n pivots
    Az = A[pivots, :]                  # the subselected rows of A
    New quadrature strategies 36
  36. Revisiting the Gram matrix ▶ Recall, for the last Gram

    matrix we studied, m = 500, n = 11 and κ (G) = 1.618. ▶ We deploy QR column pivoting and pick the first 11 pivot indices of A, to yield Az . ▶ κ (Gz) = 1.106! New quadrature strategies 37
  37. Revisiting the Gram matrix ▶ What’s interesting is that these

    points closely resemble the Gauss-Legendre points! New quadrature strategies 38
  38. Higher dimensions ▶ As we increase the dimensionality, the QR

    column pivoting heuristic is able to offer a set of well-conditioned matrices; shown here for $d = 5$ with $\mathcal{J}$ a total order basis with a maximal degree of 4. ▶ Note: we can also add more samples to decrease $\kappa(G)$. New quadrature strategies 39
  40. Scale up in dimension ▶ In practice, we are interested

    in generating quadrature rules on total degree and hyperbolic bases, leveraging the anisotropic structure of the physical problem. ▶ Interpolation and approximation even with total order bases encounter the pervasive challenges of uniqueness and existence (unisolvency). ▶ Key question: how do we interpolate / approximate with these non-tensorial bases, and where should the points be distributed? New quadrature strategies 41
  41. New quadrature strategies (e.g., Tao, Narayan, & Xiu 2016, Seshadri,

    Narayan & Mahadevan 2018) A few recipes (a sketch of the first follows below): ▶ Randomized quadratures: randomly select $\propto n \log(n)$ samples from a tensor grid of the chosen order, then use QR column pivoting to prune the number of samples down to $n$. ▶ Effectively subsampled quadratures: apply QR column pivoting to all samples in the tensor product grid, and then select the points corresponding to the top $n$ pivots. ▶ Christoffel least squares: randomly select $\propto n \log(n)$ samples from the equilibrium measure, then use QR column pivoting to prune the number of samples down to $n$. New quadrature strategies 42
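A sketch of the randomized quadrature recipe, assuming A_tensor is the weighted Vandermonde matrix evaluated on a full tensor grid (the oversampling constant c is an arbitrary choice here, not from the deck):

    import numpy as np
    from scipy.linalg import qr

    def randomized_quadrature_rows(A_tensor, n, c=1.5):
        # Randomly keep ~ c * n * log(n) rows, then prune to n via pivoted QR.
        m = A_tensor.shape[0]
        k = min(m, int(np.ceil(c * n * np.log(n))))
        keep = np.random.choice(m, size=k, replace=False)
        _, _, p = qr(A_tensor[keep, :].T, pivoting=True)
        return keep[p[:n]]   # row indices = the subsampled quadrature points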
  42. Effectively subsampled quadratures A bivariate example with a total degree

    basis, $\zeta_1 \in [-2, 2]$ and $\zeta_2 \in [-2, 2.5]$, for $$f(\zeta_1, \zeta_2) = \zeta_1 - 2\zeta_1^2 \zeta_2 - 3\zeta_1^4 + 88.3. \quad (22)$$ We use Legendre polynomials in both directions and subsample points from a 4th order Gauss-Legendre quadrature tensor grid. Figure: Comparison of the tensor grid (left) with effectively subsampled quadratures (right). The former uses 25 points while the latter uses 12. New quadrature strategies 43
  43. Outline A pictorial review New quadrature strategies Getting ready for

    CFD 2.0 Things to know Getting ready for CFD 2.0 44
  44. CFD 2.0 ▶ Recall, the central equation of this talk:

    minimize $\|Ax - b\|_2^2$. ▶ We have so far assumed that $b$ is fixed. ▶ In CFD 2.0, $b$ is no longer a deterministic vector, as each function call will have $$f(\zeta_i) = f_i \pm f_i', \quad (23)$$ with variability characterized by an interval or distribution*. ▶ Naively, we can assume that the variation in $b \sim \mathcal{N}(\mu_b, \Sigma_b)$ and propagate it through. *Owing to epistemic uncertainty work by Iaccarino, Duraisamy, Gorle, Emory and others Getting ready for CFD 2.0 45
  45. Getting ready for CFD 2.0 ▶ Can derive uncertainty in

    the coefficients $x$: $$\mu_x = P\mu_b \quad \text{and} \quad \Sigma_x = P\Sigma_b P^T, \quad (24)$$ where $P$ is the pseudoinverse of $A$. ▶ These can be used to extract confidence intervals, i.e., polynomial approximation posterior variances*. *python file-new-cfd.py Getting ready for CFD 2.0 46
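A minimal sketch of this propagation step (a simplified stand-in for file-new-cfd.py), assuming $\mu_b$ and $\Sigma_b$ are known:

    import numpy as np

    def coefficient_statistics(A, mu_b, Sigma_b):
        # x = P b with P the pseudoinverse of A, so a Gaussian b maps to a
        # Gaussian x with the moments below.
        P = np.linalg.pinv(A)
        mu_x = P.dot(mu_b)
        Sigma_x = P.dot(Sigma_b).dot(P.T)
        return mu_x, Sigma_x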
  46. What’s next? ▶ Focus of this talk has been: minimize

    $\|Ax - b\|_2^2$. ▶ But there are other approaches for estimating $x$, such as LASSO, basis pursuit, etc. ▶ Regardless of how we compute $x$, estimating moments is trivial: $$\mathbb{E}[f] = x_0 \quad \text{and} \quad \sigma^2[f] = \sum_{i=1}^{n-1} x_i^2. \quad (25)$$ ▶ So what else can we do? Things to know 48
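With an orthonormal basis ($\psi_0 = 1$), the moment formulas are one-liners; a sketch:

    import numpy as np

    def moments(x):
        # The mean is the zeroth coefficient; the variance is the sum of
        # squares of the remaining coefficients.
        x = np.asarray(x, dtype=float)
        return x[0], np.sum(x[1:] ** 2)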
  47. Things to know ▶ Estimate Sobol’ indices ▶ Discover and

    exploit dimension reducing subspaces ▶ Study impact of correlations in inputs ▶ Monte Carlo filtering ▶ Estimate gradients Things to know 49
  48. Things to know: Polynomials can aid dimension reduction (e.g., Hokanson

    & Constantine 2018, Seshadri et al. 2018, Yuchi et al. 2018.) ▶ Can use polynomials to find dimension reducing subspaces; can also construct polynomials on dimension reducing subspaces. ▶ Efficiency demonstrated on CFD simulations with $\zeta \in \mathbb{R}^{25}$ and $m = 250$*. *python active.py Things to know 50
  49. Things to know: Estimating Sobol' indices is straightforward! (e.g., Sudret

    2008.) ▶ One can use polynomial surrogates to compute first order (and total order) Sobol' indices†. †python active.py Things to know 51
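A sketch of the first-order computation from polynomial coefficients (following Sudret 2008), assuming index_set[j] is the multi-index of coefficient x[j]:

    import numpy as np

    def first_order_sobol(x, index_set, d):
        # S_k collects the variance carried by coefficients whose multi-index
        # is active only in dimension k, normalized by the total variance.
        x = np.asarray(x, dtype=float)
        variance = sum(x[j] ** 2 for j, mi in enumerate(index_set) if any(mi))
        S = np.zeros(d)
        for j, mi in enumerate(index_set):
            active = [k for k in range(d) if mi[k] > 0]
            if len(active) == 1:
                S[active[0]] += x[j] ** 2
        return S / variance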