Slide 1

Slide 1 text

Adaptive Probabilistic Numerical Methods Using Fast Transforms
Fred J. Hickernell & Jagadeeswaran Rathinavel
Department of Applied Mathematics, Center for Interdisciplinary Scientific Computation, Illinois Institute of Technology
hickernell@iit.edu, mypages.iit.edu/~hickernell
Thanks to the organizers, the GAIL team, NSF-DMS-1522687, and NSF-DMS-1638521 (SAMSI)
SAMSI-Lloyds-Turing Workshop on Probabilistic Numerical Methods, April 11, 2018

Slide 2

Slide 2 text

When Do We Stop?

Compute the solution to a linear problem:
\[
\mathrm{sol}(f) = \begin{cases}
\displaystyle\int_{\mathbb{R}^d} f(\mathbf{x})\,\varrho(\mathbf{x})\,\mathrm{d}\mathbf{x} & \text{Bayesian inference, financial risk, statistical physics, \ldots} \\
f(\mathbf{x}) & \text{surrogate modeling for computer experiments, \ldots} \\
\quad\vdots
\end{cases}
\]
Desired solution: an adaptive algorithm $\mathrm{app}(\cdot,\cdot)$ of the form
\[
\mathrm{app}(f,\varepsilon) = w_{0,n} + \sum_{i=1}^{n} w_{i,n}\, f(\mathbf{x}_i),
\]
where $n$, $\{\mathbf{x}_i\}_{i=1}^{\infty}$, $w_{0,n}$, and $\mathbf{w} = (w_{i,n})_{i=1}^{n}$ are chosen to guarantee $|\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon)| \le \varepsilon$ with high probability, for all $\varepsilon > 0$ and reasonable $f$.
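To make the shape of such an adaptive algorithm concrete, here is a minimal Python sketch of the generic loop, assuming hypothetical helpers next_design and weights_and_bound (the Bayesian machinery on the following slides supplies those quantities); the doubling schedule and helper names are illustrative, not from the slides.

```python
import numpy as np

def adaptive_app(f, eps, next_design, weights_and_bound, n0=64, n_max=2**20):
    """Generic adaptive loop: grow n until a probabilistic error bound <= eps.

    next_design(n)          -> n sample points (hypothetical helper)
    weights_and_bound(X, y) -> (w0, w, bound)  (hypothetical helper)
    """
    n = n0
    while True:
        X = next_design(n)                  # e.g., first n points of a lattice
        y = np.array([f(x) for x in X])     # function data
        w0, w, bound = weights_and_bound(X, y)
        if bound <= eps or n >= n_max:      # stop when the bound is small enough
            return w0 + w @ y
        n *= 2                              # otherwise double the sample size
```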

Slide 3

Slide 3 text

The Probabilistic Numerics Approach

Assume $f \sim \mathcal{GP}(m, s^2 C_\theta)$, a sample from a Gaussian process. Defining
\[
c_0 = \mathrm{sol}_{\cdot}\bigl(\mathrm{sol}_{\cdot\cdot}(C_\theta(\cdot,\cdot\cdot))\bigr), \quad
\mathbf{c} = \bigl(\mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_1)), \ldots, \mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_n))\bigr)^T, \quad
\mathsf{C} = \bigl(C_\theta(\mathbf{x}_i,\mathbf{x}_j)\bigr)_{i,j=1}^{n},
\]
and choosing the weights as
\[
w_0 = m\bigl[\mathrm{sol}(1) - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{1}\bigr], \quad \mathbf{w} = \mathsf{C}^{-1}\mathbf{c}, \quad
\mathrm{app}(f,\varepsilon) = w_0 + \mathbf{w}^T\mathbf{f}, \quad \mathbf{f} = \bigl(f(\mathbf{x}_i)\bigr)_{i=1}^{n},
\]
yields an unbiased approximation:
\[
\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon) \,\big|\, \mathbf{f} = \mathbf{y} \;\sim\; \mathcal{N}\bigl(0,\; s^2(c_0 - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c})\bigr).
\]
If $n$ is chosen large enough to make $2.58\,s\sqrt{c_0 - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c}} \le \varepsilon$, then we are assured that
\[
\mathbb{P}_f\bigl[\,|\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon)| \le \varepsilon\,\bigr] \ge 99\%.
\]
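As a concreteness check, here is a minimal NumPy sketch of these formulas for integration over $[0,1]^d$ against the uniform density, using the lattice-style kernel that appears later in the talk (for which $\mathrm{sol}(1) = 1$, $c_0 = 1$, and $\mathbf{c} = \mathbf{1}$ in closed form); the design, toy integrand, and $\theta_1$ are illustrative choices, and $m$, $s^2$ are the plug-in MLE values developed on the next slides.

```python
import numpy as np

def b2(x):
    """Bernoulli polynomial B_2 on [0, 1)."""
    return x**2 - x + 1.0/6.0

def kernel(X, T, theta1=1.0):
    """Shift-invariant product kernel (the talk's lattice kernel, theta2 = 1)."""
    D = np.mod(X[:, None, :] - T[None, :, :], 1.0)
    return np.prod(1.0 + theta1 * b2(D), axis=2)

rng = np.random.default_rng(0)
n, d = 256, 2
X = rng.random((n, d))                       # any design; IID here for simplicity
y = 1.0 + np.cos(2*np.pi*X).sum(axis=1)      # toy integrand data
C = kernel(X, X)
c0, c, one = 1.0, np.ones(n), np.ones(n)     # closed-form kernel integrals here
Cinv_c = np.linalg.solve(C, c)
Cinv_1 = np.linalg.solve(C, one)
Cinv_y = np.linalg.solve(C, y)

m = (one @ Cinv_y) / (one @ Cinv_1)          # plug-in mean (its MLE; see below)
w0 = m * (1.0 - c @ Cinv_1)                  # sol(1) = 1 for integration
app = w0 + Cinv_c @ y                        # w^T f with w = C^{-1} c
s2 = (y @ Cinv_y - (one @ Cinv_y)**2 / (one @ Cinv_1)) / n   # s^2 MLE (see below)
err_bound = 2.58 * np.sqrt(s2 * (c0 - c @ Cinv_c))           # compare against eps
```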

Slide 4

Slide 4 text

The Probabilistic Numerics Approach

(The same construction as the previous slide.) There are issues requiring attention.

Slide 5

Slide 5 text

Maximum Likelihood Estimation

Minimize minus twice the log-likelihood of the observed data $\mathbf{f} = \mathbf{y}$:
\[
2n \log(s) + \log(\det(\mathsf{C})) + s^{-2}(\mathbf{y} - m\mathbf{1})^T \mathsf{C}^{-1} (\mathbf{y} - m\mathbf{1}),
\]
first with respect to $m$, then $s$, then $\theta$:
\[
m_{\mathrm{MLE}} = \frac{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{y}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}}, \qquad
s^2_{\mathrm{MLE}} = \frac{1}{n}\, \mathbf{y}^T \left[ \mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}} \right] \mathbf{y},
\]
\[
\theta_{\mathrm{MLE}} = \underset{\theta}{\operatorname{argmin}} \left\{ n \log\left( \mathbf{y}^T \left[ \mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}} \right] \mathbf{y} \right) + \log(\det(\mathsf{C})) \right\}.
\]
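A direct O(n^3) transcription of these formulas, as a minimal sketch (the fast-transform version later in the talk avoids forming C explicitly; the helper name is mine):

```python
import numpy as np

def mle_pieces(C, y):
    """m_MLE, s2_MLE, and the profile objective to minimize over theta,
    for a fixed symmetric positive-definite kernel matrix C."""
    n = len(y)
    one = np.ones(n)
    Cinv_y = np.linalg.solve(C, y)
    Cinv_1 = np.linalg.solve(C, one)
    m_mle = (one @ Cinv_y) / (one @ Cinv_1)
    quad = y @ Cinv_y - (one @ Cinv_y)**2 / (one @ Cinv_1)  # y^T [C^-1 - ...] y
    s2_mle = quad / n
    _, logdet = np.linalg.slogdet(C)        # stable log(det(C)) for PD matrices
    objective = n * np.log(quad) + logdet   # minimize this over theta
    return m_mle, s2_mle, objective
```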

Slide 6

Slide 6 text

Maximum Likelihood Estimation

(The MLE formulas carry over from the previous slide.) The stopping criterion becomes
\[
2.58 \sqrt{ \underbrace{\frac{c_0 - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c}}{n}}_{\text{depends on design}} \;\; \underbrace{\mathbf{y}^T \left[ \mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}} \right] \mathbf{y}}_{\text{depends on data}} } \;\le\; \varepsilon.
\]
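The criterion itself is one line given the pieces above; a minimal sketch, again at O(n^3) cost:

```python
import numpy as np

def stop_ok(C, y, c0, c, eps):
    """Evaluate the MLE-based stopping rule 2.58*sqrt(design * data) <= eps."""
    n = len(y)
    one = np.ones(n)
    Cinv_c = np.linalg.solve(C, c)
    Cinv_y = np.linalg.solve(C, y)
    Cinv_1 = np.linalg.solve(C, one)
    design = (c0 - c @ Cinv_c) / n
    data = y @ Cinv_y - (one @ Cinv_y)**2 / (one @ Cinv_1)
    return 2.58 * np.sqrt(design * data) <= eps
```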

Slide 7

Slide 7 text

Maximum Likelihood Estimation

(The MLE formulas and stopping criterion repeat from the previous slide.)

Q1: How large a family of kernels $C_\theta$ is needed in practice to be confident in the error bound?
Q2: Are there better ways of finding the right $\theta$, say cross-validation?
Q3: Can we check our assumption that $f$ really comes from a Gaussian process? If not, are there alternatives?

Slide 8

Slide 8 text

Low Discrepancy Sampling

Suppose that the domain is $[0,1]^d$. Low discrepancy sampling places the $\mathbf{x}_i$ more evenly than IID sampling.

[Plots: IID points; Sobol' points; integration lattice points; ...]

Dick, J. et al. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013); H., F. J. et al. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics, https://www.samsi.info/programs-and-activities/year-long-research-programs/2017-18-program-quasi-monte-carlo-high-dimensional-sampling-methods-applied-mathematics-qmc/.
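A short sketch of how one might generate the three point sets shown, using scipy.stats.qmc for the Sobol' points; the rank-1 lattice generating vector g below is an illustrative choice, not necessarily a good one and not taken from the talk.

```python
import numpy as np
from scipy.stats import qmc

n, d = 256, 2
iid = np.random.default_rng(0).random((n, d))          # IID uniform points
sobol = qmc.Sobol(d, scramble=True, seed=0).random(n)  # scrambled Sobol' points

# Rank-1 integration lattice: x_i = frac(i * g / n), i = 0, ..., n-1.
g = np.array([1, 89])                                  # illustrative generator
lattice = np.mod(np.outer(np.arange(n), g) / n, 1.0)
```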

Slide 9

Slide 9 text

Covariance Kernels that Match the Design

Suppose that the covariance kernel $C_\theta$ and the design $\{\mathbf{x}_i\}_{i=1}^{n}$ have special properties:
\[
\mathsf{C} = \bigl(C_\theta(\mathbf{x}_i,\mathbf{x}_j)\bigr)_{i,j=1}^{n} = (\mathbf{C}_1, \ldots, \mathbf{C}_n) = \frac{1}{n}\mathsf{V}\Lambda\mathsf{V}^H, \qquad \mathsf{V}^H = n\mathsf{V}^{-1}, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) = \mathrm{diag}(\boldsymbol{\lambda}),
\]
\[
\mathsf{V} = (\mathbf{V}_1 \cdots \mathbf{V}_n) = (\mathbf{v}_1 \cdots \mathbf{v}_n)^T, \qquad \mathbf{V}_1 = \mathbf{v}_1 = \mathbf{1}.
\]
Suppose that $\mathsf{V}^T\mathbf{z}$ is a fast transform ($O(n \log n)$ cost) applied to $\mathbf{z}$. Then it follows that $\boldsymbol{\lambda} = \mathsf{V}^T\mathbf{C}_1$ is fast to compute and $\mathsf{C}^{-1}\mathbf{1} = \mathbf{1}/\lambda_1$.

Let $\mathbf{y}$ be the observed function values. Recall $c_0 = \mathrm{sol}_{\cdot}(\mathrm{sol}_{\cdot\cdot}(C_\theta(\cdot,\cdot\cdot)))$, and let
\[
\hat{\mathbf{y}} = \mathsf{V}^T\mathbf{y}, \qquad \hat{\mathbf{c}} = \mathsf{V}^T\mathbf{c}, \qquad \mathbf{c} = \bigl(\mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_1)), \ldots, \mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_n))\bigr)^T.
\]
Then using the MLE estimates, the approximate solution and the stopping criterion become:
\[
\mathrm{app}(f,\varepsilon) = \frac{\hat{y}_1\,\mathrm{sol}(1)}{n} + \sum_{i=2}^{n} \frac{\hat{y}_i^{*}\,\hat{c}_i}{\lambda_i}, \qquad
2.58 \sqrt{ \left( c_0 - \frac{1}{n}\sum_{i=1}^{n} \frac{|\hat{c}_i|^2}{\lambda_i} \right) \frac{1}{n^2} \sum_{i=2}^{n} \frac{|\hat{y}_i|^2}{\lambda_i} } \;\le\; \varepsilon.
\]
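For a shift-invariant kernel sampled on a rank-1 lattice, the Gram matrix is circulant, so V is (up to a conjugation convention) the unnormalized DFT and everything above reduces to FFTs. A minimal sketch, assuming the B2-based lattice kernel from later in the talk; the generating vector and integrand are illustrative.

```python
import numpy as np

def b2(x):
    """Bernoulli polynomial B_2 on [0, 1)."""
    return x**2 - x + 1.0/6.0

n, theta1 = 2**10, 1.0
g = np.array([1, 89])                            # illustrative generating vector
X = np.mod(np.outer(np.arange(n), g) / n, 1.0)   # rank-1 lattice in [0,1)^2

# The Gram matrix is circulant: C[i, j] depends only on (i - j) mod n, so the
# FFT of its first column gives the eigenvalues (real, since C1 is symmetric;
# only |.|^2 and real parts enter the formulas below).
C1 = np.prod(1.0 + theta1 * b2(X), axis=1)       # C_theta(x_i, x_0), with x_0 = 0
lam = np.fft.fft(C1).real                        # eigenvalues lambda = V^T C_1
y = 1.0 + np.cos(2*np.pi*X).sum(axis=1)          # toy integrand data
y_hat = np.fft.fft(y)                            # fast transform, O(n log n)
```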

Slide 10

Slide 10 text

Covariance Kernels that Match the Design

(The fast-transform setup carries over from the previous slide.) In addition,
\[
\theta_{\mathrm{MLE}} = \underset{\theta}{\operatorname{argmin}} \left\{ n \log\left( \sum_{i=2}^{n} \frac{|\hat{y}_i|^2}{\lambda_i} \right) + \sum_{i=1}^{n} \log(\lambda_i) \right\}.
\]
For integration with respect to a density and our special kernels, $\mathrm{sol}(1) = 1$, $c_0 = 1$, and $\mathbf{c} = \mathbf{1}$:
\[
\mathrm{app}(f,\varepsilon) = \frac{\hat{y}_1}{n}, \qquad
2.58 \sqrt{ \left( 1 - \frac{n}{\lambda_1} \right) \frac{1}{n^2} \sum_{i=2}^{n} \frac{|\hat{y}_i|^2}{\lambda_i} } \;\le\; \varepsilon.
\]
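Continuing the FFT sketch above (it reuses n, lam, and y_hat from that snippet), the integration case collapses to a few vectorized lines:

```python
# O(n log n) cubature and stopping criterion when sol(1) = 1, c0 = 1, c = 1.
app = y_hat[0].real / n                          # app(f, eps) = y_hat_1 / n
data = (np.abs(y_hat[1:])**2 / lam[1:]).sum() / n**2
design = 1.0 - n / lam[0]                        # subject to round-off; see Q4
err_bound = 2.58 * np.sqrt(max(design, 0.0) * data)

# Profile objective for theta_MLE (to be minimized over theta1):
obj = n * np.log((np.abs(y_hat[1:])**2 / lam[1:]).sum()) + np.log(lam).sum()
```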

Slide 11

Slide 11 text

Covariance Kernels that Match the Design

(The fast-transform formulas repeat from the previous slide.)

Q4: How do we avoid round-off in evaluating $1 - n/\lambda_1$?
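The talk leaves Q4 open; one plausible remedy (my assumption, not stated on the slide) is to accumulate $\lambda_1 - n = \sum_i (\tilde{C}(\mathbf{x}_i) - 1)$ directly, since each summand can be formed without subtracting nearly equal numbers:

```python
import numpy as np

def lam1_minus_n(X, theta1=1.0):
    """Accumulate lambda_1 - n = sum_i (Ctilde(x_i) - 1) without cancellation.

    Uses Ctilde(x) - 1 = expm1(sum_k log1p(theta1 * B2(x_k))), valid while
    theta1 * B2 > -1 (B2 >= -1/12 on [0,1), so theta1 < 12 suffices).
    This is one plausible approach, not the talk's answer to Q4.
    """
    u = theta1 * (X**2 - X + 1.0/6.0)        # theta1 * B2, coordinatewise
    return np.expm1(np.log1p(u).sum(axis=1)).sum()

# Then 1 - n/lambda_1 = delta / (n + delta) with delta = lam1_minus_n(X),
# avoiding the catastrophic cancellation when lambda_1 is close to n.
```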

Slide 12

Slide 12 text

Form of Matching Covariance Kernels

Typically the domain of $f$ is $[0,1)^d$, and
\[
C(\mathbf{x},\mathbf{t}) = \begin{cases}
\tilde{C}(\mathbf{x} - \mathbf{t} \bmod 1) & \text{integration lattices} \\
\tilde{C}(\mathbf{x} \oplus \mathbf{t}) & \text{Sobol' sequences, } \oplus \text{ means digitwise addition modulo 2}
\end{cases}
\]
E.g., for integration lattices
\[
C(\mathbf{x},\mathbf{t}) = \prod_{k=1}^{d} \bigl[ 1 - \theta_1 (-1)^{\theta_2} B_{2\theta_2}(x_k - t_k \bmod 1) \bigr], \qquad \theta_1 > 0, \; \theta_2 \in \mathbb{N}.
\]
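A minimal sketch of this lattice kernel for the two lowest-order cases, using the standard Bernoulli polynomials B2 and B4 (the function names are mine):

```python
import numpy as np

def bernoulli_poly(x, degree):
    """B_2 and B_4, i.e., the theta2 = 1, 2 cases of B_{2*theta2}."""
    if degree == 2:
        return x**2 - x + 1.0/6.0
    if degree == 4:
        return x**4 - 2*x**3 + x**2 - 1.0/30.0
    raise NotImplementedError("only B_2 and B_4 implemented here")

def lattice_kernel(x, t, theta1=1.0, theta2=1):
    """C(x, t) = prod_k [1 - theta1*(-1)^theta2 * B_{2 theta2}((x_k - t_k) mod 1)]."""
    z = np.mod(np.asarray(x) - np.asarray(t), 1.0)
    return np.prod(1.0 - theta1 * (-1)**theta2 * bernoulli_poly(z, 2*theta2))
```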

Slide 13

Slide 13 text

Form of Matching Covariance Kernels

(The kernel forms repeat from the previous slide.)

Q5: How do we periodize $f$ to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation?
Q6: May we get higher order convergence with higher order nets and smoother kernels? What do those kernels look like?
Q7: Are low discrepancy designs also good for function approximation?
Q8: Are there other kernel/design combinations that expedite vector-matrix operations?

Slide 14

Slide 14 text

Option Pricing

\[
\mu = \text{fair price} = \int_{\mathbb{R}^d} e^{-rT} \max\left( \frac{1}{d}\sum_{j=1}^{d} S_j - K,\; 0 \right) \frac{e^{-\mathbf{z}^T\mathbf{z}/2}}{(2\pi)^{d/2}}\,\mathrm{d}\mathbf{z} \approx \$13.12,
\]
\[
S_j = S_0\, e^{(r - \sigma^2/2)jT/d + \sigma x_j} = \text{stock price at time } jT/d, \qquad \mathbf{x} = \mathsf{A}\mathbf{z}, \quad \mathsf{A}\mathsf{A}^T = \Sigma = \bigl(\min(i,j)\,T/d\bigr)_{i,j=1}^{d}, \quad T = 1/4, \; d = 13 \text{ here}.
\]

Abs. Error Tolerance | Method | Median Error | Worst 10% Accuracy | Worst 10% n | Time (s)
1E-2 | IID diff | 2E-3 | 100% | 6.1E7 | 33.000
1E-2 | Scr. Sobol' PCA | 1E-3 | 100% | 1.6E4 | 0.040
1E-2 | Scr. Sob. cont. var. PCA | 2E-3 | 100% | 4.1E3 | 0.019
1E-2 | Bayes. Latt. PCA | 2E-3 | 99% | 1.6E4 | 0.051
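A minimal IID Monte Carlo sketch of this arithmetic-mean Asian call, including the PCA factorization of $\Sigma$; the slide only gives $T = 1/4$ and $d = 13$, so the values of S0, K, r, and sigma below are illustrative assumptions, not the talk's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 13, 0.25
S0, K, r, sigma = 100.0, 100.0, 0.05, 0.5    # assumed (not from the slide)
t = np.arange(1, d + 1) * T / d              # monitoring times jT/d
Sigma = np.minimum.outer(t, t)               # Brownian covariance min(t_i, t_j)
w, V = np.linalg.eigh(Sigma)
A = V[:, ::-1] * np.sqrt(w[::-1])            # PCA factor: A A^T = Sigma

z = rng.standard_normal((10**5, d))
x = z @ A.T                                  # Brownian paths at the d times
S = S0 * np.exp((r - sigma**2/2) * t + sigma * x)
payoff = np.exp(-r * T) * np.maximum(S.mean(axis=1) - K, 0.0)
print(payoff.mean())                         # crude IID estimate of mu
```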

Slide 15

Slide 15 text

Gaussian Probability

\[
\mu = \int_{[\mathbf{a},\mathbf{b}]} \frac{\exp\bigl( -\tfrac{1}{2}\mathbf{t}^T\Sigma^{-1}\mathbf{t} \bigr)}{\sqrt{(2\pi)^d \det(\Sigma)}}\,\mathrm{d}\mathbf{t} = \int_{[0,1]^{d-1}} f(\mathbf{x})\,\mathrm{d}\mathbf{x}.
\]
For some typical choice of $\mathbf{a}$, $\mathbf{b}$, $\Sigma$, with $d = 3$: $\mu \approx 0.6763$.

Rel. Error Tolerance | Method | Median Error | Worst 10% Accuracy | Worst 10% n | Time (s)
1E-2 | IID | 5E-4 | 100% | 8.1E4 | 0.020
1E-2 | Scr. Sobol' | 4E-5 | 100% | 1.0E3 | 0.005
1E-2 | Bayes. Latt. | 5E-5 | 100% | 4.1E3 | 0.023
1E-3 | IID | 9E-5 | 100% | 2.0E6 | 0.400
1E-3 | Scr. Sobol' | 2E-5 | 100% | 2.0E3 | 0.006
1E-3 | Bayes. Latt. | 3E-7 | 100% | 6.6E4 | 0.076
1E-4 | Scr. Sobol' | 4E-7 | 100% | 1.6E4 | 0.018
1E-4 | Bayes. Latt. | 6E-9 | 100% | 5.2E5 | 0.580
1E-4 | Bayes. Latt. Smth. | 1E-7 | 100% | 3.3E4 | 0.047
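An IID baseline sketch for this Gaussian box probability (the table's "IID" rows use a more refined construction); the slide says only "some typical choice" of a, b, Sigma, so the values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
Sigma = np.array([[1.00, 0.50, 0.25],        # assumed covariance (illustrative)
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])
a, b = np.full(d, -2.0), np.full(d, 2.0)     # assumed box (illustrative)

L = np.linalg.cholesky(Sigma)
x = rng.standard_normal((10**6, d)) @ L.T    # samples from N(0, Sigma)
inside = np.all((a <= x) & (x <= b), axis=1)
print(inside.mean())                         # crude IID estimate of mu
```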

Slide 16

Slide 16 text

Recap of Questions

Q1: How large a family of kernels $C_\theta$ is needed in practice to be confident in the error bound?
Q2: Are there better ways of finding the right $\theta$, say cross-validation?
Q3: Can we check our assumption that $f$ really comes from a Gaussian process? If not, are there alternatives?
Q4: How do we avoid round-off in evaluating $1 - n/\lambda_1$?
Q5: How do we periodize $f$ to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation?
Q6: May we get higher order convergence with higher order nets and smoother kernels? What do those kernels look like?
Q7: Are low discrepancy designs also good for function approximation?
Q8: Are there other kernel/design combinations that expedite vector-matrix operations?
Q9: Is this adaptive Bayesian algorithm competitive with others?
Q10: What other problems are amenable to matched kernels and designs beyond cubature and function approximation?

Slide 17

Slide 17 text

Thank you

Slides available at www.slideshare.net/fjhickernell/probabilistic-numerics-2018-April-11-london

Slide 18

Slide 18 text

References

Dick, J., Kuo, F. & Sloan, I. H. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013).

Hickernell, F. J., Kuo, F. Y., L'Ecuyer, P. & Owen, A. B. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics. https://www.samsi.info/programs-and-activities/year-long-research-programs/2017-18-program-quasi-monte-carlo-high-dimensional-sampling-methods-applied-mathematics-qmc/.