Slide 1

Slide 1 text

Adaptive Probabilistic Numerical Methods Using Fast Transforms
Fred J. Hickernell & Jagadeeswaran Rathinavel
Department of Applied Mathematics, Center for Interdisciplinary Scientific Computation, Illinois Institute of Technology
hickernell@iit.edu, mypages.iit.edu/~hickernell
Thanks to the organizers, the GAIL team, NSF-DMS-1522687, and NSF-DMS-1638521 (SAMSI)
SAMSI-Lloyds-Turing Workshop on Probabilistic Numerical Methods, April 11, 2018

Slide 2

Slide 2 text

When Do We Stop?

Compute the solution to a linear problem:
\[
\mathrm{sol}(f) = \begin{cases}
\displaystyle\int_{\mathbb{R}^d} f(\mathbf{x})\,\varrho(\mathbf{x})\,\mathrm{d}\mathbf{x} & \text{Bayesian inference, financial risk, statistical physics, \ldots} \\
f(\mathbf{x}) & \text{surrogate modeling for computer experiments, \ldots} \\
\quad\vdots
\end{cases}
\]
Desired solution: an adaptive algorithm $\mathrm{app}(\cdot,\cdot)$ of the form
\[
\mathrm{app}(f,\varepsilon) = w_{0,n} + \sum_{i=1}^{n} w_{i,n}\, f(\mathbf{x}_i),
\]
where $n$, $\{\mathbf{x}_i\}_{i=1}^{\infty}$, $w_{0,n}$, and $\mathbf{w} = (w_{i,n})_{i=1}^{n}$ are chosen to guarantee $|\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon)| \le \varepsilon$ with high probability, for all $\varepsilon > 0$ and reasonable $f$.
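To make the shape of such an adaptive algorithm concrete, here is a minimal Python sketch of the generic loop, assuming hypothetical helpers next_design and weights_and_bound (the Bayesian machinery on the following slides supplies those quantities); the doubling schedule and helper names are illustrative, not from the slides.

```python
import numpy as np

def adaptive_app(f, eps, next_design, weights_and_bound, n0=64, n_max=2**20):
    """Generic adaptive loop: grow n until a probabilistic error bound <= eps.

    next_design(n)          -> n sample points (hypothetical helper)
    weights_and_bound(X, y) -> (w0, w, bound)  (hypothetical helper)
    """
    n = n0
    while True:
        X = next_design(n)                  # e.g., first n points of a lattice
        y = np.array([f(x) for x in X])     # function data
        w0, w, bound = weights_and_bound(X, y)
        if bound <= eps or n >= n_max:      # stop when the bound is small enough
            return w0 + w @ y
        n *= 2                              # otherwise double the sample size
```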

Slide 3

Slide 3 text

The Probabilistic Numerics Approach

Assume $f \sim \mathcal{GP}(m, s^2 C_\theta)$, a sample from a Gaussian process. Defining
\[
c_0 = \mathrm{sol}_{\cdot}\bigl(\mathrm{sol}_{\cdot\cdot}(C_\theta(\cdot,\cdot\cdot))\bigr), \quad
\mathbf{c} = \bigl(\mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_1)), \ldots, \mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_n))\bigr)^T, \quad
\mathsf{C} = \bigl(C_\theta(\mathbf{x}_i,\mathbf{x}_j)\bigr)_{i,j=1}^{n},
\]
and choosing the weights as
\[
w_0 = m\bigl[\mathrm{sol}(1) - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{1}\bigr], \quad \mathbf{w} = \mathsf{C}^{-1}\mathbf{c}, \quad
\mathrm{app}(f,\varepsilon) = w_0 + \mathbf{w}^T\mathbf{f}, \quad \mathbf{f} = \bigl(f(\mathbf{x}_i)\bigr)_{i=1}^{n},
\]
yields an unbiased approximation:
\[
\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon) \,\big|\, \mathbf{f} = \mathbf{y} \;\sim\; \mathcal{N}\bigl(0,\; s^2(c_0 - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c})\bigr).
\]
If $n$ is chosen large enough to make $2.58\,s\sqrt{c_0 - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c}} \le \varepsilon$, then we are assured that
\[
\mathbb{P}_f\bigl[\,|\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon)| \le \varepsilon\,\bigr] \ge 99\%.
\]
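As a concreteness check, here is a minimal NumPy sketch of these formulas for integration over $[0,1]^d$ against the uniform density, using the lattice-style kernel that appears later in the talk (for which $\mathrm{sol}(1) = 1$, $c_0 = 1$, and $\mathbf{c} = \mathbf{1}$ in closed form); the design, toy integrand, and $\theta_1$ are illustrative choices, and $m$, $s^2$ are the plug-in MLE values developed on the next slides.

```python
import numpy as np

def b2(x):
    """Bernoulli polynomial B_2 on [0, 1)."""
    return x**2 - x + 1.0/6.0

def kernel(X, T, theta1=1.0):
    """Shift-invariant product kernel (the talk's lattice kernel, theta2 = 1)."""
    D = np.mod(X[:, None, :] - T[None, :, :], 1.0)
    return np.prod(1.0 + theta1 * b2(D), axis=2)

rng = np.random.default_rng(0)
n, d = 256, 2
X = rng.random((n, d))                       # any design; IID here for simplicity
y = 1.0 + np.cos(2*np.pi*X).sum(axis=1)      # toy integrand data
C = kernel(X, X)
c0, c, one = 1.0, np.ones(n), np.ones(n)     # closed-form kernel integrals here
Cinv_c = np.linalg.solve(C, c)
Cinv_1 = np.linalg.solve(C, one)
Cinv_y = np.linalg.solve(C, y)

m = (one @ Cinv_y) / (one @ Cinv_1)          # plug-in mean (its MLE; see below)
w0 = m * (1.0 - c @ Cinv_1)                  # sol(1) = 1 for integration
app = w0 + Cinv_c @ y                        # w^T f with w = C^{-1} c
s2 = (y @ Cinv_y - (one @ Cinv_y)**2 / (one @ Cinv_1)) / n   # s^2 MLE (see below)
err_bound = 2.58 * np.sqrt(s2 * (c0 - c @ Cinv_c))           # compare against eps
```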

Slide 4

Slide 4 text

The Probabilistic Numerics Approach

(The same construction as the previous slide.) There are issues requiring attention.

Slide 5

Slide 5 text

Maximum Likelihood Estimation

Minimize minus twice the log-likelihood of the observed data $\mathbf{f} = \mathbf{y}$:
\[
2n \log(s) + \log(\det(\mathsf{C})) + s^{-2}(\mathbf{y} - m\mathbf{1})^T \mathsf{C}^{-1} (\mathbf{y} - m\mathbf{1}),
\]
first with respect to $m$, then $s$, then $\theta$:
\[
m_{\mathrm{MLE}} = \frac{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{y}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}}, \qquad
s^2_{\mathrm{MLE}} = \frac{1}{n}\, \mathbf{y}^T \left[ \mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}} \right] \mathbf{y},
\]
\[
\theta_{\mathrm{MLE}} = \underset{\theta}{\operatorname{argmin}} \left\{ n \log\left( \mathbf{y}^T \left[ \mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}} \right] \mathbf{y} \right) + \log(\det(\mathsf{C})) \right\}.
\]
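A direct O(n^3) transcription of these formulas, as a minimal sketch (the fast-transform version later in the talk avoids forming C explicitly; the helper name is mine):

```python
import numpy as np

def mle_pieces(C, y):
    """m_MLE, s2_MLE, and the profile objective to minimize over theta,
    for a fixed symmetric positive-definite kernel matrix C."""
    n = len(y)
    one = np.ones(n)
    Cinv_y = np.linalg.solve(C, y)
    Cinv_1 = np.linalg.solve(C, one)
    m_mle = (one @ Cinv_y) / (one @ Cinv_1)
    quad = y @ Cinv_y - (one @ Cinv_y)**2 / (one @ Cinv_1)  # y^T [C^-1 - ...] y
    s2_mle = quad / n
    _, logdet = np.linalg.slogdet(C)        # stable log(det(C)) for PD matrices
    objective = n * np.log(quad) + logdet   # minimize this over theta
    return m_mle, s2_mle, objective
```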

Slide 6

Slide 6 text

Maximum Likelihood Estimation

(The MLE formulas carry over from the previous slide.) The stopping criterion becomes
\[
2.58 \sqrt{ \underbrace{\frac{c_0 - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c}}{n}}_{\text{depends on design}} \;\; \underbrace{\mathbf{y}^T \left[ \mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}} \right] \mathbf{y}}_{\text{depends on data}} } \;\le\; \varepsilon.
\]
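The criterion itself is one line given the pieces above; a minimal sketch, again at O(n^3) cost:

```python
import numpy as np

def stop_ok(C, y, c0, c, eps):
    """Evaluate the MLE-based stopping rule 2.58*sqrt(design * data) <= eps."""
    n = len(y)
    one = np.ones(n)
    Cinv_c = np.linalg.solve(C, c)
    Cinv_y = np.linalg.solve(C, y)
    Cinv_1 = np.linalg.solve(C, one)
    design = (c0 - c @ Cinv_c) / n
    data = y @ Cinv_y - (one @ Cinv_y)**2 / (one @ Cinv_1)
    return 2.58 * np.sqrt(design * data) <= eps
```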

Slide 7

Slide 7 text

Maximum Likelihood Estimation

(The MLE formulas and stopping criterion repeat from the previous slide.)

Q1: How large a family of kernels $C_\theta$ is needed in practice to be confident in the error bound?
Q2: Are there better ways of finding the right $\theta$, say cross-validation?
Q3: Can we check our assumption that $f$ really comes from a Gaussian process? If not, are there alternatives?

Slide 8

Slide 8 text

Low Discrepancy Sampling

Suppose that the domain is $[0,1]^d$. Low discrepancy sampling places the $\mathbf{x}_i$ more evenly than IID sampling.

[Plots: IID points; Sobol' points; integration lattice points; ...]

Dick, J. et al. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013); H., F. J. et al. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics, https://www.samsi.info/programs-and-activities/year-long-research-programs/2017-18-program-quasi-monte-carlo-high-dimensional-sampling-methods-applied-mathematics-qmc/.
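A short sketch of how one might generate the three point sets shown, using scipy.stats.qmc for the Sobol' points; the rank-1 lattice generating vector g below is an illustrative choice, not necessarily a good one and not taken from the talk.

```python
import numpy as np
from scipy.stats import qmc

n, d = 256, 2
iid = np.random.default_rng(0).random((n, d))          # IID uniform points
sobol = qmc.Sobol(d, scramble=True, seed=0).random(n)  # scrambled Sobol' points

# Rank-1 integration lattice: x_i = frac(i * g / n), i = 0, ..., n-1.
g = np.array([1, 89])                                  # illustrative generator
lattice = np.mod(np.outer(np.arange(n), g) / n, 1.0)
```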

Slide 9

Slide 9 text

Covariance Kernels that Match the Design

Suppose that the covariance kernel $C_\theta$ and the design $\{\mathbf{x}_i\}_{i=1}^{n}$ have special properties:
\[
\mathsf{C} = \bigl(C_\theta(\mathbf{x}_i,\mathbf{x}_j)\bigr)_{i,j=1}^{n} = (\mathbf{C}_1, \ldots, \mathbf{C}_n) = \frac{1}{n}\mathsf{V}\Lambda\mathsf{V}^H, \qquad \mathsf{V}^H = n\mathsf{V}^{-1}, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) = \mathrm{diag}(\boldsymbol{\lambda}),
\]
\[
\mathsf{V} = (\mathbf{V}_1 \cdots \mathbf{V}_n) = (\mathbf{v}_1 \cdots \mathbf{v}_n)^T, \qquad \mathbf{V}_1 = \mathbf{v}_1 = \mathbf{1}.
\]
Suppose that $\mathsf{V}^T\mathbf{z}$ is a fast transform ($O(n \log n)$ cost) applied to $\mathbf{z}$. Then it follows that $\boldsymbol{\lambda} = \mathsf{V}^T\mathbf{C}_1$ is fast to compute and $\mathsf{C}^{-1}\mathbf{1} = \mathbf{1}/\lambda_1$.

Let $\mathbf{y}$ be the observed function values. Recall $c_0 = \mathrm{sol}_{\cdot}(\mathrm{sol}_{\cdot\cdot}(C_\theta(\cdot,\cdot\cdot)))$, and let
\[
\hat{\mathbf{y}} = \mathsf{V}^T\mathbf{y}, \qquad \hat{\mathbf{c}} = \mathsf{V}^T\mathbf{c}, \qquad \mathbf{c} = \bigl(\mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_1)), \ldots, \mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_n))\bigr)^T.
\]
Then using the MLE estimates, the approximate solution and the stopping criterion become:
\[
\mathrm{app}(f,\varepsilon) = \frac{\hat{y}_1\,\mathrm{sol}(1)}{n} + \sum_{i=2}^{n} \frac{\hat{y}_i^{*}\,\hat{c}_i}{\lambda_i}, \qquad
2.58 \sqrt{ \left( c_0 - \frac{1}{n}\sum_{i=1}^{n} \frac{|\hat{c}_i|^2}{\lambda_i} \right) \frac{1}{n^2} \sum_{i=2}^{n} \frac{|\hat{y}_i|^2}{\lambda_i} } \;\le\; \varepsilon.
\]
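For a shift-invariant kernel sampled on a rank-1 lattice, the Gram matrix is circulant, so V is (up to a conjugation convention) the unnormalized DFT and everything above reduces to FFTs. A minimal sketch, assuming the B2-based lattice kernel from later in the talk; the generating vector and integrand are illustrative.

```python
import numpy as np

def b2(x):
    """Bernoulli polynomial B_2 on [0, 1)."""
    return x**2 - x + 1.0/6.0

n, theta1 = 2**10, 1.0
g = np.array([1, 89])                            # illustrative generating vector
X = np.mod(np.outer(np.arange(n), g) / n, 1.0)   # rank-1 lattice in [0,1)^2

# The Gram matrix is circulant: C[i, j] depends only on (i - j) mod n, so the
# FFT of its first column gives the eigenvalues (real, since C1 is symmetric;
# only |.|^2 and real parts enter the formulas below).
C1 = np.prod(1.0 + theta1 * b2(X), axis=1)       # C_theta(x_i, x_0), with x_0 = 0
lam = np.fft.fft(C1).real                        # eigenvalues lambda = V^T C_1
y = 1.0 + np.cos(2*np.pi*X).sum(axis=1)          # toy integrand data
y_hat = np.fft.fft(y)                            # fast transform, O(n log n)
```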

Slide 10

Slide 10 text

Covariance Kernels that Match the Design

(The fast-transform setup carries over from the previous slide.) In addition,
\[
\theta_{\mathrm{MLE}} = \underset{\theta}{\operatorname{argmin}} \left\{ n \log\left( \sum_{i=2}^{n} \frac{|\hat{y}_i|^2}{\lambda_i} \right) + \sum_{i=1}^{n} \log(\lambda_i) \right\}.
\]
For integration with respect to a density and our special kernels, $\mathrm{sol}(1) = 1$, $c_0 = 1$, and $\mathbf{c} = \mathbf{1}$:
\[
\mathrm{app}(f,\varepsilon) = \frac{\hat{y}_1}{n}, \qquad
2.58 \sqrt{ \left( 1 - \frac{n}{\lambda_1} \right) \frac{1}{n^2} \sum_{i=2}^{n} \frac{|\hat{y}_i|^2}{\lambda_i} } \;\le\; \varepsilon.
\]
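Continuing the FFT sketch above (it reuses n, lam, and y_hat from that snippet), the integration case collapses to a few vectorized lines:

```python
# O(n log n) cubature and stopping criterion when sol(1) = 1, c0 = 1, c = 1.
app = y_hat[0].real / n                          # app(f, eps) = y_hat_1 / n
data = (np.abs(y_hat[1:])**2 / lam[1:]).sum() / n**2
design = 1.0 - n / lam[0]                        # subject to round-off; see Q4
err_bound = 2.58 * np.sqrt(max(design, 0.0) * data)

# Profile objective for theta_MLE (to be minimized over theta1):
obj = n * np.log((np.abs(y_hat[1:])**2 / lam[1:]).sum()) + np.log(lam).sum()
```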

Slide 11

Slide 11 text

Covariance Kernels that Match the Design

(The fast-transform formulas repeat from the previous slide.)

Q4: How do we avoid round-off in evaluating $1 - n/\lambda_1$?
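The talk leaves Q4 open; one plausible remedy (my assumption, not stated on the slide) is to accumulate $\lambda_1 - n = \sum_i (\tilde{C}(\mathbf{x}_i) - 1)$ directly, since each summand can be formed without subtracting nearly equal numbers:

```python
import numpy as np

def lam1_minus_n(X, theta1=1.0):
    """Accumulate lambda_1 - n = sum_i (Ctilde(x_i) - 1) without cancellation.

    Uses Ctilde(x) - 1 = expm1(sum_k log1p(theta1 * B2(x_k))), valid while
    theta1 * B2 > -1 (B2 >= -1/12 on [0,1), so theta1 < 12 suffices).
    This is one plausible approach, not the talk's answer to Q4.
    """
    u = theta1 * (X**2 - X + 1.0/6.0)        # theta1 * B2, coordinatewise
    return np.expm1(np.log1p(u).sum(axis=1)).sum()

# Then 1 - n/lambda_1 = delta / (n + delta) with delta = lam1_minus_n(X),
# avoiding the catastrophic cancellation when lambda_1 is close to n.
```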

Slide 12

Slide 12 text

Form of Matching Covariance Kernels

Typically the domain of $f$ is $[0,1)^d$, and
\[
C(\mathbf{x},\mathbf{t}) = \begin{cases}
\tilde{C}(\mathbf{x} - \mathbf{t} \bmod 1) & \text{integration lattices} \\
\tilde{C}(\mathbf{x} \oplus \mathbf{t}) & \text{Sobol' sequences, } \oplus \text{ means digitwise addition modulo 2}
\end{cases}
\]
E.g., for integration lattices
\[
C(\mathbf{x},\mathbf{t}) = \prod_{k=1}^{d} \bigl[ 1 - \theta_1 (-1)^{\theta_2} B_{2\theta_2}(x_k - t_k \bmod 1) \bigr], \qquad \theta_1 > 0, \; \theta_2 \in \mathbb{N}.
\]
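A minimal sketch of this lattice kernel for the two lowest-order cases, using the standard Bernoulli polynomials B2 and B4 (the function names are mine):

```python
import numpy as np

def bernoulli_poly(x, degree):
    """B_2 and B_4, i.e., the theta2 = 1, 2 cases of B_{2*theta2}."""
    if degree == 2:
        return x**2 - x + 1.0/6.0
    if degree == 4:
        return x**4 - 2*x**3 + x**2 - 1.0/30.0
    raise NotImplementedError("only B_2 and B_4 implemented here")

def lattice_kernel(x, t, theta1=1.0, theta2=1):
    """C(x, t) = prod_k [1 - theta1*(-1)^theta2 * B_{2 theta2}((x_k - t_k) mod 1)]."""
    z = np.mod(np.asarray(x) - np.asarray(t), 1.0)
    return np.prod(1.0 - theta1 * (-1)**theta2 * bernoulli_poly(z, 2*theta2))
```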

Slide 13

Slide 13 text

Form of Matching Covariance Kernels

(The kernel forms repeat from the previous slide.)

Q5: How do we periodize $f$ to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation?
Q6: May we get higher order convergence with higher order nets and smoother kernels? What do those kernels look like?
Q7: Are low discrepancy designs also good for function approximation?
Q8: Are there other kernel/design combinations that expedite vector-matrix operations?

Slide 14

Slide 14 text

Option Pricing

\[
\mu = \text{fair price} = \int_{\mathbb{R}^d} e^{-rT} \max\left( \frac{1}{d}\sum_{j=1}^{d} S_j - K,\; 0 \right) \frac{e^{-\mathbf{z}^T\mathbf{z}/2}}{(2\pi)^{d/2}}\,\mathrm{d}\mathbf{z} \approx \$13.12,
\]
\[
S_j = S_0\, e^{(r - \sigma^2/2)jT/d + \sigma x_j} = \text{stock price at time } jT/d, \qquad \mathbf{x} = \mathsf{A}\mathbf{z}, \quad \mathsf{A}\mathsf{A}^T = \Sigma = \bigl(\min(i,j)\,T/d\bigr)_{i,j=1}^{d}, \quad T = 1/4, \; d = 13 \text{ here}.
\]

Abs. Error Tolerance | Method | Median Error | Worst 10% Accuracy | Worst 10% n | Time (s)
1E-2 | IID diff | 2E-3 | 100% | 6.1E7 | 33.000
1E-2 | Scr. Sobol' PCA | 1E-3 | 100% | 1.6E4 | 0.040
1E-2 | Scr. Sob. cont. var. PCA | 2E-3 | 100% | 4.1E3 | 0.019
1E-2 | Bayes. Latt. PCA | 2E-3 | 99% | 1.6E4 | 0.051
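A minimal IID Monte Carlo sketch of this arithmetic-mean Asian call, including the PCA factorization of $\Sigma$; the slide only gives $T = 1/4$ and $d = 13$, so the values of S0, K, r, and sigma below are illustrative assumptions, not the talk's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 13, 0.25
S0, K, r, sigma = 100.0, 100.0, 0.05, 0.5    # assumed (not from the slide)
t = np.arange(1, d + 1) * T / d              # monitoring times jT/d
Sigma = np.minimum.outer(t, t)               # Brownian covariance min(t_i, t_j)
w, V = np.linalg.eigh(Sigma)
A = V[:, ::-1] * np.sqrt(w[::-1])            # PCA factor: A A^T = Sigma

z = rng.standard_normal((10**5, d))
x = z @ A.T                                  # Brownian paths at the d times
S = S0 * np.exp((r - sigma**2/2) * t + sigma * x)
payoff = np.exp(-r * T) * np.maximum(S.mean(axis=1) - K, 0.0)
print(payoff.mean())                         # crude IID estimate of mu
```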

Slide 15

Slide 15 text

Gaussian Probability

\[
\mu = \int_{[\mathbf{a},\mathbf{b}]} \frac{\exp\bigl( -\tfrac{1}{2}\mathbf{t}^T\Sigma^{-1}\mathbf{t} \bigr)}{\sqrt{(2\pi)^d \det(\Sigma)}}\,\mathrm{d}\mathbf{t} = \int_{[0,1]^{d-1}} f(\mathbf{x})\,\mathrm{d}\mathbf{x}.
\]
For some typical choice of $\mathbf{a}$, $\mathbf{b}$, $\Sigma$, with $d = 3$: $\mu \approx 0.6763$.

Rel. Error Tolerance | Method | Median Error | Worst 10% Accuracy | Worst 10% n | Time (s)
1E-2 | IID | 5E-4 | 100% | 8.1E4 | 0.020
1E-2 | Scr. Sobol' | 4E-5 | 100% | 1.0E3 | 0.005
1E-2 | Bayes. Latt. | 5E-5 | 100% | 4.1E3 | 0.023
1E-3 | IID | 9E-5 | 100% | 2.0E6 | 0.400
1E-3 | Scr. Sobol' | 2E-5 | 100% | 2.0E3 | 0.006
1E-3 | Bayes. Latt. | 3E-7 | 100% | 6.6E4 | 0.076
1E-4 | Scr. Sobol' | 4E-7 | 100% | 1.6E4 | 0.018
1E-4 | Bayes. Latt. | 6E-9 | 100% | 5.2E5 | 0.580
1E-4 | Bayes. Latt. Smth. | 1E-7 | 100% | 3.3E4 | 0.047
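An IID baseline sketch for this Gaussian box probability (the table's "IID" rows use a more refined construction); the slide says only "some typical choice" of a, b, Sigma, so the values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
Sigma = np.array([[1.00, 0.50, 0.25],        # assumed covariance (illustrative)
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])
a, b = np.full(d, -2.0), np.full(d, 2.0)     # assumed box (illustrative)

L = np.linalg.cholesky(Sigma)
x = rng.standard_normal((10**6, d)) @ L.T    # samples from N(0, Sigma)
inside = np.all((a <= x) & (x <= b), axis=1)
print(inside.mean())                         # crude IID estimate of mu
```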

Slide 16

Slide 16 text

Recap of Questions

Q1: How large a family of kernels $C_\theta$ is needed in practice to be confident in the error bound?
Q2: Are there better ways of finding the right $\theta$, say cross-validation?
Q3: Can we check our assumption that $f$ really comes from a Gaussian process? If not, are there alternatives?
Q4: How do we avoid round-off in evaluating $1 - n/\lambda_1$?
Q5: How do we periodize $f$ to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation?
Q6: May we get higher order convergence with higher order nets and smoother kernels? What do those kernels look like?
Q7: Are low discrepancy designs also good for function approximation?
Q8: Are there other kernel/design combinations that expedite vector-matrix operations?
Q9: Is this adaptive Bayesian algorithm competitive with others?
Q10: What other problems are amenable to matched kernels and designs beyond cubature and function approximation?

Slide 17

Slide 17 text

Thank you

Slides available at www.slideshare.net/fjhickernell/probabilistic-numerics-2018-April-11-london

Slide 18

Slide 18 text

References

Dick, J., Kuo, F. & Sloan, I. H. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013).

Hickernell, F. J., Kuo, F. Y., L'Ecuyer, P. & Owen, A. B. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics. https://www.samsi.info/programs-and-activities/year-long-research-programs/2017-18-program-quasi-monte-carlo-high-dimensional-sampling-methods-applied-mathematics-qmc/.