Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Probabilistic Numerics 2018 April 11 London

Probabilistic Numerics 2018 April 11 London

Talk at the Probabilistic Numerics Workshop in London at the Alan Turing Institute

Fred J. Hickernell

April 11, 2018
Tweet

More Decks by Fred J. Hickernell

Other Decks in Research

Transcript

  1. Adaptive Probabilistic Numerical Methods Using Fast Transforms Fred J. Hickernell

    & Jagadees Rathinavel Department of Applied Mathematics Center for Interdisciplinary Scientific Computation Illinois Institute of Technology [email protected] mypages.iit.edu/~hickernell Thanks to the organizers, the GAIL team, NSF-DMS-1522687 and NSF-DMS-1638521 (SAMSI) SAMSI-Lloyds-Turing Workshop on Probabilistic Numerical Methods, April 11, 2018
  2. Introduction MLE Matching Kernels and Designs Examples Summary References When

    Do We Stop? Compute the solution to a linear problem: sol(f) = $ ’ ’ ’ & ’ ’ ’ % ż Rd f(x) (x) dx Bayesian inference, financial risk, statistical physics, ... f(x) surrogate modeling for computer experiments, ... . . . Desired solution: An adaptive algorithm, app(¨, ¨) of the form app(f, ε) = w0,n + n ÿ i=1 wi,n f(xi ), where n, txiu∞ i=1 , w0,n , and w = (wi,n )n i=1 are chosen to guarantee sol(f) ´ app(f, ε) ď ε with high probability @ε ą 0, reasonable f 2/12
  3. Introduction MLE Matching Kernels and Designs Examples Summary References The

    Probabilistic Numerics Approach Assume f „ GP(m, s2Cθ), a sample from a Gaussian process. Defining c = sol¨(sol¨¨(Cθ(¨, ¨¨))), c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T , C = Cθ(xi , xj ) n i,j=1 and choosing the weights as w0 = m[sol(1) ´ cTC´11], w = C´1c, app(f, ε) = w0 + wTf, f = f(xi ) n i=1 . yields an unbiased approximation: sol(f) ´ app(f, ε) ˇ ˇ f = y „ N 0, s2(c ´ cTC´1c) If n is chosen large enough to make 2.58sac ´ cTC´1c ď ε, then we are assured that Pf [|sol(f) ´ app(f, ε)| ď ε] ě 99%. 3/12
  4. Introduction MLE Matching Kernels and Designs Examples Summary References The

    Probabilistic Numerics Approach Assume f „ GP(m, s2Cθ), a sample from a Gaussian process. Defining c = sol¨(sol¨¨(Cθ(¨, ¨¨))), c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T , C = Cθ(xi , xj ) n i,j=1 and choosing the weights as w0 = m[sol(1) ´ cTC´11], w = C´1c, app(f, ε) = w0 + wTf, f = f(xi ) n i=1 . yields an unbiased approximation: sol(f) ´ app(f, ε) ˇ ˇ f = y „ N 0, s2(c ´ cTC´1c) If n is chosen large enough to make 2.58sac ´ cTC´1c ď ε, then we are assured that Pf [|sol(f) ´ app(f, ε)| ď ε] ě 99%. There are issues requiring attention. 3/12
  5. Introduction MLE Matching Kernels and Designs Examples Summary References Maximum

    Likelihood Estimation Minimize minus twice the log likelihood observed with f = y: 2n log(s) + log(det(C)) + s´2(y ´ m1)TC´1(y ´ m1) first with respect to m, then s, then θ: mMLE = 1TC´1y 1TC´11 , s2 MLE = 1 n yT C´1 ´ C´111TC´1 1TC´11 y θMLE = argmin θ " n log yT C´1 ´ C´111TC´1 1TC´11 y + log(det(C)) * 4/12
  6. Introduction MLE Matching Kernels and Designs Examples Summary References Maximum

    Likelihood Estimation Minimize minus twice the log likelihood observed with f = y: 2n log(s) + log(det(C)) + s´2(y ´ m1)TC´1(y ´ m1) first with respect to m, then s, then θ: mMLE = 1TC´1y 1TC´11 , s2 MLE = 1 n yT C´1 ´ C´111TC´1 1TC´11 y θMLE = argmin θ " n log yT C´1 ´ C´111TC´1 1TC´11 y + log(det(C)) * Stopping criterion becomes 2.58 g f f f f e c ´ cTC´1c n looooooomooooooon depends on design yT C´1 ´ C´111TC´1 1TC´11 y looooooooooooooomooooooooooooooon depends on data ď ε, 4/12
  7. Introduction MLE Matching Kernels and Designs Examples Summary References Maximum

    Likelihood Estimation mMLE = 1TC´1y 1TC´11 , s2 MLE = 1 n yT C´1 ´ C´111TC´1 1TC´11 y θMLE = argmin θ " n log yT C´1 ´ C´111TC´1 1TC´11 y + log(det(C)) * Stopping criterion becomes 2.58 g f f f f e c ´ cTC´1c n looooooomooooooon depends on design yT C´1 ´ C´111TC´1 1TC´11 y looooooooooooooomooooooooooooooon depends on data ď ε, Q1: How large a family of kernels, Cθ is needed in practice to be confident in the error bound? Q2: Are the better ways of finding the right θ, say cross-validation? Q3: Can we check out assumption that f really comes from a Gaussian process? If not, are there alternatives 4/12
  8. Introduction MLE Matching Kernels and Designs Examples Summary References Low

    Discrepancy Sampling Suppose that the domain is [0, 1]d. Low discrepancy sampling places the xi more evenly than IID sampling IID points Sobol’ points Integration lattice points ¨¨¨ Dick, J. et al. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013), H., F. J. et al. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics. https://www.samsi.info/programs-and-activities/year-long-research-programs/2017-18-program-quasi- monte-carlo-high-dimensional-sampling-methods-applied-mathematics-qmc/. 5/12
  9. Introduction MLE Matching Kernels and Designs Examples Summary References Covariance

    Kernels that Match the Design Suppose that the covariance kernel, Cθ , and the design, txiun i=1 , have special properties: C = Cθ(xi , xj ) n i,j=1 = C1 , . . . , Cn = 1 n VΛVH, VH = nV´1, Λ = diag(λ1 , . . . , λn ) = diag(λ) V = V1 ¨ ¨ ¨ Vn = v1 ¨ ¨ ¨ vn T , V1 = v1 = 1 Suppose that VTz is a fast transform (O(n log n) cost) applied to z. Then it follows that λ = VTC1 is fast, C´11 = 1 λ1 Let y be the observed function values. Recall c = sol¨(sol¨¨(Cθ(¨, ¨¨))), and let ^ y = VTy, p c = VTc, c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T Then using the MLE estimates, the approximate solution and the stopping criterion become: app(f, ε) = ^ y1 sol(1) n + n ÿ i=2 ^ y˚ i p ci λi , 2.58 g f f e c ´ 1 n n ÿ i=1 | p ci |2 λi 1 n2 n ÿ i=2 | p yi |2 λi ď ε 6/12
  10. Introduction MLE Matching Kernels and Designs Examples Summary References Covariance

    Kernels that Match the Design c = sol¨(sol¨¨(Cθ(¨, ¨¨))), C = Cθ(xi , xj ) n i,j=1 = 1 n VΛVH, Λ = diag(λ), V1 = v1 = 1 VTz is O(n log n), λ = VTC1 , C´11 = 1 λ1 ^ y = VTy, p c = VTc, c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T app(f, ε) = ^ y1 sol(1) n + n ÿ i=2 ^ y˚ i p ci λi , 2.58 g f f e c ´ 1 n n ÿ i=1 | p ci |2 λi 1 n2 n ÿ i=2 | p yi |2 λi ď ε θMLE = argmin θ # n log n ÿ i=2 | p yi |2 λi + n ÿ i=1 log(λi ) + For integration with respect to a density and our special kernels, sol(1) = 1, c = 1, and c = 1: app(f, ε) = ^ y1 n , 2.58 g f f e 1 ´ n λ1 1 n2 n ÿ i=2 | p yi |2 λi ď ε 7/12
  11. Introduction MLE Matching Kernels and Designs Examples Summary References Covariance

    Kernels that Match the Design c = sol¨(sol¨¨(Cθ(¨, ¨¨))), C = Cθ(xi , xj ) n i,j=1 = 1 n VΛVH, Λ = diag(λ), V1 = v1 = 1 VTz is O(n log n), λ = VTC1 , C´11 = 1 λ1 ^ y = VTy, p c = VTc, c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T app(f, ε) = ^ y1 sol(1) n + n ÿ i=2 ^ y˚ i p ci λi , 2.58 g f f e c ´ 1 n n ÿ i=1 | p ci |2 λi 1 n2 n ÿ i=2 | p yi |2 λi ď ε θMLE = argmin θ # n log n ÿ i=2 | p yi |2 λi + n ÿ i=1 log(λi ) + For integration with respect to a density and our special kernels, sol(1) = 1, c = 1, and c = 1: app(f, ε) = ^ y1 n , 2.58 g f f e 1 ´ n λ1 1 n2 n ÿ i=2 | p yi |2 λi ď ε Q4: How do we avoid round-off in evaluating 1 ´ n/λ1 ? 7/12
  12. Introduction MLE Matching Kernels and Designs Examples Summary References Form

    of Matching Covariance Kernels Typically the domain of f is [0, 1)d, and C(x, t) = # r C(x ´ t mod 1) integration lattices r C(x ‘ t) Sobol’ sequences, ‘ means digitwise addition modulo 2 E.g., for integration lattices C(x, t) = d ź k=1 [1 ´ θ1 (´1)θ2 B2θ2 (xk ´ tk mod 1)], θ1 ą 0, θ2 P N 8/12
  13. Introduction MLE Matching Kernels and Designs Examples Summary References Form

    of Matching Covariance Kernels Typically the domain of f is [0, 1)d, and C(x, t) = # r C(x ´ t mod 1) integration lattices r C(x ‘ t) Sobol’ sequences, ‘ means digitwise addition modulo 2 E.g., for integration lattices C(x, t) = d ź k=1 [1 ´ θ1 (´1)θ2 B2θ2 (xk ´ tk mod 1)], θ1 ą 0, θ2 P N Q5: How do we periodize f to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation? Q6: May we get higher order convergence with higher order nets and smoother kernels? What do those kernels look like? Q7: Are low discrepancy designs also good for function approximation? Q8: Are there other kernel/design combinations that expedite vector-matrix operations? 8/12
  14. Introduction MLE Matching Kernels and Designs Examples Summary References Option

    Pricing µ = fair price = ż Rd e´rT max   1 d d ÿ j=1 Sj ´ K, 0   e´zTz/2 (2π)d/2 dz « $13.12 Sj = S0 e(r´σ2/2)jT/d+σxj = stock price at time jT/d, x = Az, AAT = Σ = min(i, j)T/d d i,j=1 , T = 1/4, d = 13 here Abs. Error Median Worst 10% Worst 10% Tolerance Method Error Accuracy n Time (s) 1E´2 IID diff 2E´3 100% 6.1E7 33.000 1E´2 Scr. Sobol’ PCA 1E´3 100% 1.6E4 0.040 1E´2 Scr. Sob. cont. var. PCA 2E´3 100% 4.1E3 0.019 1E´2 Bayes. Latt. PCA 2E´3 99% 1.6E4 0.051 9/12
  15. Introduction MLE Matching Kernels and Designs Examples Summary References Gaussian

    Probability µ = ż [a,b] exp ´1 2 tTΣ´1t a(2π)d det(Σ) dt = ż [0,1]d´1 f(x) dx For some typical choice of a, b, Σ, d = 3; µ « 0.6763 Rel. Error Median Worst 10% Worst 10% Tolerance Method Error Accuracy n Time (s) 1E´2 IID 5E´4 100% 8.1E4 0.020 1E´2 Scr. Sobol’ 4E´5 100% 1.0E3 0.005 1E´2 Bayes. Latt. 5E´5 100% 4.1E3 0.023 1E´3 IID 9E´5 100% 2.0E6 0.400 1E´3 Scr. Sobol’ 2E´5 100% 2.0E3 0.006 1E´3 Bayes. Latt. 3E´7 100% 6.6E4 0.076 1E´4 Scr. Sobol’ 4E´7 100% 1.6E4 0.018 1E´4 Bayes. Latt. 6E´9 100% 5.2E5 0.580 1E´4 Bayes. Latt. Smth. 1E´7 100% 3.3E4 0.047 10/12
  16. Introduction MLE Matching Kernels and Designs Examples Summary References Recap

    of Questions Q1: How large a family of kernels, Cθ is needed in practice to be confident in the error bound? Q2: Are the better ways of finding the right θ, say cross-validation? Q3: Can we check out assumption that f really comes from a Gaussian process? If not, are there alternatives Q4: How do we avoid round-off in evaluating 1 ´ n/λ1 ? Q5: How do we periodize f to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation? Q6: May we get higher order convergence with higher order nets and smoother kernels? What do those kernels look like? Q7: Are low discrepancy designs also good for function approximation? Q8: Are there other kernel/design combinations that expedite vector-matrix operations? Q9: Is this adaptive Bayesian algorithm competitive with others? Q10: What other problems are amenable to matched kernels and designs beyond cubature and function approximation? 11/12
  17. Thank you Slides available at www.slideshare.net/fjhickernell/ probabilistic-numerics-2018-April-11-london

  18. Introduction MLE Matching Kernels and Designs Examples Summary References Dick,

    J., Kuo, F. & Sloan, I. H. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013). H., F. J., Kuo, F. Y., L’Ecuyer, P. & Owen, A. B. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics. https://www.samsi.info/programs-and-activities/year-long-research- programs/2017-18-program-quasi-monte-carlo-high-dimensional-sampling-methods- applied-mathematics-qmc/. 12/12