SIAM UQ 2020-Bayesian Cubature

The Successes and Challenges of Automatic Bayesian Cubature Fred J.
Hickernell Department of Applied Mathematics Center for Interdisciplinary Scientiﬁc Computation Illinois Institute of Technology [email protected] mypages.iit.edu/~hickernell with Jagadeeswaran R. and the GAIL team partially supported by NSF-DMS-1522687 and NSF-DMS-1638521 (SAMSI) Thanks to the special session organizers SIAM-UQ 2020, March 2020

Introduction Fast Bayesian Transforms Numerical Examples Gaussian Process Diagnostics Summary
References The Guaranteed Automatic Integration Library (GAIL) and QMCPy Teams Sou-Cheng Choi (Chief Data Scientist, Kamakura) Yuhan Ding (IIT PhD ’15, Lecturer, IIT) Lan Jiang (IIT PhD ’16, Compass) Lluís Antoni Jiménez Rugama (IIT PhD ’17, UBS) Jagadeeswaran Rathinavel (IIT PhD ’19, Wi-Tronix) Aleksei Sorokin (IIT BS + MAS ’21 exp.) Tong Xin (IIT MS, UIC PhD ’20 exp.) Kan Zhang (IIT PhD ’20 exp.) Yizhi Zhang (IIT PhD ’18, Jamran Int’l) Xuan Zhou (IIT PhD ’15, JP Morgan) and others Adaptive software libraries GAIL and QMCPy 2/15

References Problem f Want ﬁxed tolerance cubature ALG : L1[0, 1]d × (0, ∞) → R such that [0,1]d f(x) dx − ALG(f, ε) ε ∀ε > 0, for reasonable f 3/15

References Problem f Want ﬁxed tolerance cubature ALG : L1[0, 1]d × (0, ∞) → R such that Pf [0,1]d f(x) dx − ALG(f, ε) ε 99% ∀ε > 0, f ∼ GP(m, s2Cθ) 3/15

References Problem f Want ﬁxed tolerance cubature ALG : L1[0, 1]d × (0, ∞) → R such that Pf [0,1]d f(x) dx − ALG(f, ε) ε 99% ∀ε > 0, f ∼ GP(m, s2Cθ) design or node array X = (x1 , . . . , xn )T ∈ [0, 1]n×d, function data f = f(X) ∈ Rn c0θ = [0,1]d×[0,1]d Cθ(t, x) dtdx > 0, cθ = [0,1]d Cθ(X, t) dt ∈ [0, 1]n, Cθ = Cθ(X, X) ∈ [0, 1]n×n [0,1]d f(x) dx (f = y) ∼ N m[1 − cT θ C−1 θ 1] + cT θ C−1y ALG(f,ε) , s2(c0θ − cT θ C−1 θ cθ) Choosing n large enough to make 2.58s c0θ − cT θ C−1 θ cθ ε would seem to achieve our goal 3/15

References Problem f Want ﬁxed tolerance cubature ALG : L1[0, 1]d × (0, ∞) → R such that Pf [0,1]d f(x) dx − ALG(f, ε) ε 99% ∀ε > 0, f ∼ GP(m, s2Cθ) design or node array X = (x1 , . . . , xn )T ∈ [0, 1]n×d, function data f = f(X) ∈ Rn c0θ = [0,1]d×[0,1]d Cθ(t, x) dtdx > 0, cθ = [0,1]d Cθ(X, t) dt ∈ [0, 1]n, Cθ = Cθ(X, X) ∈ [0, 1]n×n [0,1]d f(x) dx (f = y) ∼ N m[1 − cT θ C−1 θ 1] + cT θ C−1y ALG(f,ε) , s2(c0θ − cT θ C−1 θ cθ) Choosing n large enough to make 2.58s c0θ − cT θ C−1 θ cθ ε would seem to achieve our goal Issues requiring attention Inference of m, s, θ, and Cθ , which urn f comes from Ill-conditioning and numerical cost of vector-matrix calculations Whether Gaussian process is a reasonable assumption 3/15

References Inferring Gaussian Process Parameters for GP(m, s2Cθ ) f Using empirical Bayes mEB = 1TC−1 θ y 1TC−1 θ 1 , s2 EB = 1 n yT C−1 θ − C−1 θ 11TC−1 θ 1TC−1 θ 1 y, θEB = argmin θ log yT C−1 θ − C−1 θ 11TC−1 θ 1TC−1 θ 1 y + 1 n log(det(Cθ)) , ALG(f, ε) = (1 − 1TC−1 θ cθ)1 1TC−1 θ 1 + cθ T C−1 θ y when 2.58sEB c0θ − cT θ C−1 θ cθ ε Jagadeeswaran, R. & H., F. J. Fast Automatic Bayesian Cubature Using Lattice Sampling. Stat. Comput. 29, 1215–1229 (2019). 4/15

References Inferring Gaussian Process Parameters for GP(m, s2Cθ ) f Using empirical Bayes mEB = 1TC−1 θ y 1TC−1 θ 1 , s2 EB = 1 n yT C−1 θ − C−1 θ 11TC−1 θ 1TC−1 θ 1 y, θEB = argmin θ log yT C−1 θ − C−1 θ 11TC−1 θ 1TC−1 θ 1 y + 1 n log(det(Cθ)) , ALG(f, ε) = (1 − 1TC−1 θ cθ)1 1TC−1 θ 1 + cθ T C−1 θ y when 2.58sEB c0θ − cT θ C−1 θ cθ ε Ill-conditioning and numerical cost of vector-matrix calculations Whether Gaussian process is a reasonable assumption Jagadeeswaran, R. & H., F. J. Fast Automatic Bayesian Cubature Using Lattice Sampling. Stat. Comput. 29, 1215–1229 (2019). 4/15

References Fast Bayesian Transforms in General Find a kernel Cθ to match the design X so that Cθ = 1 n VΛθ VH, VH = nV−1, V = (v1 , . . . , vn )T = (V1 , . . . , Vn ) known analytically v1 = V1 = 1, cθ = 1, b := VHb requires only O(n log(n)) operations ∀b. Cθ is a fast Bayesian transform kernel and b → VHb a fast Bayesian transform (FBT) Then by empirical Bayes y = FBT of function data y, λθ = diag(Λθ) = (λθ,1 , . . . , λθ,n )T = Cθ,1 = FBT of ﬁrst column of Cθ θEB = argmin θ log n i=2 |yi |2 λθ,i + 1 n n i=1 log(λθ,i ) ALG(f, ε) = y1 n = 1 n n i=1 yi = sample mean when 2.58 n n i=2 |yi |2 λθ,i 1 − n λθ,1 ε Cost is O(n log(n)) times the number of iterations for optimizing θ 5/15

References Types of FBT Kernels Cθ(t, x) = Kθ(x t), {xi }2m i=1 = aﬃne shift of a group under ⊕ for m = 0, 1, . . . 6/15

References Types of FBT Kernels Cθ(t, x) = Kθ(x t), {xi }2m i=1 = aﬃne shift of a group under ⊕ for m = 0, 1, . . . Shifted Lattice Nodes, ⊕ = addition mod1 Scrambled Sobol’ Nodes, ⊕ = bitwise addition FBT = Fast Fourier Transform FBT = Fast Walsh Transform 6/15

References Flexible FBT Kernel for Lattice Node Designs Cθ(t, x) = Kθ(x t) must be positive definite, where ⊕ = addition mod1. Common example is Kθ(x) = d j=1 [1 + a B2r (xj )] = d j=1 [1 + aκr (xj )], θ = (a, r) ∈ (0, ∞) × (1, ∞), r = 2r κr (x) := |k| 1 exp(2π √ −1kx) |k|r B2r closed form, but r ∈ N; κr defined for r > 1, but infinite sum 7/15

References Flexible FBT Kernel for Lattice Node Designs Cθ(t, x) = Kθ(x t) must be positive definite, where ⊕ = addition mod1. Common example is Kθ(x) = d j=1 [1 + a B2r (xj )] = d j=1 [1 + aκr (xj )], θ = (a, r) ∈ (0, ∞) × (1, ∞), r = 2r κr (x) := |k| 1 exp(2π √ −1kx) |k|r B2r closed form, but r ∈ N; κr defined for r > 1, but infinite sum But all we need to compute the integral and credible interval is y and λθ = Cθ,1 , where Cθ,1 = Kθ(xi x1 ) n i=1 , which only depends on κr ( /n) for = 0, . . . , n − 1 7/15

References Flexible FBT Kernel for Lattice Node Designs Cθ(t, x) = Kθ(x t) must be positive definite, where ⊕ = addition mod1. Common example is Kθ(x) = d j=1 [1 + a B2r (xj )] = d j=1 [1 + aκr (xj )], θ = (a, r) ∈ (0, ∞) × (1, ∞), r = 2r κr (x) := |k| 1 exp(2π √ −1kx) |k|r B2r closed form, but r ∈ N; κr defined for r > 1, but infinite sum But all we need to compute the integral and credible interval is y and λθ = Cθ,1 , where Cθ,1 = Kθ(xi x1 ) n i=1 , which only depends on κr ( /n) for = 0, . . . , n − 1 Moreover, κr ( /n) n−1 =0 can be computed using one FFT details 7/15

References Keister’s Example Rd cos( t ) exp(− t 2) dt = [0,1]d f(x) dx ε = 5 × 10−3, d = 3 Method MC Lattice Sobol BayesLat BayesSobol Absolute Error 1.40 × 10−3 5.20 × 10−4 5.40 × 10−4 4.10 × 10−4 6.80 × 10−4 Tolerance Met 100% 100% 100% 100% 100% n 2 600 000 4100 3900 510 1900 Time (seconds) 0.1400 0.0110 0.0091 0.0034 0.0410 Algorithms are implemented in GAIL and soon QMCPy Choi, S.-C. T., Ding, Y., H., F. J., Jiang, L., Jiménez Rugama, L. A., Li, D., Jagadeeswaran, R., Tong, X., Zhang, K., et al. GAIL: Guaranteed Automatic Integration Library (Versions 1.0–2.2). MATLAB software. 2013–2017. http://gailgithub.github.io/GAIL_Dev/. Choi, S.-C. T., H., F. J., McCourt, M. & Sorokin, A. QMCPy: A quasi-Monte Carlo Python Library. 2020+. https://github.com/QMCSoftware/QMCSoftware. Keister, B. D. Multidimensional Quadrature Algorithms. Computers in Physics 10, 119–122 (1996). 8/15

References Gaussian Probability (a,b) exp −1 2 tTΣ−1t (2π)d det(Σ) dt = [0,1]d−1 f(x) dx by Genz’s transformation ε = 1 × 10−4, d = 5, Σ = 0.4I + 0.611T, a = (−∞, . . . , −∞), b ∼ √ dU[0, 1]d Method MC Lattice Sobol’ BayesLat BayesSobol Absolute Error 2.00 × 10−5 5.00 × 10−6 4.10 × 10−6 9.20 × 10−6 3.40 × 10−6 Tolerance Met 100% 100% 100% 100% 100% n 62 000 000 4100 4100 2000 4100 Time (seconds) 17.0000 0.0110 0.0097 0.0880 0.0950 Algorithms are implemented in GAIL and soon QMCPy Choi, S.-C. T., Ding, Y., H., F. J., Jiang, L., Jiménez Rugama, L. A., Li, D., Jagadeeswaran, R., Tong, X., Zhang, K., et al. GAIL: Guaranteed Automatic Integration Library (Versions 1.0–2.2). MATLAB software. 2013–2017. http://gailgithub.github.io/GAIL_Dev/. Choi, S.-C. T., H., F. J., McCourt, M. & Sorokin, A. QMCPy: A quasi-Monte Carlo Python Library. 2020+. https://github.com/QMCSoftware/QMCSoftware. Genz, A. Comparison of Methods for the Computation of Multivariate Normal Probabilities. Computing Science and Statistics 25, 400–405 (1993). 9/15

References Is f a Typical Instance of a Gaussian Process? If f ∈ GP(m, s2Cθ), then Z := s−1 n−1VΛ−1/2 θ VH C−1/2 θ (f − m1) ∼ N(0, I). Generate the data z = 1 nsEB VΛ−1/2 θEB VH(y − mEB 1) = n i=2 |yi |2 λθEB ,i −1 VΛ−1/2 θEB (y − y1 e1 ) Q-Q plots of z vs. standard Gaussian 10/15

References Is f a Typical Instance of a Gaussian Process? If f ∈ GP(m, s2Cθ), then Z := s−1 n−1VΛ−1/2 θ VH C−1/2 θ (f − m1) ∼ N(0, I). Generate the data z = 1 nsEB VΛ−1/2 θEB VH(y − mEB 1) = n i=2 |yi |2 λθEB ,i −1 VΛ−1/2 θEB (y − y1 e1 ) Q-Q plots of z vs. standard Gaussian Random function ﬁtting kernel with given r, θ f(x) = f0 + k∈{1,...,N}d fc (k) cos(2πkTx) + fs (k) sin(2πkTx) f0 , fc (k), fs (k) IID ∼ N 0, a k 0 bd− k 0 kj=0 k−r j θ = a/b, N = 256 10/15

References Is f a Typical Instance of a Gaussian Process? If f ∈ GP(m, s2Cθ), then Z := s−1 n−1VΛ−1/2 θ VH C−1/2 θ (f − m1) ∼ N(0, I). Generate the data z = 1 nsEB VΛ−1/2 θEB VH(y − mEB 1) = n i=2 |yi |2 λθEB ,i −1 VΛ−1/2 θEB (y − y1 e1 ) Q-Q plots of z vs. standard Gaussian f from Keister’s example True r and θ unknown 10/15

References The Successes of Bayesian Cubature f Credible intervals lead to adaptive cubature algorithms that provide answers to the desired tolerances Low discrepancy sampling provides cubatures requiring fewer samples that IID sampling Empirical Bayes and other methods can infer reasonable Gaussian process parameters for a family of covairance kernels—choosing a reasonable urn for f Covariance kernels that match the low discrepancy sampling facilitate fast Bayeian transforms for ﬁtting the Gaussian process parameters, computing the cubature, and constructing the credible intervals Extra eﬀort provides a richer family of covariance kernels 11/15

References The Challenges of Bayesian Cubature f Inferring the Gaussian process parameters and allowing a larger family of kernels takes extra computational time Overfitting as well as underfitting the kernel are possible The Gaussian assumption is not always justified by the data Which periodizing transformations of f are appropriate for lattice sampling and its matching kernels? 12/15

Thank you These slides are available at speakerdeck.com/fjhickernell/siam-uq-2020-bayesian-cubature

References References Jagadeeswaran, R. & H., F. J. Fast Automatic Bayesian Cubature Using Lattice Sampling. Stat. Comput. 29, 1215–1229 (2019). Choi, S.-C. T. et al. GAIL: Guaranteed Automatic Integration Library (Versions 1.0–2.2). MATLAB software. 2013–2017. http://gailgithub.github.io/GAIL_Dev/. Choi, S.-C. T., H., F. J., McCourt, M. & Sorokin, A. QMCPy: A quasi-Monte Carlo Python Library. 2020+. https://github.com/QMCSoftware/QMCSoftware. Keister, B. D. Multidimensional Quadrature Algorithms. Computers in Physics 10, 119–122 (1996). Genz, A. Comparison of Methods for the Computation of Multivariate Normal Probabilities. Computing Science and Statistics 25, 400–405 (1993). 14/15

References Computing κ(x) for x = 0, 1/n . . . , 1 − 1/n ζ(s, a) := ∞ m=0 (a + m)−s Hurwitz zeta function κr ( /n) = |k| 1 exp(2π √ −1k /n) |k|r = 2 nr ∞ m=1 1 |m|r + n−1 k=1 −∞ m=−∞ exp(2π √ −1k /n) |k + mn|r = 1 nr 2ζ(r) + n−1 k=1 exp(2π √ −1k /n)[ζ(r, k/n) + ζ(r, 1 − k/n)] κr = (κ( /n))n−1 =0 = WHκr κr := n−r(2ζ(r), ζ(r, 1/n) + ζ(r, (n − 1)/n), . . . , ζ(r, (n − 1)/n) + ζ(r, 1/n))T W = (exp(2π √ −1k /n))n−1 k, =0 return 15/15

SIAM UQ 2020-Bayesian Cubature

SIAM UQ 2020-Bayesian Cubature

More Decks by Fred J. Hickernell

Other Decks in Research

Featured

Transcript