Fred J. Hickernell, Jagadeeswaran Rathinavel & Aleksei Sorokin
Department of Applied Mathematics, Center for Interdisciplinary Scientific Computation, Illinois Institute of Technology
[email protected] · mypages.iit.edu/~hickernell

Thanks to the GAIL and QMCPy teams. Thanks to the organizers, especially during these unusual times.
Disclosure: I am multicultural (a computational mathematician & statistician).
Slides at speakerdeck.com/fjhickernell/dagstuhl-bayesian-cubature-ld-qmcpy

Probabilistic Numerics @ Dagstuhl, October 27, 2021
Our Take on Bayesian Quadrature Cubature [3–6]

$\mu := \int_{[0,1]^d} f(\mathbf{x}) \, \mathrm{d}\mathbf{x} = \,?$ and $\hat{\mu}_n$ is a function of $f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)$. We want $\mathbb{P}\bigl[|\mu - \hat{\mu}_n| \le \varepsilon\bigr] \ge 99\%$.

- Assume we are handed $f \sim \mathcal{GP}(m, s^2 C_\theta)$, so that $\mathbb{E}[f(\mathbf{x})] = m$ and $\mathbb{E}\bigl[\{f(\mathbf{t}) - m\}\{f(\mathbf{x}) - m\}\bigr] = s^2 C_\theta(\mathbf{t}, \mathbf{x})$
- Choose $C_\theta$ and $\mathbf{x}_1, \mathbf{x}_2, \dots$ to allow a fast eigenvector-eigenvalue decomposition of the Gram matrix $\mathsf{C}_\theta = \bigl(C_\theta(\mathbf{x}_i, \mathbf{x}_j)\bigr)_{i,j=1}^n$; this is a practical choice, rather than one informed by $f$ (see also Kanagawa, Schäfer, Wenger)
- Sample $f$ at $\mathbf{x}_1, \dots, \mathbf{x}_n$
- Tune the hyperparameters by empirical Bayes so that $f$ is not an outlier
- Construct a credible interval for $\mu$ in terms of the posterior mean $\hat{\mu}_n$
- Increase $n$ until the half-width of the credible interval is no greater than $\varepsilon$
- Implement in software: GAIL [1] and QMCPy [2] (see also Naslidnyk, Pleiss, ProbNum); a usage sketch follows below
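For instance, the whole loop above runs in a few lines of QMCPy [2]. This is a minimal sketch: the names `Keister`, `Lattice`, `CubBayesLatticeG`, `abs_tol`, and the `order="linear"` setting reflect my understanding of the QMCPy interface (circa version 1.x) and should be verified against the current documentation.

```python
import qmcpy as qp

# Keister integrand sampled at a randomly shifted rank-1 lattice;
# "linear" ordering is assumed to be what the Bayesian criterion expects
integrand = qp.Keister(qp.Lattice(dimension=3, order="linear", seed=7))

# Bayesian cubature stopping criterion: doubles n until the 99% credible
# interval half-width falls below the absolute tolerance
solution, data = qp.CubBayesLatticeG(integrand, abs_tol=0.01).integrate()
print(solution)  # posterior-mean estimate of the integral
print(data)      # summary: n used, error bound, wall-clock time, etc.
```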
Low Discrepancy (LD) Sequences for Quasi-Monte Carlo (QMC) Methods

IID sequences have clusters and gaps.
[Figure: IID uniform points in $[0,1]^2$, plotted as $x_{i,1}$ vs. $x_{i,2}$, for $n = 64, 128, 256$]

LD lattice sequences [7–9] are more even than IID and give faster convergence.
[Figure: lattice points in $[0,1]^2$, plotted as $x_{i,1}$ vs. $x_{i,2}$, for $n = 64, 128, 256$]
Matching kernel: $C_\theta(\mathbf{t}, \mathbf{x}) = K_\theta\bigl((\mathbf{t} - \mathbf{x}) \bmod \mathbf{1}\bigr)$, e.g., $K_\theta(\mathbf{t}) = \prod_{k=1}^{d} \bigl[1 + \eta_k (-1)^{r+1} B_{2r}(t_k)\bigr]$, where $B_{2r}$ is a Bernoulli polynomial and $\theta = (r, \boldsymbol{\eta})$.

LD digital sequences [9–11] (e.g., Sobol') are more even than IID and give faster convergence.
[Figure: Sobol' points in $[0,1]^2$, plotted as $x_{i,1}$ vs. $x_{i,2}$, for $n = 64, 128, 256$]
Matching kernel: $C_\theta(\mathbf{t}, \mathbf{x}) = K_\theta(\mathbf{t} \ominus \mathbf{x})$, where $\ominus$ denotes digitwise subtraction, e.g., $K_\theta(\mathbf{t}) = \prod_{k=1}^{d} \bigl[1 + \eta_k \, \omega_r(t_k)\bigr]$ with $\omega_r$ explicit and $\theta = (r, \boldsymbol{\eta})$.

(A point-generation sketch follows below.)
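To reproduce point sets like those pictured, here is a short sketch using QMCPy's generators; the `gen_samples` method and constructor arguments are my assumptions about the QMCPy API and worth double-checking against the docs.

```python
import qmcpy as qp

n, d = 64, 2
lattice_pts = qp.Lattice(dimension=d, seed=7).gen_samples(n)  # randomly shifted rank-1 lattice
sobol_pts = qp.Sobol(dimension=d, seed=7).gen_samples(n)      # randomized Sobol' points
print(lattice_pts.shape, sobol_pts.shape)                     # (64, 2) each, points in [0, 1)^2
```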
Tuning Hyperparameters in $\mathcal{GP}(m, s^2 C_\theta)$ and the Credible Interval [3–6]

For these matched data sites and covariance kernels,
$\int_{[0,1]^d} C_\theta(\mathbf{t}, \mathbf{x}) \, \mathrm{d}\mathbf{t} = 1$ for all $\mathbf{x} \in [0,1]^d$, and
$\mathsf{C}_\theta = \bigl(C_\theta(\mathbf{x}_i, \mathbf{x}_j)\bigr)_{i,j=1}^n = \frac{1}{n} \mathsf{V} \boldsymbol{\Lambda}_\theta \mathsf{V}^H$, where $\mathsf{V}$ is explicit (complex exponential for lattices / Hadamard for digital nets) and its first column is $\mathbf{1}$;
$\mathbf{b} \mapsto \tilde{\mathbf{b}} := \mathsf{V}^H \mathbf{b}$ is an FFT/FWHT costing $O(n \log n)$;
$\boldsymbol{\lambda}_\theta := \operatorname{diag}(\boldsymbol{\Lambda}_\theta) = \mathsf{V}^H \mathbf{C}_{\theta 1} = \tilde{\mathbf{C}}_{\theta 1}$, the transform of the first column of the Gram matrix.

This leads to the approximation and credible interval
$\hat{\mu} = \frac{\tilde{y}_1}{n} = \frac{1}{n} \sum_{i=1}^n f(\mathbf{x}_i) = $ the sample mean, where $\mathbf{y} = \bigl(f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)\bigr)^T$ and $\tilde{\mathbf{y}} = \mathsf{V}^H \mathbf{y}$;
$\mathbb{P}\bigl[|\mu - \hat{\mu}| \le \mathrm{err}\bigr] = 99\%$, $\mathrm{err} = \frac{2.58}{n} \sqrt{\Bigl(1 - \frac{n}{\lambda_{\theta 1}}\Bigr) \sum_{i=2}^n \frac{|\tilde{y}_i|^2}{\lambda_{\theta i}}}$;
and the empirical Bayes choice of hyperparameters
$\theta_{\mathrm{opt}} = \operatorname*{argmin}_\theta \Biggl[ \log\Biggl( \sum_{i=2}^n \frac{|\tilde{y}_i|^2}{\lambda_{\theta i}} \Biggr) + \frac{1}{n} \sum_{i=1}^n \log(\lambda_{\theta i}) \Biggr]$.

(A NumPy sketch of these computations for the lattice case follows below.)
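To make the fast-transform structure concrete, here is a minimal self-contained NumPy sketch for the lattice case with the $r = 1$ Bernoulli kernel from the earlier slide. The generating vector, $\eta = 1$, and the toy integrand are placeholder choices for illustration, not values from the talk.

```python
import numpy as np

def bernoulli2(t):
    # B_2(t) = t^2 - t + 1/6, the degree-2 Bernoulli polynomial
    return t**2 - t + 1.0 / 6.0

def kernel(t, eta=1.0):
    # Shift-invariant product kernel: the r = 1 member of the lattice family,
    # K_theta(t) = prod_k [1 + eta * B_2(t_k)]
    return np.prod(1.0 + eta * bernoulli2(t), axis=-1)

# Rank-1 lattice in d = 2: x_i = (i * z / n) mod 1; z is a toy generating vector
n, d = 2**8, 2
z = np.array([1, 47])                      # placeholder generating vector
i = np.arange(n)[:, None]
x = (i * z[None, :] / n) % 1.0

f = lambda x: np.cos(2 * np.pi * x).sum(axis=1) + 1.0   # toy integrand, true mu = 1
y = f(x)

# The Gram matrix of K at lattice points is circulant, so its eigenvalues are
# the FFT of its first column: lambda_theta = V^H C_theta e_1
c1 = kernel((x - x[0]) % 1.0)
lam = np.fft.fft(c1).real                  # eigenvalues lambda_theta,i (real, positive)
y_tilde = np.fft.fft(y)                    # y_tilde = V^H y in O(n log n)

mu_hat = y_tilde[0].real / n               # equals the sample mean of f
err = (2.58 / n) * np.sqrt(
    max(1.0 - n / lam[0], 0.0) * np.sum(np.abs(y_tilde[1:])**2 / lam[1:])
)
print(mu_hat, err)                         # estimate and credible-interval half-width
```

The circulant structure is the whole point of matching sites to kernels: one FFT of the first Gram-matrix column yields every eigenvalue, which is what keeps the cost at $O(n \log n)$ rather than the $O(n^3)$ of a generic Gram factorization.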
Our Bayesian Stopping Criteria in Software

Our research on stopping criteria (Bayesian and otherwise) has been implemented in
- GAIL [1], a MATLAB library (born in 2013)
- QMCPy [2], a Python library (born in 2019)

We aim to provide software that
- includes the best available algorithms, mostly quasi-Monte Carlo (QMC)
- is well-tested
- is an easy on-ramp for practitioners
- provides use cases for algorithm developers
- plays well with other packages
Speed

The Bayesian stopping criteria are the slow computations here:

Example              Stopping criterion                 d    Abs. tol.   n      Time
Keister [12]         Bayesian Lattice [4]               5    0.01        2^10   0.725 s
Keister [12]         Bayesian Sobol' [6]                5    0.01        2^12   0.102 s
Keister [12]         Track Fourier Coef. Lattice [14]   5    0.01        2^13   0.024 s
Keister [12]         Track Fourier Coef. Sobol' [15]    5    0.01        2^13   0.012 s
Option Pricing [13]  Bayesian Lattice [4]               13   0.01        2^11   0.128 s
Option Pricing [13]  Track Fourier Coef. Lattice [14]   13   0.01        2^12   0.034 s

- The computational load lies in optimizing the kernel hyperparameters
- May need to reduce the number of hyperparameters to ensure a correct answer
- Inefficient code
- Speed may not be a problem if function evaluation is expensive

(A timing sketch follows below.)
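A hedged sketch of how timings like the Keister rows could be reproduced in QMCPy [2]. The class names `CubBayesLatticeG` and `CubQMCLatticeG`, the `order="linear"` setting for the Bayesian lattice criterion, and the `data.n_total` attribute are my understanding of the QMCPy interface and should be checked against the current documentation; wall-clock times will of course vary by machine.

```python
import time
import qmcpy as qp

def timed(stopping_criterion, label):
    # Run one automatic cubature and report n, the estimate, and wall time
    t0 = time.time()
    solution, data = stopping_criterion.integrate()
    print(f"{label}: n = {data.n_total}, estimate = {solution}, "
          f"{time.time() - t0:.3f} s")

d, abs_tol = 5, 0.01

# Bayesian cubature with lattice sampling [4]; "linear" ordering assumed required
timed(qp.CubBayesLatticeG(qp.Keister(qp.Lattice(d, order="linear", seed=7)),
                          abs_tol=abs_tol), "Bayesian Lattice")

# Fourier-coefficient-tracking cubature with lattice sampling [14]
timed(qp.CubQMCLatticeG(qp.Keister(qp.Lattice(d, seed=7)),
                        abs_tol=abs_tol), "Track Fourier Coef. Lattice")
```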
Checking the Gaussian Process Assumption

If $\bigl(f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)\bigr)^T$ has the assumed multivariate Gaussian distribution, then the appropriate linear transformation of this data vector has independent standard univariate Gaussian components (a sketch of one such transformation follows below). Here are QQ plots for the Keister example with different randomized lattice data sites and $n = 64$:

[Figure: three QQ plots of data quantiles vs. standard Gaussian quantiles; $d = 3$, $n = 64$, with $(r_{\mathrm{opt}}, \theta_{\mathrm{opt}}) = (5.28, 0.89)$, $(5.10, 0.93)$, and $(5.18, 0.95)$]

Should we be concerned?
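The talk does not spell out which transformation produces these plots. Under the circulant lattice model from the earlier NumPy sketch, one valid standardization rescales the real and imaginary parts of the Fourier coefficients by their model standard deviations, plugging in the empirical Bayes estimate of $s^2$. The sketch below (reusing `n`, `lam`, `y_tilde` from that code) illustrates the idea; it is not claimed to be the authors' exact diagnostic.

```python
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

# Under the GP model, y ~ N(m*1, s^2 C) with C = (1/n) V Lambda V^H, so for
# k >= 1 the coefficient y_tilde[k] has mean 0 and variance s^2 * n * lam[k],
# split equally between independent real and imaginary parts.
s2_hat = np.sum(np.abs(y_tilde[1:])**2 / lam[1:]) / n**2  # empirical Bayes s^2

z = []
for k in range(1, n // 2):  # k > n/2 are conjugates of these, hence redundant
    scale = np.sqrt(s2_hat * n * lam[k] / 2.0)
    z += [y_tilde[k].real / scale, y_tilde[k].imag / scale]
z.append(y_tilde[n // 2].real / np.sqrt(s2_hat * n * lam[n // 2]))  # real Nyquist term
z = np.asarray(z)

scipy.stats.probplot(z, dist="norm", plot=plt)  # QQ plot vs. standard Gaussian
plt.show()
```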
Summary

- Bayesian cubature using credible intervals as stopping criteria often gives correct answers
- Matching sampling sites (designs) with covariance kernels expedites the computation
- Tuning the hyperparameters can add significant computational load
- Challenges remain in speeding up the computation and in finding the right kernels and hyperparameters
References

1. Choi, S.-C. T. et al. GAIL: Guaranteed Automatic Integration Library (Versions 1.0–2.3.2). MATLAB software, http://gailgithub.github.io/GAIL_Dev/ (2021).
2. Choi, S.-C. T., Hickernell, F. J., Jagadeeswaran, R., McCourt, M. & Sorokin, A. QMCPy: A Quasi-Monte Carlo Python Library. https://qmcsoftware.github.io/QMCSoftware/ (2020).
3. Hickernell, F. J. & Jagadeeswaran, R. Comment on "Probabilistic Integration: A Role in Statistical Computation?" Statist. Sci. 34, 23–28 (2019).
4. Jagadeeswaran, R. & Hickernell, F. J. Fast Automatic Bayesian Cubature Using Lattice Sampling. Stat. Comput. 29, 1215–1229 (2019).
5. Jagadeeswaran, R. Fast Automatic Bayesian Cubature Using Matching Kernels and Designs. PhD thesis, Illinois Institute of Technology (2019).
6. Jagadeeswaran, R. & Hickernell, F. J. Fast Automatic Bayesian Cubature Using Sobol' Sampling. In preparation for submission for publication (2021+).
7. Sloan, I. H. & Joe, S. Lattice Methods for Multiple Integration (Oxford University Press, Oxford, 1994).
8. Lemieux, C. Monte Carlo and Quasi-Monte Carlo Sampling (Springer Science+Business Media, New York, 2009).
9. Dick, J., Kuo, F. & Sloan, I. H. High-dimensional integration: the quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013).
10. Niederreiter, H. Random Number Generation and Quasi-Monte Carlo Methods (SIAM, Philadelphia, 1992).
11. Dick, J. & Pillichshammer, F. Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration (Cambridge University Press, Cambridge, 2010).
12. Keister, B. D. Multidimensional Quadrature Algorithms. Computers in Physics 10, 119–122 (1996).
13. Glasserman, P. Monte Carlo Methods in Financial Engineering (Springer-Verlag, New York, 2004).
14. Jiménez Rugama, L. A. & Hickernell, F. J. Adaptive Multidimensional Integration Based on Rank-1 Lattices. In Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Leuven, Belgium, April 2014 (eds Cools, R. & Nuyens, D.) vol. 163, 407–422 (Springer-Verlag, Berlin, 2016). arXiv:1411.1966.
15. Hickernell, F. J. & Jiménez Rugama, L. A. Reliable Adaptive Cubature Using Digital Sequences. In Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Leuven, Belgium, April 2014 (eds Cools, R. & Nuyens, D.) vol. 163, 367–383 (Springer-Verlag, Berlin, 2016). arXiv:1410.8615 [math.NA].