Slide 1

Slide 1 text

Bayesian Cubature with Low Discrepancy Sequences in QMCPy

Fred J. Hickernell, Jagadeeswaran Rathinavel & Aleksei Sorokin
Department of Applied Mathematics, Center for Interdisciplinary Scientific Computation, Illinois Institute of Technology
[email protected] · mypages.iit.edu/~hickernell

Thanks to the GAIL and QMCPy teams. Thanks to the organizers, especially during these unusual times.
Disclosure: I am multicultural (computational mathematician & statistician).

Slides at speakerdeck.com/fjhickernell/dagstuhl-bayesian-cubature-ld-qmcpy
Probabilistic Numerics @ Dagstuhl, October 27, 2021

Slide 2

Slide 2 text

Our take on Bayesian Cubature [3–6]

$\mu := \int_{[0,1]^d} f(x) \, \mathrm{d}x = \,?$, and $\hat{\mu}_n$ is some function of $f(x_1), \ldots, f(x_n)$. We want $\mathbb{P}[|\mu - \hat{\mu}_n| \leqslant \varepsilon] \geqslant 99\%$.

Assume we are handed $f \sim \mathcal{GP}(m, s^2 C_\theta)$, so $\mathbb{E}[f(x)] = m$ and $\mathbb{E}\bigl[\{f(t) - m\}\{f(x) - m\}\bigr] = s^2 C_\theta(t, x)$.

- Sample $f$ at $x_1, \ldots, x_n$
- Construct a credible interval for $\mu$ in terms of the posterior mean $\hat{\mu}_n$
- Increase $n$ until the half-width is no greater than $\varepsilon$

Slide 3

Slide 3 text

Our take on Bayesian Cubature [3–6]

$\mu := \int_{[0,1]^d} f(x) \, \mathrm{d}x = \,?$, and $\hat{\mu}_n$ is some function of $f(x_1), \ldots, f(x_n)$. We want $\mathbb{P}[|\mu - \hat{\mu}_n| \leqslant \varepsilon] \geqslant 99\%$.

Assume we are handed $f \sim \mathcal{GP}(m, s^2 C_\theta)$, so $\mathbb{E}[f(x)] = m$ and $\mathbb{E}\bigl[\{f(t) - m\}\{f(x) - m\}\bigr] = s^2 C_\theta(t, x)$.

- Sample $f$ at $x_1, \ldots, x_n$
- Tune the hyperparameters by empirical Bayes so that $f$ is not an outlier
- Construct a credible interval for $\mu$ in terms of the posterior mean $\hat{\mu}_n$
- Increase $n$ until the half-width is no greater than $\varepsilon$

Slide 4

Slide 4 text

Our take on Bayesian Cubature [3–6]

$\mu := \int_{[0,1]^d} f(x) \, \mathrm{d}x = \,?$, and $\hat{\mu}_n$ is some function of $f(x_1), \ldots, f(x_n)$. We want $\mathbb{P}[|\mu - \hat{\mu}_n| \leqslant \varepsilon] \geqslant 99\%$.

Assume we are handed $f \sim \mathcal{GP}(m, s^2 C_\theta)$, so $\mathbb{E}[f(x)] = m$ and $\mathbb{E}\bigl[\{f(t) - m\}\{f(x) - m\}\bigr] = s^2 C_\theta(t, x)$.

- Choose $C_\theta$ and $x_1, x_2, \ldots$ for a fast eigenvector-eigenvalue decomposition of the Gram matrix $\mathsf{C}_\theta = \bigl(C_\theta(x_i, x_j)\bigr)_{i,j=1}^n$; this is a practical choice, rather than one informed by $f$ (see also Kanagawa, Schäfer, Wenger)
- Sample $f$ at $x_1, \ldots, x_n$
- Tune the hyperparameters by empirical Bayes so that $f$ is not an outlier
- Construct a credible interval for $\mu$ in terms of the posterior mean $\hat{\mu}_n$
- Increase $n$ until the half-width is no greater than $\varepsilon$

Slide 5

Slide 5 text

Our take on Bayesian Cubature [3–6]

$\mu := \int_{[0,1]^d} f(x) \, \mathrm{d}x = \,?$, and $\hat{\mu}_n$ is some function of $f(x_1), \ldots, f(x_n)$. We want $\mathbb{P}[|\mu - \hat{\mu}_n| \leqslant \varepsilon] \geqslant 99\%$.

Assume we are handed $f \sim \mathcal{GP}(m, s^2 C_\theta)$, so $\mathbb{E}[f(x)] = m$ and $\mathbb{E}\bigl[\{f(t) - m\}\{f(x) - m\}\bigr] = s^2 C_\theta(t, x)$.

- Choose $C_\theta$ and $x_1, x_2, \ldots$ for a fast eigenvector-eigenvalue decomposition of the Gram matrix $\mathsf{C}_\theta = \bigl(C_\theta(x_i, x_j)\bigr)_{i,j=1}^n$; this is a practical choice, rather than one informed by $f$ (see also Kanagawa, Schäfer, Wenger)
- Sample $f$ at $x_1, \ldots, x_n$
- Tune the hyperparameters by empirical Bayes so that $f$ is not an outlier
- Construct a credible interval for $\mu$ in terms of the posterior mean $\hat{\mu}_n$
- Increase $n$ until the half-width is no greater than $\varepsilon$
- Implement in software: GAIL [1] and QMCPy [2] (see also Naslidnyk, Pleiss, ProbNum); a minimal QMCPy sketch follows
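The whole recipe above collapses to a single call in QMCPy. Here is a minimal sketch, assuming the top-level names Lattice, Keister, and CubBayesLatticeG exported by the qmcpy package (a fuller example appears on the Keister slide later):

    from qmcpy import Lattice, Keister, CubBayesLatticeG

    integrand = Keister(Lattice(dimension=3))              # f sampled at lattice sites in [0,1]^3
    stop_crit = CubBayesLatticeG(integrand, abs_tol=1e-2)  # credible-interval stopping rule
    solution, data = stop_crit.integrate()                 # grows n until half-width <= abs_tol
    print(solution, data.n_total)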

Slide 6

Slide 6 text

Low Discrepancy (LD) Sequences for Quasi-Monte Carlo (QMC) Methods

IID sequences have clusters and gaps.

[Figure: IID uniform points in $[0,1]^2$, three panels with n = 64, 128, 256.]
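A sketch of how the point sets on this and the next two slides can be generated in QMCPy; the class names IIDStdUniform, Lattice, and Sobol and the gen_samples method follow the QMCPy documentation, but treat the exact signatures as assumptions:

    from qmcpy import IIDStdUniform, Lattice, Sobol

    n, d = 64, 2
    x_iid = IIDStdUniform(dimension=d, seed=7).gen_samples(n)  # clusters and gaps
    x_lat = Lattice(dimension=d, seed=7).gen_samples(n)        # randomly shifted lattice
    x_sob = Sobol(dimension=d, seed=7).gen_samples(n)          # scrambled digital net
    print(x_iid.shape, x_lat.shape, x_sob.shape)               # (64, 2) for each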

Slide 7

Slide 7 text

Low Discrepancy (LD) Sequences for Quasi-Monte Carlo (QMC) Methods

LD lattice sequences [7–9] are more even than IID and give faster convergence.

[Figure: lattice points in $[0,1]^2$, three panels with n = 64, 128, 256.]

$C_\theta(t, x) = K_\theta\bigl((t - x) \bmod 1\bigr)$, e.g., $K_\theta(t) = \prod_{k=1}^{d} \bigl[1 + \eta_k (-1)^{r+1} B_{2r}(t_k)\bigr]$, where $B_{2r}$ is the Bernoulli polynomial of degree $2r$ and $\theta = (r, \eta)$.
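For r = 1 this product kernel is easy to evaluate, since $B_2(t) = t^2 - t + 1/6$. A minimal NumPy sketch (the function name is ours, and only r = 1 is implemented):

    import numpy as np

    def lattice_kernel(t, x, eta, r=1):
        # K_theta(t - x mod 1) with K_theta(t) = prod_k [1 + eta_k (-1)^(r+1) B_2r(t_k)];
        # for r = 1, B_2(t) = t^2 - t + 1/6 and the sign (-1)^(r+1) is +1
        assert r == 1
        delta = (np.asarray(t) - np.asarray(x)) % 1.0   # componentwise periodic difference
        b2 = delta**2 - delta + 1.0 / 6.0               # Bernoulli polynomial B_2
        return np.prod(1.0 + eta * b2, axis=-1)         # product over the d coordinates

    print(lattice_kernel([0.3, 0.7], [0.9, 0.2], eta=np.ones(2)))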

Slide 8

Slide 8 text

Low Discrepancy (LD) Sequences for Quasi-Monte Carlo (QMC) Methods

LD digital sequences [9–11] (e.g., Sobol') are more even than IID and give faster convergence.

[Figure: Sobol' points in $[0,1]^2$, three panels with n = 64, 128, 256.]

$C_\theta(t, x) = K_\theta(t \ominus x)$, where $\ominus$ denotes digitwise subtraction, e.g., $K_\theta(t) = \prod_{k=1}^{d} \bigl[1 + \eta_k \, \omega_r(t_k)\bigr]$ for an explicit $\omega_r$, with $\theta = (r, \eta)$.
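In base 2, the digitwise subtraction $\ominus$ is just an XOR of binary digits. A minimal sketch with a fixed 32-digit precision (the precision cutoff is an assumption of this sketch):

    import numpy as np

    def digitwise_sub(t, x, bits=32):
        # t (-) x for t, x in [0,1): XOR the first `bits` base-2 digits, map back to [0,1)
        ti = np.uint64(np.floor(t * 2**bits))
        xi = np.uint64(np.floor(x * 2**bits))
        return float(ti ^ xi) / 2**bits

    print(digitwise_sub(0.75, 0.5))  # 0.11_2 (-) 0.10_2 = 0.01_2 = 0.25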

Slide 9

Slide 9 text

Eigenvector-Eigenvalue Decomposition of the Gram Matrix [3–6]

For these matched data sites and covariance kernels,

$\int_{[0,1]^d} C_\theta(t, x) \, \mathrm{d}t = 1 \quad \forall x \in [0,1]^d,$

$\mathsf{C}_\theta = \bigl(C_\theta(x_i, x_j)\bigr)_{i,j=1}^n = (c_{\theta 1}, \ldots, c_{\theta n}) = \tfrac{1}{n} \mathsf{V} \Lambda_\theta \mathsf{V}^H,$

where $\mathsf{V}$ is explicit (complex exponential/Hadamard) with first column $v_1 = \mathbf{1}$, the transform $b \mapsto \tilde{b} := \mathsf{V}^H b$ is an FFT/FWHT costing $O(n \log n)$, and $\lambda_\theta := \operatorname{diag}(\Lambda_\theta) = \mathsf{V}^H c_{\theta 1} = \tilde{c}_{\theta 1}$.
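A minimal sanity check of this structure in the simplest case, the 1-d lattice $x_i = i/n$ with the r = 1 kernel from before: the Gram matrix is then circulant, so its eigenvalues are exactly the FFT of its first column.

    import numpy as np

    n = 8
    x = np.arange(n) / n                              # 1-d lattice points i/n
    K = lambda t: 1 + ((t % 1)**2 - (t % 1) + 1 / 6)  # r = 1, eta = 1 kernel
    C = K(x[:, None] - x[None, :])                    # Gram matrix C_theta, circulant
    lam = np.fft.fft(C[:, 0]).real                    # eigenvalues = FFT of first column
    print(np.allclose(np.sort(np.linalg.eigvalsh(C)), np.sort(lam)))  # True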

Slide 10

Slide 10 text

Tuning Hyperparameters in $\mathcal{GP}(m, s^2 C_\theta)$ and the Credible Interval [3–6]

For these matched data sites and covariance kernels,

$\int_{[0,1]^d} C_\theta(t, x) \, \mathrm{d}t = 1 \quad \forall x \in [0,1]^d,$

$\mathsf{C}_\theta = \bigl(C_\theta(x_i, x_j)\bigr)_{i,j=1}^n = (c_{\theta 1}, \ldots, c_{\theta n}) = \tfrac{1}{n} \mathsf{V} \Lambda_\theta \mathsf{V}^H,$

where $\mathsf{V}$ is explicit (complex exponential/Hadamard) with first column $v_1 = \mathbf{1}$, the transform $b \mapsto \tilde{b} := \mathsf{V}^H b$ is an FFT/FWHT costing $O(n \log n)$, and $\lambda_\theta := \operatorname{diag}(\Lambda_\theta) = \mathsf{V}^H c_{\theta 1} = \tilde{c}_{\theta 1}$, which leads to the approximation and credible interval

$\hat{\mu} = \frac{\tilde{y}_1}{n} = \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \text{sample mean}, \qquad y = \bigl(f(x_1), \ldots, f(x_n)\bigr)^T, \quad \tilde{y} = \mathsf{V}^H y,$

$\mathbb{P}\bigl[|\mu - \hat{\mu}| \leqslant \text{err}\bigr] = 99\%, \qquad \text{err} = \frac{2.58}{n} \sqrt{\left( \sum_{i=2}^{n} \frac{|\tilde{y}_i|^2}{\lambda_{\theta i}} \right) \left( 1 - \frac{n}{\lambda_{\theta 1}} \right)}.$

Slide 11

Slide 11 text

Tuning Hyperparameters in $\mathcal{GP}(m, s^2 C_\theta)$ and the Credible Interval [3–6]

For these matched data sites and covariance kernels, $b \mapsto \tilde{b} := \mathsf{V}^H b$ is an FFT/FWHT costing $O(n \log n)$ and $\lambda_\theta := \operatorname{diag}(\Lambda_\theta) = \mathsf{V}^H c_{\theta 1} = \tilde{c}_{\theta 1}$, which leads to the approximation and credible interval

$\hat{\mu} = \frac{\tilde{y}_1}{n} = \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \text{sample mean}, \qquad y = \bigl(f(x_1), \ldots, f(x_n)\bigr)^T, \quad \tilde{y} = \mathsf{V}^H y,$

$\mathbb{P}\bigl[|\mu - \hat{\mu}| \leqslant \text{err}\bigr] = 99\%, \qquad \text{err} = \frac{2.58}{n} \sqrt{\left( \sum_{i=2}^{n} \frac{|\tilde{y}_i|^2}{\lambda_{\theta i}} \right) \left( 1 - \frac{n}{\lambda_{\theta 1}} \right)},$

$\theta_{\mathrm{opt}} = \operatorname*{argmin}_{\theta} \left[ \log\left( \sum_{i=2}^{n} \frac{|\tilde{y}_i|^2}{\lambda_{\theta i}} \right) + \frac{1}{n} \sum_{i=1}^{n} \log(\lambda_{\theta i}) \right].$
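Putting the displays above together for the 1-d lattice/Bernoulli-kernel case gives a short runnable sketch. This is a toy re-implementation under our own simplifying assumptions (d = 1, r = 1, scalar theta), not the production code of [4]:

    import numpy as np

    def bayes_cubature(y, theta):
        # posterior mean, 99% credible-interval half-width, and empirical-Bayes
        # objective, for y sampled at the 1-d lattice x_i = i/n with the r = 1 kernel
        n = len(y)
        x = np.arange(n) / n
        lam = np.fft.fft(1 + theta * (x**2 - x + 1 / 6)).real  # lambda_theta = FFT of c_theta1
        y_t = np.fft.fft(y)                                    # y_tilde = V^H y
        mu_hat = y_t[0].real / n                               # the sample mean
        s2 = np.sum(np.abs(y_t[1:])**2 / lam[1:])
        err = 2.58 / n * np.sqrt(s2 * (1 - n / lam[0]))        # credible-interval half-width
        objective = np.log(s2) + np.mean(np.log(lam))          # minimize this over theta
        return mu_hat, err, objective

    rng = np.random.default_rng(0)
    f = lambda x: np.cos(2 * np.pi * x) + 2                    # periodic test integrand, mu = 2
    y = f((np.arange(64) / 64 + rng.random()) % 1)             # randomly shifted lattice sample
    print(bayes_cubature(y, theta=1.0))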

Slide 12

Slide 12 text

Putting Our Bayesian Stopping Criteria into Software

Our research on stopping criteria (Bayesian and otherwise) has been implemented in
- GAIL [1], a MATLAB library (born in 2013)
- QMCPy [2], a Python library (born in 2019)

We aim to provide software that
- includes the best available algorithms, mostly quasi-Monte Carlo (QMC)
- is well tested
- is an easy on-ramp for practitioners
- provides use cases for algorithm developers
- plays well with other packages

Slide 13

Slide 13 text

Keister Example in QMCPy (computations in Google Colab notebook here)

$\mu = \int_{\mathbb{R}^d} \cos(\lVert t \rVert) \exp(-\lVert t \rVert^2) \, \mathrm{d}t = \int_{[0,1]^d} f(x) \, \mathrm{d}x = \,?$  (Keister [12])

    from qmcpy import Lattice, Keister, CubBayesLatticeG  # imports added; implicit on the slide

    d = 5
    tol = 1e-2
    lattice = Lattice(dimension=d, order="linear")
    integrand = Keister(lattice)
    stop_crit = CubBayesLatticeG(integrand, abs_tol=tol, ptransform="none")
    solution, data = stop_crit.integrate()
    print(data)

Output:

    solution        1.136
    error_bound     0.009
    n_total         1024
    time_integrate  0.725
    abs_tol         0.010
    rel_tol         0
    n_init          2^(8)
    n_max           2^(22)

Slide 14

Slide 14 text

Asian Option Pricing Example in QMCPy (computations here)

option price $= \mu = \int_{\mathbb{R}^d} \mathrm{payoff}(z) \cdot (\text{discrete BM PDF})(z) \, \mathrm{d}z = \int_{[0,1]^d} f(x) \, \mathrm{d}x = \,?$  (Glasserman [13])

    from qmcpy import Lattice, AsianOption, CubBayesLatticeG  # imports added; implicit on the slide

    volatility = .5
    interest_rate = .01
    start_price = 30
    strike_price = 25
    t_final = 1
    call_put = "call"
    mean_type = "arithmetic"
    d = 13
    abs_tol = .01
    integrand = AsianOption(Lattice(d, order="linear"), volatility, start_price,
                            strike_price, interest_rate, t_final, call_put, mean_type)
    stop_crit = CubBayesLatticeG(integrand, abs_tol=abs_tol, ptransform="Baker")
    stop_crit.one_theta = True
    solution, data = stop_crit.integrate()
    print(data)

Output:

    solution        6.261
    error_bound     0.004
    n_total         2048
    time_integrate  0.128
    abs_tol         0.010
    time_vec        [0.077 0.154 0.231 ... 0.846 0.923 1.   ]
    covariance      [[0.077 0.077 0.077 ... 0.077 0.077 0.077]
                     [0.077 0.154 0.154 ... 0.154 0.154 0.154]
                     [0.077 0.154 0.231 ... 0.231 0.231 0.231]
                     ...]
    decomp_type     PCA

Slide 15

Slide 15 text

Slow Speed (computations here)

    Example          Stopping Criterion                  d    Absolute Tolerance   n      Time
    Keister          Bayesian Lattice [4]                5    0.01                 2^10   0.725 s
    Keister          Bayesian Sobol' [6]                 5    0.01                 2^12   0.102 s
    Keister          Track Fourier Coef. Lattice [14]    5    0.01                 2^13   0.024 s
    Keister          Track Fourier Coef. Sobol' [15]     5    0.01                 2^13   0.012 s
    Option Pricing   Bayesian Lattice [4]                13   0.01                 2^11   0.128 s
    Option Pricing   Track Fourier Coef. Lattice [14]    13   0.01                 2^12   0.034 s

- The computational load lies in optimizing the kernel hyperparameters
- The number of hyperparameters may need to be reduced to ensure a correct answer
- Inefficient code
- Speed may not be a problem if function evaluation is expensive
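A sketch for reproducing two rows of the table above, timing the Bayesian and the track-Fourier-coefficient lattice stopping criteria on the Keister example. The class name CubQMCLatticeG is our assumption for the latter, based on the QMCPy documentation; timings will of course differ by machine:

    from time import perf_counter
    from qmcpy import Lattice, Keister, CubBayesLatticeG, CubQMCLatticeG

    for StopCrit in (CubBayesLatticeG, CubQMCLatticeG):
        stop_crit = StopCrit(Keister(Lattice(dimension=5)), abs_tol=1e-2)
        t0 = perf_counter()
        solution, data = stop_crit.integrate()
        print(StopCrit.__name__, solution, data.n_total, perf_counter() - t0)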

Slide 16

Slide 16 text

Checking the Gaussian Process Assumption

If $\bigl(f(x_1), \ldots, f(x_n)\bigr)^T$ has a known multivariate Gaussian distribution, then the correct transformation of this data has a standard univariate Gaussian distribution.

Here are QQ plots for the Keister example with different randomized lattice data sites and n = 64:

[Figure: three QQ plots of data quantiles against standard Gaussian quantiles, each with d = 3 and n = 64; the fitted hyperparameters are (r_opt, θ_opt) = (5.28, 0.89), (5.10, 0.93), and (5.18, 0.95).]

Should we be concerned?
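The "correct transformation" here is whitening: if $y \sim \mathcal{N}(m\mathbf{1}, s^2 \mathsf{C}_\theta)$ with $\mathsf{C}_\theta = \mathsf{L}\mathsf{L}^T$, then $\mathsf{L}^{-1}(y - m\mathbf{1})/s$ should be n IID standard Gaussians. A minimal sketch of the diagnostic, using SciPy's probplot to draw the QQ plot (the helper name is ours):

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    def qq_check(y, C, m, s):
        # whiten: if y ~ N(m 1, s^2 C) with C = L L^T, then z = L^{-1}(y - m)/s ~ N(0, I)
        L = np.linalg.cholesky(C)
        z = np.linalg.solve(L, y - m) / s
        stats.probplot(z, dist="norm", plot=plt)  # QQ plot vs standard Gaussian quantiles
        plt.show()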

Slide 17

Slide 17 text

Summary

- Bayesian cubature using credible intervals as stopping criteria often gives correct answers
- Matching sampling sites (designs) with covariance kernels expedites the computation
- Tuning the hyperparameters can add significant computational load
- Challenges remain in speeding up the computation and in finding the right kernels and hyperparameters

Slide 18

Slide 18 text

Thank you!

These slides are available at speakerdeck.com/fjhickernell/dagstuhl-bayesian-cubature-ld-qmcpy

Slide 19

Slide 19 text

References I

1. Choi, S.-C. T. et al. GAIL: Guaranteed Automatic Integration Library (Versions 1.0–2.3.2). MATLAB software, http://gailgithub.github.io/GAIL_Dev/, 2021.
2. Choi, S.-C. T., H., F. J., Jagadeeswaran, R., McCourt, M. & Sorokin, A. QMCPy: A Quasi-Monte Carlo Python Library, 2020. https://qmcsoftware.github.io/QMCSoftware/.
3. H., F. J. & Jagadeeswaran, R. Comment on "Probabilistic Integration: A Role in Statistical Computation?" Statist. Sci. 34, 23–28 (2019).
4. Jagadeeswaran, R. & H., F. J. Fast Automatic Bayesian Cubature Using Lattice Sampling. Stat. Comput. 29, 1215–1229 (2019).
5. Jagadeeswaran, R. Fast Automatic Bayesian Cubature Using Matching Kernels and Designs. PhD thesis, Illinois Institute of Technology (2019).

Slide 20

Slide 20 text

References II

6. Jagadeeswaran, R. & H., F. J. Fast Automatic Bayesian Cubature Using Sobol' Sampling. In preparation for submission for publication, 2021+.
7. Sloan, I. H. & Joe, S. Lattice Methods for Multiple Integration (Oxford University Press, Oxford, 1994).
8. Lemieux, C. Monte Carlo and Quasi-Monte Carlo Sampling (Springer Science+Business Media, Inc., New York, 2009).
9. Dick, J., Kuo, F. & Sloan, I. H. High Dimensional Integration — the Quasi-Monte Carlo Way. Acta Numer. 22, 133–288 (2013).
10. Niederreiter, H. Random Number Generation and Quasi-Monte Carlo Methods (SIAM, Philadelphia, 1992).
11. Dick, J. & Pillichshammer, F. Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration (Cambridge University Press, Cambridge, 2010).

Slide 21

Slide 21 text

References III

12. Keister, B. D. Multidimensional Quadrature Algorithms. Computers in Physics 10, 119–122 (1996).
13. Glasserman, P. Monte Carlo Methods in Financial Engineering (Springer-Verlag, New York, 2004).
14. Jiménez Rugama, L. A. & H., F. J. Adaptive Multidimensional Integration Based on Rank-1 Lattices. In Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Leuven, Belgium, April 2014 (eds Cools, R. & Nuyens, D.) 163, 407–422. arXiv:1411.1966 (Springer-Verlag, Berlin, 2016).
15. H., F. J. & Jiménez Rugama, L. A. Reliable Adaptive Cubature Using Digital Sequences. In Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Leuven, Belgium, April 2014 (eds Cools, R. & Nuyens, D.) 163, 367–383. arXiv:1410.8615 [math.NA] (Springer-Verlag, Berlin, 2016).