
SAMSI-QMC December 2017 Trends Workshop

Slides presented at the SAMSI workshop at Duke University

Fred J. Hickernell

December 12, 2017

Transcript

  1. When to Stop Sampling: Answers and Further Questions

     Fred J. Hickernell
     Department of Applied Mathematics & Center for Interdisciplinary Scientific Computation, Illinois Institute of Technology
     [email protected] · mypages.iit.edu/~hickernell
     Thanks to the Guaranteed Automatic Integration Library (GAIL) team and friends. Supported by NSF-DMS-1522687.
     Thanks to SAMSI, the QMC Program organizers, and the Workshop Organizers.
     Trends and Advances in Monte Carlo Sampling Algorithms, December 12, 2017

  2. Multidimensional Integration Examples

     $\mu = \int_{\mathbb{R}^d} f(x)\,\nu(dx) =$ option price, where $f(x) =$ discounted payoff determined by market forces $x$

     $\mu = \int_{\mathcal{X}} g(x)\,dx = \mathbb{P}(X \in \mathcal{X}) =$ probability, where $g =$ probability density function

     $\dfrac{\mu_j}{\mu_0} = \dfrac{\int b_j\,L(b\,|\,\text{data})\,\varrho(b)\,db}{\int L(b\,|\,\text{data})\,\varrho(b)\,db} =$ Bayesian posterior expectation of $\beta_j$, where $\varrho$ is the prior density

     In general,
     $\mu := \int_{\mathcal{X}} g(x)\,dx = \int_{\mathcal{X}} f(x)\,\nu(dx) =: \mathbb{E}[f(X)] \approx \hat\mu_n := \frac{1}{n}\sum_{i=1}^n f(x_i) = \int_{\mathcal{X}} f(x)\,\hat\nu_n(dx)$

     How big should $n$ be so that $|\mu - \hat\mu_n| \le \varepsilon$?

  4. Error of (Different Kinds of) Monte Carlo Methods

     $\mu := \int_{\mathcal{X}} g(x)\,dx = \int_{\mathcal{X}} f(x)\,\nu(dx) =: \mathbb{E}[f(X)] \approx \hat\mu_n := \frac{1}{n}\sum_{i=1}^n f(x_i) = \int_{\mathcal{X}} f(x)\,\hat\nu_n(dx)$

     $\operatorname{err}_n(f) := |\mu - \hat\mu_n| = \Bigl|\int_{\mathcal{X}} \underbrace{f(x)}_{\text{make small by importance sampling}} \underbrace{(\nu - \hat\nu_n)(dx)}_{\text{make small by clever sampling}}\Bigr|$

     - For all $f$: $\operatorname{err}_n(f) \le \overline{\operatorname{err}}_n(f)$; for all reasonable $f$: $\operatorname{err}_n(f) \le \widehat{\operatorname{err}}_n(f)$ (data-driven); choose $n$ large enough that $\widehat{\operatorname{err}}_n(f) \le \varepsilon$
     - $\operatorname{err}_n(f) = \underbrace{\operatorname{size}(f)}_{\text{unknown}} \times \underbrace{\operatorname{size}(\nu - \hat\nu_n)}_{\text{maybe known}}$ (H., 2017+); may choose $\hat\nu_n$ more carefully than IID
     - For random algorithms/integrands, the inequalities above should hold with high probability
     - The algorithms described here are in the Guaranteed Automatic Integration Library (GAIL) (Choi et al., 2013–2017)

  5.–7. Stopping IID Sampling (built up over three slides)

     $\mu := \int_{\mathcal{X}} f(x)\,\nu(dx) =: \mathbb{E}[f(X)] \approx \hat\mu_n := \frac{1}{n}\sum_{i=n_\sigma+1}^{n_\sigma+n} f(x_i), \quad x_i \overset{\text{IID}}{\sim} \nu, \quad \operatorname{err}_n(f) = |\mu - \hat\mu_n|$

     The Central Limit Theorem says that
     $\mathbb{P}\bigl[\operatorname{err}_n(f) \le \widehat{\operatorname{err}}_n(f)\bigr] \underset{n \to \infty}{\approx} 99\%, \qquad \widehat{\operatorname{err}}_n(f) = \frac{2.58\,\sigma}{\sqrt{n}}, \qquad \sigma^2 = \operatorname{var}(f(X))$

     Since $\sigma$ is unknown, estimate it from a pilot sample of size $n_\sigma$ and inflate the estimate:
     $\widehat{\operatorname{err}}_n(f) = \frac{2.58 \times 1.2\,\hat\sigma}{\sqrt{n}}, \qquad \hat\sigma^2 = \frac{1}{n_\sigma - 1}\sum_{i=1}^{n_\sigma} \bigl(f(x_i) - \hat\mu_{n_\sigma}\bigr)^2$

     Choose $n = \Bigl\lceil \Bigl(\dfrac{2.58 \times 1.2\,\hat\sigma}{\varepsilon}\Bigr)^2 \Bigr\rceil$ to make $\widehat{\operatorname{err}}_n(f) \le \varepsilon$. A code sketch of this two-stage rule follows.

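     As a concrete illustration, here is a minimal Python sketch of the two-stage rule above: a pilot sample estimates the variance, the estimate is inflated by 1.2, and the CLT sets the second-stage sample size. The function names, interfaces, and the default pilot size are illustrative assumptions, not GAIL's interface (GAIL's meanMC_g implements the guaranteed version).

```python
import numpy as np

def two_stage_mc(f, sample, eps, n_sigma=1024):
    """Two-stage CLT stopping rule (sketch).

    f      : maps an (n, d) array of points to n real values
    sample : sample(n) returns an (n, d) array of IID draws from nu
    """
    pilot = f(sample(n_sigma))
    sigma_hat = pilot.std(ddof=1)          # 1/(n_sigma - 1) normalization
    n = int(np.ceil((2.58 * 1.2 * sigma_hat / eps) ** 2))
    return f(sample(n)).mean(), n          # fresh draws, independent of the pilot

# Example: estimate E[X^2] = 1 for X ~ N(0, 1)
rng = np.random.default_rng(0)
mu_hat, n = two_stage_mc(lambda x: x[:, 0] ** 2,
                         lambda m: rng.standard_normal((m, 1)), eps=1e-2)
```
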
  8. Stopping IID Sampling

     H. et al. (2013) (see also Bayer et al. (2014), H. et al. (2017+)) assume that the integrand belongs to the cone of reasonable integrands
     $\mathcal{C} = \{f \in L^4(\mathcal{X}, \nu) : \operatorname{kurt}(f(X)) \le \kappa_{\text{up}}\}.$
     Then we may replace the $n \to \infty$ approximation by using Berry-Esseen inequalities (Nefedova and Shevtsova, 2012; Shevtsova, 2014), with a more complicated formula for $\widehat{\operatorname{err}}_n(f)$. The bound on the kurtosis also allows us to be highly confident that $\sigma \le 1.2\hat\sigma$, via Cantelli's inequality:
     $\mathbb{P}\bigl[\operatorname{err}_n(f) \le \widehat{\operatorname{err}}_n(f)\bigr] \ge 99.5\%, \qquad \mathbb{P}(\sigma \le 1.2\hat\sigma) \ge 99.5\%, \qquad \hat\sigma^2 = \frac{1}{n_\sigma - 1}\sum_{i=1}^{n_\sigma} \bigl(f(x_i) - \hat\mu_{n_\sigma}\bigr)^2$
     $\overline{\operatorname{err}}_n(\sigma) = \min\Bigl\{\delta > 0 : \Phi\bigl(-\sqrt{n}\,\delta/\sigma\bigr) + \frac{1}{\sqrt{n}} \min\Bigl(0.3328\bigl(\kappa_{\text{up}}^{3/4} + 0.429\bigr),\ \frac{18.1139\,\kappa_{\text{up}}^{3/4}}{1 + \bigl(\sqrt{n}\,\delta/\sigma\bigr)^3}\Bigr) \le 0.25\%\Bigr\}$
     $\widehat{\operatorname{err}}_n(f)$ is the same as $\overline{\operatorname{err}}_n(\sigma)$ but with $\sigma$ replaced by $1.2\hat\sigma$. Choose $n$ to make $\widehat{\operatorname{err}}_n(f) \le \varepsilon$.

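     A sketch of how this bound can be evaluated numerically under the stated kurtosis bound: the smallest $\delta$ is found by root-finding, and $n$ by doubling until the data-driven bound meets the tolerance. The doubling search and starting values are illustrative; GAIL's meanMC_g does the real bookkeeping.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def err_bound(n, sigma, kappa_up, alpha=0.0025):
    """Smallest delta with Phi(-sqrt(n) delta / sigma) + Berry-Esseen term <= alpha."""
    rtn, k34 = np.sqrt(n), kappa_up ** 0.75
    def excess(delta):
        t = rtn * delta / sigma
        # non-uniform vs. uniform Berry-Esseen constant, whichever is smaller
        be = min(0.3328 * (k34 + 0.429), 18.1139 * k34 / (1.0 + t) ** 3)
        return norm.cdf(-t) + be / rtn - alpha
    hi = sigma
    while excess(hi) > 0:                  # bracket the root, then solve
        hi *= 2
    return brentq(excess, 0.0, hi)

def choose_n(sigma_hat, kappa_up, eps, n=64):
    """Double n until the bound (with sigma -> 1.2 sigma_hat) meets eps."""
    while err_bound(n, 1.2 * sigma_hat, kappa_up) > eps:
        n *= 2
    return n
```
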
  9. Option Pricing

     $\mu = \text{fair price} = \int_{\mathbb{R}^d} e^{-rT} \max\Bigl(\frac{1}{d}\sum_{j=1}^d S_j - K,\ 0\Bigr) \frac{e^{-z^T z/2}}{(2\pi)^{d/2}}\,dz \approx \$13.12$
     $S_j = S_0\,e^{(r - \sigma^2/2)jT/d + \sigma x_j} = \text{stock price at time } jT/d, \quad x = Az, \quad AA^T = \Sigma = \bigl(\min(i,j)\,T/d\bigr)_{i,j=1}^d, \quad T = 1/4, \ d = 13 \text{ here}$

     Abs. Error Tolerance   Method   Median Error   Worst 10% Accuracy   Worst 10% n   Time (s)
     1E−2                   IID      2E−3           100%                 6.1E7         33.000

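     For reference, a minimal IID Monte Carlo sketch of this arithmetic-mean Asian call. $T = 1/4$ and $d = 13$ come from the slide, while S0, K, r, and sigma are illustrative assumptions (the slide does not list them).

```python
import numpy as np

def asian_call_mc(n, d=13, T=0.25, S0=100.0, K=100.0, r=0.05, sigma=0.5,
                  rng=np.random.default_rng(0)):
    """IID estimate of the fair price and its standard error (sketch)."""
    dt = T / d
    # Cumulative sums of independent N(0, dt) increments realize the
    # Brownian covariance Sigma_{ij} = min(i, j) T / d without forming A.
    bm = np.cumsum(rng.standard_normal((n, d)) * np.sqrt(dt), axis=1)
    t = dt * np.arange(1, d + 1)
    S = S0 * np.exp((r - 0.5 * sigma ** 2) * t + sigma * bm)
    payoff = np.exp(-r * T) * np.maximum(S.mean(axis=1) - K, 0.0)
    return payoff.mean(), payoff.std(ddof=1) / np.sqrt(n)
```
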
 10. Faster Convergence with Quasi-Monte Carlo Sampling

     $\operatorname{err}(f, n) := |\mu - \hat\mu_n| = \Bigl|\int_{\mathcal{X}} \underbrace{f(x)}_{\text{make small by importance sampling}} \underbrace{(\nu - \hat\nu_n)(dx)}_{\text{make small by clever sampling}}\Bigr|$

 18.–20. Stopping Quasi-Monte Carlo Sampling (built up over three slides)

     $\mu = \int_{[0,1]^d} f(x)\,dx = \hat f(0) \approx \hat\mu_n = \tilde f_n(0), \qquad f(x) = \sum_{k} \hat f(k)\,\phi_k(x), \qquad \hat f(k) = \int_{[0,1]^d} f(x)\,\overline{\phi_k(x)}\,dx$
     $\phi_k$ are complex exponentials (for lattices) or Walsh functions (for nets)

     $\operatorname{err}_n(f) \le \overline{\operatorname{err}}_n(f) := \sum_{k \in \text{dual set} \setminus \{0\}} \bigl|\hat f(k)\bigr|, \qquad \text{dual set} = \{k : \phi_k(x_i) = \phi_k(x_1),\ i = 1, \dots, n\} \ \text{($\phi_k$ looks like a constant on the sample)}$
     The discrete transform $\tilde f_n(k) = \frac{1}{n}\sum_{i=1}^n f(x_i)\,\overline{\phi_k(x_i)}$ for all $k$ can be computed in $O(n \log(n))$ operations.

     H. and Jiménez Rugama (2016) and Jiménez Rugama and H. (2016) (see also H. et al. (2017+)) define a cone, $\mathcal{C}$, of reasonable integrands whose true Fourier coefficients do not decay too erratically. Then
     $\operatorname{err}_n(f) \le \overline{\operatorname{err}}_n(f) \le \widehat{\operatorname{err}}_n(f) := C(n) \sum_{\text{moderate } k} \bigl|\tilde f_n(k)\bigr| \quad \forall f \in \mathcal{C}$
     Control variates are possible (H. et al., 2017+). A sketch of this stopping rule follows.

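     A minimal sketch of the lattice version of this rule: compute the discrete Fourier coefficients of $f$ on a shifted rank-1 lattice by FFT, and bound the error by an inflated sum of coefficients of moderate wavenumber. The generating vector, the inflation factor $C(n)$, and the choice of coefficient band are illustrative stand-ins; the actual cone conditions and coefficient bookkeeping are in H. and Jiménez Rugama (2016) and in GAIL's cubLattice_g.

```python
import numpy as np

def cub_lattice(f, gen, eps, m_min=10, m_max=20, C=5.0,
                rng=np.random.default_rng(0)):
    """Adaptive rank-1 lattice cubature with a coefficient-based bound (sketch)."""
    shift = rng.random(len(gen))
    for m in range(m_min, m_max + 1):
        n = 2 ** m
        x = (np.outer(np.arange(n), gen) / n + shift) % 1.0
        y = np.fft.fft(f(x)) / n           # all tilde f_n(k), in O(n log n)
        # Sum |tilde f_n(k)| over a dyadic band of nonzero wavenumbers as a
        # data-driven stand-in for the unobserved dual-set coefficients.
        err_hat = C * np.abs(y[n // 4 : n // 2]).sum()
        if err_hat <= eps:
            break
    return y[0].real, err_hat, n           # mu_n = tilde f_n(0)

# Example with an assumed generating vector for d = 2:
# mu, err, n = cub_lattice(lambda x: np.prod(1 + 0.5 * (x - 0.5), axis=1),
#                          gen=np.array([1, 182667]), eps=1e-4)
```
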
 21. A Reasonable and an Unreasonable Integrand

     [Figure on the slide; only the title survives in the transcript.]

 22.–23. Option Pricing (continued)

     Same problem as before ($T = 1/4$, $d = 13$, $\mu \approx \$13.12$).

     Abs. Error Tolerance   Method                 Median Error   Worst 10% Accuracy   Worst 10% n   Time (s)
     1E−2                   IID                    2E−3           100%                 6.1E7         33.000
     1E−2                   Scr. Sobol'            1E−3           100%                 1.6E4         0.040
     1E−2                   Scr. Sob. cont. var.   2E−3           100%                 4.1E3         0.019

 24. Bayesian Cubature with Nice Covariance Kernels

     $f \sim \mathcal{GP}(m, s^2 C_\theta), \quad C_\theta : [0,1]^d \times [0,1]^d \to \mathbb{R}, \quad m, s, \theta \text{ to be inferred by MLE}$
     $\mathsf{C} = \bigl(C_\theta(x_i, x_j)\bigr)_{i,j=1}^n = \frac{1}{n} \mathsf{W} \Lambda \mathsf{W}^H, \quad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \quad \mathsf{C}\mathbf{1} = \lambda_1 \mathbf{1}, \quad \int_{[0,1]^d} C_\theta(x, t)\,dt = 1 \ \forall x$

     Using maximum likelihood estimation (MLE) gives
     $m_{\text{MLE}} = \hat\mu_n = \frac{1}{n}\sum_{i=1}^n f(x_i), \qquad s^2_{\text{MLE}} = \frac{1}{n^2}\sum_{i=2}^n \frac{|y_i|^2}{\lambda_i}, \qquad y = \mathsf{W}^T \bigl(f(x_1), \dots, f(x_n)\bigr)^T \ \text{(fast transform, } O(n \log n)\text{)}$
     $\theta_{\text{MLE}} = \operatorname*{argmin}_\theta \Bigl[\log\Bigl(\sum_{i=2}^n \frac{|y_i|^2}{\lambda_i}\Bigr) + \frac{1}{n}\sum_{i=1}^n \log(\lambda_i)\Bigr]$

     Then $\mu$ is a Gaussian random variable, and
     $\mathbb{P}\Biggl[|\mu - \hat\mu_n| \le 2.58 \sqrt{\Bigl(1 - \frac{n}{\lambda_1}\Bigr) \frac{1}{n^2}\sum_{i=2}^n \frac{|y_i|^2}{\lambda_i^2}}\Biggr] = 99\% \quad \text{for reasonable } f$

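     A compact sketch of the fast, lattice-based version of these formulas. The shift-invariant kernel below (a product kernel built from the Bernoulli polynomial $B_2$, with a fixed smoothing parameter eta standing in for the MLE-tuned $\theta$) is an assumed illustrative choice; it satisfies $\int C_\theta(x,t)\,dt = 1$ as required above.

```python
import numpy as np

def b2(u):
    """Bernoulli polynomial B_2(u) = u^2 - u + 1/6; integrates to 0 on [0,1)."""
    return u * u - u + 1.0 / 6.0

def bayes_cubature_lattice(f, gen, m, eta=1.0, rng=np.random.default_rng(0)):
    """Fast Bayesian cubature on a shifted rank-1 lattice (sketch)."""
    n, d = 2 ** m, len(gen)
    x = (np.outer(np.arange(n), gen) / n + rng.random(d)) % 1.0
    y = np.fft.fft(f(x))                   # fast transform, O(n log n)
    # The Gram matrix of a shift-invariant kernel on a lattice is circulant,
    # so its eigenvalues are the FFT of its first row, paired with the same
    # Fourier eigenvectors as y; lam[0] plays the role of lambda_1.
    delta = np.outer(np.arange(n), gen) / n % 1.0   # x_i - x_1 mod 1
    lam = np.fft.fft(np.prod(1.0 + eta * b2(delta), axis=1)).real
    mu_hat = y[0].real / n                 # = m_MLE, the sample mean
    half_width = 2.58 * np.sqrt(max(1.0 - n / lam[0], 0.0)
                                * (np.abs(y[1:]) ** 2 / lam[1:] ** 2).sum() / n ** 2)
    return mu_hat, half_width              # 99% credible-interval half-width
```
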
 25. Option Pricing (continued)

     Same problem as before ($T = 1/4$, $d = 13$, $\mu \approx \$13.12$).

     Abs. Error Tolerance   Method                 Median Error   Worst 10% Accuracy   Worst 10% n   Time (s)
     1E−2                   IID                    2E−3           100%                 6.1E7         33.000
     1E−2                   Scr. Sobol'            1E−3           100%                 1.6E4         0.040
     1E−2                   Scr. Sob. cont. var.   2E−3           100%                 4.1E3         0.019
     1E−2                   Bayes. Latt.           2E−3           99%                  1.6E4         0.051

 26. More General Error Criteria

     Let's change the problem: find $\tilde\mu_n$ satisfying $|\mu - \tilde\mu_n| \le \max(\varepsilon, \varepsilon_r |\mu|)$, given $|\mu - \hat\mu_n| \le \widehat{\operatorname{err}}_n(f)$.

     H. et al. (2017+) show that if $n$ is large enough to make
     $\widehat{\operatorname{err}}_n(f) \le \frac{\max\bigl(\varepsilon, \varepsilon_r |\hat\mu_n + \widehat{\operatorname{err}}_n(f)|\bigr) + \max\bigl(\varepsilon, \varepsilon_r |\hat\mu_n - \widehat{\operatorname{err}}_n(f)|\bigr)}{2}$
     then a satisfactory choice is
     $\tilde\mu_n = \frac{\bigl(\hat\mu_n - \widehat{\operatorname{err}}_n(f)\bigr) \max\bigl(\varepsilon, \varepsilon_r |\hat\mu_n + \widehat{\operatorname{err}}_n(f)|\bigr) + \bigl(\hat\mu_n + \widehat{\operatorname{err}}_n(f)\bigr) \max\bigl(\varepsilon, \varepsilon_r |\hat\mu_n - \widehat{\operatorname{err}}_n(f)|\bigr)}{\max\bigl(\varepsilon, \varepsilon_r |\hat\mu_n + \widehat{\operatorname{err}}_n(f)|\bigr) + \max\bigl(\varepsilon, \varepsilon_r |\hat\mu_n - \widehat{\operatorname{err}}_n(f)|\bigr)} = \begin{cases} \hat\mu_n, & \varepsilon_r = 0 \\[4pt] \dfrac{\max\bigl(\hat\mu_n^2 - \widehat{\operatorname{err}}_n(f)^2,\ 0\bigr)}{\hat\mu_n}, & \varepsilon = 0 \end{cases}$

     H. et al. (2017+) extend this to the case where the solution is $v(\mu)$: find $\tilde v$ with $|v(\mu) - \tilde v| \le \max(\varepsilon, \varepsilon_r |v(\mu)|)$, given $\mu \in \bigl[\hat\mu_n - \widehat{\operatorname{err}}_n(f),\ \hat\mu_n + \widehat{\operatorname{err}}_n(f)\bigr]$.

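     The stopping test and the choice of $\tilde\mu_n$ transcribe directly into code; a minimal sketch:

```python
def general_estimate(mu_hat, err_hat, eps, eps_r):
    """Stopping check and estimate for |mu - mu_tilde| <= max(eps, eps_r |mu|)."""
    tol_plus = max(eps, eps_r * abs(mu_hat + err_hat))
    tol_minus = max(eps, eps_r * abs(mu_hat - err_hat))
    stop = err_hat <= 0.5 * (tol_plus + tol_minus)   # n is large enough
    mu_tilde = ((mu_hat - err_hat) * tol_plus
                + (mu_hat + err_hat) * tol_minus) / (tol_plus + tol_minus)
    return stop, mu_tilde
```

     With eps_r = 0 this reduces to mu_tilde = mu_hat; with eps = 0 it reduces to max(mu_hat**2 - err_hat**2, 0) / mu_hat, matching the two special cases above.
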
 27. Gaussian Probability

     $\mu = \int_{[a,b]} \frac{\exp\bigl(-\frac{1}{2} t^T \Sigma^{-1} t\bigr)}{\sqrt{(2\pi)^d \det(\Sigma)}}\,dt \overset{\text{Genz (1993)}}{=} \int_{[0,1]^{d-1}} f(x)\,dx$
     For some typical choice of $a$, $b$, $\Sigma$, with $d = 3$: $\mu \approx 0.6763$.

     Rel. Error Tolerance   Method               Median Error   Worst 10% Accuracy   Worst 10% n   Time (s)
     1E−2                   IID                  5E−4           100%                 8.1E4         0.020
     1E−2                   Scr. Sobol'          4E−5           100%                 1.0E3         0.005
     1E−2                   Bayes. Latt.         5E−5           100%                 4.1E3         0.023
     1E−3                   IID                  9E−5           100%                 2.0E6         0.400
     1E−3                   Scr. Sobol'          2E−5           100%                 2.0E3         0.006
     1E−3                   Bayes. Latt.         3E−7           100%                 6.6E4         0.076
     1E−4                   Scr. Sobol'          4E−7           100%                 1.6E4         0.018
     1E−4                   Bayes. Latt.         6E−9           100%                 5.2E5         0.580
     1E−4                   Bayes. Latt. Smth.   1E−7           100%                 3.3E4         0.047

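     For completeness, a compact sketch of Genz's (1993) transformation to the unit cube, which produces the integrand $f$ fed to the methods above; a, b, Sigma are placeholders since the slide does not specify the "typical choice".

```python
import numpy as np
from scipy.stats import norm

def genz_integrand(x, a, b, Sigma):
    """x: (n, d-1) points in [0,1)^{d-1}; returns f(x) with E[f] = mu."""
    L = np.linalg.cholesky(Sigma)
    n, d = x.shape[0], len(a)
    # Integrate out one coordinate at a time, conditioning via Cholesky.
    lo, hi = norm.cdf(a[0] / L[0, 0]), norm.cdf(b[0] / L[0, 0])
    f = np.full(n, hi - lo)
    z = np.empty((n, d - 1))
    for j in range(1, d):
        z[:, j - 1] = norm.ppf(lo + x[:, j - 1] * (hi - lo))
        c = z[:, :j] @ L[j, :j]            # conditional mean contribution
        lo, hi = norm.cdf((a[j] - c) / L[j, j]), norm.cdf((b[j] - c) / L[j, j])
        f *= hi - lo
    return f
```
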
 28. Bayesian Inference for Logistic Regression

     $\frac{\mu_j}{\mu_0} = \frac{\int_{\mathcal{X}} b_j\,L(b\,|\,\text{data})\,\varrho(b)\,db}{\int_{\mathcal{X}} L(b\,|\,\text{data})\,\varrho(b)\,db} = \text{Bayesian posterior expectation of } \beta_j$
     $y_i \sim \operatorname{Ber}\Bigl(\frac{\exp(\beta_1 + \beta_2 x_i)}{1 + \exp(\beta_1 + \beta_2 x_i)}\Bigr), \qquad \varrho(b) = \frac{\exp\bigl(-(b_1^2 + b_2^2)/2\bigr)}{2\pi}$
     Importance sample via an appropriately chosen normal. With $\varepsilon = 0.001$: $n = 9{,}000$–$17{,}000$.

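     A minimal sketch of this ratio estimator with a common normal proposal for numerator and denominator. The proposal center b_star and covariance V are assumptions (in practice they might come from a Laplace approximation), not the talk's specific choice.

```python
import numpy as np

def posterior_mean_is(x_data, y_data, b_star, V, n, rng=np.random.default_rng(0)):
    """Importance-sampled posterior means of (beta_1, beta_2) (sketch)."""
    Lc = np.linalg.cholesky(V)
    z = rng.standard_normal((n, 2))
    b = b_star + z @ Lc.T                              # draws ~ N(b_star, V)
    eta = b[:, [0]] + b[:, [1]] * x_data               # logits, shape (n, len(x_data))
    loglik = (y_data * eta - np.logaddexp(0, eta)).sum(axis=1)
    logprior = -0.5 * (b ** 2).sum(axis=1)             # N(0, I) prior, up to a constant
    logprop = -0.5 * (z ** 2).sum(axis=1) - np.log(np.diag(Lc).prod())
    logw = loglik + logprior - logprop
    w = np.exp(logw - logw.max())                      # stabilized importance weights
    return (w[:, None] * b).sum(axis=0) / w.sum()      # self-normalized ratio estimate
```
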
 29. Sobol' Indices

     $Y = g(X)$, where $X \sim \mathcal{U}[0,1]^d$; $\text{Sobol' Index}_j(\boldsymbol{\mu})$ describes how much coordinate $j$ of the input $X$ influences the output $Y$ (Sobol', 1990; 2001):
     $\text{Sobol' Index}_j(\boldsymbol{\mu}) := \frac{\mu_1}{\mu_2 - \mu_3^2}, \quad j = 1, \dots, d$
     $\mu_1 := \int_{[0,1)^{2d}} g(x)\bigl[g(x_j, x'_{-j}) - g(x')\bigr]\,dx\,dx', \qquad \mu_2 := \int_{[0,1)^d} [g(x)]^2\,dx, \qquad \mu_3 := \int_{[0,1)^d} g(x)\,dx$

     Jiménez Rugama and Gilquin (2017+) have used adaptive Sobol' sampling to compute Sobol' indices for
     $g(x) = -x_1 + x_1 x_2 - x_1 x_2 x_3 + \cdots + x_1 x_2 x_3 x_4 x_5 x_6$ (Bratley et al., 1992), with $\varepsilon = 10^{-3}$, $\varepsilon_r = 0$:

     j                                    1        2        3        4        5        6
     n                                    65 536   32 768   16 384   16 384   2 048    2 048
     Sobol' Index_j (exact)               0.6529   0.1791   0.0370   0.0133   0.0015   0.0015
     Sobol' Index_j (adaptive estimate)   0.6528   0.1792   0.0363   0.0126   0.0010   0.0012
     Sobol' Index_j(mu_hat_n) (plug-in)   0.6492   0.1758   0.0308   0.0083   0.0018   0.0039

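     A plain IID sketch of the correlation-form estimator of the first-order indices for this test function; the talk's results instead use adaptive Sobol' sampling with the guaranteed stopping rules above.

```python
import numpy as np

def g_bratley(x):
    """g(x) = -x1 + x1 x2 - x1 x2 x3 + ... + x1...x6 (alternating signs)."""
    prods = np.cumprod(x, axis=1)
    signs = (-1.0) ** np.arange(1, x.shape[1] + 1)
    return prods @ signs

def sobol_indices(g, d, n, rng=np.random.default_rng(0)):
    """IID estimates of the first-order Sobol' indices mu_1 / (mu_2 - mu_3^2)."""
    x, xp = rng.random((n, d)), rng.random((n, d))
    gx, gxp = g(x), g(xp)
    var = (gx ** 2).mean() - gx.mean() ** 2            # mu_2 - mu_3^2
    s = np.empty(d)
    for j in range(d):
        xmix = xp.copy()
        xmix[:, j] = x[:, j]                           # x_j from x, the rest from x'
        s[j] = np.mean(gx * (g(xmix) - gxp)) / var     # estimates mu_1 / var
    return s

print(sobol_indices(g_bratley, 6, 2 ** 16))
```
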
 30. Summary

     - Established conservative, pre-asymptotic confidence intervals for reasonable integrands:
       for IID sampling, for quasi-Monte Carlo (low discrepancy) sampling, and via Bayesian
       cubature, fit even to general error criteria
     - Quasi-Monte Carlo sampling can give faster convergence
     - Bayesian cubature is practical if the sample matches the covariance kernel so that the
       vector-matrix operations are fast enough
     - Algorithms implemented in the open-source GAIL (Choi et al., 2013–2017), which is
       undergoing continual development

 31. Further Questions

     - Are the cones of reasonable integrands reasonable in practice?
     - What data-driven necessary conditions are there for f to be reasonable (f ∈ C)?
     - Can we get higher order convergence with higher order nets?
     - What can be done for Markov chain Monte Carlo?
     - What can be done for multilevel Monte Carlo (Giles, 2015)?

 32. References I

     - Bayer, C., H. Hoel, E. von Schwerin, and R. Tempone. 2014. On non-asymptotic optimal stopping criteria in Monte Carlo simulations, SIAM J. Sci. Comput. 36, A869–A885.
     - Bratley, P., B. L. Fox, and H. Niederreiter. 1992. Implementation and tests of low-discrepancy sequences, ACM Trans. Model. Comput. Simul. 2, 195–213.
     - Choi, S.-C. T., Y. Ding, F. J. H., L. Jiang, Ll. A. Jiménez Rugama, D. Li, J. Rathinavel, X. Tong, K. Zhang, Y. Zhang, and X. Zhou. 2013–2017. GAIL: Guaranteed Automatic Integration Library (versions 1.0–2.2).
     - Genz, A. 1993. Comparison of methods for the computation of multivariate normal probabilities, Computing Science and Statistics 25, 400–405.
     - Giles, M. 2015. Multilevel Monte Carlo methods, Acta Numer. 24, 259–328.
     - H., F. J. 2017+. The trio identity for quasi-Monte Carlo error analysis, Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Stanford, USA, August 2016. Submitted for publication, arXiv:1702.01487.
     - H., F. J., S.-C. T. Choi, L. Jiang, and Ll. A. Jiménez Rugama. 2017+. Monte Carlo simulation, automatic stopping criteria for, Wiley StatsRef: Statistics Reference Online. To appear.
     - H., F. J., L. Jiang, Y. Liu, and A. B. Owen. 2013. Guaranteed conservative fixed width confidence intervals via Monte Carlo sampling, Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 105–128.

 33. References II

     - H., F. J. and Ll. A. Jiménez Rugama. 2016. Reliable adaptive cubature using digital sequences, Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Leuven, Belgium, April 2014, pp. 367–383. arXiv:1410.8615 [math.NA].
     - H., F. J., Ll. A. Jiménez Rugama, and D. Li. 2017+. Adaptive quasi-Monte Carlo methods for cubature, Contemporary Computational Mathematics – A Celebration of the 80th Birthday of Ian Sloan. To appear, arXiv:1702.01491 [math.NA].
     - Jiménez Rugama, Ll. A. and L. Gilquin. 2017+. Reliable error estimation for Sobol' indices, Statistics and Computing. In press.
     - Jiménez Rugama, Ll. A. and F. J. H. 2016. Adaptive multidimensional integration based on rank-1 lattices, Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Leuven, Belgium, April 2014, pp. 407–422. arXiv:1411.1966.
     - Nefedova, Yu. S. and I. G. Shevtsova. 2012. On non-uniform convergence rate estimates in the central limit theorem, Theory Probab. Appl. 57, 62–97.
     - Shevtsova, I. 2014. On the absolute constants in the Berry-Esseen type inequalities for identically distributed summands, Dokl. Akad. Nauk 456, 650–654.
     - Sobol', I. M. 1990. On sensitivity estimation for nonlinear mathematical models, Matem. Mod. 2, no. 1, 112–118.
     - Sobol', I. M. 2001. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates, Math. Comput. Simul. 55, no. 1–3, 271–280.