Slide 1

Slide 1 text

Challenges in Developing Great MCQMC Software Fred J. Hickernell Department of Applied Mathematics & Center for Interdisciplinary Scientific Computation Illinois Institute of Technology [email protected] mypages.iit.edu/~hickernell with input from Mark Klinchin, the GAIL an QMCPy teams, and friends partially supported by SigOpt (Intel) and an Illinois Tech alum Join us Tuesday, July 19, at 12:45 PM for lunch at Bella Casa to discuss the future of MCQMC software Thanks to the organizers as we meet again in person Slides at speakerdeck.com/fjhickernell/challenges-of-developing-mcqmc-software MCQMC 2022, Linz, July 18, 2022

Slide 2

Slide 2 text

Introduction Structure Support Next Steps References A Great MCQMC Software Ecosystem Should Be Correct, e.g., not omit the zeroth point in a low discrepancy sequence (Owen 2022; SciPy Developers 2020) Complete—contain the components or easily access components in other libraries to solve real, complex problems Accessible—downloadable; tutorials, demos, discussion forums, etc. for (new) users; written in a language that potential users speak; provide a consistent user interface Efficient—in terms of compute time and memory Current—include the latest and best algorithms Sustainable—have a sufficient user base and developer community for updates and maintenance Scalable—take advantage of advanced computer architectures for speed and to solve large problems 2/20

Slide 3

Slide 3 text

Introduction Structure Support Next Steps References Selective History of Publicly Available MCQMC Software I L’Ecuyer (2017) provides a history of random number generation Notorious RANDU from the 1960s (Wikipedia n.d.), which fails the spectral test VEGAS (Lepage 1978; Lepage 2021) importance sampling Monte Carlo algorithm favored by physicists ACM low discrepancy point generators (Bratley and Fox 1988; Bratley, Fox, and Niederreiter 1992; Hong and H. 2003) FinDer (Paskov and Traub 1995; Papageorgiou n.d.) and BRODA (Kucherenko n.d.) targeting quantitative finance Korobov cubature and scrambled Sobol’ generators in NAG (The Numerical Algorithms Group 2021) for decades Scrambled Sobol’ and Halton generators in MATLAB (The MathWorks, Inc. 2022) since 2008, and fixed a few years later 3/20

Slide 4

Slide 4 text

Introduction Structure Support Next Steps References Selective History of Publicly Available MCQMC Software II BUGS (Lunn et al. 2012; MRC Biostatistic Unit n.d.) and Stan (Stan Development Team 2022) for Markov Chain Monte Carlo (MCMC) LatMRG (L’Ecuyer and Couture 1997), RNGStreams (L’Ecuyer, Simard, et al. 2002), SSJ (L’Ecuyer 2002; L’Ecuyer n.d.), and LatNetBuilder (Darmon et al. 2018) Talk M 11 Low discrepancy generators (Friedel and Keller 2002; Friedel and Keller 2001), SamplePack (Kollig and Keller 2002), Grünschloß (n.d.), and MatBuilder (Paulin et al. 2022) Talk M 12, Tu 12 Sobol’ direction numbers (Joe and Kuo 2003; Joe and Kuo 2008; Joe and Kuo n.d.) Fast CBC, Magic Point Shop, QMC4PDE and other code since 2004 at Nuyens (n.d.) Talk Tu 10:30 Multi-level software (Giles n.d.[a]; Giles n.d.[b]) Talk M 10:30 4/20

Slide 5

Slide 5 text

Introduction Structure Support Next Steps References Selective History of Publicly Available MCQMC Software III Data-driven error bounds and stopping criteria in GAIL (Choi, Ding, et al. 2021) and QMCPy (Choi, H., Jagadeeswaran, M. McCourt, et al. 2021; Choi, H., Jagadeeswaran, M. J. McCourt, et al. 2022) Talk Tu 11 Uncertainty quantification libraries Dakota (Adams et al. 2022), UQTk (Debusschere et al. 2004; Debusschere n.d.), and MUQ (Parno et al. 2014) Talk Tu 11:30 have some basic level low discrepancy sampling QMC framework in Julia since 2019 (SciML QuasiMonteCarlo.jl n.d.) by Robbe and others Talk M 17 Scrambled Sobol’ in SciPy (Virtanen et al. 2020) and PyTorch (Paszke et al. 2019) since several years ago TensorFlow QMC framework (TF Quant Finance Contributors 2021) since a year or so ago We have stand-alone algorithms, small libraries, and pieces of larger libraries 5/20

Slide 6

Slide 6 text

Introduction Structure Support Next Steps References Why Do We Need Software? Theory Explains, justifies, and leads to algorithms Assumptions =⇒ success can be viewed as failure =⇒ what went wrong Software Makes algorithms practical Solves (new) applications Eliminates do-it-yourself Applications Make societal impact Inspire new theory 6/20

Slide 7

Slide 7 text

Introduction Structure Support Next Steps References Why Do We Need Software? Theory Explains, justifies, and leads to algorithms Assumptions =⇒ success can be viewed as failure =⇒ what went wrong Software Makes algorithms practical Solves (new) applications Eliminates do-it-yourself Applications Make societal impact Inspire new theory The US Department of Energy is investing in studying how to develop great scientific software, including a recent Workshop on the Science of Scientific-Software Development and Use (Heroux 2019; U.S. Department of Energy, Office of Advanced Scientific Computing Research 2021) 6/20

Slide 8

Slide 8 text

Introduction Structure Support Next Steps References Software Architecture µ := T g(t) λ(t) dt = · · · = E[f(X)] expectation = [0,1]d f(x) dx integration ≈ 1 n n i=1 f(Xi) =: ^ µn Low Discrepancy Generator producing {X1, X2, . . . } that mimics the uniform distribution True Measure that defines the original integral, e.g., Lebesgue; embodies the transformation t = Ψ(x) Integrand g, which defines the original integral, plus the transformed version, f, to fit the LD generator Stopping Criterion based on a data-driven error bound, which determines how large n should be to ensure that |µ − ^ µn| ⩽ ε 7/20

Slide 9

Slide 9 text

Introduction Structure Support Next Steps References Software Architecture µ := T g(t) λ(t) dt = · · · = E[f(X)] expectation = [0,1]d f(x) dx integration ≈ 1 n n i=1 f(Xi) =: ^ µn Low Discrepancy Generator producing {X1, X2, . . . } that mimics the uniform distribution True Measure that defines the original integral, e.g., Lebesgue; embodies the transformation t = Ψ(x) Integrand g, which defines the original integral, plus the transformed version, f, to fit the LD generator Stopping Criterion based on a data-driven error bound, which determines how large n should be to ensure that |µ − ^ µn| ⩽ ε May we agree on a common software definition for these these objects like we have for floating point numbers or basic mathematical functions? 7/20

Slide 10

Slide 10 text

Introduction Structure Support Next Steps References Choosing a Language, Library, or Environment Language/library choices of the users are driven by familiarity and speed Favorites aligned with various communities, but no one dominates all Difference between prototyping environments and ninja written production code Growth of multi-processor environments Provides opportunity Complicates software development; where does parallelism really help? 8/20

Slide 11

Slide 11 text

Introduction Structure Support Next Steps References Choosing a Language, Library, or Environment Language/library choices of the users are driven by familiarity and speed Favorites aligned with various communities, but no one dominates all Difference between prototyping environments and ninja written production code Growth of multi-processor environments Provides opportunity Complicates software development; where does parallelism really help? Recommendations if you want to contribute software Realize your new idea as a demo in your favorite library Add a significant idea as a feature to other libraries Move basic features of your library to larger library Write wrappers or demos to connect other libraries to your library 8/20

Slide 12

Slide 12 text

Introduction Structure Support Next Steps References Ex. QMCPy + FEniCS/Dolfin (Logg et al. 2012) Want to demonstrate how our QMC software could work with a popular DE solver for DEs with random coefficients Partial success − d dx a(x, W) d dx u(x, W) = 200(2x − 1) 0 ⩽ x ⩽ 1, u(0) = u(1) = 0 a(x, W) = 1 + 0.6 d k=1 Wk Tk(2x − 1) k2 W ∼ U[−1, 1]d E[u(0.25, W)] = ? 9/20

Slide 13

Slide 13 text

Introduction Structure Support Next Steps References Ex. QMCPy + FEniCS/Dolfin (Logg et al. 2012) Want to demonstrate how our QMC software could work with a popular DE solver for DEs with random coefficients Partial success − d dx a(x, W) d dx u(x, W) = 200(2x − 1) 0 ⩽ x ⩽ 1, u(0) = u(1) = 0 a(x, W) = 1 + 0.6 d k=1 Wk Tk(2x − 1) k2 W ∼ U[−1, 1]d E[u(0.25, W)] = −1.7462 10−3 10−2 10−1 Error Tolerance 10−1 100 101 102 103 Computation Time (seconds) Comparison of IID and Sobol Points IID Points Sobol Points 9/20

Slide 14

Slide 14 text

Introduction Structure Support Next Steps References Ex. QMCPy + FEniCS/Dolfin (Logg et al. 2012) Want to demonstrate how our QMC software could work with a popular DE solver for DEs with random coefficients Partial success Why? Even with FEniCS/Dolfin’s extensive documentation we had difficulties Did not understand the latest version, so reverted to an older version Had to learn how to change the random instance of the coefficient without tiggering the JIT compiler Do not yet know how to express the random coefficient in terms of the covariance kernel But It works If we can overcome the challenges, will post a blog and demo at Choi, Hickernell, et al. (2020) 9/20

Slide 15

Slide 15 text

Introduction Structure Support Next Steps References Encouraging Users of Your Library Requires A repository where Your library can be easily downloaded There is documentation Issues or queries can be posted There are updates Tests are run regularly to mitigate against bugs (Elementary) demos on how to use the software and highlighting its advantages Demos connecting your library with others Recommending your students and postdocs to use your library Welcoming instructions on how to contribute an algorithm or a demo 10/20

Slide 16

Slide 16 text

Introduction Structure Support Next Steps References Building a Developer Team for Your Library Many senior researchers don’t code, and rely on transient team members (students, postdocs) Encourage a sense of ownership that lasts beyond the time your team members are part of your lab Encourage those outside your own group to contribute Those of us who understand theory and appreciate software need to encourage and steer popular software to ensure that it is implemented well. Recognize good software as valuable scholarly output Recognize research software engineer as a valued vocation; see Research Software Engineers International https://researchsoftware.org, Research Software Engineers are people who combine professional software expertise with an understanding of research. They go by various job titles but the term Research Software Engineer (RSE) is fast gaining international recognition. 11/20

Slide 17

Slide 17 text

Introduction Structure Support Next Steps References Next Steps Which exemplary theoretical developments are not yet implemented in readily available software? What potential MCQMC user communities are ill-served by existing software libraries? Should those of us interested in software gather regularly for show-and-tell to foster more collaboration? Join us for a lunch discussion at Bella Casa tomorrow, July 19, at 12:45 PM 12/20

Slide 18

Slide 18 text

Introduction Structure Support Next Steps References References I Adams, B. M. et al. (May 2022). Dakota, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 6.16 User’s Manual. Sandia National Laboratories. Bratley, P. and B. L. Fox (1988). “Algorithm 659: Implementing Sobol’s Quasirandom Sequence Generator”. In: ACM Trans. Math. Software 14, pp. 88–100. Bratley, P., B. L. Fox, and H. Niederreiter (1992). “Implementation and Tests of Low-Discrepancy Sequences”. In: ACM Trans. Model. Comput. Simul. 2, pp. 195–213. Choi, S.-C. T., F. J. H., R. Jagadeeswaran, M. McCourt, et al. (2021). QMCPy: A quasi-Monte Carlo Python Library. doi: 10.5281/zenodo.3964489. url: https://qmcsoftware.github.io/QMCSoftware/. Choi, S.-C. T., F. J. H., R. Jagadeeswaran, M. J. McCourt, et al. (2022). “Quasi-Monte Carlo Software”. In: Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Oxford, England, August 2020. Ed. by A. Keller. Springer Proceedings in Mathematics and Statistics. https://arxiv.org/abs/2102.07833. Springer, Cham, pp. 23–50. 13/20

Slide 19

Slide 19 text

Introduction Structure Support Next Steps References References II Choi, S.-C. T., Y. Ding, et al. (2021). GAIL: Guaranteed Automatic Integration Library (Versions 1.0–2.3.2). MATLAB software, http://gailgithub.github.io/GAIL_Dev/. doi: 10.5281/zenodo.4018189. Choi, S.-C. T., F. J. Hickernell, et al. (2020). QMCPy Blogs. url: https://qmcpy.wpcomstaging.com. Darmon, Y. et al. (2018). LatNet Builder. url: https://github.com/umontreal-simul/latnetbuilder. Debusschere, B. (n.d.). UQ Toolkit. url: https://www.sandia.gov/uqtoolkit/. Debusschere, B. et al. (2004). “Numerical Challenges in the Use of Polynomial Chaos Representations for Stochastic Processes”. In: SIAM J. Sci. Comput., pp. 698–719. url: 10.1137/S1064827503427741. Friedel, I. and A. Keller (2001). libseq – low discrepancy sequence library. url: http://www.multires.caltech.edu/software/libseq/. — (2002). “Fast Generation of Randomized Low-Discrepancy Point Sets”. In: Monte Carlo and Quasi-Monte Carlo Methods 2000. Ed. by K. T. Fang, F. J. H., and H. Niederreiter. Springer-Verlag, Berlin, pp. 257–273. 14/20

Slide 20

Slide 20 text

Introduction Structure Support Next Steps References References III Giles, M. (n.d.[a]). Multi-Level Monte Carlo Software. url: https://people.maths.ox.ac.uk/gilesm/mlmc/. — (n.d.[b]). Multi-Level Quasi-Monte Carlo Software. url: http://people.maths.ox.ac.uk/~gilesm/mlqmc/. Grünschloß, L. (n.d.). Leonhard Grünschloß’s webpage. url: https://gruenschloss.org. Heroux, M. (2019). Research Software Science: A Scientific Approach to Understanding and Improving How We Develop and Use Software for Research. url: https://bssw.io/blog_posts/research-software-science-a-scientific-approach-to- understanding-and-improving-how-we-develop-and-use-software-for-research (visited on 2019). Hong, H. S. and F. J. H. (2003). “Algorithm 823: Implementing Scrambled Digital Nets”. In: ACM Trans. Math. Software 29, pp. 95–109. doi: 10.1145/779359.779360. Joe, S. and F. Y. Kuo (2003). “Remark on Algorithm 659: Implementing Sobol’s quasirandom sequence generator”. In: ACM Trans. Math. Software 29, pp. 49–57. 15/20

Slide 21

Slide 21 text

Introduction Structure Support Next Steps References References IV Joe, S. and F. Y. Kuo (2008). “Constructing Sobol sequences with better two-dimensional projections”. In: SIAM J. Sci. Comput. 30, pp. 2635–2654. — (n.d.). Sobol Sequence Generator. url: https://web.maths.unsw.edu.au/~fkuo/sobol/. Kollig, T. and A. Keller (2002). SamplePack. url: https://www.uni-kl.de/AG-Heinrich/SamplePack.html. Kucherenko, S. (n.d.). BRODA. url: https://www.broda.co.uk/index.html. L’Ecuyer, P. (2002). “SSJ: A Framework for Stochastic Simulation in Java”. In: 2002 Winter Simulation Conference. IEEE Press, pp. 234–242. — (2017). “History of uniform random number generation”. In: Proceedings of the 2017 Winter Simulation Conference. Ed. by W. K. V. Chan et al. doi: 10.1109/WSC.2017.8247790. — (n.d.). SSJ: Stochastic Simulation in Java. url: https://github.com/umontreal-simul/ssj. 16/20

Slide 22

Slide 22 text

Introduction Structure Support Next Steps References References V L’Ecuyer, P. and R. Couture (1997). “An Implementation of the Lattice and Spectral Tests for Multiple Recursive Linear Random Number Generators”. In: INFORMS J. Comput. 9, pp. 206–217. L’Ecuyer, P., R. Simard, et al. (2002). “An Objected-Oriented Random-Number Package with Many Long Streams and Substreams”. In: Oper. Res. 50, pp. 1073–1075. doi: 10.1287/opre.50.6.1073.358. Lepage, G. P. (1978). “A new algorithm for adaptive multidimensional integration”. In: J. Comput. Phys. 27, pp. 192–203. — (2021). “Adaptive multidimensional integration: VEGAS enhanced”. In: J. Comput. Phys. 439, p. 110386. Logg, A., G. N. Wells, and J. Hake (2012). “DOLFIN: a C++/Python Finite Element Library”. In: Automated Solution of Differential Equations by the Finite Element Method. Ed. by K. M. A. Logg and G. N. Wells. Vol. 84. Lecture Notes in Computational Science and Engineering. Springer. Chap. 10. Lunn, D. et al. (2012). The BUGS Book: A Practical Introduction to Bayesian Analysis. Chapman & Hall. 17/20

Slide 23

Slide 23 text

Introduction Structure Support Next Steps References References VI MRC Biostatistic Unit (n.d.). The BUGS Project. url: https://www.mrc-bsu.cam.ac.uk/software/bugs/. Nuyens, D. (n.d.). Dirk’s Homepage. url: https://people.cs.kuleuven.be/~dirk.nuyens/. Owen, A. B. (2022). “On dropping the first Sobol’ point”. In: Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Oxford, England, August 2020, pp. 71–86. url: https://arxiv.org/pdf/2008.08051.pdf. Papageorgiou, A. (n.d.). FinDer. url: http://www.cs.columbia.edu/~ap/html/finder.html. Parno, M. et al. (2014). MIT uncertainty quantification (MUQ) library. url: https://mituq.bitbucket.io/source/_site/index.html. Paskov, S. and J. Traub (1995). “Faster Valuation of Financial Derivatives”. In: J. Portfolio Management 22, pp. 113–120. Paszke, A. et al. (2019). “PyTorch: An imperative style, high-performance deep learning library”. In: Advances in neural information processing systems 32, pp. 8026–8037. 18/20

Slide 24

Slide 24 text

Introduction Structure Support Next Steps References References VII Paulin, L. et al. (Aug. 2022). “MatBuilder: Mastering Sampling Uniformity Over Projections”. In: ACM Transactions on Graphics (Proceedings of SIGGRAPH) 41.4. doi: https://doi.org/10.1145/3528223.3530063. SciML QuasiMonteCarlo.jl (n.d.). url: https://github.com/SciML/QuasiMonteCarlo.jl. SciPy Developers (2020). scipy discussion of Sobol’ sequence implementation. url: https://github.com/scipy/scipy/pull/10844. Stan Development Team (2022). Stan Modeling Language Users Guide and Reference Manual, version 2.30. url: http://mc-stan.org. TF Quant Finance Contributors (2021). Quasi Monte-Carlo Methods. url: https://github.com/google/tf-quant-finance/math/qmc. The MathWorks, Inc. (2022). MATLAB R2022b. Natick, MA. The Numerical Algorithms Group (2021). The NAG Library. Mark 27. Oxford. U.S. Department of Energy, Office of Advanced Scientific Computing Research (2021). Workshop on the Science of Scientific-Software Development and Use. url: https://web.cvent.com/event/1b7d7c3a-e9b4-409d-ae2b-284779cfe72f/summary. 19/20

Slide 25

Slide 25 text

Introduction Structure Support Next Steps References References VIII Virtanen, P. et al. (2020). “SciPy 1.0: fundamental algorithms for scientific computing in Python”. In: Nature Methods 17.3, pp. 261–272. Wikipedia (n.d.). RANDU. url: https://en.wikipedia.org/wiki/RANDU. 20/20