Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Challenges of Developing MCQMC Software

Challenges of Developing MCQMC Software

Slides of a talk given at MCQMC 2022 in Linz

Fred J. Hickernell

July 18, 2022
Tweet

More Decks by Fred J. Hickernell

Other Decks in Research

Transcript

  1. Challenges in Developing Great MCQMC Software
    Fred J. Hickernell
    Department of Applied Mathematics & Center for Interdisciplinary Scientific Computation
    Illinois Institute of Technology [email protected] mypages.iit.edu/~hickernell
    with input from Mark Klinchin, the GAIL an QMCPy teams, and friends
    partially supported by SigOpt (Intel) and an Illinois Tech alum
    Join us Tuesday, July 19, at 12:45 PM for lunch at Bella Casa
    to discuss the future of MCQMC software
    Thanks to the organizers as we meet again in person
    Slides at speakerdeck.com/fjhickernell/challenges-of-developing-mcqmc-software
    MCQMC 2022, Linz, July 18, 2022

    View Slide

  2. Introduction Structure Support Next Steps References
    A Great MCQMC Software Ecosystem Should Be
    Correct, e.g., not omit the zeroth point in a low discrepancy sequence (Owen
    2022; SciPy Developers 2020)
    Complete—contain the components or easily access components in other
    libraries to solve real, complex problems
    Accessible—downloadable; tutorials, demos, discussion forums, etc. for (new)
    users; written in a language that potential users speak; provide a consistent
    user interface
    Efficient—in terms of compute time and memory
    Current—include the latest and best algorithms
    Sustainable—have a sufficient user base and developer community for
    updates and maintenance
    Scalable—take advantage of advanced computer architectures for speed
    and to solve large problems
    2/20

    View Slide

  3. Introduction Structure Support Next Steps References
    Selective History of Publicly Available MCQMC Software I
    L’Ecuyer (2017) provides a history of random number generation
    Notorious RANDU from the 1960s (Wikipedia n.d.), which fails the spectral test
    VEGAS (Lepage 1978; Lepage 2021) importance sampling Monte Carlo
    algorithm favored by physicists
    ACM low discrepancy point generators (Bratley and Fox 1988; Bratley, Fox,
    and Niederreiter 1992; Hong and H. 2003)
    FinDer (Paskov and Traub 1995; Papageorgiou n.d.) and BRODA (Kucherenko
    n.d.) targeting quantitative finance
    Korobov cubature and scrambled Sobol’ generators in NAG (The Numerical
    Algorithms Group 2021) for decades
    Scrambled Sobol’ and Halton generators in MATLAB (The MathWorks, Inc.
    2022) since 2008, and fixed a few years later
    3/20

    View Slide

  4. Introduction Structure Support Next Steps References
    Selective History of Publicly Available MCQMC Software II
    BUGS (Lunn et al. 2012; MRC Biostatistic Unit n.d.) and Stan (Stan Development
    Team 2022) for Markov Chain Monte Carlo (MCMC)
    LatMRG (L’Ecuyer and Couture 1997), RNGStreams (L’Ecuyer, Simard, et al.
    2002), SSJ (L’Ecuyer 2002; L’Ecuyer n.d.), and LatNetBuilder (Darmon et al.
    2018) Talk M 11
    Low discrepancy generators (Friedel and Keller 2002; Friedel and Keller 2001),
    SamplePack (Kollig and Keller 2002), Grünschloß (n.d.), and MatBuilder (Paulin
    et al. 2022) Talk M 12, Tu 12
    Sobol’ direction numbers (Joe and Kuo 2003; Joe and Kuo 2008; Joe and Kuo
    n.d.)
    Fast CBC, Magic Point Shop, QMC4PDE and other code since 2004 at Nuyens
    (n.d.) Talk Tu 10:30
    Multi-level software (Giles n.d.[a]; Giles n.d.[b]) Talk M 10:30
    4/20

    View Slide

  5. Introduction Structure Support Next Steps References
    Selective History of Publicly Available MCQMC Software III
    Data-driven error bounds and stopping criteria in GAIL (Choi, Ding, et al. 2021)
    and QMCPy (Choi, H., Jagadeeswaran, M. McCourt, et al. 2021; Choi, H.,
    Jagadeeswaran, M. J. McCourt, et al. 2022) Talk Tu 11
    Uncertainty quantification libraries Dakota (Adams et al. 2022), UQTk
    (Debusschere et al. 2004; Debusschere n.d.), and MUQ (Parno et al. 2014) Talk
    Tu 11:30 have some basic level low discrepancy sampling
    QMC framework in Julia since 2019 (SciML QuasiMonteCarlo.jl n.d.) by Robbe
    and others Talk M 17
    Scrambled Sobol’ in SciPy (Virtanen et al. 2020) and PyTorch (Paszke et al.
    2019) since several years ago
    TensorFlow QMC framework (TF Quant Finance Contributors 2021) since a year
    or so ago
    We have stand-alone algorithms, small libraries, and pieces of larger libraries
    5/20

    View Slide

  6. Introduction Structure Support Next Steps References
    Why Do We Need Software?
    Theory
    Explains, justifies, and leads to algorithms
    Assumptions =⇒ success can be viewed as failure =⇒ what went wrong
    Software
    Makes algorithms practical
    Solves (new) applications
    Eliminates do-it-yourself
    Applications
    Make societal impact
    Inspire new theory
    6/20

    View Slide

  7. Introduction Structure Support Next Steps References
    Why Do We Need Software?
    Theory
    Explains, justifies, and leads to algorithms
    Assumptions =⇒ success can be viewed as failure =⇒ what went wrong
    Software
    Makes algorithms practical
    Solves (new) applications
    Eliminates do-it-yourself
    Applications
    Make societal impact
    Inspire new theory
    The US Department of Energy is investing in studying how to develop great
    scientific software, including a recent Workshop on the Science of
    Scientific-Software Development and Use (Heroux 2019; U.S. Department of
    Energy, Office of Advanced Scientific Computing Research 2021)
    6/20

    View Slide

  8. Introduction Structure Support Next Steps References
    Software Architecture
    µ :=
    T
    g(t) λ(t) dt = · · · = E[f(X)]
    expectation
    =
    [0,1]d
    f(x) dx
    integration

    1
    n
    n
    i=1
    f(Xi) =: ^
    µn
    Low Discrepancy Generator producing {X1, X2, . . . } that mimics the uniform
    distribution
    True Measure that defines the original integral, e.g., Lebesgue; embodies the
    transformation t = Ψ(x)
    Integrand g, which defines the original integral, plus the transformed version, f, to
    fit the LD generator
    Stopping Criterion based on a data-driven error bound, which determines how
    large n should be to ensure that |µ − ^
    µn| ⩽ ε
    7/20

    View Slide

  9. Introduction Structure Support Next Steps References
    Software Architecture
    µ :=
    T
    g(t) λ(t) dt = · · · = E[f(X)]
    expectation
    =
    [0,1]d
    f(x) dx
    integration

    1
    n
    n
    i=1
    f(Xi) =: ^
    µn
    Low Discrepancy Generator producing {X1, X2, . . . } that mimics the uniform
    distribution
    True Measure that defines the original integral, e.g., Lebesgue; embodies the
    transformation t = Ψ(x)
    Integrand g, which defines the original integral, plus the transformed version, f, to
    fit the LD generator
    Stopping Criterion based on a data-driven error bound, which determines how
    large n should be to ensure that |µ − ^
    µn| ⩽ ε
    May we agree on a common software definition for these these objects like we
    have for floating point numbers or basic mathematical functions?
    7/20

    View Slide

  10. Introduction Structure Support Next Steps References
    Choosing a Language, Library, or Environment
    Language/library choices of the users are driven by familiarity and speed
    Favorites aligned with various communities, but no one dominates all
    Difference between prototyping environments and ninja written production
    code
    Growth of multi-processor environments
    Provides opportunity
    Complicates software development; where does parallelism really help?
    8/20

    View Slide

  11. Introduction Structure Support Next Steps References
    Choosing a Language, Library, or Environment
    Language/library choices of the users are driven by familiarity and speed
    Favorites aligned with various communities, but no one dominates all
    Difference between prototyping environments and ninja written production
    code
    Growth of multi-processor environments
    Provides opportunity
    Complicates software development; where does parallelism really help?
    Recommendations if you want to contribute software
    Realize your new idea as a demo in your favorite library
    Add a significant idea as a feature to other libraries
    Move basic features of your library to larger library
    Write wrappers or demos to connect other libraries to your library
    8/20

    View Slide

  12. Introduction Structure Support Next Steps References
    Ex. QMCPy + FEniCS/Dolfin (Logg et al. 2012)
    Want to demonstrate how our QMC software could work with a popular DE
    solver for DEs with random coefficients
    Partial success

    d
    dx
    a(x, W)
    d
    dx
    u(x, W) = 200(2x − 1)
    0 ⩽ x ⩽ 1, u(0) = u(1) = 0
    a(x, W) = 1 + 0.6
    d
    k=1
    Wk
    Tk(2x − 1)
    k2
    W ∼ U[−1, 1]d
    E[u(0.25, W)] = ?
    9/20

    View Slide

  13. Introduction Structure Support Next Steps References
    Ex. QMCPy + FEniCS/Dolfin (Logg et al. 2012)
    Want to demonstrate how our QMC software could work with a popular DE
    solver for DEs with random coefficients
    Partial success

    d
    dx
    a(x, W)
    d
    dx
    u(x, W) = 200(2x − 1)
    0 ⩽ x ⩽ 1, u(0) = u(1) = 0
    a(x, W) = 1 + 0.6
    d
    k=1
    Wk
    Tk(2x − 1)
    k2
    W ∼ U[−1, 1]d
    E[u(0.25, W)] = −1.7462 10−3 10−2 10−1
    Error Tolerance
    10−1
    100
    101
    102
    103
    Computation Time (seconds)
    Comparison of IID and Sobol Points
    IID Points
    Sobol Points
    9/20

    View Slide

  14. Introduction Structure Support Next Steps References
    Ex. QMCPy + FEniCS/Dolfin (Logg et al. 2012)
    Want to demonstrate how our QMC software could work with a popular DE
    solver for DEs with random coefficients
    Partial success Why?
    Even with FEniCS/Dolfin’s extensive
    documentation we had difficulties
    Did not understand the latest
    version, so reverted to an older
    version
    Had to learn how to change the
    random instance of the
    coefficient without tiggering the
    JIT compiler
    Do not yet know how to express
    the random coefficient in terms of
    the covariance kernel
    But
    It works
    If we can overcome the
    challenges, will post a blog and
    demo at Choi, Hickernell, et al.
    (2020)
    9/20

    View Slide

  15. Introduction Structure Support Next Steps References
    Encouraging Users of Your Library Requires
    A repository where
    Your library can be easily downloaded
    There is documentation
    Issues or queries can be posted
    There are updates
    Tests are run regularly to mitigate against bugs
    (Elementary) demos on how to use the software and highlighting its
    advantages
    Demos connecting your library with others
    Recommending your students and postdocs to use your library
    Welcoming instructions on how to contribute an algorithm or a demo
    10/20

    View Slide

  16. Introduction Structure Support Next Steps References
    Building a Developer Team for Your Library
    Many senior researchers don’t code, and rely on transient team members
    (students, postdocs)
    Encourage a sense of ownership that lasts beyond the time your team members
    are part of your lab
    Encourage those outside your own group to contribute
    Those of us who understand theory and appreciate software need to
    encourage and steer popular software to ensure that it is implemented well.
    Recognize good software as valuable scholarly output
    Recognize research software engineer as a valued vocation; see Research
    Software Engineers International https://researchsoftware.org, Research
    Software Engineers are people who combine professional software expertise
    with an understanding of research. They go by various job titles but the term
    Research Software Engineer (RSE) is fast gaining international recognition.
    11/20

    View Slide

  17. Introduction Structure Support Next Steps References
    Next Steps
    Which exemplary theoretical
    developments are not yet
    implemented in readily available
    software?
    What potential MCQMC user
    communities are ill-served by
    existing software libraries?
    Should those of us interested in
    software gather regularly for
    show-and-tell to foster more
    collaboration?
    Join us for a lunch discussion at Bella
    Casa tomorrow, July 19, at 12:45 PM
    12/20

    View Slide

  18. Introduction Structure Support Next Steps References
    References I
    Adams, B. M. et al. (May 2022). Dakota, A Multilevel Parallel Object-Oriented
    Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification,
    and Sensitivity Analysis: Version 6.16 User’s Manual. Sandia National Laboratories.
    Bratley, P. and B. L. Fox (1988). “Algorithm 659: Implementing Sobol’s Quasirandom
    Sequence Generator”. In: ACM Trans. Math. Software 14, pp. 88–100.
    Bratley, P., B. L. Fox, and H. Niederreiter (1992). “Implementation and Tests of
    Low-Discrepancy Sequences”. In: ACM Trans. Model. Comput. Simul. 2, pp. 195–213.
    Choi, S.-C. T., F. J. H., R. Jagadeeswaran, M. McCourt, et al. (2021). QMCPy: A
    quasi-Monte Carlo Python Library. doi: 10.5281/zenodo.3964489. url:
    https://qmcsoftware.github.io/QMCSoftware/.
    Choi, S.-C. T., F. J. H., R. Jagadeeswaran, M. J. McCourt, et al. (2022). “Quasi-Monte
    Carlo Software”. In: Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Oxford,
    England, August 2020. Ed. by A. Keller. Springer Proceedings in Mathematics and
    Statistics. https://arxiv.org/abs/2102.07833. Springer, Cham, pp. 23–50.
    13/20

    View Slide

  19. Introduction Structure Support Next Steps References
    References II
    Choi, S.-C. T., Y. Ding, et al. (2021). GAIL: Guaranteed Automatic Integration Library
    (Versions 1.0–2.3.2). MATLAB software, http://gailgithub.github.io/GAIL_Dev/. doi:
    10.5281/zenodo.4018189.
    Choi, S.-C. T., F. J. Hickernell, et al. (2020). QMCPy Blogs. url:
    https://qmcpy.wpcomstaging.com.
    Darmon, Y. et al. (2018). LatNet Builder. url:
    https://github.com/umontreal-simul/latnetbuilder.
    Debusschere, B. (n.d.). UQ Toolkit. url: https://www.sandia.gov/uqtoolkit/.
    Debusschere, B. et al. (2004). “Numerical Challenges in the Use of Polynomial Chaos
    Representations for Stochastic Processes”. In: SIAM J. Sci. Comput., pp. 698–719. url:
    10.1137/S1064827503427741.
    Friedel, I. and A. Keller (2001). libseq – low discrepancy sequence library. url:
    http://www.multires.caltech.edu/software/libseq/.
    — (2002). “Fast Generation of Randomized Low-Discrepancy Point Sets”. In: Monte
    Carlo and Quasi-Monte Carlo Methods 2000. Ed. by K. T. Fang, F. J. H., and
    H. Niederreiter. Springer-Verlag, Berlin, pp. 257–273.
    14/20

    View Slide

  20. Introduction Structure Support Next Steps References
    References III
    Giles, M. (n.d.[a]). Multi-Level Monte Carlo Software. url:
    https://people.maths.ox.ac.uk/gilesm/mlmc/.
    — (n.d.[b]). Multi-Level Quasi-Monte Carlo Software. url:
    http://people.maths.ox.ac.uk/~gilesm/mlqmc/.
    Grünschloß, L. (n.d.). Leonhard Grünschloß’s webpage. url: https://gruenschloss.org.
    Heroux, M. (2019). Research Software Science: A Scientific Approach to Understanding
    and Improving How We Develop and Use Software for Research. url:
    https://bssw.io/blog_posts/research-software-science-a-scientific-approach-to-
    understanding-and-improving-how-we-develop-and-use-software-for-research
    (visited on 2019).
    Hong, H. S. and F. J. H. (2003). “Algorithm 823: Implementing Scrambled Digital Nets”.
    In: ACM Trans. Math. Software 29, pp. 95–109. doi: 10.1145/779359.779360.
    Joe, S. and F. Y. Kuo (2003). “Remark on Algorithm 659: Implementing Sobol’s
    quasirandom sequence generator”. In: ACM Trans. Math. Software 29, pp. 49–57.
    15/20

    View Slide

  21. Introduction Structure Support Next Steps References
    References IV
    Joe, S. and F. Y. Kuo (2008). “Constructing Sobol sequences with better
    two-dimensional projections”. In: SIAM J. Sci. Comput. 30, pp. 2635–2654.
    — (n.d.). Sobol Sequence Generator. url:
    https://web.maths.unsw.edu.au/~fkuo/sobol/.
    Kollig, T. and A. Keller (2002). SamplePack. url:
    https://www.uni-kl.de/AG-Heinrich/SamplePack.html.
    Kucherenko, S. (n.d.). BRODA. url: https://www.broda.co.uk/index.html.
    L’Ecuyer, P. (2002). “SSJ: A Framework for Stochastic Simulation in Java”. In: 2002 Winter
    Simulation Conference. IEEE Press, pp. 234–242.
    — (2017). “History of uniform random number generation”. In: Proceedings of the 2017
    Winter Simulation Conference. Ed. by W. K. V. Chan et al. doi:
    10.1109/WSC.2017.8247790.
    — (n.d.). SSJ: Stochastic Simulation in Java. url:
    https://github.com/umontreal-simul/ssj.
    16/20

    View Slide

  22. Introduction Structure Support Next Steps References
    References V
    L’Ecuyer, P. and R. Couture (1997). “An Implementation of the Lattice and Spectral
    Tests for Multiple Recursive Linear Random Number Generators”. In: INFORMS J.
    Comput. 9, pp. 206–217.
    L’Ecuyer, P., R. Simard, et al. (2002). “An Objected-Oriented Random-Number Package
    with Many Long Streams and Substreams”. In: Oper. Res. 50, pp. 1073–1075. doi:
    10.1287/opre.50.6.1073.358.
    Lepage, G. P. (1978). “A new algorithm for adaptive multidimensional integration”. In:
    J. Comput. Phys. 27, pp. 192–203.
    — (2021). “Adaptive multidimensional integration: VEGAS enhanced”. In: J. Comput.
    Phys. 439, p. 110386.
    Logg, A., G. N. Wells, and J. Hake (2012). “DOLFIN: a C++/Python Finite Element
    Library”. In: Automated Solution of Differential Equations by the Finite Element Method.
    Ed. by K. M. A. Logg and G. N. Wells. Vol. 84. Lecture Notes in Computational Science
    and Engineering. Springer. Chap. 10.
    Lunn, D. et al. (2012). The BUGS Book: A Practical Introduction to Bayesian Analysis.
    Chapman & Hall.
    17/20

    View Slide

  23. Introduction Structure Support Next Steps References
    References VI
    MRC Biostatistic Unit (n.d.). The BUGS Project. url:
    https://www.mrc-bsu.cam.ac.uk/software/bugs/.
    Nuyens, D. (n.d.). Dirk’s Homepage. url:
    https://people.cs.kuleuven.be/~dirk.nuyens/.
    Owen, A. B. (2022). “On dropping the first Sobol’ point”. In: Monte Carlo and
    Quasi-Monte Carlo Methods: MCQMC, Oxford, England, August 2020, pp. 71–86. url:
    https://arxiv.org/pdf/2008.08051.pdf.
    Papageorgiou, A. (n.d.). FinDer. url:
    http://www.cs.columbia.edu/~ap/html/finder.html.
    Parno, M. et al. (2014). MIT uncertainty quantification (MUQ) library. url:
    https://mituq.bitbucket.io/source/_site/index.html.
    Paskov, S. and J. Traub (1995). “Faster Valuation of Financial Derivatives”. In: J. Portfolio
    Management 22, pp. 113–120.
    Paszke, A. et al. (2019). “PyTorch: An imperative style, high-performance deep learning
    library”. In: Advances in neural information processing systems 32, pp. 8026–8037.
    18/20

    View Slide

  24. Introduction Structure Support Next Steps References
    References VII
    Paulin, L. et al. (Aug. 2022). “MatBuilder: Mastering Sampling Uniformity Over
    Projections”. In: ACM Transactions on Graphics (Proceedings of SIGGRAPH) 41.4. doi:
    https://doi.org/10.1145/3528223.3530063.
    SciML QuasiMonteCarlo.jl (n.d.). url: https://github.com/SciML/QuasiMonteCarlo.jl.
    SciPy Developers (2020). scipy discussion of Sobol’ sequence implementation. url:
    https://github.com/scipy/scipy/pull/10844.
    Stan Development Team (2022). Stan Modeling Language Users Guide and Reference
    Manual, version 2.30. url: http://mc-stan.org.
    TF Quant Finance Contributors (2021). Quasi Monte-Carlo Methods. url:
    https://github.com/google/tf-quant-finance/math/qmc.
    The MathWorks, Inc. (2022). MATLAB R2022b. Natick, MA.
    The Numerical Algorithms Group (2021). The NAG Library. Mark 27. Oxford.
    U.S. Department of Energy, Office of Advanced Scientific Computing Research (2021).
    Workshop on the Science of Scientific-Software Development and Use. url:
    https://web.cvent.com/event/1b7d7c3a-e9b4-409d-ae2b-284779cfe72f/summary.
    19/20

    View Slide

  25. Introduction Structure Support Next Steps References
    References VIII
    Virtanen, P. et al. (2020). “SciPy 1.0: fundamental algorithms for scientific computing in
    Python”. In: Nature Methods 17.3, pp. 261–272.
    Wikipedia (n.d.). RANDU. url: https://en.wikipedia.org/wiki/RANDU.
    20/20

    View Slide