An elementary talk on quasi-Monte Carlo methods. I discuss the motivation, some implementations and recent research results. [Given to Illinois Tech MATH100 students (Oct 23rd, 2025)].
In this talk, I will introduce Monte Carlo and quasi-Monte Carlo methods, and touch on some optimization methods.
• Integration (and where it turns up)
• Monte Carlo Methods
• Quasi-Monte Carlo Methods
• Optimized Quasi-Monte Carlo Sampling (my work!)
• Randomized Quasi-Monte Carlo
• Summary
Topics: Number Theory, Numerical Methods, Optimization.
"Quasi-Monte Carlo Methods: What, Why and How?" by Fred J. Hickernell, Nathan Kirk and Aleksei G. Sorokin. Available at: https://arxiv.org/abs/2502.03644
Other useful literature:
• J. Dick and F. Pillichshammer, Digital Nets and Sequences, Cambridge, 2010.
• A. B. Owen, Monte Carlo Theory, Methods and Examples, 2013–, online book. https://artowen.su.domains/mc/
The definite integral
$$\int_a^b f(x)\,dx = F(b) - F(a),$$
where F is any anti-derivative of f on [a, b], can yield new information about the physical or geometric context of the original function. You can think of it as the area/volume under the graph of the function. That is, it can represent:
• Expected, or "average", value of the function
• Geometric meaning, like areas or volumes of shapes defined by the function f
• Physical meaning, like work or energy in physics
The force required to stretch a spring from its resting position is F(x) = kx, where F is the force, k is the spring constant (how stiff or loose the spring is), and x is the distance the spring is stretched. To find the total energy expended to stretch the spring (which is now stored as potential energy in the spring), we can compute
$$\text{Potential Energy} = \int_{\text{initial}}^{\text{final}} F(x)\,dx = \int_{x=0}^{x=L} kx\,dx = \frac{1}{2}kL^2.$$
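The closed form above is easy to sanity-check numerically. A minimal sketch, using illustrative values of k and L (not from the talk) and a crude Monte Carlo estimate of the integral:

```python
import random

# Illustrative spring parameters (arbitrary choices for this example)
k, L = 2.0, 0.5          # spring constant and total stretch length

# Exact potential energy from the closed form: (1/2) k L^2
exact = 0.5 * k * L**2

# Crude Monte Carlo estimate of the integral of F(x) = k x over [0, L]:
# average F at uniform points in [0, L], then multiply by the interval length L
random.seed(0)
N = 100_000
mean_force = sum(k * random.uniform(0.0, L) for _ in range(N)) / N
estimate = L * mean_force

print(exact, estimate)
```

With these values the exact answer is 0.25, and the Monte Carlo estimate agrees to a few decimal places.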
Integrals show up in many forms:
• $\int_a^b f(x)\,dx$, $\int_\Omega 1\,dV$
• $E[f(X)] = \int f(x)\,p(x)\,dx$
• $F(t) = \int_{-\infty}^{t} p(x)\,dx$
• $W = \int_a^b F(x)\,dx$
Application snapshots:
• Finance pricing: $\int \text{Payoff}(\omega)\,p(\omega)\,d\omega$
• Engineering mass/CoM: $m = \int_\Omega \rho(x)\,dx$
• Probability/ML evidence: $Z = \int p(y \mid \theta)\,p(\theta)\,d\theta$
• Computer graphics: the rendering integral
Takeaway: Integrals aggregate many small contributions into totals (areas, averages, probabilities, and more) across science, engineering, and data.
Some integrals can be evaluated exactly in closed form, e.g.,
$$\int_0^1 x^k\,dx = \frac{1}{k+1}, \qquad \int_0^\pi \sin(x)\,dx = 2, \qquad \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$
But for complex integrands, a closed form may not exist or be practical. Solution: use numerical methods to approximate the integral. One powerful option is the Monte Carlo method.
We want to approximate
$$I = \int_{[0,1]^d} f(x)\,dx.$$
Monte Carlo Procedure:
1. Sample the domain $[0,1]^d$ uniformly at random N times to obtain $X_1, X_2, \ldots, X_N$
2. Compute $f(X_i)$ for each i
3. Take the average: $\hat{I}_{\mathrm{MC}} = \frac{1}{N}\sum_{i=1}^{N} f(X_i)$.
Accuracy: more samples ⇒ higher accuracy. The error of the approximation $\hat{I}_{\mathrm{MC}}$ to the true value I is $O(1/\sqrt{N})$.
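The three-step procedure above fits in a few lines of Python. A minimal sketch, using a toy integrand $f(x) = x_1 x_2$ on $[0,1]^2$ (whose exact integral is 1/4) purely for illustration:

```python
import random

def mc_integrate(f, d, N, seed=0):
    """Plain Monte Carlo: average f over N IID uniform points in [0,1]^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        x = [rng.random() for _ in range(d)]  # step 1: uniform random sample
        total += f(x)                         # step 2: evaluate f
    return total / N                          # step 3: average

# Toy example: integrate f(x) = x1 * x2 over [0,1]^2; exact value is 1/4
est = mc_integrate(lambda x: x[0] * x[1], d=2, N=100_000)
print(est)
```

With N = 100,000 samples the estimate lands within roughly $1/\sqrt{N} \approx 0.003$ of the true value, consistent with the stated error rate.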
Quasi-Monte Carlo Procedure:
1. Replace the random samples by low-discrepancy (LD), or evenly spread, points $X_1, \ldots, X_N \in [0,1]^d$
2. Compute $f(X_i)$ for each i
3. Take the average: $\hat{I}_{\mathrm{QMC}} = \frac{1}{N}\sum_{i=1}^{N} f(X_i)$.
Accuracy: again, more samples ⇒ higher accuracy, but the error now scales like $O(1/N)$, rather than the slower MC rate of $O(1/\sqrt{N})$.
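The only change from plain Monte Carlo is the source of the points. A minimal sketch using SciPy's built-in Sobol' low-discrepancy generator (`scipy.stats.qmc`, available in SciPy 1.7+) on the same toy integrand:

```python
import numpy as np
from scipy.stats import qmc

# Step 1: low-discrepancy (Sobol') points instead of IID random ones
sampler = qmc.Sobol(d=2, scramble=False)
pts = sampler.random_base2(m=10)          # N = 2^10 = 1024 points

# Steps 2-3: evaluate and average. Toy integrand f(x) = x1 * x2, exact value 1/4
qmc_est = np.mean(pts[:, 0] * pts[:, 1])
print(qmc_est)
```

Sobol' points are best taken in powers of two (hence `random_base2`); for this smooth integrand the QMC estimate with only 1024 points is already more accurate than plain MC with far more samples.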
We consider an example from computational physics, referred to as the 'Keister' function, which can be written as
$$I = \int_{[0,1]^d} \underbrace{\pi^{d/2} \cos\left( \left\| \tfrac{1}{\sqrt{2}}\big(\Phi^{-1}(x_1), \ldots, \Phi^{-1}(x_d)\big) \right\| \right)}_{=:\, f(x)} \, dx,$$
where $\Phi^{-1}$ is the inverse standard normal CDF. We aim to approximate I via Monte Carlo methods in dimension 6.
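The Keister integrand is straightforward to code. A sketch, assuming SciPy is available for the inverse normal CDF and for scrambled Sobol' points (the clipping guard and sample size are my choices, not from the talk):

```python
import numpy as np
from scipy.stats import norm, qmc

def keister(x):
    """Keister integrand: pi^{d/2} * cos(||Phi^{-1}(x)/sqrt(2)||), rows of x in [0,1]^d."""
    t = norm.ppf(x) / np.sqrt(2.0)        # map uniforms to scaled Gaussians
    return np.pi ** (x.shape[1] / 2) * np.cos(np.linalg.norm(t, axis=1))

d = 6
sampler = qmc.Sobol(d=d, scramble=True, seed=7)
x = sampler.random_base2(m=12)            # 4096 LD points
x = np.clip(x, 1e-12, 1 - 1e-12)          # avoid Phi^{-1}(0) = -infinity
est = np.mean(keister(x))
print(est)
```

Since $|\cos| \le 1$, the integrand is bounded by $\pi^{d/2} = \pi^3 \approx 31$ in dimension 6, which gives a quick sanity check on the estimate.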
We are going to approximate the Keister integral using random points, LD points and grid-based points. Here is a two-dimensional visualization of those point sets:
Unlike the grid example, when moving to higher dimensions, we need N points in $[0,1]^d$ that are spread out in every coordinate. The Halton sequence places a one-dimensional van der Corput sequence in a different base in each coordinate:
$$X^{\mathrm{Hal}}_i := \big(\phi_2(i), \phi_3(i), \phi_5(i), \ldots, \phi_{b_d}(i)\big),$$
where $\phi_b$ is the radical inverse function in base b and the bases are the first d primes.
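The radical inverse simply mirrors the base-b digits of i about the radix point, so a Halton point is one radical inverse per coordinate. A minimal sketch:

```python
def radical_inverse(i, b):
    """phi_b(i): reflect the base-b digits of the integer i about the radix point.

    E.g. in base 2, i = 3 = (11)_2 maps to (0.11)_2 = 0.75.
    """
    result, scale = 0.0, 1.0 / b
    while i > 0:
        i, digit = divmod(i, b)   # peel off the least significant base-b digit
        result += digit * scale
        scale /= b
    return result

def halton_point(i, bases=(2, 3, 5)):
    """i-th Halton point: one van der Corput sequence (in a distinct base) per coordinate."""
    return tuple(radical_inverse(i, b) for b in bases)

print(halton_point(1))  # (0.5, 0.3333..., 0.2)
```

The first d primes are used as bases because the coordinates must be in pairwise coprime bases to avoid correlations between dimensions.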
(and one of the simplest) of LD sets is lattices. They are defined as XLat i = ih N (mod 1), for i = 0, . . . , N − 1 where h is a well-chosen d-dimensional integer vector. 19
Remarkably, we can theoretically say that the error of our (Q)MC approximation is at most the product of two terms (the Koksma-Hlawka inequality):
$$\left| \frac{1}{N}\sum_{i=1}^{N} f(X_i) - \int_{[0,1]^d} f(x)\,dx \right| \le V(f) \cdot D(P_N),$$
where V(f) is the "spikiness" of the function (not in our control), and $D(P_N)$ is the uniformity of the sampling points $P_N = \{X_1, X_2, \ldots, X_N\}$, called the discrepancy (our choice!).
For a point set $P_N = \{X_1, \ldots, X_N\} \subset [0,1]^d$, the discrepancy is
$$D(P_N) = \sup_{b \in [0,1]^d} \left| \frac{\#\big(P_N \cap [0,b)\big)}{N} - \big|[0,b)\big| \right|.$$
It measures the worst-case gap between empirical coverage and volume over all anchored boxes [0, b). Smaller discrepancy ⇒ more uniform point set.
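In dimension one the supremum can be computed exactly: it is attained at the sample points themselves, giving a closed formula over the sorted points. A sketch (the formula is the standard 1D result; the midpoint test case is my choice):

```python
def star_discrepancy_1d(points):
    """Exact star discrepancy of a 1D point set in [0, 1].

    For sorted points x_(1) <= ... <= x_(N), the supremum over anchored
    intervals [0, b) is attained near the points, and
        D*(P_N) = max_i max( i/N - x_(i),  x_(i) - (i-1)/N ).
    """
    xs = sorted(points)
    N = len(xs)
    return max(
        max(i / N - x, x - (i - 1) / N)
        for i, x in enumerate(xs, start=1)
    )

# The midpoints (2i-1)/(2N) are optimal in 1D, achieving discrepancy 1/(2N)
print(star_discrepancy_1d([1/6, 3/6, 5/6]))
```

For the three midpoints shown, the formula returns 1/6 = 1/(2N), the best possible value for N = 3 points. In higher dimensions no such simple formula exists, which is exactly the computational obstacle discussed later.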
Classical LD constructions are:
• built from number-theoretic ideas (base-2 expansions, modulo-1 arithmetic, etc.)
• analyzed 'asymptotically', i.e., as N → ∞ or for very large N.
Few methods address finding optimal node configurations for specific N and d parameters. Optimizing QMC constructions is what I have contributed to QMC methods.
Optimization requires an objective function. In our case, we optimize the sampling locations with respect to the discrepancy as the objective function. That is, we solve the following optimization problem:
$$\operatorname*{argmin}_{P_N \subset [0,1]^d} D(P_N)$$
From calculus, we know (or are about to know) how to minimize functions! Given a function y = f(x), an optimum (maximum or minimum) can be attained where
$$\frac{dy}{dx} = 0,$$
e.g., for $y = (x-3)^2$, we get $\frac{dy}{dx} = 2(x-3) = 0 \Rightarrow x = 3$, which attains the minimum value y = 0. However, oftentimes:
• the gradient dy/dx is not analytically computable
• there may be a few parameter values that all give very similar values (these are called 'local optima')
• there are other, more complicated reasons why 'first derivative = 0' may not work.
In these cases, you must rely on a numerical method for finding the optimum.
Goal. Minimize a function f(w) over parameters $w \in \mathbb{R}^d$.
Idea. Move opposite the gradient (the steepest ascent direction) to decrease f:
$$w_{t+1} = w_t - \eta\,\nabla f(w_t),$$
where η > 0 is the learning rate / step size.
Tiny 1D example. $f(w) = (w-3)^2$, so $\nabla f(w) = 2(w-3)$ and
$$w_{t+1} = w_t - \eta \cdot 2(w_t - 3).$$
Starting at $w_0 = 0$ with η = 0.25: $w_1 = 1.5$, $w_2 = 2.25$, $w_3 = 2.625$, $\cdots \to 3$.
Simple procedure.
1. Choose $w_0$ and step size η.
2. Repeat: compute $\nabla f(w_t)$ and update $w_{t+1} = w_t - \eta \nabla f(w_t)$.
3. Stop when $\|\nabla f(w_t)\|$ is small or after a max number of steps.
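The three-step procedure above translates directly into code. A minimal sketch using the same tiny 1D example:

```python
def gradient_descent(grad, w0, eta, steps):
    """Minimize by iterating w_{t+1} = w_t - eta * grad(w_t)."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Tiny 1D example from above: f(w) = (w - 3)^2, so grad f(w) = 2(w - 3)
w = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, eta=0.25, steps=25)
print(w)  # converges toward the minimizer w* = 3
```

With η = 0.25 the gap to the minimizer halves every step ($w_{t+1} - 3 = \tfrac{1}{2}(w_t - 3)$), reproducing the iterates 1.5, 2.25, 2.625, ... from the slide.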
The star discrepancy is the natural choice of objective function. However:
1. It is very slow to compute, even in small dimensions:
• the number of boxes to check is approximately $N^d$
• e.g., for d = 6 and N = 100 points, I would need to check 1 trillion boxes (1,000,000,000,000) ⇒ not feasible.
2. The training objective needs to be differentiable for gradient-based learning.
Instead, we use the L2-discrepancy (Warnock's formula):
$$L_2^2(P_N) = \frac{1}{3^d} - \frac{2}{N} \sum_{i=1}^{N} \prod_{k=1}^{d} \frac{1 - X_{ik}^2}{2} + \frac{1}{N^2} \sum_{i,j=1}^{N} \prod_{k=1}^{d} \big(1 - \max(X_{ik}, X_{jk})\big).$$
This is computable in $O(N^2)$ operations: for d = 6 and N = 100, only about 10,000 pair computations.
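Warnock's formula translates almost term-for-term into NumPy. A sketch, with a hand-computable sanity check (for a single point at 1/2 in d = 1, the exact L2 star discrepancy squared is 1/3 - 3/4 + 1/2 = 1/12):

```python
import numpy as np

def warnock_l2(points):
    """Squared L2 star discrepancy via Warnock's formula, O(d * N^2) time."""
    X = np.asarray(points, dtype=float)
    N, d = X.shape
    term1 = 3.0 ** (-d)
    # single sum: product over coordinates of (1 - X_ik^2)/2
    term2 = (2.0 / N) * np.sum(np.prod((1.0 - X**2) / 2.0, axis=1))
    # double sum: pairwise products of (1 - max(X_ik, X_jk)), built by broadcasting
    # (the (N, N, d) intermediate is fine for small N)
    M = np.maximum(X[:, None, :], X[None, :, :])
    term3 = np.sum(np.prod(1.0 - M, axis=2)) / N**2
    return term1 - term2 + term3

print(warnock_l2([[0.5]]))  # 1/12 = 0.08333...
```

Unlike the star discrepancy, every term here is a smooth function of the point coordinates, which is what makes gradient-based training possible.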
The method I developed with collaborators from MIT, Oxford University and Google DeepMind is Message-Passing Monte Carlo (MPMC). MPMC uses geometric deep learning and a graph neural network model to optimize the placement of our sampling nodes with respect to the L2-discrepancy. It is trained via gradient descent, where the parameters are the points themselves.
Another approach I am currently developing with an Illinois Tech Masters student is a Subset Selection problem. We now start with a large population set $P_N$ and attempt to choose the best m < N points to minimize the discrepancy. That is, we wish to solve
$$\operatorname*{argmin}_{P_m \subset P_N,\; |P_m| = m} D(P_m).$$
We cannot check all subsets one by one: there are $\binom{N}{m}$ of them. For example, to find the 'best' m = 50 points from N = 1000, we would need to check roughly $9.46 \times 10^{84}$ possible combinations!
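The size of this search space is easy to verify with Python's exact big-integer arithmetic:

```python
import math

# Number of ways to choose m = 50 points out of N = 1000
count = math.comb(1000, 50)

print(f"{count:.3e}")      # scientific notation: on the order of 1e+84
print(len(str(count)))     # number of decimal digits in the exact count
```

Even checking a billion subsets per second, exhaustive search would take unimaginably longer than the age of the universe, hence the need for a smarter search strategy.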
We use a Bayesian Optimization (BO) strategy to solve the subset selection problem. It does not use gradient descent; instead, it relies on probabilistic modeling. BO builds a model that predicts the objective value and quantifies the uncertainty (error) in its prediction. It uses this information to intelligently choose where to sample next, efficiently balancing exploration of unknown regions and exploitation of promising areas to find the optimum in as few evaluations as possible.
A discrepancy used in an optimization pipeline needs to be fast to evaluate; it is evaluated many thousands of times. We used the Warnock formula in the MPMC method. However, there are many other L2-type discrepancies, including the symmetric, centered and extreme discrepancies.
[Figure: box regions defining the (a) star, (b) centered and (c) symmetric discrepancies]
QMC point sets are deterministic: the sample mean $\hat{I}$ does not change across runs, as it would with IID nodes. While determinism has advantages, randomization has substantial advantages as well:
• Randomization (done correctly) removes bias from the estimator
• Replications can facilitate error estimates (think: confidence intervals)
A good randomization must preserve the LD property of the QMC construction.
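One classical randomization that preserves the LD structure of a lattice is the random shift (Cranley-Patterson rotation): add a single uniform vector to every point, modulo 1. A minimal sketch, applied to a small rank-1 lattice with an illustrative generating vector:

```python
import numpy as np

def random_shift(points, rng):
    """Cranley-Patterson rotation: add one uniform shift to all points, mod 1."""
    points = np.asarray(points, dtype=float)
    shift = rng.random(points.shape[1])   # one shift shared by every point
    return (points + shift) % 1.0

# A small rank-1 lattice (illustrative generating vector h = (1, 3), N = 8)
i = np.arange(8).reshape(-1, 1)
lattice = (i * np.array([1, 3]) / 8) % 1.0

rng = np.random.default_rng(42)
shifted = random_shift(lattice, rng)
print(shifted)
```

Because every point receives the same shift, all pairwise differences modulo 1 are unchanged, so the shifted set is still a (shifted) lattice with the same LD quality, yet each independent shift gives an unbiased replicate of the estimator.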
• F. J. Hickernell, N. Kirk and A. G. Sorokin, Quasi-Monte Carlo Methods: What, Why and How?, Proc. MCQMC 2024. Available at: https://arxiv.org/abs/2502.03644
• J. Dick and F. Pillichshammer, Digital Nets and Sequences, Cambridge, 2010.
• A. B. Owen, Monte Carlo Theory, Methods and Examples, 2013–, online book. https://artowen.su.domains/mc/