Slide 1

Slide 1 text

Quasi-Monte Carlo Kernel Density Estimation

Fred J. Hickernell, Illinois Tech, hickernell@iit.edu; Yuhan Ding, Illinois Tech; Brooke Feinberg, Scripps College; Guillem Grao I Grasa, Deloitte (formerly Illinois Tech); Aiwen Li, University of Pennsylvania; Aadit Jain, Rancho Bernardo High School; Larysa Matiukha, Illinois Tech; Aleksei Sorokin, Illinois Tech; Richard Varela, Sacramento State University

August 21, 2024

Thanks to
β€’ The organizers
β€’ The US National Science Foundation #2316011 & #2244553

See this Jupyter notebook for computations.

Slide 2

Slide 2 text

Problem

If $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$, then we estimate the population mean (an integral)
$$\mu := \mathbb{E}(Y) = \int_{[0,1]^d} f(x)\,dx$$
by the sample mean
$$\hat{\mu}_n := \frac{1}{n}\sum_{i=1}^n f(x_i).$$
But what about approximating the probability density, $\varrho$, of the random variable $Y$?

Slide 3

Slide 3 text

Problem

If $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$, then we estimate the population mean (an integral)
$$\mu := \mathbb{E}(Y) = \int_{[0,1]^d} f(x)\,dx$$
by the sample mean
$$\hat{\mu}_n := \frac{1}{n}\sum_{i=1}^n f(x_i).$$
But what about approximating the probability density, $\varrho$, of the random variable $Y$?

Kernel density estimation (KDE):
$$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^n k(f(x_i), y) =: \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k),$$
where the kernel $k(z, y)$ gives the proportion of $z$ assigned to $y$:
$$\int_{\mathbb{R}} k(z, y)\,dy = 1 \implies \int_{\mathbb{R}} \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)\,dy = 1.$$
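A minimal numerical sketch of this estimator (assumptions: a Gaussian kernel $k(z, y) = \tilde{k}((z - y)/h)/h$ with $\tilde{k} = \phi$, the sum-of-uniforms example from later slides, and an illustrative bandwidth; names like kde and f_vals are ours, not from the talk):

import numpy as np

def kde(y_grid, f_vals, h):
    # (1/n) sum_i k(f(x_i), y) with Gaussian kernel k(z, y) = phi((z - y)/h)/h
    w = (f_vals[None, :] - y_grid[:, None]) / h
    return np.exp(-0.5 * w**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
x = rng.random((512, 2))            # IID points in [0,1]^2
f_vals = x.sum(axis=1)              # Y = X1 + X2
y_grid = np.linspace(0, 2, 101)
rho_hat = kde(y_grid, f_vals, h=0.1)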

Slide 4

Slide 4 text

Problem

If $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$, then we may approximate the probability density, $\varrho$, by a kernel density estimator (KDE):
$$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^n k(f(x_i), y) =: \hat{\varrho}(y; \{x_i\}_{i=1}^n, k),$$
e.g., $k(z, y) = \tilde{k}((z - y)/h)/h$ with $\int_{-\infty}^{\infty} \tilde{k}(y)\,dy = 1$.

Slide 5

Slide 5 text

Problem

If $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$, then we may approximate the probability density, $\varrho$, by a kernel density estimator (KDE):
$$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^n k(f(x_i), y) =: \hat{\varrho}(y; \{x_i\}_{i=1}^n, k)$$

Slide 6

Slide 6 text

Problem

If $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$, then we may approximate the probability density, $\varrho$, by a kernel density estimator (KDE):
$$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^n k(f(x_i), y) =: \hat{\varrho}(y; \{x_i\}_{i=1}^n, k)$$
β€’ How to choose the kernel, $k$, including the bandwidth, $h$?

Slide 7

Slide 7 text

Problem

If $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$, then we may approximate the probability density, $\varrho$, by a kernel density estimator (KDE):
$$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^n k(f(x_i), y) =: \hat{\varrho}(y; \{x_i\}_{i=1}^n, k)$$
β€’ How to choose the kernel, $k$, including the bandwidth, $h$?
β€’ Are low discrepancy (LD) points, $\{x_i\}_{i=1}^n$, better than independent and identically distributed (IID) points?

Slide 8

Slide 8 text

Problem

If $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$, then we may approximate the probability density, $\varrho$, by a kernel density estimator (KDE):
$$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^n k(f(x_i), y) =: \hat{\varrho}(y; \{x_i\}_{i=1}^n, k)$$
β€’ How to choose the kernel, $k$, including the bandwidth, $h$?
β€’ Are low discrepancy (LD) points, $\{x_i\}_{i=1}^n$, better than independent and identically distributed (IID) points?
β€’ What is the convergence rate for LD points?

Slide 9

Slide 9 text

Problem

If $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$, then we may approximate the probability density, $\varrho$, by a kernel density estimator (KDE):
$$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^n k(f(x_i), y) =: \hat{\varrho}(y; \{x_i\}_{i=1}^n, k)$$
β€’ How to choose the kernel, $k$, including the bandwidth, $h$?
β€’ Are low discrepancy (LD) points, $\{x_i\}_{i=1}^n$, better than independent and identically distributed (IID) points?
β€’ What is the convergence rate for LD points?
β€’ How to choose $n$ to reach the desired error tolerance?

Slide 10

Slide 10 text

Where does this arise in practice?

$Y = f(X)$ = option payoff (then $\mu$ = option price), or underground water pressure with random rock porosity (then $\mu$ = average water pressure).

Slide 11

Slide 11 text

Recent work

β€’ [BALEOP21] demonstrate that the mean integrated squared error of KDEs using randomized LD sequences is theoretically and empirically superior to that of KDEs using IID sequences. They also discuss how to choose the bandwidth of the kernel.
β€’ [LEPBA22] show how conditional Monte Carlo density estimators (CDEs) of $\varrho$ using randomized LD sequences have a faster convergence rate than those using IID sequences.
β€’ [LEP22] compare KDEs, CDEs, and likelihood ratio density estimators using randomized LD sequences.
β€’ The error of interpolated pointwise CDEs of $\varrho$ is analyzed in the deterministic setting by [GKS23].

Slide 12

Slide 12 text

Why do we think that LD is better?

Slide 13

Slide 13 text

Why do we think that LD is better? Grids do not fill space well; IID points are better, but LD points are best.
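A sketch of how the three point sets can be generated for such a comparison; scrambled Sobol' points stand in for LD, via scipy.stats.qmc (one of several libraries that provide them):

import numpy as np
from scipy.stats import qmc

d, m = 2, 8                        # dimension and log2(n)
n = 2**m
g = (np.arange(16) + 0.5) / 16     # 16 x 16 = 256 centered grid points
x_grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, d)
x_iid = np.random.default_rng(7).random((n, d))                  # IID uniform
x_ld = qmc.Sobol(d=d, scramble=True, seed=7).random_base2(m=m)   # scrambled Sobol' (LD)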

Slide 14

Slide 14 text

Big picture error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde{\varrho}(y; k)| + |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
$$\tilde{\varrho}(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\,\varrho(z)\,dz.$$

Slide 15

Slide 15 text

Big picture error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde{\varrho}(y; k)| + |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
$$\tilde{\varrho}(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\,\varrho(z)\,dz.$$
If $k(z, y) = \tilde{k}((z - y)/h)/h$ and
$$|\varrho(y) - \tilde{\varrho}(y; k)| = \mathcal{O}(h^p) \quad\text{and}\quad |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-q} h^{-r}),$$
then an optimal choice of $h$ is
$$h_{\mathrm{opt}} = \mathcal{O}(n^{-q/(p+r)}) \quad\text{and}\quad |\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-q/(1+r/p)}).$$
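The optimal $h$ follows from balancing the two error terms (a one-line derivation):
$$h^p \asymp n^{-q} h^{-r} \implies h^{p+r} \asymp n^{-q} \implies h_{\mathrm{opt}} \asymp n^{-q/(p+r)}, \qquad h_{\mathrm{opt}}^p = n^{-pq/(p+r)} = n^{-q/(1+r/p)}.$$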

Slide 16

Slide 16 text

Big picture error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde{\varrho}(y; k)| + |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
$$\tilde{\varrho}(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\,\varrho(z)\,dz.$$
If $k(z, y) = \tilde{k}((z - y)/h)/h$ and
$$|\varrho(y) - \tilde{\varrho}(y; k)| = \mathcal{O}(h^p) \quad\text{and}\quad |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-q} h^{-r}),$$
then an optimal choice of $h$ is
$$h_{\mathrm{opt}} = \mathcal{O}(n^{-q/(p+r)}) \quad\text{and}\quad |\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-q/(1+r/p)}).$$
For IID, $q = r = 1/2$:
$$h^{\mathrm{IID}}_{\mathrm{opt}} = \mathcal{O}(n^{-1/(2p+1)}) \quad\text{and}\quad |\varrho(y) - \mathrm{KDE}(y; \{x^{\mathrm{IID}}_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-1/(2+1/p)}).$$
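E.g., for a second-order kernel ($p = 2$, the Gaussian case) this recovers the classic IID KDE rates $h^{\mathrm{IID}}_{\mathrm{opt}} = \mathcal{O}(n^{-1/5})$ and error $\mathcal{O}(n^{-2/5})$, which reappear in the examples below.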

Slide 17

Slide 17 text

Deterministic error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde{\varrho}(y; k)| + |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
$$\tilde{\varrho}(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\,\varrho(z)\,dz.$$

Slide 18

Slide 18 text

Deterministic error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde{\varrho}(y; k)| + |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
$$\tilde{\varrho}(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\,\varrho(z)\,dz.$$
$$|\varrho(y) - \tilde{\varrho}(y; k)| \le \underbrace{\mathrm{smooth}(k, K_y)}_{\text{density independent}}\;\underbrace{\|\varrho\|_{K_y}}_{\text{kernel independent}},$$
$$\mathrm{smooth}^2(k, K_y) = K_y(y, y) - 2\int_{-\infty}^{\infty} k(z, y)\,K_y(z, y)\,dz + \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} k(z, y)\,K_y(z, t)\,k(t, y)\,dz\,dt.$$

Slide 19

Slide 19 text

Deterministic error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde{\varrho}(y; k)| + |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
$$\tilde{\varrho}(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\,\varrho(z)\,dz.$$
$$|\varrho(y) - \tilde{\varrho}(y; k)| \le \underbrace{\mathrm{smooth}(k, K_y)}_{\text{density independent}}\;\underbrace{\|\varrho\|_{K_y}}_{\text{kernel independent}},$$
$$\mathrm{smooth}^2(k, K_y) = K_y(y, y) - 2\int_{-\infty}^{\infty} k(z, y)\,K_y(z, y)\,dz + \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} k(z, y)\,K_y(z, t)\,k(t, y)\,dz\,dt.$$
$$|\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\text{kernel independent}}\;\underbrace{\|k(f(\cdot), y)\|_{K_x}}_{\text{sample independent}},$$
$$\mathrm{discrepancy}^2(\{x_i\}_{i=1}^n) = \int_{[0,1]^d \times [0,1]^d} K_x(x, t)\,dx\,dt - \frac{2}{n}\sum_{i=1}^n \int_{[0,1]^d} K_x(x, x_i)\,dx + \frac{1}{n^2}\sum_{i,j=1}^n K_x(x_i, x_j).$$

Slide 20

Slide 20 text

Deterministic error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde{\varrho}(y; k)| + |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
$$\tilde{\varrho}(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\,\varrho(z)\,dz.$$
$$|\varrho(y) - \tilde{\varrho}(y; k)| \le \underbrace{\mathrm{smooth}(k, K_y)}_{\text{density independent}}\;\underbrace{\|\varrho\|_{K_y}}_{\text{kernel independent}},$$
$$|\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\text{kernel independent}}\;\underbrace{\|k(f(\cdot), y)\|_{K_x}}_{\text{sample independent}},$$
with $\mathrm{smooth}^2$ and $\mathrm{discrepancy}^2$ as defined on the preceding slides; $K_y$ and $K_x$ are reproducing kernels.

Slide 21

Slide 21 text

Deterministic error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde{\varrho}(y; k)| + |\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
$$\tilde{\varrho}(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\,\varrho(z)\,dz.$$
$$|\varrho(y) - \tilde{\varrho}(y; k)| \le \underbrace{\mathrm{smooth}(k, K_y)}_{\text{density independent}}\;\underbrace{\|\varrho\|_{K_y}}_{\text{kernel independent}},$$
$$|\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\text{kernel independent}}\;\underbrace{\|k(f(\cdot), y)\|_{K_x}}_{\text{sample independent}},$$
where $K_y$ and $K_x$ are reproducing kernels. The two factors trade off:
$$\mathrm{smooth}(k, K_y) \downarrow \iff \|k(f(\cdot), y)\|_{K_x} \uparrow$$

Slide 22

Slide 22 text

Deterministic error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le \mathrm{smooth}(k, K_y)\,\|\varrho\|_{K_y} + \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\mathcal{O}(n^{-1+\delta})}\,\|k(f(\cdot), y)\|_{K_x}$$

Slide 23

Slide 23 text

Deterministic error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le \mathrm{smooth}(k, K_y)\,\|\varrho\|_{K_y} + \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\mathcal{O}(n^{-1+\delta})}\,\|k(f(\cdot), y)\|_{K_x}$$
For $k(z, y) = \tilde{k}((z - y)/h)/h$, $\tilde{k}$ even:
$$\mathrm{smooth}^2(k, K_y) = K_y(y, y) - 2\int_{-\infty}^{\infty} k(z, y)\,K_y(z, y)\,dz + \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} k(z, y)\,K_y(z, t)\,k(t, y)\,dz\,dt,$$
$$\mathrm{smooth}(k, K_y) = \mathcal{O}(h^{r+2}) \quad\text{if}\quad \int_{-\infty}^{\infty} \tilde{k}(w)\,w^{2l}\,dw = \delta_{0,l}, \quad l = 0, \ldots, r.$$

Slide 24

Slide 24 text

Deterministic error analysis

$$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le \mathrm{smooth}(k, K_y)\,\|\varrho\|_{K_y} + \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\mathcal{O}(n^{-1+\delta})}\,\|k(f(\cdot), y)\|_{K_x}$$
For $k(z, y) = \tilde{k}((z - y)/h)/h$, $\tilde{k}$ even:
$$\mathrm{smooth}(k, K_y) = \mathcal{O}(h^{r+2}) \quad\text{if}\quad \int_{-\infty}^{\infty} \tilde{k}(w)\,w^{2l}\,dw = \delta_{0,l}, \quad l = 0, \ldots, r.$$
(Figure: kernels $\tilde{k}$ of order $r = 0$ and $r = 1$.)
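Two kernels satisfying this moment condition (a standard construction, presumably the ones plotted): the squared exponential $\tilde{k}(w) = \phi(w) := e^{-w^2/2}/\sqrt{2\pi}$ has $\int \tilde{k}\,dw = 1$ but $\int w^2 \tilde{k}\,dw = 1 \neq 0$, so $r = 0$; the squared exponential times a quadratic, $\tilde{k}(w) = \tfrac{1}{2}(3 - w^2)\,\phi(w)$, keeps $\int \tilde{k}\,dw = 1$ and has $\int w^2 \tilde{k}\,dw = \tfrac{1}{2}(3 \cdot 1 - 3) = 0$ (using $\int w^2 \phi\,dw = 1$ and $\int w^4 \phi\,dw = 3$), so $r = 1$.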

Slide 25

Slide 25 text

Deterministic error analysis

$$|\tilde{\varrho}(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| = \left|\int_{[0,1]^d} g(x; y)\,dx - \frac{1}{n}\sum_{i=1}^n g(x_i; y)\right| \le \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\mathcal{O}(n^{-1+\delta})}\,\|g(\cdot\,; y)\|_{K_x}, \qquad g(x; y) := k(f(x), y).$$
E.g.,
$$\mathrm{discrepancy}^2(\{x_i\}_{i=1}^n) = \prod_{\ell=1}^d\left(1 + \frac{\gamma_\ell^2}{12}\right) - \frac{2}{n}\sum_{i=1}^n \prod_{\ell=1}^d\left[1 + \frac{\gamma_\ell^2}{2}\left(|x_{i\ell} - 1/2| - |x_{i\ell} - 1/2|^2\right)\right] + \frac{1}{n^2}\sum_{i,j=1}^n \prod_{\ell=1}^d\left[1 + \frac{\gamma_\ell^2}{2}\left(|x_{i\ell} - 1/2| + |x_{j\ell} - 1/2| - |x_{i\ell} - x_{j\ell}|\right)\right],$$
$$\mathrm{variation}^2(g) = \int_{[0,1]}\left|\frac{\partial g(x_1, 1/2)}{\gamma_1\,\partial x_1}\right|^2 dx_1 + \cdots + \int_{[0,1]^2}\left|\frac{\partial^2 g(x_1, x_2, 1/2)}{\gamma_1 \gamma_2\,\partial x_1\,\partial x_2}\right|^2 dx_1\,dx_2 + \cdots + \int_{[0,1]^d}\left|\frac{\partial^d g(x)}{\gamma_1 \cdots \gamma_d\,\partial x_1 \cdots \partial x_d}\right|^2 dx.$$
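A direct numpy sketch of this discrepancy formula (the coordinate weights gamma are the $\gamma_\ell$ above; function name is ours):

import numpy as np

def weighted_discrepancy(x, gamma):
    # x: (n, d) points in [0,1]^d; gamma: (d,) coordinate weights
    g2 = gamma**2
    c = np.abs(x - 0.5)                                   # |x_{il} - 1/2|
    term1 = np.prod(1 + g2 / 12)
    term2 = np.prod(1 + (g2 / 2) * (c - c**2), axis=1).mean()
    diff = np.abs(x[:, None, :] - x[None, :, :])          # |x_{il} - x_{jl}|
    term3 = np.prod(1 + (g2 / 2) * (c[:, None, :] + c[None, :, :] - diff), axis=2).mean()
    return np.sqrt(term1 - 2 * term2 + term3)

For scrambled Sobol' points this quantity should decay close to the $\mathcal{O}(n^{-1+\delta})$ rate claimed above; for IID points, only like $\mathcal{O}(n^{-1/2})$.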

Slide 26

Slide 26 text

Deterministic error analysis

$$g(x; y) := k(f(x), y) = \tilde{k}((f(x) - y)/h)/h.$$
E.g., with $\mathrm{variation}(g)$ as on the previous slide,
$$\frac{\partial^{|\mathfrak{u}|} g(x_{\mathfrak{u}}, 1/2)}{\partial x_{\mathfrak{u}}} = \sum_{P \in \Pi(\mathfrak{u})} \frac{1}{h^{|P|+1}}\,\tilde{k}^{(|P|)}\!\left(\frac{f(x_{\mathfrak{u}}, 1/2) - y}{h}\right) \prod_{\mathfrak{v} \in P} \frac{\partial^{|\mathfrak{v}|} f(x_{\mathfrak{v}}, 1/2)}{\partial x_{\mathfrak{v}}}, \qquad \Pi(\mathfrak{u}) := \text{set of partitions of } \mathfrak{u}.$$
Even if $f$ is additive, i.e., $f(x) = f_1(x_1) + \cdots + f_d(x_d)$, is $\mathrm{variation}(\tilde{k}((f(\cdot) - y)/h)/h) = \mathcal{O}(h^{-d})$?

Slide 27

Slide 27 text

Ex. sum of uniforms, squared exponential kernel

$$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d$$

Slide 28

Slide 28 text

Ex. sum of uniforms, squared exponential kernel

$$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d$$

Slide 29

Slide 29 text

Ex. sum of uniforms, squared exponential kernel

$$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d$$
$$\mathrm{error}_{\mathrm{IID}} = \mathcal{O}(n^{-2/5}), \qquad \mathrm{error}_{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)})$$
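These empirical rates are consistent with the big-picture analysis above under the assumption that, for LD points, $q \approx 1$ (up to $\delta$) and $r \approx d$ (the $\mathcal{O}(h^{-d})$ variation growth suggested on slide 26), with $p = 2$ for the squared exponential kernel:
$$\mathrm{error}_{\mathrm{LD}} = \mathcal{O}(n^{-q/(1+r/p)}) = \mathcal{O}(n^{-1/(1+d/2)}), \qquad \mathrm{error}_{\mathrm{IID}} = \mathcal{O}(n^{-1/(2+1/p)}) = \mathcal{O}(n^{-2/5}).$$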

Slide 30

Slide 30 text

Ex. sum of uniforms, squared exponential kernel

$$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d$$
$$h_{\mathrm{IID}} = \mathcal{O}(n^{-1/5}), \qquad h_{\mathrm{LD}} = \mathcal{O}(n^{-1/(2+d)})$$

Slide 31

Slide 31 text

Ex. weighted sum of Gaussians, squared exponential kernel

$$Y = \gamma_1 X_1 + \cdots + \gamma_d X_d, \qquad X_\ell \overset{\text{IID}}{\sim} \mathcal{N}(0,1), \qquad \gamma_\ell = 1/\ell$$
(the $X_\ell$ are obtained from $\mathcal{U}[0,1]^d$ points via the inverse normal CDF)

Slide 32

Slide 32 text

Ex. weighted sum of Gaussians, squared exponential kernel

$$Y = \gamma_1 X_1 + \cdots + \gamma_d X_d, \qquad X_\ell \overset{\text{IID}}{\sim} \mathcal{N}(0,1), \qquad \gamma_\ell = 1/\ell$$
$$\mathrm{error}_{\mathrm{IID}} = \mathcal{O}(n^{-2/5}), \qquad \mathrm{error}_{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)})$$

Slide 33

Slide 33 text

Ex. weighted sum of Gaussians, squared exponential (Gaussian) kernel

$$Y = \gamma_1 X_1 + \cdots + \gamma_d X_d, \qquad X_\ell \overset{\text{IID}}{\sim} \mathcal{N}(0,1), \qquad \gamma_\ell = 1/\ell$$
$$h_{\mathrm{IID}} = \mathcal{O}(n^{-1/5}), \qquad h_{\mathrm{LD}} = \mathcal{O}(n^{-1/(2+d)})$$

Slide 34

Slide 34 text

Ex. sum of uniforms, squared exponential × quadratic kernel

$$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d$$

Slide 35

Slide 35 text

Ex. sum of uniforms, squared exponential × quadratic kernel

$$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d$$
$$\mathrm{error}_{\mathrm{IID}} = \mathcal{O}(n^{-3/7}), \qquad \mathrm{error}_{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/3)})$$

Slide 36

Slide 36 text

Ex. sum of uniforms, squared exponential kernel

$$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d$$
$$\mathrm{error}_{\mathrm{IID}} = \mathcal{O}(n^{-2/5}), \qquad \mathrm{error}_{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)})$$

Slide 37

Slide 37 text

Ex. weighted sum of Gaussians, squared exponential × quadratic kernel

$$Y = \gamma_1 X_1 + \cdots + \gamma_d X_d, \qquad X_\ell \overset{\text{IID}}{\sim} \mathcal{N}(0,1), \qquad \gamma_\ell = 1/\ell$$
$$\mathrm{error}_{\mathrm{IID}} = \mathcal{O}(n^{-3/7}), \qquad \mathrm{error}_{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/3)})$$

Slide 38

Slide 38 text

Ex. weighted sum of Gaussians, squared exponential kernel

$$Y = \gamma_1 X_1 + \cdots + \gamma_d X_d, \qquad X_\ell \overset{\text{IID}}{\sim} \mathcal{N}(0,1), \qquad \gamma_\ell = 1/\ell$$
$$\mathrm{error}_{\mathrm{IID}} = \mathcal{O}(n^{-2/5}), \qquad \mathrm{error}_{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)})$$

Slide 39

Slide 39 text

Conditional Monte Carlo for density estimation (CDE)

Estimate the probability density, $\varrho$, for $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$.

Slide 40

Slide 40 text

Conditional Monte Carlo for density estimation (CDE)

Estimate the probability density, $\varrho$, for $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$.

If one can identify $u$ such that
$$y = f(x) \iff x_1 = u(y; x_{2:d}), \ldots$$

Slide 41

Slide 41 text

Conditional Monte Carlo for density estimation (CDE)

Estimate the probability density, $\varrho$, for $Y = f(X)$, where $X \sim \mathcal{U}[0,1]^d$.

If one can identify $u$ such that
$$y = f(x) \iff x_1 = u(y; x_{2:d}),$$
then
$$\varrho(y) = \int_{[0,1]^{d-1}} u'(y; x_{2:d})\,dx_{2:d}, \qquad u' := \frac{\partial u}{\partial y}.$$
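For the sum-of-uniforms example on the next slide, a minimal CDE sketch (assuming the clamped solution $u(y; x_{2:d}) = \min(\max(y - s, 0), 1)$ with $s := x_2 + \cdots + x_d$, so that $u'(y; x_{2:d}) = \mathbb{1}\{0 < y - s < 1\}$; function and variable names are ours):

import numpy as np
from scipy.stats import qmc

def cde_sum_uniforms(y_grid, m, d, seed=0):
    # rho(y) = int u'(y; x_{2:d}) dx_{2:d}, u' = 1{0 < y - s < 1}, s = x2 + ... + xd,
    # estimated by averaging over n = 2^m scrambled Sobol' points for x_{2:d}
    x = qmc.Sobol(d=d - 1, scramble=True, seed=seed).random_base2(m=m)
    s = x.sum(axis=1)
    diff = y_grid[:, None] - s[None, :]
    return ((diff > 0) & (diff < 1)).mean(axis=1)

rho_hat = cde_sum_uniforms(np.linspace(0, 3, 121), m=10, d=3)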

Slide 42

Slide 42 text

Ex. sum of uniforms, KDE vs. CDE

$$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d$$

Slide 43

Slide 43 text

Ex. weighted sum of Gaussians, KDE vs. CDE

$$Y = \gamma_1 X_1 + \cdots + \gamma_d X_d, \qquad X_\ell \overset{\text{IID}}{\sim} \mathcal{N}(0,1), \qquad \gamma_\ell = 1/\ell$$

Slide 44

Slide 44 text

(Preliminary) conclusions

β€’ KDE allows density estimation for black-box $f$
β€’ Theory for LD sequences for KDE matches practice sometimes
β€’ Performance of LD sequences for KDE compared to IID is disappointing for larger $d$; can this be overcome?
β€’ We don't yet know how to
  β€’ adjust $k$ to accommodate known boundaries in the sample space of $Y$
  β€’ choose the bandwidth from data
  β€’ choose $n$ to get the desired accuracy
β€’ CDE, when available, outperforms KDE

Slide 45

Slide 45 text

References

[BALEOP21] A. Ben Abdellah, P. L'Ecuyer, A. B. Owen, and F. Puchhammer, Density estimation by randomized quasi-Monte Carlo, SIAM/ASA J. Uncertain. Quantif. 9 (2021), pp. 280–301.

[GKS23] A. D. Gilbert, F. Y. Kuo, and I. H. Sloan, Analysis of preintegration followed by quasi-Monte Carlo integration for distribution functions and densities, SIAM J. Numer. Anal. 61 (2023), pp. 135–166.

[LEP22] P. L'Ecuyer and F. Puchhammer, Density estimation by Monte Carlo and quasi-Monte Carlo, Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Oxford, England, August 2020 (A. Keller, ed.), Springer Proceedings in Mathematics and Statistics, Springer, Cham, 2022, pp. 3–21.

[LEPBA22] P. L'Ecuyer, F. Puchhammer, and A. Ben Abdellah, Monte Carlo and quasi-Monte Carlo density estimation via conditioning, INFORMS J. Comput. 34 (2022), no. 3, pp. 1729–1748.