
Quasi-Monte Carlo Kernel Density Estimation


Fred J. Hickernell

August 26, 2024



Transcript

  1. Quasi-Monte Carlo Kernel Density Estimation
     Fred J. Hickernell, Illinois Tech, [email protected]
     Yuhan Ding, Illinois Tech
     Brooke Feinberg, Scripps College
     Guillem Grao I Grasa, Deloitte (formerly Illinois Tech)
     Aiwen Li, University of Pennsylvania
     Aadit Jain, Rancho Bernardo High School
     Larysa Matiukha, Illinois Tech
     Aleksei Sorokin, Illinois Tech
     Richard Varela, Sacramento State University
     August 21, 2024
     Thanks to
     • The organizers
     • The US National Science Foundation #2316011 & #2244553
     See this Jupyter notebook for computations
  2. Problem. If Y = f(X), where X ∼ 𝒰[0,1]^d, then we estimate the population mean (an integral)
     $$\mu := \mathbb{E}(Y) = \int_{[0,1]^d} f(x)\,dx$$
     by the sample mean
     $$\frac{1}{n}\sum_{i=1}^{n} f(x_i) =: \hat\mu_n.$$
     But what about approximating the probability density, ϱ, of the random variable Y?
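     As a minimal sketch of this sample-mean estimator (not from the deck's notebook; scipy.stats.qmc supplies the low discrepancy Sobol' points, and the integrand f is a stand-in):

```python
# Minimal sketch: estimate mu = E[f(X)], X ~ U[0,1]^d, by a sample mean
# over low discrepancy (Sobol') points from scipy.stats.qmc.
import numpy as np
from scipy.stats import qmc

def f(x):                            # placeholder integrand: Y = X_1 + ... + X_d
    return x.sum(axis=1)

d, m = 4, 10                         # dimension, log2(sample size)
sobol = qmc.Sobol(d=d, scramble=True, seed=7)
x = sobol.random_base2(m=m)          # n = 2^m points in [0,1]^d
mu_hat = f(x).mean()                 # sample mean approximates the integral
print(mu_hat)                        # should be close to d/2 = 2 here
```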
  3. Problem, continued. But what about approximating the probability density, ϱ, of the random variable Y? Kernel density estimation (KDE):
     $$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^{n} \underbrace{k(f(x_i), y)}_{\text{kernel}} =: \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k),$$
     $$\int_{\mathbb{R}} \underbrace{k(z, y)}_{\text{proportion of } z \text{ assigned to } y} dy = 1 \;\Longrightarrow\; \int_{\mathbb{R}} \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)\, dy = 1.$$
  4. Problem. If Y = f(X), where X ∼ 𝒰[0,1]^d, then we may approximate the probability density, ϱ, by a kernel density estimator (KDE):
     $$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^{n} k(f(x_i), y) =: \hat\varrho(y; \{x_i\}_{i=1}^n, k),$$
     e.g., $k(z, y) = \tilde k\bigl((z - y)/h\bigr)/h$ with $\int_{-\infty}^{\infty} \tilde k(y)\, dy = 1$.
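     A minimal sketch of this estimator (mine, not the deck's notebook), using the scaled-kernel form with a Gaussian k̃ and an illustrative, unoptimized bandwidth:

```python
# Minimal sketch: KDE for Y = f(X) from Sobol' points, with
# k(z, y) = ktilde((z - y)/h)/h and a Gaussian ktilde.
import numpy as np
from scipy.stats import qmc

def ktilde(w):                       # Gaussian kernel, integrates to 1
    return np.exp(-0.5 * w**2) / np.sqrt(2 * np.pi)

def kde(y, fx, h):                   # hat-varrho(y; {x_i}, k)
    return ktilde((fx[None, :] - y[:, None]) / h).mean(axis=1) / h

d = 4
x = qmc.Sobol(d=d, scramble=True, seed=7).random_base2(m=12)
fx = x.sum(axis=1)                   # samples of Y = X_1 + ... + X_d
y = np.linspace(0, d, 201)
h = 0.1                              # illustrative bandwidth, not optimized
density = kde(y, fx, h)
```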
  5.–9. Problem (progressive build). If Y = f(X), where X ∼ 𝒰[0,1]^d, then we may approximate the probability density, ϱ, by a kernel density estimator (KDE):
     $$\varrho(y) = \frac{d}{dy}\mathbb{P}(Y \le y) \approx \frac{1}{n}\sum_{i=1}^{n} k(f(x_i), y) =: \hat\varrho(y; \{x_i\}_{i=1}^n, k).$$
     • How to choose the kernel, k, including the bandwidth, h?
     • Are low discrepancy (LD) points, {x_i}_{i=1}^n, better than independent and identically distributed (IID) points?
     • What is the convergence rate for LD points?
     • How to choose n to reach the desired error tolerance?
 10. Where does this arise in practice?
     Y = f(X) = option payoff, or underground water pressure with random rock porosity;
     μ = option price, or average water pressure.
 11. Recent work
     • [BALEOP21] demonstrate that the mean integrated squared error of KDEs using randomized LD sequences is theoretically and empirically superior to that using IID sequences. They also discuss how to choose the bandwidth of the kernel.
     • [LEPBA22] show how conditional Monte Carlo density estimators (CDEs) of ϱ using randomized LD sequences have a faster convergence rate than those using IID sequences.
     • [LEP22] compare KDEs, CDEs, and likelihood ratio density estimators using randomized LD sequences.
     • The error of interpolated pointwise CDEs of ϱ is analyzed in the deterministic setting by [GKS23].
 12. Why do we think that LD is better? Grids do not fill space well; IID points are better, but LD points are best.
 13.–14. Big picture error analysis (progressive build).
     $$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde\varrho(y; k)| + |\tilde\varrho(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
     $$\tilde\varrho(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\, \varrho(z)\, dz.$$
     If $k(z, y) = \tilde k((z - y)/h)/h$ and
     $$|\varrho(y) - \tilde\varrho(y; k)| = \mathcal{O}(h^p) \quad\text{and}\quad |\tilde\varrho(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-q} h^{-r}),$$
     then an optimal choice of h is
     $$h_{\mathrm{opt}} = \mathcal{O}(n^{-q/(p+r)}) \quad\text{and}\quad |\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-q/(1+r/p)}).$$
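     Spelling out the balancing step behind $h_{\mathrm{opt}}$ (algebra only, implicit in the slide): setting the two error terms equal,
     $$h^p \asymp n^{-q} h^{-r} \iff h^{p+r} \asymp n^{-q} \iff h_{\mathrm{opt}} = \mathcal{O}(n^{-q/(p+r)}),$$
     and substituting back,
     $$h_{\mathrm{opt}}^{\,p} = \mathcal{O}(n^{-pq/(p+r)}) = \mathcal{O}(n^{-q/(1+r/p)}),$$
     so both terms, and hence the total error, are of this order.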
 15. Big picture error analysis, continued. For IID points, $q = r = 1/2$:
     $$h^{\mathrm{IID}}_{\mathrm{opt}} = \mathcal{O}(n^{-1/(2p+1)}) \quad\text{and}\quad |\varrho(y) - \mathrm{KDE}(y; \{x^{\mathrm{IID}}_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-1/(2+1/p)}).$$
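     For instance, for a second-order kernel ($p = 2$, e.g., a Gaussian $\tilde k$), these reduce to the classical IID rates that reappear in the examples below:
     $$h^{\mathrm{IID}}_{\mathrm{opt}} = \mathcal{O}(n^{-1/5}), \qquad |\varrho(y) - \mathrm{KDE}(y; \{x^{\mathrm{IID}}_i\}_{i=1}^n, k)| = \mathcal{O}(n^{-2/5}).$$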
 16.–20. Deterministic error analysis (progressive build).
     $$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le |\varrho(y) - \tilde\varrho(y; k)| + |\tilde\varrho(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)|,$$
     $$\tilde\varrho(y; k) := \mathbb{E}[k(Y, y)] = \int_{-\infty}^{\infty} k(z, y)\, \varrho(z)\, dz.$$
     The first (smoothing) term splits into density-independent and kernel-independent factors:
     $$|\varrho(y) - \tilde\varrho(y; k)| \le \underbrace{\mathrm{smooth}(k, K_y)}_{\text{density independent}} \; \underbrace{\|\varrho\|_{K_y}}_{\text{kernel independent}},$$
     $$\mathrm{smooth}^2(k, K_y) = K_y(y, y) - 2\int_{-\infty}^{\infty} k(z, y)\, K_y(z, y)\, dz + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} k(z, y)\, K_y(z, t)\, k(t, y)\, dz\, dt.$$
     The second (sampling) term splits into kernel-independent and sample-independent factors:
     $$|\tilde\varrho(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\text{kernel independent}} \; \underbrace{\|k(f(\cdot), y)\|_{K_x}}_{\text{sample independent}},$$
     $$\mathrm{discrepancy}^2(\{x_i\}_{i=1}^n) = \int_{[0,1]^d \times [0,1]^d} K_x(x, t)\, dx\, dt - \frac{2}{n}\sum_{i=1}^{n} \int_{[0,1]^d} K_x(x, x_i)\, dx + \frac{1}{n^2}\sum_{i,j=1}^{n} K_x(x_i, x_j).$$
     Here $K_y$ and $K_x$ are reproducing kernels, and there is a tradeoff:
     $$\mathrm{smooth}(k, K_y) \downarrow \;\Longleftrightarrow\; \|k(f(\cdot), y)\|_{K_x} \uparrow.$$
 21.–22. Deterministic error analysis (progressive build).
     $$|\varrho(y) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| \le \mathrm{smooth}(k, K_y)\, \|\varrho\|_{K_y} + \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\mathcal{O}(n^{-1+\delta})} \|k(f(\cdot), y)\|_{K_x}.$$
     For $k(z, y) = \tilde k((z - y)/h)/h$ with $\tilde k$ even, the smoothing factor defined above satisfies
     $$\mathrm{smooth}(k, K_y) = \mathcal{O}(h^{r+2}) \quad\text{if}\quad \int_{-\infty}^{\infty} \tilde k(w)\, w^{2l}\, dw = \delta_{0,l}, \quad l = 0, \dots, r.$$
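     One concrete choice (mine, not given on the slide): $r = 0$ is satisfied by the standard Gaussian kernel, and for $r = 1$ a quadratic-times-Gaussian kernel with vanishing second moment works, which appears to match the "squared exp × quad" examples later in the deck:
     $$r = 0:\ \tilde k(w) = \frac{e^{-w^2/2}}{\sqrt{2\pi}}, \qquad r = 1:\ \tilde k(w) = \frac{3 - w^2}{2} \cdot \frac{e^{-w^2/2}}{\sqrt{2\pi}},$$
     since for the latter $\int \tilde k(w)\, dw = (3 - 1)/2 = 1$ and $\int w^2\, \tilde k(w)\, dw = (3 - 3)/2 = 0$.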
 23. Deterministic error analysis, continued.
     [Figure: the kernels $\tilde k$ for $r = 0$ and $r = 1$.]
     For $k(z, y) = \tilde k((z - y)/h)/h$ with $\tilde k$ even,
     $$\mathrm{smooth}(k, K_y) = \mathcal{O}(h^{r+2}) \quad\text{if}\quad \int_{-\infty}^{\infty} \tilde k(w)\, w^{2l}\, dw = \delta_{0,l}, \quad l = 0, \dots, r.$$
 24. Deterministic error analysis.
     $$|\tilde\varrho(y; k) - \mathrm{KDE}(y; \{x_i\}_{i=1}^n, k)| = \left| \int_{[0,1]^d} g(x; y)\, dx - \frac{1}{n}\sum_{i=1}^{n} g(x_i; y) \right| \le \underbrace{\mathrm{discrepancy}(\{x_i\}_{i=1}^n)}_{\mathcal{O}(n^{-1+\delta})} \|g(\cdot\,; y)\|_{K_x}, \qquad g(x; y) := k(f(x), y).$$
     E.g., with coordinate weights $\gamma_\ell$,
     $$\mathrm{discrepancy}^2(\{x_i\}_{i=1}^n) = \prod_{\ell=1}^{d}\left(1 + \frac{\gamma_\ell^2}{12}\right) - \frac{2}{n}\sum_{i=1}^{n}\prod_{\ell=1}^{d}\left[1 + \frac{\gamma_\ell^2}{2}\left(\left|x_{i\ell} - \tfrac12\right| - \left|x_{i\ell} - \tfrac12\right|^2\right)\right] + \frac{1}{n^2}\sum_{i,j=1}^{n}\prod_{\ell=1}^{d}\left[1 + \frac{\gamma_\ell^2}{2}\left(\left|x_{i\ell} - \tfrac12\right| + \left|x_{j\ell} - \tfrac12\right| - \left|x_{i\ell} - x_{j\ell}\right|\right)\right],$$
     $$\mathrm{variation}^2(g) = \int_{[0,1]} \left|\frac{\partial g(x_1, \frac12)}{\gamma_1\, \partial x_1}\right|^2 dx_1 + \cdots + \int_{[0,1]^2} \left|\frac{\partial^2 g(x_1, x_2, \frac12)}{\gamma_1 \gamma_2\, \partial x_1\, \partial x_2}\right|^2 dx_1\, dx_2 + \cdots + \int_{[0,1]^d} \left|\frac{\partial^d g(x)}{\gamma_1 \cdots \gamma_d\, \partial x_1 \cdots \partial x_d}\right|^2 dx.$$
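     A small numeric check of the displayed discrepancy formula (my sketch; the point set and weights are illustrative):

```python
# Sketch: evaluate the displayed weighted discrepancy^2 for a point set
# x (n x d array) with coordinate weights gamma (length-d array).
import numpy as np
from scipy.stats import qmc

def discrepancy2(x, gamma):
    n, d = x.shape
    g2 = gamma**2
    term1 = np.prod(1 + g2 / 12)
    a = np.abs(x - 0.5)                            # |x_{i,l} - 1/2|
    term2 = np.prod(1 + (g2 / 2) * (a - a**2), axis=1).mean()
    diff = np.abs(x[:, None, :] - x[None, :, :])   # |x_{i,l} - x_{j,l}|
    prods = np.prod(1 + (g2 / 2) * (a[:, None, :] + a[None, :, :] - diff), axis=2)
    term3 = prods.sum() / n**2
    return term1 - 2 * term2 + term3

x = qmc.Sobol(d=3, scramble=True, seed=7).random_base2(m=8)  # n = 256
print(discrepancy2(x, gamma=np.ones(3)))           # small for a well-spread set
```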
 25. Deterministic error analysis. Here $g(x; y) := k(f(x), y) = \tilde k\bigl((f(x) - y)/h\bigr)/h$.
     E.g., with the variation as on the previous slide, the chain rule expresses the mixed partial derivatives as a sum over partitions:
     $$\frac{\partial g(x_{\mathfrak u}, \frac12)}{\partial x_{\mathfrak u}} = \sum_{P \in \Pi(\mathfrak u)} \frac{1}{h^{|P|+1}}\, \tilde k^{(|P|)}\!\left(\frac{f(x_{\mathfrak u}, \frac12) - y}{h}\right) \prod_{\mathfrak v \in P} \frac{\partial^{|\mathfrak v|} f(x_{\mathfrak v}, \frac12)}{\partial x_{\mathfrak v}}, \qquad \Pi(\mathfrak u) := \text{set of partitions of } \mathfrak u.$$
     Even if $f$ is additive, i.e., $f(x) = f_1(x_1) + \cdots + f_d(x_d)$, is
     $$\mathrm{variation}\bigl(\tilde k((f(\cdot) - y)/h)/h\bigr) = \mathcal{O}(h^{-d})?$$
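     If the answer is yes, so that $q = 1 - \delta$ (the LD discrepancy rate) and $r = d$, then plugging into the big-picture analysis with a second-order kernel ($p = 2$, ignoring $\delta$) would give exactly the LD rates quoted in the examples that follow, and $p = 3$ would give the $n^{-1/(1+d/3)}$ rate for the higher-order kernel:
     $$h^{\mathrm{LD}} = \mathcal{O}(n^{-1/(2+d)}), \qquad \text{error}^{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)}).$$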
 26.–27. Ex. sum of uniforms, squared exponential K:
     $$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d.$$
 28. Ex. sum of uniforms, squared exponential K: $Y = X_1 + \cdots + X_d$, $X \sim \mathcal{U}[0,1]^d$:
     $$\text{error}^{\mathrm{IID}} = \mathcal{O}(n^{-2/5}), \qquad \text{error}^{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)}).$$
 29. Ex. sum of uniforms, squared exponential K: $Y = X_1 + \cdots + X_d$, $X \sim \mathcal{U}[0,1]^d$:
     $$h^{\mathrm{IID}} = \mathcal{O}(n^{-1/5}), \qquad h^{\mathrm{LD}} = \mathcal{O}(n^{-1/(2+d)}).$$
 30.–31. Ex. weighted sum of Gaussians, squared exponential K:
     $$Y = \gamma_1 X_1 + \cdots + \gamma_d X_d, \qquad X_\ell \overset{\mathrm{IID}}{\sim} \mathcal{N}(0,1), \quad \gamma_\ell = 1/\ell,$$
     $$\text{error}^{\mathrm{IID}} = \mathcal{O}(n^{-2/5}), \qquad \text{error}^{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)}).$$
 32. Ex. weighted sum of Gaussians, Gauss kernel: $Y = \gamma_1 X_1 + \cdots + \gamma_d X_d$, $X_\ell \overset{\mathrm{IID}}{\sim} \mathcal{N}(0,1)$, $\gamma_\ell = 1/\ell$:
     $$h^{\mathrm{IID}} = \mathcal{O}(n^{-1/5}), \qquad h^{\mathrm{LD}} = \mathcal{O}(n^{-1/(2+d)}).$$
 33.–34. Ex. sum of uniforms, squared exp × quad K:
     $$Y = X_1 + \cdots + X_d, \qquad X \sim \mathcal{U}[0,1]^d,$$
     $$\text{error}^{\mathrm{IID}} = \mathcal{O}(n^{-3/7}), \qquad \text{error}^{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/3)}).$$
 35. Ex. sum of uniforms, squared exponential K (shown again for comparison): $Y = X_1 + \cdots + X_d$, $X \sim \mathcal{U}[0,1]^d$:
     $$\text{error}^{\mathrm{IID}} = \mathcal{O}(n^{-2/5}), \qquad \text{error}^{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)}).$$
 36. Ex. weighted sum of Gaussians, squared exp × quad K: $Y = \gamma_1 X_1 + \cdots + \gamma_d X_d$, $X_\ell \overset{\mathrm{IID}}{\sim} \mathcal{N}(0,1)$, $\gamma_\ell = 1/\ell$:
     $$\text{error}^{\mathrm{IID}} = \mathcal{O}(n^{-3/7}), \qquad \text{error}^{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/3)}).$$
 37. Ex. weighted sum of Gaussians, squared exponential K (shown again for comparison): $Y = \gamma_1 X_1 + \cdots + \gamma_d X_d$, $X_\ell \overset{\mathrm{IID}}{\sim} \mathcal{N}(0,1)$, $\gamma_\ell = 1/\ell$:
     $$\text{error}^{\mathrm{IID}} = \mathcal{O}(n^{-2/5}), \qquad \text{error}^{\mathrm{LD}} = \mathcal{O}(n^{-1/(1+d/2)}).$$
 38.–40. Conditional Monte Carlo for density estimation (CDE) (progressive build). Estimate the probability density, ϱ, for Y = f(X), where X ∼ 𝒰[0,1]^d. If one can identify u such that
     $$y = f(x) \iff x_1 = u(y; x_{2:d}),$$
     then
     $$\varrho(y) = \int_{[0,1]^{d-1}} u'(y; x)\, dx.$$
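     A minimal sketch (mine) for the sum-of-uniforms example, where $u(y; x_{2:d}) = y - (x_2 + \cdots + x_d)$, so $\partial u/\partial y = 1$ exactly when $u(y; x_{2:d})$ lands in $[0,1]$; the CDE is then an average of indicators and needs no bandwidth:

```python
# Sketch: CDE for Y = X_1 + ... + X_d with X ~ U[0,1]^d. Here
# d/dy P(Y <= y | X_{2:d}) is the indicator that u(y; x_{2:d}) = y - (x_2 +
# ... + x_d) lies in [0,1], so the CDE averages indicators, bandwidth-free.
import numpy as np
from scipy.stats import qmc

def cde(y, x_rest):                  # x_rest: n x (d-1) samples of X_{2:d}
    s = x_rest.sum(axis=1)           # x_2 + ... + x_d for each sample
    u = y[:, None] - s[None, :]      # u(y; x_{2:d})
    return ((0.0 <= u) & (u <= 1.0)).mean(axis=1)

d = 4
x_rest = qmc.Sobol(d=d - 1, scramble=True, seed=7).random_base2(m=12)
y = np.linspace(0, d, 201)
density = cde(y, x_rest)             # smooth estimate of the Irwin-Hall density
```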
 41. Ex. sum of uniforms, KDE vs. CDE: $Y = X_1 + \cdots + X_d$, $X \sim \mathcal{U}[0,1]^d$.
 42. Ex. weighted sum of Gaussians, KDE vs. CDE: $Y = \gamma_1 X_1 + \cdots + \gamma_d X_d$, $X_\ell \overset{\mathrm{IID}}{\sim} \mathcal{N}(0,1)$, $\gamma_\ell = 1/\ell$.
 43. (Preliminary) conclusions
     • KDE allows density estimation for a black box f.
     • Theory for LD sequences for KDE matches practice sometimes.
     • Performance of LD sequences for KDE compared to IID is disappointing for larger d; can this be overcome?
     • Don't know how to
       • adjust k to accommodate known boundaries in the sample space of Y,
       • choose the bandwidth from data, or
       • choose n to get the desired accuracy.
     • CDE, when available, outperforms KDE.
 44. References
     [BALEOP21] A. Ben Abdellah, P. L'Ecuyer, A. B. Owen, and F. Puchhammer, Density estimation by randomized quasi-Monte Carlo, SIAM/ASA J. Uncertain. Quantif. 9 (2021), pp. 280-301.
     [GKS23] A. D. Gilbert, F. Y. Kuo, and I. H. Sloan, Analysis of preintegration followed by quasi-Monte Carlo integration for distribution functions and densities, SIAM J. Numer. Anal. 61 (2023), pp. 135-166.
     [LEP22] P. L'Ecuyer and F. Puchhammer, Density estimation by Monte Carlo and quasi-Monte Carlo, Monte Carlo and Quasi-Monte Carlo Methods: MCQMC, Oxford, England, August 2020 (A. Keller, ed.), Springer Proceedings in Mathematics and Statistics, Springer, Cham, 2022, pp. 3-21.
     [LEPBA22] P. L'Ecuyer, F. Puchhammer, and A. Ben Abdellah, Monte Carlo and quasi-Monte Carlo density estimation via conditioning, INFORMS J. Comput. 34 (2022), no. 3, pp. 1729-1748.