
Ayoub Belhadji

S³ Seminar
December 07, 2023

Ayoub Belhadji

(Univ Lyon, ENS de Lyon, Inria, CNRS, UCBL, LIP UMR 5668)

Title — Signal reconstruction using determinantal sampling

Abstract — We study the approximation of a square integrable function from a finite number of its evaluations at well-chosen nodes. The function is assumed to belong to a reproducing kernel Hilbert space (RKHS), the approximation to be one of a few natural finite-dimensional approximations, and the nodes to be drawn from one of two probability distributions. Both distributions are related to determinantal point processes, and use the kernel of the RKHS to favor RKHS-adapted regularity in the random design. While previous work on determinantal sampling relied on the RKHS norm, we prove mean-square guarantees in L2 norm. We show that determinantal point processes and mixtures thereof can yield fast rates, and that they shed some light on how the rate changes as more smoothness is assumed, a phenomenon known as superconvergence. Besides, determinantal sampling generalizes i.i.d. sampling from the Christoffel function, a standard in the literature. In particular, determinantal sampling guarantees the so-called instance optimality property for a smaller number of function evaluations than i.i.d. sampling.

Bio
Ayoub Belhadji is currently a postdoc at the Laboratoire de l'Informatique du Parallélisme (LIP) at ENS de Lyon. He received his Ph.D. from École Centrale de Lille in 2020, specializing in theoretical signal processing and machine learning. His main research interests include sampling, kernel methods and sketching methods.


Transcript

  1. Signal Reconstruction using Determinantal Sampling. Ayoub Belhadji (OCKHAM team, LIP, ENS de Lyon; Univ Lyon, Inria, CNRS, UCBL). Joint work with Rémi Bardenet and Pierre Chainais (Centrale Lille, CRIStAL, Université de Lille, CNRS). 07/12/2023, L2S, CentraleSupélec.
  2. Introduction. A determinantal point process (DPP) is a distribution over subsets of some set X.
    [Figure: a point configuration x_1, x_2, x_3, x_4 in X and a subset I.]
  3. Introduction. A determinantal point process (DPP) is a distribution over subsets of some set X, with the negative correlation property:
    ∀ B, B′ ⊂ X, B ∩ B′ = ∅ ⟹ Cov(n_x(B), n_x(B′)) ≤ 0, where n_x(B) := |B ∩ x|.
  4. Introduction. [Figure: samples of a DPP on [0, 1]² vs. i.i.d. particles on [0, 1]².]
  5. Introduction. [Figure: samples of a DPP on the unit circle vs. i.i.d. particles on the unit circle.]
  6. Introduction. Early appearances of DPPs may be traced back to the work of Dyson (1962)¹ and Ginibre (1965)². A universal definition, for a generic (X, ω), is given in the work of Macchi (1975)³.
    ¹ Dyson, F.J., 1962. Statistical theory of the energy levels of complex systems. I. Journal of Mathematical Physics, 3(1), pp.140-156.
    ² Ginibre, J., 1965. Statistical ensembles of complex, quaternion, and real matrices. Journal of Mathematical Physics, 6(3), pp.440-449.
    ³ Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.
  7. Introduction. Early appearances of DPPs may be traced back to the work of Dyson (1962)¹ and Ginibre (1965)². A universal definition, for a generic (X, ω), is given in the work of Macchi (1975)³.
    Definition (informal): Determinantal Point Process. Given a metric space X and a measure ω, a DPP satisfies
    P_DPP(∃ at least k points, one in each B_i, i = 1, ..., k) = ∫_{B_1 × ··· × B_k} Det[κ(x_i, x_j)]_{i,j ∈ [k]} dω(x_1) ... dω(x_k)
    for a kernel κ : X × X → R, where Det[κ(x_i, x_j)]_{i,j ∈ [k]} is the determinant of the kernel matrix.
  8. Introduction. Sampling using DPPs until 2019...
    Discrete: learning on a budget⁴, node selection in a graph⁵, feature selection, universal results⁷.
    Continuous: numerical integration, X ⊂ R⁶, X = [0, 1]^d, specific results⁸.
    ⁴ [Deshpande et al. (2006)], [Derezinski et al. (2017, 2018, 2019)]  ⁵ [Tremblay et al. (2017)]  ⁶ [Lambert (2018)]  ⁷ [Belhadji et al. (2018)]  ⁸ [Bardenet and Hardy (2016)]
  9. Introduction. Sampling using DPPs until 2019...
    Discrete: learning on a budget⁴, node selection in a graph⁵, feature selection, universal results⁷.
    Continuous: numerical integration, X ⊂ R⁶, X = [0, 1]^d, specific results⁸.
    Universal results for DPP-based sampling in a continuous domain?
  10. Outline. 1. The setting; 2. DPPs for numerical integration in RKHSs; 3. Beyond numerical integration; 4. Numerical simulations; 5. Perspectives.
  11. Towards a universal construction of quadrature rules. [Diagram: QMC (Weyl criterion), Gaussian quadrature (OPE), kernel quadrature, vanilla Monte Carlo (CLTs), and DPP-based quadrature (?).]
  12. Towards a universal construction of quadrature rules. [Diagram: QMC (Weyl criterion), Gaussian quadrature (OPE), kernel quadrature, vanilla Monte Carlo (CLTs), and DPP-based quadrature (?).]
    A universal construction of quadrature rules using DPPs?
  13. Kernel-based analysis of quadrature rules. The crux of kernel-based analysis of a quadrature rule is the study of the worst-case error on the unit ball of an RKHS F:
    sup_{‖f‖_F ≤ 1} | ∫_X f(x) g(x) dω(x) − Σ_{i ∈ [N]} w_i f(x_i) |,
    where the first term is the integral and the second is the quadrature rule.
  14. Reproducing kernel Hilbert spaces. Definition. An RKHS F is a Hilbert space associated to a kernel k, satisfying:
    ∀ x ∈ X, f ↦ f(x) is continuous;
    ∀ (x, f) ∈ X × F, ⟨f, k(x, ·)⟩_F = f(x).
    Example: X = [0, 1] and k_1(x, y) := 1 + 2π² B_2({x − y}), where {x − y} is the fractional part of x − y and B_2(x) = x² − x + 1/6.
    [Figure: the kernel section k_1(x_0, ·) on [0, 1].]
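A minimal numerical sketch of this example kernel (the function name k1 is illustrative); the sign follows the general formula k_s(x, y) = 1 + (−1)^{s−1} (2π)^{2s}/(2s)! B_{2s}({x − y}) recalled later in the deck.

```python
import numpy as np

def k1(x, y):
    """Periodic Sobolev kernel of order s = 1 on [0, 1]:
    k1(x, y) = 1 + 2*pi^2 * B2({x - y}), with B2(t) = t^2 - t + 1/6."""
    t = np.mod(x - y, 1.0)            # fractional part {x - y}
    B2 = t**2 - t + 1.0 / 6.0         # Bernoulli polynomial of degree 2
    return 1.0 + 2.0 * np.pi**2 * B2

x0 = 0.3
print(k1(x0, x0))                     # peak value 1 + pi^2 / 3, roughly 4.3
print(k1(x0, x0 + 0.5))               # smallest value, at distance 1/2
```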
  15. Embeddings as elements of the RKHS. F contains smooth functions.
    Definition: an embedding of an element of L²(ω). Given g ∈ L²(ω), the embedding of g is defined by
    μ_g = Σg := ∫_X k(x, ·) g(x) dω(x) ∈ F,
    where Σ : L²(ω) → L²(ω) is the integration operator associated to (k, ω).
    [Figure: three functions g and their embeddings μ_g (s = 1).]
  16. Embeddings and quadrature rules. Properties.
    Reproducibility of integrals: ∀ f ∈ F, ⟨f, μ_g⟩_F = ∫_X f(x) g(x) dω(x).
    The worst integration error on the unit ball of F:
    sup_{‖f‖_F ≤ 1} | ∫_X f(x) g(x) dω(x) − Σ_{i=1}^N w_i f(x_i) | = ‖ μ_g − Σ_{i=1}^N w_i k(x_i, ·) ‖_F   (the worst-case error, WCE).
    The study of quadrature rules boils down to the study of kernel approximations of embeddings.
  17. Embeddings and quadrature rules. A sanity check using the 'Monte Carlo quadrature'. Let x_1, ..., x_N be i.i.d. ∼ ω and w_i = 1/N. Under some assumptions on k, we have
    E ‖ μ_g − Σ_{i=1}^N (1/N) k(x_i, ·) ‖²_F = O(1/N).
    We recover the 'Monte Carlo rate' O(1/N).
  18. Embeddings and quadrature rules. A sanity check using the 'Monte Carlo quadrature'. Let x_1, ..., x_N be i.i.d. ∼ ω and w_i = 1/N. Under some assumptions on k, we have
    E ‖ μ_g − Σ_{i=1}^N (1/N) k(x_i, ·) ‖²_F = O(1/N).
    We recover the 'Monte Carlo rate' O(1/N). Can we do better?
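A small simulation sketch of this sanity check, assuming for concreteness the periodic Sobolev kernel of order s = 1 and g ≡ 1 (integration against ω itself, for which the plain weights w_i = 1/N are the natural choice, so that μ_g ≡ 1 and ‖μ_g‖²_F = 1); all names are illustrative, and the squared RKHS norm is expanded using the reproducing property.

```python
import numpy as np

rng = np.random.default_rng(0)

def k1(x, y):
    """Periodic Sobolev kernel of order s = 1 on [0, 1]."""
    t = np.mod(np.subtract.outer(x, y), 1.0)
    return 1.0 + 2.0 * np.pi**2 * (t**2 - t + 1.0 / 6.0)

def mc_sq_error(N, n_rep=500):
    """Average of ||mu_g - (1/N) sum_i k1(x_i, .)||_F^2 over i.i.d. uniform designs, g = 1."""
    errs = []
    for _ in range(n_rep):
        x = rng.uniform(size=N)
        w = np.full(N, 1.0 / N)
        # ||mu_g||_F^2 - 2 sum_i w_i mu_g(x_i) + w^T K(x) w, with mu_g = 1 here
        errs.append(1.0 - 2.0 * w.sum() + w @ k1(x, x) @ w)
    return np.mean(errs)

for N in (10, 40, 160, 640):
    print(N, mc_sq_error(N))          # shrinks roughly like 1/N (here about pi^2 / (3N))
```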
  19. The optimal kernel quadrature. Definition: the optimal kernel quadrature. Given a set of nodes x = {x_1, ..., x_N} such that the kernel matrix
    K(x) := ( k(x_i, x_j) )_{i,j ∈ [N]}
    is non-singular, the optimal kernel quadrature is the couple (x, ŵ) such that
    ‖ μ_g − Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) ‖_F = min_{w ∈ R^N} ‖ μ_g − Σ_{i=1}^N w_i k(x_i, ·) ‖_F.
  20. The convergence of the optimal kernel quadrature. The study of the convergence rate of the optimal kernel quadrature was carried out in several works:
    X = [0, 1], Sobolev space, uniform grid: O(N^{−2s}) [Novak et al., 2015] (g is cos or sin), [Bojanov, 1981].
    X = [0, 1]^d, tensorized Sobolev space, QMC sequences: QMC rates [Briol et al., 2019] (g is constant).
    X = [0, 1]^d, tensorized Sobolev space: O(N^{−2s/d}) [Wendland, 2004].
    X = R^d, Gaussian kernel, tensorized Hermite roots: O(exp(−αN)) [Karvonen et al., 2019] (+ assumptions).
    Limitation: the analysis is too specific to the RKHS F, to x, to g... Any universal analysis of the OKQ?
  21. A spectral characterization of the RKHS. What is common among existing results?
    Assumptions: there exists a spectral decomposition (e_m, σ_m)_{m ∈ N*} of Σ, where (e_m)_{m ∈ N*} is an o.n.b. of L²(ω) and σ_1 ≥ σ_2 ≥ ... > 0 with Σ_{m ∈ N*} σ_m < +∞; the RKHS F is dense in L²(ω).
  22. A spectral characterization of the RKHS. What is common among existing results?
    Assumptions: there exists a spectral decomposition (e_m, σ_m)_{m ∈ N*} of Σ, where (e_m)_{m ∈ N*} is an o.n.b. of L²(ω) and σ_1 ≥ σ_2 ≥ ... > 0 with Σ_{m ∈ N*} σ_m < +∞; the RKHS F is dense in L²(ω).
    ‖f‖²_F = Σ_{m ∈ N*} ⟨f, e_m⟩²_ω / σ_m  ⟹  {‖f‖_F = 1} is an ellipsoid in L²(ω).
    Σ_{m ∈ N*} ⟨μ_g, e_m⟩²_ω / σ²_m = Σ_{m ∈ N*} ⟨g, e_m⟩²_ω  ⟹  {μ_g ; ‖g‖_ω = 1} is an ellipsoid in L²(ω).
  23. A spectral characterization of the RKHS. [Diagram: the sets B_{L²(ω)}(0, 1), B_F(0, 1), Σ^{1/2} B_{L²(ω)}(0, 1) and Σ B_{L²(ω)}(0, 1), with F = Σ^{1/2} L²(ω).]
    Examples of spectra:
    X = [0, 1], Sobolev: σ_{N+1} = O(N^{−2s}), (e_m) = Fourier.
    X = [0, 1]^d, Korobov: σ_{N+1} = O(log(N)^{2s(d−1)} N^{−2s}), (e_m) = tensorized Fourier.
    X = [0, 1]^d, Sobolev: σ_{N+1} = O(N^{−2s/d}), (e_m) = "Fourier".
    X = S^d, dot-product kernel: (e_m) = spherical harmonics.
    X = R, Gaussian: σ_{N+1} = O(e^{−αN}), (e_m) = Hermite polynomials.
    X = R^d, Gaussian: σ_{N+1} = O(e^{−α_d N^{1/d}}), (e_m) = tensorized Hermite polynomials.
  24. The determinantal distributions. Definition. Let κ be a kernel such that ∫_X κ(x, x) dω(x) < +∞. The function
    p_κ(x_1, ..., x_N) ∝ Det κ(x) = Det ( κ(x_i, x_j) )_{i,j ∈ [N]}
    is a p.d.f. on X^N w.r.t. ω^⊗N.
  25. The determinantal distributions. Definition. Let κ be a kernel such that ∫_X κ(x, x) dω(x) < +∞. The function
    p_κ(x_1, ..., x_N) ∝ Det κ(x) = Det ( κ(x_i, x_j) )_{i,j ∈ [N]}
    is a p.d.f. on X^N w.r.t. ω^⊗N.
    Theorem (Belhadji et al. (2020)). For (x_1, ..., x_N) ∼ p_κ, the set x := {x_1, ..., x_N} follows the distribution of a mixture of DPPs.
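A tiny sketch of evaluating this density up to its normalization constant, via a log-determinant for numerical stability; the projection kernel built on the first N Fourier eigenfunctions of the running Sobolev example is an illustrative assumption. Clumped configurations get a much smaller determinant, which is the repulsion at work.

```python
import numpy as np

def fourier_features(x, N):
    """First N Fourier eigenfunctions on [0, 1]: 1, sqrt(2)cos(2 pi x), sqrt(2)sin(2 pi x), ..."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols = [np.ones_like(x)]
    m = 1
    while len(cols) < N:
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * m * x))
        if len(cols) < N:
            cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * m * x))
        m += 1
    return np.column_stack(cols)                 # shape (len(x), N)

def log_density_unnormalized(x, N):
    """log Det kappa(x) for the projection kernel kappa(x, y) = sum_{n<=N} e_n(x) e_n(y)."""
    Phi = fourier_features(x, N)
    sign, logdet = np.linalg.slogdet(Phi @ Phi.T)
    return logdet if sign > 0 else -np.inf

rng = np.random.default_rng(0)
N = 5
print(log_density_unnormalized(rng.uniform(size=N), N))                  # spread-out design
print(log_density_unnormalized(0.5 + 1e-3 * rng.uniform(size=N), N))     # clumped design: much smaller
```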
  26. Main results. Theorem (Belhadji, Bardenet and Chainais (2019)). Under the determinantal distribution corresponding to κ(x, y) = K(x, y) := Σ_{n ∈ [N]} e_n(x) e_n(y), x follows the distribution of a (projection) DPP, and we have
    E_{p_κ} sup_{‖g‖_ω ≤ 1} ‖ μ_g − Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) ‖²_F ≤ 4 N² r_{N+1},  where r_{N+1} = Σ_{m=N+1}^{+∞} σ_m.
    σ_m = m^{−2s}: theoretical rate N² r_{N+1} = N³ O(σ_{N+1}), empirical rate O(σ_{N+1}).
    σ_m = α^m: theoretical rate N² r_{N+1} = N² O(σ_{N+1}), empirical rate O(σ_{N+1}).
  27. Main results. Theorem (Belhadji (2021)). Under the determinantal distribution corresponding to κ(x, y) = K(x, y) := Σ_{n ∈ [N]} e_n(x) e_n(y), x follows the distribution of a (projection) DPP, and we have
    ∀ g ∈ L²(ω), E_{p_κ} ‖ μ_g − Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) ‖²_F ≤ 4 ‖g‖²_ω r_{N+1} = O(r_{N+1}),  where r_{N+1} := Σ_{m=N+1}^{+∞} σ_m.
  28. Main results. Theorem (Belhadji, Bardenet and Chainais (2020)). Under the determinantal distribution corresponding to κ = k, we have
    ∀ g ∈ L²(ω), E_{p_κ} ‖ μ_g − Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) ‖²_F = Σ_{m=1}^{+∞} ⟨g, e_m⟩²_ω ε_m(N) ≤ ‖g‖²_ω ε_1(N),
    where ε_m(N) = σ_m ( Σ_{T ⊂ N*∖{m}, |T| = N} Π_{t ∈ T} σ_t ) / ( Σ_{T ⊂ N*, |T| = N} Π_{t ∈ T} σ_t ) = O(σ_{N+1}), the optimal rate (Pinkus (1985)).
  29. The optimal kernel quadrature and kernel interpolation. The mixture corresponding to OKQ is an interpolant: the function μ̂_g := Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) satisfies ∀ i ∈ [N], μ̂_g(x_i) = μ_g(x_i).
    [Figure: μ_g, its interpolant μ̂_g, g and the kernel sections k(x_n, ·), for two node configurations.]
  30. Main results. Theorem (Belhadji, Bardenet and Chainais (2020)). Under the determinantal distribution corresponding to κ = k, for r ∈ [0, 1/2], we have
    ∀ f ∈ Σ^{1/2+r} L²(ω), E_{p_κ} ‖ f − Σ_{i=1}^N ŵ_i(f) k(x_i, ·) ‖²_F = O(σ_{N+1}^{2r}).
    r = 0 → a generic element of F; r = 1/2 → an embedding of some g ∈ L²(ω).
    [Diagram: the sets B_{L²(ω)}(0, 1), Σ^{1/2} B_{L²(ω)}(0, 1) = B_F(0, 1), Σ^{1/2+r} B_{L²(ω)}(0, 1) and Σ B_{L²(ω)}(0, 1).]
  31. Main results. The RKHS norm ‖·‖_F is strong: ‖f‖_∞ ≤ sup_{x ∈ X} √(k(x, x)) ‖f‖_F.
    We seek convergence guarantees in a weaker norm such as ‖·‖_ω.
  32. Main results. Definition: least squares approximation. Given x = {x_1, ..., x_N} ⊂ X, the least squares approximation of f ∈ F associated to x is the function f̂_{LS,x} defined by
    ‖ f − f̂_{LS,x} ‖_ω = min_{f̂ ∈ T(x)} ‖ f − f̂ ‖_ω,  where T(x) := Span(k(x_1, ·), ..., k(x_N, ·)).
  33. Main results. Definition: least squares approximation. Given x = {x_1, ..., x_N} ⊂ X, the least squares approximation of f ∈ F associated to x is the function f̂_{LS,x} defined by
    ‖ f − f̂_{LS,x} ‖_ω = min_{f̂ ∈ T(x)} ‖ f − f̂ ‖_ω,  where T(x) := Span(k(x_1, ·), ..., k(x_N, ·)).
    Definition: optimal kernel approximation. Given x = {x_1, ..., x_N} ⊂ X, the optimal kernel approximation of f ∈ F associated to x is the function f̂_{OKA,x} defined by
    ‖ f − f̂_{OKA,x} ‖_F = min_{f̂ ∈ T(x)} ‖ f − f̂ ‖_F.
  34. Main results. Theorem (Belhadji, Bardenet and Chainais (2023)). Under the determinantal distribution corresponding to κ = K, we have
    ∀ f ∈ F, E_{p_κ} ‖ f − f̂_{LS,x} ‖²_ω ≤ 2 ( ‖ f − f_N ‖²_ω + ‖ f_N ‖²_ω Σ_{m=N+1}^{+∞} σ²_m ),
    where the first term is O(σ_{N+1}) and the second is O(σ_{N+1} r_{N+1}) (superconvergence), with r_{N+1} = Σ_{m=N+1}^{+∞} σ_m.
  35. Main results. Theorem (Belhadji, Bardenet and Chainais (2023)). Under the determinantal distribution corresponding to κ = K, we have
    ∀ f ∈ F, E_{p_κ} ‖ f − f̂_{LS,x} ‖²_ω ≤ 2 ( ‖ f − f_N ‖²_ω + ‖ f_N ‖²_ω Σ_{m=N+1}^{+∞} σ²_m ),
    where the first term is O(σ_{N+1}) and the second is O(σ_{N+1} r_{N+1}) (superconvergence), with r_{N+1} = Σ_{m=N+1}^{+∞} σ_m.
    The computation of f̂_{LS,x} requires the values μ_f(x_1), ..., μ_f(x_N), not f(x_1), ..., f(x_N).
  36. The empirical least squares approximation. Definition (Cohen, Davenport and Leviatan (2013)). Let x ∈ X^N and q : X → R*_+. Consider the so-called empirical semi-norm ‖·‖_{q,x} defined on L²(ω) by
    ‖h‖²_{q,x} := (1/N) Σ_{i=1}^N q(x_i) h(x_i)².
    The empirical least squares approximation is defined as
    f̂_{ELS,M,x} := argmin_{f̂ ∈ E_M} ‖ f − f̂ ‖²_{q,x},  where E_M := Span(e_1, ..., e_M).
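A minimal sketch of this estimator as a weighted least-squares problem in the basis (e_1, ..., e_M); the Fourier basis, the test function and the choice q ≡ 1 are illustrative assumptions, not part of the deck.

```python
import numpy as np

def fourier_features(x, M):
    """Basis e_1, ..., e_M of E_M (Fourier family on [0, 1], as in the running example)."""
    cols = [np.ones_like(x)]
    m = 1
    while len(cols) < M:
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * m * x))
        if len(cols) < M:
            cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * m * x))
        m += 1
    return np.column_stack(cols)

def els_coefficients(f_vals, x, M, q):
    """Coefficients of f_hat_{ELS,M,x} = argmin over E_M of (1/N) sum_i q(x_i) (f - f_hat)(x_i)^2,
    solved as a weighted least-squares problem in the basis (e_m)."""
    Phi = fourier_features(x, M)                  # (N, M) design matrix e_m(x_i)
    sw = np.sqrt(q(x))                            # square roots of the weights
    coef, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * f_vals, rcond=None)
    return coef                                   # f_hat = sum_m coef[m] * e_m

# Toy usage: recover a smooth periodic f from noise-free evaluations at i.i.d. uniform nodes.
rng = np.random.default_rng(0)
f = lambda x: np.sin(2.0 * np.pi * x) + 0.3 * np.cos(6.0 * np.pi * x)
N, M = 40, 9
x = rng.uniform(size=N)
print(els_coefficients(f(x), x, M, q=lambda t: np.ones_like(t)))
```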
  37. Main results. When M = N, the function f̂_{ELS,M,x} does not depend on q.
    Theorem (Belhadji, Bardenet and Chainais (2023)). Consider N, M ∈ N* such that M ≤ N. Let f ∈ F, and define f̂^t_{ELS,M,x} := Σ_{m=1}^M ⟨f̂_{ELS,N,x}, e_m⟩_ω e_m. Under the determinantal distribution corresponding to κ = K (of cardinality N), we have
    E_{p_κ} ‖ f − f̂^t_{ELS,M,x} ‖²_ω = ‖ f − f_M ‖²_ω + M ‖ f − f_N ‖²_ω.
    In particular, E_{p_κ} ‖ f − f̂^t_{ELS,M,x} ‖²_ω ≤ (1 + M) ‖ f − f_M ‖²_ω: the 'Instance Optimal Property' (IOP).
  38. Main results. Some remarks:
    E_{p_κ} ‖ f − f̂^t_{ELS,N,x} ‖²_ω ≤ (1 + N) ‖ f − f_N ‖²_ω = O(N σ_{N+1}).
    In general, f̂^t_{ELS,M,x} ≠ f̂_{ELS,M,x}.
    The IOP was proved for f̂_{ELS,M,x} under Christoffel sampling⁹: x_1, ..., x_N ∼ i.i.d. (1/M) Σ_{m=1}^M e_m(x)² dω(x), conditioned on the event { ‖G − I‖_op ≤ 1/2 }, where G := ( ⟨e_i, e_j⟩_{q,x} )_{i,j ∈ [M]} is the Gram matrix of the family (e_j)_{j ∈ [M]} associated to the empirical scalar product ⟨·, ·⟩_{q,x}.
    ⁹ Cohen, A. and Migliorati, G., 2017. Optimal weighted least-squares methods. The SMAI Journal of Computational Mathematics, 3, pp.181-203.
  39. A projection DPP as an extension of Christoffel sampling. The determinantal distribution associated to the kernel
    κ(x, y) := (1/M) Σ_{m=1}^M e_m(x) e_m(y)
    is a natural extension of Christoffel sampling.
    Figure: histograms of 50000 realizations of the projection DPP associated to the first N Hermite polynomials, compared to the inverse of the Christoffel function and to the nodes of the Gaussian quadrature, with N ∈ {5, 10}.
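A small sketch of the reference density in this figure, i.e. the Christoffel sampling density (1/N) Σ_{n<N} e_n(x)² with respect to ω = N(0, 1), using probabilists' Hermite polynomials; the normalization e_n = He_n / √(n!) is an assumption about the convention used on the slide.

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial

def christoffel_density(x, N):
    """Christoffel sampling density w.r.t. omega = N(0, 1):
    (1/N) * sum_{n=0}^{N-1} e_n(x)^2, with e_n = He_n / sqrt(n!) orthonormal under omega."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(N):
        c = np.zeros(n + 1); c[n] = 1.0          # coefficients selecting He_n
        total += He.hermeval(x, c) ** 2 / factorial(n)
    return total / N

xs = np.linspace(-5.0, 5.0, 7)
for N in (5, 10):
    print(N, np.round(christoffel_density(xs, N), 3))
```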
  40. Numerical simulations. We report the empirical expectation of a surrogate of the worst interpolation error:
    E_κ sup_{f = μ_g, ‖g‖_ω ≤ 1} ‖ f − f̂_{OKA,x} ‖²_F ≈ E_κ sup_{f = μ_g, g ∈ G} ‖ f − f̂_{OKA,x} ‖²_F,  with κ = K,
    where G ⊂ {g : ‖g‖_ω ≤ 1} is a finite set with |G| = 5000, and F is the periodic Sobolev space of order s = 3.
    [Figure: log10 of the squared error vs. N for DPPKQ, LVSQ and UGKQ, against σ_{N+1}.]
  41. Numerical simulations. F = Korobov space of order s = 1, X = [0, 1]². We report ε_m(N) = E_K ‖ f − f̂_{OKA,x} ‖²_F with κ = K, where f = μ_{e_m}.
    Figure: OKQ using DPPs (left) vs. OKQ using the uniform grid (right); log10 of the squared error vs. N for several ε_m(N), against σ_{N+1} and N^{−2s/d}.
  42. Numerical simulations. F = the periodic Sobolev space of order s = 1, X = [0, 1]. We report E_κ ‖ f − f̂_{LS,x} ‖²_ω with κ = K, where f = Σ_{m=1}^M ξ_m e_m and ξ_1, ..., ξ_M ∼ i.i.d. N(0, 1).
    Figure: M = 10 (left) and M = 20 (right).
  43. Numerical simulations. Consider F to be the RKHS defined by the Sinc kernel
    k(x, y) = sin(F(x − y)) / (F(x − y)),  X = [−T/2, T/2].
    The eigenfunctions e_m correspond to the prolate spheroidal wave functions (PSWF)¹⁰. The asymptotics of the eigenvalues in the limit c := TF → +∞ were investigated¹¹: there are approximately c eigenvalues close to 1, and the remaining eigenvalues decrease to 0 at an exponential rate.
    ¹⁰ Slepian, D. and Pollak, H.O., 1961. Prolate spheroidal wave functions, Fourier analysis and uncertainty, I. Bell System Technical Journal, 40(1), pp.43-63.
    ¹¹ Landau, H.J. and Widom, H., 1980. Eigenvalue distribution of time and frequency limiting. Journal of Mathematical Analysis and Applications, 77(2), pp.469-481.
  44. Numerical simulations. We report ‖ f − f̂_{LS,x} ‖²_ω averaged over 50 realizations, for f ∈ {e_1, e_2, e_3, e_4}.
    [Figure: squared error vs. N for DPP PSWF, DPP Legendre and ChS, one panel per eigenfunction, with the value T×F marked.]
  45. Conclusion. Take-home messages:
    The theoretical study of the optimal kernel quadrature under determinantal sampling.
    The study of function reconstruction under determinantal sampling.
    The analysis is universal.
    Empirical validation on various RKHSs.
    Projection DPPs are natural extensions of Christoffel sampling, and yield better empirical results.
    [Diagram: a correspondence between RKHSs (functional spaces) and DPPs (point processes).]
  46. Perspectives: extending the theoretical results?
    The study of E ‖ f − f̂_{ELS,M,x} ‖²_ω when the dimension M = the number of nodes N.
    E sup_{‖f‖_F ≤ 1} ‖ f − f̂_{LS,x} ‖²_ω instead of E ‖ f − f̂_{LS,x} ‖²_ω?
    E sup_{‖f‖_F ≤ 1} ‖ f − f̂_{OKA,x} ‖²_ω and E ‖ f − f̂_{OKA,x} ‖²_ω?
    Higher order moments? ...
  47. Perspectives: efficient sampling in continuous domain? Let x = {x_1, ..., x_N} be such that Det κ(x) > 0. We have
    Det κ(x) = κ(x_1, x_1) × ( κ(x_2, x_2) − κ(x_1, x_2)² / κ(x_1, x_1) ) × ... × ( κ(x_ℓ, x_ℓ) − φ_{x_{ℓ−1}}(x_ℓ)^T κ(x_{ℓ−1})^{−1} φ_{x_{ℓ−1}}(x_ℓ) ) × ... × ( κ(x_N, x_N) − φ_{x_{N−1}}(x_N)^T κ(x_{N−1})^{−1} φ_{x_{N−1}}(x_N) ),
    where φ_{x_{ℓ−1}}(x) = ( κ(ξ, x) )^T_{ξ ∈ x_{ℓ−1}} ∈ R^{ℓ−1} and x_{ℓ−1} = {x_1, ..., x_{ℓ−1}}.
  48. Perspectives: efficient sampling in continuous domain? Define
    p_{κ,1}(x) = κ(x, x),  p_{κ,ℓ}(x) = κ(x, x) − φ_{x_{ℓ−1}}(x)^T κ(x_{ℓ−1})^{−1} φ_{x_{ℓ−1}}(x) for ℓ ≥ 2.
    If κ is a projection kernel, ∫_X p_{κ,ℓ}(x) dω(x) = N − ℓ + 1, and
    p_κ(x) := (1/N!) Det κ(x) = Π_{ℓ=1}^N (1/(N − ℓ + 1)) p_{κ,ℓ}(x_ℓ),
    and the sequential algorithm is exact (the HKPV algorithm).
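A minimal rejection-based sketch of this sequential (HKPV-style) sampler, assuming the projection kernel built on the Fourier eigenfunctions of the running Sobolev example; the constant envelope κ(x, x) ≤ 2N − 1 is specific to this basis, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(x, N):
    """First N Fourier eigenfunctions on [0, 1]: 1, sqrt(2)cos(2 pi x), sqrt(2)sin(2 pi x), ..."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols = [np.ones_like(x)]
    m = 1
    while len(cols) < N:
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * m * x))
        if len(cols) < N:
            cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * m * x))
        m += 1
    return np.column_stack(cols)

def sample_projection_dpp(N):
    """Sequential sampler for the projection DPP with kernel kappa(x, y) = sum_{n<=N} e_n(x) e_n(y),
    base measure omega = Uniform[0, 1]. At step l, x_l is drawn from p_{kappa,l}(x) / (N - l + 1)
    by rejection sampling with the constant bound p_{kappa,l}(x) <= kappa(x, x) <= 2N - 1."""
    nodes, Phi_prev = [], np.zeros((0, N))
    bound = 2.0 * N - 1.0
    for _ in range(N):
        G_inv = np.linalg.inv(Phi_prev @ Phi_prev.T) if len(nodes) else np.zeros((0, 0))
        while True:
            x = rng.uniform()
            phi = fourier_features(x, N)[0]
            cross = Phi_prev @ phi
            p_l = phi @ phi - cross @ G_inv @ cross    # Schur complement w.r.t. selected nodes
            if rng.uniform() * bound < p_l:
                break
        nodes.append(x)
        Phi_prev = np.vstack([Phi_prev, phi])
    return np.array(nodes)

print(np.sort(sample_projection_dpp(7)))               # 7 well-spread points in [0, 1]
```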
  49. Perspectives: efficient sampling in continuous domain? Example: to conduct Christoffel sampling for Legendre polynomials, we can use Bernstein's bound¹²
    ∀ n ∈ N*, L_n(x)² ≤ (2/π) (1 − x²)^{−1/2}  ⟹  p_{κ,1}(x) ≤ (2/π) (1 − x²)^{−1/2},
    and use rejection sampling (proposal = Beta distribution).
    [Figure: p_{κ,1} for N = 5 and N = 10 against Bernstein's bound.]
    ¹² Lorch, L., 1983. Alternative proof of a sharpened form of Bernstein's inequality for Legendre polynomials. Applicable Analysis, 14(3), pp.237-240.
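A small sketch of this rejection sampler, assuming the Legendre family is orthonormalized as L_n = √(2n+1) P_n under ω = Uniform[−1, 1], and enlarging the envelope by M = (N − 1 + π/2)/N so that the degree-0 term L_0 = 1 (not covered by Bernstein's bound, which is stated for n ≥ 1) is also dominated.

```python
import numpy as np
from numpy.polynomial import legendre as Leg

rng = np.random.default_rng(0)

def christoffel_density(x, N):
    """(1/N) * sum_{n=0}^{N-1} L_n(x)^2 w.r.t. omega = Uniform[-1, 1],
    with L_n = sqrt(2n+1) P_n the Legendre polynomials orthonormal under omega."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(N):
        c = np.zeros(n + 1); c[n] = 1.0
        total += (2 * n + 1) * Leg.legval(x, c) ** 2
    return total / N

def sample_christoffel(N):
    """Rejection sampling with an arcsine proposal (Beta(1/2, 1/2) mapped to [-1, 1]),
    whose density w.r.t. omega is (2/pi) / sqrt(1 - x^2); the envelope constant M
    makes christoffel_density(x, N) <= M * (2/pi) / sqrt(1 - x^2) hold everywhere."""
    M = (N - 1 + np.pi / 2) / N
    while True:
        x = 2.0 * rng.beta(0.5, 0.5) - 1.0
        envelope = M * (2.0 / np.pi) / np.sqrt(1.0 - x**2)
        if rng.uniform() * envelope < christoffel_density(x, N):
            return x

print([round(sample_christoffel(10), 3) for _ in range(8)])
```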
  50. Perspectives: efficient sampling in continuous domain? One sample from a projection DPP (of cardinality N) requires drawing on average N² samples from the Christoffel distribution as a proposal: sampling gets harder as N gets bigger.
    [Figure: the successive conditional densities during one run of the sequential sampler.]
  51. Perspectives: efficient sampling in continuous domain? Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound for the inverse Christoffel function + looking for a sampling algorithm from the upper bound. How to address these issues in practice?
  52. Perspectives: efficient sampling in continuous domain? Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound for the inverse Christoffel function + looking for a sampling algorithm from the upper bound. How to address these issues in practice? We may:
    Look for upper bounds or asymptotics of the inverse of the Christoffel function in some RKHSs on general domains¹³.
    ¹³ Dolbeault, M. and Cohen, A., 2022. Optimal sampling and Christoffel functions on general domains. Constructive Approximation, 56(1), pp.121-163.
  53. Perspectives: efficient sampling in continuous domain? Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound for the inverse Christoffel function + looking for a sampling algorithm from the upper bound. How to address these issues in practice? We may:
    Look for upper bounds or asymptotics of the inverse of the Christoffel function in some RKHSs on general domains¹³.
    Work with continuous volume sampling → no need for a spectral decomposition¹⁴.
    ¹³ Dolbeault, M. and Cohen, A., 2022. Optimal sampling and Christoffel functions on general domains. Constructive Approximation, 56(1), pp.121-163.
    ¹⁴ Belhadji, A., Bardenet, R. and Chainais, P., 2020. Kernel interpolation with continuous volume sampling. In International Conference on Machine Learning (pp. 725-735). PMLR.
  54. Perspectives: extension to atomic measure reconstruction? Given a class of objects M, is it possible to approximate the elements of M using their evaluations on some functionals: L_1(μ), ..., L_N(μ) → μ̂ ≈ μ, with μ ∈ M?
    The objects: functions | atomic measures.
    The class M: an RKHS | not a Hilbert space.
    The functionals L_1, ..., L_N: μ ↦ μ(x_j) | μ ↦ ∫ e^{i ω_j^T x} dμ(x).
    Distance-preserving property: IOP | RIP¹⁵.
    Decoding: f̂_{LS,x}, f̂_{OKA,x}, ... | CL-OMP, Mean-shift¹⁶.
    ¹⁵ Belhadji, A. and Gribonval, R., 2022. Revisiting RIP guarantees for sketching operators on mixture models.
    ¹⁶ Belhadji, A. and Gribonval, R., 2022. Sketch and shift: a robust decoder for compressive clustering.
  55. References.
    A. Belhadji, R. Bardenet, and P. Chainais. Kernel quadrature with DPPs. NeurIPS, 2019.
    A. Belhadji, R. Bardenet, and P. Chainais. Kernel interpolation with continuous volume sampling. ICML, 2020.
    A. Belhadji. An analysis of Ermakov-Zolotukhin quadrature using kernels. NeurIPS, 2021.
    A. Belhadji, R. Bardenet, and P. Chainais. Signal reconstruction using determinantal sampling. arXiv:2310.09437.
  56. Numerical simulations. F = the Gaussian space, X = R. We report ε_m(N) = E_K ‖ f − f̂_{OKA,x} ‖²_F with κ = K, where f = μ_{e_m}.
    Figure: the squared interpolation error for e_1 (left) vs. e_15 (right), for DPPKQ, DPPKQ (UB), MCKQ, SBQ and MC.
  57. Perspectives: efficient sampling in continuous domain? If κ = k, the sequential algorithm is an approximation.
    Theorem (Rezaei and Gharan (2019)). Let x be the output of the sequential algorithm for κ = k; then x follows a density f_seq that satisfies f_seq(x) ≤ N!² f_k(x).
    An MCMC algorithm for CVS [Rezaei and Gharan (2019)]: CVS is the stationary distribution of a Markov chain that can be implemented in a fully kernelized way, using only evaluations of the kernel k. f_seq is the initialization of the Markov chain.
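The slide does not spell the chain out; below is a minimal sketch of one natural fully kernelized Metropolis chain targeting continuous volume sampling (single-coordinate refresh from ω, accepted with ratio det k(x′)/det k(x)). This is not necessarily the exact algorithm of Rezaei and Gharan, and for simplicity the i.i.d. initialization replaces f_seq.

```python
import numpy as np

rng = np.random.default_rng(0)

def k1(x, y):
    """Periodic Sobolev kernel of order s = 1 on [0, 1] (running example)."""
    t = np.mod(np.subtract.outer(x, y), 1.0)
    return 1.0 + 2.0 * np.pi**2 * (t**2 - t + 1.0 / 6.0)

def logdet(x):
    sign, ld = np.linalg.slogdet(k1(x, x))
    return ld if sign > 0 else -np.inf

def cvs_mcmc(N, n_steps=2000):
    """Metropolis chain whose stationary distribution is continuous volume sampling,
    i.e. the density proportional to det k(x) w.r.t. omega^N (omega = Uniform[0, 1]).
    Move: resample one coordinate from omega, accept with min(1, det k(x') / det k(x)).
    Only kernel evaluations are needed (fully kernelized)."""
    x = rng.uniform(size=N)                # crude i.i.d. initialization (the deck uses f_seq)
    cur = logdet(x)
    for _ in range(n_steps):
        i = rng.integers(N)
        x_new = x.copy()
        x_new[i] = rng.uniform()
        new = logdet(x_new)
        if np.log(rng.uniform()) < new - cur:
            x, cur = x_new, new
    return x

print(np.sort(cvs_mcmc(8)))                # a repulsive configuration in [0, 1]
```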
  58. Numerical simulations. F = the periodic Sobolev space of order s = 2, X = [0, 1]. We report ε_m(N) = E_κ ‖ f − f̂_{OKA,x} ‖²_F with f = μ_{e_m}, for m ∈ {1, 5, 7}, under CVS (κ = k).
    [Figure: log10 of the squared error vs. N, theoretical vs. empirical, for m = 1, 5, 7.]
  59. Kernel quadrature: examples. Consider X = [0, 1], s ∈ N* and
    k_s(x, y) := 1 + ((−1)^{s−1} (2π)^{2s} / (2s)!) B_{2s}({x − y}),
    where {x − y} is the fractional part of x − y and B_{2s} is the Bernoulli polynomial of degree 2s: B_0(x) = 1, B_2(x) = x² − x + 1/6, B_4(x) = x⁴ − 2x³ + x² − 1/30, ...
    [Figure: the kernel sections k_s(x_0, ·) for s = 1, ..., 5.]
  60. Kernel quadrature: examples. The corresponding RKHS is the periodic Sobolev space of order s:
    F_s = { f ∈ L²([0, 1]) : f(0) = f(1), f, f′, ..., f^{(s)} ∈ L²([0, 1]) },
    and the corresponding norm is the Sobolev norm:
    ‖f‖²_{F_s} = ( ∫_0^1 f(x) dx )² + Σ_{m ∈ N*} m^{2s} | ∫_0^1 f(x) e^{−2πimx} dx |².
    We have ··· ⊂ F_4 ⊂ F_3 ⊂ F_2 ⊂ F_1.
    [Figure: the kernel sections k_s(x_0, ·) for s = 1, ..., 5.]
  61. Kernel quadrature: an example. The kernel k_s satisfies the following identity [Wahba 90]:
    k_s(x, y) = 1 + Σ_{m ∈ N*} (1/m^{2s}) cos(2πm(x − y));
    it is equivalent to the Mercer decomposition with σ_m = O(m^{−2s}) and (e_m)_{m ∈ N*} the Fourier family.
    [Figure: the kernel sections k_s(x_0, ·) for s = 1, ..., 5.]
  62. Kernel quadrature: Korobov space. An extension to [0, 1]^d is possible via tensorization:
    k_{d,s}(x, y) = Π_{δ ∈ [d]} k_s(x_δ, y_δ) = Σ_{u ∈ N^d} ( Π_{δ ∈ [d]} σ_{u_δ} ) ( Π_{δ ∈ [d]} e_{u_δ}(x_δ) ) ( Π_{δ ∈ [d]} e_{u_δ}(y_δ) ).
    The eigenvalue / its multiplicity: 1 / 3^d; 1/2^{2s} / d·(d + 1); 1/6^{2s} / d(3d − 1); ...
  63. Kernel quadrature: Korobov space. We have σ_{N+1} = O( (log N)^{2s(d−1)} N^{−2s} ) [Bach 2017].
    Figure: (Left) d = 2 and s = 2; (Right) d = 3 and s = 2; σ_{N+1} against (log N)^{2s(d−1)} N^{−2s}.
    Compare it to QMC rates:
    Halton sequence, s = 1: O( (log N)^{2d} N^{−2} ).
    Hammersley sequence, s = 1: O( (log N)^{2(d−1)} N^{−2} ).
    Higher order digital nets, s ∈ N*: O( (log N)^{2sd} N^{−2s} ).
  64. Numerical simulations (EZQ vs OKQ). Let F be the RKHS corresponding to the Sobolev space of periodic functions of order s = 3, and g ∈ {φ_1, φ_10, φ_20}.
    Figure: squared worst-case integration error vs. the number of nodes N for EZQ (left) and OKQ (right) in F.
  65. Main results: a tractable formula under volume sampling. The theoretical guarantee in the case κ = k is given in the following result.
    Theorem (Belhadji, Bardenet and Chainais (2020)). Let g = Σ_{m ∈ N*} ⟨g, e_m⟩_ω e_m; then
    E_k ‖ μ_g − Π_{T(x)} μ_g ‖²_F = Σ_{m ∈ N*} ⟨g, e_m⟩²_ω ε_m(N),
    ε_m(N) = E_k ‖ μ_{e_m} − Π_{T(x)} μ_{e_m} ‖²_F = σ_m ( Σ_{|T|=N, m ∉ T} Π_{t ∈ T} σ_t ) / ( Σ_{|T|=N} Π_{t ∈ T} σ_t ).
    How good is it?
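The epsilons are ratios of elementary symmetric polynomials of the spectrum, so they can be evaluated with the standard recursion. A small sketch below, with a truncated spectrum σ_m = m^{−2s} (s = 3) as an illustrative assumption; the truncation makes the values approximate.

```python
import numpy as np

def elem_symmetric(sigmas, N):
    """Elementary symmetric polynomials e_0, ..., e_N of the values in `sigmas`."""
    e = np.zeros(N + 1)
    e[0] = 1.0
    for s in sigmas:
        for n in range(N, 0, -1):
            e[n] += s * e[n - 1]
    return e

def eps_m(sigmas, N, m):
    """eps_m(N) = sigma_m * e_N(sigma without sigma_m) / e_N(sigma); m is 0-based here."""
    e_all = elem_symmetric(sigmas, N)
    e_wo = elem_symmetric(np.delete(sigmas, m), N)
    return sigmas[m] * e_wo[N] / e_all[N]

# Truncated spectrum sigma_m = m^{-2s} with s = 3 (periodic Sobolev example).
s, L = 3, 400
sigmas = np.arange(1, L + 1, dtype=float) ** (-2 * s)
for N in (5, 10, 20):
    vals = [eps_m(sigmas, N, m) for m in range(5)]
    print(N, np.round(np.log10(vals), 2), " log10(sigma_{N+1}) =", round(np.log10(sigmas[N]), 2))
```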
  66. Main results: how large are the epsilons? Theorem (Belhadji, Bardenet and Chainais (2020)). If there exists B > 0 such that
    min_{M ∈ [N]} ( Σ_{m ≥ M} σ_m ) / ( (N − M + 1) σ_{N+1} ) ≤ B,
    then sup_{‖g‖_ω ≤ 1} E_k ‖ μ_g − Π_{T(x)} μ_g ‖²_F = ε_1(N) ≤ (1 + B) σ_{N+1}.
    Examples: σ_N = N^{−2s} gives B = (1 + 1/(2s − 1))^{2s}; σ_N = α^N gives B = α/(1 − α).
  67. Main results: how large are the epsilons?
    Figure: log10 ε_m(N) as a function of N when σ_N = N^{−2s}, with s = 3, for m = 1, ..., 5, against the upper bound (1 + B) σ_{N+1}.
  68. Main results: how large are the epsilons?
    Figure: other examples in different RKHSs (log10 ε_m(N) vs. N for m = 1, ..., 5, against the upper bound (1 + B) σ_{N+1}).
  69. Intuitions. Observe that
    E_κ ‖ μ_g − Π_{T(x)} μ_g ‖²_F = E_κ ‖ O_x Σ g ‖²_F = E_κ ‖ O_x Σ_N g + O_x Σ_N^⊥ g ‖²_F ≤ 2 ( E_κ ‖ O_x Σ_N g ‖²_F + E_κ ‖ O_x Σ_N^⊥ g ‖²_F ),
    where O_x = I_F − Π_{T(x)} = Π_{T(x)^⊥}, Σ_N = Σ_{m=1}^N σ_m e_m ⊗ e_m, and Σ_N^⊥ = Σ_{m=N+1}^{+∞} σ_m e_m ⊗ e_m.
  70. Intuitions. Observe that
    E_κ ‖ μ_g − Π_{T(x)} μ_g ‖²_F = E_κ ‖ O_x Σ g ‖²_F = E_κ ‖ O_x Σ_N g + O_x Σ_N^⊥ g ‖²_F ≤ 2 ( E_κ ‖ O_x Σ_N g ‖²_F + E_κ ‖ O_x Σ_N^⊥ g ‖²_F ),
    where O_x = I_F − Π_{T(x)} = Π_{T(x)^⊥}, Σ_N = Σ_{m=1}^N σ_m e_m ⊗ e_m, and Σ_N^⊥ = Σ_{m=N+1}^{+∞} σ_m e_m ⊗ e_m.
    Since O_x = Π_{T(x)^⊥} is an orthogonal projection,
    ‖ O_x Σ_N^⊥ g ‖²_F ≤ ‖ Σ_N^⊥ g ‖²_F = Σ_{m ≥ N+1} σ_m ⟨g, e_m⟩²_ω ≤ σ_{N+1} ‖g‖²_ω.
  71. Intuitions. Let m ∈ N* and put g = e_m:
    ‖ O_x Σ_N e_m ‖²_F = σ_m ‖ O_x e^F_m ‖²_F = σ_m ‖ e^F_m − Π_{T(x)} e^F_m ‖²_F.
    The error term is the product of two terms: the eigenvalue σ_m, and the reconstruction term ‖ e^F_m − Π_{T(x)} e^F_m ‖²_F ∈ [0, 1].
  72. Intuitions.
    σ_m ‖ e^F_m − Π_{T(x)} e^F_m ‖²_F = σ_m ( 1 − ‖ Π_{T(x)} e^F_m ‖²_F ).
    [Figure: e_m and the kernel sections k(x_n, ·) for several node configurations.]
  73. Intuitions. Theorem. Under the distribution of CVS (κ = k), we have
    ∀ m ∈ N*, E_k ‖ Π_{T(x)} e^F_m ‖²_F = ( Σ_{|T|=N, m ∈ T} Π_{t ∈ T} σ_t ) / ( Σ_{|T|=N} Π_{t ∈ T} σ_t ),
    and ∀ m ≠ m′ ∈ N*, E_k ⟨ Π_{T(x)} e^F_m , Π_{T(x)} e^F_{m′} ⟩_F = 0.
  74. Intuitions.
    E_k τ^F_m(x) := E_k ‖ Π_{T(x)} e^F_m ‖²_F = ( Σ_{|T|=N, m ∈ T} Π_{t ∈ T} σ_t ) / ( Σ_{|T|=N} Π_{t ∈ T} σ_t ).
    [Figure: E_k τ^F_m(x) vs. N for m = 1, ..., 5.]
    Under CVS, T(x) gets closer to E_N = Span(e^F_m)_{m ∈ [N]} as N → +∞.
  75. Intuitions. Alternatively, we can quantify the proximity between the subspaces T(x) and E^F_N using the principal angles (θ_i(T(x), E^F_N))_{i ∈ [N]}, where E^F_N = Span(e^F_m)_{m ∈ [N]} and T(x) = Span(k(x_n, ·))_{n ∈ [N]}.
    For example, we have
    sup_{m ∈ [N]} ‖ e^F_m − Π_{T(x)} e^F_m ‖²_F ≤ 1 / cos²( θ_N(T(x), E^F_N) ) − 1.
  76. The determinantal distributions: link with DPPs. Theorem (Belhadji, Bardenet and Chainais (2020)). For U ⊂ N*, define the kernel K_U(x, y) = Σ_{u ∈ U} e_u(x) e_u(y). We have
    f_k(x_1, ..., x_N) = Σ_{|U|=N} ( Π_{u ∈ U} σ_u / Σ_{|W|=N} Π_{w ∈ W} σ_w ) (1/N!) Det( K_U(x) ).
    The largest weight in the mixture corresponds to U = [N]: K(x, y) = Σ_{m ∈ [N]} e_m(x) e_m(y).
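A small sketch of these mixture weights on a truncated spectrum (an approximation), checking that the heaviest component is indeed U = [N]; the spectrum σ_m = m^{−2s} is an illustrative assumption.

```python
import numpy as np
from itertools import combinations

# Truncated spectrum sigma_m = m^{-2s} with s = 1 (periodic Sobolev example).
s, L, N = 1, 12, 4
sigmas = np.arange(1, L + 1, dtype=float) ** (-2 * s)

# Mixture weight of the projection DPP indexed by U: prod_{u in U} sigma_u, normalized over |U| = N.
subsets = list(combinations(range(L), N))
weights = np.array([np.prod(sigmas[list(U)]) for U in subsets])
weights /= weights.sum()

best = subsets[int(np.argmax(weights))]
print("heaviest component U =", [u + 1 for u in best])    # expected: [1, 2, ..., N]
print("its weight:", round(float(weights.max()), 4))
```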