
Ayoub Belhadji

S³ Seminar
December 07, 2023

Ayoub Belhadji

(Univ Lyon, ENS de Lyon, Inria, CNRS, UCBL, LIP UMR 5668)

Title — Signal reconstruction using determinantal sampling

Abstract — We study the approximation of a square integrable function from a finite number of its evaluations at well-chosen nodes. The function is assumed to belong to a reproducing kernel Hilbert space (RKHS), the approximation to be one of a few natural finite-dimensional approximations, and the nodes to be drawn from one of two probability distributions. Both distributions are related to determinantal point processes, and use the kernel of the RKHS to favor RKHS-adapted regularity in the random design. While previous work on determinantal sampling relied on the RKHS norm, we prove mean-square guarantees in L2 norm. We show that determinantal point processes and mixtures thereof can yield fast rates, and that they shed some light on how the rate changes as more smoothness is assumed, a phenomenon known as superconvergence. Besides, determinantal sampling generalizes i.i.d. sampling from the Christoffel function, a standard in the literature. In particular, determinantal sampling guarantees the so-called instance optimality property for a smaller number of function evaluations than i.i.d. sampling.

Bio
Ayoub Belhadji is currently a postdoc at the Laboratoire de l'Informatique du Parallélisme (LIP) at ENS de Lyon. He received his Ph.D. from École Centrale de Lille in 2020, specializing in theoretical signal processing and machine learning. His main research interests include sampling, kernel methods and sketching methods.


Transcript

  1. Signal Reconstruction using Determinantal Sampling. Ayoub Belhadji (OCKHAM team, LIP, ENS de Lyon; Univ Lyon, Inria, CNRS, UCBL). Joint work with Rémi Bardenet and Pierre Chainais (Centrale Lille, CRIStAL, Université de Lille, CNRS). 07/12/2023, L2S, CentraleSupélec.
  2. Introduction. A determinantal point process (DPP) is a distribution over subsets of some set X.
    [Figure: a point configuration x_1, x_2, x_3, x_4 in X and a subset I.]
  3. Introduction. A determinantal point process (DPP) is a distribution over subsets of some set X, with the negative correlation property:
    ∀ B, B′ ⊂ X, B ∩ B′ = ∅ ⟹ Cov(n_x(B), n_x(B′)) ≤ 0, where n_x(B) := |B ∩ x|.
  4. Introduction. [Figure: samples of a DPP on [0, 1]² vs. i.i.d. particles on [0, 1]².]
  5. Introduction. [Figure: samples of a DPP on the unit circle vs. i.i.d. particles on the unit circle.]
  6. Introduction. Early appearances of DPPs may be traced back to the work of Dyson (1962)¹ and Ginibre (1965)². A universal definition, for a generic (X, ω), is given in the work of Macchi (1975)³.
    ¹ Dyson, F.J., 1962. Statistical theory of the energy levels of complex systems. I. Journal of Mathematical Physics, 3(1), pp.140-156.
    ² Ginibre, J., 1965. Statistical ensembles of complex, quaternion, and real matrices. Journal of Mathematical Physics, 6(3), pp.440-449.
    ³ Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.
  7. Introduction. Early appearances of DPPs may be traced back to the work of Dyson (1962)¹ and Ginibre (1965)². A universal definition, for a generic (X, ω), is given in the work of Macchi (1975)³.
    Definition (informal): Determinantal Point Process. Given a metric space X and a measure ω, a DPP satisfies
    P_DPP(∃ at least k points, one in each B_i, i = 1, ..., k) = ∫_{B_1 × ··· × B_k} Det[κ(x_i, x_j)]_{i,j ∈ [k]} dω(x_1) ... dω(x_k)
    for a kernel κ : X × X → R, where Det[κ(x_i, x_j)]_{i,j ∈ [k]} is the determinant of the kernel matrix.
  8. Introduction. Sampling using DPPs until 2019...
    Discrete: learning on a budget⁴, node selection in a graph⁵, feature selection, universal results⁷.
    Continuous: numerical integration, X ⊂ R⁶, X = [0, 1]^d, specific results⁸.
    ⁴ [Deshpande et al. (2006)], [Derezinski et al. (2017, 2018, 2019)]  ⁵ [Tremblay et al. (2017)]  ⁶ [Lambert (2018)]  ⁷ [Belhadji et al. (2018)]  ⁸ [Bardenet and Hardy (2016)]
  9. Introduction. Sampling using DPPs until 2019...
    Discrete: learning on a budget⁴, node selection in a graph⁵, feature selection, universal results⁷.
    Continuous: numerical integration, X ⊂ R⁶, X = [0, 1]^d, specific results⁸.
    Universal results for DPP-based sampling in a continuous domain?
  10. Outline. 1. The setting; 2. DPPs for numerical integration in RKHSs; 3. Beyond numerical integration; 4. Numerical simulations; 5. Perspectives.
  11. Towards a universal construction of quadrature rules. [Diagram: QMC (Weyl criterion), Gaussian quadrature (OPE), kernel quadrature, vanilla Monte Carlo (CLTs), and DPP-based quadrature (?).]
  12. Towards a universal construction of quadrature rules. [Diagram: QMC (Weyl criterion), Gaussian quadrature (OPE), kernel quadrature, vanilla Monte Carlo (CLTs), and DPP-based quadrature (?).]
    A universal construction of quadrature rules using DPPs?
  13. Kernel-based analysis of quadrature rules. The crux of kernel-based analysis of a quadrature rule is the study of the worst-case error on the unit ball of an RKHS F:
    sup_{‖f‖_F ≤ 1} | ∫_X f(x) g(x) dω(x) − Σ_{i ∈ [N]} w_i f(x_i) |,
    where the first term is the integral and the second is the quadrature rule.
  14. Reproducing kernel Hilbert spaces. Definition. An RKHS F is a Hilbert space associated to a kernel k, satisfying:
    ∀ x ∈ X, f ↦ f(x) is continuous;
    ∀ (x, f) ∈ X × F, ⟨f, k(x, ·)⟩_F = f(x).
    Example: X = [0, 1] and k_1(x, y) := 1 + 2π² B_2({x − y}), where {x − y} is the fractional part of x − y and B_2(x) = x² − x + 1/6.
    [Figure: the kernel section k_1(x_0, ·) on [0, 1].]
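A minimal numerical sketch of this example kernel (the function name k1 is illustrative); the sign follows the general formula k_s(x, y) = 1 + (−1)^{s−1} (2π)^{2s}/(2s)! B_{2s}({x − y}) recalled later in the deck.

```python
import numpy as np

def k1(x, y):
    """Periodic Sobolev kernel of order s = 1 on [0, 1]:
    k1(x, y) = 1 + 2*pi^2 * B2({x - y}), with B2(t) = t^2 - t + 1/6."""
    t = np.mod(x - y, 1.0)            # fractional part {x - y}
    B2 = t**2 - t + 1.0 / 6.0         # Bernoulli polynomial of degree 2
    return 1.0 + 2.0 * np.pi**2 * B2

x0 = 0.3
print(k1(x0, x0))                     # peak value 1 + pi^2 / 3, roughly 4.3
print(k1(x0, x0 + 0.5))               # smallest value, at distance 1/2
```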
  15. Embeddings as elements of the RKHS. F contains smooth functions.
    Definition: an embedding of an element of L²(ω). Given g ∈ L²(ω), the embedding of g is defined by
    μ_g = Σg := ∫_X k(x, ·) g(x) dω(x) ∈ F,
    where Σ : L²(ω) → L²(ω) is the integration operator associated to (k, ω).
    [Figure: three functions g and their embeddings μ_g (s = 1).]
  16. Embeddings and quadrature rules. Properties.
    Reproducibility of integrals: ∀ f ∈ F, ⟨f, μ_g⟩_F = ∫_X f(x) g(x) dω(x).
    The worst integration error on the unit ball of F:
    sup_{‖f‖_F ≤ 1} | ∫_X f(x) g(x) dω(x) − Σ_{i=1}^N w_i f(x_i) | = ‖ μ_g − Σ_{i=1}^N w_i k(x_i, ·) ‖_F   (the worst-case error, WCE).
    The study of quadrature rules boils down to the study of kernel approximations of embeddings.
  17. Embeddings and quadrature rules. A sanity check using the 'Monte Carlo quadrature'. Let x_1, ..., x_N be i.i.d. ∼ ω and w_i = 1/N. Under some assumptions on k, we have
    E ‖ μ_g − Σ_{i=1}^N (1/N) k(x_i, ·) ‖²_F = O(1/N).
    We recover the 'Monte Carlo rate' O(1/N).
  18. Embeddings and quadrature rules. A sanity check using the 'Monte Carlo quadrature'. Let x_1, ..., x_N be i.i.d. ∼ ω and w_i = 1/N. Under some assumptions on k, we have
    E ‖ μ_g − Σ_{i=1}^N (1/N) k(x_i, ·) ‖²_F = O(1/N).
    We recover the 'Monte Carlo rate' O(1/N). Can we do better?
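A small simulation sketch of this sanity check, assuming for concreteness the periodic Sobolev kernel of order s = 1 and g ≡ 1 (integration against ω itself, for which the plain weights w_i = 1/N are the natural choice, so that μ_g ≡ 1 and ‖μ_g‖²_F = 1); all names are illustrative, and the squared RKHS norm is expanded using the reproducing property.

```python
import numpy as np

rng = np.random.default_rng(0)

def k1(x, y):
    """Periodic Sobolev kernel of order s = 1 on [0, 1]."""
    t = np.mod(np.subtract.outer(x, y), 1.0)
    return 1.0 + 2.0 * np.pi**2 * (t**2 - t + 1.0 / 6.0)

def mc_sq_error(N, n_rep=500):
    """Average of ||mu_g - (1/N) sum_i k1(x_i, .)||_F^2 over i.i.d. uniform designs, g = 1."""
    errs = []
    for _ in range(n_rep):
        x = rng.uniform(size=N)
        w = np.full(N, 1.0 / N)
        # ||mu_g||_F^2 - 2 sum_i w_i mu_g(x_i) + w^T K(x) w, with mu_g = 1 here
        errs.append(1.0 - 2.0 * w.sum() + w @ k1(x, x) @ w)
    return np.mean(errs)

for N in (10, 40, 160, 640):
    print(N, mc_sq_error(N))          # shrinks roughly like 1/N (here about pi^2 / (3N))
```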
  19. The optimal kernel quadrature. Definition: the optimal kernel quadrature. Given a set of nodes x = {x_1, ..., x_N} such that the kernel matrix
    K(x) := ( k(x_i, x_j) )_{i,j ∈ [N]}
    is non-singular, the optimal kernel quadrature is the couple (x, ŵ) such that
    ‖ μ_g − Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) ‖_F = min_{w ∈ R^N} ‖ μ_g − Σ_{i=1}^N w_i k(x_i, ·) ‖_F.
  20. The convergence of the optimal kernel quadrature. The study of the convergence rate of the optimal kernel quadrature was carried out in several works:
    X = [0, 1], Sobolev space, uniform grid: O(N^{−2s}) [Novak et al., 2015] (g is cos or sin), [Bojanov, 1981].
    X = [0, 1]^d, tensorized Sobolev space, QMC sequences: QMC rates [Briol et al., 2019] (g is constant).
    X = [0, 1]^d, tensorized Sobolev space: O(N^{−2s/d}) [Wendland, 2004].
    X = R^d, Gaussian kernel, tensorized Hermite roots: O(exp(−αN)) [Karvonen et al., 2019] (+ assumptions).
    Limitation: the analysis is too specific to the RKHS F, to x, to g... Any universal analysis of the OKQ?
  21. A spectral characterization of the RKHS. What is common among existing results?
    Assumptions: there exists a spectral decomposition (e_m, σ_m)_{m ∈ N*} of Σ, where (e_m)_{m ∈ N*} is an o.n.b. of L²(ω) and σ_1 ≥ σ_2 ≥ ... > 0 with Σ_{m ∈ N*} σ_m < +∞; the RKHS F is dense in L²(ω).
  22. A spectral characterization of the RKHS. What is common among existing results?
    Assumptions: there exists a spectral decomposition (e_m, σ_m)_{m ∈ N*} of Σ, where (e_m)_{m ∈ N*} is an o.n.b. of L²(ω) and σ_1 ≥ σ_2 ≥ ... > 0 with Σ_{m ∈ N*} σ_m < +∞; the RKHS F is dense in L²(ω).
    ‖f‖²_F = Σ_{m ∈ N*} ⟨f, e_m⟩²_ω / σ_m  ⟹  {‖f‖_F = 1} is an ellipsoid in L²(ω).
    Σ_{m ∈ N*} ⟨μ_g, e_m⟩²_ω / σ²_m = Σ_{m ∈ N*} ⟨g, e_m⟩²_ω  ⟹  {μ_g ; ‖g‖_ω = 1} is an ellipsoid in L²(ω).
  23. A spectral characterization of the RKHS. [Diagram: the sets B_{L²(ω)}(0, 1), B_F(0, 1), Σ^{1/2} B_{L²(ω)}(0, 1) and Σ B_{L²(ω)}(0, 1), with F = Σ^{1/2} L²(ω).]
    Examples of spectra:
    X = [0, 1], Sobolev: σ_{N+1} = O(N^{−2s}), (e_m) = Fourier.
    X = [0, 1]^d, Korobov: σ_{N+1} = O(log(N)^{2s(d−1)} N^{−2s}), (e_m) = tensorized Fourier.
    X = [0, 1]^d, Sobolev: σ_{N+1} = O(N^{−2s/d}), (e_m) = "Fourier".
    X = S^d, dot-product kernel: (e_m) = spherical harmonics.
    X = R, Gaussian: σ_{N+1} = O(e^{−αN}), (e_m) = Hermite polynomials.
    X = R^d, Gaussian: σ_{N+1} = O(e^{−α_d N^{1/d}}), (e_m) = tensorized Hermite polynomials.
  24. The determinantal distributions. Definition. Let κ be a kernel such that ∫_X κ(x, x) dω(x) < +∞. The function
    p_κ(x_1, ..., x_N) ∝ Det κ(x) = Det ( κ(x_i, x_j) )_{i,j ∈ [N]}
    is a p.d.f. on X^N w.r.t. ω^⊗N.
  25. The determinantal distributions. Definition. Let κ be a kernel such that ∫_X κ(x, x) dω(x) < +∞. The function
    p_κ(x_1, ..., x_N) ∝ Det κ(x) = Det ( κ(x_i, x_j) )_{i,j ∈ [N]}
    is a p.d.f. on X^N w.r.t. ω^⊗N.
    Theorem (Belhadji et al. (2020)). For (x_1, ..., x_N) ∼ p_κ, the set x := {x_1, ..., x_N} follows the distribution of a mixture of DPPs.
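A tiny sketch of evaluating this density up to its normalization constant, via a log-determinant for numerical stability; the projection kernel built on the first N Fourier eigenfunctions of the running Sobolev example is an illustrative assumption. Clumped configurations get a much smaller determinant, which is the repulsion at work.

```python
import numpy as np

def fourier_features(x, N):
    """First N Fourier eigenfunctions on [0, 1]: 1, sqrt(2)cos(2 pi x), sqrt(2)sin(2 pi x), ..."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols = [np.ones_like(x)]
    m = 1
    while len(cols) < N:
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * m * x))
        if len(cols) < N:
            cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * m * x))
        m += 1
    return np.column_stack(cols)                 # shape (len(x), N)

def log_density_unnormalized(x, N):
    """log Det kappa(x) for the projection kernel kappa(x, y) = sum_{n<=N} e_n(x) e_n(y)."""
    Phi = fourier_features(x, N)
    sign, logdet = np.linalg.slogdet(Phi @ Phi.T)
    return logdet if sign > 0 else -np.inf

rng = np.random.default_rng(0)
N = 5
print(log_density_unnormalized(rng.uniform(size=N), N))                  # spread-out design
print(log_density_unnormalized(0.5 + 1e-3 * rng.uniform(size=N), N))     # clumped design: much smaller
```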
  26. Main results. Theorem (Belhadji, Bardenet and Chainais (2019)). Under the determinantal distribution corresponding to κ(x, y) = K(x, y) := Σ_{n ∈ [N]} e_n(x) e_n(y), x follows the distribution of a (projection) DPP, and we have
    E_{p_κ} sup_{‖g‖_ω ≤ 1} ‖ μ_g − Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) ‖²_F ≤ 4 N² r_{N+1},  where r_{N+1} = Σ_{m=N+1}^{+∞} σ_m.
    σ_m = m^{−2s}: theoretical rate N² r_{N+1} = N³ O(σ_{N+1}), empirical rate O(σ_{N+1}).
    σ_m = α^m: theoretical rate N² r_{N+1} = N² O(σ_{N+1}), empirical rate O(σ_{N+1}).
  27. Main results. Theorem (Belhadji (2021)). Under the determinantal distribution corresponding to κ(x, y) = K(x, y) := Σ_{n ∈ [N]} e_n(x) e_n(y), x follows the distribution of a (projection) DPP, and we have
    ∀ g ∈ L²(ω), E_{p_κ} ‖ μ_g − Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) ‖²_F ≤ 4 ‖g‖²_ω r_{N+1} = O(r_{N+1}),  where r_{N+1} := Σ_{m=N+1}^{+∞} σ_m.
  28. Main results. Theorem (Belhadji, Bardenet and Chainais (2020)). Under the determinantal distribution corresponding to κ = k, we have
    ∀ g ∈ L²(ω), E_{p_κ} ‖ μ_g − Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) ‖²_F = Σ_{m=1}^{+∞} ⟨g, e_m⟩²_ω ε_m(N) ≤ ‖g‖²_ω ε_1(N),
    where ε_m(N) = σ_m ( Σ_{T ⊂ N*∖{m}, |T| = N} Π_{t ∈ T} σ_t ) / ( Σ_{T ⊂ N*, |T| = N} Π_{t ∈ T} σ_t ) = O(σ_{N+1}), the optimal rate (Pinkus (1985)).
  29. The optimal kernel quadrature and kernel interpolation. The mixture corresponding to OKQ is an interpolant: the function μ̂_g := Σ_{i=1}^N ŵ_i(μ_g) k(x_i, ·) satisfies ∀ i ∈ [N], μ̂_g(x_i) = μ_g(x_i).
    [Figure: μ_g, its interpolant μ̂_g, g and the kernel sections k(x_n, ·), for two node configurations.]
  30. Main results. Theorem (Belhadji, Bardenet and Chainais (2020)). Under the determinantal distribution corresponding to κ = k, for r ∈ [0, 1/2], we have
    ∀ f ∈ Σ^{1/2+r} L²(ω), E_{p_κ} ‖ f − Σ_{i=1}^N ŵ_i(f) k(x_i, ·) ‖²_F = O(σ_{N+1}^{2r}).
    r = 0 → a generic element of F; r = 1/2 → an embedding of some g ∈ L²(ω).
    [Diagram: the sets B_{L²(ω)}(0, 1), Σ^{1/2} B_{L²(ω)}(0, 1) = B_F(0, 1), Σ^{1/2+r} B_{L²(ω)}(0, 1) and Σ B_{L²(ω)}(0, 1).]
  31. Main results. The RKHS norm ‖·‖_F is strong: ‖f‖_∞ ≤ sup_{x ∈ X} √(k(x, x)) ‖f‖_F.
    We seek convergence guarantees in a weaker norm such as ‖·‖_ω.
  32. Main results. Definition: least squares approximation. Given x = {x_1, ..., x_N} ⊂ X, the least squares approximation of f ∈ F associated to x is the function f̂_{LS,x} defined by
    ‖ f − f̂_{LS,x} ‖_ω = min_{f̂ ∈ T(x)} ‖ f − f̂ ‖_ω,  where T(x) := Span(k(x_1, ·), ..., k(x_N, ·)).
  33. Main results. Definition: least squares approximation. Given x = {x_1, ..., x_N} ⊂ X, the least squares approximation of f ∈ F associated to x is the function f̂_{LS,x} defined by
    ‖ f − f̂_{LS,x} ‖_ω = min_{f̂ ∈ T(x)} ‖ f − f̂ ‖_ω,  where T(x) := Span(k(x_1, ·), ..., k(x_N, ·)).
    Definition: optimal kernel approximation. Given x = {x_1, ..., x_N} ⊂ X, the optimal kernel approximation of f ∈ F associated to x is the function f̂_{OKA,x} defined by
    ‖ f − f̂_{OKA,x} ‖_F = min_{f̂ ∈ T(x)} ‖ f − f̂ ‖_F.
  34. Main results. Theorem (Belhadji, Bardenet and Chainais (2023)). Under the determinantal distribution corresponding to κ = K, we have
    ∀ f ∈ F, E_{p_κ} ‖ f − f̂_{LS,x} ‖²_ω ≤ 2 ( ‖ f − f_N ‖²_ω + ‖ f_N ‖²_ω Σ_{m=N+1}^{+∞} σ²_m ),
    where the first term is O(σ_{N+1}) and the second is O(σ_{N+1} r_{N+1}) (superconvergence), with r_{N+1} = Σ_{m=N+1}^{+∞} σ_m.
  35. Main results. Theorem (Belhadji, Bardenet and Chainais (2023)). Under the determinantal distribution corresponding to κ = K, we have
    ∀ f ∈ F, E_{p_κ} ‖ f − f̂_{LS,x} ‖²_ω ≤ 2 ( ‖ f − f_N ‖²_ω + ‖ f_N ‖²_ω Σ_{m=N+1}^{+∞} σ²_m ),
    where the first term is O(σ_{N+1}) and the second is O(σ_{N+1} r_{N+1}) (superconvergence), with r_{N+1} = Σ_{m=N+1}^{+∞} σ_m.
    The computation of f̂_{LS,x} requires the values μ_f(x_1), ..., μ_f(x_N), not f(x_1), ..., f(x_N).
  36. The empirical least squares approximation. Definition (Cohen, Davenport and Leviatan (2013)). Let x ∈ X^N and q : X → R*_+. Consider the so-called empirical semi-norm ‖·‖_{q,x} defined on L²(ω) by
    ‖h‖²_{q,x} := (1/N) Σ_{i=1}^N q(x_i) h(x_i)².
    The empirical least squares approximation is defined as
    f̂_{ELS,M,x} := argmin_{f̂ ∈ E_M} ‖ f − f̂ ‖²_{q,x},  where E_M := Span(e_1, ..., e_M).
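A minimal sketch of this estimator as a weighted least-squares problem in the basis (e_1, ..., e_M); the Fourier basis, the test function and the choice q ≡ 1 are illustrative assumptions, not part of the deck.

```python
import numpy as np

def fourier_features(x, M):
    """Basis e_1, ..., e_M of E_M (Fourier family on [0, 1], as in the running example)."""
    cols = [np.ones_like(x)]
    m = 1
    while len(cols) < M:
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * m * x))
        if len(cols) < M:
            cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * m * x))
        m += 1
    return np.column_stack(cols)

def els_coefficients(f_vals, x, M, q):
    """Coefficients of f_hat_{ELS,M,x} = argmin over E_M of (1/N) sum_i q(x_i) (f - f_hat)(x_i)^2,
    solved as a weighted least-squares problem in the basis (e_m)."""
    Phi = fourier_features(x, M)                  # (N, M) design matrix e_m(x_i)
    sw = np.sqrt(q(x))                            # square roots of the weights
    coef, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * f_vals, rcond=None)
    return coef                                   # f_hat = sum_m coef[m] * e_m

# Toy usage: recover a smooth periodic f from noise-free evaluations at i.i.d. uniform nodes.
rng = np.random.default_rng(0)
f = lambda x: np.sin(2.0 * np.pi * x) + 0.3 * np.cos(6.0 * np.pi * x)
N, M = 40, 9
x = rng.uniform(size=N)
print(els_coefficients(f(x), x, M, q=lambda t: np.ones_like(t)))
```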
  37. Main results. When M = N, the function f̂_{ELS,M,x} does not depend on q.
    Theorem (Belhadji, Bardenet and Chainais (2023)). Consider N, M ∈ N* such that M ≤ N. Let f ∈ F, and define f̂^t_{ELS,M,x} := Σ_{m=1}^M ⟨f̂_{ELS,N,x}, e_m⟩_ω e_m. Under the determinantal distribution corresponding to κ = K (of cardinality N), we have
    E_{p_κ} ‖ f − f̂^t_{ELS,M,x} ‖²_ω = ‖ f − f_M ‖²_ω + M ‖ f − f_N ‖²_ω.
    In particular, E_{p_κ} ‖ f − f̂^t_{ELS,M,x} ‖²_ω ≤ (1 + M) ‖ f − f_M ‖²_ω: the 'Instance Optimal Property' (IOP).
  38. Main results. Some remarks:
    E_{p_κ} ‖ f − f̂^t_{ELS,N,x} ‖²_ω ≤ (1 + N) ‖ f − f_N ‖²_ω = O(N σ_{N+1}).
    In general, f̂^t_{ELS,M,x} ≠ f̂_{ELS,M,x}.
    The IOP was proved for f̂_{ELS,M,x} under Christoffel sampling⁹: x_1, ..., x_N ∼ i.i.d. (1/M) Σ_{m=1}^M e_m(x)² dω(x), conditioned on the event { ‖G − I‖_op ≤ 1/2 }, where G := ( ⟨e_i, e_j⟩_{q,x} )_{i,j ∈ [M]} is the Gram matrix of the family (e_j)_{j ∈ [M]} associated to the empirical scalar product ⟨·, ·⟩_{q,x}.
    ⁹ Cohen, A. and Migliorati, G., 2017. Optimal weighted least-squares methods. The SMAI Journal of Computational Mathematics, 3, pp.181-203.
  39. A projection DPP as an extension of Christoffel sampling. The determinantal distribution associated to the kernel
    κ(x, y) := (1/M) Σ_{m=1}^M e_m(x) e_m(y)
    is a natural extension of Christoffel sampling.
    Figure: histograms of 50000 realizations of the projection DPP associated to the first N Hermite polynomials, compared to the inverse of the Christoffel function and to the nodes of the Gaussian quadrature, with N ∈ {5, 10}.
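A small sketch of the reference density in this figure, i.e. the Christoffel sampling density (1/N) Σ_{n<N} e_n(x)² with respect to ω = N(0, 1), using probabilists' Hermite polynomials; the normalization e_n = He_n / √(n!) is an assumption about the convention used on the slide.

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial

def christoffel_density(x, N):
    """Christoffel sampling density w.r.t. omega = N(0, 1):
    (1/N) * sum_{n=0}^{N-1} e_n(x)^2, with e_n = He_n / sqrt(n!) orthonormal under omega."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(N):
        c = np.zeros(n + 1); c[n] = 1.0          # coefficients selecting He_n
        total += He.hermeval(x, c) ** 2 / factorial(n)
    return total / N

xs = np.linspace(-5.0, 5.0, 7)
for N in (5, 10):
    print(N, np.round(christoffel_density(xs, N), 3))
```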
  40. Numerical simulations. We report the empirical expectation of a surrogate of the worst interpolation error:
    E_κ sup_{f = μ_g, ‖g‖_ω ≤ 1} ‖ f − f̂_{OKA,x} ‖²_F ≈ E_κ sup_{f = μ_g, g ∈ G} ‖ f − f̂_{OKA,x} ‖²_F,  with κ = K,
    where G ⊂ {g : ‖g‖_ω ≤ 1} is a finite set with |G| = 5000, and F is the periodic Sobolev space of order s = 3.
    [Figure: log10 of the squared error vs. N for DPPKQ, LVSQ and UGKQ, against σ_{N+1}.]
  41. Numerical simulations. F = Korobov space of order s = 1, X = [0, 1]². We report ε_m(N) = E_K ‖ f − f̂_{OKA,x} ‖²_F with κ = K, where f = μ_{e_m}.
    Figure: OKQ using DPPs (left) vs. OKQ using the uniform grid (right); log10 of the squared error vs. N for several ε_m(N), against σ_{N+1} and N^{−2s/d}.
  42. Numerical simulations. F = the periodic Sobolev space of order s = 1, X = [0, 1]. We report E_κ ‖ f − f̂_{LS,x} ‖²_ω with κ = K, where f = Σ_{m=1}^M ξ_m e_m and ξ_1, ..., ξ_M ∼ i.i.d. N(0, 1).
    Figure: M = 10 (left) and M = 20 (right).
  43. Numerical simulations. Consider F to be the RKHS defined by the Sinc kernel
    k(x, y) = sin(F(x − y)) / (F(x − y)),  X = [−T/2, T/2].
    The eigenfunctions e_m correspond to the prolate spheroidal wave functions (PSWF)¹⁰. The asymptotics of the eigenvalues in the limit c := TF → +∞ were investigated¹¹: there are approximately c eigenvalues close to 1, and the remaining eigenvalues decrease to 0 at an exponential rate.
    ¹⁰ Slepian, D. and Pollak, H.O., 1961. Prolate spheroidal wave functions, Fourier analysis and uncertainty, I. Bell System Technical Journal, 40(1), pp.43-63.
    ¹¹ Landau, H.J. and Widom, H., 1980. Eigenvalue distribution of time and frequency limiting. Journal of Mathematical Analysis and Applications, 77(2), pp.469-481.
  44. Numerical simulations. We report ‖ f − f̂_{LS,x} ‖²_ω averaged over 50 realizations, for f ∈ {e_1, e_2, e_3, e_4}.
    [Figure: squared error vs. N for DPP PSWF, DPP Legendre and ChS, one panel per eigenfunction, with the value T×F marked.]
  45. Conclusion. Take-home messages:
    The theoretical study of the optimal kernel quadrature under determinantal sampling.
    The study of function reconstruction under determinantal sampling.
    The analysis is universal.
    Empirical validation on various RKHSs.
    Projection DPPs are natural extensions of Christoffel sampling, and yield better empirical results.
    [Diagram: a correspondence between RKHSs (functional spaces) and DPPs (point processes).]
  46. Perspectives: extending the theoretical results?
    The study of E ‖ f − f̂_{ELS,M,x} ‖²_ω when the dimension M = the number of nodes N.
    E sup_{‖f‖_F ≤ 1} ‖ f − f̂_{LS,x} ‖²_ω instead of E ‖ f − f̂_{LS,x} ‖²_ω?
    E sup_{‖f‖_F ≤ 1} ‖ f − f̂_{OKA,x} ‖²_ω and E ‖ f − f̂_{OKA,x} ‖²_ω?
    Higher order moments? ...
  47. Perspectives: efficient sampling in continuous domain? Let x = {x_1, ..., x_N} be such that Det κ(x) > 0. We have
    Det κ(x) = κ(x_1, x_1) × ( κ(x_2, x_2) − κ(x_1, x_2)² / κ(x_1, x_1) ) × ... × ( κ(x_ℓ, x_ℓ) − φ_{x_{ℓ−1}}(x_ℓ)^T κ(x_{ℓ−1})^{−1} φ_{x_{ℓ−1}}(x_ℓ) ) × ... × ( κ(x_N, x_N) − φ_{x_{N−1}}(x_N)^T κ(x_{N−1})^{−1} φ_{x_{N−1}}(x_N) ),
    where φ_{x_{ℓ−1}}(x) = ( κ(ξ, x) )^T_{ξ ∈ x_{ℓ−1}} ∈ R^{ℓ−1} and x_{ℓ−1} = {x_1, ..., x_{ℓ−1}}.
  48. Perspectives: efficient sampling in continuous domain? Define
    p_{κ,1}(x) = κ(x, x),  p_{κ,ℓ}(x) = κ(x, x) − φ_{x_{ℓ−1}}(x)^T κ(x_{ℓ−1})^{−1} φ_{x_{ℓ−1}}(x) for ℓ ≥ 2.
    If κ is a projection kernel, ∫_X p_{κ,ℓ}(x) dω(x) = N − ℓ + 1, and
    p_κ(x) := (1/N!) Det κ(x) = Π_{ℓ=1}^N (1/(N − ℓ + 1)) p_{κ,ℓ}(x_ℓ),
    and the sequential algorithm is exact (the HKPV algorithm).
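A minimal rejection-based sketch of this sequential (HKPV-style) sampler, assuming the projection kernel built on the Fourier eigenfunctions of the running Sobolev example; the constant envelope κ(x, x) ≤ 2N − 1 is specific to this basis, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(x, N):
    """First N Fourier eigenfunctions on [0, 1]: 1, sqrt(2)cos(2 pi x), sqrt(2)sin(2 pi x), ..."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols = [np.ones_like(x)]
    m = 1
    while len(cols) < N:
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * m * x))
        if len(cols) < N:
            cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * m * x))
        m += 1
    return np.column_stack(cols)

def sample_projection_dpp(N):
    """Sequential sampler for the projection DPP with kernel kappa(x, y) = sum_{n<=N} e_n(x) e_n(y),
    base measure omega = Uniform[0, 1]. At step l, x_l is drawn from p_{kappa,l}(x) / (N - l + 1)
    by rejection sampling with the constant bound p_{kappa,l}(x) <= kappa(x, x) <= 2N - 1."""
    nodes, Phi_prev = [], np.zeros((0, N))
    bound = 2.0 * N - 1.0
    for _ in range(N):
        G_inv = np.linalg.inv(Phi_prev @ Phi_prev.T) if len(nodes) else np.zeros((0, 0))
        while True:
            x = rng.uniform()
            phi = fourier_features(x, N)[0]
            cross = Phi_prev @ phi
            p_l = phi @ phi - cross @ G_inv @ cross    # Schur complement w.r.t. selected nodes
            if rng.uniform() * bound < p_l:
                break
        nodes.append(x)
        Phi_prev = np.vstack([Phi_prev, phi])
    return np.array(nodes)

print(np.sort(sample_projection_dpp(7)))               # 7 well-spread points in [0, 1]
```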
  49. Perspectives: efficient sampling in continuous domain? Example: to conduct Christoffel sampling for Legendre polynomials, we can use Bernstein's bound¹²
    ∀ n ∈ N*, L_n(x)² ≤ (2/π) (1 − x²)^{−1/2}  ⟹  p_{κ,1}(x) ≤ (2/π) (1 − x²)^{−1/2},
    and use rejection sampling (proposal = Beta distribution).
    [Figure: p_{κ,1} for N = 5 and N = 10 against Bernstein's bound.]
    ¹² Lorch, L., 1983. Alternative proof of a sharpened form of Bernstein's inequality for Legendre polynomials. Applicable Analysis, 14(3), pp.237-240.
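A small sketch of this rejection sampler, assuming the Legendre family is orthonormalized as L_n = √(2n+1) P_n under ω = Uniform[−1, 1], and enlarging the envelope by M = (N − 1 + π/2)/N so that the degree-0 term L_0 = 1 (not covered by Bernstein's bound, which is stated for n ≥ 1) is also dominated.

```python
import numpy as np
from numpy.polynomial import legendre as Leg

rng = np.random.default_rng(0)

def christoffel_density(x, N):
    """(1/N) * sum_{n=0}^{N-1} L_n(x)^2 w.r.t. omega = Uniform[-1, 1],
    with L_n = sqrt(2n+1) P_n the Legendre polynomials orthonormal under omega."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(N):
        c = np.zeros(n + 1); c[n] = 1.0
        total += (2 * n + 1) * Leg.legval(x, c) ** 2
    return total / N

def sample_christoffel(N):
    """Rejection sampling with an arcsine proposal (Beta(1/2, 1/2) mapped to [-1, 1]),
    whose density w.r.t. omega is (2/pi) / sqrt(1 - x^2); the envelope constant M
    makes christoffel_density(x, N) <= M * (2/pi) / sqrt(1 - x^2) hold everywhere."""
    M = (N - 1 + np.pi / 2) / N
    while True:
        x = 2.0 * rng.beta(0.5, 0.5) - 1.0
        envelope = M * (2.0 / np.pi) / np.sqrt(1.0 - x**2)
        if rng.uniform() * envelope < christoffel_density(x, N):
            return x

print([round(sample_christoffel(10), 3) for _ in range(8)])
```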
  50. Perspectives: efficient sampling in continuous domain? One sample from a projection DPP (of cardinality N) requires drawing on average N² samples from the Christoffel distribution as a proposal: sampling gets harder as N gets bigger.
    [Figure: the successive conditional densities during one run of the sequential sampler.]
  51. Perspectives: efficient sampling in continuous domain? Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound for the inverse Christoffel function + looking for a sampling algorithm from the upper bound. How to address these issues in practice?
  52. Perspectives: efficient sampling in continuous domain? Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound for the inverse Christoffel function + looking for a sampling algorithm from the upper bound. How to address these issues in practice? We may:
    Look for upper bounds or asymptotics of the inverse of the Christoffel function in some RKHSs on general domains¹³.
    ¹³ Dolbeault, M. and Cohen, A., 2022. Optimal sampling and Christoffel functions on general domains. Constructive Approximation, 56(1), pp.121-163.
  53. Perspectives: efficient sampling in continuous domain? Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound for the inverse Christoffel function + looking for a sampling algorithm from the upper bound. How to address these issues in practice? We may:
    Look for upper bounds or asymptotics of the inverse of the Christoffel function in some RKHSs on general domains¹³.
    Work with continuous volume sampling → no need for a spectral decomposition¹⁴.
    ¹³ Dolbeault, M. and Cohen, A., 2022. Optimal sampling and Christoffel functions on general domains. Constructive Approximation, 56(1), pp.121-163.
    ¹⁴ Belhadji, A., Bardenet, R. and Chainais, P., 2020. Kernel interpolation with continuous volume sampling. In International Conference on Machine Learning (pp. 725-735). PMLR.
  54. Perspectives: extension to atomic measure reconstruction? Given a class of objects M, is it possible to approximate the elements of M using their evaluations on some functionals: L_1(μ), ..., L_N(μ) → μ̂ ≈ μ, with μ ∈ M?
    The objects: functions | atomic measures.
    The class M: an RKHS | not a Hilbert space.
    The functionals L_1, ..., L_N: μ ↦ μ(x_j) | μ ↦ ∫ e^{i ω_j^T x} dμ(x).
    Distance-preserving property: IOP | RIP¹⁵.
    Decoding: f̂_{LS,x}, f̂_{OKA,x}, ... | CL-OMP, Mean-shift¹⁶.
    ¹⁵ Belhadji, A. and Gribonval, R., 2022. Revisiting RIP guarantees for sketching operators on mixture models.
    ¹⁶ Belhadji, A. and Gribonval, R., 2022. Sketch and shift: a robust decoder for compressive clustering.
  55. References.
    A. Belhadji, R. Bardenet, and P. Chainais. Kernel quadrature with DPPs. NeurIPS, 2019.
    A. Belhadji, R. Bardenet, and P. Chainais. Kernel interpolation with continuous volume sampling. ICML, 2020.
    A. Belhadji. An analysis of Ermakov-Zolotukhin quadrature using kernels. NeurIPS, 2021.
    A. Belhadji, R. Bardenet, and P. Chainais. Signal reconstruction using determinantal sampling. arXiv:2310.09437.
  56. Numerical simulations. F = the Gaussian space, X = R. We report ε_m(N) = E_K ‖ f − f̂_{OKA,x} ‖²_F with κ = K, where f = μ_{e_m}.
    Figure: the squared interpolation error for e_1 (left) vs. e_15 (right), for DPPKQ, DPPKQ (UB), MCKQ, SBQ and MC.
  57. Perspectives: efficient sampling in continuous domain? If κ = k, the sequential algorithm is an approximation.
    Theorem (Rezaei and Gharan (2019)). Let x be the output of the sequential algorithm for κ = k; then x follows a density f_seq that satisfies f_seq(x) ≤ N!² f_k(x).
    An MCMC algorithm for CVS [Rezaei and Gharan (2019)]: CVS is the stationary distribution of a Markov chain that can be implemented in a fully kernelized way, using only evaluations of the kernel k. f_seq is the initialization of the Markov chain.
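The slide does not spell the chain out; below is a minimal sketch of one natural fully kernelized Metropolis chain targeting continuous volume sampling (single-coordinate refresh from ω, accepted with ratio det k(x′)/det k(x)). This is not necessarily the exact algorithm of Rezaei and Gharan, and for simplicity the i.i.d. initialization replaces f_seq.

```python
import numpy as np

rng = np.random.default_rng(0)

def k1(x, y):
    """Periodic Sobolev kernel of order s = 1 on [0, 1] (running example)."""
    t = np.mod(np.subtract.outer(x, y), 1.0)
    return 1.0 + 2.0 * np.pi**2 * (t**2 - t + 1.0 / 6.0)

def logdet(x):
    sign, ld = np.linalg.slogdet(k1(x, x))
    return ld if sign > 0 else -np.inf

def cvs_mcmc(N, n_steps=2000):
    """Metropolis chain whose stationary distribution is continuous volume sampling,
    i.e. the density proportional to det k(x) w.r.t. omega^N (omega = Uniform[0, 1]).
    Move: resample one coordinate from omega, accept with min(1, det k(x') / det k(x)).
    Only kernel evaluations are needed (fully kernelized)."""
    x = rng.uniform(size=N)                # crude i.i.d. initialization (the deck uses f_seq)
    cur = logdet(x)
    for _ in range(n_steps):
        i = rng.integers(N)
        x_new = x.copy()
        x_new[i] = rng.uniform()
        new = logdet(x_new)
        if np.log(rng.uniform()) < new - cur:
            x, cur = x_new, new
    return x

print(np.sort(cvs_mcmc(8)))                # a repulsive configuration in [0, 1]
```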
  58. Numerical simulations. F = the periodic Sobolev space of order s = 2, X = [0, 1]. We report ε_m(N) = E_κ ‖ f − f̂_{OKA,x} ‖²_F with f = μ_{e_m}, for m ∈ {1, 5, 7}, under CVS (κ = k).
    [Figure: log10 of the squared error vs. N, theoretical vs. empirical, for m = 1, 5, 7.]
  59. Kernel quadrature: examples. Consider X = [0, 1], s ∈ N* and
    k_s(x, y) := 1 + ((−1)^{s−1} (2π)^{2s} / (2s)!) B_{2s}({x − y}),
    where {x − y} is the fractional part of x − y and B_{2s} is the Bernoulli polynomial of degree 2s: B_0(x) = 1, B_2(x) = x² − x + 1/6, B_4(x) = x⁴ − 2x³ + x² − 1/30, ...
    [Figure: the kernel sections k_s(x_0, ·) for s = 1, ..., 5.]
  60. Kernel quadrature: examples. The corresponding RKHS is the periodic Sobolev space of order s:
    F_s = { f ∈ L²([0, 1]) : f(0) = f(1), f, f′, ..., f^{(s)} ∈ L²([0, 1]) },
    and the corresponding norm is the Sobolev norm:
    ‖f‖²_{F_s} = ( ∫_0^1 f(x) dx )² + Σ_{m ∈ N*} m^{2s} | ∫_0^1 f(x) e^{−2πimx} dx |².
    We have ··· ⊂ F_4 ⊂ F_3 ⊂ F_2 ⊂ F_1.
    [Figure: the kernel sections k_s(x_0, ·) for s = 1, ..., 5.]
  61. Kernel quadrature: an example. The kernel k_s satisfies the following identity [Wahba 90]:
    k_s(x, y) = 1 + Σ_{m ∈ N*} (1/m^{2s}) cos(2πm(x − y));
    it is equivalent to the Mercer decomposition with σ_m = O(m^{−2s}) and (e_m)_{m ∈ N*} the Fourier family.
    [Figure: the kernel sections k_s(x_0, ·) for s = 1, ..., 5.]
  62. Kernel quadrature: Korobov space. An extension to [0, 1]^d is possible via tensorization:
    k_{d,s}(x, y) = Π_{δ ∈ [d]} k_s(x_δ, y_δ) = Σ_{u ∈ N^d} ( Π_{δ ∈ [d]} σ_{u_δ} ) ( Π_{δ ∈ [d]} e_{u_δ}(x_δ) ) ( Π_{δ ∈ [d]} e_{u_δ}(y_δ) ).
    The eigenvalue / its multiplicity: 1 / 3^d; 1/2^{2s} / d·(d + 1); 1/6^{2s} / d(3d − 1); ...
  63. Kernel quadrature: Korobov space. We have σ_{N+1} = O( (log N)^{2s(d−1)} N^{−2s} ) [Bach 2017].
    Figure: (Left) d = 2 and s = 2; (Right) d = 3 and s = 2; σ_{N+1} against (log N)^{2s(d−1)} N^{−2s}.
    Compare it to QMC rates:
    Halton sequence, s = 1: O( (log N)^{2d} N^{−2} ).
    Hammersley sequence, s = 1: O( (log N)^{2(d−1)} N^{−2} ).
    Higher order digital nets, s ∈ N*: O( (log N)^{2sd} N^{−2s} ).
  64. Numerical simulations (EZQ vs OKQ). Let F be the RKHS corresponding to the Sobolev space of periodic functions of order s = 3, and g ∈ {φ_1, φ_10, φ_20}.
    Figure: squared worst-case integration error vs. the number of nodes N for EZQ (left) and OKQ (right) in F.
  65. Main results: a tractable formula under volume sampling. The theoretical guarantee in the case κ = k is given in the following result.
    Theorem (Belhadji, Bardenet and Chainais (2020)). Let g = Σ_{m ∈ N*} ⟨g, e_m⟩_ω e_m; then
    E_k ‖ μ_g − Π_{T(x)} μ_g ‖²_F = Σ_{m ∈ N*} ⟨g, e_m⟩²_ω ε_m(N),
    ε_m(N) = E_k ‖ μ_{e_m} − Π_{T(x)} μ_{e_m} ‖²_F = σ_m ( Σ_{|T|=N, m ∉ T} Π_{t ∈ T} σ_t ) / ( Σ_{|T|=N} Π_{t ∈ T} σ_t ).
    How good is it?
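The epsilons are ratios of elementary symmetric polynomials of the spectrum, so they can be evaluated with the standard recursion. A small sketch below, with a truncated spectrum σ_m = m^{−2s} (s = 3) as an illustrative assumption; the truncation makes the values approximate.

```python
import numpy as np

def elem_symmetric(sigmas, N):
    """Elementary symmetric polynomials e_0, ..., e_N of the values in `sigmas`."""
    e = np.zeros(N + 1)
    e[0] = 1.0
    for s in sigmas:
        for n in range(N, 0, -1):
            e[n] += s * e[n - 1]
    return e

def eps_m(sigmas, N, m):
    """eps_m(N) = sigma_m * e_N(sigma without sigma_m) / e_N(sigma); m is 0-based here."""
    e_all = elem_symmetric(sigmas, N)
    e_wo = elem_symmetric(np.delete(sigmas, m), N)
    return sigmas[m] * e_wo[N] / e_all[N]

# Truncated spectrum sigma_m = m^{-2s} with s = 3 (periodic Sobolev example).
s, L = 3, 400
sigmas = np.arange(1, L + 1, dtype=float) ** (-2 * s)
for N in (5, 10, 20):
    vals = [eps_m(sigmas, N, m) for m in range(5)]
    print(N, np.round(np.log10(vals), 2), " log10(sigma_{N+1}) =", round(np.log10(sigmas[N]), 2))
```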
  66. Main results: how large are the epsilons? Theorem (Belhadji, Bardenet and Chainais (2020)). If there exists B > 0 such that
    min_{M ∈ [N]} ( Σ_{m ≥ M} σ_m ) / ( (N − M + 1) σ_{N+1} ) ≤ B,
    then sup_{‖g‖_ω ≤ 1} E_k ‖ μ_g − Π_{T(x)} μ_g ‖²_F = ε_1(N) ≤ (1 + B) σ_{N+1}.
    Examples: σ_N = N^{−2s} gives B = (1 + 1/(2s − 1))^{2s}; σ_N = α^N gives B = α/(1 − α).
  67. Main results: how large are the epsilons?
    Figure: log10 ε_m(N) as a function of N when σ_N = N^{−2s}, with s = 3, for m = 1, ..., 5, against the upper bound (1 + B) σ_{N+1}.
  68. Main results: how large are the epsilons?
    Figure: other examples in different RKHSs (log10 ε_m(N) vs. N for m = 1, ..., 5, against the upper bound (1 + B) σ_{N+1}).
  69. Intuitions. Observe that
    E_κ ‖ μ_g − Π_{T(x)} μ_g ‖²_F = E_κ ‖ O_x Σ g ‖²_F = E_κ ‖ O_x Σ_N g + O_x Σ_N^⊥ g ‖²_F ≤ 2 ( E_κ ‖ O_x Σ_N g ‖²_F + E_κ ‖ O_x Σ_N^⊥ g ‖²_F ),
    where O_x = I_F − Π_{T(x)} = Π_{T(x)^⊥}, Σ_N = Σ_{m=1}^N σ_m e_m ⊗ e_m, and Σ_N^⊥ = Σ_{m=N+1}^{+∞} σ_m e_m ⊗ e_m.
  70. Intuitions. Observe that
    E_κ ‖ μ_g − Π_{T(x)} μ_g ‖²_F = E_κ ‖ O_x Σ g ‖²_F = E_κ ‖ O_x Σ_N g + O_x Σ_N^⊥ g ‖²_F ≤ 2 ( E_κ ‖ O_x Σ_N g ‖²_F + E_κ ‖ O_x Σ_N^⊥ g ‖²_F ),
    where O_x = I_F − Π_{T(x)} = Π_{T(x)^⊥}, Σ_N = Σ_{m=1}^N σ_m e_m ⊗ e_m, and Σ_N^⊥ = Σ_{m=N+1}^{+∞} σ_m e_m ⊗ e_m.
    Since O_x = Π_{T(x)^⊥} is an orthogonal projection,
    ‖ O_x Σ_N^⊥ g ‖²_F ≤ ‖ Σ_N^⊥ g ‖²_F = Σ_{m ≥ N+1} σ_m ⟨g, e_m⟩²_ω ≤ σ_{N+1} ‖g‖²_ω.
  71. Intuitions. Let m ∈ N* and put g = e_m:
    ‖ O_x Σ_N e_m ‖²_F = σ_m ‖ O_x e^F_m ‖²_F = σ_m ‖ e^F_m − Π_{T(x)} e^F_m ‖²_F.
    The error term is the product of two terms: the eigenvalue σ_m, and the reconstruction term ‖ e^F_m − Π_{T(x)} e^F_m ‖²_F ∈ [0, 1].
  72. Intuitions.
    σ_m ‖ e^F_m − Π_{T(x)} e^F_m ‖²_F = σ_m ( 1 − ‖ Π_{T(x)} e^F_m ‖²_F ).
    [Figure: e_m and the kernel sections k(x_n, ·) for several node configurations.]
  73. Intuitions. Theorem. Under the distribution of CVS (κ = k), we have
    ∀ m ∈ N*, E_k ‖ Π_{T(x)} e^F_m ‖²_F = ( Σ_{|T|=N, m ∈ T} Π_{t ∈ T} σ_t ) / ( Σ_{|T|=N} Π_{t ∈ T} σ_t ),
    and ∀ m ≠ m′ ∈ N*, E_k ⟨ Π_{T(x)} e^F_m , Π_{T(x)} e^F_{m′} ⟩_F = 0.
  74. Intuitions.
    E_k τ^F_m(x) := E_k ‖ Π_{T(x)} e^F_m ‖²_F = ( Σ_{|T|=N, m ∈ T} Π_{t ∈ T} σ_t ) / ( Σ_{|T|=N} Π_{t ∈ T} σ_t ).
    [Figure: E_k τ^F_m(x) vs. N for m = 1, ..., 5.]
    Under CVS, T(x) gets closer to E_N = Span(e^F_m)_{m ∈ [N]} as N → +∞.
  75. Intuitions. Alternatively, we can quantify the proximity between the subspaces T(x) and E^F_N using the principal angles (θ_i(T(x), E^F_N))_{i ∈ [N]}, where E^F_N = Span(e^F_m)_{m ∈ [N]} and T(x) = Span(k(x_n, ·))_{n ∈ [N]}.
    For example, we have
    sup_{m ∈ [N]} ‖ e^F_m − Π_{T(x)} e^F_m ‖²_F ≤ 1 / cos²( θ_N(T(x), E^F_N) ) − 1.
  76. The determinantal distributions: link with DPPs. Theorem (Belhadji, Bardenet and Chainais (2020)). For U ⊂ N*, define the kernel K_U(x, y) = Σ_{u ∈ U} e_u(x) e_u(y). We have
    f_k(x_1, ..., x_N) = Σ_{|U|=N} ( Π_{u ∈ U} σ_u / Σ_{|W|=N} Π_{w ∈ W} σ_w ) (1/N!) Det( K_U(x) ).
    The largest weight in the mixture corresponds to U = [N]: K(x, y) = Σ_{m ∈ [N]} e_m(x) e_m(y).
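A small sketch of these mixture weights on a truncated spectrum (an approximation), checking that the heaviest component is indeed U = [N]; the spectrum σ_m = m^{−2s} is an illustrative assumption.

```python
import numpy as np
from itertools import combinations

# Truncated spectrum sigma_m = m^{-2s} with s = 1 (periodic Sobolev example).
s, L, N = 1, 12, 4
sigmas = np.arange(1, L + 1, dtype=float) ** (-2 * s)

# Mixture weight of the projection DPP indexed by U: prod_{u in U} sigma_u, normalized over |U| = N.
subsets = list(combinations(range(L), N))
weights = np.array([np.prod(sigmas[list(U)]) for U in subsets])
weights /= weights.sum()

best = subsets[int(np.argmax(weights))]
print("heaviest component U =", [u + 1 for u in best])    # expected: [1, 2, ..., N]
print("its weight:", round(float(weights.max()), 4))
```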