Slide 1

Slide 1 text

Signal Reconstruction using Determinantal Sampling
Ayoub Belhadji (OCKHAM team, LIP, ENS de Lyon; Univ Lyon, Inria, CNRS, UCBL)
Joint work with Rémi Bardenet and Pierre Chainais (Centrale Lille, CRIStAL, Université de Lille, CNRS)
07/12/2023, L2S - CentraleSupélec

Slide 2

Slide 2 text

Introduction
A determinantal point process (DPP) is a distribution over subsets I of some set X.
[Figure: a ground set X with a sampled subset I = {x1, x2, x3, x4}]

Slide 3

Slide 3 text

Introduction
A determinantal point process (DPP) is a distribution over subsets I of some set X...
[Figure: a ground set X with a sampled subset I = {x1, x2, x3, x4}]
...with the negative correlation property:
$$\forall B, B' \subset \mathcal{X}, \quad B \cap B' = \emptyset \implies \mathrm{Cov}\big(n_{\mathbf{x}}(B), n_{\mathbf{x}}(B')\big) \leq 0, \quad \text{where } n_{\mathbf{x}}(B) := |B \cap \mathbf{x}|.$$

Slide 4

Slide 4 text

Introduction
[Figure: three realizations of a DPP on [0, 1]² (top) vs. three realizations of i.i.d. particles on [0, 1]² (bottom)]

Slide 5

Slide 5 text

Introduction
[Figure: three realizations of a DPP on the unit circle (top) vs. three realizations of i.i.d. particles on the unit circle (bottom)]

Slide 6

Slide 6 text

Introduction
Early appearances of DPPs may be traced back to the work of Dyson (1962)¹ and Ginibre (1965)².
A generic definition, valid for a general pair (X, ω), is given in the work of Macchi (1975)³.
¹ Dyson, F.J., 1962. Statistical theory of the energy levels of complex systems. I. Journal of Mathematical Physics, 3(1), pp.140-156.
² Ginibre, J., 1965. Statistical ensembles of complex, quaternion, and real matrices. Journal of Mathematical Physics, 6(3), pp.440-449.
³ Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.

Slide 7

Slide 7 text

Introduction
Early appearances of DPPs may be traced back to the work of Dyson (1962)¹ and Ginibre (1965)².
A generic definition, valid for a general pair (X, ω), is given in the work of Macchi (1975)³.
Definition (informal): Determinantal Point Process
Given a metric space X and a measure ω, a DPP satisfies
$$\mathbb{P}_{\mathrm{DPP}}\big(\exists \text{ at least } k \text{ points, one in each } B_i,\ i = 1, \dots, k\big) = \int_{B_1 \times \cdots \times B_k} \underbrace{\mathrm{Det}\,\big[\kappa(x_i, x_j)\big]_{i,j \in [k]}}_{\text{kernel matrix}} \, \mathrm{d}\omega(x_1) \cdots \mathrm{d}\omega(x_k)$$
for a kernel κ : X × X → R.
¹ Dyson, F.J., 1962. Statistical theory of the energy levels of complex systems. I. Journal of Mathematical Physics, 3(1), pp.140-156.
² Ginibre, J., 1965. Statistical ensembles of complex, quaternion, and real matrices. Journal of Mathematical Physics, 6(3), pp.440-449.
³ Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.

Slide 8

Slide 8 text

Introduction
Sampling using DPPs until 2019...
Discrete: learning on a budget⁴, node selection in a graph⁵, feature selection (universal results⁷)
Continuous: numerical integration, X ⊂ R⁶, X = [0, 1]^d (specific results⁸)
⁴ [Deshpande et al. (2006)], [Derezinski et al. (2017, 2018, 2019)]
⁵ [Tremblay et al. (2017)]
⁶ [Lambert (2018)]
⁷ [Belhadji et al. (2018)]
⁸ [Bardenet and Hardy (2016)]

Slide 9

Slide 9 text

Introduction
Sampling using DPPs until 2019...
Discrete: learning on a budget⁴, node selection in a graph⁵, feature selection (universal results⁷)
Continuous: numerical integration, X ⊂ R⁶, X = [0, 1]^d (specific results⁸)
Universal results for DPP-based sampling in a continuous domain?
⁴ [Deshpande et al. (2006)], [Derezinski et al. (2017, 2018, 2019)]
⁵ [Tremblay et al. (2017)]
⁶ [Lambert (2018)]
⁷ [Belhadji et al. (2018)]
⁸ [Bardenet and Hardy (2016)]

Slide 10

Slide 10 text

Outline
1. The setting
2. DPPs for numerical integration in RKHSs
3. Beyond numerical integration
4. Numerical simulations
5. Perspectives

Slide 11

Slide 11 text

Towards a universal construction of quadrature rules
[Diagram: families of quadrature rules and their analysis tools: QMC (Weyl criterion), Gaussian quadrature (OPE), kernel quadrature, vanilla Monte Carlo (CLTs), DPP-based quadrature (?)]

Slide 12

Slide 12 text

Towards a universal construction of quadrature rules
[Diagram: families of quadrature rules and their analysis tools: QMC (Weyl criterion), Gaussian quadrature (OPE), kernel quadrature, vanilla Monte Carlo (CLTs), DPP-based quadrature (?)]
A universal construction of quadrature rules using DPPs?

Slide 13

Slide 13 text

Kernel-based analysis of quadrature rules
The crux of the kernel-based analysis of a quadrature rule is the study of the worst-case error on the unit ball of an RKHS F:
$$\sup_{\|f\|_{\mathcal{F}} \leq 1} \Big| \underbrace{\int_{\mathcal{X}} f(x) g(x) \, \mathrm{d}\omega(x)}_{\text{integral}} - \underbrace{\sum_{i \in [N]} w_i f(x_i)}_{\text{quadrature rule}} \Big|$$
[Diagram: the RKHS F embedded in L²(ω)]

Slide 14

Slide 14 text

Reproducing kernel Hilbert spaces
Definition
An RKHS F is a Hilbert space associated to a kernel k, satisfying:
- ∀x ∈ X, f ↦ f(x) is continuous;
- ∀(x, f) ∈ X × F, ⟨f, k(x, ·)⟩_F = f(x).
Example: X = [0, 1] and k₁(x, y) := 1 + 2π² B₂({x − y}), where {x − y} is the fractional part of x − y, and B₂(x) = x² − x + 1/6.
[Figure: plot of x ↦ k₁(x₀, x)]
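As a quick illustration (not from the slides), here is a minimal Python sketch of this example kernel, together with its Fourier-series form used later in the deck; the truncation level M is an arbitrary choice made only to check that the two expressions agree.

```python
import numpy as np

def frac(t):
    """Fractional part, mapped to [0, 1)."""
    return t - np.floor(t)

def B2(t):
    """Bernoulli polynomial of degree 2."""
    return t**2 - t + 1.0 / 6.0

def k1(x, y):
    """Periodic Sobolev kernel of order s = 1 on [0, 1] (closed form)."""
    return 1.0 + 2.0 * np.pi**2 * B2(frac(x - y))

def k1_fourier(x, y, M=2000):
    """Same kernel through its (truncated) Mercer/Fourier expansion."""
    m = np.arange(1, M + 1)
    return 1.0 + 2.0 * np.sum(np.cos(2 * np.pi * m * (x - y)) / m**2)

if __name__ == "__main__":
    x, y = 0.3, 0.75
    print(k1(x, y), k1_fourier(x, y))   # the two values should agree closely
```

The factor 2 in the Fourier form reflects the cosine and sine eigenfunctions of each frequency m sharing the eigenvalue m^(-2).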

Slide 15

Slide 15 text

Embeddings as elements of the RKHS
F contains smooth functions.
Definition: the embedding of an element of L²(ω)
Given g ∈ L²(ω), the embedding of g is defined by
$$\mu_g = \Sigma g := \int_{\mathcal{X}} k(x, \cdot)\, g(x) \, \mathrm{d}\omega(x) \in \mathcal{F},$$
where Σ : L²(ω) → L²(ω) is the integration operator associated to (k, ω).
[Figure: three examples of g and the corresponding embedding µ_g (s = 1)]

Slide 16

Slide 16 text

Embeddings and quadrature rules
Properties
The reproducibility of integrals:
$$\forall f \in \mathcal{F}, \quad \langle f, \mu_g \rangle_{\mathcal{F}} = \int_{\mathcal{X}} f(x) g(x) \, \mathrm{d}\omega(x).$$
The worst integration error on the unit ball of F:
$$\sup_{\|f\|_{\mathcal{F}} \leq 1} \Big| \underbrace{\int_{\mathcal{X}} f(x) g(x) \, \mathrm{d}\omega(x)}_{\text{integral}} - \underbrace{\sum_{i=1}^{N} w_i f(x_i)}_{\text{quadrature rule}} \Big| = \underbrace{\Big\| \mu_g - \sum_{i=1}^{N} w_i k(x_i, \cdot) \Big\|_{\mathcal{F}}}_{\text{WCE}}.$$
The study of quadrature rules boils down to the study of kernel approximations of embeddings.

Slide 17

Slide 17 text

Embeddings and quadrature rules
A sanity check using the 'Monte Carlo quadrature': let x₁, ..., x_N be i.i.d. ∼ ω and w_i = 1/N. Under some assumptions on k, we have
$$\mathbb{E}\, \Big\| \mu_g - \sum_{i=1}^{N} \frac{1}{N} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = O(1/N).$$
We recover the 'Monte Carlo rate' O(1/N).

Slide 18

Slide 18 text

Embeddings and quadrature rules
A sanity check using the 'Monte Carlo quadrature': let x₁, ..., x_N be i.i.d. ∼ ω and w_i = 1/N. Under some assumptions on k, we have
$$\mathbb{E}\, \Big\| \mu_g - \sum_{i=1}^{N} \frac{1}{N} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = O(1/N).$$
We recover the 'Monte Carlo rate' O(1/N).
Can we do better?
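As a quick numerical check (not from the slides), the squared WCE has the closed form ‖µ_g‖²_F − 2 Σ_i w_i µ_g(x_i) + Σ_{i,j} w_i w_j k(x_i, x_j); for g ≡ 1 and the periodic Sobolev kernel of order s = 1 above, µ_g ≡ 1 and ‖µ_g‖²_F = 1, so the Monte Carlo rate can be observed directly. A sketch, with arbitrary sample sizes:

```python
import numpy as np

def k1(x, y):
    """Periodic Sobolev kernel of order s = 1 on [0, 1]."""
    d = x - y - np.floor(x - y)
    return 1.0 + 2.0 * np.pi**2 * (d**2 - d + 1.0 / 6.0)

def sq_wce_mc(nodes):
    """Squared WCE of the Monte Carlo quadrature for g = 1 (so mu_g = 1 and ||mu_g||_F^2 = 1)."""
    N = len(nodes)
    w = np.full(N, 1.0 / N)
    K = k1(nodes[:, None], nodes[None, :])
    mu_at_nodes = np.ones(N)
    return 1.0 - 2.0 * w @ mu_at_nodes + w @ K @ w

rng = np.random.default_rng(0)
for N in [10, 100, 1000]:
    errs = [sq_wce_mc(rng.uniform(size=N)) for _ in range(100)]
    print(N, np.mean(errs))   # decays roughly like (pi^2 / 3) / N
```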

Slide 19

Slide 19 text

The optimal kernel quadrature
Definition: the optimal kernel quadrature
Given a set of nodes x = {x₁, ..., x_N} such that the kernel matrix
$$\mathbf{K}(\mathbf{x}) := \begin{pmatrix} k(x_1, x_1) & \dots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \dots & k(x_N, x_N) \end{pmatrix}$$
is non-singular, the optimal kernel quadrature is the couple (x, ŵ) such that
$$\Big\| \mu_g - \sum_{i=1}^{N} \hat{w}_i(\mu_g)\, k(x_i, \cdot) \Big\|_{\mathcal{F}} = \min_{w \in \mathbb{R}^N} \Big\| \mu_g - \sum_{i=1}^{N} w_i k(x_i, \cdot) \Big\|_{\mathcal{F}}.$$
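The minimizer is the orthogonal projection of µ_g onto span{k(x_i, ·)}, so the optimal weights solve the linear system K(x) ŵ = (µ_g(x_i))_i. A minimal sketch (not from the slides), again for g ≡ 1 and the periodic Sobolev kernel of order s = 1, where µ_g ≡ 1:

```python
import numpy as np

def k1(x, y):
    d = x - y - np.floor(x - y)
    return 1.0 + 2.0 * np.pi**2 * (d**2 - d + 1.0 / 6.0)

def okq_weights(nodes, mu_at_nodes):
    """Optimal kernel quadrature weights: solve K(x) w = (mu_g(x_i))_i."""
    K = k1(nodes[:, None], nodes[None, :])
    return np.linalg.solve(K, mu_at_nodes)

def sq_wce(nodes, w, mu_sq_norm, mu_at_nodes):
    """||mu_g - sum_i w_i k(x_i, .)||_F^2 from kernel evaluations."""
    K = k1(nodes[:, None], nodes[None, :])
    return mu_sq_norm - 2.0 * w @ mu_at_nodes + w @ K @ w

rng = np.random.default_rng(1)
nodes = rng.uniform(size=20)
mu_at_nodes = np.ones_like(nodes)                 # g = 1  =>  mu_g = 1, ||mu_g||_F^2 = 1
w_hat = okq_weights(nodes, mu_at_nodes)
print(sq_wce(nodes, w_hat, 1.0, mu_at_nodes))                   # optimal weights
print(sq_wce(nodes, np.full(20, 1.0 / 20), 1.0, mu_at_nodes))   # vs. Monte Carlo weights
```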

Slide 20

Slide 20 text

The convergence of the optimal kernel quadrature
The study of the convergence rate of the optimal kernel quadrature was carried out in several works:
- X = [0, 1], Sobolev space, uniform grid: O(N^{−2s}) [Novak et al., 2015] (g is cos or sin), [Bojanov, 1981]
- X = [0, 1]^d, tensorized Sobolev space, QMC sequence: QMC rates [Briol et al., 2019] (g is constant)
- X = [0, 1]^d, tensorized Sobolev space, -: O(N^{−2s/d}) [Wendland, 2004]
- X = R^d, Gaussian kernel, tensorized Hermite roots: O(exp(−αN)) [Karvonen et al., 2019] (+ assumptions)
- ...
Limitation: the analysis is too specific to the RKHS F, to x, to g...
Any universal analysis of the OKQ?

Slide 21

Slide 21 text

A spectral characterization of the RKHS
What is common among existing results?
Assumptions:
- There exists a spectral decomposition (e_m, σ_m)_{m∈N*} of Σ, where (e_m)_{m∈N*} is an o.n.b. of L²(ω) and σ₁ ≥ σ₂ ≥ ... > 0 with Σ_{m∈N*} σ_m < +∞.
- The RKHS F is dense in L²(ω).

Slide 22

Slide 22 text

A spectral characterization of the RKHS
What is common among existing results?
Assumptions:
- There exists a spectral decomposition (e_m, σ_m)_{m∈N*} of Σ, where (e_m)_{m∈N*} is an o.n.b. of L²(ω) and σ₁ ≥ σ₂ ≥ ... > 0 with Σ_{m∈N*} σ_m < +∞.
- The RKHS F is dense in L²(ω).
$$\|f\|_{\mathcal{F}}^2 = \sum_{m \in \mathbb{N}^*} \frac{\langle f, e_m \rangle_\omega^2}{\sigma_m} \implies \{\|f\|_{\mathcal{F}} = 1\} \text{ is an ellipsoid in } L^2(\omega)$$
$$\sum_{m \in \mathbb{N}^*} \frac{\langle \mu_g, e_m \rangle_\omega^2}{\sigma_m^2} = \sum_{m \in \mathbb{N}^*} \langle g, e_m \rangle_\omega^2 \implies \{\mu_g \,;\, \|g\|_\omega = 1\} \text{ is an ellipsoid in } L^2(\omega)$$

Slide 23

Slide 23 text

A spectral characterization of the RKHS
[Diagram: the nested balls B_{L²(ω)}(0, 1) ⊃ B_F(0, 1) = Σ^{1/2} B_{L²(ω)}(0, 1) ⊃ Σ B_{L²(ω)}(0, 1), with the maps Σ^{1/2} : L²(ω) → F = Σ^{1/2} L²(ω) and Σ^{1/2} : F → Σ L²(ω)]
Examples (X | F or k | σ_{N+1} | (e_m)):
- [0, 1] | Sobolev | O(N^{−2s}) | Fourier
- [0, 1]^d | Korobov | O(log(N)^{2s(d−1)} N^{−2s}) | ⊗ of Fourier
- [0, 1]^d | Sobolev | O(N^{−2s/d}) | "Fourier"
- S^d | dot product | "-" | spherical harmonics
- R | Gaussian | O(e^{−αN}) | Hermite polynomials
- R^d | Gaussian | O(e^{−α_d N^{1/d}}) | ⊗ of Hermite polynomials
- ...

Slide 24

Slide 24 text

The determinantal distributions
Definition
Let κ be a kernel such that ∫_X κ(x, x) dω(x) < +∞. The function
$$p_\kappa(x_1, \dots, x_N) \propto \mathrm{Det}\, \boldsymbol{\kappa}(\mathbf{x}) = \mathrm{Det} \begin{pmatrix} \kappa(x_1, x_1) & \dots & \kappa(x_1, x_N) \\ \vdots & \ddots & \vdots \\ \kappa(x_N, x_1) & \dots & \kappa(x_N, x_N) \end{pmatrix}$$
is a p.d.f. on X^N w.r.t. ω^{⊗N}.

Slide 25

Slide 25 text

The determinantal distributions
Definition
Let κ be a kernel such that ∫_X κ(x, x) dω(x) < +∞. The function
$$p_\kappa(x_1, \dots, x_N) \propto \mathrm{Det}\, \boldsymbol{\kappa}(\mathbf{x}) = \mathrm{Det} \begin{pmatrix} \kappa(x_1, x_1) & \dots & \kappa(x_1, x_N) \\ \vdots & \ddots & \vdots \\ \kappa(x_N, x_1) & \dots & \kappa(x_N, x_N) \end{pmatrix}$$
is a p.d.f. on X^N w.r.t. ω^{⊗N}.
Theorem (Belhadji et al. (2020))
For (x₁, ..., x_N) ∼ p_κ, the set x := {x₁, ..., x_N} follows the distribution of a mixture of DPPs.
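To make the definition concrete (not from the slides), the unnormalized density is just the determinant of the kernel matrix of the configuration; it vanishes whenever two points come close, which is the repulsion mechanism. A sketch with a Gaussian kernel as an arbitrary choice of κ:

```python
import numpy as np

def gauss_kernel(x, y, ell=0.2):
    """An arbitrary choice of kernel kappa on X = [0, 1]."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2 * ell**2))

def unnormalized_density(points):
    """p_kappa(x_1, ..., x_N) up to the normalizing constant: Det kappa(x)."""
    return np.linalg.det(gauss_kernel(points, points))

spread = np.array([0.1, 0.4, 0.7, 0.95])
clustered = np.array([0.48, 0.50, 0.52, 0.95])
print(unnormalized_density(spread))     # comparatively large
print(unnormalized_density(clustered))  # close to zero: nearby points are penalized
```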

Slide 26

Slide 26 text

Main results
Theorem (Belhadji, Bardenet and Chainais (2019))
Under the determinantal distribution corresponding to
$$\kappa(x, y) = K(x, y) := \sum_{n \in [N]} e_n(x) e_n(y),$$
x follows the distribution of a (projection) DPP, and we have
$$\mathbb{E}_{p_\kappa} \sup_{\|g\|_\omega \leq 1} \Big\| \mu_g - \sum_{i=1}^{N} \hat{w}_i(\mu_g)\, k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \leq 4 N^2 r_{N+1}, \qquad \text{where } r_{N+1} = \sum_{m=N+1}^{+\infty} \sigma_m.$$
Rates:
- σ_m = m^{−2s}: theoretical rate (N² r_{N+1}) = N³ O(σ_{N+1}); empirical rate O(σ_{N+1})
- σ_m = α^m: theoretical rate (N² r_{N+1}) = N² O(σ_{N+1}) ≈ O(σ_{N+1}); empirical rate O(σ_{N+1})

Slide 27

Slide 27 text

Main results
Theorem (Belhadji (2021))
Under the determinantal distribution corresponding to κ(x, y) = K(x, y) := Σ_{n∈[N]} e_n(x) e_n(y), x follows the distribution of a (projection) DPP, and we have
$$\forall g \in L^2(\omega), \quad \mathbb{E}_{p_\kappa} \Big\| \mu_g - \sum_{i=1}^{N} \hat{w}_i(\mu_g)\, k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \leq 4 \|g\|_\omega^2\, r_{N+1} = O(r_{N+1}),$$
where r_{N+1} := Σ_{m=N+1}^{+∞} σ_m.

Slide 28

Slide 28 text

Main results
Theorem (Belhadji, Bardenet and Chainais (2020))
Under the determinantal distribution corresponding to κ = k, we have
$$\forall g \in L^2(\omega), \quad \mathbb{E}_{p_\kappa} \Big\| \mu_g - \sum_{i=1}^{N} \hat{w}_i(\mu_g)\, k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = \sum_{m=1}^{+\infty} \langle g, e_m \rangle_\omega^2\, \epsilon_m(N) \leq \|g\|_\omega^2\, \epsilon_1(N),$$
where
$$\epsilon_m(N) = \sigma_m\, \frac{\sum_{T \subset \mathbb{N}^* \setminus \{m\},\, |T| = N} \prod_{t \in T} \sigma_t}{\sum_{T \subset \mathbb{N}^*,\, |T| = N} \prod_{t \in T} \sigma_t} = O(\sigma_{N+1}) \quad \text{(the optimal rate, Pinkus (1985))}.$$

Slide 29

Slide 29 text

The optimal kernel quadrature and kernel interpolation
The mixture corresponding to the OKQ is an interpolant: the function
$$\hat{\mu}_g := \sum_{i=1}^{N} \hat{w}_i(\mu_g)\, k(x_i, \cdot) \quad \text{satisfies} \quad \forall i \in [N],\ \hat{\mu}_g(x_i) = \mu_g(x_i).$$
[Figure: µ_g and its interpolant µ̂_g built from the k(x_n, ·), for two sets of nodes]
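This property follows directly from the linear system defining ŵ; a short check (not from the slides), reusing the sketch from the optimal-weights slide with g ≡ 1 and the periodic Sobolev kernel of order s = 1:

```python
import numpy as np

def k1(x, y):
    d = x - y - np.floor(x - y)
    return 1.0 + 2.0 * np.pi**2 * (d**2 - d + 1.0 / 6.0)

rng = np.random.default_rng(2)
nodes = rng.uniform(size=15)
K = k1(nodes[:, None], nodes[None, :])
mu_at_nodes = np.ones_like(nodes)            # g = 1  =>  mu_g = 1
w_hat = np.linalg.solve(K, mu_at_nodes)      # OKQ weights
mu_hat_at_nodes = K @ w_hat                  # hat{mu}_g evaluated at the nodes
print(np.max(np.abs(mu_hat_at_nodes - mu_at_nodes)))   # numerically zero: interpolation holds
```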

Slide 30

Slide 30 text

Main results
Theorem (Belhadji, Bardenet and Chainais (2020))
Under the determinantal distribution corresponding to κ = k, for r ∈ [0, 1/2], we have
$$\forall f \in \Sigma^{1/2 + r} L^2(\omega), \quad \mathbb{E}_{p_\kappa} \Big\| f - \sum_{i=1}^{N} \hat{w}_i(f)\, k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = O\big(\sigma_{N+1}^{2r}\big).$$
r = 0 → a generic element of F; r = 1/2 → an embedding of some g ∈ L²(ω).
[Diagram: the nested balls B_{L²(ω)}(0, 1) ⊃ Σ^{1/2} B_{L²(ω)}(0, 1) = B_F(0, 1) ⊃ Σ^{1/2+r} B_{L²(ω)}(0, 1) ⊃ Σ B_{L²(ω)}(0, 1)]

Slide 31

Slide 31 text

Main results
The RKHS norm ‖·‖_F is strong:
$$\|f\|_\infty \leq \sup_{x \in \mathcal{X}} \sqrt{k(x, x)}\; \|f\|_{\mathcal{F}}.$$
We seek convergence guarantees in a weaker norm such as ‖·‖_ω.

Slide 32

Slide 32 text

Main results
Definition: least squares approximation
Given x = {x₁, ..., x_N} ⊂ X, the least squares approximation of f ∈ F associated to x is the function f̂_{LS,x} defined by
$$\|f - \hat{f}_{\mathrm{LS},\mathbf{x}}\|_\omega = \min_{\hat{f} \in \mathcal{T}(\mathbf{x})} \|f - \hat{f}\|_\omega,$$
where T(x) := Span(k(x₁, ·), ..., k(x_N, ·)).

Slide 33

Slide 33 text

Main results
Definition: least squares approximation
Given x = {x₁, ..., x_N} ⊂ X, the least squares approximation of f ∈ F associated to x is the function f̂_{LS,x} defined by
$$\|f - \hat{f}_{\mathrm{LS},\mathbf{x}}\|_\omega = \min_{\hat{f} \in \mathcal{T}(\mathbf{x})} \|f - \hat{f}\|_\omega,$$
where T(x) := Span(k(x₁, ·), ..., k(x_N, ·)).
Definition: optimal kernel approximation
Given x = {x₁, ..., x_N} ⊂ X, the optimal kernel approximation of f ∈ F associated to x is the function f̂_{OKA,x} defined by
$$\|f - \hat{f}_{\mathrm{OKA},\mathbf{x}}\|_{\mathcal{F}} = \min_{\hat{f} \in \mathcal{T}(\mathbf{x})} \|f - \hat{f}\|_{\mathcal{F}}.$$
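A sketch of both projections (not from the slides), working in a truncated Fourier/Mercer basis of the periodic Sobolev space of order s: the OKA coefficients solve K(x) c = (f(x_i))_i, while the least squares approximation needs the L²(ω) Gram matrix ⟨k(x_i, ·), k(x_j, ·)⟩_ω and the embedding values µ_f(x_i). The truncation level M, the order s and the random target f are arbitrary choices.

```python
import numpy as np

M, s = 200, 2                                    # truncation level and Sobolev order (assumptions)
js = np.arange(1, M + 1)
sigma = np.concatenate(([1.0], np.repeat(js ** (-2.0 * s), 2)))  # eigenvalues of Sigma

def basis(x):
    """Orthonormal Fourier basis of L^2([0,1]): 1, sqrt(2)cos(2*pi*j*x), sqrt(2)sin(2*pi*j*x)."""
    cols = [np.ones_like(x)]
    for j in js:
        cols += [np.sqrt(2) * np.cos(2 * np.pi * j * x), np.sqrt(2) * np.sin(2 * np.pi * j * x)]
    return np.stack(cols, axis=1)                # shape (len(x), 2M + 1)

rng = np.random.default_rng(3)
nodes = np.sort(rng.uniform(size=12))
E = basis(nodes)                                 # E[i, m] = e_m(x_i)
K = (E * sigma) @ E.T                            # k(x_i, x_j) = sum_m sigma_m e_m(x_i) e_m(x_j)
G = (E * sigma**2) @ E.T                         # <k(x_i,.), k(x_j,.)>_omega

f = rng.normal(size=sigma.size) * np.sqrt(sigma) # coefficients of a random f in F
c_oka = np.linalg.solve(K, E @ f)                # OKA: uses f(x_i)
c_ls = np.linalg.solve(G, E @ (sigma * f))       # LS : uses mu_f(x_i), not f(x_i)

def errors(c):
    a = sigma * (E.T @ c)                        # coefficients of sum_i c_i k(x_i, .)
    return np.sum((f - a) ** 2), np.sum((f - a) ** 2 / sigma)   # (||.||_omega^2, ||.||_F^2)

print("OKA (omega, F):", errors(c_oka))
print("LS  (omega, F):", errors(c_ls))           # LS wins in ||.||_omega, OKA wins in ||.||_F
```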

Slide 34

Slide 34 text

Main results
Theorem (Belhadji, Bardenet and Chainais (2023))
Under the determinantal distribution corresponding to κ = K, we have
$$\forall f \in \mathcal{F}, \quad \mathbb{E}_{p_\kappa} \|f - \hat{f}_{\mathrm{LS},\mathbf{x}}\|_\omega^2 \leq 2 \Big( \underbrace{\|f - f_N\|_\omega^2}_{O(\sigma_{N+1})} + \underbrace{\|f_N\|_\omega^2 \sum_{m=N+1}^{+\infty} \sigma_m^2}_{O(\sigma_{N+1} r_{N+1}) \text{ (superconvergence)}} \Big),$$
where r_{N+1} = Σ_{m=N+1}^{+∞} σ_m.

Slide 35

Slide 35 text

Main results
Theorem (Belhadji, Bardenet and Chainais (2023))
Under the determinantal distribution corresponding to κ = K, we have
$$\forall f \in \mathcal{F}, \quad \mathbb{E}_{p_\kappa} \|f - \hat{f}_{\mathrm{LS},\mathbf{x}}\|_\omega^2 \leq 2 \Big( \underbrace{\|f - f_N\|_\omega^2}_{O(\sigma_{N+1})} + \underbrace{\|f_N\|_\omega^2 \sum_{m=N+1}^{+\infty} \sigma_m^2}_{O(\sigma_{N+1} r_{N+1}) \text{ (superconvergence)}} \Big),$$
where r_{N+1} = Σ_{m=N+1}^{+∞} σ_m.
The computation of f̂_{LS,x} requires the values of µ_f(x₁), ..., µ_f(x_N), not f(x₁), ..., f(x_N).

Slide 36

Slide 36 text

The empirical least squares approximation
Definition (Cohen, Davenport and Leviatan (2013))
Let x ∈ X^N and q : X → R*₊. Consider the so-called empirical semi-norm ‖·‖_{q,x} defined on L²(ω) by
$$\|h\|_{q,\mathbf{x}}^2 := \frac{1}{N} \sum_{i=1}^{N} q(x_i)\, h(x_i)^2.$$
The empirical least squares approximation is defined as
$$\hat{f}_{\mathrm{ELS},M,\mathbf{x}} := \arg\min_{\hat{f} \in E_M} \|f - \hat{f}\|_{q,\mathbf{x}}^2, \quad \text{where } E_M := \mathrm{Span}(e_1, \dots, e_M).$$
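A minimal sketch of this estimator (not from the slides): with the Fourier basis on [0, 1] and an arbitrary weight q, it is a weighted least squares problem in the coefficients of E_M. The target function and the choices of M and q below are illustrative assumptions.

```python
import numpy as np

def basis(x, M):
    """First M functions of the Fourier o.n.b. of L^2([0,1]): 1, sqrt(2)cos, sqrt(2)sin, ..."""
    cols, j = [np.ones_like(x)], 1
    while len(cols) < M:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * j * x))
        if len(cols) < M:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * j * x))
        j += 1
    return np.stack(cols, axis=1)

def empirical_ls(f_values, nodes, M, q=lambda x: np.ones_like(x)):
    """Coefficients of the empirical least squares approximation of f in E_M = Span(e_1..e_M)."""
    w = np.sqrt(q(nodes))                        # minimize (1/N) sum_i q(x_i) (f(x_i) - fhat(x_i))^2
    A, b = w[:, None] * basis(nodes, M), w * f_values
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

rng = np.random.default_rng(4)
nodes = rng.uniform(size=40)
f = lambda x: np.exp(np.sin(2 * np.pi * x))      # an arbitrary target
coef = empirical_ls(f(nodes), nodes, M=9)
grid = np.linspace(0, 1, 5)
print(basis(grid, 9) @ coef - f(grid))           # pointwise approximation error on a coarse grid
```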

Slide 37

Slide 37 text

Main results
When M = N, the function f̂_{ELS,M,x} does not depend on q.
Theorem (Belhadji, Bardenet and Chainais (2023))
Consider N, M ∈ N* such that M ≤ N. Let f ∈ F, and define
$$\hat{f}_{\mathrm{tELS},M,\mathbf{x}} := \sum_{m=1}^{M} \langle \hat{f}_{\mathrm{ELS},N,\mathbf{x}}, e_m \rangle_\omega\, e_m.$$
Under the determinantal distribution corresponding to κ = K (of cardinality N), we have
$$\mathbb{E}_{p_\kappa} \|f - \hat{f}_{\mathrm{tELS},M,\mathbf{x}}\|_\omega^2 = \|f - f_M\|_\omega^2 + M\, \|f - f_N\|_\omega^2.$$
In particular,
$$\mathbb{E}_{p_\kappa} \|f - \hat{f}_{\mathrm{tELS},M,\mathbf{x}}\|_\omega^2 \leq (1 + M)\, \|f - f_M\|_\omega^2 \quad \text{('Instance Optimal Property' (IOP))}.$$

Slide 38

Slide 38 text

Main results
Some remarks:
- $\mathbb{E}_{p_\kappa} \|f - \hat{f}_{\mathrm{tELS},N,\mathbf{x}}\|_\omega^2 \leq (1 + N)\, \|f - f_N\|_\omega^2 = O(N \sigma_{N+1})$.
- In general, $\hat{f}_{\mathrm{tELS},M,\mathbf{x}} \neq \hat{f}_{\mathrm{ELS},M,\mathbf{x}}$.
- The IOP was proved for $\hat{f}_{\mathrm{ELS},M,\mathbf{x}}$ under Christoffel sampling⁹:
$$x_1, \dots, x_N \sim \text{i.i.d.} \; \frac{1}{M} \sum_{m=1}^{M} e_m(x)^2\, \mathrm{d}\omega(x),$$
conditioned on the event $\{\|G - I\|_{\mathrm{op}} \leq 1/2\}$, where $G := (\langle e_i, e_j \rangle_{q,\mathbf{x}})_{i,j \in [M]}$ is the Gram matrix of the family $(e_j)_{j \in [M]}$ associated to the empirical scalar product $\langle \cdot, \cdot \rangle_{q,\mathbf{x}}$.
⁹ Cohen, A. and Migliorati, G., 2017. Optimal weighted least-squares methods. The SMAI Journal of Computational Mathematics, 3, pp.181-203.

Slide 39

Slide 39 text

A projection DPP as an extension of Christoffel sampling
The determinantal distribution associated to the kernel
$$\kappa(x, y) := \frac{1}{M} \sum_{m=1}^{M} e_m(x) e_m(y)$$
is a natural extension of Christoffel sampling.
Figure: histograms of 50000 realizations of the projection DPP associated to the first N Hermite polynomials, compared to the inverse of the Christoffel function and to the nodes of the Gaussian quadrature, for N ∈ {5, 10}.

Slide 40

Slide 40 text

Numerical simulations
We report the empirical expectation of a surrogate of the worst interpolation error:
$$\mathbb{E}_\kappa \sup_{\substack{f = \mu_g \\ \|g\|_\omega \leq 1}} \|f - \hat{f}_{\mathrm{OKA},\mathbf{x}}\|_{\mathcal{F}}^2 \;\approx\; \mathbb{E}_\kappa \sup_{\substack{f = \mu_g \\ g \in \mathcal{G}}} \|f - \hat{f}_{\mathrm{OKA},\mathbf{x}}\|_{\mathcal{F}}^2, \qquad \kappa = K,$$
where G ⊂ {g : ‖g‖_ω ≤ 1} is a finite set with |G| = 5000. F is the periodic Sobolev space of order s = 3.
[Figure: squared error vs. N for DPPKQ, LVSQ (λ ∈ {0, 0.01, 0.1}) and UGKQ, compared to σ_{N+1}]

Slide 41

Slide 41 text

Numerical simulations
F = Korobov space of order s = 1, X = [0, 1]².
We report $\epsilon_m(N) = \mathbb{E}_K \|f - \hat{f}_{\mathrm{OKA},\mathbf{x}}\|_{\mathcal{F}}^2$ for κ = K, where f = µ_{e_m}.
Figure: OKQ using DPPs (left) vs. OKQ using the uniform grid (right); squared errors ε_m(N) for several m, compared to σ_{N+1} and N^{−2s/d}.

Slide 42

Slide 42 text

Numerical simulations
F = the periodic Sobolev space of order s = 1, X = [0, 1].
We report $\mathbb{E}_\kappa \|f - \hat{f}_{\mathrm{LS},\mathbf{x}}\|_\omega^2$ for κ = K, where
$$f = \sum_{m=1}^{M} \xi_m e_m, \qquad \xi_1, \dots, \xi_M \sim \text{i.i.d. } \mathcal{N}(0, 1).$$
Figure: M = 10 (left) and M = 20 (right), with the median over realizations, compared to σ_{N+1} and σ_{N+1}².

Slide 43

Slide 43 text

Numerical simulations
Consider F to be the RKHS defined by the Sinc kernel
$$k(x, y) = \frac{\sin(F(x - y))}{F(x - y)}, \qquad \mathcal{X} = [-T/2, T/2].$$
The eigenfunctions e_m correspond to the prolate spheroidal wave functions¹⁰ (PSWF).
The asymptotics of the eigenvalues in the limit c := TF → +∞ were investigated¹¹: there are approximately c eigenvalues close to 1, and the remaining eigenvalues decrease to 0 at an exponential rate.
¹⁰ D. Slepian and H. O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty - I. Bell System Technical Journal, 40(1):43-63, 1961.
¹¹ H. J. Landau and H. Widom. Eigenvalue distribution of time and frequency limiting. Journal of Mathematical Analysis and Applications, 77(2):469-481, 1980.

Slide 44

Slide 44 text

Numerical simulations
We report $\|f - \hat{f}_{\mathrm{LS},\mathbf{x}}\|_\omega^2$ averaged over 50 realizations, for f ∈ {e₁, e₂, e₃, e₄}.
[Figure: four panels (one per eigenfunction), squared error vs. N for DPP PSWF, DPP Legendre and ChS, with the value T×F marked]

Slide 45

Slide 45 text

Conclusion
Take-home messages:
- The theoretical study of the optimal kernel quadrature under determinantal sampling
- The study of function reconstruction under determinantal sampling
- The analysis is universal
- Empirical validation on various RKHSs
- Projection DPPs are natural extensions of Christoffel sampling, and yield better empirical results
[Diagram: a correspondence between functional spaces and point processes, matching RKHSs with DPPs]

Slide 46

Slide 46 text

Perspectives: extending the theoretical results?
- The study of $\mathbb{E} \|f - \hat{f}_{\mathrm{ELS},M,\mathbf{x}}\|_\omega^2$ when the dimension M equals the number of nodes N
- $\mathbb{E} \sup_{\|f\|_{\mathcal{F}} \leq 1} \|f - \hat{f}_{\mathrm{LS},\mathbf{x}}\|_\omega^2$ instead of $\mathbb{E} \|f - \hat{f}_{\mathrm{LS},\mathbf{x}}\|_\omega^2$?
- $\mathbb{E} \sup_{\|f\|_{\mathcal{F}} \leq 1} \|f - \hat{f}_{\mathrm{OKA},\mathbf{x}}\|_\omega^2$ and $\mathbb{E} \|f - \hat{f}_{\mathrm{OKA},\mathbf{x}}\|_\omega^2$?
- Higher-order moments?
- ...

Slide 47

Slide 47 text

Perspectives: efficient sampling in continuous domain?
Let x = {x₁, ..., x_N} be such that Det κ(x) > 0. We have
$$\mathrm{Det}\, \boldsymbol{\kappa}(\mathbf{x}) = \kappa(x_1, x_1) \times \Big(\kappa(x_2, x_2) - \frac{\kappa(x_1, x_2)^2}{\kappa(x_1, x_1)}\Big) \times \cdots \times \Big(\kappa(x_\ell, x_\ell) - \phi_{\mathbf{x}_{\ell-1}}(x_\ell)^{\mathsf{T}} \boldsymbol{\kappa}(\mathbf{x}_{\ell-1})^{-1} \phi_{\mathbf{x}_{\ell-1}}(x_\ell)\Big) \times \cdots \times \Big(\kappa(x_N, x_N) - \phi_{\mathbf{x}_{N-1}}(x_N)^{\mathsf{T}} \boldsymbol{\kappa}(\mathbf{x}_{N-1})^{-1} \phi_{\mathbf{x}_{N-1}}(x_N)\Big),$$
where $\phi_{\mathbf{x}_{\ell-1}}(x) = \big(\kappa(\xi, x)\big)_{\xi \in \mathbf{x}_{\ell-1}}^{\mathsf{T}} \in \mathbb{R}^{\ell-1}$ and $\mathbf{x}_{\ell-1} = \{x_1, \dots, x_{\ell-1}\}$.

Slide 48

Slide 48 text

Perspectives: efficient sampling in continuous domain?
Define
$$p_{\kappa,1}(x) = \kappa(x, x), \qquad p_{\kappa,\ell}(x) = \kappa(x, x) - \phi_{\mathbf{x}_{\ell-1}}(x)^{\mathsf{T}} \boldsymbol{\kappa}(\mathbf{x}_{\ell-1})^{-1} \phi_{\mathbf{x}_{\ell-1}}(x), \quad \ell \geq 2.$$
If κ is a projection kernel,
$$\int_{\mathcal{X}} p_{\kappa,\ell}(x)\, \mathrm{d}\omega(x) = N - \ell + 1, \qquad p_\kappa(\mathbf{x}) := \frac{1}{N!} \mathrm{Det}\, \boldsymbol{\kappa}(\mathbf{x}) = \prod_{\ell=1}^{N} \frac{1}{N - \ell + 1}\, p_{\kappa,\ell}(x_\ell),$$
and the sequential algorithm is exact (the HKPV algorithm).
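A compact sketch of this sequential (HKPV-style) sampler (not from the slides), for the projection kernel built from the first N Fourier basis functions on [0, 1]; here κ(x, x) = N when N is odd, so each conditional p_{κ,ℓ} can be sampled by rejection from the uniform distribution, accepting with probability p_{κ,ℓ}(x)/N. The Schur complement p_{κ,ℓ} never exceeds κ(x, x), which makes the acceptance probability valid.

```python
import numpy as np

N = 11                                           # an odd number of points, so kappa(x, x) = N
J = (N - 1) // 2

def feat(x):
    """Fourier feature map phi(x), with kappa(x, y) = phi(x) . phi(y)."""
    js = np.arange(1, J + 1)
    return np.concatenate(([1.0],
                           np.sqrt(2) * np.cos(2 * np.pi * js * x),
                           np.sqrt(2) * np.sin(2 * np.pi * js * x)))

def sample_projection_dpp(rng):
    """Sequential (HKPV-style) sampler with a uniform proposal and the bound p_{kappa,l} <= kappa(x,x) = N."""
    pts, feats = [], []
    for _ in range(N):
        while True:
            x = rng.uniform()
            phi = feat(x)
            if feats:                            # Schur complement: kappa(x,x) - r^T kappa(x_{l-1})^{-1} r
                F = np.array(feats)
                r = F @ phi
                p = phi @ phi - r @ np.linalg.solve(F @ F.T, r)
            else:
                p = phi @ phi                    # first point: p_{kappa,1}(x) = kappa(x, x)
            if rng.uniform() < p / N:            # accept with probability p / kappa(x, x)
                pts.append(x)
                feats.append(phi)
                break
    return np.array(pts)

rng = np.random.default_rng(5)
print(np.sort(sample_projection_dpp(rng)))       # the points tend to spread out over [0, 1]
```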

Slide 49

Slide 49 text

Perspectives: efficient sampling in continuous domain?
Example: to conduct Christoffel sampling for Legendre polynomials, we can use Bernstein's bound¹²
$$\forall n \in \mathbb{N}^*, \quad L_n(x)^2 \leq \frac{2}{\pi} \frac{1}{\sqrt{1 - x^2}} \implies p_{\kappa,1}(x) \leq \frac{2}{\pi} \frac{1}{\sqrt{1 - x^2}},$$
and use rejection sampling (proposal = Beta distribution).
[Figure: the density p_{κ,1} against Bernstein's bound, for N = 5 and N = 10]
¹² Lorch, L. (1983). "Alternative proof of a sharpened form of Bernstein's inequality for Legendre polynomials". In: Applicable Analysis 14.3, pp. 237-240.
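A sketch of this rejection sampler (not from the slides): the target is the normalized Christoffel density (1/N) Σ_{n<N} L_n(x)², with L_n the Legendre polynomials orthonormal for the Lebesgue measure on [−1, 1]; the proposal is the arcsine law (a Beta(1/2, 1/2) rescaled to [−1, 1]), whose density 1/(π√(1−x²)) dominates the target up to the factor 2 coming from Bernstein's bound.

```python
import numpy as np

def christoffel_density(x, N):
    """(1/N) * sum_{n<N} L_n(x)^2, with L_n orthonormal Legendre polynomials w.r.t. dx on [-1, 1]."""
    p_prev, p_curr = np.ones_like(x), np.asarray(x, dtype=float)
    total = 0.5 * p_prev**2                       # n = 0: L_0^2 = (1/2) P_0^2
    if N > 1:
        total = total + 1.5 * p_curr**2           # n = 1: L_1^2 = (3/2) P_1^2
    for n in range(1, N - 1):                     # three-term recurrence for P_{n+1}
        p_next = ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
        total = total + (2 * (n + 1) + 1) / 2.0 * p_next**2
        p_prev, p_curr = p_curr, p_next
    return total / N

def sample_christoffel(N, size, rng):
    """Rejection sampling: arcsine proposal, target bounded by 2 * proposal density (Bernstein's bound)."""
    out = []
    while len(out) < size:
        x = np.cos(np.pi * rng.uniform(size=size))        # arcsine law on [-1, 1]
        q = 1.0 / (np.pi * np.sqrt(1.0 - x**2))           # proposal density
        accept = rng.uniform(size=size) < christoffel_density(x, N) / (2.0 * q)
        out.extend(x[accept])
    return np.array(out[:size])

rng = np.random.default_rng(6)
samples = sample_christoffel(N=10, size=5000, rng=rng)
print(samples.mean(), samples.std())              # mean near 0; more mass near the endpoints than uniform
```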

Slide 50

Slide 50 text

Perspectives: efficient sampling in continuous domain?
One sample from a projection DPP (of cardinality N) requires drawing, on average, N² samples from the Christoffel distribution as a proposal: sampling gets harder as N gets bigger.
[Figure: the successive conditional densities p(x; x) observed during one run of the sequential sampler]

Slide 51

Slide 51 text

Perspectives: efficient sampling in continuous domain?
Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound on the inverse Christoffel function + looking for a sampling algorithm for that upper bound.
How to address these issues in practice?
¹³ Dolbeault, M. and Cohen, A., 2022. Optimal sampling and Christoffel functions on general domains. Constructive Approximation, 56(1), pp.121-163.
¹⁴ Belhadji, A., Bardenet, R. and Chainais, P., 2020, November. Kernel interpolation with continuous volume sampling. In International Conference on Machine Learning (pp. 725-735). PMLR.

Slide 52

Slide 52 text

Perspectives: efficient sampling in continuous domain?
Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound on the inverse Christoffel function + looking for a sampling algorithm for that upper bound.
How to address these issues in practice? We may
- look for upper bounds or asymptotics of the inverse of the Christoffel function in some RKHSs on general domains¹³.
¹³ Dolbeault, M. and Cohen, A., 2022. Optimal sampling and Christoffel functions on general domains. Constructive Approximation, 56(1), pp.121-163.
¹⁴ Belhadji, A., Bardenet, R. and Chainais, P., 2020, November. Kernel interpolation with continuous volume sampling. In International Conference on Machine Learning (pp. 725-735). PMLR.

Slide 53

Slide 53 text

Perspectives: efficient sampling in continuous domain?
Sampling from a projection DPP = looking for the eigenfunctions + looking for an upper bound on the inverse Christoffel function + looking for a sampling algorithm for that upper bound.
How to address these issues in practice? We may
- look for upper bounds or asymptotics of the inverse of the Christoffel function in some RKHSs on general domains¹³;
- work with continuous volume sampling → no need for a spectral decomposition¹⁴.
¹³ Dolbeault, M. and Cohen, A., 2022. Optimal sampling and Christoffel functions on general domains. Constructive Approximation, 56(1), pp.121-163.
¹⁴ Belhadji, A., Bardenet, R. and Chainais, P., 2020, November. Kernel interpolation with continuous volume sampling. In International Conference on Machine Learning (pp. 725-735). PMLR.

Slide 54

Slide 54 text

Perspectives: extension to atomic measure reconstruction?
Given a class of objects M, is it possible to approximate the elements of M using their evaluations under some functionals:
$$L_1(\mu), \dots, L_N(\mu) \longrightarrow \hat{\mu} \approx \mu, \qquad \mu \in \mathcal{M}.$$
Comparison (functions | atomic measures):
- The objects: functions | atomic measures
- The class M: an RKHS | not a Hilbert space
- The functionals L₁, ..., L_N: µ ↦ µ(x_j) | µ ↦ ∫ e^{iω_j^T x} dµ(x)
- Distance preserving property: IOP | RIP¹⁵
- Decoding: f̂_{LS,x}, f̂_{OKA,x}, ... | CL-OMP, Mean-shift¹⁶
¹⁵ Belhadji, A. and Gribonval, R., 2022. Revisiting RIP guarantees for sketching operators on mixture models.
¹⁶ Belhadji, A. and Gribonval, R., 2022. Sketch and shift: a robust decoder for compressive clustering.

Slide 55

Slide 55 text

44/45 Thank you for your attention!

Slide 56

Slide 56 text

References
A. Belhadji, R. Bardenet, and P. Chainais. Kernel quadrature with DPPs. NeurIPS, 2019.
A. Belhadji, R. Bardenet, and P. Chainais. Kernel interpolation with continuous volume sampling. ICML, 2020.
A. Belhadji. An analysis of Ermakov-Zolotukhin quadrature using kernels. NeurIPS, 2021.
A. Belhadji, R. Bardenet, and P. Chainais. Signal reconstruction using determinantal sampling. arXiv:2310.09437.

Slide 57

Slide 57 text

Numerical simulations
F = the Gaussian space, X = R.
We report $\epsilon_m(N) = \mathbb{E}_K \|f - \hat{f}_{\mathrm{OKA},\mathbf{x}}\|_{\mathcal{F}}^2$ for κ = K, where f = µ_{e_m}.
Figure: the squared interpolation error for e₁ (left) vs. e₁₅ (right), for DPPKQ, DPPKQ (UB), MCKQ, SBQ and MC.

Slide 58

Slide 58 text

Perspectives: efficient sampling in continuous domain?
If κ = k, the sequential algorithm is an approximation.
Theorem (Rezaei and Gharan (2019))
Let x be the output of the sequential algorithm for κ = k; then x follows a density f_seq that satisfies
$$f_{\mathrm{seq}}(\mathbf{x}) \leq N!^2\, f_k(\mathbf{x}).$$
An MCMC algorithm for CVS [Rezaei and Gharan (2019)]: CVS is the stationary distribution of a Markov chain that can be implemented in a fully kernelized way, using only evaluations of the kernel k; f_seq is the initialization of the Markov chain.

Slide 59

Slide 59 text

Numerical simulations
F = the periodic Sobolev space of order s = 2, X = [0, 1].
We report $\epsilon_m(N) = \mathbb{E}_\kappa \|f - \hat{f}_{\mathrm{OKA},\mathbf{x}}\|_{\mathcal{F}}^2$ for f = µ_{e_m}, m ∈ {1, 5, 7}, under CVS (κ = k).
[Figure: theoretical vs. empirical squared errors for m ∈ {1, 5, 7}]

Slide 60

Slide 60 text

Kernel quadrature: examples
Consider X = [0, 1], s ∈ N*, and
$$k_s(x, y) := 1 + \frac{(-1)^{s-1} (2\pi)^{2s}}{(2s)!}\, B_{2s}(\{x - y\}),$$
where {x − y} is the fractional part of x − y, and B_{2s} is the Bernoulli polynomial of degree 2s:
$$B_0(x) = 1, \quad B_2(x) = x^2 - x + \frac{1}{6}, \quad B_4(x) = x^4 - 2x^3 + x^2 - \frac{1}{30}, \dots$$
[Figure: k_s(x₀, x) for s = 1, ..., 5]

Slide 61

Slide 61 text

Kernel quadrature: examples
The corresponding RKHS is the periodic Sobolev space of order s:
$$\mathcal{F}_s = \Big\{ f \in L^2([0, 1]),\ f(0) = f(1),\ f', f'', \dots, f^{(s)} \in L^2([0, 1]) \Big\},$$
and the corresponding norm is the Sobolev norm:
$$\|f\|_{\mathcal{F}_s}^2 = \Big( \int_0^1 f(x)\, \mathrm{d}x \Big)^2 + \sum_{m \in \mathbb{N}^*} m^{2s} \Big| \int_0^1 f(x) e^{-2\pi i m x}\, \mathrm{d}x \Big|^2.$$
$$\cdots \subset \mathcal{F}_4 \subset \mathcal{F}_3 \subset \mathcal{F}_2 \subset \mathcal{F}_1$$
[Figure: k_s(x₀, x) for s = 1, ..., 5]

Slide 62

Slide 62 text

Kernel quadrature: an example
The kernel k_s satisfies the following identity [Wahba 90]:
$$k_s(x, y) = 1 + 2 \sum_{m \in \mathbb{N}^*} \frac{1}{m^{2s}} \cos\big(2\pi m (x - y)\big).$$
This is equivalent to the Mercer decomposition with σ_m = O(m^{−2s}) and (e_m)_{m∈N*} the Fourier family.
[Figure: k_s(x₀, x) for s = 1, ..., 5]

Slide 63

Slide 63 text

Kernel quadrature: Korobov space
An extension to [0, 1]^d is possible via tensorization:
$$k_{d,s}(x, y) = \prod_{\delta \in [d]} k_s(x_\delta, y_\delta) = \sum_{u \in \mathbb{N}^d} \Big( \prod_{\delta \in [d]} \sigma_{u_\delta} \Big) \Big( \prod_{\delta \in [d]} e_{u_\delta}(x_\delta) \Big) \Big( \prod_{\delta \in [d]} e_{u_\delta}(y_\delta) \Big).$$
The eigenvalue | the multiplicity:
- 1 | 3^d
- 1/2^{2s} | d·(d + 1)
- 1/6^{2s} | d(3d − 1)
- ...
A product-kernel sketch follows below.
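A small sketch of the tensorized kernel (not from the slides), built from the closed form of k_s via Bernoulli polynomials; B₂ and B₄ are hard-coded, so s ∈ {1, 2} in this illustration.

```python
import numpy as np
from math import factorial

BERNOULLI = {1: lambda t: t**2 - t + 1.0 / 6.0,             # B_2, used for s = 1
             2: lambda t: t**4 - 2 * t**3 + t**2 - 1.0 / 30.0}  # B_4, used for s = 2

def k_s(x, y, s):
    """Periodic Sobolev kernel of order s on [0, 1] (closed form, s in {1, 2})."""
    d = (x - y) - np.floor(x - y)
    return 1.0 + (-1) ** (s - 1) * (2 * np.pi) ** (2 * s) / factorial(2 * s) * BERNOULLI[s](d)

def k_korobov(x, y, s):
    """Tensorized (Korobov) kernel on [0, 1]^d: product of univariate kernels over coordinates."""
    return np.prod([k_s(x[i], y[i], s) for i in range(len(x))])

x, y = np.array([0.2, 0.7, 0.1]), np.array([0.5, 0.65, 0.9])
print(k_korobov(x, y, s=1), k_korobov(x, y, s=2))
```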

Slide 64

Slide 64 text

Kernel quadrature: Korobov space
We have σ_{N+1} = O((log N)^{2s(d−1)} N^{−2s}) [Bach 2017].
[Figure: σ_{N+1} against (log N)^{2s(d−1)} N^{−2s}, for (left) d = 2, s = 2 and (right) d = 3, s = 2]
Compare it to QMC rates (the sequence | s | the rate):
- Halton | 1 | O((log N)^{2d} N^{−2})
- Hammersley | 1 | O((log N)^{2(d−1)} N^{−2})
- Higher order digital nets | s ∈ N* | O((log N)^{2sd} N^{−2s})

Slide 65

Slide 65 text

Numerical simulations (EZQ vs OKQ)
Let F be the RKHS corresponding to the Sobolev space of periodic functions of order s = 3, and g ∈ {φ₁, φ₁₀, φ₂₀}.
Figure: squared worst-case integration error vs. number of nodes N for EZQ (left) and OKQ (right) in F, compared to r_{N+1} and σ_{N+1}.

Slide 66

Slide 66 text

Main results: a tractable formula under volume sampling
The theoretical guarantee in the case κ = k is given in the following result.
Theorem (Belhadji, Bardenet and Chainais (2020))
Let g = Σ_{m∈N*} ⟨g, e_m⟩_ω e_m. Then
$$\mathbb{E}_{k} \big\| \mu_g - \Pi_{\mathcal{T}(\mathbf{x})} \mu_g \big\|_{\mathcal{F}}^2 = \sum_{m \in \mathbb{N}^*} \langle g, e_m \rangle_\omega^2\, \epsilon_m(N), \qquad \epsilon_m(N) = \mathbb{E}_{k} \big\| \mu_{e_m} - \Pi_{\mathcal{T}(\mathbf{x})} \mu_{e_m} \big\|_{\mathcal{F}}^2 = \sigma_m\, \frac{\sum_{|T| = N,\, m \notin T} \prod_{t \in T} \sigma_t}{\sum_{|T| = N} \prod_{t \in T} \sigma_t}.$$
How good is it?
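The two sums are elementary symmetric polynomials of the eigenvalues, so ε_m(N) can be evaluated numerically with the standard dynamic-programming recursion after truncating the spectrum; a sketch (not from the slides), with σ_m = m^{−2s} and an arbitrary truncation level as examples:

```python
import numpy as np

def elem_sym(vals, N):
    """e_0(vals), ..., e_N(vals): elementary symmetric polynomials via the standard recursion."""
    e = np.zeros(N + 1)
    e[0] = 1.0
    for v in vals:
        e[1:] = e[1:] + v * e[:-1]
    return e

def eps_m(sigma, m, N):
    """epsilon_m(N) = sigma_m * e_N(sigma without sigma_m) / e_N(sigma), on a truncated spectrum."""
    num = elem_sym(np.delete(sigma, m), N)[N]
    den = elem_sym(sigma, N)[N]
    return sigma[m] * num / den

s, M_trunc, N = 2, 400, 10                        # Sobolev-like spectrum, truncated at M_trunc terms
sigma = np.arange(1, M_trunc + 1, dtype=float) ** (-2.0 * s)
for m in [0, 4, 9, 14]:                           # 0-based indices, i.e. m = 1, 5, 10, 15
    print(m + 1, eps_m(sigma, m, N), "vs sigma_{N+1} =", sigma[N])
```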

Slide 67

Slide 67 text

Main results: how large are the epsilons?
Theorem (Belhadji, Bardenet and Chainais (2020))
If there exists B > 0 such that
$$\min_{M \in [N]} \frac{\sum_{m \geq M} \sigma_m}{(N - M + 1)\, \sigma_{N+1}} \leq B,$$
then
$$\sup_{\|g\|_\omega \leq 1} \mathbb{E}_{k} \big\| \mu_g - \Pi_{\mathcal{T}(\mathbf{x})} \mu_g \big\|_{\mathcal{F}}^2 = \epsilon_1(N) \leq (1 + B)\, \sigma_{N+1}.$$
Examples (σ_N | B):
- N^{−2s} | (1 + 1/(2s − 1))^{2s}
- α^N | α/(1 − α)

Slide 68

Slide 68 text

Main results: how large are the epsilons?
Figure: log₁₀ ε_m(N) as a function of N when σ_N = N^{−2s}, with s = 3, for m = 1, ..., 5, against the upper bound (1 + B)σ_{N+1}.

Slide 69

Slide 69 text

Main results: how large are the epsilons?
Figure: other examples in different RKHSs (ε_m(N) for m = 1, ..., 5, against the upper bound (1 + B)σ_{N+1}).

Slide 70

Slide 70 text

Intuitions
Observe that
$$\mathbb{E}_\kappa \big\| \mu_g - \Pi_{\mathcal{T}(\mathbf{x})} \mu_g \big\|_{\mathcal{F}}^2 = \mathbb{E}_\kappa \big\| O_{\mathbf{x}} \Sigma g \big\|_{\mathcal{F}}^2 = \mathbb{E}_\kappa \big\| O_{\mathbf{x}} \Sigma_N g + O_{\mathbf{x}} \Sigma_N^\perp g \big\|_{\mathcal{F}}^2 \leq 2 \Big( \mathbb{E}_\kappa \big\| O_{\mathbf{x}} \Sigma_N g \big\|_{\mathcal{F}}^2 + \mathbb{E}_\kappa \big\| O_{\mathbf{x}} \Sigma_N^\perp g \big\|_{\mathcal{F}}^2 \Big),$$
where $O_{\mathbf{x}} = I_{\mathcal{F}} - \Pi_{\mathcal{T}(\mathbf{x})} = \Pi_{\mathcal{T}(\mathbf{x})^\perp}$, $\Sigma_N = \sum_{m=1}^{N} \sigma_m\, e_m \otimes e_m$, and $\Sigma_N^\perp = \sum_{m=N+1}^{+\infty} \sigma_m\, e_m \otimes e_m$.

Slide 71

Slide 71 text

Intuitions
Observe that
$$\mathbb{E}_\kappa \big\| \mu_g - \Pi_{\mathcal{T}(\mathbf{x})} \mu_g \big\|_{\mathcal{F}}^2 = \mathbb{E}_\kappa \big\| O_{\mathbf{x}} \Sigma g \big\|_{\mathcal{F}}^2 = \mathbb{E}_\kappa \big\| O_{\mathbf{x}} \Sigma_N g + O_{\mathbf{x}} \Sigma_N^\perp g \big\|_{\mathcal{F}}^2 \leq 2 \Big( \mathbb{E}_\kappa \big\| O_{\mathbf{x}} \Sigma_N g \big\|_{\mathcal{F}}^2 + \mathbb{E}_\kappa \big\| O_{\mathbf{x}} \Sigma_N^\perp g \big\|_{\mathcal{F}}^2 \Big),$$
where $O_{\mathbf{x}} = I_{\mathcal{F}} - \Pi_{\mathcal{T}(\mathbf{x})} = \Pi_{\mathcal{T}(\mathbf{x})^\perp}$, $\Sigma_N = \sum_{m=1}^{N} \sigma_m\, e_m \otimes e_m$, and $\Sigma_N^\perp = \sum_{m=N+1}^{+\infty} \sigma_m\, e_m \otimes e_m$.
Since $O_{\mathbf{x}} = \Pi_{\mathcal{T}(\mathbf{x})^\perp}$ is an orthogonal projection,
$$\big\| O_{\mathbf{x}} \Sigma_N^\perp g \big\|_{\mathcal{F}}^2 \leq \big\| \Sigma_N^\perp g \big\|_{\mathcal{F}}^2 = \sum_{m \geq N+1} \sigma_m \langle g, e_m \rangle_\omega^2 \leq \sigma_{N+1}\, \|g\|_\omega^2.$$

Slide 72

Slide 72 text

Intuitions
Let m ∈ N* and put g = e_m:
$$\big\| O_{\mathbf{x}} \Sigma_N e_m \big\|_{\mathcal{F}}^2 = \sigma_m \big\| O_{\mathbf{x}} e_m^{\mathcal{F}} \big\|_{\mathcal{F}}^2 = \sigma_m \big\| e_m^{\mathcal{F}} - \Pi_{\mathcal{T}(\mathbf{x})} e_m^{\mathcal{F}} \big\|_{\mathcal{F}}^2.$$
The error term is the product of two terms: the eigenvalue σ_m and the reconstruction term $\| e_m^{\mathcal{F}} - \Pi_{\mathcal{T}(\mathbf{x})} e_m^{\mathcal{F}} \|_{\mathcal{F}}^2 \in [0, 1]$.

Slide 73

Slide 73 text

Intuitions
$$\sigma_m \big\| e_m^{\mathcal{F}} - \Pi_{\mathcal{T}(\mathbf{x})} e_m^{\mathcal{F}} \big\|_{\mathcal{F}}^2 = \sigma_m \big( 1 - \big\| \Pi_{\mathcal{T}(\mathbf{x})} e_m^{\mathcal{F}} \big\|_{\mathcal{F}}^2 \big)$$
[Figure: e_m and its projection onto T(x) = Span(k(x_n, ·)), for several m and two sets of nodes]

Slide 74

Slide 74 text

Intuitions
Theorem
Under the distribution of CVS (κ = k), we have
$$\forall m \in \mathbb{N}^*, \quad \mathbb{E}_{k} \big\| \Pi_{\mathcal{T}(\mathbf{x})} e_m^{\mathcal{F}} \big\|_{\mathcal{F}}^2 = \frac{\sum_{|T| = N,\, m \in T} \prod_{t \in T} \sigma_t}{\sum_{|T| = N} \prod_{t \in T} \sigma_t},$$
and
$$\forall m \neq m' \in \mathbb{N}^*, \quad \mathbb{E}_{k} \big\langle \Pi_{\mathcal{T}(\mathbf{x})} e_m^{\mathcal{F}},\, \Pi_{\mathcal{T}(\mathbf{x})} e_{m'}^{\mathcal{F}} \big\rangle_{\mathcal{F}} = 0.$$

Slide 75

Slide 75 text

Intuitions
$$\mathbb{E}_{k}\, \tau_m^{\mathcal{F}}(\mathbf{x}) := \mathbb{E}_{k} \big\| \Pi_{\mathcal{T}(\mathbf{x})} e_m^{\mathcal{F}} \big\|_{\mathcal{F}}^2 = \frac{\sum_{|T| = N,\, m \in T} \prod_{t \in T} \sigma_t}{\sum_{|T| = N} \prod_{t \in T} \sigma_t}.$$
[Figure: E_k τ_m(x) as a function of N, for m = 1, ..., 5]
Under CVS, T(x) gets closer to $E_N = \mathrm{Span}(e_m^{\mathcal{F}})_{m \in [N]}$ as N → +∞.

Slide 76

Slide 76 text

Intuitions
Alternatively, we can quantify the proximity between the subspaces T(x) and $E_N^{\mathcal{F}}$ using the principal angles $(\theta_i(\mathcal{T}(\mathbf{x}), E_N^{\mathcal{F}}))_{i \in [N]}$.
[Diagram: the subspaces $E_N^{\mathcal{F}} = \mathrm{Span}(e_m^{\mathcal{F}})_{m \in [N]}$ and $\mathcal{T}(\mathbf{x}) = \mathrm{Span}(k(x_n, \cdot))_{n \in [N]}$, with the angle $\theta_N(\mathcal{T}(\mathbf{x}), E_N^{\mathcal{F}})$]
For example, we have
$$\sup_{m \in [N]} \big\| e_m^{\mathcal{F}} - \Pi_{\mathcal{T}(\mathbf{x})} e_m^{\mathcal{F}} \big\|_{\mathcal{F}}^2 \leq \frac{1}{\cos^2 \theta_N(\mathcal{T}(\mathbf{x}), E_N^{\mathcal{F}})} - 1.$$

Slide 77

Slide 77 text

The determinantal distributions: link with DPPs
Theorem (Belhadji, Bardenet and Chainais (2020))
For U ⊂ N*, define the kernel $K_U(x, y) = \sum_{u \in U} e_u(x) e_u(y)$. We have
$$f_k(x_1, \dots, x_N) = \sum_{|U| = N} \frac{\prod_{u \in U} \sigma_u}{\sum_{|W| = N} \prod_{w \in W} \sigma_w} \cdot \frac{1}{N!} \mathrm{Det}\big(K_U(\mathbf{x})\big).$$
The largest weight in the mixture corresponds to U = [N]:
$$K(x, y) = \sum_{m \in [N]} e_m(x) e_m(y).$$