Small Deviations for the $\beta$-Jacobi Ensemble

Ben
September 28, 2012

Transcript

  1. 2.
  2. 3.

    Random Matrix Theory

    The spectral theory of random matrices studies the distribution of the eigenvalues as the size of the matrix goes to infinity.

    Global Regime

    Empirical Spectral Measure: On average, how are the eigenvalues distributed?

    Local Regime

    Bulk statistics: What does the spacing between eigenvalues look like?

    Edge statistics: What is the limiting distribution of the largest/smallest eigenvalue?
  3. 4.
  4. 5.

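    Gaussian Unitary Ensemble ($\beta = 2$)

    Let $\{x_{i,j}, y_{i,j}\}_{i,j=1}^{\infty}$ be i.i.d. standard Gaussian random variables.

    Definition: The Gaussian Unitary Ensemble (GUE) is the ensemble of $n \times n$ Hermitian random matrices of the form

    $$X_n = \begin{pmatrix}
    x_{1,1} & \frac{x_{1,2} + i y_{1,2}}{\sqrt{2}} & \cdots & \frac{x_{1,n} + i y_{1,n}}{\sqrt{2}} \\
    \frac{x_{1,2} - i y_{1,2}}{\sqrt{2}} & x_{2,2} & \cdots & \frac{x_{2,n} + i y_{2,n}}{\sqrt{2}} \\
    \vdots & \vdots & \ddots & \vdots \\
    \frac{x_{1,n} - i y_{1,n}}{\sqrt{2}} & \frac{x_{2,n} - i y_{2,n}}{\sqrt{2}} & \cdots & x_{n,n}
    \end{pmatrix}$$

    The eigenvalues of $X_n$ have joint density

    $$P(\lambda_1, \dots, \lambda_n) = \frac{1}{Z_{n,2}} \prod_{j<k} |\lambda_j - \lambda_k|^2 \, e^{-\frac{2}{4} \sum_{k=1}^{n} \lambda_k^2}$$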
  5. 6.

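    Wigner's Semicircle Law

    Theorem (Wigner 1954): Let $\lambda_1^n \le \lambda_2^n \le \cdots \le \lambda_n^n$ denote the ordered eigenvalues of a matrix $\frac{1}{\sqrt{n}} X_n$. For almost every sequence $\{X_n\}_{n=1}^{\infty}$ of GOE or GUE matrices,

    $$\frac{1}{n} \sum_{i=1}^{n} \delta_{\lambda_i^n} \Longrightarrow SC,$$

    where $SC$ is the probability distribution on $\mathbb{R}$ with density

    $$\sigma(x) = \frac{1}{2\pi} \sqrt{4 - x^2} \, \mathbf{1}_{|x| \le 2}$$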
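The semicircle statement can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `sample_gue` and `semicircle_density` are illustrative and not part of the talk. It builds a GUE matrix as in the definition, rescales the spectrum by $\sqrt{n}$, and compares the second moment with that of the semicircle law, which equals 1.

```python
import numpy as np

def sample_gue(n, rng):
    """Hermitian matrix with N(0,1) diagonal and (x + iy)/sqrt(2) off-diagonal entries."""
    x = rng.standard_normal((n, n))
    y = rng.standard_normal((n, n))
    upper = np.triu(x + 1j * y, k=1) / np.sqrt(2.0)   # strictly upper triangle
    return upper + upper.conj().T + np.diag(np.diag(x))

def semicircle_density(t):
    """Density sigma(t) = sqrt(4 - t^2) / (2 pi) on [-2, 2], zero elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(np.clip(4.0 - t**2, 0.0, None)) / (2.0 * np.pi)

rng = np.random.default_rng(0)
n = 200
eigs = np.linalg.eigvalsh(sample_gue(n, rng)) / np.sqrt(n)  # spectrum of X_n / sqrt(n)
second_moment = float(np.mean(eigs**2))                      # semicircle value is 1
```

A histogram of `eigs` plotted against `semicircle_density` should resemble the empirical figure on the next slide; the matrix size 200 is an arbitrary choice for speed.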
  6. 7.

    Empirical Semicircle Law

    [Figure: Eigenvalue distribution of a scaled 5000 × 5000 GOE matrix. Horizontal axis: Λ; vertical axis: P(Λ).]
  7. 8.
  8. 9.

    Tracy-Widom Distribution

    Theorem (C. Tracy & H. Widom, 1994): Let $\lambda_{\max}$ be the largest eigenvalue of a GUE matrix. Then

    $$\lim_{n \to \infty} P\left( n^{2/3} \left( \frac{\lambda_{\max}}{\sqrt{n}} - 2 \right) < t \right) = F_{TW_2}(t),$$

    where $F_{TW_2}(t) = \det(I - K_{\mathrm{Airy}})$.

    The Airy kernel operates on $L^2(t, \infty)$.

    The Fredholm determinant can be computed explicitly:

    $$\det(I - K_{\mathrm{Airy}}) = \exp\left( - \int_t^{\infty} (x - t) \, q(x)^2 \, dx \right),$$

    where $q$ solves the Painlevé II differential equation.
  9. 10.

    Empirical Tracy-Widom Distribution

    [Figure: Distribution of scaled $\lambda_{\max}$ for 10,000 GOE matrices of size 500 × 500. Horizontal axis: Λ; vertical axis: P(Λ).]
  10. 11.
  14. 15.

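    Rate of Concentration

    Question: How quickly do the finite-$n$ distributions concentrate on the Tracy-Widom distribution?

    Rewrite the Tracy-Widom limit theorem as

    $$\lim_{n \to \infty} P\Big( \lambda_{\max} < 2\sqrt{n} \big( 1 + \tfrac{t}{2} n^{-2/3} \big) \Big) = F_{TW_2}(t)$$

    For $n \ge 1$ and $\varepsilon \in (0, 1]$, look for bounds on

    $$P\big( \lambda_{\max} \ge 2\sqrt{n}(1 + \varepsilon) \big) \quad \text{and} \quad P\big( \lambda_{\max} \le 2\sqrt{n}(1 - \varepsilon) \big)$$

    $\varepsilon$ must be small, of order $n^{-2/3}$, to capture the Tracy-Widom shape.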
  15. 16.

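    Classical Small Deviations

    For $\beta = 1, 2, 4$, the Tracy-Widom distribution functions have shape

    $$1 - F_{TW_\beta}(t) \sim e^{-\frac{2}{3} \beta t^{3/2}} \quad \text{as } t \to \infty, \qquad F_{TW_\beta}(t) \sim e^{-\frac{1}{24} \beta |t|^3} \quad \text{as } t \to -\infty$$

    Ledoux (2004): For the largest eigenvalue of a GUE matrix, one has

    $$P\big( \lambda_{\max} \ge 2\sqrt{n}(1 + \varepsilon) \big) \le C e^{-\beta n \varepsilon^{3/2} / C}$$

    $$P\big( \lambda_{\max} \le 2\sqrt{n}(1 - \varepsilon) \big) \le C e^{-\beta n^2 \varepsilon^3 / C}$$

    These are upper bounds on the right tail and the left tail, respectively.

    The exponents match those of the Tracy-Widom distribution.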
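The claim that the exponents match can be made explicit with a short substitution. The sketch below takes $\varepsilon$ proportional to $t\,n^{-2/3}$ for fixed $t > 0$; the proportionality constant $\theta > 0$ is a bookkeeping symbol introduced here, not something from the slides.

```latex
\varepsilon = \theta\, t\, n^{-2/3}
\quad\Longrightarrow\quad
n\,\varepsilon^{3/2} = \theta^{3/2}\, t^{3/2},
\qquad
n^{2}\,\varepsilon^{3} = \theta^{3}\, t^{3}.
% Hence the two small deviation bounds become
%   C exp(-beta * theta^{3/2} * t^{3/2} / C)   (right tail)
%   C exp(-beta * theta^{3}   * t^{3}   / C)   (left tail)
% which carry the same powers t^{3/2} and t^{3} as the Tracy-Widom tails.
```

Only the powers of $t$ are being compared; the constants $\frac{2}{3}$ and $\frac{1}{24}$ in the Tracy-Widom tails are not recovered by bounds of this form.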
  16. 17.
  19. 20.

    The $\beta$-Hermite Ensemble

    Definition: For any $\beta > 0$, the $\beta$-Hermite ensemble is the point process on $\mathbb{R}$ defined by the joint density function

    $$P(\lambda_1, \dots, \lambda_n) = \frac{1}{Z_{n,\beta}} \prod_{j<k} |\lambda_j - \lambda_k|^{\beta} \, e^{-\frac{\beta}{4} \sum_{k=1}^{n} \lambda_k^2}$$

    When $\beta = 1, 2, 4$:

    The above density is shared with the Gaussian ensembles.

    The models are exactly solvable: $k$-point correlation functions can be written in terms of Hermite polynomials.
  23. 24.

    The $\beta$-Laguerre Ensemble

    Definition: For any $\beta > 0$, the $\beta$-Laguerre ensemble is the point process on $\mathbb{R}$ defined by the joint density function

    $$P(\lambda_1, \dots, \lambda_n) = \frac{1}{Z_{\beta,a,n}} \prod_{i<j} |\lambda_i - \lambda_j|^{\beta} \prod_{k=1}^{n} \lambda_k^{\frac{\beta}{2} a - 1} \, e^{-\frac{\beta}{2} \sum_{k=1}^{n} \lambda_k}$$

    When $\beta = 1, 2$:

    LOE and LUE are of type $XX^*$, where $X$ is an $n \times M(n)$ matrix with Gaussian entries.

    Global behavior is characterized by the Marchenko-Pastur Law.

    The scaled largest eigenvalue converges to Tracy-Widom.
  24. 25.
  26. 27.

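    The Hermite Tridiagonal Matrix

    The following tridiagonal matrix models the $\beta$-Hermite ensemble. The prefactor $\frac{1}{\sqrt{\beta}}$ is the normalization that matches the weight $e^{-\frac{\beta}{4} \sum \lambda_k^2}$ of the ensemble and places the spectral edge at $2\sqrt{n}$; with the prefactor $\frac{1}{\sqrt{2}}$ the eigenvalue weight is $e^{-\frac{1}{2} \sum \lambda_k^2}$ for every $\beta$.

    $$H_\beta = \frac{1}{\sqrt{\beta}} \begin{pmatrix}
    N(0,2) & \chi_{(n-1)\beta} & & & \\
    \chi_{(n-1)\beta} & N(0,2) & \chi_{(n-2)\beta} & & \\
    & \ddots & \ddots & \ddots & \\
    & & \chi_{2\beta} & N(0,2) & \chi_{\beta} \\
    & & & \chi_{\beta} & N(0,2)
    \end{pmatrix}$$

    Theorem (Dumitriu, Edelman (2003)): For any $\beta > 0$, the eigenvalues of $H_\beta$ have joint density

    $$P(\lambda_1, \lambda_2, \dots, \lambda_n) = \frac{1}{Z_{\beta,n}} \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j|^{\beta} \, e^{-\frac{\beta}{4} \sum_{i=1}^{n} \lambda_i^2}$$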
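The tridiagonal model makes sampling for any $\beta > 0$ cheap, since only $2n - 1$ random numbers are needed. The following is a minimal sketch, assuming NumPy and assuming the $\frac{1}{\sqrt{\beta}}$ normalization used by Ramírez, Rider and Virág, which corresponds to the weight $e^{-\frac{\beta}{4}\sum \lambda_k^2}$ and an edge near $2\sqrt{n}$. The function name `sample_beta_hermite` is hypothetical.

```python
import numpy as np

def sample_beta_hermite(n, beta, rng):
    """Symmetric tridiagonal matrix: N(0,2) diagonal, chi_{k*beta} off-diagonals."""
    diag = rng.normal(0.0, np.sqrt(2.0), size=n)     # N(0, 2) means variance 2
    dof = beta * np.arange(n - 1, 0, -1)             # (n-1)*beta, ..., 2*beta, beta
    off = np.sqrt(rng.chisquare(dof))                # chi variable = sqrt(chi-square)
    h = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return h / np.sqrt(beta)                         # assumed 1/sqrt(beta) normalization

rng = np.random.default_rng(0)
n, beta = 400, 3.0                                   # beta = 3 has no classical matrix model
h = sample_beta_hermite(n, beta, rng)
lam_max = np.linalg.eigvalsh(h)[-1]
ratio = lam_max / (2.0 * np.sqrt(n))                 # expected to be close to 1
```

Repeating the last three lines over many seeds and plotting $n^{1/6}(\lambda_{\max} - 2\sqrt{n})$ gives an empirical picture of the general-$\beta$ edge fluctuations.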
  27. 28.

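    The Laguerre Tridiagonal Matrix

    The $\beta$-Laguerre ensemble is modeled by the tridiagonal matrix $L_{\beta,a} = B_{\beta,a} B_{\beta,a}^T$, where

    $$B_{\beta,a} = \begin{pmatrix}
    \chi_{(a+n)\beta} & & & \\
    \chi_{(n-1)\beta} & \chi_{(a+n-1)\beta} & & \\
    & \ddots & \ddots & \\
    & & \chi_{\beta} & \chi_{(a+1)\beta}
    \end{pmatrix}$$

    Theorem (Dumitriu, Edelman (2003)): When $\beta, a > 0$, the eigenvalues of $L_{\beta,a}$ have joint density

    $$P(\lambda_1, \dots, \lambda_n) = \frac{1}{Z_{\beta,a,n}} \prod_{i<j} |\lambda_i - \lambda_j|^{\beta} \prod_{k=1}^{n} \lambda_k^{\frac{\beta}{2} a - 1} \, e^{-\frac{\beta}{2} \sum_{k=1}^{n} \lambda_k}$$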
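The bidiagonal construction can be sketched in a few lines. This is an illustration only, assuming NumPy; it takes the chi degrees of freedom literally from the slide and applies no further scaling, so the overall normalization of the eigenvalues should be treated as an assumption. The function name `sample_beta_laguerre` is hypothetical.

```python
import numpy as np

def sample_beta_laguerre(n, beta, a, rng):
    """Return L = B B^T for the lower bidiagonal B read off the slide."""
    # Diagonal: chi_{(a+n)beta}, chi_{(a+n-1)beta}, ..., chi_{(a+1)beta}
    diag = np.sqrt(rng.chisquare(beta * (a + np.arange(n, 0, -1))))
    # Subdiagonal: chi_{(n-1)beta}, ..., chi_{beta}
    sub = np.sqrt(rng.chisquare(beta * np.arange(n - 1, 0, -1)))
    b = np.diag(diag) + np.diag(sub, -1)
    return b @ b.T

rng = np.random.default_rng(0)
lag = sample_beta_laguerre(n=100, beta=1.5, a=2.0, rng=rng)
lag_eigs = np.linalg.eigvalsh(lag)   # all positive, since B has a nonzero diagonal
```

Because $B$ is bidiagonal, $BB^T$ is automatically symmetric tridiagonal and positive definite, which is what makes the model convenient for the quadratic-form arguments later in the talk.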
  28. 29.
  29. 30.

    Edelman and Sutton Conjectures

    The centered and scaled Hermite tridiagonal matrix should converge to the Stochastic Airy Operator.

    Edelman and Sutton Conjecture (2005):

    $$n^{1/6} \left( H_\beta - 2\sqrt{n} \, I_n \right) \Longrightarrow -\mathcal{H}_\beta,$$

    where $\mathcal{H}_\beta$, the Stochastic Airy Operator, is defined by

    $$\mathcal{H}_\beta = -\frac{d^2}{dx^2} + x + \frac{2}{\sqrt{\beta}} \, b'(x)$$

    $b'$ is "white noise", the formal derivative of Brownian motion.

    The centering and scaling agree with Tracy-Widom for $\beta = 1, 2, 4$.
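One way to get a feel for the Stochastic Airy Operator is to discretize it. The sketch below is a plain finite-difference approximation on a truncated interval, assuming NumPy; the truncation length, mesh size, and the name `discretized_airy_operator` are choices made here for illustration and are not taken from the talk. With the noise switched off, the smallest eigenvalue should be close to 2.338, the negative of the first zero of the Airy function, which gives a quick sanity check.

```python
import numpy as np

def discretized_airy_operator(m, length, beta=None, rng=None):
    """Finite-difference matrix for -d^2/dx^2 + x + (2/sqrt(beta)) b'(x) on (0, length).

    Dirichlet conditions at both ends; beta=None drops the noise term.
    """
    h = length / (m + 1)                       # mesh width
    x = h * np.arange(1, m + 1)                # interior grid points
    main = 2.0 / h**2 + x                      # -f'' stencil centre plus potential x
    if beta is not None:
        if rng is None:
            rng = np.random.default_rng()
        # Brownian increment over a cell divided by h has law N(0, 1/h)
        main = main + (2.0 / np.sqrt(beta)) * rng.standard_normal(m) / np.sqrt(h)
    off = -np.ones(m - 1) / h**2               # -f'' stencil neighbours
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# Deterministic check: no noise, ground state of the Airy operator.
ground = np.linalg.eigvalsh(discretized_airy_operator(m=500, length=10.0))[0]

# One noisy realization for beta = 2.
noisy = discretized_airy_operator(m=500, length=10.0, beta=2.0,
                                  rng=np.random.default_rng(1))
noisy_ground = np.linalg.eigvalsh(noisy)[0]
```

In this discretization, the negative of `noisy_ground` plays the role of the limiting largest eigenvalue; how fine the mesh must be for a faithful approximation of the noise term is not addressed by this sketch.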
  33. 34.

    Tracy-Widom for $\beta > 0$

    The centered and scaled largest $\beta$-Hermite eigenvalue converges to the largest eigenvalue of $-\mathcal{H}_\beta$, the negative of the Stochastic Airy Operator.

    Ramírez, Rider, Virág (2006): For almost every sequence of Hermite tridiagonal matrices,

    $$n^{1/6} \left( \lambda_{\max}(H_\beta) - 2\sqrt{n} \right) \longrightarrow \lambda_{\max}(-\mathcal{H}_\beta)$$

    Comments:

    Almost sure convergence.

    Actually proved for the largest $k$ eigenvalues.

    Leads to a definition of the Tracy-Widom Law for all $\beta > 0$.
  36. 37.

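    Variation Viewpoint

    Define an appropriate space, $L^*$, on which $\mathcal{H}_\beta$ "makes sense":

    $$L^* := \left\{ f : f(0) = 0 \text{ and } \int_0^{\infty} (f')^2 + (1 + x)^2 f^2 \, dx < \infty \right\}$$

    Associate $\mathcal{H}_\beta$ with the quadratic form

    $$\langle \varphi, \mathcal{H}_\beta \varphi \rangle := \int_0^{\infty} (\varphi'(x))^2 \, dx + \int_0^{\infty} x \varphi^2(x) \, dx - \frac{2}{\sqrt{\beta}} \int_0^{\infty} b'_x \, \varphi^2(x) \, dx$$

    Characterize the eigenvalue problem in terms of a variational principle:

    $$\Lambda_0 := \inf_{f \in L^*} \left\{ \langle f, \mathcal{H}_\beta f \rangle : f(0) = 0 \text{ and } \|f\|_2 = 1 \right\}$$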
  37. 39.

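    Variation Viewpoint (continued)

    For the Hermite tridiagonal matrix, define the quadratic form

    $$\langle v, v \rangle_{H_n} := v^T \left[ n^{2/3} \left( \frac{1}{\sqrt{n}} H_\beta - 2 I_n \right) \right] v$$

    Definition (Tracy-Widom Law for $\beta > 0$):

    $$TW_\beta = \sup_{f \in L^*} \left\{ \frac{2}{\sqrt{\beta}} \int_0^{\infty} f^2(x) \, db(x) - \int_0^{\infty} \left[ (f'(x))^2 + x f^2(x) \right] dx \right\}$$

    Consistent with prior definitions for $\beta = 1, 2, 4$.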
  38. 40.
  39. 41.

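    Small Deviations: $\beta$-Hermite

    Ledoux, Rider (2010): For all $n \ge 1$, $0 < \varepsilon \le 1$ and $\beta \ge 1$:

    $$P\big( \lambda_{\max}(H_\beta) \ge 2\sqrt{n}(1 + \varepsilon) \big) \le C e^{-\beta n \varepsilon^{3/2} / C},$$

    $$P\big( \lambda_{\max}(H_\beta) \ge 2\sqrt{n}(1 + \varepsilon) \big) \ge C^{-\beta} e^{-\beta n \varepsilon^{3/2} / C},$$

    and

    $$P\big( \lambda_{\max}(H_\beta) \le 2\sqrt{n}(1 - \varepsilon) \big) \le C e^{-\beta n^2 \varepsilon^3 / C},$$

    $$P\big( \lambda_{\max}(H_\beta) \le 2\sqrt{n}(1 - \varepsilon) \big) \ge C^{-\beta} e^{-\beta n^2 \varepsilon^3 / C},$$

    where each $C$ is a numerical constant (possibly different in each bound).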
  40. 42.

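    Small Deviations: $\beta$-Laguerre

    Ledoux, Rider (2010): For all $n \ge 1$, $0 < \varepsilon \le 1$ and $\beta \ge 1$:

    $$P\big( \lambda_{\max}(L_\beta) \ge (\sqrt{\kappa} + \sqrt{n})^2 (1 + \varepsilon) \big) \le C \exp\left( -\beta \sqrt{n\kappa} \, \varepsilon^{3/2} \left( \frac{1}{\sqrt{\varepsilon}} \wedge \left( \frac{\kappa}{n} \right)^{1/4} \right) \Big/ C \right),$$

    $$P\big( \lambda_{\max}(L_\beta) \ge (\sqrt{\kappa} + \sqrt{n})^2 (1 + \varepsilon) \big) \ge C^{-\beta} \exp\left( -\beta \sqrt{n\kappa} \, \varepsilon^{3/2} \left( \frac{1}{\sqrt{\varepsilon}} \wedge \left( \frac{\kappa}{n} \right)^{1/4} \right) \Big/ C \right),$$

    and

    $$P\big( \lambda_{\max}(L_\beta) \le (\sqrt{\kappa} + \sqrt{n})^2 (1 - \varepsilon) \big) \le C^{\beta} \exp\left( -\beta n \kappa \, \varepsilon^{3} \left( \frac{1}{\varepsilon} \wedge \left( \frac{\kappa}{n} \right)^{1/2} \right) \Big/ C \right),$$

    where each $C$ is a numerical constant (possibly different in each bound).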
  43. 45.

    $\beta$-Jacobi Ensemble

    Definition: For any $\beta > 0$, the $\beta$-Jacobi ensemble is the point process on $\mathbb{R}$ defined by the joint density function

    $$P(\lambda_1, \lambda_2, \dots, \lambda_n) = \frac{1}{Z_{\beta,n}} \prod_{j<k} |\lambda_j - \lambda_k|^{\beta} \prod_{k=1}^{n} \lambda_k^{\frac{\beta}{2} a - 1} (1 - \lambda_k)^{\frac{\beta}{2} b - 1}$$

    When $\beta = 1, 2$:

    JOE and JUE are of type $(A + B)^{-1} B$, where $A$ and $B$ are Wishart matrices.

    The scaled largest eigenvalue converges to Tracy-Widom.

    Wide range of applications in multivariate statistics: principal components, canonical correlations, MANOVA.
  44. 46.

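    Jacobi Tridiagonal

    The Jacobi tridiagonal matrix is given by $J_{\beta,n,a,b} = B_{\beta,n,a,b} \cdot B_{\beta,n,a,b}^T$, where

    $$B_{\beta,n,a,b} = \begin{pmatrix}
    c_n & -s_n c'_{n-1} & & & \\
    & c_{n-1} s'_{n-1} & -s_{n-1} c'_{n-2} & & \\
    & & c_{n-2} s'_{n-2} & \ddots & \\
    & & & \ddots & -s_2 c'_1 \\
    & & & & c_1 s'_1
    \end{pmatrix}$$

    $c_i$ and $c'_i$ are independent with

    $$c_i \sim \mathrm{Beta}\left( \tfrac{\beta}{2}(an + i), \tfrac{\beta}{2}(bn + i) \right) \quad \text{and} \quad c'_i \sim \mathrm{Beta}\left( \tfrac{\beta}{2} i, \tfrac{\beta}{2}(an + bn + 1 + i) \right),$$

    $$s_i = \sqrt{1 - c_i^2} \quad \text{and} \quad s'_i = \sqrt{1 - (c'_i)^2}$$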
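The bidiagonal structure above translates directly into code. The following is a rough sketch, assuming NumPy. Two readings are assumptions made here: the symbols written "an" and "bn" in the transcript are treated as two positive numbers passed in as `a_n` and `b_n`, and the Beta laws are placed on $c_i$ and $c'_i$ themselves, as the slide states (some references place them on the squares instead). The function name `sample_beta_jacobi` is hypothetical.

```python
import numpy as np

def sample_beta_jacobi(n, beta, a_n, b_n, rng):
    """Return J = B B^T for the upper bidiagonal B described on the slide."""
    i = np.arange(1, n + 1)
    # c_1..c_n and c'_1..c'_{n-1}; Beta laws taken literally from the slide (assumption)
    c = rng.beta(0.5 * beta * (a_n + i), 0.5 * beta * (b_n + i))
    cp = rng.beta(0.5 * beta * i[:-1], 0.5 * beta * (a_n + b_n + 1 + i[:-1]))
    s = np.sqrt(1.0 - c**2)
    sp = np.sqrt(1.0 - cp**2)
    # Diagonal, top to bottom: c_n, c_{n-1} s'_{n-1}, ..., c_1 s'_1
    diag = np.concatenate(([c[-1]], (c[:-1] * sp)[::-1]))
    # Superdiagonal, top to bottom: -s_n c'_{n-1}, ..., -s_2 c'_1
    sup = -(s[1:] * cp)[::-1]
    b = np.diag(diag) + np.diag(sup, 1)
    return b @ b.T

rng = np.random.default_rng(0)
jac = sample_beta_jacobi(n=50, beta=2.5, a_n=1.0, b_n=2.0, rng=rng)
jac_eigs = np.linalg.eigvalsh(jac)   # should lie in [0, 1] up to rounding
```

The eigenvalues fall in $[0, 1]$ for any values of $c_i, c'_i \in [0, 1]$, because $B$ has the form of a block of an orthogonal matrix written in terms of cosines and sines; only the distribution of the eigenvalues depends on the Beta parameters.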
  45. 47.

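    Small Deviations Results

    Theorem (Right-Tail Upper Bound): For all $n \ge 1$, $0 < \varepsilon \le 1$, and $\beta \ge 1$,

    $$P\big( \lambda_{\max}(J_\beta) \ge \gamma \sqrt{n} (1 + \varepsilon) \big) \le C e^{-\beta (a+b) n \varepsilon^{3/2} / C},$$

    where $C$ is a numerical constant.

    Theorem (Left-Tail Upper Bound): For all $n \ge 1$, $0 < \varepsilon \le 1$, and $\beta \ge 1$,

    $$P\big( \lambda_{\max}(J_\beta) \le \gamma \sqrt{n} (1 - \varepsilon) \big) \le C e^{-\beta (a+b) n^2 \varepsilon^{3} / C},$$

    where $C$ is a numerical constant.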
  46. 48.

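    A Variance Bound

    Using both tail bounds, we get the following bound on the variance of the largest eigenvalue of the $\beta$-Jacobi ensemble.

    Corollary: For $n \ge 1$ and $\beta \ge 1$,

    $$\mathrm{Var}[\lambda_{\max}(J_\beta)] \le C_\beta \, n^{-1/3},$$

    where $C_\beta$ is a numerical constant.

    Sketch of Proof: Use Fubini's Theorem to get

    $$\mathrm{Var}[\lambda_{\max}(J_\beta)] \le \gamma^2 n \int_0^{\infty} P\big( |\lambda_{\max}(J_\beta) - \gamma \sqrt{n}| \ge \gamma \sqrt{n} \, \varepsilon \big) \, d\varepsilon^2.$$

    Use the tail bounds for small $\varepsilon$.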
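The power $n^{-1/3}$ can be traced through the proof sketch with one change of variables. The lines below are an outline under the stated tail bounds, writing $c = \beta(a+b)/C$ as shorthand (a symbol introduced here); the range $\varepsilon > 1$ is not covered by the two theorems and would need a separate argument that the transcript does not spell out.

```latex
\begin{align*}
\gamma^{2} n \int_{0}^{1}
  P\big(|\lambda_{\max}(J_\beta) - \gamma\sqrt{n}| \ge \gamma\sqrt{n}\,\varepsilon\big)\, d\varepsilon^{2}
&\le \gamma^{2} n \int_{0}^{1}
  C\big(e^{-c\, n\, \varepsilon^{3/2}} + e^{-c\, n^{2} \varepsilon^{3}}\big)\, 2\varepsilon \, d\varepsilon \\
&\le \gamma^{2}\, n \cdot n^{-4/3} \int_{0}^{\infty}
  2C\big(e^{-c\, u^{3/2}} + e^{-c\, u^{3}}\big)\, u \, du
  \qquad (\varepsilon = u\, n^{-2/3}) \\
&= O\big(n^{-1/3}\big).
\end{align*}
```

The substitution $\varepsilon = u\,n^{-2/3}$ turns both $n\varepsilon^{3/2}$ and $n^2\varepsilon^3$ into quantities free of $n$, so the remaining integral is a finite constant and the factor $n \cdot n^{-4/3}$ gives the rate.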
  47. 49.
  48. 50.

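    Bounding $J_\beta$

    1. Define a finite-dimensional quadratic form

    $$\frac{1}{\sqrt{n}} J_n(v) := \langle v, v \rangle_{J_\beta} = v^T \left[ J_\beta - \gamma I_n \right] v$$

    2. Bound $J_n$ above and below by

    $$J_c(v) = \sum_{k=1}^{n} Z_k v_k^2 + \sum_{k=1}^{n} Z'_k v_k^2 + 2 \sum_{k=1}^{n-1} Y_k v_k v_{k+1} - c \sqrt{n} \sum_{k=1}^{n} (v_{k+1} + v_k)^2 - \frac{c}{\sqrt{n}} \sum_{k=1}^{n} k \, v_k^2,$$

    where

    $$Z_k = \sqrt{n} \left( c_{n-k+1}^2 (s'_{n-k+1})^2 - E[c_{n-k+1}^2] \, E[(s'_{n-k+1})^2] \right),$$

    $$Z'_k = \sqrt{n} \left( s_{n-k+1}^2 (c'_{n-k})^2 - E[s_{n-k+1}^2] \, E[(c'_{n-k})^2] \right),$$

    $$Y_k = \sqrt{n} \left( c_{n-k} \, s_{n-k+1} \, c'_{n-k} \, s'_{n-k} - E[c_{n-k}] \, E[s_{n-k+1}] \, E[c'_{n-k} s'_{n-k}] \right)$$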
  49. 51.

    Gaussian Estimates

    Need Gaussian bounds on $Z_k$, $Z'_k$, and $Y_k$ of the form

    $$E[e^{\lambda Z_k}] \le e^{c \lambda^2 / (\beta (a+b))} \quad \text{for some } c > 0 \text{ and all } \lambda \in \mathbb{R}$$

    1. A log-Sobolev inequality exists for the beta measure:

    $$\int f \log f \, d\mu - \int f \, d\mu \, \log \int f \, d\mu \le 2C \int |\nabla f|^2 \, d\mu$$

    2. Can apply the Herbst argument to any Lipschitz function to get

    $$E\left[ e^{\lambda F} \right] \le e^{C \lambda^2 \|F\|_{\mathrm{Lip}}^2 / 2}$$

    3. Problem: $F(X) = \sqrt{X}$ is not Lipschitz on $[0, 1]$.

    4. Fix: The beta measure is the invariant measure for a diffusion process that converges rapidly on $[0, 1]$.
  50. 53.

    Lower Bounds (Open Problem)

    The right tail involves choosing the right test vector.

    The left tail seems much harder.

    The Hermite case used a Gaussian argument.

    The Laguerre case is open.