
Paper introduction: User-friendly introduction to PAC-Bayes

Masanari Kimura
February 19, 2022


  1. Intro | PAC-Bayes bounds | Tight and non-vacuous PAC-Bayes bounds | PAC-Bayes oracle inequalities and fast rates | Conclusion | References

    Paper introduction: User-friendly introduction to PAC-Bayes
    Masanari Kimura, SOKENDAI (The Graduate University for Advanced Studies), Department of Statistical Science, Hino Laboratory
  2. Intro
  3. Introduction
  4. TL;DR
    ▶ An introduction to PAC-Bayes, a useful tool for analyzing the generalization error.
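  5. Basic notations
    ▶ Set of predictors: $\{f_\theta;\ \theta \in \Theta\}$
    ▶ Generalization error: $R(\theta) := \mathbb{E}_{(X,Y)\sim P}\left[\ell(f_\theta(X), Y)\right]$
    ▶ Empirical error: $r(\theta) := \frac{1}{n}\sum_{i=1}^{n} \ell(f_\theta(X_i), Y_i)$, computed on an i.i.d. sample $S = ((X_1,Y_1),\dots,(X_n,Y_n))$ drawn from $P$ ($\mathbb{E}_S$ and $\mathbb{P}_S$ denote expectation and probability over $S$)
    ▶ Estimator: $\hat\theta : \bigcup_{n=1}^{\infty} (\mathcal{X}\times\mathcal{Y})^n \to \Theta$
    ▶ The loss is bounded: $0 \le \ell \le C$ (this is the constant $C$ appearing in the bounds below)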
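A minimal sketch of these notations, assuming a hypothetical finite class of 1-D threshold classifiers $f_\theta(x) = \mathbf{1}[x \ge \theta]$, noiseless labels, and the 0−1 loss (so $C = 1$). The helper names (`empirical_risk`, `erm`, `make_threshold`) are illustrative, not from the paper.

```python
import random

def empirical_risk(predict, loss, sample):
    """r(theta) = (1/n) * sum_i loss(f_theta(X_i), Y_i) on the sample."""
    return sum(loss(predict(x), y) for x, y in sample) / len(sample)

def zero_one_loss(y_pred, y_true):
    # 0-1 loss: bounded in [0, C] with C = 1
    return 0.0 if y_pred == y_true else 1.0

def erm(thetas, make_predictor, loss, sample):
    """Hypothetical ERM estimator: the theta in a finite Theta with the smallest r(theta)."""
    return min(thetas, key=lambda t: empirical_risk(make_predictor(t), loss, sample))

def make_threshold(theta):
    # Hypothetical predictor family: f_theta(x) = 1[x >= theta]
    return lambda x: 1 if x >= theta else 0

rng = random.Random(0)
# Noiseless labels generated by the threshold 0.3 (an assumption of this toy setup)
sample = [(x, 1 if x >= 0.3 else 0) for x in (rng.random() for _ in range(200))]
thetas = [k / 10 for k in range(11)]  # finite Theta with M = 11 elements
theta_hat = erm(thetas, make_threshold, zero_one_loss, sample)
print(theta_hat, empirical_risk(make_threshold(theta_hat), zero_one_loss, sample))
```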
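  6. PAC bound
    Theorem (PAC bound). Assume $\mathrm{card}(\Theta) = M < +\infty$. Then for any $\epsilon \in (0,1)$:
    $$\mathbb{P}\left(\forall\theta\in\Theta,\ R(\theta) \le r(\theta) + C\sqrt{\frac{\log\frac{M}{\epsilon}}{2n}}\right) \ge 1-\epsilon. \tag{1}$$
    Consequently, since $r(\hat\theta_{\mathrm{ERM}}) = \inf_{\theta\in\Theta} r(\theta)$, the estimator obtained by ERM, $\hat\theta_{\mathrm{ERM}} := \arg\min_{\theta\in\Theta} r(\theta)$, satisfies
    $$\mathbb{P}\left(R(\hat\theta_{\mathrm{ERM}}) \le \inf_{\theta\in\Theta}\left[r(\theta) + C\sqrt{\frac{\log\frac{M}{\epsilon}}{2n}}\right]\right) \ge 1-\epsilon. \tag{2}$$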
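To get a feel for (1), the sketch below evaluates its deviation term for assumed values of $M$, $n$ and $\epsilon$; `pac_margin` is a hypothetical helper, not part of any library.

```python
import math

def pac_margin(M, n, eps, C=1.0):
    """Deviation term C * sqrt(log(M / eps) / (2n)) of the PAC bound (1)."""
    return C * math.sqrt(math.log(M / eps) / (2 * n))

# Assumed illustrative values: M = 11 predictors, n = 200 samples, eps = 0.05, 0-1 loss
margin = pac_margin(M=11, n=200, eps=0.05)
print(f"with prob. >= 0.95: R(theta_ERM) <= r(theta_ERM) + {margin:.3f}")
```

The term grows only logarithmically in $M$ and shrinks like $1/\sqrt{n}$: quadrupling $n$ halves it.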
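  7. Proof. Fix $\theta \in \Theta$. By Hoeffding's inequality, for any $t > 0$,
    $$\mathbb{E}\left[e^{tn(R(\theta)-r(\theta))}\right] \le e^{\frac{nt^2C^2}{8}}. \tag{3}$$
    Then, for any $s > 0$,
    $$\mathbb{P}(R(\theta)-r(\theta) > s) = \mathbb{P}\left(e^{nt(R(\theta)-r(\theta))} > e^{nts}\right) \le \frac{\mathbb{E}\left[e^{nt(R(\theta)-r(\theta))}\right]}{e^{nts}} \le e^{\frac{nt^2C^2}{8} - nts},$$
    where the first inequality is Markov's inequality and the second is Eq. (3). Since $nt^2C^2/8 - nts$ is minimized at $t = 4s/C^2$,
    $$\mathbb{P}(R(\theta) > r(\theta) + s) \le e^{-\frac{2ns^2}{C^2}}. \tag{4}$$
    Since $\mathrm{card}(\Theta) = M < +\infty$, the union bound gives
    $$\mathbb{P}\left(\sup_{\theta\in\Theta}(R(\theta)-r(\theta)) > s\right) = \mathbb{P}\left(\bigcup_{\theta\in\Theta}\{R(\theta)-r(\theta) > s\}\right) \le \sum_{\theta\in\Theta}\mathbb{P}(R(\theta) > r(\theta)+s) \tag{5}$$
    $$\le M e^{-\frac{2ns^2}{C^2}}. \tag{6}$$
    Setting $Me^{-2ns^2/C^2} = \epsilon$, i.e. $s = C\sqrt{\log(M/\epsilon)/(2n)}$, yields (1).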
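The two computational steps of the proof can be checked numerically: the sketch minimizes the exponent $nt^2C^2/8 - nts$ on a grid and compares the result with $t = 4s/C^2$, then inverts (6) to recover the $s$ used in (1). The values of $n$, $s$, $C$, $M$ and $\epsilon$ are arbitrary assumptions.

```python
import math

def chernoff_exponent(t, s, n, C):
    # Exponent n t^2 C^2 / 8 - n t s obtained from Markov's inequality and (3)
    return n * t**2 * C**2 / 8 - n * t * s

n, s, C = 100, 0.1, 1.0
t_star = 4 * s / C**2                       # minimizer claimed in the proof
grid = [k / 1000 for k in range(1, 5001)]   # t in (0, 5]
t_grid = min(grid, key=lambda t: chernoff_exponent(t, s, n, C))
print(t_star, t_grid, chernoff_exponent(t_star, s, n, C), -2 * n * s**2 / C**2)

# Inverting (6): M exp(-2 n s^2 / C^2) = eps  <=>  s = C sqrt(log(M / eps) / (2n))
M, eps = 11, 0.05
s_eps = C * math.sqrt(math.log(M / eps) / (2 * n))
print(M * math.exp(-2 * n * s_eps**2 / C**2))  # recovers eps up to rounding
```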
  8. PAC-Bayes bounds
  9. Additional notations ($\mathcal{P}(\Theta)$ denotes the set of probability measures on $\Theta$)
    ▶ Data-dependent probability measure: $\hat\rho : \bigcup_{n=1}^{\infty}(\mathcal{X}\times\mathcal{Y})^n \to \mathcal{P}(\Theta)$;
    ▶ Randomized estimator: $\tilde\theta \sim \hat\rho$;
    ▶ Aggregated predictor: $f_{\hat\rho}(\cdot) = \mathbb{E}_{\theta\sim\hat\rho}[f_\theta(\cdot)]$;
    ▶ Prior: $\pi \in \mathcal{P}(\Theta)$.
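A small sketch contrasting the aggregated predictor with the randomized estimator, assuming a hypothetical three-element $\Theta$ of threshold classifiers and an arbitrary fixed $\hat\rho$ (in practice $\hat\rho$ would be computed from the sample). Averaging many randomized predictions recovers $f_{\hat\rho}(x)$.

```python
import random

# Hypothetical finite Theta of threshold classifiers f_theta(x) = 1[x >= theta]
thetas = [0.2, 0.5, 0.8]
rho_hat = [0.5, 0.3, 0.2]   # an assumed data-dependent distribution rho_hat over Theta

def f(theta, x):
    return 1.0 if x >= theta else 0.0

def aggregated_predictor(x):
    """f_rho(x) = E_{theta ~ rho_hat}[f_theta(x)]: a rho_hat-weighted average of predictions."""
    return sum(w * f(t, x) for t, w in zip(thetas, rho_hat))

def randomized_predictor(x, rng):
    """Draw theta_tilde ~ rho_hat, then predict with f_{theta_tilde}."""
    theta_tilde = rng.choices(thetas, weights=rho_hat, k=1)[0]
    return f(theta_tilde, x)

rng = random.Random(0)
x = 0.6
draws = [randomized_predictor(x, rng) for _ in range(10_000)]
print(aggregated_predictor(x), sum(draws) / len(draws))  # weighted average 0.5 + 0.3 vs. its Monte Carlo estimate
```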
  10. Catoni's bound
    Theorem (Catoni's bound). For any $\lambda > 0$ and $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\forall\rho\in\mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon}}{\lambda}\right) \ge 1-\epsilon. \tag{7}$$
    Lemma (Donsker and Varadhan's variational formula [Donsker and Varadhan, 1976]). For any measurable, bounded function $h : \Theta \to \mathbb{R}$,
    $$\log\mathbb{E}_{\theta\sim\pi}\left[e^{h(\theta)}\right] = \sup_{\rho\in\mathcal{P}(\Theta)}\left[\mathbb{E}_{\theta\sim\rho}[h(\theta)] - \mathrm{KL}[\rho\|\pi]\right]. \tag{8}$$
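On a finite $\Theta$, (8) can be verified directly: writing $\rho^*(\theta) \propto \pi(\theta)e^{h(\theta)}$, one has $\mathbb{E}_{\theta\sim\rho}[h(\theta)] - \mathrm{KL}[\rho\|\pi] = \log\mathbb{E}_{\theta\sim\pi}[e^{h(\theta)}] - \mathrm{KL}[\rho\|\rho^*]$, so the supremum is attained at $\rho^*$. The sketch below checks this for an assumed prior and an assumed function $h$.

```python
import math
import random

def kl(rho, pi):
    """KL[rho || pi] on a finite set; terms with rho_k = 0 contribute 0."""
    return sum(q * math.log(q / p) for q, p in zip(rho, pi) if q > 0)

def dv_objective(rho, h, pi):
    # E_{theta ~ rho}[h(theta)] - KL[rho || pi]: the quantity inside the sup in (8)
    return sum(q * hk for q, hk in zip(rho, h)) - kl(rho, pi)

pi = [0.1, 0.2, 0.3, 0.4]   # assumed prior on a 4-point Theta
h = [1.0, -0.5, 2.0, 0.3]   # assumed bounded function h on Theta

z = sum(p * math.exp(hk) for p, hk in zip(pi, h))
lhs = math.log(z)                                          # log E_pi[e^h]
rho_star = [p * math.exp(hk) / z for p, hk in zip(pi, h)]  # maximizer: rho* proportional to pi e^h

rng = random.Random(0)
for _ in range(1000):
    w = [rng.random() for _ in pi]
    total = sum(w)
    rho = [x / total for x in w]
    assert dv_objective(rho, h, pi) <= lhs + 1e-12   # no rho exceeds the left-hand side
print(lhs, dv_objective(rho_star, h, pi))
```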
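  11. Proof.
    $$\mathbb{E}_S\left[e^{tn(R(\theta)-r(\theta))}\right] \le e^{\frac{nt^2C^2}{8}} \qquad (\because \text{Hoeffding's inequality})$$
    $$\mathbb{E}_S\left[e^{\lambda(R(\theta)-r(\theta))}\right] \le e^{\frac{\lambda^2C^2}{8n}} \qquad (\because t = \lambda/n)$$
    $$\mathbb{E}_{\theta\sim\pi}\mathbb{E}_S\left[e^{\lambda(R(\theta)-r(\theta))}\right] \le e^{\frac{\lambda^2C^2}{8n}}$$
    $$\mathbb{E}_S\mathbb{E}_{\theta\sim\pi}\left[e^{\lambda(R(\theta)-r(\theta))}\right] \le e^{\frac{\lambda^2C^2}{8n}} \qquad (\because \text{Fubini's theorem})$$
    $$\mathbb{E}_S\left[e^{\sup_{\rho\in\mathcal{P}(\Theta)}\left\{\lambda\mathbb{E}_{\theta\sim\rho}[R(\theta)-r(\theta)] - \mathrm{KL}[\rho\|\pi]\right\}}\right] \le e^{\frac{\lambda^2C^2}{8n}} \qquad (\because \text{[Donsker and Varadhan, 1976] with the bounded } h = \lambda(R-r))$$
    $$\mathbb{P}_S\left[\sup_{\rho\in\mathcal{P}(\Theta)}\left\{\lambda\mathbb{E}_{\theta\sim\rho}[R(\theta)-r(\theta)] - \mathrm{KL}[\rho\|\pi]\right\} - \frac{\lambda^2C^2}{8n} > s\right] \le e^{-s} \qquad (\because \text{Markov's inequality})$$
    $$\mathbb{P}_S\left[\sup_{\rho\in\mathcal{P}(\Theta)}\left\{\lambda\mathbb{E}_{\theta\sim\rho}[R(\theta)-r(\theta)] - \mathrm{KL}[\rho\|\pi]\right\} - \frac{\lambda^2C^2}{8n} > \log\frac{1}{\epsilon}\right] \le \epsilon \qquad (\because e^{-s} = \epsilon)$$
    On the complementary event, which has probability at least $1-\epsilon$, dividing by $\lambda$ gives (7) simultaneously for all $\rho\in\mathcal{P}(\Theta)$.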
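  12. Gibbs posterior
    Definition (Gibbs posterior).
    $$\hat\rho_\lambda(d\theta) = \frac{e^{-\lambda r(\theta)}\pi(d\theta)}{\mathbb{E}_{\eta\sim\pi}\left[e^{-\lambda r(\eta)}\right]}. \tag{9}$$
    Corollary. The Gibbs posterior minimizes the right-hand side of Catoni's bound (7):
    $$\hat\rho_\lambda = \arg\min_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\}, \tag{10}$$
    and hence, for any $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon}}{\lambda}\right\}\right) \ge 1-\epsilon. \tag{11}$$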
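A sketch of the Gibbs posterior (9) on a finite $\Theta$ with assumed empirical risks and a uniform prior. It checks (10) against random alternatives and shows the two limits: as $\lambda \to 0$ the posterior stays near $\pi$, and as $\lambda \to \infty$ it concentrates on the empirical risk minimizer. The function names are hypothetical.

```python
import math
import random

def gibbs_posterior(r, pi, lam):
    """Finite-Theta Gibbs posterior (9): weights proportional to exp(-lam * r) * pi.
    Shifting by the max log-weight (log-sum-exp trick) avoids overflow for large lam."""
    logw = [-lam * rk + math.log(pk) for rk, pk in zip(r, pi)]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)
    return [x / z for x in w]

def objective(rho, r, pi, lam):
    # E_{theta ~ rho}[r(theta)] + KL[rho || pi] / lam, the criterion minimized in (10)
    kl = sum(q * math.log(q / p) for q, p in zip(rho, pi) if q > 0)
    return sum(q * rk for q, rk in zip(rho, r)) + kl / lam

r = [0.30, 0.12, 0.15, 0.40]   # assumed empirical risks of four predictors
pi = [0.25] * 4                # uniform prior
lam = 50.0
rho = gibbs_posterior(r, pi, lam)
best = objective(rho, r, pi, lam)

rng = random.Random(1)
for _ in range(1000):
    w = [rng.random() for _ in r]
    total = sum(w)
    assert objective([x / total for x in w], r, pi, lam) >= best - 1e-12

print([round(q, 3) for q in rho])
print(gibbs_posterior(r, pi, 1e-6))  # lam -> 0: close to the prior
print(gibbs_posterior(r, pi, 1e4))   # lam -> infinity: mass on argmin r (ERM)
```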
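  13. Example: finite case
    Assume $\mathrm{card}(\Theta) = M < +\infty$. Then the Gibbs posterior is
    $$\hat\rho_\lambda(\theta) = \frac{e^{-\lambda r(\theta)}\pi(\theta)}{\sum_{\eta\in\Theta}e^{-\lambda r(\eta)}\pi(\eta)}, \tag{12}$$
    and, with probability at least $1-\epsilon$,
    $$\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon}}{\lambda}\right\}. \tag{13}$$
    Since this bound holds for every $\rho\in\mathcal{P}(\Theta)$, it holds in particular for the Dirac measures $\{\delta_\theta;\ \theta\in\Theta\}$, for which
    $$\mathbb{E}_{\vartheta\sim\delta_\theta}[r(\vartheta)] = r(\theta), \qquad \mathrm{KL}[\delta_\theta\|\pi] = \sum_{\eta\in\Theta}\delta_\theta(\eta)\log\frac{\delta_\theta(\eta)}{\pi(\eta)} = \log\frac{1}{\pi(\theta)}, \tag{14}$$
    so the bound becomes
    $$\mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le \inf_{\theta\in\Theta}\left\{r(\theta) + \frac{\lambda C^2}{8n} + \frac{\log\frac{1}{\pi(\theta)} + \log\frac{1}{\epsilon}}{\lambda}\right\}\right) \ge 1-\epsilon. \tag{15}$$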
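With a uniform prior $\pi(\theta) = 1/M$, the right-hand side of (15) is $\min_\theta r(\theta) + \lambda C^2/(8n) + \log(M/\epsilon)/\lambda$, which is minimized at $\lambda = \sqrt{8n\log(M/\epsilon)}/C$ and then recovers the PAC bound term $C\sqrt{\log(M/\epsilon)/(2n)}$ of (1). Since (15) holds for a fixed $\lambda$, this choice depends only on $n$, $M$, $\epsilon$ and $C$, not on the data. A sketch with assumed empirical risks (`finite_gibbs_bound` is a hypothetical name):

```python
import math

def finite_gibbs_bound(r, pi, lam, n, eps, C=1.0):
    """Right-hand side of (15): min over theta of r + lam C^2/(8n) + (log(1/pi) + log(1/eps)) / lam."""
    return min(rk + lam * C**2 / (8 * n) + (math.log(1 / pk) + math.log(1 / eps)) / lam
               for rk, pk in zip(r, pi))

n, eps, C = 1_000, 0.05, 1.0
r = [0.30, 0.12, 0.15, 0.40]   # assumed empirical risks of M = 4 predictors
M = len(r)
pi = [1 / M] * M               # uniform prior, so log(1/pi) = log M
lam = math.sqrt(8 * n * math.log(M / eps)) / C   # uses only n, M, eps, C: no data
bound = finite_gibbs_bound(r, pi, lam, n, eps, C)
print(f"with prob. >= {1 - eps}: E_rho_lambda[R] <= {bound:.4f}")
```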
  14. Tight and non-vacuous PAC-Bayes bounds
  15. Why is there a race to the tighter PAC-Bayes bound?
    Consider the following simple binary neural network:
    $$f_w(x) = \mathbf{1}\left[\sum_{i=1}^{M} w^{(2)}_i\,\varphi\left(\sum_{j=1}^{d} w^{(1)}_{j,i}\,x_j\right) \ge 0\right]. \tag{16}$$
    For example, with the 0−1 loss ($C = 1$) and
    ▶ 100 × 100 grayscale input images ($d = 10{,}000$),
    ▶ sample size $n = 10{,}000$,
    ▶ $M = 100$ units in the network,
    there are $dM + M = 1{,}000{,}100$ binary weights, so $\mathrm{card}(\Theta) = 2^{1{,}000{,}100}$. Plugging these values into the finite-case bound with $\epsilon = 0.05$ gives
    $$\mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le 13.58\right) \ge 0.95. \tag{17}$$
    That is, with probability at least 95%, the risk of the Gibbs posterior is at most 13.58: a meaningless statement, since $R(\theta) \le 1$.
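A quick sketch of why this bound is vacuous, assuming a uniform prior over the $2^{1{,}000{,}100}$ networks and the data-independent $\lambda$ that minimizes (15), so that the complexity term is $C\sqrt{\log(\mathrm{card}(\Theta)/\epsilon)/(2n)}$. Under these assumptions the complexity term alone already exceeds 1, the largest possible value of the 0−1 risk, whatever the empirical risk is.

```python
import math

d, M, n, eps, C = 100 * 100, 100, 10_000, 0.05, 1.0
num_weights = d * M + M               # weights w^(1) (d x M) and w^(2) (M)
log_card = num_weights * math.log(2)  # log card(Theta) = num_weights * log 2
# Complexity term of (15) with a uniform prior, minimized over a fixed lambda
term = C * math.sqrt((log_card + math.log(1 / eps)) / (2 * n))
print(num_weights, term)
```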
  16. A few PAC-Bayes bounds
    ▶ McAllester's bound [McAllester, 1999]
    ▶ Catoni's bound (another one) [Catoni, 2007]
    ▶ Maurer's bound [Maurer, 2004]
    ▶ Seeger's bound [Seeger, 2002]
    ▶ Tolstikhin and Seldin's bound [Tolstikhin and Seldin, 2013]
    ▶ Thiemann, Igel, Wintenberger and Seldin's bound [Thiemann et al., 2017]
    ▶ Germain, Lacasse, Laviolette and Marchand's bound [Germain et al., 2009]
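  17. McAllester's bound [McAllester, 1999]
    Theorem (McAllester's bound [McAllester, 1999]). Assume the loss takes values in $[0,1]$ ($C = 1$). For any $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\forall\rho\in\mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \mathbb{E}_{\theta\sim\rho}[r(\theta)] + \sqrt{\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon} + \frac{5}{2}\log(n) + 8}{2n-1}}\right) \ge 1-\epsilon. \tag{18}$$
    ▶ Unlike Catoni's bound, no parameter $\lambda$ appears, so the right-hand side cannot be minimized in the same way;
    ▶ a common technique is to use the inequality $\sqrt{ab} \le a\lambda/2 + b/(2\lambda)$ to deliberately introduce a parameter to optimize.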
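A sketch evaluating (18) for assumed values, together with a numerical check of the $\sqrt{ab}$ relaxation from the second bullet (equality holds at $\lambda = \sqrt{b/a}$). Function names and inputs are illustrative.

```python
import math

def mcallester_rhs(emp_risk, kl, n, eps):
    """Right-hand side of (18): E_rho[r] + sqrt((KL + log(1/eps) + 2.5 log n + 8) / (2n - 1))."""
    return emp_risk + math.sqrt((kl + math.log(1 / eps) + 2.5 * math.log(n) + 8) / (2 * n - 1))

def relaxed_sqrt(a, b, lam):
    # sqrt(a b) <= a lam / 2 + b / (2 lam) for a, b, lam > 0; equality at lam = sqrt(b / a)
    return a * lam / 2 + b / (2 * lam)

# Assumed illustrative values: E_rho[r] = 0.05, KL[rho || pi] = 20, n = 10,000, eps = 0.05
bound = mcallester_rhs(0.05, 20.0, 10_000, 0.05)
print(round(bound, 4))

a, b = 2.0, 0.5
for lam in [0.1, 0.5, 1.0, 2.0, 10.0]:
    assert math.sqrt(a * b) <= relaxed_sqrt(a, b, lam) + 1e-12
print(math.sqrt(a * b), relaxed_sqrt(a, b, math.sqrt(b / a)))
```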
  18. Catoni's bound (another one) [Catoni, 2007]
    Theorem (Catoni's bound (another one) [Catoni, 2007]). Assume the loss takes values in $[0,1]$ ($C = 1$). For $\alpha > 0$ and $p \in (0,1)$, define
    $$\Phi_\alpha(p) = -\frac{\log\left\{1 - p\left(1 - e^{-\alpha}\right)\right\}}{\alpha}. \tag{19}$$
    Then for any $\lambda > 0$ and $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\forall\rho\in\mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \Phi^{-1}_{\frac{\lambda}{n}}\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon}}{\lambda}\right)\right) \ge 1-\epsilon. \tag{20}$$
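Solving $\Phi_\alpha(p) = q$ gives the closed form $\Phi_\alpha^{-1}(q) = (1 - e^{-\alpha q})/(1 - e^{-\alpha})$, so (20) is easy to evaluate. Since $\Phi_\alpha$ is convex with $\Phi_\alpha(0) = 0$ and $\Phi_\alpha(1) = 1$, we have $\Phi_\alpha(p) \le p$ and hence $\Phi_\alpha^{-1}(q) \ge q$. A sketch with assumed inputs (function names are hypothetical); as in (7), $\lambda$ has to be fixed before seeing the data.

```python
import math

def phi(alpha, p):
    """Phi_alpha(p) = -log(1 - p (1 - e^{-alpha})) / alpha, as in (19)."""
    return -math.log(1 - p * (1 - math.exp(-alpha))) / alpha

def phi_inv(alpha, q):
    # Solving Phi_alpha(p) = q for p gives p = (1 - e^{-alpha q}) / (1 - e^{-alpha})
    return (1 - math.exp(-alpha * q)) / (1 - math.exp(-alpha))

def catoni_rhs_2(emp_risk, kl, n, lam, eps):
    """Right-hand side of (20): Phi^{-1}_{lam/n}(E_rho[r] + (KL + log(1/eps)) / lam)."""
    return phi_inv(lam / n, emp_risk + (kl + math.log(1 / eps)) / lam)

# Assumed illustrative values: E_rho[r] = 0.05, KL = 20, n = 10,000, lambda = 1,000, eps = 0.05
print(round(catoni_rhs_2(0.05, 20.0, 10_000, 1_000.0, 0.05), 4))
```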
  19. Maurer's bound [Maurer, 2004]
    Theorem (Maurer's bound [Maurer, 2004]). Assume the loss takes values in $[0,1]$ ($C = 1$). Using the Bernoulli distribution $\mathcal{B}(p)$, define
    $$\mathrm{kl}(p,q) := \mathrm{KL}[\mathcal{B}(p)\|\mathcal{B}(q)] = p\log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}. \tag{21}$$
    Then for any $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\forall\rho\in\mathcal{P}(\Theta),\ \mathrm{kl}\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)],\ \mathbb{E}_{\theta\sim\rho}[R(\theta)]\right) \le \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}\right) \ge 1-\epsilon. \tag{22}$$
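A sketch of the binary kl divergence (21) and of the right-hand side of (22). The loop checks Pinsker's inequality $\mathrm{kl}(p,q) \ge 2(p-q)^2$ on a grid, which is why a small kl value forces $\mathbb{E}_\rho[r]$ and $\mathbb{E}_\rho[R]$ to be close. Helper names and inputs are hypothetical.

```python
import math

def kl_bernoulli(p, q):
    """kl(p, q) = KL[B(p) || B(q)] from (21), for p in [0, 1] and q in (0, 1); uses 0 log 0 = 0."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def maurer_rhs(kl_rho_pi, n, eps):
    # Right-hand side of (22): (KL[rho || pi] + log(2 sqrt(n) / eps)) / n
    return (kl_rho_pi + math.log(2 * math.sqrt(n) / eps)) / n

# Pinsker's inequality kl(p, q) >= 2 (p - q)^2 on a grid
for i in range(21):
    for j in range(1, 20):
        p, q = i / 20, j / 20
        assert kl_bernoulli(p, q) >= 2 * (p - q) ** 2 - 1e-12

print(kl_bernoulli(0.1, 0.5), maurer_rhs(20.0, 10_000, 0.05))
```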
  20. Seeger's bound [Seeger, 2002]
    Theorem (Seeger's bound [Seeger, 2002]). Define
    $$\mathrm{kl}^{-1}(q,b) = \sup\left\{p\in[0,1];\ \mathrm{kl}(q,p) \le b\right\}. \tag{23}$$
    Then Maurer's bound [Maurer, 2004] gives, for any $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\forall\rho\in\mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \mathrm{kl}^{-1}\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)],\ \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}\right)\right) \ge 1-\epsilon. \tag{24}$$
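Because $p \mapsto \mathrm{kl}(q,p)$ is increasing on $[q,1]$, $\mathrm{kl}^{-1}$ in (23) can be computed by bisection. A minimal sketch with assumed inputs (`kl_inv` is an illustrative name, not a library function):

```python
import math

def kl_bernoulli(p, q):
    # kl(p, q) from (21), for p in [0, 1] and q in (0, 1); 0 log 0 = 0
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def kl_inv(q, b, tol=1e-12):
    """kl^{-1}(q, b) = sup{p in [0, 1]: kl(q, p) <= b}, computed by bisection.
    For p >= q, kl(q, p) is increasing in p, so the feasible p >= q form an interval [q, p_max]."""
    if q >= 1.0:
        return 1.0
    lo, hi = q, 1.0          # kl(q, q) = 0 <= b, so lo is always feasible
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(q, mid) <= b:
            lo = mid
        else:
            hi = mid
    return lo

# Assumed illustrative values: E_rho[r] = 0.05, KL[rho || pi] = 20, n = 10,000, eps = 0.05
q, n, eps, kl_rho_pi = 0.05, 10_000, 0.05, 20.0
b = (kl_rho_pi + math.log(2 * math.sqrt(n) / eps)) / n
seeger = kl_inv(q, b)
print(f"with prob. >= {1 - eps}: E_rho[R] <= {seeger:.4f}")
```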
  21. Tolstikhin and Seldin's bound [Tolstikhin and Seldin, 2013]
    Theorem (Tolstikhin and Seldin's bound [Tolstikhin and Seldin, 2013]). Using the inequality
    $$\mathrm{kl}^{-1}(q,b) \le q + \sqrt{2qb} + 2b, \tag{25}$$
    one obtains, for any $\epsilon \in (0,1)$:
    $$\mathbb{P}_S\left(\forall\rho\in\mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \mathbb{E}_{\theta\sim\rho}[r(\theta)] + \sqrt{2\,\mathbb{E}_{\theta\sim\rho}[r(\theta)]\,\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}} + 2\,\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}\right) \ge 1-\epsilon. \tag{26}$$
    ▶ In general, empirical PAC-Bayes bounds are of order $1/\sqrt{n}$;
    ▶ when $\mathbb{E}_{\theta\sim\rho}[r(\theta)] = 0$, the $1/\sqrt{n}$ term vanishes and only the $1/n$ term remains (in the noiseless case the rate is $1/n$).
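A numerical check of (25), using the same bisection-based $\mathrm{kl}^{-1}$ (redefined so the block is self-contained), plus the noiseless case $q = 0$, where $\mathrm{kl}(0,p) = -\log(1-p)$ gives $\mathrm{kl}^{-1}(0,b) = 1 - e^{-b} \le b$, i.e. a bound of order $1/n$ up to the logarithmic factor. The grid values and the choice $\mathrm{KL}[\rho\|\pi] = 0$ are assumptions.

```python
import math

def kl_bernoulli(p, q):
    # kl(p, q) from (21), for p in [0, 1] and q in (0, 1); 0 log 0 = 0
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def kl_inv(q, b, tol=1e-12):
    # sup{p in [0, 1]: kl(q, p) <= b} by bisection on [q, 1]
    if q >= 1.0:
        return 1.0
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(q, mid) <= b:
            lo = mid
        else:
            hi = mid
    return lo

def ts_relaxation(q, b):
    # Upper bound (25) on kl^{-1}(q, b)
    return q + math.sqrt(2 * q * b) + 2 * b

for q in [0.0, 0.01, 0.05, 0.2, 0.5]:
    for b in [1e-4, 1e-3, 1e-2, 1e-1]:
        assert kl_inv(q, b) <= ts_relaxation(q, b) + 1e-9

# Noiseless case E_rho[r] = 0, with an assumed KL[rho || pi] = 0 and eps = 0.05
for n in [100, 10_000]:
    b = math.log(2 * math.sqrt(n) / 0.05) / n
    print(n, kl_inv(0.0, b), ts_relaxation(0.0, b))
```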
  22. Thiemann, Igel, Wintenberger and Seldin's bound [Thiemann et al., 2017]
    Theorem (Thiemann, Igel, Wintenberger and Seldin's bound [Thiemann et al., 2017]). Using the inequality
    $$\sqrt{ab} \le \frac{\lambda a}{2} + \frac{b}{2\lambda}, \tag{27}$$
    Seeger's bound [Seeger, 2002] gives, for any $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\forall\rho\in\mathcal{P}(\Theta),\ \forall\lambda\in(0,2),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \frac{\mathbb{E}_{\theta\sim\rho}[r(\theta)]}{1-\frac{\lambda}{2}} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n\lambda\left(1-\frac{\lambda}{2}\right)}\right) \ge 1-\epsilon. \tag{28}$$
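Because (27) is applied deterministically on the event of Seeger's bound, (28) holds for all $\lambda \in (0,2)$ at once, so $\lambda$ may be tuned after seeing the data. With $q = \mathbb{E}_\rho[r]$ and $b = (\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon})/n$, setting the derivative of the right-hand side to zero gives $\lambda^* = 2/(\sqrt{2q/b + 1} + 1)$, and the optimized value is $q + b + \sqrt{b^2 + 2qb}$, slightly below the $q + \sqrt{2qb} + 2b$ of (25). A sketch with assumed $q$ and $b$ (function names are hypothetical):

```python
import math

def thiemann_rhs(q, b, lam):
    """Right-hand side of (28) with q = E_rho[r] and b = (KL + log(2 sqrt(n)/eps)) / n, lam in (0, 2)."""
    return q / (1 - lam / 2) + b / (lam * (1 - lam / 2))

def best_lambda(q, b):
    # Minimizer of thiemann_rhs over (0, 2): lam* = 2 / (sqrt(2q/b + 1) + 1)
    return 2 / (math.sqrt(2 * q / b + 1) + 1)

q, b = 0.05, 0.0028   # assumed values of E_rho[r] and b
lam_star = best_lambda(q, b)
grid = [k / 10_000 for k in range(1, 20_000)]   # lam in (0, 2)
lam_grid = min(grid, key=lambda lam: thiemann_rhs(q, b, lam))
print(lam_star, lam_grid, thiemann_rhs(q, b, lam_star))
```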
  23. Germain, Lacasse, Laviolette and Marchand's bound [Germain et al., 2009]
    Theorem (Germain, Lacasse, Laviolette and Marchand's bound [Germain et al., 2009]). For any convex function $D : [0,1]^2 \to \mathbb{R}$ and any $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\forall\rho\in\mathcal{P}(\Theta),\ D\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)],\ \mathbb{E}_{\theta\sim\rho}[R(\theta)]\right) \le \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{\mathbb{E}_S\mathbb{E}_{\theta\sim\pi}\left[e^{nD(r(\theta),R(\theta))}\right]}{\epsilon}}{n}\right) \ge 1-\epsilon. \tag{29}$$
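As a consistency check on (29), taking $D = \mathrm{kl}$ (which is jointly convex) recovers Maurer's bound (22). The sketch below assumes Maurer's lemma: for losses in $[0,1]$ and $n \ge 8$, $\mathbb{E}_S\left[e^{n\,\mathrm{kl}(r(\theta),R(\theta))}\right] \le 2\sqrt{n}$ for each fixed $\theta$.

```latex
\begin{aligned}
\mathbb{E}_S\,\mathbb{E}_{\theta\sim\pi}\left[e^{n\,\mathrm{kl}(r(\theta),R(\theta))}\right]
&= \mathbb{E}_{\theta\sim\pi}\,\mathbb{E}_S\left[e^{n\,\mathrm{kl}(r(\theta),R(\theta))}\right]
&& \text{(Fubini's theorem)}\\
&\le 2\sqrt{n}
&& \text{(Maurer's lemma, for each fixed } \theta),\\
\text{hence}\quad
\mathrm{kl}\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)],\ \mathbb{E}_{\theta\sim\rho}[R(\theta)]\right)
&\le \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}
&& \text{(plugging into (29), which is (22))}.
\end{aligned}
```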
  24. PAC-Bayes oracle inequalities and fast rates
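  25. Oracle bound in expectation
    As with empirical PAC-Bayes bounds, oracle PAC-Bayes bounds can also be derived:
    $$\begin{aligned}
    \mathbb{E}_S\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)]
    &\le \mathbb{E}_S\left[\inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\}\right]\\
    &\le \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_S\left[\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right]\right\}\\
    &= \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_S\left[\mathbb{E}_{\theta\sim\rho}[r(\theta)]\right] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\}\\
    &= \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}\left[\mathbb{E}_S[r(\theta)]\right] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\} && (\because \text{Fubini's theorem})\\
    &= \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[R(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\} && (\because \mathbb{E}_S[r(\theta)] = R(\theta)).
    \end{aligned}$$
    The first inequality is the in-expectation counterpart of (11): it follows from the proof of Catoni's bound by applying Jensen's inequality instead of Markov's inequality (so no $\log\frac{1}{\epsilon}$ term appears), together with (10). The second uses $\mathbb{E}_S[\inf\cdot] \le \inf\mathbb{E}_S[\cdot]$.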
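A Monte Carlo sketch of this oracle inequality under assumed ingredients: five predictors with known risks $R$, a uniform prior, $C = 1$, and a toy data model in which each sample point gives predictor $k$ a 0−1 loss drawn independently as Bernoulli($R_k$). By (8), the infimum on the right equals $\lambda C^2/(8n) - \frac{1}{\lambda}\log\mathbb{E}_{\theta\sim\pi}[e^{-\lambda R(\theta)}]$, which is what the code computes.

```python
import math
import random

def gibbs(r, pi, lam):
    # Finite-Theta Gibbs posterior (12), computed with a log-sum-exp shift for stability
    logw = [-lam * rk + math.log(pk) for rk, pk in zip(r, pi)]
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    z = sum(w)
    return [x / z for x in w]

R = [0.1, 0.2, 0.3, 0.4, 0.5]   # assumed true risks of M = 5 predictors
M, n, C = len(R), 100, 1.0
pi = [1 / M] * M
lam = math.sqrt(8 * n * math.log(M)) / C   # fixed in advance, not data-dependent

rng = random.Random(0)
trials = 500
lhs = 0.0
for _ in range(trials):
    # Toy data model: predictor k incurs loss 1 on a sample point with probability R[k]
    r = [sum(rng.random() < Rk for _ in range(n)) / n for Rk in R]
    rho = gibbs(r, pi, lam)
    lhs += sum(q * Rk for q, Rk in zip(rho, R)) / trials   # estimate of E_S E_{rho_lambda}[R]

# inf_rho {E_rho[R] + KL[rho || pi] / lam} = -(1/lam) log E_pi[e^{-lam R}], by (8)
rhs = lam * C**2 / (8 * n) - math.log(sum(p * math.exp(-lam * Rk) for p, Rk in zip(pi, R))) / lam
print(f"E_S E_rho[R] ~ {lhs:.3f} <= {rhs:.3f}")
```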
  26. Oracle bound in probability
    Theorem. For any $\lambda > 0$ and $\epsilon \in (0,1)$,
    $$\mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[R(\theta)] + \frac{\lambda C^2}{4n} + 2\,\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2}{\epsilon}}{\lambda}\right\}\right) \ge 1-\epsilon. \tag{30}$$
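One way to see where (30) comes from, sketched under the assumption that Catoni's bound (7) also holds with $R$ and $r$ swapped (Hoeffding's inequality applies equally to $r(\theta) - R(\theta)$): apply both versions with $\epsilon/2$, so that by a union bound they hold together with probability at least $1-\epsilon$, and combine them through (10).

```latex
\begin{aligned}
\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)]
&\le \mathbb{E}_{\theta\sim\hat\rho_\lambda}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\hat\rho_\lambda\|\pi] + \log\frac{2}{\epsilon}}{\lambda}
&& \text{((7) with } \epsilon/2,\ \rho = \hat\rho_\lambda)\\
&= \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\} + \frac{\lambda C^2}{8n} + \frac{\log\frac{2}{\epsilon}}{\lambda}
&& \text{(by (10))}\\
&\le \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[R(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2}{\epsilon}}{\lambda} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\} + \frac{\lambda C^2}{8n} + \frac{\log\frac{2}{\epsilon}}{\lambda}
&& \text{(swapped (7) with } \epsilon/2)\\
&= \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[R(\theta)] + \frac{\lambda C^2}{4n} + 2\,\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2}{\epsilon}}{\lambda}\right\}.
\end{aligned}
```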
  27. Conclusion
  28. Other topics
    ▶ Removing the log term (Catoni's localization trick)
    ▶ Unbounded loss functions
    ▶ Non-i.i.d. settings
  29. References I
    Olivier Catoni. PAC-Bayesian supervised classification: the thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248, 2007.
    Monroe D. Donsker and S. R. Srinivasa Varadhan. Asymptotic evaluation of certain Markov process expectations for large time, III. Communications on Pure and Applied Mathematics, 29(4):389–461, 1976.
    Pascal Germain, Alexandre Lacasse, François Laviolette, and Mario Marchand. PAC-Bayesian learning of linear classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 353–360, 2009.
    Andreas Maurer. A note on the PAC Bayesian theorem. arXiv preprint cs/0411099, 2004.
    David A. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355–363, 1999.
    Matthias Seeger. PAC-Bayesian generalisation error bounds for Gaussian process classification. Journal of Machine Learning Research, 3(Oct):233–269, 2002.
  30. References II
    Niklas Thiemann, Christian Igel, Olivier Wintenberger, and Yevgeny Seldin. A strongly quasiconvex PAC-Bayesian bound. In International Conference on Algorithmic Learning Theory, pages 466–492. PMLR, 2017.
    Ilya O. Tolstikhin and Yevgeny Seldin. PAC-Bayes-empirical-Bernstein inequality. Advances in Neural Information Processing Systems, 26, 2013.