Masanari Kimura
February 19, 2022

# Paper introduction: User-friendly introduction to PAC-Bayes

## Transcript

1. ### Paper introduction: User-friendly introduction to PAC-Bayes

Masanari Kimura (SOKENDAI, Department of Statistical Science, Hino Laboratory), mkimura@ism.ac.jp

Outline: Intro · PAC-Bayes bounds · Tight and non-vacuous PAC-Bayes bounds · PAC-Bayes oracle inequalities and fast rates · Conclusion · References
2. ### Intro
3. ### Introduction
4. ### TL;DR

▶ An introduction to PAC-Bayes, a useful tool for analyzing generalization error.
5. ### Basic notations

▶ Set of predictors: $\{f_\theta ; \theta \in \Theta\}$
▶ Generalization error: $R(\theta) := \mathbb{E}_{(X,Y)\sim P}\left[\ell(f_\theta(X), Y)\right]$
▶ Empirical error: $r(\theta) := \frac{1}{n}\sum_{i=1}^{n} \ell(f_\theta(X_i), Y_i)$
▶ Estimator: $\hat\theta : \bigcup_{n=1}^{\infty} (\mathcal{X}\times\mathcal{Y})^n \to \Theta$
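As a concrete reading of the empirical error $r(\theta)$, the following minimal sketch evaluates it for a toy family of threshold classifiers under the 0-1 loss. The predictor `predict`, the sample, and the grid of thresholds are hypothetical choices made only for illustration; they do not come from the slides.

```python
def zero_one_loss(prediction, label):
    # 0-1 loss: 1 if the prediction is wrong, 0 otherwise.
    return 0.0 if prediction == label else 1.0


def predict(theta, x):
    # Hypothetical predictor f_theta: a threshold classifier on a scalar input.
    return 1 if x >= theta else 0


def empirical_risk(theta, sample):
    # r(theta) = (1/n) * sum_i loss(f_theta(X_i), Y_i)
    return sum(zero_one_loss(predict(theta, x), y) for x, y in sample) / len(sample)


# Toy sample of (X_i, Y_i) pairs, invented for this example.
sample = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.3, 1)]
thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
risks = {theta: empirical_risk(theta, sample) for theta in thetas}
print(risks)
```

The generalization error $R(\theta)$ is the same average taken over the unknown distribution $P$ instead of the sample, which is why it cannot be computed directly.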
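6. ### PAC bound

Theorem (PAC bound). Assume that the loss is bounded, $0 \le \ell \le C$, and that $\mathrm{card}(\Theta) = M < +\infty$. Then, for any $\epsilon > 0$,

$$\mathbb{P}\left(\forall \theta \in \Theta,\ R(\theta) \le r(\theta) + C\sqrt{\frac{\log\frac{M}{\epsilon}}{2n}}\right) \ge 1 - \epsilon. \tag{1}$$

Consequently, the estimator obtained by ERM, $\hat\theta_{\mathrm{ERM}} := \arg\min_{\theta\in\Theta} r(\theta)$, satisfies

$$\mathbb{P}\left(R(\hat\theta_{\mathrm{ERM}}) \le \inf_{\theta\in\Theta}\left[r(\theta) + C\sqrt{\frac{\log\frac{M}{\epsilon}}{2n}}\right]\right) \ge 1 - \epsilon. \tag{2}$$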
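To get a feel for the size of the complexity term $C\sqrt{\log(M/\epsilon)/(2n)}$ in the PAC bound, here is a small sketch that evaluates it. The function name `pac_gap` and the example values of $M$, $n$ and $\epsilon$ are assumptions chosen for illustration.

```python
import math


def pac_gap(num_predictors, n, eps, loss_range=1.0):
    # Complexity term of the finite-class PAC bound: C * sqrt(log(M / eps) / (2n)).
    return loss_range * math.sqrt(math.log(num_predictors / eps) / (2 * n))


# Illustrative values: M = 1000 predictors, n = 10,000 samples, eps = 0.05.
gap = pac_gap(num_predictors=1000, n=10_000, eps=0.05)
print(f"R(theta) <= r(theta) + {gap:.4f} for all theta, with probability >= 0.95")
```

The term grows only logarithmically in $M$ and shrinks as $1/\sqrt{n}$, so a moderately large finite class is cheap as long as $\log M \ll n$.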
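7. ### Proof

Fix $\theta \in \Theta$. By Hoeffding's inequality, for any $t > 0$,

$$\mathbb{E}\left[e^{tn(R(\theta)-r(\theta))}\right] \le e^{\frac{nt^2C^2}{8}}. \tag{3}$$

Hence, for any $s > 0$,

$$\begin{aligned}
\mathbb{P}(R(\theta) - r(\theta) > s) &= \mathbb{P}\left(e^{nt(R(\theta)-r(\theta))} > e^{nts}\right) \\
&\le \frac{\mathbb{E}\left[e^{nt(R(\theta)-r(\theta))}\right]}{e^{nts}} && (\because \text{Markov's inequality}) \\
&\le e^{\frac{nt^2C^2}{8} - nts} && (\because \text{Eq. (3)}).
\end{aligned}$$

The exponent $nt^2C^2/8 - nts$ is minimized at $t = 4s/C^2$, which gives

$$\mathbb{P}(R(\theta) > r(\theta) + s) \le e^{-\frac{2ns^2}{C^2}}. \tag{4}$$

Assuming $\mathrm{card}(\Theta) = M < +\infty$, the union bound gives

$$\begin{aligned}
\mathbb{P}\left(\sup_{\theta\in\Theta}\,(R(\theta) - r(\theta)) > s\right) &= \mathbb{P}\left(\bigcup_{\theta\in\Theta}\{R(\theta) - r(\theta) > s\}\right) \\
&\le \sum_{\theta\in\Theta}\mathbb{P}(R(\theta) > r(\theta) + s) && \text{(5)} \\
&\le M e^{-\frac{2ns^2}{C^2}}. && \text{(6)}
\end{aligned}$$

Setting $\epsilon = M e^{-2ns^2/C^2}$, i.e. $s = C\sqrt{\log(M/\epsilon)/(2n)}$, yields Eq. (1).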
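The optimization step in the proof (choosing $t = 4s/C^2$) can be checked numerically. The sketch below compares the closed-form minimizer of the exponent $nt^2C^2/8 - nts$ with a brute-force grid search; the values of `n`, `C` and `s` and the grid are arbitrary illustrative choices.

```python
def chernoff_exponent(t, s, n, C):
    # Exponent n * t^2 * C^2 / 8 - n * t * s from the Markov/Hoeffding step.
    return n * t**2 * C**2 / 8 - n * t * s


n, C, s = 100, 1.0, 0.1

# Closed-form minimizer and the resulting exponent -2 n s^2 / C^2.
t_star = 4 * s / C**2
closed_form = -2 * n * s**2 / C**2

# Brute-force search over t in (0, 2] with step 0.001.
grid = [k / 1000 for k in range(1, 2001)]
t_grid = min(grid, key=lambda t: chernoff_exponent(t, s, n, C))

print(t_star, t_grid)
print(chernoff_exponent(t_star, s, n, C), closed_form)
```

Both approaches should agree up to the grid resolution, confirming that the optimized tail bound is $e^{-2ns^2/C^2}$.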
8. ### PAC-Bayes bounds
9. ### Additional notations

▶ Data-dependent probability measure: $\hat\rho : \bigcup_{n=1}^{\infty} (\mathcal{X}\times\mathcal{Y})^n \to \mathcal{P}(\Theta)$
▶ Randomized estimator: $\tilde\theta \sim \hat\rho$
▶ Aggregated predictor: $f_{\hat\rho}(\cdot) = \mathbb{E}_{\theta\sim\hat\rho}[f_\theta(\cdot)]$
▶ Prior: $\pi \in \mathcal{P}(\Theta)$
10. ### Catoni's bound

Theorem (Catoni's bound). For any $\lambda > 0$ and $\epsilon \in (0, 1)$,

$$\mathbb{P}\left(\forall \rho \in \mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon}}{\lambda}\right) \ge 1 - \epsilon. \tag{7}$$

Lemma (Donsker and Varadhan's variational formula [Donsker and Varadhan, 1976]). For any measurable, bounded function $h : \Theta \to \mathbb{R}$,

$$\log \mathbb{E}_{\theta\sim\pi}\left[e^{h(\theta)}\right] = \sup_{\rho\in\mathcal{P}(\Theta)}\left[\mathbb{E}_{\theta\sim\rho}[h(\theta)] - \mathrm{KL}[\rho\|\pi]\right]. \tag{8}$$
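The Donsker and Varadhan formula can be verified numerically on a finite $\Theta$. In the sketch below the prior `pi` and the function values `h` are made-up numbers; the supremum is attained at $\rho^*(\theta) \propto \pi(\theta)e^{h(\theta)}$, and any other $\rho$ gives a smaller value.

```python
import math


def kl_divergence(rho, pi):
    # KL[rho || pi] on a finite set, with the convention 0 * log 0 = 0.
    return sum(r * math.log(r / p) for r, p in zip(rho, pi) if r > 0)


def dv_objective(rho, pi, h):
    # E_{theta ~ rho}[h(theta)] - KL[rho || pi]
    return sum(r * hv for r, hv in zip(rho, h)) - kl_divergence(rho, pi)


# Illustrative prior and bounded function on a three-point parameter set.
pi = [0.5, 0.3, 0.2]
h = [1.0, -0.5, 2.0]

normalizer = sum(p * math.exp(hv) for p, hv in zip(pi, h))
log_mgf = math.log(normalizer)  # left-hand side of the formula
rho_star = [p * math.exp(hv) / normalizer for p, hv in zip(pi, h)]  # maximizer

print(log_mgf, dv_objective(rho_star, pi, h))
print(dv_objective(pi, pi, h), dv_objective([1.0, 0.0, 0.0], pi, h))
```

The maximizer has the same exponential-tilting form as the Gibbs posterior, which is why that distribution appears when the bound is minimized over $\rho$.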
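11. ### Proof

Let $S$ denote the sample. Then

$$\begin{aligned}
\mathbb{E}_S\left[e^{tn(R(\theta)-r(\theta))}\right] &\le e^{\frac{nt^2C^2}{8}} && (\because \text{Hoeffding's inequality}) \\
\mathbb{E}_S\left[e^{\lambda(R(\theta)-r(\theta))}\right] &\le e^{\frac{\lambda^2C^2}{8n}} && (\because t = \lambda/n) \\
\mathbb{E}_{\theta\sim\pi}\mathbb{E}_S\left[e^{\lambda(R(\theta)-r(\theta))}\right] &\le e^{\frac{\lambda^2C^2}{8n}} \\
\mathbb{E}_S\mathbb{E}_{\theta\sim\pi}\left[e^{\lambda(R(\theta)-r(\theta))}\right] &\le e^{\frac{\lambda^2C^2}{8n}} && (\because \text{Fubini's theorem}) \\
\mathbb{E}_S\left[e^{\sup_{\rho\in\mathcal{P}(\Theta)}\left\{\lambda\mathbb{E}_{\theta\sim\rho}[R(\theta)-r(\theta)] - \mathrm{KL}[\rho\|\pi]\right\}}\right] &\le e^{\frac{\lambda^2C^2}{8n}} && (\because \text{[Donsker and Varadhan, 1976]})
\end{aligned}$$

Applying Markov's inequality to the last line, for any $s > 0$,

$$\mathbb{P}_S\left[\sup_{\rho\in\mathcal{P}(\Theta)}\left\{\lambda\mathbb{E}_{\theta\sim\rho}[R(\theta)-r(\theta)] - \mathrm{KL}[\rho\|\pi]\right\} - \frac{\lambda^2C^2}{8n} > s\right] \le e^{-s},$$

and setting $e^{-s} = \epsilon$,

$$\mathbb{P}_S\left[\sup_{\rho\in\mathcal{P}(\Theta)}\left\{\lambda\mathbb{E}_{\theta\sim\rho}[R(\theta)-r(\theta)] - \mathrm{KL}[\rho\|\pi]\right\} - \frac{\lambda^2C^2}{8n} > \log\frac{1}{\epsilon}\right] \le \epsilon.$$

Dividing by $\lambda$ and rearranging gives Eq. (7).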
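Once the empirical quantities are known, the right-hand side of Catoni's bound is a simple closed-form expression. The sketch below evaluates it; the function name `catoni_bound` and all numeric inputs are illustrative assumptions, and $\lambda$ is taken as $\sqrt{n}$ only as one data-independent choice (the theorem requires $\lambda$ to be fixed before seeing the sample).

```python
import math


def catoni_bound(emp_risk, kl, n, lam, eps, C=1.0):
    # E_rho[r] + lam * C^2 / (8n) + (KL[rho || pi] + log(1 / eps)) / lam
    return emp_risk + lam * C**2 / (8 * n) + (kl + math.log(1 / eps)) / lam


# Illustrative inputs: posterior-averaged empirical risk 0.1, KL = 5 nats.
n = 10_000
lam = math.sqrt(n)  # assumed data-independent choice of lambda
bound = catoni_bound(emp_risk=0.1, kl=5.0, n=n, lam=lam, eps=0.05)
print(f"E_rho[R] <= {bound:.4f} with probability >= 0.95")
```

The two $\lambda$-dependent terms pull in opposite directions: a large $\lambda$ inflates $\lambda C^2/(8n)$, while a small $\lambda$ inflates the KL term.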
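12. ### Gibbs posterior

Definition (Gibbs posterior).

$$\hat\rho_\lambda(d\theta) = \frac{e^{-\lambda r(\theta)}\pi(d\theta)}{\mathbb{E}_{\eta\sim\pi}\left[e^{-\lambda r(\eta)}\right]}. \tag{9}$$

Corollary. The Gibbs posterior minimizes the right-hand side of Catoni's bound (7):

$$\hat\rho_\lambda = \arg\min_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\}. \tag{10}$$

Hence, for any $\epsilon \in (0, 1)$,

$$\mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le \inf_{\rho\in\mathcal{P}(\Theta)}\left[\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon}}{\lambda}\right]\right) \ge 1 - \epsilon. \tag{11}$$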
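The minimizing property of the Gibbs posterior can be illustrated on a finite parameter set. The sketch below builds $\hat\rho_\lambda$ from made-up empirical risks and a uniform prior, then compares the objective $\mathbb{E}_\rho[r] + \mathrm{KL}[\rho\|\pi]/\lambda$ against randomly drawn distributions. All numeric values and the random candidates are assumptions for illustration.

```python
import math
import random


def kl_divergence(rho, prior):
    # KL[rho || prior] on a finite set, with the convention 0 * log 0 = 0.
    return sum(q * math.log(q / p) for q, p in zip(rho, prior) if q > 0)


def gibbs_posterior(emp_risks, prior, lam):
    # rho_hat(theta) proportional to prior(theta) * exp(-lam * r(theta)).
    weights = [p * math.exp(-lam * r) for r, p in zip(emp_risks, prior)]
    total = sum(weights)
    return [w / total for w in weights]


def objective(rho, emp_risks, prior, lam):
    # E_rho[r] + KL[rho || prior] / lam, the rho-dependent part of Catoni's bound.
    expected_risk = sum(q * r for q, r in zip(rho, emp_risks))
    return expected_risk + kl_divergence(rho, prior) / lam


emp_risks = [0.30, 0.10, 0.25, 0.40]  # illustrative empirical risks
prior = [0.25, 0.25, 0.25, 0.25]
lam = 20.0

rho_hat = gibbs_posterior(emp_risks, prior, lam)
best = objective(rho_hat, emp_risks, prior, lam)

# Random candidate distributions for comparison (fixed seed for repeatability).
rng = random.Random(0)
candidates = []
for _ in range(200):
    raw = [rng.random() for _ in emp_risks]
    candidates.append([v / sum(raw) for v in raw])

print(rho_hat, best)
print(min(objective(c, emp_risks, prior, lam) for c in candidates))
```

The minimum value has the closed form $-\frac{1}{\lambda}\log\mathbb{E}_{\eta\sim\pi}[e^{-\lambda r(\eta)}]$, which follows from the Donsker and Varadhan formula with $h = -\lambda r$.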
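13. ### Example: Finite case

Assume $\mathrm{card}(\Theta) = M < +\infty$. The Gibbs posterior is then

$$\hat\rho_\lambda(\theta) = \frac{e^{-\lambda r(\theta)}\pi(\theta)}{\sum_{\eta\in\Theta} e^{-\lambda r(\eta)}\pi(\eta)}, \tag{12}$$

and, with probability at least $1 - \epsilon$,

$$\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le \inf_{\rho\in\mathcal{P}(\Theta)}\left[\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon}}{\lambda}\right]. \tag{13}$$

Since the infimum runs over all $\rho \in \mathcal{P}(\Theta)$, we may restrict it to the set of Dirac measures $\{\delta_\theta ; \theta \in \Theta\}$, for which clearly

$$\mathbb{E}_{\theta'\sim\delta_\theta}[r(\theta')] = r(\theta), \qquad \mathrm{KL}[\delta_\theta\|\pi] = \sum_{\eta\in\Theta}\log\left(\frac{\delta_\theta(\eta)}{\pi(\eta)}\right)\delta_\theta(\eta) = \log\frac{1}{\pi(\theta)}. \tag{14}$$

The bound therefore becomes

$$\mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le \inf_{\theta\in\Theta}\left[r(\theta) + \frac{\lambda C^2}{8n} + \frac{\log\frac{1}{\pi(\theta)} + \log\frac{1}{\epsilon}}{\lambda}\right]\right) \ge 1 - \epsilon. \tag{15}$$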
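To connect the finite-case bound back to the PAC bound, the sketch below evaluates it under a uniform prior $\pi(\theta) = 1/M$ with $\lambda = \sqrt{8n\log(M/\epsilon)}/C$. The uniform prior, this choice of $\lambda$ (which depends only on $M$, $n$, $\epsilon$ and $C$, not on the data) and the numeric values are assumptions for illustration; with them the complexity term collapses to $C\sqrt{\log(M/\epsilon)/(2n)}$.

```python
import math


def finite_case_bound(emp_risks, prior, n, lam, eps, C=1.0):
    # inf over theta of r(theta) + lam C^2/(8n) + (log(1/pi(theta)) + log(1/eps)) / lam
    return min(
        r + lam * C**2 / (8 * n) + (math.log(1 / p) + math.log(1 / eps)) / lam
        for r, p in zip(emp_risks, prior)
    )


emp_risks = [0.30, 0.10, 0.25, 0.40]  # illustrative empirical risks
M = len(emp_risks)
prior = [1 / M] * M  # assumed uniform prior
n, eps, C = 1000, 0.05, 1.0

# Data-independent lambda that balances the two lambda-dependent terms.
lam = math.sqrt(8 * n * math.log(M / eps)) / C

bound = finite_case_bound(emp_risks, prior, n, lam, eps, C)
pac_style = min(emp_risks) + C * math.sqrt(math.log(M / eps) / (2 * n))
print(bound, pac_style)
```

A non-uniform prior trades this uniform penalty for a smaller $\log(1/\pi(\theta))$ on parameters believed to be good in advance.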
14. ### Tight and non-vacuous PAC-Bayes bounds
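15. ### Why is there a race to the tighter PAC-Bayes bound?

Consider the simple binary neural network given by

$$f_w(x) = \mathbb{1}\left[\sum_{i=1}^{M} w_i^{(2)}\,\varphi\left(\sum_{j=1}^{d} w_{j,i}^{(1)} x_j\right) \ge 0\right]. \tag{16}$$

For example, using the 0-1 loss, suppose that

▶ the input is a $100 \times 100$ grayscale image,
▶ the sample size is $n = 10{,}000$,
▶ the network has $M = 100$ units.

Then, with $\epsilon = 0.05$,

$$\mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le 1\cdot\sqrt{\frac{\log\frac{2^{1{,}000{,}100}}{0.05}}{2\times 10{,}000}}\right) = \mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le 13.58\right) \ge 0.95. \tag{17}$$

That is, the risk under the Gibbs posterior is at most 13.58 with probability at least 95%, which is a meaningless statement ($\because R(\theta) \le 1$).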
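The vacuousness of this bound can be checked with a few lines of arithmetic. The sketch below assumes every weight is binary, so that $\mathrm{card}(\Theta) = 2^{d \cdot M + M}$, uses natural logarithms, and evaluates only the square-root complexity term. The exact figure it prints depends on these assumptions and need not coincide with the value quoted on the slide; the point being checked is only that the term exceeds 1.

```python
import math

d = 100 * 100      # input dimension: 100 x 100 grayscale image
num_units = 100    # hidden units
n = 10_000         # sample size
eps = 0.05
C = 1.0            # the 0-1 loss is bounded by 1

# Assumption: each weight is binary, so card(Theta) = 2 ** num_weights.
num_weights = d * num_units + num_units

# Work with log(card) directly; 2 ** 1_000_100 does not fit in a float.
log_card = num_weights * math.log(2)

gap = C * math.sqrt((log_card + math.log(1 / eps)) / (2 * n))
print(num_weights, gap, gap > 1.0)
```

Because the risk of a 0-1 loss never exceeds 1, any complexity term above 1 carries no information, which motivates the tighter bounds that follow.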
16. ### A few PAC-Bayes bounds

▶ McAllester's bound [McAllester, 1999]
▶ Catoni's bound (another one) [Catoni, 2007]
▶ Maurer's bound [Maurer, 2004]
▶ Seeger's bound [Seeger, 2002]
▶ Tolstikhin and Seldin's bound [Tolstikhin and Seldin, 2013]
▶ Thiemann, Igel, Wintenberger and Seldin's bound [Thiemann et al., 2017]
▶ Germain, Lacasse, Laviolette and Marchand's bound [Germain et al., 2009]
17. ### McAllester's bound [McAllester, 1999]

Theorem (McAllester's bound [McAllester, 1999]). For any $\epsilon > 0$,

$$\mathbb{P}_S\left[\forall \rho \in \mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \mathbb{E}_{\theta\sim\rho}[r(\theta)] + \sqrt{\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon} + \frac{5}{2}\log(n) + 8}{2n - 1}}\right] \ge 1 - \epsilon. \tag{18}$$

▶ Unlike Catoni's bound, no parameter $\lambda$ appears, so the right-hand side cannot be tuned by minimizing over $\lambda$.
▶ A common trick is to use the inequality $\sqrt{ab} \le a\lambda/2 + b/(2\lambda)$ to deliberately introduce a parameter to optimize.
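As a quick numerical illustration of McAllester's bound, the sketch below evaluates the form $\mathbb{E}_\rho[r] + \sqrt{(\mathrm{KL} + \log(1/\epsilon) + \tfrac{5}{2}\log n + 8)/(2n-1)}$. The function name and the input values are illustrative assumptions.

```python
import math


def mcallester_bound(emp_risk, kl, n, eps):
    # E_rho[r] + sqrt((KL + log(1/eps) + 2.5 * log(n) + 8) / (2n - 1))
    numerator = kl + math.log(1 / eps) + 2.5 * math.log(n) + 8
    return emp_risk + math.sqrt(numerator / (2 * n - 1))


# Illustrative inputs: posterior-averaged empirical risk 0.1, KL = 5 nats.
bound = mcallester_bound(emp_risk=0.1, kl=5.0, n=10_000, eps=0.05)
print(f"E_rho[R] <= {bound:.4f} with probability >= 0.95")
```

No $\lambda$ has to be chosen here, at the cost of the additional $\tfrac{5}{2}\log n + 8$ inside the square root.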
18. ### Catoni's bound (another one) [Catoni, 2007]

Theorem (Catoni's bound (another one) [Catoni, 2007]). For $\alpha > 0$ and $p \in (0, 1)$, define

$$\Phi_\alpha(p) = \frac{-\log\{1 - p(1 - e^{-\alpha})\}}{\alpha}. \tag{19}$$

Then, for any $\lambda > 0$ and $\epsilon \in (0, 1)$,

$$\mathbb{P}_S\left[\forall \rho \in \mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \Phi^{-1}_{\frac{\lambda}{n}}\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{1}{\epsilon}}{\lambda}\right)\right] \ge 1 - \epsilon. \tag{20}$$
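The function $\Phi_\alpha$ has an explicit inverse, $\Phi_\alpha^{-1}(q) = (1 - e^{-\alpha q})/(1 - e^{-\alpha})$, obtained by solving $\Phi_\alpha(p) = q$ for $p$. The sketch below implements both and evaluates the bound for a loss in $[0, 1]$; the choice $\lambda = n$ and the numeric inputs are illustrative assumptions, and the final clipping at 1 only uses the trivial fact that such a risk cannot exceed 1.

```python
import math


def phi(alpha, p):
    # Phi_alpha(p) = -log(1 - p * (1 - exp(-alpha))) / alpha
    return -math.log(1 - p * (1 - math.exp(-alpha))) / alpha


def phi_inv(alpha, q):
    # Inverse of Phi_alpha: (1 - exp(-alpha * q)) / (1 - exp(-alpha))
    return (1 - math.exp(-alpha * q)) / (1 - math.exp(-alpha))


def catoni_phi_bound(emp_risk, kl, n, lam, eps):
    # Phi^{-1}_{lam/n}(E_rho[r] + (KL + log(1/eps)) / lam), clipped at 1.
    argument = emp_risk + (kl + math.log(1 / eps)) / lam
    return min(1.0, phi_inv(lam / n, argument))


n = 10_000
bound = catoni_phi_bound(emp_risk=0.1, kl=5.0, n=n, lam=float(n), eps=0.05)
print(phi_inv(0.5, phi(0.5, 0.3)), bound)
```

As with the first Catoni bound, $\lambda$ has to be fixed independently of the sample for the stated probability guarantee to apply.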
19. ### Maurer's bound [Maurer, 2004]

Theorem (Maurer's bound [Maurer, 2004]). Using the Bernoulli distribution $\mathcal{B}(p)$, define

$$\mathrm{kl}(p, q) := \mathrm{KL}[\mathcal{B}(p)\|\mathcal{B}(q)] = p\log\frac{p}{q} + (1 - p)\log\frac{1 - p}{1 - q}. \tag{21}$$

Then, for any $\epsilon > 0$,

$$\mathbb{P}_S\left[\forall \rho \in \mathcal{P}(\Theta),\ \mathrm{kl}\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)],\ \mathbb{E}_{\theta\sim\rho}[R(\theta)]\right) \le \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}\right] \ge 1 - \epsilon. \tag{22}$$
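The two ingredients of Maurer's bound, the binary relative entropy $\mathrm{kl}(p, q)$ and the right-hand side $(\mathrm{KL}[\rho\|\pi] + \log(2\sqrt{n}/\epsilon))/n$, are easy to compute. The sketch below does so with the usual convention $0\log 0 = 0$; function names and numeric inputs are illustrative assumptions.

```python
import math


def binary_kl(p, q):
    # kl(p, q) = KL[Bernoulli(p) || Bernoulli(q)], with 0 * log 0 = 0.
    def term(a, b):
        if a == 0:
            return 0.0
        if b == 0:
            return math.inf
        return a * math.log(a / b)

    return term(p, q) + term(1 - p, 1 - q)


def maurer_rhs(kl, n, eps):
    # (KL[rho || pi] + log(2 * sqrt(n) / eps)) / n
    return (kl + math.log(2 * math.sqrt(n) / eps)) / n


budget = maurer_rhs(kl=5.0, n=10_000, eps=0.05)
print(binary_kl(0.1, 0.2), budget)
```

The bound constrains the pair (empirical risk, true risk) through $\mathrm{kl}$, so turning it into an explicit upper bound on the risk requires inverting $\mathrm{kl}$ in its second argument.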
20. ### Seeger's bound [Seeger, 2002]

Theorem (Seeger's bound [Seeger, 2002]). Define

$$\mathrm{kl}^{-1}(q, b) = \sup\{p \in [0, 1] ;\ \mathrm{kl}(q, p) \le b\}. \tag{23}$$

Then, from Maurer's bound [Maurer, 2004], for any $\epsilon > 0$,

$$\mathbb{P}_S\left[\forall \rho \in \mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \mathrm{kl}^{-1}\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)],\ \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}\right)\right] \ge 1 - \epsilon. \tag{24}$$
21. ### Tolstikhin and Seldin's bound [Tolstikhin and Seldin, 2013]

Theorem (Tolstikhin and Seldin's bound [Tolstikhin and Seldin, 2013]). Using the inequality

$$\mathrm{kl}^{-1}(q, b) \le q + \sqrt{2qb} + 2b, \tag{25}$$

we obtain, for any $\epsilon > 0$,

$$\mathbb{P}_S\left[\forall \rho \in \mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \mathbb{E}_{\theta\sim\rho}[r(\theta)] + \sqrt{2\,\mathbb{E}_{\theta\sim\rho}[r(\theta)]\,\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}} + 2\,\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n}\right] \ge 1 - \epsilon. \tag{26}$$

▶ In general, an empirical PAC-Bayes bound is of order $1/\sqrt{n}$.
▶ When $\mathbb{E}_{\theta\sim\rho}[r(\theta)] = 0$, the $1/\sqrt{n}$ term vanishes and only the $1/n$ term remains (in the noiseless case the order is $1/n$).
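The fast-rate behavior can be seen by evaluating the relaxed bound $q + \sqrt{2qb} + 2b$ directly, taking $b = (\mathrm{KL}[\rho\|\pi] + \log(2\sqrt{n}/\epsilon))/n$ as in Seeger's bound. The function name and numeric inputs in the sketch below are illustrative assumptions.

```python
import math


def tolstikhin_seldin_bound(emp_risk, kl, n, eps):
    # q + sqrt(2 q b) + 2 b with b = (KL + log(2 sqrt(n) / eps)) / n
    b = (kl + math.log(2 * math.sqrt(n) / eps)) / n
    return emp_risk + math.sqrt(2 * emp_risk * b) + 2 * b


n, eps, kl = 10_000, 0.05, 5.0
noisy = tolstikhin_seldin_bound(0.1, kl, n, eps)      # positive empirical risk
noiseless = tolstikhin_seldin_bound(0.0, kl, n, eps)  # zero empirical risk
print(noisy - 0.1, noiseless)
```

With zero empirical risk the square-root term disappears and the excess over the empirical risk is $2b$, which decays like $\log(n)/n$ rather than $1/\sqrt{n}$.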
22. ### Thiemann, Igel, Wintenberger and Seldin's bound [Thiemann et al., 2017]

Theorem (Thiemann, Igel, Wintenberger and Seldin's bound [Thiemann et al., 2017]). Using the inequality

$$\sqrt{ab} \le \frac{\lambda a}{2} + \frac{b}{2\lambda}, \tag{27}$$

it follows from Seeger's bound [Seeger, 2002] that, for any $\epsilon > 0$ and $\lambda \in (0, 2)$,

$$\mathbb{P}_S\left[\forall \rho \in \mathcal{P}(\Theta),\ \mathbb{E}_{\theta\sim\rho}[R(\theta)] \le \frac{\mathbb{E}_{\theta\sim\rho}[r(\theta)]}{1 - \frac{\lambda}{2}} + \frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2\sqrt{n}}{\epsilon}}{n\lambda\left(1 - \frac{\lambda}{2}\right)}\right] \ge 1 - \epsilon. \tag{28}$$
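Because this bound is obtained from Seeger's bound by a deterministic inequality, $\lambda$ acts as a free tuning parameter in $(0, 2)$. The sketch below evaluates the right-hand side and minimizes it over a grid of $\lambda$ values; the grid search, function name and numeric inputs are illustrative assumptions rather than the procedure used in the original paper.

```python
import math


def thiemann_bound(emp_risk, kl, n, eps, lam):
    # E_rho[r] / (1 - lam/2) + (KL + log(2 sqrt(n) / eps)) / (n lam (1 - lam/2))
    if not 0 < lam < 2:
        raise ValueError("lam must lie in the open interval (0, 2)")
    complexity = kl + math.log(2 * math.sqrt(n) / eps)
    return emp_risk / (1 - lam / 2) + complexity / (n * lam * (1 - lam / 2))


emp_risk, kl, n, eps = 0.1, 5.0, 10_000, 0.05

# Simple grid search over lam in (0, 2) with step 0.001.
grid = [k / 1000 for k in range(1, 2000)]
best_lam = min(grid, key=lambda lam: thiemann_bound(emp_risk, kl, n, eps, lam))
best_bound = thiemann_bound(emp_risk, kl, n, eps, best_lam)
print(best_lam, best_bound)
```

For a fixed $\rho$ the objective is smooth in $\lambda$, so a finer search or a closed-form minimizer could replace the grid if more precision is needed.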
23. ### Germain, Lacasse, Laviolette and Marchand's bound [Germain et al., 2009]

Theorem (Germain, Lacasse, Laviolette and Marchand's bound [Germain et al., 2009]). For any convex function $D : [0, 1]^2 \to \mathbb{R}$ and any $\epsilon > 0$,

$$\mathbb{P}_S\left[\forall \rho \in \mathcal{P}(\Theta),\ D\left(\mathbb{E}_{\theta\sim\rho}[r(\theta)],\ \mathbb{E}_{\theta\sim\rho}[R(\theta)]\right) \le \frac{\mathrm{KL}[\rho\|\pi] + \log\left(\frac{1}{\epsilon}\,\mathbb{E}_S\mathbb{E}_{\theta\sim\pi}\left[e^{nD(r(\theta), R(\theta))}\right]\right)}{n}\right] \ge 1 - \epsilon. \tag{29}$$
24. ### PAC-Bayes oracle inequalities and fast rates
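25. ### Oracle bound in expectation

As with empirical PAC-Bayes bounds, oracle PAC-Bayes bounds can also be derived:

$$\begin{aligned}
\mathbb{E}_S\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] &\le \mathbb{E}_S\left[\inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\}\right] \\
&\le \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_S\left[\mathbb{E}_{\theta\sim\rho}[r(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right]\right\} \\
&= \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_S\left[\mathbb{E}_{\theta\sim\rho}[r(\theta)]\right] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\} \\
&= \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}\left[\mathbb{E}_S[r(\theta)]\right] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\} && (\because \text{Fubini's theorem}) \\
&= \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[R(\theta)] + \frac{\lambda C^2}{8n} + \frac{\mathrm{KL}[\rho\|\pi]}{\lambda}\right\} && (\because \mathbb{E}_S[r(\theta)] = R(\theta)).
\end{aligned}$$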
26. ### Oracle bound in probability

Theorem. For any $\lambda > 0$ and $\epsilon \in (0, 1)$,

$$\mathbb{P}_S\left(\mathbb{E}_{\theta\sim\hat\rho_\lambda}[R(\theta)] \le \inf_{\rho\in\mathcal{P}(\Theta)}\left\{\mathbb{E}_{\theta\sim\rho}[R(\theta)] + \frac{\lambda C^2}{4n} + 2\,\frac{\mathrm{KL}[\rho\|\pi] + \log\frac{2}{\epsilon}}{\lambda}\right\}\right) \ge 1 - \epsilon. \tag{30}$$
27. ### Conclusion
28. ### Other topics

▶ Removing the log term (Catoni's localization trick)
▶ Unbounded loss functions
▶ Non-i.i.d. settings
29. ### References I

Olivier Catoni. PAC-Bayesian supervised classification: the thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248, 2007.

Monroe D. Donsker and S. R. Srinivasa Varadhan. Asymptotic evaluation of certain Markov process expectations for large time, III. Communications on Pure and Applied Mathematics, 29(4):389–461, 1976.

Pascal Germain, Alexandre Lacasse, François Laviolette, and Mario Marchand. PAC-Bayesian learning of linear classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 353–360, 2009.

Andreas Maurer. A note on the PAC Bayesian theorem. arXiv preprint cs/0411099, 2004.

David A. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355–363, 1999.

Matthias Seeger. PAC-Bayesian generalisation error bounds for Gaussian process classification. Journal of Machine Learning Research, 3(Oct):233–269, 2002.
30. ### References II

Niklas Thiemann, Christian Igel, Olivier Wintenberger, and Yevgeny Seldin. A strongly quasiconvex PAC-Bayesian bound. In International Conference on Algorithmic Learning Theory, pages 466–492. PMLR, 2017.

Ilya O. Tolstikhin and Yevgeny Seldin. PAC-Bayes-empirical-Bernstein inequality. Advances in Neural Information Processing Systems, 26, 2013.