
Paper Introduction: On the Importance of Gradients for Detecting Distributional Shifts in the Wild

Masanari Kimura
November 05, 2022



1. Paper Introduction: On the Importance of Gradients for Detecting Distributional Shifts in the Wild
Masanari Kimura
Department of Statistical Science, SOKENDAI (Hino Laboratory)
[email protected]
2. Intro
3. Introduction
4. TL;DR
▶ Huang et al. [2021];
▶ Proposes GradNorm, which detects out-of-distribution (OOD) inputs using information from the gradient space;
▶ GradNorm uses the vector norm of the gradients backpropagated from the KL divergence between the softmax output and a uniform distribution;
▶ Key observation: gradients tend to be larger in magnitude for in-distribution (ID) data than for OOD data;
▶ GitHub: https://github.com/deeplearning-wisc/gradnorm_ood.
5. Preliminaries
▶ Input space: $\mathcal{X} = \mathbb{R}^d$;
▶ Output space: $\mathcal{Y} = \{1, 2, \dots, C\}$;
▶ Training data drawn from an unknown distribution $P$: $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$;
▶ Neural network: $f(x; \theta) : \mathcal{X} \to \mathbb{R}^C$;
▶ Empirical risk: $\mathcal{R}_{\mathcal{L}}(f) = \mathbb{E}_{\mathcal{D}}\left[\mathcal{L}_{\mathrm{CE}}(f(x; \theta), y)\right]$;
▶ Cross-entropy loss with temperature $T$ (see the sketch below):
$$\mathcal{L}_{\mathrm{CE}}(f(x), y) = -\log \frac{e^{f_y(x)/T}}{\sum_{c=1}^{C} e^{f_c(x)/T}}.$$
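As a concrete illustration of the temperature-scaled cross-entropy above, here is a minimal PyTorch sketch (illustrative only; the function name and arguments are not taken from the paper's repository):

```python
import torch
import torch.nn.functional as F

def temperature_cross_entropy(logits: torch.Tensor, target: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """L_CE(f(x), y) = -log softmax(f(x)/T)[y], computed per example."""
    # log-softmax over the class dimension of the temperature-scaled logits
    log_probs = F.log_softmax(logits / T, dim=-1)
    # negative log-probability of the true class
    return -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
```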
6. Problem statement
▶ Out-of-distribution (OOD) detection can be cast as a binary classification problem;
▶ The goal is to obtain a classifier $g(x)$ (see the sketch after this slide):
$$g(x) = \begin{cases} \text{in}, & \text{if } S(x) \ge \gamma \\ \text{out}, & \text{if } S(x) < \gamma, \end{cases} \qquad (1)$$
▶ The question is how to construct the scoring function $S(x)$:
▶ Most existing methods focus on the model's outputs or features;
▶ This work instead considers using information from the gradient space.
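In code, the decision rule in Eq. (1) is just a threshold on the score. A minimal sketch, where the scoring function `score_fn` and the threshold `gamma` are placeholders (in practice γ is commonly chosen so that a large fraction, e.g. 95%, of in-distribution validation data is classified as "in"):

```python
def ood_decision(x, score_fn, gamma: float) -> str:
    """Eq. (1): classify x as in-distribution iff its score reaches the threshold gamma."""
    return "in" if score_fn(x) >= gamma else "out"
```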
7. Gradient-based OOD Detection
8. Gradient-based OOD Detection
▶ Compute gradients by backpropagating the KL divergence between the softmax output and the uniform distribution:
$$D_{\mathrm{KL}}\left[u \,\|\, \mathrm{softmax}(f(x))\right] = -\frac{1}{C} \sum_{c=1}^{C} \log \frac{e^{f_c(x)/T}}{\sum_{j=1}^{C} e^{f_j(x)/T}} - H(u), \qquad (2)$$
▶ The first term is a cross-entropy (averaged over all classes);
▶ The second term, $H(u) = \log C$, is a constant.
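A minimal PyTorch sketch of Eq. (2) (illustrative, not the authors' code). Because $H(u) = \log C$ is constant, only the averaged cross-entropy term contributes to the gradient:

```python
import math
import torch
import torch.nn.functional as F

def kl_uniform_to_softmax(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """D_KL(u || softmax(f(x)/T)) per example, as in Eq. (2)."""
    C = logits.shape[-1]
    log_probs = F.log_softmax(logits / T, dim=-1)   # log softmax_c(f(x)/T)
    avg_ce = -log_probs.mean(dim=-1)                # first term: (1/C) * sum_c -log softmax_c
    return avg_ce - math.log(C)                     # second term: -H(u) = -log C (constant)
```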
9. GradNorm as OOD score
▶ For an element $w$ of the parameters $\theta$, the gradient of the KL divergence is
$$\frac{\partial D_{\mathrm{KL}}\left[u \,\|\, \mathrm{softmax}(f(x))\right]}{\partial w} = \frac{1}{C} \sum_{i=1}^{C} \frac{\partial \mathcal{L}_{\mathrm{CE}}(f(x), i)}{\partial w}. \qquad (3)$$
▶ ⇐ i.e., the gradient of the KL divergence equals the average, over all labels, of the derivatives of the categorical cross-entropy.
▶ Using this, the OOD score (GradNorm) is defined as
$$S(x) = \left\| \frac{\partial D_{\mathrm{KL}}\left(u \,\|\, \mathrm{softmax}(f(x))\right)}{\partial w} \right\|_p \qquad (4)$$
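Combining Eqs. (2) and (4), a minimal GradNorm sketch in PyTorch: backpropagate the (constant-free) KL term and take the p-norm of the gradient of a chosen parameter block. In the paper the score is computed from the gradients of the last fully connected layer with the L1 norm; the function below keeps the layer, temperature, and norm order as arguments, and its names and structure are illustrative rather than the reference implementation.

```python
import torch
import torch.nn.functional as F

def gradnorm_score(model: torch.nn.Module, x: torch.Tensor,
                   layer: torch.nn.Linear, T: float = 1.0, p: float = 1.0) -> float:
    """GradNorm (Eq. 4): p-norm of d D_KL(u || softmax(f(x)/T)) / dw for layer.weight."""
    model.zero_grad()
    logits = model(x)                                   # f(x; theta), shape (1, C)
    # D_KL(u || softmax) up to its constant term -log C
    loss = -F.log_softmax(logits / T, dim=-1).mean()
    loss.backward()
    grad = layer.weight.grad
    return grad.abs().pow(p).sum().pow(1.0 / p).item()  # ||dD_KL/dw||_p
```

Scores are computed one input at a time, since the gradient must be taken per example; following the observation above, larger scores indicate in-distribution inputs.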
10. Experiments
11. Experiments: setup
▶ Training dataset: ImageNet
▶ Test datasets for OOD evaluation:
▶ iNaturalist [Van Horn et al., 2018]
▶ SUN [Xiao et al., 2010]
▶ Places [Zhou et al., 2017]
▶ Textures [Cimpoi et al., 2014]
▶ Models:
▶ Google BiT-S with ResNetv2-101 [He et al., 2016]
▶ Google BiT-S with DenseNet-121 [Huang et al., 2017]
12. Experiments I
13. Experiments II
14. Analysis of Gradient-based Method
15. GradNorm captures joint information between feature and output
$$D_{\mathrm{KL}}\left[u \,\|\, \mathrm{softmax}(f(x))\right] = -\frac{1}{C} \sum_{c=1}^{C} \log \frac{e^{f_c/T}}{\sum_{j=1}^{C} e^{f_j/T}} - H(u) = -\frac{1}{C} \left[ \frac{1}{T} \sum_{c=1}^{C} f_c - C \cdot \log \sum_{j=1}^{C} e^{f_j/T} \right] - H(u)$$
(w.r.t. output)
$$\frac{\partial D_{\mathrm{KL}}}{\partial f_c} = -\frac{1}{CT} \left[ 1 - CT \cdot \frac{\partial \log \sum_{j=1}^{C} e^{f_j/T}}{\partial f_c} \right] = -\frac{1}{CT} \left[ 1 - C \cdot \frac{e^{f_c/T}}{\sum_{j=1}^{C} e^{f_j/T}} \right]$$
(w.r.t. weights $W$ of the last linear layer)
$$\frac{\partial D_{\mathrm{KL}}}{\partial W} = x \, \frac{\partial D_{\mathrm{KL}}}{\partial f}$$
so the score (the $p = 1$ case of Eq. (4), taken over $W$) factorizes as
$$S(x) = \sum_{i=1}^{m} \sum_{j=1}^{C} \left| \left( \frac{\partial D_{\mathrm{KL}}}{\partial W} \right)_{ij} \right| = \frac{1}{CT} \underbrace{\sum_{i=1}^{m} |x_i|}_{\text{input-space information}} \cdot \underbrace{\sum_{j=1}^{C} \left| 1 - C \cdot \frac{e^{f_j/T}}{\sum_{k=1}^{C} e^{f_k/T}} \right|}_{\text{output-space information}}$$
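This factorization relies on the score being taken over the weights $W$ of the final linear layer, so that $\partial D_{\mathrm{KL}}/\partial W$ is an outer product of the feature vector $x$ and $\partial D_{\mathrm{KL}}/\partial f$. A small numerical sketch (variable names are illustrative, not from the paper) that checks the identity above with autograd:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
m, C, T = 5, 10, 1.0
x = torch.randn(m)                             # feature vector feeding the last layer
W = torch.randn(m, C, requires_grad=True)      # last-layer weights, f = W^T x

f = W.t() @ x                                  # logits, shape (C,)
loss = -F.log_softmax(f / T, dim=-1).mean()    # D_KL(u || softmax) + constant
loss.backward()

lhs = W.grad.abs().sum()                       # ||dD_KL/dW||_1, summed over i and j
softmax = F.softmax(f / T, dim=-1).detach()
rhs = x.abs().sum() * (1 - C * softmax).abs().sum() / (C * T)
print(torch.allclose(lhs, rhs))                # expected: True
```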
16. Discussion
17. Discussion
▶ Proposed a method that performs OOD detection using gradient information;
▶ Suggests that the derivative of the KL divergence carries both input-space and output-space information.
18. References I
Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3606–3613, 2014.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
Rui Huang, Andrew Geng, and Yixuan Li. On the importance of gradients for detecting distributional shifts in the wild. Advances in Neural Information Processing Systems, 34:677–689, 2021.
19. References II
Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8769–8778, 2018.
Jianxiong Xiao, James Hays, Krista A Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492. IEEE, 2010.
Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017.