
Paper Review: On the Importance of Gradients for Detecting Distributional Shifts in the Wild

Masanari Kimura
November 05, 2022

Transcript

  1. Intro · Gradient-based OOD Detection · Experiments · Analysis of Gradient-based Method · Discussion · References
    Paper Review: On the Importance of Gradients for Detecting Distributional Shifts in the Wild
    Masanari Kimura
    SOKENDAI (The Graduate University for Advanced Studies), Department of Statistical Science, Hino Laboratory

  2. Intro

  3. Introduction

  4. TL;DR
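    ▶ Huang et al. [2021];
    ▶ Proposes GradNorm, which detects out-of-distribution (OOD) inputs using information from the gradient space;
    ▶ GradNorm uses the vector norm of the gradient backpropagated from the KL divergence between the softmax output and the uniform probability distribution;
    ▶ Key observation: gradient norms tend to be larger for in-distribution (ID) data than for OOD data;
    ▶ GitHub: https://github.com/deeplearning-wisc/gradnorm_ood.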

  5. Preliminaries
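    ▶ Input space: $\mathcal{X} = \mathbb{R}^d$;
    ▶ Output space: $\mathcal{Y} = \{1, 2, \ldots, C\}$;
    ▶ Training data drawn from an unknown distribution $P$: $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$;
    ▶ Neural network: $f(x; \theta): \mathcal{X} \to \mathbb{R}^C$;
    ▶ Empirical risk: $R_{\mathcal{L}}(f) = \mathbb{E}_{\mathcal{D}}\left[\mathcal{L}_{\mathrm{CE}}(f(x; \theta), y)\right]$;
    ▶ Cross-entropy loss with temperature $T$:
    $$\mathcal{L}_{\mathrm{CE}}(f(x), y) = -\log \frac{e^{f_y(x)/T}}{\sum_{c=1}^{C} e^{f_c(x)/T}}.$$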
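As a minimal sketch of the definitions above, the following pure-Python code computes the temperature-scaled softmax, the cross-entropy $\mathcal{L}_{\mathrm{CE}}$, and the empirical risk as a plain average over a finite batch. The function names are hypothetical, labels are assumed 0-based rather than $1, \ldots, C$, and logits are assumed to be plain lists of floats.

```python
import math
from typing import List, Sequence, Tuple


def softmax_t(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax: exp(f_c / T) / sum_j exp(f_j / T)."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_t(logits: Sequence[float], y: int, T: float = 1.0) -> float:
    """L_CE(f(x), y) = -log softmax_T(f(x))_y, computed with log-sum-exp.

    The label y is 0-based (0..C-1), whereas the slide indexes labels 1..C.
    """
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return log_z - scaled[y]


def empirical_risk(batch: Sequence[Tuple[Sequence[float], int]], T: float = 1.0) -> float:
    """R_L(f): mean cross-entropy over a finite sample of (logits, label) pairs."""
    return sum(cross_entropy_t(f, y, T) for f, y in batch) / len(batch)


# Hypothetical usage: a batch of two 3-class logit vectors with labels 0 and 2.
batch = [([2.0, 0.5, -1.0], 0), ([0.1, 0.2, 1.5], 2)]
print(empirical_risk(batch, T=1.0))
```

Computing the loss through log-sum-exp rather than taking `log` of `softmax_t` avoids underflow when one class dominates.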

  6. Problem statement
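    ▶ Out-of-distribution (OOD) detection can be framed as a binary classification problem;
    ▶ The goal is to obtain a classifier $g(x)$ of the following form, where $\gamma$ is a threshold:
    $$g(x) = \begin{cases} \text{in}, & \text{if } S(x) \ge \gamma, \\ \text{out}, & \text{if } S(x) < \gamma, \end{cases} \quad (1)$$
    ▶ The question is how to construct the scoring function $S(x)$:
    ▶ Many existing methods focus on the model's outputs or features;
    ▶ This work instead explores information from the gradient space.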
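A minimal sketch of the detector in Eq. (1), assuming the score is computed from a vector of logits. The maximum softmax probability (MSP) used here is a swapped-in example of the output-based scores the slide says existing methods rely on; it is not GradNorm, and the threshold $\gamma = 0.5$ in the usage line is arbitrary. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def g(score: float, gamma: float) -> str:
    """Eq. (1): label the input 'in' if S(x) >= gamma, otherwise 'out'."""
    return "in" if score >= gamma else "out"


def msp_score(logits: Sequence[float]) -> float:
    """Example output-based S(x): the maximum softmax probability (MSP baseline)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def classify_batch(batch_logits: Sequence[Sequence[float]],
                   score_fn: Callable[[Sequence[float]], float],
                   gamma: float) -> List[str]:
    """Apply the thresholded detector g to every logit vector in a batch."""
    return [g(score_fn(f), gamma) for f in batch_logits]


# A confident prediction and a near-uniform one, with an arbitrary gamma = 0.5.
print(classify_batch([[5.0, 0.0, 0.0], [0.1, 0.0, 0.0]], msp_score, gamma=0.5))  # ['in', 'out']
```

Because `score_fn` is a parameter, a gradient-based score such as GradNorm can be plugged into the same thresholding logic without changing `g`.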

  7. Gradient-based OOD Detection

  8. Gradient-based OOD Detection
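    ▶ Compute the gradient by backpropagating the KL divergence between the uniform distribution $u$ and the softmax output:
    $$D_{\mathrm{KL}}[u \,\|\, \mathrm{softmax}(f(x))] = -\frac{1}{C}\sum_{c=1}^{C} \log \frac{e^{f_c(x)/T}}{\sum_{j=1}^{C} e^{f_j(x)/T}} - H(u), \quad (2)$$
    ▶ The first term is the cross-entropy between $u$ and the softmax output (the average of $\mathcal{L}_{\mathrm{CE}}$ over all $C$ labels);
    ▶ The second term is a constant, $H(u) = \log C$.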
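To check Eq. (2) numerically, this sketch computes $D_{\mathrm{KL}}[u \,\|\, \mathrm{softmax}(f(x))]$ twice: once from the definition of the KL divergence and once as the label-averaged cross-entropy minus $H(u) = \log C$. It is a hypothetical pure-Python helper operating on a single list of logits, not code from the paper's repository.

```python
import math
from typing import List, Sequence


def log_softmax_t(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """log softmax with temperature T, computed via log-sum-exp."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_z for z in scaled]


def kl_uniform_direct(logits: Sequence[float], T: float = 1.0) -> float:
    """Definition: D_KL[u || p] = sum_c u_c * (log u_c - log p_c) with u_c = 1/C."""
    C = len(logits)
    return sum((math.log(1.0 / C) - lp) / C for lp in log_softmax_t(logits, T))


def kl_uniform_eq2(logits: Sequence[float], T: float = 1.0) -> float:
    """Eq. (2): cross-entropy averaged over all C labels, minus H(u) = log C."""
    C = len(logits)
    avg_ce = -sum(log_softmax_t(logits, T)) / C  # first term
    return avg_ce - math.log(C)                  # second term is the constant H(u)


f = [1.0, -0.5, 2.0]  # hypothetical logits for C = 3
print(kl_uniform_direct(f, T=1.0), kl_uniform_eq2(f, T=1.0))
```

The two functions agree up to floating-point rounding, and both return zero when all logits are equal, since the softmax is then exactly uniform.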

  9. GradNorm as OOD score
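    ▶ For an element $w$ of the parameters $\theta$, the gradient of the KL divergence is
    $$\frac{\partial D_{\mathrm{KL}}[u \,\|\, \mathrm{softmax}(f(x))]}{\partial w} = \frac{1}{C}\sum_{i=1}^{C} \frac{\partial \mathcal{L}_{\mathrm{CE}}(f(x), i)}{\partial w}. \quad (3)$$
    ▶ That is, the gradient of the KL divergence equals the average, over all labels, of the derivatives of the categorical cross-entropy.
    ▶ Using this, the OOD score (GradNorm) is defined as
    $$S(x) = \left\| \frac{\partial D_{\mathrm{KL}}(u \,\|\, \mathrm{softmax}(f(x)))}{\partial w} \right\|_p, \quad (4)$$
    where $\|\cdot\|_p$ is the $L_p$ norm.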
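Eqs. (3) and (4) can be sketched as follows for the case where $w$ ranges over the weights of a final linear layer, which is the setting analyzed in this deck's "GradNorm captures joint information" slide (where the norm is L1, i.e. $p = 1$). The helper names, the bias term, and the explicit averaging of per-label gradients are illustrative assumptions; this is not the authors' implementation from the linked repository.

```python
import math
from typing import List, Sequence


def softmax_t(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def grad_ce_logits(logits: Sequence[float], label: int, T: float = 1.0) -> List[float]:
    """Gradient of L_CE(f, label) w.r.t. the logits: (softmax_j - [j == label]) / T."""
    probs = softmax_t(logits, T)
    return [(pj - (1.0 if j == label else 0.0)) / T for j, pj in enumerate(probs)]


def grad_kl_logits(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Eq. (3) at the logits: the mean of the C per-label cross-entropy gradients."""
    C = len(logits)
    per_label = [grad_ce_logits(logits, i, T) for i in range(C)]
    return [sum(g[j] for g in per_label) / C for j in range(C)]


def gradnorm(x: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float],
             T: float = 1.0, p: float = 1.0) -> float:
    """Eq. (4): S(x) = || dD_KL / dW ||_p for a last layer f_j = sum_i x_i W[i][j] + b_j.

    W is m x C, so by the chain rule dD_KL / dW[i][j] = x_i * dD_KL / df_j.
    """
    m, C = len(x), len(b)
    logits = [sum(x[i] * W[i][j] for i in range(m)) + b[j] for j in range(C)]
    g = grad_kl_logits(logits, T)
    entries = [abs(x[i] * g[j]) for i in range(m) for j in range(C)]
    return sum(e ** p for e in entries) ** (1.0 / p)


# Hypothetical 2-dim feature and a 3-class last layer.
x = [1.0, -2.0]
W = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
b = [0.0, 0.0, 0.0]
print(gradnorm(x, W, b, T=1.0, p=1.0))
```

Averaging the per-label gradients only makes Eq. (3) explicit; it equals the closed form $(\mathrm{softmax}_j - 1/C)/T$, so a practical version would compute that directly.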

  10. Experiments

  11. Experiments: setup
    ▶ Training dataset: ImageNet
    ▶ Test datasets for OOD evaluation:
    ▶ iNaturalist [Van Horn et al., 2018]
    ▶ SUN [Xiao et al., 2010]
    ▶ Places [Zhou et al., 2017]
    ▶ Textures [Cimpoi et al., 2014]
    ▶ Models:
    ▶ Google BiT-S with ResNetv2-101 [He et al., 2016]
    ▶ Google BiT-S with DenseNet-121 [Huang et al., 2017]

  12. Experiments I

  13. Experiments II

  14. Analysis of Gradient-based Method

  15. GradNorm captures joint information between feature and output
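    $$D_{\mathrm{KL}}[u \,\|\, \mathrm{softmax}(f(x))] = -\frac{1}{C}\sum_{c=1}^{C} \log \frac{e^{f_c/T}}{\sum_{j=1}^{C} e^{f_j/T}} - H(u) = -\frac{1}{C}\left(\frac{1}{T}\sum_{c=1}^{C} f_c - C \cdot \log \sum_{j=1}^{C} e^{f_j/T}\right) - H(u)$$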
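    (w.r.t. output)
    $$\frac{\partial D_{\mathrm{KL}}}{\partial f_c} = -\frac{1}{CT}\left(1 - CT \cdot \frac{\partial \log \sum_{j=1}^{C} e^{f_j/T}}{\partial f_c}\right) = -\frac{1}{CT}\left(1 - C \cdot \frac{e^{f_c/T}}{\sum_{j=1}^{C} e^{f_j/T}}\right)$$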
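The closed-form output gradient above can be checked against central finite differences. The sketch below is hypothetical (function names, the example logits, and the step size `h` are assumptions) and works on a single logit vector; it also exhibits that the gradient entries sum to zero, because the softmax probabilities sum to one.

```python
import math
from typing import List, Sequence


def kl_uniform(logits: Sequence[float], T: float = 1.0) -> float:
    """D_KL[u || softmax(f / T)] = -(1/C) * sum_c log softmax_c - log C."""
    C = len(logits)
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return -sum(z - log_z for z in scaled) / C - math.log(C)


def grad_closed_form(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """dD_KL / df_c = -(1 / (C T)) * (1 - C * softmax_c), as derived on the slide."""
    C = len(logits)
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [-(1.0 - C * e / total) / (C * T) for e in exps]


def grad_finite_diff(logits: Sequence[float], T: float = 1.0, h: float = 1e-5) -> List[float]:
    """Central-difference approximation, used only to check the closed form."""
    grads = []
    for c in range(len(logits)):
        up = list(logits)
        up[c] += h
        down = list(logits)
        down[c] -= h
        grads.append((kl_uniform(up, T) - kl_uniform(down, T)) / (2.0 * h))
    return grads


f = [1.5, -0.3, 0.8, 2.2]  # hypothetical logits for C = 4 classes
for T in (1.0, 2.0):
    err = max(abs(a - b) for a, b in zip(grad_closed_form(f, T), grad_finite_diff(f, T)))
    print(f"T={T}: max |closed form - finite difference| = {err:.2e}")
```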
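    (w.r.t. weight, for the last layer $f = W^{\top}x$ with $W \in \mathbb{R}^{m \times C}$ and $x \in \mathbb{R}^{m}$)
    $$\frac{\partial D_{\mathrm{KL}}}{\partial W} = x \left(\frac{\partial D_{\mathrm{KL}}}{\partial f}\right)^{\top}, \quad \text{i.e.} \quad \left(\frac{\partial D_{\mathrm{KL}}}{\partial W}\right)_{ij} = x_i \, \frac{\partial D_{\mathrm{KL}}}{\partial f_j}$$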
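    $$S(x) = \sum_{i=1}^{m}\sum_{j=1}^{C}\left|\left(\frac{\partial D_{\mathrm{KL}}}{\partial W}\right)_{ij}\right| = \frac{1}{CT}\sum_{i=1}^{m}|x_i| \sum_{j=1}^{C}\left|1 - C \cdot \frac{e^{f_j/T}}{\sum_{k=1}^{C} e^{f_k/T}}\right| = \frac{1}{CT}\underbrace{\left(\sum_{i=1}^{m}|x_i|\right)}_{\text{input-space information}}\underbrace{\left(\sum_{j=1}^{C}\left|1 - C \cdot \frac{e^{f_j/T}}{\sum_{k=1}^{C} e^{f_k/T}}\right|\right)}_{\text{output-space information}}$$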
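The factorization can be verified numerically: the L1 norm of the outer-product gradient $x\,(\partial D_{\mathrm{KL}}/\partial f)^{\top}$ should equal $\frac{1}{CT}$ times the product of the feature term and the output term. This sketch assumes a bias-free last layer $f = W^{\top}x$ stored as an $m \times C$ nested list; the names `u` and `v` for the two factors, and the example values, are chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def softmax_t(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def last_layer_logits(x: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """Bias-free last layer f = W^T x, with W stored as an m x C nested list."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def gradnorm_direct(x: Sequence[float], W: Sequence[Sequence[float]], T: float = 1.0) -> float:
    """S(x) = sum_{i,j} |(dD_KL/dW)_ij| with (dD_KL/dW)_ij = x_i * dD_KL/df_j."""
    C = len(W[0])
    probs = softmax_t(last_layer_logits(x, W), T)
    d_df = [-(1.0 - C * pj) / (C * T) for pj in probs]
    return sum(abs(xi * dj) for xi in x for dj in d_df)


def gradnorm_factored(x: Sequence[float], W: Sequence[Sequence[float]],
                      T: float = 1.0) -> Tuple[float, float, float]:
    """The same score as (1/(C T)) * u * v, returned as (score, u, v)."""
    C = len(W[0])
    probs = softmax_t(last_layer_logits(x, W), T)
    u = sum(abs(xi) for xi in x)                 # feature (input-space) term
    v = sum(abs(1.0 - C * pj) for pj in probs)   # output-space term
    return u * v / (C * T), u, v


x = [0.5, -1.2, 2.0]  # hypothetical penultimate features (m = 3)
W = [[0.3, -0.2, 0.1, 0.0], [0.0, 0.4, -0.5, 0.2], [0.1, 0.1, 0.3, -0.3]]  # C = 4
score, u, v = gradnorm_factored(x, W)
print(gradnorm_direct(x, W), score, u, v)
```

Note that `v` vanishes when the softmax output is exactly uniform, so the score is then zero regardless of the feature magnitude `u`.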

  16. Discussion

  17. Discussion
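    ▶ Proposed a method that detects OOD inputs using gradient information;
    ▶ Suggested that the derivative of the KL divergence carries both input-side and output-side information.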

  18. References I
    Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3606–3613, 2014.
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.
    Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
    Rui Huang, Andrew Geng, and Yixuan Li. On the importance of gradients for detecting distributional shifts in the wild. Advances in Neural Information Processing Systems, 34:677–689, 2021.

  19. References II
    Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8769–8778, 2018.
    Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492. IEEE, 2010.
    Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017.