
Paper Introduction: "ResNeSt: Split-Attention Networks"


Introduction slides for "ResNeSt: Split-Attention Networks", used in Rist's internal Kaggle workshop.
arXiv: https://arxiv.org/pdf/2004.08955.pdf
Github: https://github.com/zhanghang1989/ResNeSt

Inoichan

May 12, 2020

Transcript

  1. Rist Kaggler workshop
    ResNeSt paper introduction
    Inoue


  2. ResNeSt: Split-Attention Networks
    arXiv: https://arxiv.org/pdf/2004.08955.pdf
    Github: https://github.com/zhanghang1989/ResNeSt


  3. Squeeze-and-Excitation Networks
    arXiv: https://arxiv.org/abs/1709.01507
    Github: https://github.com/hujie-frank/SENet


  4. Squeeze-and-Excitation Networks
    arXiv: https://arxiv.org/abs/1709.01507
    Github: https://github.com/hujie-frank/SENet
    Re-weights each channel.
    = Attention


  5. Squeeze-and-Excitation Networks
    Integration is very simple: just add an SE-Module.

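As a rough illustration of the "just add an SE-Module" point, here is a minimal PyTorch sketch of a squeeze-and-excitation block (a simplification written for this note; the class and variable names are not taken from the slides or the linked implementations):

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Minimal squeeze-and-excitation block: GAP -> FC (reduce) -> ReLU -> FC (expand) -> Sigmoid -> scale."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                    # squeeze: global average pooling
        self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1)

    def forward(self, x):
        w = self.pool(x)                                       # (B, C, 1, 1)
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(w))))   # per-channel weights in [0, 1]
        return x * w                                           # channel-wise re-weighting (= attention)

# Usage: append to the output of any block, e.g. inside a residual block:
#   out = se_module(conv_block(x)) + shortcut(x)
```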

  6. Squeeze-and-Excitation Networks
    Reference: Cadene's GitHub, pretrained-models.pytorch:
    https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/senet.py


  7. Squeeze-and-Excitation Networks
    Reference: Cadene's GitHub, pretrained-models.pytorch:
    https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/senet.py


  8. ResNeSt structure
    The SE-Module is simply appended after an arbitrary block such as a
    Residual block, whereas in ResNeSt the conv2 of the ResNet Bottleneck
    is replaced by a Split-Attention module.
    https://github.com/zhanghang1989/ResNeSt/blob/master/resnest/torch/resnet.py#L106

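To make the contrast concrete, here is a hedged, heavily simplified sketch of a ResNeSt-style bottleneck whose 3x3 conv2 is a Split-Attention convolution. The `SplAtConv2d` import path is assumed from the repository layout linked above; stride handling, downsampling, average pooling and other details of the official block are omitted:

```python
import torch.nn as nn
from resnest.torch.splat import SplAtConv2d  # assumed import path, matching the linked repo layout

class SimplifiedBottleneck(nn.Module):
    """1x1 conv -> SplAtConv2d (3x3) -> 1x1 conv; in plain ResNet the middle layer is an ordinary 3x3 conv."""
    def __init__(self, inplanes, planes, radix=2, cardinality=1, expansion=4):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = SplAtConv2d(planes, planes, kernel_size=3, padding=1, bias=False,
                                 groups=cardinality, radix=radix, norm_layer=nn.BatchNorm2d)
        self.conv3 = nn.Conv2d(planes, planes * expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * expansion)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.conv2(out)            # Split-Attention replaces the plain 3x3 convolution
        out = self.bn3(self.conv3(out))
        # shortcut / downsample omitted for brevity
        return self.relu(out)
```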

  9. Selective Kernel Networks
    arXiv: https://arxiv.org/abs/1903.06586
    Github: https://github.com/implus/SKNet


  10. What is ResNeSt?
    arXiv: https://arxiv.org/pdf/2004.08955.pdf
    Github: https://github.com/zhanghang1989/ResNeSt
    An explanation of what is inside.


  11. Roughly speaking...
    Within each cardinal group, the features are further split by the
    radix, and attention is computed over the splits.
    Split Attention


  12. Split Attention
    (SplAtConv2d diagram: the Conv2D output of shape H x W x (C x R) is split into
    R (= radix) groups within each cardinal group; the splits are combined by
    element-wise summation, then GAP -> FC -> BN -> ReLU -> FC -> rSoftmax produces
    the attention, and the attention-weighted sum brings the features back to C
    channels; the cardinal-group outputs are then concatenated.)
    SplAtConv2d:
    https://github.com/zhanghang1989/ResNeSt/blob/60e61bab401760b473c9a0aecb420e292b018d35/resnest/torch/splat.py#L11
    When R=1, rSoftmax becomes a Sigmoid (equivalent to the SE-Module).
    Split Attention is applied once per cardinal group, and the results are concatenated.


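To make the diagram above concrete, here is a hedged, simplified re-implementation of the forward pass it describes (written for this note; the official SplAtConv2d linked above additionally supports DropBlock, rectified convolutions, and other options):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttentionConv2d(nn.Module):
    """Simplified Split-Attention convolution: a grouped conv producing C*R feature maps,
    followed by per-cardinal-group attention over the R (radix) splits."""
    def __init__(self, in_channels, channels, kernel_size=3, padding=1,
                 groups=1, radix=2, reduction_factor=4):
        super().__init__()
        self.radix, self.cardinality, self.channels = radix, groups, channels
        inter_channels = max(channels * radix // reduction_factor, 32)
        # grouped convolution produces radix * channels feature maps (ResNeXt-style, widened by radix)
        self.conv = nn.Conv2d(in_channels, channels * radix, kernel_size, padding=padding,
                              groups=groups * radix, bias=False)
        self.bn0 = nn.BatchNorm2d(channels * radix)
        self.fc1 = nn.Conv2d(channels, inter_channels, kernel_size=1, groups=groups)
        self.bn1 = nn.BatchNorm2d(inter_channels)
        self.fc2 = nn.Conv2d(inter_channels, channels * radix, kernel_size=1, groups=groups)

    def forward(self, x):
        x = F.relu(self.bn0(self.conv(x)))                    # (B, C*R, H, W)
        b = x.size(0)
        splits = torch.split(x, self.channels, dim=1)         # R tensors of shape (B, C, H, W)
        gap = F.adaptive_avg_pool2d(sum(splits), 1)           # element-wise sum, then GAP: (B, C, 1, 1)
        gap = F.relu(self.bn1(self.fc1(gap)))                 # FC -> BN -> ReLU (channel reduction)
        atten = self.fc2(gap)                                 # back to (B, C*R, 1, 1)
        # rSoftmax: softmax over the radix axis within each cardinal group; sigmoid when radix == 1
        if self.radix > 1:
            atten = atten.view(b, self.cardinality, self.radix, -1).transpose(1, 2)
            atten = F.softmax(atten, dim=1).reshape(b, -1, 1, 1)
        else:
            atten = torch.sigmoid(atten)
        attens = torch.split(atten, self.channels, dim=1)
        return sum(a * s for a, s in zip(attens, splits))     # weighted sum: back to (B, C, H, W)

# e.g. SplitAttentionConv2d(64, 64, groups=1, radix=2)(torch.randn(2, 64, 56, 56)).shape -> (2, 64, 56, 56)
```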

  13. Split Attention


  14. Split Attention

    The number of groups is the number of cardinal groups (cardinality).
    This is the same as in ResNeXt.
    The official models use groups=1, radix=2.

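As a small hedged sketch of this point (the shapes are example values chosen here, not from the slides): with the cardinality given by `groups` and the extra `radix` factor, the first convolution is just a grouped convolution with `groups * radix` groups, as in ResNeXt but widened by the radix:

```python
import torch
import torch.nn as nn

cardinality, radix, channels = 1, 2, 64     # official models: groups=1, radix=2
conv = nn.Conv2d(channels, channels * radix, kernel_size=3, padding=1,
                 groups=cardinality * radix, bias=False)   # ResNeXt-style grouped conv, widened by radix
out = conv(torch.randn(2, channels, 56, 56))
print(out.shape)  # torch.Size([2, 128, 56, 56]) -> C * R feature maps
```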

  15. Split Attention
    The feature maps are split by the radix and summed element-wise
    per channel.

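Illustratively (again with example shapes chosen here), the "split by radix and sum" step looks like this:

```python
import torch

radix, channels = 2, 64
x = torch.randn(2, channels * radix, 56, 56)      # output of the grouped conv: (B, C*R, H, W)
splits = torch.split(x, channels, dim=1)          # radix tensors of shape (B, C, H, W)
gap_input = sum(splits)                           # element-wise sum over the radix splits: (B, C, H, W)
print(gap_input.shape)                            # torch.Size([2, 64, 56, 56])
```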

  16. Split Attention
    fc1 first reduces the dimensionality, and then fc2 restores it to
    channel x radix.

    The size becomes (Channel, 1).

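A hedged sketch of the fc1 / fc2 step described above; the reduction factor and shapes are example values, not taken from the slide's screenshot:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

radix, channels, cardinality, reduction_factor = 2, 64, 1, 4
inter_channels = max(channels * radix // reduction_factor, 32)

fc1 = nn.Conv2d(channels, inter_channels, kernel_size=1, groups=cardinality)          # reduce dimensionality
fc2 = nn.Conv2d(inter_channels, channels * radix, kernel_size=1, groups=cardinality)  # restore to channel x radix

gap = F.adaptive_avg_pool2d(torch.randn(2, channels, 56, 56), 1)   # global pooling: one value per channel, (B, C, 1, 1)
atten = fc2(F.relu(fc1(gap)))                                      # (B, C*R, 1, 1)
print(gap.shape, atten.shape)  # torch.Size([2, 64, 1, 1]) torch.Size([2, 128, 1, 1])
```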

  17. Split Attention


  18. rSoftmax
    Softmax along the channel direction (normalizing across the radix splits).

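A hedged functional sketch of the rSoftmax idea (a re-implementation for this note, not the official code): for radix > 1 the weights for each channel are normalized across its radix splits, and for radix = 1 it degenerates to a sigmoid, as noted on the diagram slide:

```python
import torch
import torch.nn.functional as F

def r_softmax(atten, radix, cardinality):
    """atten: (B, C*R, 1, 1) attention logits -> normalized weights of the same shape."""
    b = atten.size(0)
    if radix > 1:
        atten = atten.view(b, cardinality, radix, -1).transpose(1, 2)
        atten = F.softmax(atten, dim=1)           # normalize across the radix splits
        return atten.reshape(b, -1, 1, 1)
    return torch.sigmoid(atten)                   # radix == 1: same as the SE-Module

weights = r_softmax(torch.randn(2, 128, 1, 1), radix=2, cardinality=1)
print(weights.shape)  # torch.Size([2, 128, 1, 1])
```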

  19. Split Attention

    Apply the attention weights and sum. This brings the dimensionality
    back from channel x radix to channel.

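Finally, the attention-weighted sum, sketched with example shapes chosen here:

```python
import torch

radix, channels = 2, 64
splits = [torch.randn(2, channels, 56, 56) for _ in range(radix)]   # the radix feature splits
atten = torch.rand(2, channels * radix, 1, 1)                       # output of rSoftmax
attens = torch.split(atten, channels, dim=1)                        # one (B, C, 1, 1) weight per split
out = sum(a * s for a, s in zip(attens, splits))                    # weighted sum: back to (B, C, H, W)
print(out.shape)  # torch.Size([2, 64, 56, 56])
```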

  20. Compared with SENet and SKNet...
    Relation to Existing Attention Methods. First introduced in SE-Net [29], the idea of
    squeeze-and-attention (called excitation in the original paper) is to employ a global context to
    predict channel-wise attention factors. With radix = 1, our Split-Attention block is applying a
    squeeze-and-attention operation to each cardinal group, while the SE-Net operates on top of the
    entire block regardless of multiple groups. Previous models like SK-Net [38] introduced feature
    attention between two network branches, but their operation is not optimized for training
    efficiency and scaling to large neural networks. Our method generalizes prior work on
    feature-map attention [29, 38] within a cardinal group setting [60], and its implementation
    remains computationally efficient. Figure 1 shows an overall comparison with SE-Net and
    SK-Net blocks.
    (Quoted from the paper.)
