
Paper Introduction: "ResNeSt: Split-Attention Networks"


Introduction slides for "ResNeSt: Split-Attention Networks", used in Rist's internal Kaggle workshop.
arXiv: https://arxiv.org/pdf/2004.08955.pdf
Github: https://github.com/zhanghang1989/ResNeSt

Inoichan

May 12, 2020

Transcript

  1. Rist Kaggler workshop
    ResNeSt paper introduction
    Inoue


  2. ResNeSt: Split-Attention Networks
    arXiv: https://arxiv.org/pdf/2004.08955.pdf
    Github: https://github.com/zhanghang1989/ResNeSt


  3. Squeeze-and-Excitation Networks
    arXiv: https://arxiv.org/abs/1709.01507
    Github: https://github.com/hujie-frank/SENet


  4. Squeeze-and-Excitation Networks
    arXiv: https://arxiv.org/abs/1709.01507
    Github: https://github.com/hujie-frank/SENet
    Re-weights each channel.
    = Attention


  5. Squeeze-and-Excitation Networks
    Integration is very simple: just add an SE-Module.

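As a rough illustration of the "just add an SE-Module" point, here is a minimal PyTorch sketch of a squeeze-and-excitation block (a simplification written for this note; the class and variable names are not taken from the slides or the linked implementations):

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Minimal squeeze-and-excitation block: GAP -> FC (reduce) -> ReLU -> FC (expand) -> Sigmoid -> scale."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                    # squeeze: global average pooling
        self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1)

    def forward(self, x):
        w = self.pool(x)                                       # (B, C, 1, 1)
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(w))))   # per-channel weights in [0, 1]
        return x * w                                           # channel-wise re-weighting (= attention)

# Usage: append to the output of any block, e.g. inside a residual block:
#   out = se_module(conv_block(x)) + shortcut(x)
```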

  6. Squeeze-and-Excitation Networks
    Reference: Cadene's GitHub, pretrained-models.pytorch:
    https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/senet.py


  7. Squeeze-and-Excitation Networks
    Reference: Cadene's GitHub, pretrained-models.pytorch:
    https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/senet.py


  8. ResNeSt structure
    The SE-Module is simply appended after an arbitrary block such as a
    Residual block, whereas in ResNeSt the conv2 of the ResNet Bottleneck
    is replaced by a Split-Attention module.
    https://github.com/zhanghang1989/ResNeSt/blob/master/resnest/torch/resnet.py#L106

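To make the contrast concrete, here is a hedged, heavily simplified sketch of a ResNeSt-style bottleneck whose 3x3 conv2 is a Split-Attention convolution. The `SplAtConv2d` import path is assumed from the repository layout linked above; stride handling, downsampling, average pooling and other details of the official block are omitted:

```python
import torch.nn as nn
from resnest.torch.splat import SplAtConv2d  # assumed import path, matching the linked repo layout

class SimplifiedBottleneck(nn.Module):
    """1x1 conv -> SplAtConv2d (3x3) -> 1x1 conv; in plain ResNet the middle layer is an ordinary 3x3 conv."""
    def __init__(self, inplanes, planes, radix=2, cardinality=1, expansion=4):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = SplAtConv2d(planes, planes, kernel_size=3, padding=1, bias=False,
                                 groups=cardinality, radix=radix, norm_layer=nn.BatchNorm2d)
        self.conv3 = nn.Conv2d(planes, planes * expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * expansion)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.conv2(out)            # Split-Attention replaces the plain 3x3 convolution
        out = self.bn3(self.conv3(out))
        # shortcut / downsample omitted for brevity
        return self.relu(out)
```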

  9. Selective Kernel Networks
    arXiv: https://arxiv.org/abs/1903.06586
    Github: https://github.com/implus/SKNet


  10. What is ResNeSt?
    arXiv: https://arxiv.org/pdf/2004.08955.pdf
    Github: https://github.com/zhanghang1989/ResNeSt
    An explanation of what is inside.


  11. Roughly speaking...
    Within each cardinal group, the features are further split by the
    radix, and attention is computed over the splits.
    Split Attention


  12. Split Attention
    (SplAtConv2d diagram: the Conv2D output of shape H x W x (C x R) is split into
    R (= radix) groups within each cardinal group; the splits are combined by
    element-wise summation, then GAP -> FC -> BN -> ReLU -> FC -> rSoftmax produces
    the attention, and the attention-weighted sum brings the features back to C
    channels; the cardinal-group outputs are then concatenated.)
    SplAtConv2d:
    https://github.com/zhanghang1989/ResNeSt/blob/60e61bab401760b473c9a0aecb420e292b018d35/resnest/torch/splat.py#L11
    When R=1, rSoftmax becomes a Sigmoid (equivalent to the SE-Module).
    Split Attention is applied once per cardinal group, and the results are concatenated.


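To make the diagram above concrete, here is a hedged, simplified re-implementation of the forward pass it describes (written for this note; the official SplAtConv2d linked above additionally supports DropBlock, rectified convolutions, and other options):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttentionConv2d(nn.Module):
    """Simplified Split-Attention convolution: a grouped conv producing C*R feature maps,
    followed by per-cardinal-group attention over the R (radix) splits."""
    def __init__(self, in_channels, channels, kernel_size=3, padding=1,
                 groups=1, radix=2, reduction_factor=4):
        super().__init__()
        self.radix, self.cardinality, self.channels = radix, groups, channels
        inter_channels = max(channels * radix // reduction_factor, 32)
        # grouped convolution produces radix * channels feature maps (ResNeXt-style, widened by radix)
        self.conv = nn.Conv2d(in_channels, channels * radix, kernel_size, padding=padding,
                              groups=groups * radix, bias=False)
        self.bn0 = nn.BatchNorm2d(channels * radix)
        self.fc1 = nn.Conv2d(channels, inter_channels, kernel_size=1, groups=groups)
        self.bn1 = nn.BatchNorm2d(inter_channels)
        self.fc2 = nn.Conv2d(inter_channels, channels * radix, kernel_size=1, groups=groups)

    def forward(self, x):
        x = F.relu(self.bn0(self.conv(x)))                    # (B, C*R, H, W)
        b = x.size(0)
        splits = torch.split(x, self.channels, dim=1)         # R tensors of shape (B, C, H, W)
        gap = F.adaptive_avg_pool2d(sum(splits), 1)           # element-wise sum, then GAP: (B, C, 1, 1)
        gap = F.relu(self.bn1(self.fc1(gap)))                 # FC -> BN -> ReLU (channel reduction)
        atten = self.fc2(gap)                                 # back to (B, C*R, 1, 1)
        # rSoftmax: softmax over the radix axis within each cardinal group; sigmoid when radix == 1
        if self.radix > 1:
            atten = atten.view(b, self.cardinality, self.radix, -1).transpose(1, 2)
            atten = F.softmax(atten, dim=1).reshape(b, -1, 1, 1)
        else:
            atten = torch.sigmoid(atten)
        attens = torch.split(atten, self.channels, dim=1)
        return sum(a * s for a, s in zip(attens, splits))     # weighted sum: back to (B, C, H, W)

# e.g. SplitAttentionConv2d(64, 64, groups=1, radix=2)(torch.randn(2, 64, 56, 56)).shape -> (2, 64, 56, 56)
```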

  13. Split Attention


  14. Split Attention

    The number of groups is the number of cardinal groups (cardinality).
    This is the same as in ResNeXt.
    The official models use groups=1, radix=2.

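As a small hedged sketch of this point (the shapes are example values chosen here, not from the slides): with the cardinality given by `groups` and the extra `radix` factor, the first convolution is just a grouped convolution with `groups * radix` groups, as in ResNeXt but widened by the radix:

```python
import torch
import torch.nn as nn

cardinality, radix, channels = 1, 2, 64     # official models: groups=1, radix=2
conv = nn.Conv2d(channels, channels * radix, kernel_size=3, padding=1,
                 groups=cardinality * radix, bias=False)   # ResNeXt-style grouped conv, widened by radix
out = conv(torch.randn(2, channels, 56, 56))
print(out.shape)  # torch.Size([2, 128, 56, 56]) -> C * R feature maps
```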

  15. Split Attention
    The feature maps are split by the radix and summed element-wise
    per channel.

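Illustratively (again with example shapes chosen here), the "split by radix and sum" step looks like this:

```python
import torch

radix, channels = 2, 64
x = torch.randn(2, channels * radix, 56, 56)      # output of the grouped conv: (B, C*R, H, W)
splits = torch.split(x, channels, dim=1)          # radix tensors of shape (B, C, H, W)
gap_input = sum(splits)                           # element-wise sum over the radix splits: (B, C, H, W)
print(gap_input.shape)                            # torch.Size([2, 64, 56, 56])
```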

  16. Split Attention
    fc1 first reduces the dimensionality, and then fc2 restores it to
    channel x radix.

    The size becomes (Channel, 1).

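A hedged sketch of the fc1 / fc2 step described above; the reduction factor and shapes are example values, not taken from the slide's screenshot:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

radix, channels, cardinality, reduction_factor = 2, 64, 1, 4
inter_channels = max(channels * radix // reduction_factor, 32)

fc1 = nn.Conv2d(channels, inter_channels, kernel_size=1, groups=cardinality)          # reduce dimensionality
fc2 = nn.Conv2d(inter_channels, channels * radix, kernel_size=1, groups=cardinality)  # restore to channel x radix

gap = F.adaptive_avg_pool2d(torch.randn(2, channels, 56, 56), 1)   # global pooling: one value per channel, (B, C, 1, 1)
atten = fc2(F.relu(fc1(gap)))                                      # (B, C*R, 1, 1)
print(gap.shape, atten.shape)  # torch.Size([2, 64, 1, 1]) torch.Size([2, 128, 1, 1])
```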

  17. Split Attention


  18. rSoftmax
    Softmax along the channel direction (normalizing across the radix splits).

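A hedged functional sketch of the rSoftmax idea (a re-implementation for this note, not the official code): for radix > 1 the weights for each channel are normalized across its radix splits, and for radix = 1 it degenerates to a sigmoid, as noted on the diagram slide:

```python
import torch
import torch.nn.functional as F

def r_softmax(atten, radix, cardinality):
    """atten: (B, C*R, 1, 1) attention logits -> normalized weights of the same shape."""
    b = atten.size(0)
    if radix > 1:
        atten = atten.view(b, cardinality, radix, -1).transpose(1, 2)
        atten = F.softmax(atten, dim=1)           # normalize across the radix splits
        return atten.reshape(b, -1, 1, 1)
    return torch.sigmoid(atten)                   # radix == 1: same as the SE-Module

weights = r_softmax(torch.randn(2, 128, 1, 1), radix=2, cardinality=1)
print(weights.shape)  # torch.Size([2, 128, 1, 1])
```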

  19. Split Attention

    Apply the attention weights and sum. This brings the dimensionality
    back from channel x radix to channel.

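Finally, the attention-weighted sum, sketched with example shapes chosen here:

```python
import torch

radix, channels = 2, 64
splits = [torch.randn(2, channels, 56, 56) for _ in range(radix)]   # the radix feature splits
atten = torch.rand(2, channels * radix, 1, 1)                       # output of rSoftmax
attens = torch.split(atten, channels, dim=1)                        # one (B, C, 1, 1) weight per split
out = sum(a * s for a, s in zip(attens, splits))                    # weighted sum: back to (B, C, H, W)
print(out.shape)  # torch.Size([2, 64, 56, 56])
```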

  20. Compared with SENet and SKNet...
    Relation to Existing Attention Methods. First introduced in SE-Net [29], the idea of
    squeeze-and-attention (called excitation in the original paper) is to employ a global context to
    predict channel-wise attention factors. With radix = 1, our Split-Attention block is applying a
    squeeze-and-attention operation to each cardinal group, while the SE-Net operates on top of the
    entire block regardless of multiple groups. Previous models like SK-Net [38] introduced feature
    attention between two network branches, but their operation is not optimized for training
    efficiency and scaling to large neural networks. Our method generalizes prior work on
    feature-map attention [29, 38] within a cardinal group setting [60], and its implementation
    remains computationally efficient. Figure 1 shows an overall comparison with SE-Net and
    SK-Net blocks.
    (Quoted from the paper.)
