Slide 1

Rist kaggler workshop: ResNeSt paper introduction (Inoue)

Slide 2

ResNeSt: Split-Attention Networks arXiv: https://arxiv.org/pdf/2004.08955.pdf GitHub: https://github.com/zhanghang1989/ResNeSt

Slide 3

Squeeze-and-Excitation Networks arXiv: https://arxiv.org/abs/1709.01507 GitHub: https://github.com/hujie-frank/SENet

Slide 4

Squeeze-and-Excitation Networks arXiv: https://arxiv.org/abs/1709.01507 GitHub: https://github.com/hujie-frank/SENet It weights each channel. = Attention

Slide 5

Squeeze-and-Excitation Networks Adoption is very simple: just add the SE-Module.
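For concreteness, here is a minimal PyTorch sketch of an SE module in the style of the SEModule in Cadene's senet.py (linked on the next slides); the reduction value of 16 and the 1x1-conv form of the FC layers are illustrative choices, not the exact library code.

    import torch
    import torch.nn as nn

    class SEModule(nn.Module):
        """Minimal Squeeze-and-Excitation sketch: reweight channels using global context."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.avg_pool = nn.AdaptiveAvgPool2d(1)                       # squeeze: global average pooling
            self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)
            self.relu = nn.ReLU(inplace=True)
            self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1)
            self.sigmoid = nn.Sigmoid()                                   # per-channel weights in [0, 1]

        def forward(self, x):
            w = self.avg_pool(x)               # (B, C, 1, 1)
            w = self.relu(self.fc1(w))         # excitation: bottleneck implemented as 1x1 convs
            w = self.sigmoid(self.fc2(w))
            return x * w                       # channel-wise attention

    # Usage: wrap any block's output, e.g.  out = SEModule(256)(block_output)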

Slide 6

Squeeze-and-Excitation Networks Reference: Cadene's GitHub, pretrained-models.pytorch: https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/senet.py

Slide 7

Squeeze-and-Excitation Networks Reference: Cadene's GitHub, pretrained-models.pytorch: https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/senet.py

Slide 8

Structure of ResNeSt An SE-Module is simply appended after an arbitrary block such as a Residual block, whereas in ResNeSt the conv2 of the ResNet Bottleneck is replaced by a Split-Attention module. https://github.com/zhanghang1989/ResNeSt/blob/master/resnest/torch/resnet.py#L106
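A simplified sketch (not the repo code verbatim) of the point above: compared with a plain ResNet Bottleneck, conv2 becomes a Split-Attention convolution. SplAtConv2d here refers to the module sketched on slide 12, and the stride/downsample handling of the real Bottleneck is omitted.

    import torch.nn as nn

    class BottleneckSketch(nn.Module):
        """ResNeSt-style Bottleneck sketch: conv2 is a Split-Attention conv, not a plain 3x3 conv."""
        expansion = 4

        def __init__(self, inplanes, planes, radix=2, cardinality=1):
            super().__init__()
            self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
            self.bn1 = nn.BatchNorm2d(planes)
            # Plain ResNet would use: nn.Conv2d(planes, planes, 3, padding=1, bias=False)
            self.conv2 = SplAtConv2d(planes, planes, kernel_size=3, padding=1,
                                     groups=cardinality, radix=radix)     # Split-Attention module (slide 12)
            self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
            self.bn3 = nn.BatchNorm2d(planes * self.expansion)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.conv2(out)                    # BN/ReLU happen inside SplAtConv2d
            out = self.bn3(self.conv3(out))
            return self.relu(out + x)                # identity shortcut; assumes matching shapes, downsample omitted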

Slide 9

Selective Kernel Networks arXiv: https://arxiv.org/abs/1903.06586 GitHub: https://github.com/implus/SKNet

Slide 10

What is ResNeSt? arXiv: https://arxiv.org/pdf/2004.08955.pdf GitHub: https://github.com/zhanghang1989/ResNeSt This talk explains what is inside it.

Slide 11

In a nutshell... Within each cardinal group, the features are further split by the radix count, and attention is computed over those splits. Split Attention

Slide 12

Split Attention: SplAtConv2d
[Diagram: input of size H × W × (C·R), R = radix → Conv2D ① → cardinal split → split into R radix branches ② → element-wise summation → GAP → FC → BN → ReLU → FC ③ → rSoftmax ④ → attention-weighted sum ⑤ → cardinal concat → C channels]
https://github.com/zhanghang1989/ResNeSt/blob/60e61bab401760b473c9a0aecb420e292b018d35/resnest/torch/splat.py#L11
When R = 1, rSoftmax becomes a Sigmoid (equivalent to the SE-Module). This Split Attention is applied once for each cardinal group.
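Below is a condensed PyTorch sketch of SplAtConv2d, based on the diagram above and the linked splat.py; it is not the repo code verbatim (norm-layer options, dropblock, and the full stride/dilation plumbing are omitted, and the defaults shown are illustrative). The numbered comments ①-⑤ match the steps explained on the following slides; rSoftMax is sketched on slide 18.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SplAtConv2d(nn.Module):
        """Sketch of a Split-Attention conv: radix splits per cardinal group + channel attention."""
        def __init__(self, in_channels, channels, kernel_size, stride=1, padding=0,
                     groups=1, radix=2, reduction_factor=4):
            super().__init__()
            self.radix = radix
            inter_channels = max(channels * radix // reduction_factor, 32)
            # ① one grouped conv produces all radix splits at once (groups = cardinality * radix)
            self.conv = nn.Conv2d(in_channels, channels * radix, kernel_size, stride, padding,
                                  groups=groups * radix, bias=False)
            self.bn0 = nn.BatchNorm2d(channels * radix)
            self.relu = nn.ReLU(inplace=True)
            # ③ bottleneck FC layers implemented as 1x1 convolutions
            self.fc1 = nn.Conv2d(channels, inter_channels, 1, groups=groups)
            self.bn1 = nn.BatchNorm2d(inter_channels)
            self.fc2 = nn.Conv2d(inter_channels, channels * radix, 1, groups=groups)
            self.rsoftmax = rSoftMax(radix, groups)            # ④ sketched on slide 18

        def forward(self, x):
            x = self.relu(self.bn0(self.conv(x)))              # (B, C*R, H, W)
            batch, rchannel = x.shape[:2]
            splits = torch.split(x, rchannel // self.radix, dim=1)   # ② split into R branches
            gap = sum(splits)                                  # ② element-wise summation
            gap = F.adaptive_avg_pool2d(gap, 1)                # GAP -> (B, C, 1, 1)
            gap = self.relu(self.bn1(self.fc1(gap)))           # ③ reduce ...
            atten = self.fc2(gap)                              # ③ ... then expand back to C*R
            atten = self.rsoftmax(atten).view(batch, -1, 1, 1) # ④ normalize across radix splits
            attens = torch.split(atten, rchannel // self.radix, dim=1)
            return sum(a * s for a, s in zip(attens, splits))  # ⑤ weighted sum -> (B, C, H, W)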

Slide 13

Split Attention

Slide 14

Split Attention ① The number of groups is the number of cardinal groups. This part is the same as ResNeXt. The official model uses groups=1, radix=2.
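A quick shape check of step ①, assuming (as in the sketch above) that all radix splits come out of a single grouped convolution with groups = cardinality × radix; the tensor sizes are illustrative.

    import torch
    import torch.nn as nn

    cardinality, radix, channels = 1, 2, 64          # official setting: groups=1, radix=2
    conv = nn.Conv2d(64, channels * radix, kernel_size=3, padding=1,
                     groups=cardinality * radix, bias=False)   # ① ResNeXt-style grouped conv
    x = torch.randn(8, 64, 56, 56)
    print(conv(x).shape)                             # torch.Size([8, 128, 56, 56]) = (B, C*R, H, W)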

Slide 15

Split Attention ② Split by the radix count and sum the splits channel by channel.
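Step ② in isolation, continuing the toy sizes above: split the (B, C·R, H, W) tensor into R chunks along the channel axis and sum them element-wise.

    import torch

    B, C, R, H, W = 8, 64, 2, 56, 56
    x = torch.randn(B, C * R, H, W)                  # output of the grouped conv in ①
    splits = torch.split(x, C, dim=1)                # ② R tensors of shape (B, C, H, W)
    gap_input = sum(splits)                          # ② element-wise summation over the splits
    print(len(splits), gap_input.shape)              # 2 torch.Size([8, 64, 56, 56])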

Slide 16

Split Attention ③ fc1 first reduces the dimensionality, then fc2 restores it to channel × radix. (The pooled input here has size (Channel, 1).)
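Step ③ as a shape sketch (the reduction_factor and the 1×1-conv form of fc1/fc2 follow the sketch above and are illustrative): after global average pooling the summed feature is (B, C, 1, 1); fc1 shrinks it, and fc2 expands it back to C·R.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    B, C, R, reduction_factor = 8, 64, 2, 4
    inter = max(C * R // reduction_factor, 32)       # bottleneck width (here 32)
    fc1 = nn.Conv2d(C, inter, kernel_size=1)         # ③ reduce: C -> inter
    fc2 = nn.Conv2d(inter, C * R, kernel_size=1)     # ③ expand: inter -> C*R

    gap = F.adaptive_avg_pool2d(torch.randn(B, C, 56, 56), 1)   # (B, C, 1, 1)
    atten = fc2(F.relu(fc1(gap)))
    print(gap.shape, atten.shape)                    # (B, 64, 1, 1) and (B, 128, 1, 1)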

Slide 17

Split Attention ④

Slide 18

rSoftmax A softmax over the attention vector, normalized across the radix splits of each channel (within a cardinal group).
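A sketch of rSoftmax consistent with the slide-12 diagram: the attention vector is reshaped so that the softmax runs over the radix dimension, and when radix = 1 it falls back to a sigmoid (the SE-Module case).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class rSoftMax(nn.Module):
        """Sketch of rSoftmax: normalize attention across the radix splits of each channel."""
        def __init__(self, radix, cardinality):
            super().__init__()
            self.radix, self.cardinality = radix, cardinality

        def forward(self, x):                               # x: (B, C*R, 1, 1) from fc2
            batch = x.size(0)
            if self.radix > 1:
                # group channels by cardinality, then softmax over the radix dimension
                x = x.view(batch, self.cardinality, self.radix, -1).transpose(1, 2)
                x = F.softmax(x, dim=1)
                x = x.reshape(batch, -1)                    # back to (B, C*R)
            else:
                x = torch.sigmoid(x)                        # radix=1: reduces to SE-style sigmoid
            return x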

Slide 19

Split Attention ⑤ Apply the attention weights and sum, which brings the dimensionality back from channel × radix to channel.
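Step ⑤ as a shape sketch (torch.rand stands in for the rSoftmax output): each of the R feature splits is weighted by its attention chunk and the results are summed, so the channel count returns from C·R to C.

    import torch

    B, C, R, H, W = 8, 64, 2, 56, 56
    splits = [torch.randn(B, C, H, W) for _ in range(R)]   # the R feature splits from ②
    atten = torch.rand(B, C * R).view(B, -1, 1, 1)         # stand-in for the rSoftmax output
    attens = torch.split(atten, C, dim=1)                  # R attention maps of shape (B, C, 1, 1)
    out = sum(a * s for a, s in zip(attens, splits))       # ⑤ weighted sum: back to (B, C, H, W)
    print(out.shape)                                       # torch.Size([8, 64, 56, 56])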

Slide 20

Compared with SE-Net and SK-Net... "Relation to Existing Attention Methods. First introduced in SE-Net [29], the idea of squeeze-and-attention (called excitation in the original paper) is to employ a global context to predict channel-wise attention factors. With radix = 1, our Split-Attention block is applying a squeeze-and-attention operation to each cardinal group, while the SE-Net operates on top of the entire block regardless of multiple groups. Previous models like SK-Net [38] introduced feature attention between two network branches, but their operation is not optimized for training efficiency and scaling to large neural networks. Our method generalizes prior work on feature-map attention [29, 38] within a cardinal group setting [60], and its implementation remains computationally efficient. Figure 1 shows an overall comparison with SE-Net and SK-Net blocks." (quoted from the paper)