Rist Kaggler Workshop: Introduction to the ResNeSt Paper. Inoue

ResNeSt: Split-Attention Networks arXiv: GitHub:

Squeeze-and-Excitation Networks arXiv: GitHub:

Squeeze-and-Excitation Networks arXiv: GitHub: Applies a weight to each channel (= attention).

Squeeze-and-Excitation Networks Adoption is very simple: just add an SE module.
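To make "just add an SE module" concrete, here is a minimal sketch of the squeeze-and-excitation forward pass. It uses NumPy rather than PyTorch to stay dependency-light, and it is not the code from the repository referenced in the slides. The function name `se_module`, the tensor sizes, and the reduction ratio of 4 are assumptions chosen for illustration.

```python
import numpy as np

def se_module(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on x of shape (batch, channels, height, width)."""
    # Squeeze: global average pooling over the spatial dimensions -> (batch, channels)
    s = x.mean(axis=(2, 3))
    # Excitation: bottleneck MLP (reduce, ReLU, restore) followed by a sigmoid gate
    h = np.maximum(s @ w1 + b1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
    # Re-weight every channel of the input by its gate value
    return x * gate[:, :, None, None], gate

rng = np.random.default_rng(0)
batch, channels, reduction = 2, 8, 4   # illustrative sizes, not from the slides
x = rng.standard_normal((batch, channels, 5, 5))
w1 = rng.standard_normal((channels, channels // reduction))
b1 = np.zeros(channels // reduction)
w2 = rng.standard_normal((channels // reduction, channels))
b2 = np.zeros(channels)

y, gate = se_module(x, w1, b1, w2, b2)
print(y.shape, gate.shape)
```

The output keeps the input shape, which is why the module can be appended after an existing block without changing anything downstream.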

Squeeze-and-Excitation Networks Reference: Cadene’s GitHub, pretrained-models.pytorch:

Structure of ResNeSt An SE module is simply appended after any block, such as a residual block. In ResNeSt, by contrast, conv2 of the ResNet Bottleneck is replaced with a Split Attention module. master/resnest/torch/

Selective Kernel Networks arXiv: GitHub:

What is ResNeSt? arXiv: GitHub: The following slides explain what is inside.

Roughly speaking... Within each cardinal group, the features are further split into radix parts, and attention is computed over them. Split Attention

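Split Attention [Figure: data flow of SplAtConv2d. ① Conv2D produces a feature map of size H x W x (C x R), where R is the radix. ② Split into R parts, then element-wise summation. ③ GAP, FC, BN, ReLU, FC. ④ rSoftmax. ⑤ Apply the attention and sum. Split Attention is applied once per cardinal group (Cardinal Split, then Cardinal Concat).] SplAtConv2d a0aecb420e292b018d35/resnest/torch/ When R=1, rSoftmax becomes a sigmoid (equivalent to an SE module).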

Split Attention

Split Attention ① The number of groups is the cardinality. This part is the same as ResNeXt. The official models use groups=1, radix=2.
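As a reminder of what "groups" means in a grouped convolution, the sketch below implements a grouped 1x1 convolution in NumPy and checks that each group only sees its own slice of the input channels. The 1x1 kernel is a simplification chosen for brevity (conv2 in a Bottleneck is a 3x3 convolution), and `grouped_pointwise_conv` and all sizes are hypothetical names and values, not taken from the ResNeSt code.

```python
import numpy as np

def grouped_pointwise_conv(x, weights):
    """Grouped 1x1 convolution.

    x:       (batch, in_channels, height, width)
    weights: (groups, out_channels_per_group, in_channels_per_group)
    """
    groups = weights.shape[0]
    # Each group receives only its own contiguous slice of input channels
    chunks = np.split(x, groups, axis=1)
    outs = [np.einsum("oi,bihw->bohw", w, c) for w, c in zip(weights, chunks)]
    # Group outputs are concatenated back along the channel axis
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
groups = 2                                      # plays the role of the cardinality
x = rng.standard_normal((1, 4, 3, 3))
weights = rng.standard_normal((groups, 3, 2))   # 2 input and 3 output channels per group
y = grouped_pointwise_conv(x, weights)

# Perturb only the input channels that belong to the second group
x_changed = x.copy()
x_changed[:, 2:] += 1.0
y_changed = grouped_pointwise_conv(x_changed, weights)

print(y.shape)
print(np.allclose(y[:, :3], y_changed[:, :3]))  # first group's output is unaffected
```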

Split Attention ② Split the feature map into radix parts and sum them channel by channel.
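Step ② can be sketched in a few lines of NumPy. This is an illustration of the split-then-sum idea, assuming the convolution output stores the radix splits as contiguous channel blocks. The helper name `split_and_sum` and the toy sizes are assumptions.

```python
import numpy as np

def split_and_sum(x, radix):
    """x: (batch, channels * radix, height, width)."""
    # Cut the channel axis into `radix` equal blocks of `channels` each
    splits = np.split(x, radix, axis=1)
    # Element-wise sum of the blocks -> (batch, channels, height, width)
    summed = sum(splits)
    return splits, summed

radix, channels = 2, 3   # illustrative sizes
x = np.arange(1 * channels * radix * 2 * 2, dtype=float).reshape(1, channels * radix, 2, 2)
splits, summed = split_and_sum(x, radix)
print(len(splits), splits[0].shape, summed.shape)
```

The individual splits are kept because step ⑤ needs them again when the attention weights are applied.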

Split Attention ③ fc1 first reduces the dimension, then fc2 restores it to channel x radix. (Annotation: the size becomes (Channel, 1).)
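The shape changes in step ③ can be traced with a small NumPy sketch. It assumes groups=1, treats the two FC layers as plain matrix products, and omits batch normalization and biases for brevity. The name `attention_logits`, the random weights, and the hidden width `inter` are hypothetical.

```python
import numpy as np

def attention_logits(summed, w_fc1, w_fc2):
    """summed: (batch, channels, height, width), radix splits already summed."""
    # GAP: one value per channel -> (batch, channels)
    gap = summed.mean(axis=(2, 3))
    # fc1 reduces the dimension; ReLU follows (BN omitted in this sketch)
    hidden = np.maximum(gap @ w_fc1, 0.0)
    # fc2 restores the dimension to channels * radix
    logits = hidden @ w_fc2
    return gap, logits

rng = np.random.default_rng(0)
batch, channels, radix, inter = 2, 8, 2, 4   # illustrative sizes
summed = rng.standard_normal((batch, channels, 6, 6))
w_fc1 = rng.standard_normal((channels, inter))
w_fc2 = rng.standard_normal((inter, channels * radix))

gap, logits = attention_logits(summed, w_fc1, w_fc2)
print(gap.shape, logits.shape)   # (2, 8) (2, 16)
```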

Split Attention ④

rSoftmax A softmax applied channel-wise, normalizing across the radix splits.
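A minimal NumPy sketch of rSoftmax follows. It is modeled on my reading of the reshape, transpose, softmax pattern in the ResNeSt repository, not copied from it, so the exact memory layout should be treated as an assumption. The function name `r_softmax` and the sizes are illustrative.

```python
import numpy as np

def r_softmax(logits, radix, cardinality):
    """logits: (batch, cardinality * radix * channels_per_group)."""
    batch = logits.shape[0]
    if radix == 1:
        # With a single split there is nothing to normalize across: use a sigmoid
        return 1.0 / (1.0 + np.exp(-logits))
    # (batch, cardinality, radix, c) -> (batch, radix, cardinality, c)
    z = logits.reshape(batch, cardinality, radix, -1).transpose(0, 2, 1, 3)
    # Numerically stable softmax over the radix axis
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    attention = e / e.sum(axis=1, keepdims=True)
    return attention.reshape(batch, -1)

rng = np.random.default_rng(0)
batch, channels, radix, cardinality = 2, 4, 2, 1   # illustrative sizes
logits = rng.standard_normal((batch, channels * radix))
attention = r_softmax(logits, radix, cardinality)

# For every channel, the weights of the radix splits add up to 1
per_channel_total = attention.reshape(batch, radix, channels).sum(axis=1)
print(np.allclose(per_channel_total, 1.0))
```

With radix=1 the function falls back to a sigmoid, which matches the note that the block then behaves like an SE module.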

Split Attention ⑤ Multiply each split by its attention weights and sum the results. This brings the dimension from channel x radix back to channel.
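Step ⑤ can be sketched as a weighted sum over the radix splits. The NumPy code below assumes the attention vector uses the same split-major channel layout as the feature map. `apply_attention` and the toy values are hypothetical, chosen so the result is easy to check by hand.

```python
import numpy as np

def apply_attention(x, attention, radix):
    """x: (batch, channels * radix, H, W); attention: (batch, channels * radix)."""
    # Split features and attention weights into matching radix blocks
    splits = np.split(x, radix, axis=1)
    weights = np.split(attention[:, :, None, None], radix, axis=1)
    # Weighted sum over the splits -> (batch, channels, H, W)
    return sum(w * s for w, s in zip(weights, splits))

x = np.ones((1, 4, 2, 2))                          # channels=2, radix=2
attention = np.array([[0.25, 0.75, 0.75, 0.25]])   # weights for split 0, then split 1
out = apply_attention(x, attention, radix=2)
print(out.shape)
```

Because the two weights for each channel add up to 1 and the input is all ones, every output value here is 1.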

Compared with SENet and SKNet... Relation to Existing Attention Methods. First introduced in SE-Net [29], the idea of squeeze-and-attention (called excitation in the original paper) is to employ a global context to predict channel-wise attention factors. With radix = 1, our Split-Attention block is applying a squeeze-and-attention operation to each cardinal group, while the SE-Net operates on top of the entire block regardless of multiple groups. Previous models like SK-Net [38] introduced feature attention between two network branches, but their operation is not optimized for training efficiency and scaling to large neural networks. Our method generalizes prior work on feature-map attention [29, 38] within a cardinal group setting [60], and its implementation remains computationally efficient. Figure 1 shows an overall comparison with SE-Net and SK-Net blocks. (Quoted from the paper.)