Slide 29
Predicting Molecular Properties 1st Place Solution
https://www.kaggle.com/competitions/champs-scalar-coupling/discussion/106575
Impressive already back in 2019!
The source behind the source
Following the standard transformer architectures, at each layer of the network we use a
self-attention layer that mixes the embeddings between the nodes. The "standard"
scaled self-attention layer from the transformer paper would be something like (forgive
the LaTeX-esque notation formatted as code … I'm entirely unprepared to describe model
architectures without being able to write some form of equation):
Z' = W_1 Z softmax(Z^T W_2^T W_3 Z)
where W_1, W_2, and W_3 are the weights of the layer. However, following the general
practice of graph transformer architectures, we instead use
Z' = W_1 Z softmax(Z^T W_2^T W_3 Z - gamma*D)
where D is a distance matrix defined by the graph.