Deep Learning Modeling for Sequence Data, Learned through Kaggle

AI 2024.05.09 Yusuke Uchida, GO Inc. Deep Learning Modeling for Sequence Data, Learned through Kaggle

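AI 2 Background
• Image competitions often leave little room for modeling
  • They become a treasure hunt of trying one timm backbone after another
  • Pretrained models are so strong that turning the data into an image and feeding it in usually just works
• In sequence-data competitions, by contrast, no de facto (pretrained) model has been established, and my impression is that modeling decides who wins
• What counts as sequence data?
  • Time series, sensor data, and sequences such as RNA
  • Text is a vast field in its own right, so it is out of scope
• Let's look at what kind of modeling was done in past competitions! (I will hardly touch on anything other than modeling.)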

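AI 3 Outline
• Prerequisites: Conformer / Squeezeformer
  • Models that come up often in sequence data modeling
• Competition case studies
  • IceCube - Neutrinos in Deep Ice
  • Google - American Sign Language Fingerspelling Recognition
  • Stanford Ribonanza RNA Folding
  • HMS - Harmful Brain Activity Classification
• Disclaimer
  • The selection is subjective: solutions whose modeling I found interesting
  • Some knowledge of Transformer and CNN architectures is assumed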

AI 4 Conformer: a very strong encoder from the speech recognition field
A. Gulati, et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proc. of Interspeech’20.
(Figure labels: Convolution Module, MHSA, FFN)

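AI 5 Conformer: a very strong encoder from the speech recognition field
A. Gulati, et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proc. of Interspeech’20.
(Figure labels: Convolution Module, MHSA, FFN)
• A Convolution Module that includes a 1D CNN
• The FFN appears in two places (apparently this comes from something called Macaron-Net)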
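The ordering of the sub-modules in a Conformer block can be sketched as follows. This is a minimal NumPy illustration of the residual structure in the Conformer paper (half-step FFN, MHSA, Convolution Module, half-step FFN, final LayerNorm); the sub-modules are passed in as callables, and the `toy` stand-in used in the demo is hypothetical, not a real attention or convolution layer.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm over the channel axis, without learnable parameters."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conformer_block(x, ffn1, mhsa, conv, ffn2):
    """x: (time, channels). Each sub-module is a callable that keeps the shape."""
    x = x + 0.5 * ffn1(x)   # first half-step FFN (Macaron style)
    x = x + mhsa(x)         # multi-head self-attention
    x = x + conv(x)         # Convolution Module (contains the 1D CNN)
    x = x + 0.5 * ffn2(x)   # second half-step FFN
    return layer_norm(x)

# Hypothetical stand-in used for every sub-module, only to make the sketch runnable.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))
toy = lambda h: np.tanh(h @ W)

x = rng.normal(size=(16, 8))
y = conformer_block(x, toy, toy, toy, toy)
print(y.shape)
```

The two FFN residuals are scaled by 0.5, which is what "FFN in two places" refers to.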

AI 6 Squeezeformer: a model that rethinks the Conformer architecture
S. Kim, et al., "Squeezeformer: An Efficient Transformer for Automatic Speech Recognition," in Proc. of NeurIPS'22.

AI 7 Squeezeformer: a model that rethinks the Conformer architecture
S. Kim, et al., "Squeezeformer: An Efficient Transformer for Automatic Speech Recognition," in Proc. of NeurIPS'22.
• U-Net-like down/upsampling (only once, though...)
• FMCF -> MF, CF (closer to the original Transformer)
• Activations unified to Swish (same performance, simpler)
• A scaling layer is introduced and the redundant LNs are removed
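The MF, CF ordering and the scaling layer can be sketched in the same style. This is a rough NumPy illustration under the assumption that each module is wrapped as `LN(x + module(scale(x)))`, with a learnable per-channel scale and shift taking the place of the pre-LN; the down/upsampling is left out, and the `toy` module and the identity-initialised `params` are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def scale(x, gamma, beta):
    """Learnable per-channel scale and shift that stands in for the pre-LN."""
    return gamma * x + beta

def squeezeformer_block(x, mhsa, ffn1, conv, ffn2, params):
    """Apply M, F, C, F in order, each as a post-LN residual with its own scaling layer."""
    for module, (gamma, beta) in zip((mhsa, ffn1, conv, ffn2), params):
        x = layer_norm(x + module(scale(x, gamma, beta)))
    return x

# Hypothetical stand-ins so the sketch runs.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))
toy = lambda h: np.tanh(h @ W)
params = [(np.ones(8), np.zeros(8)) for _ in range(4)]

x = rng.normal(size=(16, 8))
y = squeezeformer_block(x, toy, toy, toy, toy, params)
print(y.shape)
```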

AI 8 (Reference) Whisper
• Whisper focuses on large-scale training, so the model is deliberately a plain Transformer
A. Radford, et al., "Robust Speech Recognition via Large-Scale Weak Supervision," in arXiv:2212.04356, 2022.

AI 9 01 IceCube - Neutrinos in Deep Ice https://www.kaggle.com/competitions/icecube-neutrinos-in-deep-ice/

AI 10 IceCube - Neutrinos in Deep Ice: Overview

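AI 11 IceCube - Neutrinos in Deep Ice: Overview
• Estimate the direction of an incoming neutrino from the readings of detectors (DOMs) deployed deep under the Antarctic ice
  • DOM: detects the Cherenkov light produced when a neutrino passes through the ice
• Provided data = a sequence of pulse events for each neutrino arrival
• Data at each step
  • time
  • sensor_id (a mapping from sensor_id to the (x, y, z) position is provided separately)
  • charge: the amount of light contained in the pulse
  • auxiliary: something like a quality flag for the observation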

AI 12 IceCube - Neutrinos in Deep Ice: Overview

AI 13 IceCube - Neutrinos in Deep Ice: Overview
• The top-3 solutions have been written up as a paper
H. Bukhari, et al., "IceCube - Neutrinos in Deep Ice The Top 3 Solutions from the Public Kaggle Competition," in arXiv:2310.15674, 2023.

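AI 14 1st Place Solution https://www.kaggle.com/competitions/icecube-neutrinos-in-deep-ice/discussion/402976
• Transformer + EdgeConv
• EdgeConv: updates each node with an MLP applied to its own features plus the difference to a neighbor
  • In this solution the neighbor's raw features are concatenated as well, not only the difference
  • The results over all neighbors are then aggregated
  • Neighbors are defined by kNN on the DOM positions (x, y, z)
(Figure label: Features)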
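The EdgeConv variant described above can be sketched in NumPy as follows. This is an illustration, not the winning team's code: the two-layer MLP weights `W1` and `W2`, the feature sizes, and the use of max as the aggregation are assumptions (the slide only says the per-neighbor results are aggregated).

```python
import numpy as np

def knn_indices(pos, k):
    """Indices of the k nearest neighbours of every point, excluding itself. pos: (n, 3) floats."""
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)   # (n, n) squared distances
    np.fill_diagonal(d2, np.inf)                               # never pick the point itself
    return np.argsort(d2, axis=1)[:, :k]                       # (n, k)

def edge_conv(feat, pos, k, W1, W2):
    """feat: (n, c) node features, pos: (n, 3) DOM positions. Returns (n, out_dim)."""
    idx = knn_indices(pos, k)
    center = np.repeat(feat[:, None, :], k, axis=1)            # (n, k, c) own features
    neigh = feat[idx]                                          # (n, k, c) neighbour features
    # own features + difference + raw neighbour features, as in the variant above
    edge = np.concatenate([center, neigh - center, neigh], axis=-1)   # (n, k, 3c)
    hidden = np.maximum(edge @ W1, 0.0)                        # MLP layer 1 with ReLU
    out = hidden @ W2                                          # MLP layer 2
    return out.max(axis=1)                                     # aggregate over neighbours (assumed: max)

rng = np.random.default_rng(0)
n, c, k = 10, 4, 3
feat = rng.normal(size=(n, c))
pos = rng.normal(size=(n, 3))
W1 = rng.normal(scale=0.1, size=(3 * c, 16))
W2 = rng.normal(scale=0.1, size=(16, 8))
updated = edge_conv(feat, pos, k, W1, W2)
print(updated.shape)
```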

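AI 15 2nd Place Solution https://www.kaggle.com/competitions/icecube-neutrinos-in-deep-ice/discussion/402882
• Scalar input values are transformed with the Fourier encoding normally used for positional encoding
• ds2, a feature that measures the space-time consistency between events, is fed in as an attention bias
  • Hypothesis: for events caused by the same neutrino, the distance between DOMs = dt * speed of light
• Genius!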
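Both ideas can be sketched in a few lines of NumPy. The code below is an assumption-laden illustration rather than the 2nd place implementation: the encoding dimension, `max_period`, the units (nanoseconds and metres), the vacuum speed of light, and the sign convention ds² = (c·dt)² − |dx|² are all choices made here. Under the hypothesis above, pairs of pulses caused by the same neutrino should have ds² close to zero.

```python
import numpy as np

C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond (assumed units)

def fourier_encode(x, dim=8, max_period=10000.0):
    """Map scalars x (n,) to (n, dim) sin/cos features, as in sinusoidal positional encoding."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = x[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def ds2_matrix(t_ns, pos_m, c=C_M_PER_NS):
    """Pairwise space-time interval between pulses: (c * dt)^2 - |dx|^2, shape (n, n)."""
    dt = t_ns[:, None] - t_ns[None, :]
    dx2 = ((pos_m[:, None, :] - pos_m[None, :, :]) ** 2).sum(-1)
    return (c * dt) ** 2 - dx2

# Two pulses exactly one light-nanosecond apart in space and 1 ns apart in time.
t = np.array([0.0, 1.0])
pos = np.array([[0.0, 0.0, 0.0], [C_M_PER_NS, 0.0, 0.0]])
bias = ds2_matrix(t, pos)          # would be added (after some transform) to attention logits
charge_feat = fourier_encode(np.array([0.0, 1.5, 3.0]))
print(bias.shape, charge_feat.shape)
```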

AI 16 02 Google - American Sign Language Fingerspelling Recognition
https://www.kaggle.com/competitions/asl-fingerspelling/

AI 17 ASL Fingerspelling Recognition: Overview
• Convert sign language to text
• The input is a time series of landmarks obtained by running MediaPipe on videos of people signing
  • “There are now 1,629 spatial coordinate columns for the x, y and z coordinates for each of the 543 landmarks”
• Submissions must be TensorFlow Lite models
• Each video must be processed within 100 ms

AI 18 1st Place Solution https://www.kaggle.com/competitions/asl-fingerspelling/discussion/434485

AI 19 1st Place Solution https://www.kaggle.com/competitions/asl-fingerspelling/discussion/434485
• 2D CNN for feature extraction (landmark x time x 3 (x, y, z))
• Squeezeformer without downsampling
• Faster by replacing the relative positional encoding with Rotary Position Embedding (RoPE)
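As a reminder of what RoPE does, here is a minimal NumPy sketch, assuming the common "split the channels into two halves" pairing and a base of 10000; it is not taken from the solution's code. Queries and keys are rotated by position-dependent angles, so their dot product depends only on the relative offset and no separate relative-position term is needed in the attention.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """x: (seq, dim) with even dim. Rotate channel pairs by position-dependent angles."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)          # one frequency per channel pair
    angles = positions[:, None] * freqs[None, :]       # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))

def score(m, n):
    """Attention logit between a query at position m and a key at position n."""
    return float(rope(q, np.array([m]))[0] @ rope(k, np.array([n]))[0])

# Same offset (2) at different absolute positions gives the same logit.
print(score(3, 1), score(10, 8))
```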

AI 20 2nd Place Solution https://www.kaggle.com/competitions/asl-fingerspelling/discussion/434588
• A simple model built from Conv1DBlock and TransformerBlock
• Conv1DBlock has a MobileNetV2 InvertedResidual-like structure
• Swish is used as the activation and BN for normalization
• Efficient Channel Attention
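Efficient Channel Attention is small enough to sketch directly. The NumPy version below assumes a (time, channels) layout and an odd kernel size, and is only an illustration of the mechanism: global average pooling over time, a 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling.

```python
import numpy as np

def eca(x, kernel):
    """x: (time, channels); kernel: (k,) odd-length weights shared across all channels."""
    pooled = x.mean(axis=0)                               # global average pooling -> (channels,)
    pad = len(kernel) // 2
    padded = np.pad(pooled, pad)                          # zero padding keeps the channel count
    mixed = np.correlate(padded, kernel, mode="valid")    # 1D conv across neighbouring channels
    gate = 1.0 / (1.0 + np.exp(-mixed))                   # sigmoid gate per channel
    return x * gate[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 6))
y = eca(x, rng.normal(size=3))
print(y.shape)
```

Compared with an SE block, the gate needs only `k` parameters instead of two fully connected layers.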

AI 21 3rd Place Solution https://www.kaggle.com/competitions/asl-fingerspelling/discussion/434393
• Squeezeformer + RoPE
• Quite similar to the 1st Place Solution
• Downsamples partway through (no upsampling)
• Stochastic depth (droppath) is used in each Squeezeformer block
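Stochastic depth on a residual branch can be sketched as below. This is a per-sample NumPy illustration with a hypothetical `block` callable; rescaling the surviving branch by `1 / (1 - drop_prob)` is a common convention and an assumption about this solution.

```python
import numpy as np

def drop_path_residual(x, block, drop_prob, rng, training=True):
    """Residual connection whose branch is randomly skipped during training."""
    if not training or drop_prob == 0.0:
        return x + block(x)
    if rng.random() < drop_prob:
        return x                                   # branch dropped: identity only
    return x + block(x) / (1.0 - drop_prob)        # rescale so the expectation matches eval

rng = np.random.default_rng(0)
block = lambda h: 2.0 * h                          # hypothetical stand-in for a Squeezeformer block
x = np.ones(3)
eval_out = drop_path_residual(x, block, 0.5, rng, training=False)
always_dropped = drop_path_residual(x, block, 1.0, rng, training=True)
print(eval_out, always_dropped)
```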

AI 22 03 Stanford Ribonanza RNA Folding https://www.kaggle.com/competitions/stanford-ribonanza-rna-folding/

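AI 23 Stanford Ribonanza RNA Folding: Overview
• Predict the reactivity of RNA sequences to the chemical modifiers DMS and 2A3
• RNA sequence: a sequence made of A, C, G, U
  • e.g. GGGAAACUGCCUGAUGGAGGGGGAUAACUACUGGA…
• Reactivity is predicted at every position
• Outputs of RNA structure prediction software such as EternaFold are also provided as data
  • Base pair probability matrix: a matrix giving the probability that each nucleotide pairs with each other nucleotide
  • A symmetric matrix of size sequence length x sequence length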

AI 24 1st Place Solution https://www.kaggle.com/competitions/stanford-ribonanza-rna-folding/discussion/460121
• A Transformer that uses BPPM information in attention

AI 25 Dynamic Position Bias (DPB) and features transformed from the BPPM are used as the attention bias (each is described later)
(Figure labels: Self-attention Block, 6 heads)

AI 26 Dynamic Position Bias (DPB)
• Relative Position Bias (RPB)
  • The usual relative positional encoding
  • Adds a bias that depends on the relative position (dx, dy) between tokens
• Dynamic Position Bias (DPB)
  • Uses the output of an MLP applied to the relative position as the bias term
W. Wang, et al., "CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention," in Proc. of ICLR'22.
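For a 1D sequence the relative position is a single offset i − j, so DPB reduces to a small MLP evaluated on every possible offset. The following NumPy sketch makes that concrete for a single head; the MLP sizes, the random weights, and the `bppm_feature` stand-in for the CNN-transformed BPPM are all hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_position_bias(seq_len, W1, b1, W2, b2):
    """An MLP maps each relative offset (i - j) to a scalar bias; returns (L, L)."""
    offsets = np.arange(-(seq_len - 1), seq_len, dtype=float)[:, None]   # (2L-1, 1)
    hidden = np.maximum(offsets @ W1 + b1, 0.0)                          # (2L-1, h)
    table = (hidden @ W2 + b2)[:, 0]                                     # (2L-1,)
    i = np.arange(seq_len)
    return table[i[:, None] - i[None, :] + seq_len - 1]                  # look up offset i - j

def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive bias on the logits."""
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    return softmax(logits, axis=-1) @ v

rng = np.random.default_rng(0)
L, d, h = 6, 8, 16
W1, b1 = rng.normal(size=(1, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, 1)), np.zeros(1)
bias = dynamic_position_bias(L, W1, b1, W2, b2)
bppm_feature = rng.random(size=(L, L))      # hypothetical stand-in for the transformed BPPM
q, k, v = (rng.normal(size=(L, d)) for _ in range(3))
out = biased_attention(q, k, v, bias + bppm_feature)
print(out.shape)
```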

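AI 27 Transforming the BPPM
• Features are extracted with a 2D CNN
• Two variants: with and without an SE block
• The transformed features are used as the input to the next layer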

AI 28 OpenVaccine: COVID-19 mRNA Vaccine Degradation Prediction 7th Place Solution https://www.kaggle.com/competitions/stanford-covid-vaccine/discussion/189564
• “RNAdegformer”
• Something like the original source of the idea

AI 29 Predicting Molecular Properties 1st Place Solution https://www.kaggle.com/competitions/champs-scalar-coupling/discussion/106575
• Impressive for 2019! The source of the source.
Following the standard transformer architectures, at each layer of the network, we use self-attention layer that mixes the embeddings between the nodes. The "standard" scaled self-attention layer from the transformer paper would be something like (forgive the latex-esq notation formatted as code … I'm entirely unprepared to describe model architectures without being able to write some form of equation):
Z' = W_1 Z softmax(Z^T W_2^T W_3 Z)
where W_1, W_2, and W_3 are weights of the layer. However, following the general practice of graph transformer architectures, we instead use a term
Z' = W_1 Z softmax(Z^T W_2^T W_3 Z - gamma*D)
where D is a distance matrix defined by the graph.
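The quoted formula translates almost literally into NumPy. In this sketch Z holds one column per node, and the softmax is taken column-wise so that each output column is a weighted mix of the transformed node embeddings; that axis choice, the missing 1/sqrt(d) scaling, and the toy chain-graph distance matrix are assumptions made here, not details confirmed by the quote.

```python
import numpy as np

def distance_biased_attention(Z, W1, W2, W3, D, gamma):
    """Z' = W1 Z softmax(Z^T W2^T W3 Z - gamma * D). Z: (d, n), D: (n, n)."""
    logits = Z.T @ W2.T @ W3 @ Z - gamma * D          # (n, n); distant nodes are penalised
    logits = logits - logits.max(axis=0, keepdims=True)
    A = np.exp(logits)
    A = A / A.sum(axis=0, keepdims=True)              # column-wise softmax (assumption)
    return W1 @ Z @ A

rng = np.random.default_rng(0)
d, n = 4, 5
Z = rng.normal(size=(d, n))
W1, W2, W3 = (rng.normal(size=(d, d)) for _ in range(3))
idx = np.arange(n)
D = np.abs(idx[:, None] - idx[None, :]).astype(float)   # toy chain-graph distances
Z_new = distance_biased_attention(Z, W1, W2, W3, D, gamma=1.0)
print(Z_new.shape)
```

With a very large gamma every node attends only to itself (distance 0), so the layer degenerates to `W1 @ Z`; smaller values give the "soft" locality that a hard mask cannot.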

AI 30 (Reference) Graph Truncated Attention
• Uses graph information in the attention mask
• An attention bias seems better because it is “soft”...
S. Seo, et al., "GTA: Graph Truncated Attention for Retrosynthesis," in Proc. of AAAI'21.

AI 31 2nd Place Solution https://www.kaggle.com/competitions/stanford-ribonanza-rna-folding/discussion/460316
• A Squeezeformer that uses BPPM information in attention
• The same person as the ASL 2nd place!

AI 32 3rd Place Solution
• A Squeeze... (you get the idea) that uses BPPM information in attention

AI 33 4th Place Solution https://www.kaggle.com/competitions/stanford-ribonanza-rna-folding/discussion/460203
• A modified RNAdegformer that uses BPPM information in attention
• Makes use of ALiBi positional encoding, RMSNorm, SwiGLU, and similar components
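The three components named above are each a few lines of NumPy. The sketch below is generic, not the solution's code: the symmetric (non-causal) form of ALiBi, the geometric slope schedule, and all shapes are assumptions chosen for an encoder-style model.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """Normalise by the root mean square over channels; no mean subtraction, no bias."""
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps) * weight

def swiglu(x, W_gate, W_up, W_down):
    """FFN variant: Swish(x W_gate) * (x W_up), then projected back with W_down."""
    gate = x @ W_gate
    swish = gate / (1.0 + np.exp(-gate))
    return (swish * (x @ W_up)) @ W_down

def alibi_bias(seq_len, slopes):
    """Symmetric ALiBi: bias[h, i, j] = -slope_h * |i - j|, added to attention logits."""
    i = np.arange(seq_len)
    dist = np.abs(i[:, None] - i[None, :])
    return -slopes[:, None, None] * dist[None, :, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
normed = rms_norm(x, np.ones(8))
ffn_out = swiglu(normed, rng.normal(size=(8, 16)), rng.normal(size=(8, 16)), rng.normal(size=(16, 8)))
H = 4
slopes = 2.0 ** (-8.0 * np.arange(1, H + 1) / H)    # geometric sequence of per-head slopes
bias = alibi_bias(5, slopes)
print(normed.shape, ffn_out.shape, bias.shape)
```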

AI 34 Residual Graph Attention Transformer + BPPM 4th Place Solution https://www.kaggle.com/competitions/stanford-ribonanza-rna-folding/discussion/460203

AI 35 04 HMS - Harmful Brain Activity Classification https://www.kaggle.com/competitions/hms-harmful-brain-activity-classification

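AI 36 HMS - Harmful Brain Activity Classification: Overview
• Detect and classify brain activity such as seizures from electroencephalography (EEG) data
• The basic features are time series of 16 patterns of potential differences
• Many solutions (in fact most) instead turn spectrogram-like features derived from these signals into images, but this talk does not cover them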

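AI 37 3rd Place Solution https://www.kaggle.com/competitions/hms-harmful-brain-activity-classification/discussion/492471
• The 16 signals are stacked vertically and fed in as a 2D image
• Vertical and horizontal Conv2D are used for different purposes
  • Vertical (conv k×1): feature extraction across signals
  • Horizontal (conv 1×k): feature extraction along time
  • A 1D CNN could do the same, but it would need frequent reshape and transpose operations
• The second half of the model is a Squeezeformer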
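The difference between the two kernel orientations is easy to see on a toy input. The NumPy sketch below uses a single-channel, unpadded cross-correlation and averaging kernels purely for illustration; the kernel sizes and the 200-step length are made up, and real layers would be learned multi-channel convolutions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(img, kernel):
    """Single-channel 2D cross-correlation without padding."""
    windows = sliding_window_view(img, kernel.shape)      # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)
eeg = rng.normal(size=(16, 200))                          # 16 signals stacked vertically, 200 time steps

temporal = conv2d_valid(eeg, np.ones((1, 5)) / 5)         # horizontal 1xk: each signal filtered along time
cross_signal = conv2d_valid(eeg, np.ones((3, 1)) / 3)     # vertical kx1: neighbouring signals mixed per time step
print(temporal.shape, cross_signal.shape)
```

The horizontal kernel behaves like the same 1D filter applied to every signal independently, while the vertical kernel mixes signals without touching the time axis, so no reshape or transpose is needed to alternate between the two.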

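AI 38 4th Place Solution (1D CNN Part)
• Like the 3rd Place Solution, uses separate vertical and horizontal 2D convs
• Both are applied inside an Inverted Residual block
  • In the 3rd place method they are separated at the block level
• There is also a hierarchical model that downsamples gradually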

AI 39 Summary
• Squeezeformer is everywhere
• Feeding structural information into the attention bias is a powerful approach
• Solutions adopt Transformer improvements such as RoPE, ALiBi, SwiGLU, and RMSNorm
