Understanding the Transformer Once More (Part 2)

Understanding the Transformer Once More (Part 2): 金研機械学習勉強会 (金研 Machine Learning Study Group), 2022/09/06, 中村勇士

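Review of Part 1
• RNN
  ◦ Introduces a recurrent network
  ◦ Attends to the preceding words
• Bi-RNN
  ◦ Makes the recurrent network bidirectional
  ◦ Attends to both the preceding and the following words
• Encoder-Decoder
  ◦ Introduces the context vector
  ◦ The context vector carries the meaning of the whole sentence
• Attention
  ◦ Introduces the attention mechanism
  ◦ Builds a context vector that takes per-time-step weights into account
  ◦ Represents the relationships between words before and after translation
[Figure: attention diagram for translating "this is a pen ." into "これは…". Symbols shown: h_s, h_t, h~_t, c, a. Labels: attention between words, context vector, information about the word being translated (hidden state).]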
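The attention step reviewed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the slide does not say how the score between the hidden states is computed, so a simple dot product is assumed here, and the function name `context_vector` and all numbers are made up for the example.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(h_t, h_s_list):
    """Return attention weights a and context vector c for one target state h_t."""
    # Assumption: the score is a dot product between h_t and each source state h_s.
    scores = [sum(x * y for x, y in zip(h_t, h_s)) for h_s in h_s_list]
    a = softmax(scores)
    # The context vector is the attention-weighted sum of the source hidden states.
    dim = len(h_t)
    c = [sum(a[j] * h_s_list[j][k] for j in range(len(h_s_list))) for k in range(dim)]
    return a, c


# Toy source hidden states (made-up values) for "this", "is", "a", "pen".
h_s_list = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 2.0]]
h_t = [0.0, 1.0]  # hidden state of the word currently being translated

a, c = context_vector(h_t, h_s_list)
print("attention weights:", a)
print("context vector:", c)
```

The weights sum to 1, so the context vector is a weighted average that leans toward the source words most relevant to the current target word.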

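Contents of Part 2
• Transformer
  ◦ From RNN to Self-Attention
    ・Learns the relationships between words rather than the flow of words
    ・Not affected by sequence length (any two words are related directly)
    ・Can be parallelized
  ◦ Introduces Scaled Dot-Product Attention and Multi-Head Attention
• What kind of model is it?
  ◦ "Attention Is All You Need" (2017)
  ◦ High performance, used in many fields
    ・BERT → Google Search
    ・GPT-3 → wrote a blog for a month without readers noticing it was an AI
    ・ViT → image recognition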

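Scaled Dot-Product / Multi-Head Attention
• Scaled Dot-Product Attention
  ◦ Computes the attention weights from Query and Key
  ◦ Applies the attention weights to the Value that corresponds to each Key
  Formula (as in "Attention Is All You Need"):
  $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
  Q: Query
  K: Key (paired with Value)
  V: Value (paired with Key)
  \sqrt{d_k}: correction for the dimension
  softmax: converts to probabilities
  \mathrm{softmax}(Q K^{\top} / \sqrt{d_k}): the attention weights
• Multi-Head Attention
  ◦ Concatenates several Scaled Dot-Product Attention outputs
  Formula (as in "Attention Is All You Need"):
  $$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V})$$
  Concat: concatenation
  W: weights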
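As a rough illustration of the two steps above (score Query against Key, then apply the weights to Value), here is a minimal sketch in plain Python. Matrices are lists of rows, and the function name and the toy values of Q, K and V are assumptions made for the example, not taken from the slides.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v, each given as a list of rows."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)  # attention weights for this query (sum to 1)
        weights.append(w)
        # Weighted sum of the value rows, one output row per query.
        outputs.append(
            [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        )
    return outputs, weights


# Toy inputs (made-up values): 2 queries, 3 key/value pairs.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

out, w = scaled_dot_product_attention(Q, K, V)
print(w)
print(out)
```

Because each Key row is paired with the Value row at the same index, a Key that matches the Query well pulls its Value more strongly into the output.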

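From RNN to Self-Attention
• How Multi-Head Attention is used
  ◦ Until now: attends to the relationships between words before and after translation
  ◦ Self-Attention: attends to the relationships between words within a sentence
[Figure: Transformer architecture diagram with the Self-Attention and Attention blocks and their V, K, Q inputs labeled.]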

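Scaled Dot-Product / Multi-Head Attention
• Scaled Dot-Product Attention
  ◦ Computes the attention weights from Query and Key
  ◦ Applies the attention weights to the Value that corresponds to each Key
  Q: Query, K: Key (paired with Value), V: Value (paired with Key), \sqrt{d_k}: correction for the dimension, softmax: converts to probabilities
• Multi-Head Attention
  ◦ Concatenates several Scaled Dot-Product Attention outputs
  Concat: concatenation, W: weights
• In Self-Attention, Q, K and V start from the same values, and each is multiplied by a different weight
• One sentence is viewed from 3 perspectives × 8 regions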
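The "3 perspectives × 8 regions" idea can be sketched as follows: the same input X is projected three ways (Q, K, V) in each of 8 heads, and the head outputs are concatenated and multiplied by a final weight. This is a toy sketch, not a trained model: the weights are random, the sizes (5 tokens, 16 dimensions) are arbitrary, and the helper names are hypothetical.

```python
import math
import random


def matmul(A, B):
    # Plain list-of-rows matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)  # Q K^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(X, n_heads, rng):
    d_model = len(X[0])
    d_k = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        # Q, K and V all start from the same X; each gets its own weight matrix.
        W_q = random_matrix(d_model, d_k, rng)
        W_k = random_matrix(d_model, d_k, rng)
        W_v = random_matrix(d_model, d_k, rng)
        heads.append(attention(matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)))
    # Concat: join the head outputs along the feature axis, then apply W.
    concat = [sum((head[i] for head in heads), []) for i in range(len(X))]
    W_o = random_matrix(n_heads * d_k, d_model, rng)
    return matmul(concat, W_o)


rng = random.Random(0)
X = random_matrix(5, 16, rng)  # 5 tokens, d_model = 16 (toy sizes)
Y = multi_head_self_attention(X, n_heads=8, rng=rng)
print(len(Y), len(Y[0]))
```

Each head works in a smaller subspace (here 16 / 8 = 2 dimensions), so the total cost stays close to that of a single full-width attention while letting the heads focus on different relationships.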

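Other components
• Feed Forward
• Add & Norm
  ◦ Skip connection
  ◦ Normalization
• Embedding
  ◦ Embeds words as vectors
• Positional Encoding
  ◦ Gives the vectors their order
• Masked Multi-Head Attention
  ◦ Masks future information
[Figure: Transformer architecture diagram with the Encoder and Decoder labeled.]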
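Two of the components listed above are easy to show in miniature: masking future positions and Add & Norm. The sketch below makes two assumptions the slide does not state: the mask sets scores of later positions to negative infinity before the softmax, and "normalization" means layer normalization without learned scale and shift (the original paper uses layer normalization). The function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_weights(scores):
    """Row i may attend only to positions 0..i; later positions get -inf."""
    masked = [
        [s if j <= i else float("-inf") for j, s in enumerate(row)]
        for i, row in enumerate(scores)
    ]
    # exp(-inf) is 0.0, so masked positions receive exactly zero weight.
    return [softmax(row) for row in masked]


def add_and_norm(x, sublayer_out, eps=1e-6):
    """Skip connection, then normalization over the feature dimension."""
    y = [a + b for a, b in zip(x, sublayer_out)]  # Add: skip connection
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / len(y)
    return [(v - mean) / math.sqrt(var + eps) for v in y]  # Norm


# Toy attention scores for 3 positions (all equal, made-up values).
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
w = masked_attention_weights(scores)
print(w)

normed = add_and_norm([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(normed)
```

With equal scores, the first position can only look at itself, the second splits its attention over two positions, and only the last sees all three, which is what keeps the decoder from using future words.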

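Positional Encoding
• Positional Encoding
  ◦ Gives the vectors their order, so that 吾輩 / は / 猫 / で / ある ("I am a cat") can be told apart from は / 猫 / ある / で / 吾輩
  ◦ Embedding vector + a position-specific value
  ◦ Trigonometric functions make this easy to learn
  Formula (as in "Attention Is All You Need"):
  $$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)$$
  pos: position of the word in the sequence
  i: dimension
  d_model: total number of dimensions
• Embedding
  ◦ Embeds words as vectors
  ID    Word                         Vector
  1     りんご (apple)                [0, 0, 0, 1]
  2     みかん (mandarin orange)      [0, 0, 1, 0]
  …     …                            …
  7     ばなな (banana)               [0, 1, 1, 0]
  …     …                            …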
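To make "embedding vector + a position-specific value" concrete, the following sketch looks up the three vectors from the table above and adds a sinusoidal positional encoding. The sin/cos form follows the original paper; the dictionary `embedding`, the token order `[7, 1, 2]` and the tiny `d_model = 4` are assumptions chosen to match the toy table.

```python
import math


def positional_encoding(pos, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for k in range(d_model):
        i = k // 2  # dimensions 2i and 2i+1 share the same frequency
        angle = pos / (10000 ** (2 * i / d_model))
        pe.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return pe


# Embedding table copied from the slide (ID -> vector).
embedding = {
    1: [0, 0, 0, 1],  # りんご (apple)
    2: [0, 0, 1, 0],  # みかん (mandarin orange)
    7: [0, 1, 1, 0],  # ばなな (banana)
}

token_ids = [7, 1, 2]  # hypothetical input sequence
d_model = 4

# Input to the model: embedding vector plus the encoding of its position.
inputs = [
    [e + p for e, p in zip(embedding[token], positional_encoding(pos, d_model))]
    for pos, token in enumerate(token_ids)
]
for row in inputs:
    print(row)
```

The same word placed at a different position now produces a different input vector, which is how the otherwise order-blind attention layers can distinguish the two word orders in the example.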

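Summary
• Transformer
  ◦ From RNN to Self-Attention
    ・Learns the relationships between words rather than the flow of words
    ・Not affected by sequence length (any two words are related directly)
    ・Can be parallelized
  ◦ Introduces Scaled Dot-Product Attention and Multi-Head Attention
• What kind of model is it?
  ◦ "Attention Is All You Need" (2017)
  ◦ High performance, used in many fields
    ・BERT → Google Search
    ・GPT-3 → wrote a blog for a month without readers noticing it was an AI
    ・ViT → image recognition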

winnie279
