ACL 2019 Comprehensive Survey Report Meeting: iida's presentation

Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
meshidenn @ ACL 2019 Comprehensive Survey Report Meeting

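Self-introduction
The Power of PowerPoint - thepopp.com
- Twitter: @meshidenn
- Affiliation: Retrieva, Inc. (株式会社レトリバ)
  - Main work: breaking customer problems down into technical elements, surveying the related technical fields, and implementing scripts
- Before starting NLP
  - Aerospace engineering → national R&D project management agency (clerical work) → Retrieva
  - Interest: doing something about structured knowledge (design support, business knowledge)
- Hobby
  - Kendo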

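Paper overview
- Pretrained word vectors are adjusted with a graph convolutional network (GCN), which improves accuracy on most word-level tasks and on downstream tasks
- Two kinds of graph are used: one built from dependency parsing results and one built from WordNet
- Word vectors based on dependency parsing used to suffer from an enormous vocabulary; this problem is solved
- The approach further develops methods, typified by Retrofitting, that reflect word relations in word vectors after the fact
Note: unless otherwise stated, the figures and tables that follow are taken from the paper.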

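Model
[Figure: word embeddings w_1, ..., w_5 → Gated GCN (h_i^0 → h_i^1) → softmax (negative sampling)]

Gated GCN update:
$$h_v^{k+1} = f\Big(\sum_{u \in N(v)} g_{l_{uv}}^{k} \times \big(W_{l_{uv}}^{k} h_u^{k} + b_{l_{uv}}^{k}\big)\Big)$$
$$g_{l_{uv}}^{k} = \sigma\big(\hat{W}_{l_{uv}}^{k} h_u^{k} + \hat{b}_{l_{uv}}^{k}\big)$$
where $(u, v, l_{uv}) \in G$ and $N(v)$ is the set of neighbors of $v$.

Training objective:
$$E = \max \sum_{i=1}^{|V|} \log P\big(w_i \mid w_1^i, w_2^i, \dots, w_{N_i}^i\big)$$
$$P\big(w_i \mid w_1^i, w_2^i, \dots, w_{N_i}^i\big) = \frac{\exp\big(v_{w_i}^{\top} h_i\big)}{\sum_{j=1}^{|V|} \exp\big(v_{w_j}^{\top} h_i\big)}$$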
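As a rough illustration of the gated update on this slide, the following pure-Python sketch computes one layer on a toy graph. The function name `gated_gcn_layer`, the layout of the `params` dictionary, and the use of a single scalar gate per edge are assumptions made for this example; they are not taken from the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    return [dot(row, x) for row in W]

def relu(x):
    return [max(0.0, xi) for xi in x]

def gated_gcn_layer(h, edges, params, f=relu):
    """One gated GCN layer (illustrative sketch).

    h      : dict node -> input vector h_u^k
    edges  : list of (u, v, label); a message flows from u to v
    params : dict label -> {"W", "b", "w_gate", "b_gate"} (hypothetical layout)
    """
    dim_out = len(next(iter(params.values()))["b"])
    summed = {v: [0.0] * dim_out for v in h}
    for u, v, label in edges:
        p = params[label]
        # Scalar gate g = sigma(w_gate . h_u + b_gate); the scalar form is an assumption.
        gate = sigmoid(dot(p["w_gate"], h[u]) + p["b_gate"])
        # Label-specific message W h_u + b.
        message = [m + bi for m, bi in zip(matvec(p["W"], h[u]), p["b"])]
        summed[v] = [s + gate * m for s, m in zip(summed[v], message)]
    return {v: f(vec) for v, vec in summed.items()}

# Toy input: two nodes, one labeled edge, identity weights, gate fixed at 0.5.
h0 = {0: [1.0, 2.0], 1: [3.0, -4.0]}
toy_params = {
    "nsubj": {
        "W": [[1.0, 0.0], [0.0, 1.0]],
        "b": [0.0, 0.0],
        "w_gate": [0.0, 0.0],
        "b_gate": 0.0,
    }
}
h1 = gated_gcn_layer(h0, [(0, 1, "nsubj")], toy_params)
```

With zero gate weights the gate is sigmoid(0) = 0.5, so node 1 receives half of node 0's vector, and node 0, which has no incoming edge, gets the zero vector.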

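Syntax-GCN (SynGCN)
- Dependency parsing is done with the Stanford CoreNLP parser
- A graph is built for each sentence
- The Gated GCN has the following properties
  - Each edge label has its own weight matrix
  - The gate lowers the weight of edge labels whose reliability is reduced, for example by parse errors
- To avoid overfitting, self-loop edges are not included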
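The per-sentence graph described above can be sketched as a labeled edge list. The function `build_sentence_graph`, the `add_inverse` option, and the `_inv` label suffix are hypothetical names for this example, and the arcs are a hand-written toy input, not actual Stanford CoreNLP output.

```python
def build_sentence_graph(arcs, add_inverse=True):
    """Turn (head, dependent, label) arcs of one sentence into labeled edges.

    Self-loops are dropped, following the slide. Adding an inverse edge with
    its own label is an assumption of this sketch.
    """
    edges = []
    for head, dep, label in arcs:
        if head == dep:
            continue  # no self-loop edges in the sentence graph
        edges.append((head, dep, label))
        if add_inverse:
            edges.append((dep, head, label + "_inv"))
    return edges

# Toy arcs over token indices 0..2; the last one is a self-loop that gets dropped.
toy_arcs = [(1, 0, "nsubj"), (1, 2, "dobj"), (2, 2, "self")]
sentence_edges = build_sentence_graph(toy_arcs)
```

Each distinct label in `sentence_edges` would then get its own weight matrix in the gated layer.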

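Semantic-GCN (SemGCN)
- The graph is built at the corpus level (the relations listed in WordNet are inserted as they are)
- The vectors of the words to be predicted are kept fixed
- Self-loop edges are included so that the vectors do not drift too far from their initial values (that is, so that the original context-based training is not badly disrupted); the effect resembles the L2 term in Retrofitting
- The edge labels are synonym, antonym, hypernym, and hyponym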
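A minimal sketch of the corpus-level graph with self-loops might look as follows. The function `build_semantic_graph`, the `"self"` label, and the toy relation triples are assumptions for illustration; the triples are not extracted from WordNet.

```python
def build_semantic_graph(relations, vocab):
    """Build one corpus-level labeled edge list.

    relations : iterable of (word1, word2, label) triples
    vocab     : set of words that have vectors
    Every word gets a self-loop so that its updated vector keeps a
    contribution from its own initial vector.
    """
    edges = [(w, w, "self") for w in sorted(vocab)]
    for w1, w2, label in relations:
        if w1 in vocab and w2 in vocab:  # skip relations to out-of-vocabulary words
            edges.append((w1, w2, label))
    return edges

# Toy data, for illustration only.
toy_vocab = {"good", "bad", "dog", "animal", "big", "large", "cat"}
toy_relations = [
    ("good", "bad", "antonym"),
    ("dog", "animal", "hypernym"),
    ("animal", "dog", "hyponym"),
    ("big", "large", "synonym"),
    ("cat", "feline", "synonym"),  # dropped: "feline" is not in toy_vocab
]
semantic_edges = build_semantic_graph(toy_relations, toy_vocab)
```

Unlike the sentence graph used for SynGCN, this edge list is built once and every word has exactly one self-loop.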

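Model parameters
Item: Value
- Vocabulary: top 150k words by frequency
- Word vector dimension: 300
- Hidden layer dimension in downstream tasks: 256
- Number of GCN layers: 1
- GCN activation function: ReLU
- ELMo dimension: 128, 256, 512, 1024
- Optimizer: Adam
- Learning rate: 0.001
- Subsampling: 10^-4
- Initialization: Xavier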

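Experiment: SynGCN on word-level tasks
- Performance improves on every task except WS353 Relatedness
  - This is because syntactic context reflects the functional role of a word rather than topical similarity

Evaluation measures:
- Word Similarity: Spearman rank correlation between human similarity ratings and the cosine similarity of the word vectors
- Concept Categorization: the word vectors are clustered and each cluster is assigned its majority category; the score is the number of items whose category is correct divided by the total number of items
- Word Analogy: for a relation a:b → c:d, Spearman rank correlation with the cosine similarity between $\hat{w}_d = w_b + w_c - w_a$ and $w_d$

[Table from the paper. Annotation: existing word vectors that take dependency parsing into account]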
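The similarity and categorization measures above can be sketched in plain Python. The helper names `cosine`, `spearman`, and `purity` are chosen for this example, and the inputs in the usage lines are made-up toy values, not results from the paper.

```python
import math
from collections import Counter

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def purity(cluster_ids, gold_labels):
    """Fraction of items whose gold label is the majority label of their cluster."""
    per_cluster = {}
    for c, g in zip(cluster_ids, gold_labels):
        per_cluster.setdefault(c, Counter())[g] += 1
    return sum(max(cnt.values()) for cnt in per_cluster.values()) / len(gold_labels)

# Toy usage: model similarities that preserve the order of the human ratings.
human_ratings = [1.0, 5.0, 9.0]
model_sims = [0.1, 0.4, 0.8]
rho = spearman(human_ratings, model_sims)
toy_purity = purity([0, 0, 1, 1], ["a", "a", "b", "a"])
```

Because Spearman correlation uses only ranks, it rewards embeddings that order word pairs like the human ratings do, regardless of the absolute cosine values.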

Example: topical similarity versus functional similarity
- For "florida", the BoW neighbors are cities in Florida, whereas the DEPS neighbors are other US states
- For "dancing", the BoW5 neighbors are dance-related words, whereas the DEPS neighbors are other -ing words
- For Japanese, see https://ai-lab.lapras.com/nlp/japanese-word-embedding/

Target word: florida
  BoW5: gainesville, fla, jacksonville, tampa, lauderdale
  BoW2: fla, alabama, gainesville, tallahassee, texas
  DEPS: texas, louisiana, georgia, california, carolina
Target word: dancing
  BoW5: singing, dance, dances, dancers, tap-dancing
  BoW2: singing, dance, dances, breakdancing, clowning
  DEPS: singing, rapping, breakdancing, miming, busking

From "Dependency-Based Word Embeddings", Omer Levy and Yoav Goldberg, 2014.

Experiment: SynGCN on downstream tasks
- Performance improves overall
- Accuracy improves on Q&A in particular
  - This agrees with the existing finding that syntactic features improve Q&A accuracy

Task settings:
- POS tagging
  - Dataset: Penn Treebank POS dataset
  - Model: Lee et al 2018
- Q&A
  - Dataset: SQuAD
  - Model: Clark and Gardner 2018
- NER (named entity recognition)
  - Dataset: CoNLL2003
  - Model: Lee et al 2018
- Coref (coreference resolution)
  - Dataset: CoNLL2012
  - Model: Lee et al 2018

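Experiment: SemGCN on word-level tasks
- External resources: WordNet for hypernyms, hyponyms, and antonyms; PPDB for synonyms
- Compared with existing dictionary-based post-training methods, SemGCN gives better results whichever word vectors are used for initialization, except on MSR
- The variant initialized with SynGCN gives the best results
  - R=1: synonyms only
  - R=2: synonyms and antonyms
  - R=4: synonyms, antonyms, hypernyms, and hyponyms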

Experiment: SemGCN on downstream tasks
- SemGCN is the only method that improves accuracy on every task

Experiment: merits of the graph-based adjustment method
- Starting from SynGCN, adjustment is performed using synonyms only
- Accuracy is compared on SQuAD; SemGCN gives the largest improvement

Experiment: application to ELMo
- SynGCN and SemGCN are applied to vectors contextualized with ELMo
- They are effective with ELMo as well
  - SynGCN and SemGCN appear to capture information that is complementary to contextualization

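SynGCN is a generalization of CBOW
- Original GCN equations
$$h_v^{k+1} = f\Big(\sum_{u \in N(v)} g_{l_{uv}}^{k} \times \big(W_{l_{uv}}^{k} h_u^{k} + b_{l_{uv}}^{k}\big)\Big)$$
$$g_{l_{uv}}^{k} = \sigma\big(\hat{W}_{l_{uv}}^{k} h_u^{k} + \hat{b}_{l_{uv}}^{k}\big)$$
where $(u, v, l_{uv}) \in G$ and $N(v)$ is the set of neighbors of $v$.
- Modified form
$$h_v^{k+1} = f\Big(\sum_{u \in N(v)} g_{l_{uv}}^{k} \times \big(W_{l_{uv}}^{k} h_u^{k} + b_{l_{uv}}^{k}\big)\Big)$$
with $g_{l_{uv}}^{k} = 1$, $W_{l_{uv}}^{k} = I$, $b_{l_{uv}}^{k} = 0$, $N(v)$ taken to be the sequential context, and $f$ the identity function, which gives
$$h_v^{k+1} = \sum_{-c \le u \le c,\; u \ne 0} \big(I\, h_u^{k} + 0\big)$$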
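The reduction above can be checked numerically on toy vectors. In this sketch `gcn_update` uses a single shared gate, weight matrix, and bias (a simplification of the label-specific parameters), and `cbow_context_sum` is a hypothetical helper that sums the vectors in a window of size c around the center word.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gcn_update(h, neighbor_ids, gate, W, b, f):
    """h_v^{k+1} = f(sum over u of gate * (W h_u + b)), with shared parameters."""
    total = [0.0] * len(b)
    for u in neighbor_ids:
        message = [m + bi for m, bi in zip(matvec(W, h[u]), b)]
        total = [t + gate * m for t, m in zip(total, message)]
    return f(total)

def cbow_context_sum(h, center, c):
    """Sum of the vectors within distance c of the center position, excluding it."""
    total = [0.0] * len(h[0])
    for pos in range(max(0, center - c), min(len(h), center + c + 1)):
        if pos != center:
            total = [t + x for t, x in zip(total, h[pos])]
    return total

# Toy word vectors for a five-word sentence.
h = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [-1.0, 1.0], [2.0, 2.0]]
center, c = 2, 1
window = [p for p in range(center - c, center + c + 1) if p != center and 0 <= p < len(h)]
identity = [[1.0, 0.0], [0.0, 1.0]]

gcn_out = gcn_update(h, window, gate=1.0, W=identity, b=[0.0, 0.0], f=lambda x: x)
cbow_out = cbow_context_sum(h, center, c)
```

With the gate fixed at 1, identity weights, zero bias, and the window as the neighborhood, both functions return the plain sum of the context vectors.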

Why SynGCN solves the vocabulary explosion
- In earlier methods, the context vocabulary grew in proportion to the number of dependency types and directions
- SynGCN uses only the ordinary vocabulary, so the explosion is avoided
From "Dependency-Based Word Embeddings", Omer Levy and Yoav Goldberg, 2014.
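To make the growth concrete, the sketch below counts context symbols on a handful of toy arcs. The `word/label` and `word/label-1` context strings follow the style of dependency-based contexts where each context is a word combined with a relation and a direction; the helper names and the arcs are assumptions for illustration.

```python
def dependency_contexts(arcs):
    """Context symbols of the form word/label (seen by the head) and
    word/label-1 (inverse direction, seen by the dependent)."""
    contexts = set()
    for head, dep, label in arcs:
        contexts.add(f"{dep}/{label}")
        contexts.add(f"{head}/{label}-1")
    return contexts

def plain_vocabulary(arcs):
    """Ordinary word vocabulary, as used when labels live on graph edges instead."""
    words = set()
    for head, dep, _ in arcs:
        words.update((head, dep))
    return words

# Toy arcs (head, dependent, label), for illustration only.
toy_arcs = [
    ("discovers", "scientist", "nsubj"),
    ("discovers", "star", "dobj"),
    ("scientist", "australian", "amod"),
    ("sees", "star", "nsubj"),
    ("sees", "scientist", "dobj"),
]
context_symbols = dependency_contexts(toy_arcs)
word_vocab = plain_vocabulary(toy_arcs)
```

Even on these five arcs the context set has twice as many symbols as the word vocabulary; keeping the labels on edges, with one weight matrix per label, leaves the embedding table at the size of the ordinary vocabulary.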

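Summary
- Introduced a method that adjusts pretrained word vectors after the fact, using information from dependency parsing and from word relation graphs
- Accuracy improves on every task except WS353R
- (Note, however, that the method is limited by GPU memory, so take care with the vocabulary size)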

Appendix

Clark and Gardner 2018: Simple and Effective Multi-Paragraph Reading Comprehension
[Figure from Clark and Gardner 2018]

Lee 2017
- It is unclear how NER and POS tagging are solved with this model
  - Presumably they are treated as predicting a tag for each span representation
From "End-to-end Neural Coreference Resolution", Kenton Lee et al., 2017.

Hiroki_Iida
