ACL 2019 Comprehensive Survey Report Meeting: iida's presentation

Slide 1

Slide 1 text

Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
meshidenn @ ACL 2019 Comprehensive Survey Report Meeting

Slide 2

Slide 2 text

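Self-introduction
The Power of PowerPoint - thepopp.com
• Twitter: @meshidenn
• Affiliation: Retrieva, Inc.
  - Main work: breaking customer problems down into technical components, surveying the relevant technical fields, and implementing scripts
• Before starting NLP
  - Aerospace engineering → national R&D project management agency (clerical work) → Retrieva
  - Interest: doing something about structured knowledge (design support, business knowledge)
• Hobby
  - Kendo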

Slide 3

Slide 3 text

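Paper summary
• Tuning pretrained word vectors with a graph convolutional network (GCN) improved accuracy on most word-level tasks and on downstream tasks
• Two graphs were used: one built from dependency parsing results and one built from WordNet
• Word vectors based on dependency parsing used to suffer from an enormous vocabulary; this paper solves that problem
• It further develops methods, typified by Retrofitting, that reflect word relations in word vectors after the fact
Note: unless otherwise stated, the figures and tables from here on are taken from the paper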

Slide 4

Slide 4 text

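Model

[Figure: model architecture. Word Embedding → Gated GCN → Softmax (negative sampling). Inputs $w_1 \dots w_5$ with hidden states $h_i^0$ before the GCN layer and $h_i^1$ after it.]

Gated GCN update, for edges $(u, v, l_{uv}) \in G$, where $N(v)$ is the set of neighbors of $v$:

\[ h_v^{k+1} = f\Big( \sum_{u \in N(v)} g_{l_{uv}}^{k} \times \big( W_{l_{uv}}^{k} h_u^{k} + b_{l_{uv}}^{k} \big) \Big) \]

\[ g_{l_{uv}}^{k} = \sigma\big( \hat{W}_{l_{uv}}^{k} h_u^{k} + \hat{b}_{l_{uv}}^{k} \big) \]

Training objective:

\[ E = \max \sum_{i=1}^{|V|} \log P\big( w_i \mid w_1^i, w_2^i, \dots, w_{N_i}^i \big) \]

\[ P\big( w_i \mid w_1^i, w_2^i, \dots, w_{N_i}^i \big) = \frac{\exp\big( v_{w_i}^{\top} h_i \big)}{\sum_{j=1}^{|V|} \exp\big( v_{w_j}^{\top} h_i \big)} \]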
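The gated GCN update on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the gate is taken to be a scalar per edge, the parameter layout `(W, b, w_gate, b_gate)` and all function names are hypothetical, and f defaults to ReLU as listed in the parameter table.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(x):
    return [max(0.0, xi) for xi in x]


def gated_gcn_layer(h, edges, params, f=relu):
    """Apply one gated GCN layer.

    h:      dict mapping node -> hidden vector h_u^k
    edges:  list of (u, v, label); a message flows from u to v
    params: dict mapping label -> (W, b, w_gate, b_gate)
            (hypothetical layout; one parameter set per edge label)
    """
    dim = len(next(iter(h.values())))
    agg = {v: [0.0] * dim for v in h}
    for u, v, label in edges:
        W, b, w_gate, b_gate = params[label]
        # Scalar gate in (0, 1): small values down-weight unreliable edges.
        gate = sigmoid(dot(w_gate, h[u]) + b_gate)
        msg = [m + bi for m, bi in zip(matvec(W, h[u]), b)]
        agg[v] = [a + gate * m for a, m in zip(agg[v], msg)]
    # Nodes without incoming edges keep a zero aggregate (no self-loop here).
    return {v: f(a) for v, a in agg.items()}


# Toy usage: two nodes, one labelled edge, identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_params = {"nsubj": (identity, [0.0, 0.0], [0.0, 0.0], 0.0)}
toy_h = {0: [1.0, 2.0], 1: [3.0, 4.0]}
toy_out = gated_gcn_layer(toy_h, [(0, 1, "nsubj")], toy_params)
```

With zero gate weights the gate is exactly 0.5, so node 1 receives half of node 0's vector; a trained gate would instead learn which edge labels to trust.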

Slide 5

Slide 5 text

Syntax-GCN (SynGCN)
• Dependency parsing uses the Stanford CoreNLP parser
• The graph is built per sentence
• The gated GCN has the following features
  - A separate weight matrix for each edge label
  - A gate that lowers the weight of edge labels whose reliability is reduced, for example by parse errors
• To avoid overfitting, nodes have no edge to themselves
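A minimal sketch of building the per-sentence graph described above. The parse is written by hand as `(head_index, dependent_index, label)` triples rather than produced by Stanford CoreNLP, the function name is hypothetical, and adding an inverse edge with an `_inv` label is an assumption that the slide does not state.

```python
def sentence_graph(tokens, arcs, add_inverse=True):
    """Build labelled edges (u, v, label) for one sentence.

    tokens: list of words in the sentence
    arcs:   list of (head_index, dependent_index, label); head_index -1
            marks the root arc
    """
    edges = []
    for head, dep, label in arcs:
        if not 0 <= dep < len(tokens) or head >= len(tokens):
            raise ValueError(f"arc ({head}, {dep}) is outside the sentence")
        # Skip the root arc, and skip self-loops (SynGCN uses none).
        if head < 0 or head == dep:
            continue
        edges.append((head, dep, label))
        if add_inverse:
            # Assumption: the opposite direction gets its own label.
            edges.append((dep, head, label + "_inv"))
    return edges


# Toy usage with a hand-written parse of a three-word sentence.
toy_tokens = ["Scientists", "discover", "stars"]
toy_arcs = [(-1, 1, "root"), (1, 0, "nsubj"), (1, 2, "dobj")]
toy_edges = sentence_graph(toy_tokens, toy_arcs)
```

Each distinct label in the resulting edge list would get its own weight matrix and gate in the gated GCN.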

Slide 6

Slide 6 text

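Semantic-GCN (SemGCN)
• The graph is built at the corpus level (the relations listed in WordNet are inserted as they are)
• The vectors of the words being predicted are held fixed
• Self-edges are included so that the vectors do not drift too far from their initial values (so that what was originally learned from context is not badly disturbed); the effect resembles the L2 term in Retrofitting
• Edge labels: synonym, antonym, hypernym, hyponym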
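For reference, the Retrofitting objective that the self-edge is compared to can be written as below. The notation follows Faruqui et al. (2015) rather than this deck: $\hat{q}_i$ is the original vector of word $i$, $q_i$ its tuned vector, $E$ the set of lexicon edges, and $\alpha_i$, $\beta_{ij}$ weighting coefficients.

```latex
\[
\Psi(Q) = \sum_{i=1}^{n} \Big[ \alpha_i \lVert q_i - \hat{q}_i \rVert^2
        + \sum_{(i,j) \in E} \beta_{ij} \lVert q_i - q_j \rVert^2 \Big]
\]
```

The first term is the L2 penalty that keeps each tuned vector close to its initial value, which is the role the self-edge plays in SemGCN according to the slide.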

Slide 7

Slide 7 text

Model parameters

Item                                   Value
Vocabulary                             150k most frequent words
Word vector dimension                  300
Hidden dimension in downstream tasks   256
Number of GCN layers                   1
GCN activation function                ReLU
ELMo dimension                         128, 256, 512, 1024
Optimizer                              Adam
Learning rate                          0.001
Subsampling                            10^-4
Initialization                         Xavier

Slide 8

Slide 8 text

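Experiment: SynGCN on word-level tasks
• Performance improves on everything except WS353 Relatedness
  - Syntactic context captures a word's functional role rather than topical similarity

Task and scoring
Word Similarity: Spearman rank correlation with the cos-sim of the word vectors
Concept Categorization: cluster the word vectors and assign each cluster its most frequent category; the score is the number of items whose category is then correct, divided by the total number of items
Word Analogy: for a relation a:b -> c:d, Spearman rank correlation with the cos-sim between $\hat{w}_d = w_c + w_b - w_a$ and $w_d$

[Results table annotation: existing word vectors that take dependency parsing into account]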
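The scoring rules above can be sketched in plain Python. This is an illustration only: the function names, the toy data layout, and the tie handling in the rank computation are assumptions, and the actual evaluation scripts used in the paper may differ.

```python
import math
from collections import Counter


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def word_similarity_score(emb, pairs):
    """pairs: list of (word1, word2, reference_score)."""
    predicted = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
    reference = [score for _, _, score in pairs]
    return spearman(predicted, reference)


def purity(cluster_ids, gold_labels):
    """Label each cluster with its majority category, then count hits."""
    by_cluster = {}
    for c, g in zip(cluster_ids, gold_labels):
        by_cluster.setdefault(c, Counter())[g] += 1
    correct = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return correct / len(gold_labels)


def analogy_target(emb, a, b, c):
    """Predicted vector for d in a:b -> c:d, i.e. w_c + w_b - w_a."""
    return [wb + wc - wa for wa, wb, wc in zip(emb[a], emb[b], emb[c])]
```

The predicted analogy vector would then be compared with the true $w_d$ by `cosine`.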

Slide 9

Slide 9 text

Example: topical similarity vs. functional similarity
• For florida, the BoW neighbors are cities within Florida, whereas the DEPS neighbors are other US states
• For dancing, the BoW5 neighbors are dance-related words, whereas the DEPS neighbors are other -ing words
• For Japanese, see https://ai-lab.lapras.com/nlp/japanese-word-embedding/

Target word   BoW5           BoW2           DEPS
florida       gainesville    fla            texas
              fla            alabama        louisiana
              jacksonville   gainesville    georgia
              tampa          tallahassee    california
              lauderdale     texas          carolina
dancing       singing        singing        singing
              dance          dance          rapping
              dances         dances         breakdancing
              dancers        breakdancing   miming
              tap-dancing    clowning       busking

From "Dependency-Based Word Embeddings", Omer Levy and Yoav Goldberg 2014

Slide 10

Slide 10 text

Experiment: SynGCN on downstream tasks
• Performance improves across the board
• Accuracy improves on Q&A in particular
  - This agrees with the existing finding that syntactic features improve Q&A accuracy

Task setup
POS-Tagging
• DataSet: Penn Treebank POS dataset
• Model: Lee et al 2018
Q&A
• DataSet: SQuAD
• Model: Clark and Gardner 2018
NER (named entity recognition)
• DataSet: CoNLL2003
• Model: Lee et al 2018
Coref (coreference resolution)
• DataSet: CoNLL2012
• Model: Lee et al 2018

Slide 11

Slide 11 text

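Experiment: SemGCN on word-level tasks
• External resources: WordNet for hypernyms, hyponyms, and antonyms; PPDB for synonyms
• Compared with existing dictionary-based post-hoc tuning methods, SemGCN gives better results whichever word vectors are used for initialization, except on MSR
• Initializing with SynGCN gives the best results of all
• R=1: synonyms only
• R=2: synonyms and antonyms
• R=4: synonyms, antonyms, hypernyms, and hyponyms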

Slide 12

Slide 12 text

Experiment: SemGCN on downstream tasks
• SemGCN is the only method that improves accuracy on every task

Slide 13

Slide 13 text

Experiment: merits of the graph-based tuning method
• Starting from SynGCN vectors, each method tunes them using synonyms only
• Accuracy is compared on SQuAD; SemGCN improves accuracy the most

Slide 14

Slide 14 text

Experiment: application to ELMo
• SynGCN and SemGCN are applied to vectors contextualized with ELMo
• They are effective with ELMo as well
  - SynGCN and SemGCN appear to capture information that is complementary to contextualization

Slide 15

Slide 15 text

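SynGCN is an extension of CBOW
• Original GCN formulation, for edges $(u, v, l_{uv}) \in G$, where $N(v)$ is the set of neighbors of $v$:

\[ h_v^{k+1} = f\Big( \sum_{u \in N(v)} g_{l_{uv}}^{k} \times \big( W_{l_{uv}}^{k} h_u^{k} + b_{l_{uv}}^{k} \big) \Big) \]

\[ g_{l_{uv}}^{k} = \sigma\big( \hat{W}_{l_{uv}}^{k} h_u^{k} + \hat{b}_{l_{uv}}^{k} \big) \]

• Transformation: set $g_{l_{uv}}^{k} = 1$, $W_{l_{uv}}^{k} = I$, $b_{l_{uv}}^{k} = 0$, let $N(v)$ be the sequential context window of size $c$, and let $f$ be the identity function:

\[ h_v^{k+1} = \sum_{-c \le j \le c,\; j \ne 0} \big( I\, h_{v+j}^{k} + 0 \big) \]

This is the sum of the vectors in the context window, which is the CBOW context representation up to normalization.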
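The reduction on this slide can be checked numerically on toy vectors. The sketch below assumes a single shared weight matrix and a constant gate (which is all the reduction needs), uses hypothetical function names, and truncates the window at the sentence boundaries.

```python
def gcn_layer(h, neighbors, W, b, gate, f):
    """GCN layer with one shared W, b and a constant gate for every edge."""
    out = []
    for v in range(len(h)):
        agg = [0.0] * len(h[0])
        for u in neighbors(v):
            msg = [sum(w * x for w, x in zip(row, h[u])) + bi
                   for row, bi in zip(W, b)]
            agg = [a + gate * m for a, m in zip(agg, msg)]
        out.append(f(agg))
    return out


def window_neighbors(n, c):
    """Neighbors of position v = positions within distance c, excluding v."""
    def neighbors(v):
        return [u for u in range(max(0, v - c), min(n, v + c + 1)) if u != v]
    return neighbors


def cbow_context_sum(h, v, c):
    """Sum of the context vectors around position v, as in CBOW."""
    total = [0.0] * len(h[0])
    for u in range(max(0, v - c), min(len(h), v + c + 1)):
        if u != v:
            total = [t + x for t, x in zip(total, h[u])]
    return total


def identity_fn(x):
    return x


# Toy check: gate = 1, W = I, b = 0, f = identity, window size c = 1.
toy_h = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
toy_I = [[1.0, 0.0], [0.0, 1.0]]
gcn_out = gcn_layer(toy_h, window_neighbors(len(toy_h), 1), toy_I,
                    [0.0, 0.0], 1.0, identity_fn)
cbow_out = [cbow_context_sum(toy_h, v, 1) for v in range(len(toy_h))]
matches = gcn_out == cbow_out
```

Replacing the window neighborhood with dependency neighbors and restoring the label-specific weights and gates turns this back into the SynGCN layer.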

Slide 16

Slide 16 text

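Why SynGCN solves the vocabulary explosion
• In earlier methods, the context vocabulary grew by a factor of the number of dependency types times the number of directions
• SynGCN keeps the ordinary vocabulary size, so the explosion is avoided
From "Dependency-Based Word Embeddings", Omer Levy and Yoav Goldberg 2014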
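A small sketch of where the blow-up comes from. The context-token format (`word/label`, with an `_inv` suffix for the reverse direction) only imitates the style of dependency-based contexts, the toy arcs are hand-written, and the figure of 40 dependency labels is a made-up assumption for the arithmetic; only the 150k vocabulary comes from the parameter table.

```python
def labeled_contexts(arcs):
    """Context tokens that fuse word, dependency label, and direction."""
    contexts = set()
    for head, dep, label in arcs:
        contexts.add(f"{dep}/{label}")        # context seen from the head
        contexts.add(f"{head}/{label}_inv")   # context seen from the dependent
    return contexts


def plain_contexts(arcs):
    """Context tokens when the label lives on the edge, not in the token."""
    contexts = set()
    for head, dep, _ in arcs:
        contexts.update((head, dep))
    return contexts


def max_labeled_context_vocab(vocab_size, n_labels, n_directions=2):
    """Upper bound on the fused context vocabulary."""
    return vocab_size * n_labels * n_directions


toy_arcs = [
    ("discovers", "scientist", "nsubj"),
    ("discovers", "star", "dobj"),
    ("scientist", "australian", "amod"),
    ("sees", "scientist", "dobj"),
]
n_fused = len(labeled_contexts(toy_arcs))
n_plain = len(plain_contexts(toy_arcs))
# 150k words (from the parameter table) and a hypothetical 40 labels.
upper_bound = max_labeled_context_vocab(150_000, 40)
```

The bound is a worst case, since not every word occurs with every label, but it shows why keeping labels in the edge parameters leaves the embedding table at the ordinary vocabulary size.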

Slide 17

Slide 17 text

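Summary
• Introduced a method that tunes pretrained word vectors after the fact using information from dependency parses and word-relation graphs
• Accuracy improved on all tasks except WS353R
• (Note, however, that the method is bound by GPU memory, so watch the vocabulary size)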

Slide 18

Slide 18 text

Appendix

Slide 19

Slide 19 text

Clark and Gardner 2018
From "Simple and Effective Multi-Paragraph Reading Comprehension", Clark and Gardner 2018

Slide 20

Slide 20 text

Lee 2017
• It is unclear how NER and POS tagging are solved with this model
  - Presumably as a problem of assigning a tag to each span representation
From "End-to-end Neural Coreference Resolution", Kenton Lee et al 2017