（最先端NLP2019）Empirical Linguistic Study of Sentence Embeddings

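Empirical Linguistic Study of Sentence Embeddings (ACL2019)
Presenter: Kaori Abe (阿部香央莉), 2nd-year master's student, Inui-Suzuki Lab, Tohoku University / RIKEN AIP
2019/9/28, 最先端NLP2019
Note: unless a footnote says otherwise, the figures and tables in these slides are taken from the original paper.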

Paper overview
Many methods for obtaining sentence embeddings have been proposed [Cer+, 2018][Pagliardini+, 2018], e.g., max/mean pooling over the word vectors that make up a sentence. Which of them best captures linguistic properties? The authors investigated this by solving probing tasks and downstream tasks.
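As a concrete picture of the max/mean pooling baseline, here is a minimal pure-Python sketch. It assumes each sentence is already a list of word vectors (the toy numbers below are illustrative, not real FastText or BERT vectors), and the function names are hypothetical. Pooling collapses the words into one fixed-size vector by taking the element-wise mean or maximum.

```python
from typing import List

Vector = List[float]


def mean_pool(word_vectors: List[Vector]) -> Vector:
    """Element-wise mean over the word vectors of one sentence."""
    if not word_vectors:
        raise ValueError("sentence has no word vectors")
    n = len(word_vectors)
    return [sum(dim) / n for dim in zip(*word_vectors)]


def max_pool(word_vectors: List[Vector]) -> Vector:
    """Element-wise maximum over the word vectors of one sentence."""
    if not word_vectors:
        raise ValueError("sentence has no word vectors")
    return [max(dim) for dim in zip(*word_vectors)]


# Toy 3-dimensional vectors for a 2-word sentence (illustrative values only).
sentence = [[1.0, -2.0, 0.5],
            [3.0, 0.0, -0.5]]
print(mean_pool(sentence))  # [2.0, -1.0, 0.0]
print(max_pool(sentence))   # [3.0, 0.0, 0.5]
```

Both functions return a vector with the word-vector dimensionality regardless of sentence length, which is what makes them cheap baselines.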

(Personal view) The key point of this paper
Is it (implicitly) saying "let's investigate this across languages!"? The authors propose probing tasks based on Universal Dependencies annotations, so the same experiments can be run for any language covered by Universal Dependencies.

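Why investigate cross-lingually?
For example, English vs. Polish. English: word order is very strictly constrained. Polish: like Japanese, word order is relatively free.
[Hypothesis] Even if sentence embeddings are built with the same method, that method may not be effective for languages with a different linguistic typology. Even if a method works well for English:
- Does it capture the masculine/feminine distinction in languages with grammatically gendered nouns?
- What if the word order itself differs (some languages have VS order)?
- What about languages that do not use an alphabetic script? etc.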

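Sentence embedding methods used in the experiments (table compiled by the presenter)
- FastText: 300? dimensions; CBOW
- BERT: 768? dimensions; 12-layer Transformer
- COMBO [Rybak and Wróblewska, 2018]: 164? dimensions (100 word + 64 character); 2-layer Bi-LSTM; trained on UD treebanks (English: 16k trees, Polish: 22k trees), so it learns dependency relations; the vectors presumably contain dependency information (the authors' own implementation)
- Sent2Vec (NS) [Pagliardini+, 2018]: 700? dimensions; extension of CBOW
- Sent2Vec (ORIG) [Pagliardini+, 2018]: 700 dimensions; extension of CBOW
- LASER [Artetxe and Schwenk, 2018]: 1024 dimensions; 5-layer Bi-LSTM; uses multilingual information
- USE [Cer+, 2018]: 512 dimensions; Transformer
The table groups the methods into max/mean pooling of word vectors (in-domain), sentence vectors trained on a small corpus, and sentence vectors trained on large corpora. It also lists corpus sizes (3M, 70M, 223M, "much"), which models are pre-trained (✔ ✔ ✔), and which are trained on Paralela (✔ ✔ ✔).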

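Probing tasks
The probing tasks of [Conneau+, 2018] are adapted to the UD treebank schema, and the data are built accordingly. The data come from the Paralela corpus, an English–Polish parallel corpus [Pęzik, 2016]; the actual data have been released.
There are 9 tasks in total, some modified from [Conneau+, 2018] and some newly added:
- SentLen: sentence length
- WC: whether a particular word occurs in the sentence
- TreeDepth: depth of the dependency tree
- TopDeps: list of the parts of speech of the structurally essential parts of the sentence
- Passive
- Tense: grammatical past tense
- SubjNum: grammatical number of the subject
- ObjNum: grammatical number of the object
- SentType: 3-way classification into [declarative, imperative, other] (newly added)
Note: there is no script that generates the probing data automatically, so to do this for Japanese one would presumably have to build the data from the Japanese UD following the procedure in the paper? (http://git.nlp.ipipan.waw.pl/groups/Scwad)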
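To make "probing labels derived from UD annotations" concrete, here is a sketch that reads one CoNLL-U sentence and computes two labels: SentLen as the number of syntactic words and TreeDepth as the longest chain of HEAD links up to the root. These exact definitions (counting the root as depth 1, no length binning) are assumptions for illustration, not the paper's procedure, and `parse_conllu`, `sent_len`, and `tree_depth` are hypothetical helper names.

```python
from typing import Dict


def parse_conllu(block: str) -> Dict[int, int]:
    """Map token ID -> HEAD ID for one CoNLL-U sentence.

    Comment lines (#...), multiword-token ranges (e.g. 1-2) and empty
    nodes (e.g. 1.1) are skipped, so only regular syntactic words remain.
    """
    heads: Dict[int, int] = {}
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue
        heads[int(token_id)] = int(cols[6])  # 7th column is HEAD; 0 = root
    return heads


def sent_len(heads: Dict[int, int]) -> int:
    """Assumed SentLen label: number of syntactic words."""
    return len(heads)


def tree_depth(heads: Dict[int, int]) -> int:
    """Assumed TreeDepth label: most nodes on any path to the root (root = 1)."""
    def depth(token: int) -> int:
        d = 1
        while heads[token] != 0:
            token = heads[token]
            d += 1
        return d
    return max(depth(t) for t in heads)


# A hand-written CoNLL-U sentence: "The cat sleeps ."
rows = [
    ["1", "The", "the", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "cat", "cat", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "sleeps", "sleep", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
example = "# text = The cat sleeps.\n" + "\n".join("\t".join(r) for r in rows)
heads = parse_conllu(example)
print(sent_len(heads), tree_depth(heads))  # 4 3
```

Because the labels come only from the ID and HEAD columns, the same code would run on any UD treebank, which is the portability argument the slide makes.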

Two downstream tasks
[Concern] Probing tasks alone cannot evaluate the overall performance of sentence embeddings, so accuracy on downstream tasks is also measured: Relatedness and Entailment.
Data for these tasks:
- English: the SICK corpus [Bentivogli+, 2014], 10k pairs, 10 annotators
- Polish: CDSCorpus [Wróblewska and Krasnowska-Kieraś, 2017], 10k pairs, 6 annotators
Note: both have about 10k pairs, but they are different corpora to begin with (this comes back later). Really?
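For pair tasks like Relatedness and Entailment, each sentence is embedded separately and the two vectors u and v are combined before classification. A common recipe from InferSent/SentEval is to use the element-wise absolute difference |u − v| and product u ⊙ v, optionally together with u and v themselves; cosine similarity is a simple unsupervised relatedness score. Whether this paper uses exactly these features is an assumption, and `pair_features` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def pair_features(u: Vector, v: Vector, include_raw: bool = False) -> Vector:
    """Classifier input for a sentence pair: [|u-v|, u*v], optionally prefixed by [u, v]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    product = [a * b for a, b in zip(u, v)]
    feats = abs_diff + product
    if include_raw:
        feats = list(u) + list(v) + feats
    return feats


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, usable as an unsupervised relatedness score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


u = [1.0, 0.0, 2.0]
v = [0.0, 1.0, 2.0]
print(pair_features(u, v))     # [1.0, 1.0, 0.0, 0.0, 0.0, 4.0]
print(round(cosine(u, v), 3))  # 0.8
```

The symmetric features |u − v| and u ⊙ v do not change if the two sentences are swapped, which suits a symmetric task like relatedness.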

Experimental results
A table that looks like an exhaustive analysis of the probing and downstream tasks (the blank Polish cells are the English-only pre-trained vectors). All tasks use the same classifier (SentEval's Multilayer Perceptron classifier).
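The probing protocol trains one classifier per task on frozen sentence vectors and compares accuracies across embedding methods. The paper uses SentEval's MLP classifier; the sketch below swaps in a plain binary logistic-regression probe written in pure Python (a simpler linear stand-in, not SentEval's code). The data and hyperparameters (`lr`, `epochs`) are toy assumptions; a 3-class task like SentType would need a softmax or one-vs-rest variant.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X: List[Vector], y: List[int], lr: float = 0.5,
                epochs: int = 2000) -> Tuple[Vector, float]:
    """Fit a binary logistic-regression probe with full-batch gradient descent.

    X holds frozen sentence vectors; y holds 0/1 labels for one probing task.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = sigmoid(z) - label  # derivative of log-loss w.r.t. z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0)


def accuracy(w: Vector, b: float, X: List[Vector], y: List[int]) -> float:
    return sum(predict(w, b, x) == label for x, label in zip(X, y)) / len(y)


# Toy 2-d "sentence vectors" for a binary probing task (made-up data).
X_train = [[0.0, 0.2], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
y_train = [0, 0, 1, 1]
X_test = [[0.05, 0.1], [0.95, 0.9]]
y_test = [0, 1]

w, b = train_probe(X_train, y_train)
print(accuracy(w, b, X_test, y_test))  # 1.0
```

Keeping the classifier fixed across embedding methods, as the paper does with SentEval's MLP, is what lets accuracy differences be attributed to the sentence vectors rather than to the probe.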

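Summary of the results
1. COMBO (dependency-aware, max/mean pooling) is the best (shaded cells are the best among all models).
2. COMBO has an advantage from the experimental setup, so excluding it, LASER (multilingual, large-corpus based) does best (bold cells are the best among the models other than COMBO). However, LASER is costly, since it needs a high vector dimensionality and large-scale multilingual data.
3. Among the models that pool word vectors, max pooling tends to do worse than mean pooling (max < mean).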

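Personal concerns
Several methods score overwhelmingly low on tasks that are easy for humans, such as WC and SentLen, e.g., FastText (max pooling) × SentLen and COMBO × WC. TreeDepth is low for every method (though it seems a bit hard for humans too). ← Isn't this too low…?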

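Personal concerns 2
USE also has a multilingual version that covers Polish (https://tfhub.dev/google/universal-sentence-encoder-multilingual/1); the paper used the English monolingual USE. The multilingual USE was released in 2019/07, so this may be unavoidable. However, LASER is higher-dimensional and trained on far more languages (16 <<< 93 languages), so even if multilingual USE were added, LASER would probably still win? (My bias.) (I don't know.)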

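Correlation of probing results between the two languages (En, Po)
Pearson and Spearman. Left: correlation per model. Right: correlation per task.
Is this figure meant to test the hypothesis "each sentence embedding method performs differently on typologically different languages"?
- Per-model correlation: high, meaning no difference in performance (only LASER is relatively low?).
- Per-task correlation: SentType (same data, different language) has low correlation, meaning there is a difference. The two downstream tasks use different data for En and Po to begin with, so their low correlation may depend on the data and not only on the method.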
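The correlations in this figure are standard Pearson and Spearman coefficients, e.g., correlating one model's per-task scores in English with its scores in Polish (our reading of the figure). A pure-Python sketch follows; Spearman's rho is computed as the Pearson correlation of average ranks, so ties are handled. The accuracy lists in the usage example are made-up numbers, not the paper's results.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (sx * sy)


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-task accuracies of one model in English and Polish.
en = [90.0, 70.0, 40.0, 85.0]
po = [88.0, 75.0, 35.0, 80.0]
print(round(spearman(en, po), 3))  # 1.0
```

Spearman only compares orderings, so it reaches 1.0 here even though the English and Polish scores differ; Pearson additionally reflects how linearly the raw scores line up.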

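Summary
- The paper gathers the proposed sentence embedding methods and exhaustively examines which is best. I would have liked a bit more discussion of the results.
- Aiming at "methods that work well across languages of various typologies", it reports that LASER (multilingual, large-corpus based) achieves the best accuracy once COMBO is excluded. If LASER is to be used going forward, it needs to become cheaper.
- The probing tasks are designed following the de facto UD schema. Even when we feel that "we should probe many languages", evaluating languages other than one's native language is hard in practice. Given that concern, we should at least create conditions (task settings and data resources) that make experiments in other languages easy (a takeaway I drew on my own).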

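Addendum
https://github.com/Separius/awesome-sentence-embedding
When I tried to check how exhaustive the paper's coverage really is, I found this impressive compilation of word and sentence embeddings; it is where I noticed the information about (the release timing of?) USE. According to it, something called Sentence-BERT has come out (https://arxiv.org/abs/1908.10084). Its implementation is public (https://github.com/UKPLab/sentence-transformers) and can be installed with pip. If I were to add more methods now, I might want to compare multilingual USE and this one as well?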

References
[Cer+, 2018] Universal Sentence Encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174. Association for Computational Linguistics.
[Pagliardini+, 2018] Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 528–540. Association for Computational Linguistics.
[Rybak and Wróblewska, 2018] Semi-Supervised Neural System for Tagging, Parsing and Lematization. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 45–54. Association for Computational Linguistics.
[Artetxe and Schwenk, 2018] Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. CoRR, abs/1812.10464.
[Conneau+, 2018] What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126–2136. Association for Computational Linguistics.
[Pęzik, 2016] Exploring Phraseological Equivalence with Paralela. In Polish-Language Parallel Corpora, pages 67–81. Instytut Lingwistyki Stosowanej UW, Warsaw.
[Bentivogli+, 2014] SICK through the SemEval Glasses. Lesson learned from the evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. Journal of Language Resources and Evaluation, 50:95–124.
[Wróblewska and Krasnowska-Kieraś, 2017] Polish evaluation dataset for compositional distributional semantics models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 784–792. Association for Computational Linguistics.
