Paper Review: A Document Descriptor using Covariance of Word Vectors

A Document Descriptor using Covariance of Word Vectors
Paper review, 2019/02/27
Yumeto Inaoka, Natural Language Processing Laboratory, Nagaoka University of Technology

Literature
Title: A Document Descriptor using Covariance of Word Vectors
Author: Marwan Torki
Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 527-532, 2018.

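Abstract
• Proposes a fixed-length document representation built from word vectors (Document-Covariance Descriptor; DoCoV)
  → Easy to use in both supervised and unsupervised applications
• Performance comparable to the state of the art (SoTA) on a variety of tasks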

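Introduction
• Vector-based document retrieval has a long history
  ← Bag-of-Words, Latent Semantic Indexing (LSI)
• In recent years, word embeddings have been learned with neural language models
• Distributed representations of sentences, paragraphs, and documents, not only of words, are also attracting attention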

vs. DoCoV
• doc2vec and FastSent place documents in the same space as words
• The covariance encodes the shape of the density of a document's word vectors

vs. DoCoV
• doc2vec and FastSent take a long time to train
• Computing DoCoV (the covariance) is highly parallelizable and fast

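DoCoV: Document Observation Matrix
• For d-dimensional word embeddings and a document of n words, define $O \in \mathbb{R}^{n \times d}$
  (rows are words, columns are the embedding dimensions)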
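A minimal sketch of how the observation matrix could be assembled. The toy embedding table `EMBEDDINGS`, the helper name `observation_matrix`, and the choice to skip out-of-vocabulary tokens are all assumptions made for illustration; real use would load pre-trained vectors such as word2vec or GloVe.

```python
import numpy as np

# Hypothetical toy embedding table with d = 3 (illustration only).
EMBEDDINGS = {
    "good": np.array([0.9, 0.1, 0.0]),
    "movie": np.array([0.2, 0.8, 0.1]),
    "very": np.array([0.1, 0.0, 0.7]),
}


def observation_matrix(tokens, embeddings):
    """Stack the vectors of in-vocabulary tokens into an (n, d) matrix.

    Each row is one word of the document, each column one embedding dimension.
    Skipping unknown tokens is an assumption of this sketch.
    """
    rows = [embeddings[t] for t in tokens if t in embeddings]
    if not rows:
        raise ValueError("document has no in-vocabulary tokens")
    return np.vstack(rows)


O = observation_matrix(["very", "good", "movie", "unknownword"], EMBEDDINGS)
print(O.shape)  # (3, 3): n = 3 known words, d = 3 dimensions
```

Documents of different lengths give matrices with different numbers of rows, which is why a further step is needed to obtain a fixed-length representation.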

DoCoV: Covariance Matrix
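A minimal sketch of the covariance step, assuming the standard covariance between embedding dimensions taken over the words of one document. The function name `covariance_matrix` and the population normalisation (dividing by n, not n - 1) are assumptions of this sketch.

```python
import numpy as np


def covariance_matrix(O):
    """Return the (d, d) covariance of the columns of an (n, d) observation matrix."""
    O = np.asarray(O, dtype=float)
    n = O.shape[0]
    centered = O - O.mean(axis=0)  # subtract the per-dimension mean
    # Assumption: population covariance (divide by n).
    return centered.T @ centered / n


rng = np.random.default_rng(0)
O = rng.normal(size=(50, 4))  # 50 words, 4-dimensional embeddings (toy data)
C = covariance_matrix(O)
print(C.shape)  # (4, 4), independent of the number of words
```

The result is always d × d regardless of document length, and the whole computation is a single matrix product, which fits the claim that it parallelizes well.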

DoCoV: Vectorized representation
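A minimal sketch of one way to vectorize a covariance matrix. It assumes the common scheme that keeps only the upper triangle (the matrix is symmetric) and scales off-diagonal entries by √2, so that the Euclidean norm of the vector equals the Frobenius norm of the matrix. The function name `vectorize_covariance` is hypothetical.

```python
import numpy as np


def vectorize_covariance(C):
    """Flatten a symmetric (d, d) matrix into a vector of length d * (d + 1) / 2."""
    C = np.asarray(C, dtype=float)
    rows, cols = np.triu_indices(C.shape[0])  # upper triangle, diagonal included
    # Assumption: off-diagonal entries are scaled by sqrt(2) to preserve the norm.
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return C[rows, cols] * scale


C = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = vectorize_covariance(C)
print(v.shape)  # (3,) for d = 2
```

With d = 300 this gives a vector of 300 × 301 / 2 = 45,150 entries for every document, whatever its length.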

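Evaluation
• Evaluates how the choice of word vectors affects classification performance on IMDB movie reviews
• The document vectors are classified with a linear SVM
• Each review consists of multiple sentences
• Train/Test/Unlabeled: 25K/25K/50K
• Compares pre-trained word2vec and GloVe with word2vec trained on the Train and Unlabeled sets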

Result
[Slides 11 to 14: results shown only as figures; not captured in this transcript]

Evaluation
• Evaluates the document vectors on the sentence semantic relatedness datasets SICK and STS 2014
• Uses pre-trained word embeddings (dim=300)
• Evaluated with Pearson correlation and Spearman correlation
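A minimal sketch of the two evaluation metrics, written with NumPy only. The gold and predicted scores below are made-up toy values, and the function names are hypothetical. Spearman correlation is computed here as Pearson correlation on ranks, with tied values sharing their average rank.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def average_ranks(x):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


gold = [1.0, 2.0, 3.0, 4.0, 5.0]        # toy human relatedness scores
predicted = [1.0, 4.0, 9.0, 16.0, 25.0]  # toy model similarity scores
print(pearson(gold, predicted), spearman(gold, predicted))
```

In this toy case the predictions preserve the ordering of the gold scores but not their spacing, so the rank-based Spearman value is perfect while the Pearson value is lower.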

Result
• Results comparable to other methods that require training

Evaluation
• Uses word embeddings pre-trained on Google News
• Datasets: Movie Reviews (MR), Subjectivity (Subj), Customer Reviews (CR), and TREC Question (TREC)

Result
[Slides 18 to 21: results shown only as figures; not captured in this transcript]

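Conclusions
• Proposes a new vector representation for sentences, paragraphs, and documents
• Requires no iterative training, unlike other methods
• Confirms its usefulness on supervised and unsupervised tasks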
