Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

Literature Review
Yuto Kamiwaki, Nagaoka University of Technology

Literature
• Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
• Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, Sune Lehmann
• EMNLP 2017

Abstract
• Achieved state-of-the-art results on 8 benchmarks in sentiment analysis, emotion analysis and sarcasm classification
• Confirmed that the diversity of emotional labels yields better performance than previous distant supervision approaches

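Introduction
• In NLP tasks, little annotated data (text labeled with emotions) is available.
• Some studies have achieved state-of-the-art results using distant supervision.
Distant supervision (http://web.stanford.edu/~jurafsky/mintz.pdf): a method that uses the information in labeled data as a cue to generate labeled training data from entirely separate unlabeled data, and then trains a model on it. In this paper, the emoji in tweets serve as the noisy labels.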
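Distant supervision with emoji can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' pipeline: the four-emoji label set `EMOJI_LABELS` and the helper `distant_label` are hypothetical (DeepMoji uses 64 emoji types), and emitting one example per distinct emoji type is a choice made for this sketch.

```python
# Hypothetical 4-emoji label set; DeepMoji itself uses 64 emoji types.
EMOJI_LABELS = {"😂": 0, "😍": 1, "😭": 2, "😡": 3}


def distant_label(tweet):
    """Turn a raw tweet into (text, noisy_label) pairs without human annotation.

    Emoji are stripped from the text and reused as labels. One pair is
    emitted per distinct emoji type (an assumption of this sketch).
    """
    found = []
    for ch in tweet:
        if ch in EMOJI_LABELS and ch not in found:
            found.append(ch)
    # Remove the label emoji and normalize whitespace.
    text = " ".join("".join(ch for ch in tweet if ch not in EMOJI_LABELS).split())
    return [(text, EMOJI_LABELS[e]) for e in found]


print(distant_label("this is hilarious 😂😂"))
print(distant_label("love it 😍 but sad 😭"))
```

The labels are noisy (an emoji does not always reflect the writer's emotion), which is the usual trade-off of distant supervision: far more training data in exchange for less reliable labels.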

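Related work
• Manual classification based on emotion theories such as those of Ekman and Plutchik
◦ Emotions are hard to interpret, and annotation is time-consuming.
• Learning embeddings from official emoji tables (Eisner et al., 2016)
◦ Does not account for how emoji are actually used.
• Multitask learning
◦ Problematic in terms of data storage.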

Pretraining
• Tweets containing emoji, from January 2013 to June 2017
• Only English tweets without URLs are used for the pretraining dataset.
• All tweets are tokenized on a word-by-word basis.
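A minimal sketch of the filtering and tokenization described above. The record layout (a `text` field plus a `lang` language code), the URL pattern, and the helpers `keep_for_pretraining` and `tokenize` are hypothetical; the paper's actual preprocessing may differ in its details.

```python
import re

# Hypothetical URL pattern: anything starting with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Word-level tokens (keeping simple contractions such as "can't");
# every other non-space character, including each emoji, is its own token.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def keep_for_pretraining(tweet):
    # Assumes each tweet record carries a language code under "lang".
    return tweet.get("lang") == "en" and not URL_RE.search(tweet["text"])


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


tweets = [
    {"text": "Can't wait for the weekend 😍", "lang": "en"},
    {"text": "Read this https://example.com 😂", "lang": "en"},
    {"text": "Qué bonito día 😍", "lang": "es"},
]
corpus = [tokenize(t["text"]) for t in tweets if keep_for_pretraining(t)]
print(corpus)
```

Only the first tweet survives: the second contains a URL and the third is not English.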

Model

Transfer Learning (ChainThaw)
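ChainThaw's schedule, as we read the paper's description (first the newly added layers, then each pretrained layer individually starting from the first, then the whole model), can be written as a list of trainable-layer sets. The layer names and the helper `chain_thaw_schedule` are hypothetical, and the actual training loop with early stopping on a validation set is omitted.

```python
def chain_thaw_schedule(pretrained_layers, new_layers):
    """Return the sequence of trainable-layer sets used by ChainThaw.

    1. Train only the newly added layers (typically the softmax).
    2. Fine-tune each pretrained layer on its own, starting from the first.
    3. Fine-tune the whole model.
    In each stage all other layers stay frozen; each stage would run until
    validation performance stops improving (not shown here).
    """
    stages = [set(new_layers)]
    stages += [{layer} for layer in pretrained_layers]
    stages.append(set(pretrained_layers) | set(new_layers))
    return stages


# Hypothetical layer names loosely following the DeepMoji stack.
layers = ["embedding", "bilstm_1", "bilstm_2", "attention"]
schedule = chain_thaw_schedule(layers, ["softmax"])
for step, trainable in enumerate(schedule, 1):
    print(step, sorted(trainable))
```

Unfreezing one layer at a time limits how far the pretrained weights can drift in any single stage, which is the motivation for the gradual schedule.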

Emoji Prediction
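Emoji prediction is a 64-way classification problem, so a natural way to score it is top-k accuracy: is the true emoji among the model's k highest-scoring classes? The helper `top_k_accuracy` and the toy scores below are illustrative assumptions, not results from the paper.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of examples whose true class is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if target in top_k:
            hits += 1
    return hits / len(targets)


# Toy scores over 4 hypothetical emoji classes (DeepMoji predicts 64).
scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.10, 0.30, 0.10],
    [0.28, 0.22, 0.20, 0.30],
]
targets = [1, 2, 0]
top1 = top_k_accuracy(scores, targets, 1)
top2 = top_k_accuracy(scores, targets, 2)
print(top1, top2)
```

Top-k scores are useful here because many emoji overlap in meaning, so a "wrong" top-1 prediction can still be a plausible one.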

Benchmarking
• 8 benchmarks (3 tasks, 5 domains)


Importance of emoji diversity
• Pos/Neg Emoji: 8 types
• DeepMoji: 64 types
The diversity of emotional labels matters: the model learns the fine-grained nuances of the 64 emoji types (see the figure on the next slide).


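Model architecture
• At the pretraining stage, there is no difference between the models with and without attention.
• On the benchmarks, the model with attention achieves higher accuracy:
◦ Lower-layer features are easy to access.
◦ There are no vanishing gradients, so the model remains trainable.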
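The attention mechanism with skip connections can be sketched in plain Python. Under our reading of the paper, each time step's representation h_t concatenates the embedding and both bi-LSTM outputs, a score e_t = h_t · w_a is turned into weights a_t by a softmax over time steps, and the text vector is the weighted sum v = Σ_t a_t h_t. The function names and toy numbers are hypothetical; in practice w_a would be learned.

```python
import math


def skip_concat(embeddings, lstm1, lstm2):
    # Skip connections: the attention input at each time step concatenates
    # the embedding and both LSTM outputs, so low-level features stay reachable.
    return [e + h1 + h2 for e, h1, h2 in zip(embeddings, lstm1, lstm2)]


def attention_pool(states, w_a):
    """e_t = h_t . w_a,  a_t = softmax(e)_t,  v = sum_t a_t * h_t"""
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w_a)) for h in states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(states[0])
    v = [sum(a * h[d] for a, h in zip(weights, states)) for d in range(dim)]
    return v, weights


# Toy sequence of 2 time steps with 1-dimensional layer outputs.
states = skip_concat([[1.0], [0.0]], [[0.0], [1.0]], [[0.5], [0.5]])
v, weights = attention_pool(states, w_a=[1.0, 1.0, 0.0])
print(states, weights, v)
```

Because the pooled vector is a direct weighted sum of per-step features, gradients reach the embedding layer without passing through both LSTMs, which matches the slide's point about access to lower-layer features.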

Analyzing the effect of pretraining
Pretraining + ChainThaw increases the vocabulary -> improved word coverage
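Word coverage can be illustrated as the share of distinct test-set words found in the model's vocabulary. Both this definition (word types rather than tokens) and the idea of extending the pretrained vocabulary with words from the target dataset are assumptions for this sketch; the vocabularies below are hypothetical.

```python
def word_coverage(test_tokens, vocab):
    """Share of distinct test-set words that appear in the vocabulary."""
    test_types = set(test_tokens)
    return len(test_types & vocab) / len(test_types)


pretrain_vocab = {"i", "love", "this", "so", "much"}
finetune_words = {"sarcasm", "totally"}  # words seen only in the target data
test_tokens = ["i", "totally", "love", "this", "sarcasm", "yeah"]

before = word_coverage(test_tokens, pretrain_vocab)
after = word_coverage(test_tokens, pretrain_vocab | finetune_words)
print(before, after)
```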

Comparing with human-level agreement
• Human: 76.1%
• DeepMoji: 82.4%
DeepMoji achieves higher accuracy than humans (see the paper for details of the experiment).

Conclusion
• Achieved state-of-the-art results on 8 benchmarks in sentiment analysis, emotion analysis and sarcasm classification
• Confirmed that the diversity of emotional labels yields better performance than previous distant supervision approaches
• Released the pretrained model
◦ Demo: https://deepmoji.mit.edu/
