Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Yuto Kamiwaki
December 16, 2018
Slides presented at the literature review session on 2018/12/17
Transcript
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Literature Review
Yuto Kamiwaki, Nagaoka University of Technology
Literature
• Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
• Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, Sune Lehmann
• EMNLP 2017
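Abstract
• Achieves state-of-the-art (SoTA) results on 8 benchmarks covering sentiment analysis, emotion analysis and sarcasm classification
• Confirms that the diversity of the emotional labels yields better performance than previous distant supervision approaches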
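Introduction
• In NLP tasks, annotated data (data labelled with emotions) is scarce.
• Some studies have achieved SoTA results using distant supervision.
Distant supervision (http://web.stanford.edu/~jurafsky/mintz.pdf): a method that uses information from labelled data as a cue to generate labelled training data from entirely separate unlabelled data, and then trains a model on the result.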
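As a rough illustration of distant supervision with emoji, the sketch below treats the emoji found in a tweet as noisy labels for the remaining text. The emoji set `EMOJI` and the function `make_examples` are hypothetical names introduced for this example, and emitting one training pair per distinct emoji is an assumption of the sketch, not something stated on the slides.

```python
# Hypothetical, tiny emoji inventory used only for this illustration.
EMOJI = {"😂", "😍", "😭", "😊"}


def make_examples(tweet):
    """Turn one raw tweet into (text, emoji_label) training pairs.

    The emoji are removed from the text and reused as labels, so no
    manual annotation is needed. Assumption: one pair per distinct emoji.
    """
    labels = []
    for ch in tweet:
        if ch in EMOJI and ch not in labels:
            labels.append(ch)
    # Drop the emoji from the input text and normalise whitespace.
    text = "".join(ch for ch in tweet if ch not in EMOJI)
    text = " ".join(text.split())
    return [(text, label) for label in labels]


if __name__ == "__main__":
    for pair in make_examples("so funny 😂😂 love it 😍"):
        print(pair)
```

Tweets without any emoji from the inventory produce no pairs, so they simply drop out of the training set.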
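Related work
• Manual categorization based on theories of emotion such as those of Ekman and Plutchik
  ◦ Emotions are hard to interpret, and the work is time-consuming.
• Embedding emoji from official emoji tables (Eisner et al., 2016)
  ◦ Does not take into account how emoji are actually used.
• Multitask learning
  ◦ Problematic from a data storage standpoint.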
Pretraining
• Tweet data (containing emoji) from January 2013 to June 2017
• Only English tweets without URLs are used for the pretraining dataset.
• All tweets are tokenized on a word-by-word basis.
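The filtering and tokenization steps listed above can be sketched as follows. This is a minimal sketch under stated assumptions: each tweet is a dict with hypothetical `text` and `lang` fields (language detection is assumed to have happened upstream), and the regular expressions for URLs and tokens are illustrative choices, not the rules used in the paper.

```python
import re

# Illustrative URL pattern; the paper's exact filtering rule is not given on the slides.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# A token is a run of word characters or a single non-space symbol (e.g. an emoji).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def has_url(text):
    """Return True if the text contains something that looks like a URL."""
    return URL_RE.search(text) is not None


def tokenize(text):
    """Split text into word-level tokens."""
    return TOKEN_RE.findall(text)


def build_corpus(tweets):
    """Keep English tweets without URLs and tokenize them.

    `tweets` is a list of dicts with hypothetical keys "text" and "lang".
    """
    return [
        tokenize(t["text"])
        for t in tweets
        if t["lang"] == "en" and not has_url(t["text"])
    ]


if __name__ == "__main__":
    sample = [
        {"text": "I love this!", "lang": "en"},
        {"text": "see http://example.com", "lang": "en"},
        {"text": "こんにちは", "lang": "ja"},
    ]
    print(build_corpus(sample))
```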
Model
Transfer Learning (chain-thaw)
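The slide gives only the name of the procedure. As described in the paper, chain-thaw fine-tunes one layer at a time: first any newly added layers, then each pretrained layer individually starting from the first, and finally all layers together. The sketch below only builds that unfreezing schedule; the function `chain_thaw_schedule` and the layer names are hypothetical, and the actual training loop is left out.

```python
def chain_thaw_schedule(layers, new_layers):
    """Return, for each fine-tuning step, the list of trainable layers.

    layers:     all layer names, ordered from input to output
    new_layers: layers added for the target task (e.g. a new softmax)
    """
    # Step 1: train only the newly added layers.
    steps = [list(new_layers)]
    # Next: train each remaining layer on its own, first layer first.
    for layer in layers:
        if layer not in new_layers:
            steps.append([layer])
    # Final step: train the whole model with every layer unfrozen.
    steps.append(list(layers))
    return steps


if __name__ == "__main__":
    # Illustrative layer names, not identifiers from the released model.
    layers = ["embedding", "bilstm_1", "bilstm_2", "attention", "softmax"]
    for i, trainable in enumerate(chain_thaw_schedule(layers, ["softmax"]), 1):
        print(i, trainable)
```

At each step, every layer not in the list would be frozen while the listed layers are trained on the target dataset.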
Emoji Prediction
Benchmarking
8 benchmarks (3 tasks, 5 domains)
Benchmarking
Importance of emoji diversity
Pos/Neg Emoji: 8 types
DeepMoji: 64 types
The diversity of the emotional labels matters: the model learns the fine-grained nuances of the 64 emoji types (see the figure on the next slide).
Importance of emoji diversity
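Model architecture
At pretraining time, there is no difference.
At benchmark time, the variant with attention is more accurate.
Low-level features are easy to access.
Training proceeds without vanishing gradients.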
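To make the attention point concrete, here is a minimal sketch of attention pooling over per-token states, where each state is the concatenation of the embedding and the outputs of two recurrent layers, so the lower-level features reach the attention layer directly. This is a generic formulation assumed for illustration; the function names, the scoring vector `w` and the toy dimensions are hypothetical and not taken from the slides.

```python
import math


def concat_skip(embedded, lstm1, lstm2):
    """Concatenate, per time step, the embedding and both recurrent outputs."""
    return [e + h1 + h2 for e, h1, h2 in zip(embedded, lstm1, lstm2)]


def attention_pool(states, w):
    """Weight each time step by softmax(state . w) and sum the states."""
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in states]
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, states)) for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    states = concat_skip([[1.0], [0.0]], [[0.0], [1.0]], [[0.5], [0.5]])
    weights, pooled = attention_pool(states, [2.0, 0.0, 0.0])
    print(weights, pooled)
```

Because the pooled vector is a weighted sum over all time steps, gradients reach every position directly instead of only through the recurrent chain.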
Analyzing the effect of pretraining
Pretraining + chain-thaw enlarges the vocabulary -> word coverage improves
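Word coverage can be read as the share of tokens in a target dataset that the model's vocabulary contains. The sketch below computes it for a toy example; the function `word_coverage`, the token list and both vocabularies are made up for illustration and are not data from the paper.

```python
def word_coverage(tokens, vocab):
    """Fraction of tokens that appear in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)


if __name__ == "__main__":
    # Toy data: a target-domain sentence and two hypothetical vocabularies.
    tokens = ["this", "is", "sooo", "sarcastic"]
    base_vocab = {"this", "is"}
    # Extending the vocabulary with target-domain words raises coverage.
    extended_vocab = base_vocab | {"sarcastic"}
    print(word_coverage(tokens, base_vocab))
    print(word_coverage(tokens, extended_vocab))
```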
Comparing with human-level agreement
Human: 76.1%
DeepMoji: 82.4%
DeepMoji is more accurate (see the paper for details of the experiment).
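Conclusion
• Achieves state-of-the-art (SoTA) results on 8 benchmarks covering sentiment analysis, emotion analysis and sarcasm classification
• Confirms that the diversity of the emotional labels yields better performance than previous distant supervision approaches
• The pretrained model is publicly released
  ◦ (Demo: https://deepmoji.mit.edu/)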