Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Using millions of emoji occurrences to learn an...
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Yuto Kamiwaki
December 16, 2018
Research
0
110
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
2018/12/17 文献紹介の発表内容
Yuto Kamiwaki
December 16, 2018
Tweet
Share
More Decks by Yuto Kamiwaki
See All by Yuto Kamiwaki
Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training
yuto_kamiwaki
0
120
Modeling Naive Psychology of Characters in Simple Commonsense Stories
yuto_kamiwaki
1
220
Epita at SemEval-2018 Task 1: Sentiment Analysis Using Transfer Learning Approach
yuto_kamiwaki
0
140
Tensor Fusion Network for Multimodal Sentiment Analysis
yuto_kamiwaki
0
270
Sentiment Analysis: It’s Complicated!
yuto_kamiwaki
0
86
ADAPT at IJCNLP-2017 Task 4: A Multinomial Naive Bayes Classification Approach for Customer Feedback Analysis task
yuto_kamiwaki
0
180
EmoWordNet: Automatic Expansion of Emotion Lexicon Using English WordNet
yuto_kamiwaki
0
110
ATTENTION-BASED LSTM FOR PSYCHOLOGICAL STRESS DETECTION FROM SPOKEN LANGUAGE USING DISTANT SUPERVISION
yuto_kamiwaki
0
160
BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs
yuto_kamiwaki
0
250
Other Decks in Research
See All in Research
さまざまなAgent FrameworkとAIエージェントの評価
ymd65536
1
420
Pythonでジオを使い倒そう! 〜それとFOSS4G Hiroshima 2026のご紹介を少し〜
wata909
0
1.3k
Sat2City:3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
satai
4
670
情報技術の社会実装に向けた応用と課題:ニュースメディアの事例から / appmech-jsce 2025
upura
0
310
[IBIS 2025] 深層基盤モデルのための強化学習驚きから理論にもとづく納得へ
akifumi_wachi
19
9.6k
HoliTracer:Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
satai
3
630
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
1
110
ウェブ・ソーシャルメディア論文読み会 第36回: The Stepwise Deception: Simulating the Evolution from True News to Fake News with LLM Agents (EMNLP, 2025)
hkefka385
0
160
Satellites Reveal Mobility: A Commuting Origin-destination Flow Generator for Global Cities
satai
3
510
大規模言語モデルにおけるData-Centric AIと合成データの活用 / Data-Centric AI and Synthetic Data in Large Language Models
tsurubee
1
500
OWASP KansaiDAY 2025.09_文系OSINTハンズオン
owaspkansai
0
110
A History of Approximate Nearest Neighbor Search from an Applications Perspective
matsui_528
1
160
Featured
See All Featured
[SF Ruby Conf 2025] Rails X
palkan
1
760
Faster Mobile Websites
deanohume
310
31k
Information Architects: The Missing Link in Design Systems
soysaucechin
0
780
Imperfection Machines: The Place of Print at Facebook
scottboms
269
14k
Docker and Python
trallard
47
3.7k
Balancing Empowerment & Direction
lara
5
900
The Limits of Empathy - UXLibs8
cassininazir
1
220
Designing for Performance
lara
610
70k
Java REST API Framework Comparison - PWX 2021
mraible
34
9.2k
The World Runs on Bad Software
bkeepers
PRO
72
12k
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
53
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
60
42k
Transcript
Using millions of emoji occurrences to learn any-domain representations for
detecting sentiment, emotion and sarcasm Nagaoka University of Technology Yuto Kamiwaki Literature Review
Literature • Using millions of emoji occurrences to learn any-domain
representations for detecting sentiment, emotion and sarcasm • Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, Sune Lehmann • EMNLP 2017 2
Abstract • sentiment analysis, emotion analysis and sarcasm classificationにおける8つのbenchmarkでSoTA達成 •
感情ラベルの多様性が以前のdistant supervisonのアプ ローチよりもパフォーマンスの向上をもたらすことを確認 3
Introduction • NLPのタスクでは,アノテーション済み(感情が付与された)の データは少ない. • Distant supervisionを用いてSoTAを達成している研究があ る. Distant supervision
: (http://web.stanford.edu/~jurafsky/mintz.pdf) ラベル付きデータの情報を手がかりに全く別のラベルなしデータからラベル付きの学 習データを生成し、モデルを学習する手法 4
Related work • Ekman, Plutchikなどの感情の理論を用いて手作業によって 分類 ◦ 感情の理解が難しく,時間がかかる. • official
emoji tables (Eisner et al., 2016)からembeddingす る手法 ◦ emojiの使われ方を考慮しない. • マルチタスク学習 ◦ データストレージの観点から問題あり. 5
Pretraining • 2013年1月から2017年6月までのTweet data(emojiあり) • Only English tweets without URL’s
are used for the pretraining dataset. • All tweets are tokenized on a word-by-word basis. 6
Model 7
Transfer Learning(ChainThaw) 8
Emoji Prediction 9
Benchmarking 10 8 Benchmarks(3tasks,5domains)
Benchmarking 11
Importance of emoji diversity 12 Pos/Neg Emoji:8 types DeepMoji:64 types
感情ラベルの多様性が重要 64種類のemojiの細かい ニュアンスを学習できている. (次ページの図を参照)
Importance of emoji diversity 13
Model architecture 14 Pretraining時点では,差がない benchmark時点では,Attention ありの方が精度が高い 低層の特徴へのアクセスが簡単 勾配消失がなく,学習可能
Analyzing the effect of pretraining 15 Pretraining+chainthawで語彙が 増加 ->word coverageが改善
Comparing with human-level agreement 16 Human:76.1% Deepmoji:82.4% Deepmojiの方が,精度 が高い (実験内容については,論文
を参照)
Conclusion • sentiment analysis, emotion analysis and sarcasm classificationにおける8つのbenchmarkでSoTA達成 •
感情ラベルの多様性が以前のdistant supervisonのアプ ローチよりもパフォーマンスの向上をもたらすことを確認 • Pretraining済みモデルを公開 ◦ (Demo : https://deepmoji.mit.edu/) 17