Paper Introduction / It is Okay to Not be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

It is Okay to Not be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection (ArtEmis Dataset v2)
shade-tree
Twitter: @shade_tree2112
Website: https://www.mori.ai
[Project Page]
The 11th All-Japan Computer Vision Study Group, "CVPR 2022 Reading Session (Part 1)", 2022/08/07

Preamble

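The presenter's position and perspective
• shade-tree
• Ph.D. (Information Science and Technology)
• Keywords: Creative Support, Natural Language Processing, Storytelling, Emotions
• 2021/10 - : Researcher at a certain company
  • Entertainment AI
• 2021/11 - : Part-time researcher at a certain university lab (concurrent position)
  • Natural language processing, multimodal
• Favorite thing: "Doraemon"
Today, 08/07, is Nobita Nobi's birthday! Happy birthday! (From "The Day I Was Born," collected in Tentomushi Comics vol. 2)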

shade-tree at the CV study group so far
• 2017 CVPR Reading Session (Part 2): "A Hierarchical Approach for Generating Descriptive Image Paragraphs" (2017-08-19, Presenter: shade-tree). "This is a 'CV study group,' but I will mainly talk about language."
• 2021 CVPR Reading Session (Part 2): the person who asks novice questions (in the literal sense), prefaced with "I don't really understand CV, but..."
• 2021 ICCV Reading Session (Part 2): "Let's talk about language after all."
• 2022 "Generative models only" session: I will not talk about language! Nor will I talk about computer vision.

And this time...
• I will talk about Vision & Language + Emotions

The paper introduced today

The paper introduced today ([Project Page])

Main part
Unless individually noted otherwise, figures are taken from the paper being introduced, its project page, and the paper on the previous version of the dataset.

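What kind of paper is this?
• It points out that "ArtEmis," a dataset of paintings annotated with explanations that include emotions, is biased, and proposes an improved v2
• The v1 dataset had a skewed distribution of emotion labels, which is corrected through contrastive data collection
• It confirms the usefulness of contrastive data collection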

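Affective Image Captioning: image captions that include emotions
• ArtEmis [Achlioptas et al., CVPR2021] is a large-scale dataset in which people explain in language what emotion an image evoked in them
• V&L + Emotions, for understanding human intelligence from the perspective of emotion
Example of a "positive emotion" from ArtEmis v1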

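Affective Image Captioning: image captions that include emotions
• ArtEmis [Achlioptas et al., CVPR2021] is a large-scale dataset in which people explain in language what emotion an image evoked in them
• V&L + Emotions, for understanding human intelligence from the perspective of emotion
Example of a "negative emotion" from ArtEmis v1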

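Why ArtEmis was created
• Emotions play a central role in determining a person's internal state and, as a result, also influence behavior
• Emotions are strongly influenced by external stimuli, especially vision and language
A multimodal dataset for studying the relationship between sensory stimuli and emotions is important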

A dataset targeting visual art
• The v1 paper gives two reasons
  • Art is often created with the intention of stimulating the viewer's emotions
  • Art, especially abstract art, often cannot be given a simple description, so a detailed analysis of the painting's content and its effect on the viewer is required
• The ArtEmis dataset is built on top of the publicly available WikiArt dataset which contains 80,031 unique and carefully curated artworks from 1,119 artists (as downloaded in 2015). The artworks cover 27 art-styles (abstract, baroque, cubism, impressionism, etc.) and 45 genres (cityscape, landscape, portrait, still life, etc.)

However, ArtEmis v1 was biased
• The labels are skewed toward positive emotions
• Positive: 62%
• Negative: 26%
• The remainder is "something else"
[Figure panels: Positive, Negative]

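Dataset bias and its effects
• When humans annotate a dataset, it is difficult to eliminate bias completely
  • Plous [2003] suggests that "biases and prejudices" optimize brain function and are essential to human evolution
• The presence of bias affects models trained on that dataset
  • Example: bias in the VQA dataset
  • With VQA1.0, models learn to attend only to the text information rather than the image information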

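Bias in ArtEmis v1
• When training on ArtEmis, a naive nearest-neighbor search performs well
• People tend to feel positive emotions toward paintings → the category distribution is skewed, and this affects what is learned
  • In the caption labels: Positive 62%, Negative 26% (the remainder is "something else")
• Affective image captioning in ArtEmis is subjective
  • People should feel differently from one another. Nevertheless, many paintings never evoked the "opposite emotion"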

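Bias in ArtEmis v1
• Imbalance is extracted using a per-image emotional score
• i denotes the i-th image, and N_i is the total number of labels assigned to that image
• The score is computed per image; images whose absolute value exceeds 0.3 are regarded as biased
• 52,933 emotionally biased paintings
• Based on this, additional data is collected to complement ArtEmis v1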
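
The thresholding step above can be sketched as follows. The score formula is not legible in this transcript, so the code assumes a signed score of (positive count − negative count) / N_i; this is an illustrative assumption, not necessarily the paper's exact definition. The function names and sample labels are hypothetical.

```python
# Assumption: ArtEmis groups its eight named emotions into positive and negative sets;
# "something else" counts toward the total N_i but toward neither set.
POSITIVE = {"amusement", "awe", "contentment", "excitement"}
NEGATIVE = {"anger", "disgust", "fear", "sadness"}


def emotional_score(labels):
    """Assumed score for one image: (#positive - #negative) / N_i."""
    n_i = len(labels)
    if n_i == 0:
        raise ValueError("an image needs at least one label")
    positive = sum(label in POSITIVE for label in labels)
    negative = sum(label in NEGATIVE for label in labels)
    return (positive - negative) / n_i


def is_emotionally_biased(labels, threshold=0.3):
    """An image is treated as biased when |score| exceeds the threshold (0.3 on the slide)."""
    return abs(emotional_score(labels)) > threshold


# Hypothetical label sets for two paintings.
mostly_positive = ["awe", "awe", "contentment", "sadness", "something else"]
balanced = ["awe", "sadness", "fear", "amusement", "something else"]
print(emotional_score(mostly_positive), is_emotionally_biased(mostly_positive))
print(emotional_score(balanced), is_emotionally_biased(balanced))
```

With a signed score, the sign also tells which polarity is overrepresented, which is the information needed to ask workers for the opposite emotion.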

Contrastive Dataset
• Proposes an interface for correcting bias in the dataset
• The following kind of data is collected through Amazon Mechanical Turk (AMT)
• Old: v1, New: Contrastive

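Data Collection Interface
• For a given query image, its 24 nearest neighbor images are shown
• From those 24 images, the worker selects one that evokes the opposite emotion
• If none is suitable, "No Image Available" is offered as a choice
• Image similarity is computed from features taken from the fc7 layer of VGG16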
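
The neighbor retrieval behind this interface might look roughly like the sketch below. The slide only says that similarity comes from VGG16 fc7 features; the Euclidean distance, the function name, and the tiny 2-dimensional toy vectors (real fc7 features are 4096-dimensional) are assumptions made for illustration.

```python
import math


def nearest_paintings(query_id, features, k=24):
    """Return the ids of the k paintings closest to the query in feature space.

    Assumption: closeness is Euclidean distance between feature vectors.
    """
    query = features[query_id]
    candidates = [
        (math.dist(query, vector), painting_id)
        for painting_id, vector in features.items()
        if painting_id != query_id  # never offer the query itself
    ]
    candidates.sort()
    return [painting_id for _, painting_id in candidates[:k]]


# Toy stand-ins for fc7 features (hypothetical values).
toy_features = {
    "query": [0.0, 0.0],
    "near": [1.0, 0.0],
    "mid": [0.0, 2.0],
    "far": [5.0, 5.0],
}
print(nearest_paintings("query", toy_features, k=2))
```

In the actual interface k would be 24, and the worker would pick one of the returned paintings or "No Image Available".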

Data Collection Interface
• The worker selects the opposite emotion from four options
• The worker writes down the reason
• For quality control, every response is reviewed

Collected Data Statistics
• 52,933 visual artworks with emotional bias
• With 5 submissions per painting, 260,533 instances were collected
• Of these, "No Image Available" was selected in only 3% (7,752)
• The authors claim that "by observing carefully, workers can extract details that evoke the opposite emotion from most paintings"
• Diversity
  • The entropy of the emotion labels of the K visual neighbors (K=20) is computed
  • ArtEmis v1: 0.805
  • ArtEmis v2: 0.855 (6% increase)
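
As a rough illustration of the diversity measure, the sketch below computes the Shannon entropy of the emotion labels in one neighborhood and checks the relative increase reported on the slide. The logarithm base and any normalization used in the paper are not stated here, so the natural logarithm is an assumption; the function name and sample labels are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (natural log, an assumption) of a list of emotion labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# A neighborhood where every label agrees has zero entropy;
# mixed labels give a higher value.
uniform_neighborhood = ["awe"] * 4
mixed_neighborhood = ["awe", "sadness", "awe", "sadness"]
print(label_entropy(uniform_neighborhood), label_entropy(mixed_neighborhood))

# Relative increase from the reported values: 0.805 (v1) to 0.855 (v2).
relative_increase = (0.855 - 0.805) / 0.805
print(f"{relative_increase:.1%}")
```

The reported entropies give a relative increase of about 6.2%, which matches the "6% increase" on the slide.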

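Relationship among ArtEmis v2, v1, and the newly collected data
• The newly collected complementary data: Contrastive
• Contrastive merged with a random subset of ArtEmis v1 (the same size as the complement) is called Combined
• This Combined set is named ArtEmis v2
• To ensure fairness, the datasets are adjusted to the same size in the experiments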

Qualitative analysis
• The query image is on the left, and the selected nearest painting is on the right
• Old: the v1 utterance (explanation); New: the newly collected one
• the query painting evoked contentment emotion, and the nearest painting originally evoked contentment as well. However, by observing the painting, an annotator feels disgusted because of the green tone, which resembles mold.

Qualitative analysis
• The v1 captions are generic and could fit almost any image
• In v2, detailed explanations specific to each image are provided
[Figure labels: Old, New]

Quantitative Analysis
• Old (v1)
  • Positive 62%
  • Negative 26%
• Combined
  • Positive 47%
  • Negative 45%
Contrastive collection gathers "the emotion opposite to that of the query image"

Further Analysis – Fine-grained Emotion Set
• Fine-grained emotions are also analyzed with reference to "Semantic Space Theory" [Cowen & Keltner, 2020]
  • The traditional six emotions (anger, disgust, fear, happiness, sadness, and surprise) have long been used, but a computational, open-ended analysis that accounts for subjectivity showed that emotion is high-dimensional, with more than 25 varieties
  • Many emotional responses are not discrete but blend into one another systematically
• RoBERTa is fine-tuned on GoEmotions [Demszky et al., ACL2020], a text dataset built on this theory, and used to evaluate Combined and ArtEmis v1

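Further Analysis – Fine-grained Emotion Set
• Emotions are better balanced in Combined (upper figure)
• Examining the correlation between labels shows that correlations are smaller in Combined; that is, the individual emotions are more independent of one another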

Experiments – Methods
• Comparison of Neural Speakers (Affective Image Captioning Models)
• NN: K-nearest neighbors
  • With K=3, one caption is chosen at random from the 3 nearest neighbors
• SAT: Show, Attend and Tell
  • LSTM
• Meshed-Memory Transformers (M²)
  • Modified version of M² Transformer
  • To handle abstract paintings and the like, patches are used instead of object features
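
The NN baseline can be sketched as below. How exactly the caption is drawn from the 3 neighbors is not specified on the slide; this sketch assumes one neighbor is picked uniformly at random and then one of its captions. The distance metric, the function name, and the toy data are likewise hypothetical.

```python
import math
import random


def nn_speaker(query_feature, train_features, train_captions, k=3, rng=None):
    """Return a caption borrowed from one of the k nearest training images.

    Assumptions: Euclidean distance; uniform choice of neighbor, then of caption.
    """
    rng = rng or random.Random()
    ranked = sorted(
        train_features,
        key=lambda image_id: math.dist(query_feature, train_features[image_id]),
    )
    neighbor = rng.choice(ranked[:k])
    return rng.choice(train_captions[neighbor])


# Toy training set (hypothetical features and captions).
toy_train_features = {"p1": [0.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 0.0], "p4": [9.0, 9.0]}
toy_train_captions = {
    "p1": ["the soft light feels calming"],
    "p2": ["the empty street makes me sad"],
    "p3": ["the bright colors are exciting"],
    "p4": ["the dark sky is frightening"],
}
print(nn_speaker([0.1, 0.1], toy_train_features, toy_train_captions, rng=random.Random(0)))
```

Because such a model never looks at anything beyond feature proximity, its strong scores on v1 are what the slides cite as evidence of dataset bias.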

Experiments – Datasets
• Training Sets
  • Contrastive
  • ArtEmis (v1)
  • Combined
    • Contrastive + 260,533 random samples from ArtEmis
    • 65K captions are removed at random to match the 455K size of ArtEmis
• Test Sets
  • A subset of Combined (10% of its size)
  • ArtEmisC40
    • Data not included in ArtEmis (v1) but collected in the same way
    • At least 40 affective captions for each of 703 images
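
The construction of the Combined training set can be sketched at toy scale as follows. The function name, the seed handling, and the choice to trim the merged set uniformly at random are assumptions; the slide only states that an equal-sized random sample of v1 is added and that captions are then dropped at random to match the size of ArtEmis.

```python
import random


def build_combined(contrastive, artemis_v1, target_size, seed=0):
    """Merge Contrastive with an equal-sized random sample of v1, then trim to target_size."""
    rng = random.Random(seed)
    sampled_v1 = rng.sample(artemis_v1, len(contrastive))
    combined = contrastive + sampled_v1
    rng.shuffle(combined)  # so that trimming removes captions at random
    return combined[:target_size]


# Toy stand-ins: 10 contrastive captions, 50 v1 captions, trimmed to 16.
toy_contrastive = [("contrastive", i) for i in range(10)]
toy_v1 = [("v1", i) for i in range(50)]
toy_combined = build_combined(toy_contrastive, toy_v1, target_size=16)
print(len(toy_combined))

# Size check with the slide's numbers: 2 x 260,533 captions before trimming.
before_trimming = 2 * 260_533
print(before_trimming, before_trimming - 455_000)
```

With the slide's figures, the merged set holds 521,066 captions, so about 66K must be dropped to reach roughly 455K, in line with the rounded "65K" above.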

Experiments – Results (Combined test set)
• NN performs worse than it did on v1 → the influence of the bias has decreased
• The relative gain of SAT over NN (METEOR, ROUGE-L) rises from +28%, +29% to +65%, +63%

Experiments – Results (different training sets)
• SAT models trained on each dataset are compared on ArtEmisC40
• The model trained on Combined performs best → confirms that adding contrastive data to balance the dataset improves performance

Experiments – Results (per emotion analysis)
• Performance improves for every emotion
• The improvement is especially pronounced for emotions that were underrepresented in ArtEmis (v1)

[Figure: generation examples. Panels: generation with SAT; generation with SAT that takes the input emotion into account]

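Conclusion
• The paper points out that "ArtEmis," a dataset of paintings annotated with explanations that include emotions, is biased, and proposes an improved v2
• The v1 dataset had a skewed distribution of emotion labels, which is corrected through contrastive data collection
• It confirms the usefulness of contrastive data collection
• Future outlook
  • Affective datasets may carry biases beyond the one addressed here (for example, against ethnic groups or minorities)
  • The contrastive data collection approach could be used to mitigate these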

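The presenter's thoughts
• In its treatment of "emotion," the paper draws on recent findings from psychology, neuroscience, and cognitive science
  • A. S. Cowen and L. F. Barrett
• In NLP, especially in emotion understanding for creativity-related domains, classical theories have usually been cited. Interdisciplinary efforts are about to update this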

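The presenter's thoughts
• "Emotions are strongly influenced by external stimuli, especially vision and language"
  ↑ Isn't the influence of sound other than language, such as music, also large?
• Some parts felt somewhat hard to read (or is that my own language ability?)
  • The name "ArtEmis v2" appears only in the Abstract and Conclusion, with no explanation
    • Elsewhere, Combined and Contrastive are used
  • The paper does not convey that ArtEmis v1 is the work of the same group. Perhaps the anonymization was not fully lifted for the camera-ready version?
  • The vertical order of Old (v1) and New (Contrastive) differs between figures
  • In places the main text and the figures do not correspond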

"Bias" in emotion datasets
• For the Story Cloze Test as well, an emotional skew in v1.0 has been pointed out [Sharma et al., ACL2018]
• When workers write the "right" and "wrong" story endings, the right endings skew positive (VADER Sentiment >= 0.05)
• In v1.5, the writing instructions given to workers were expanded to correct the difference in distributions
• The new restrictions were: ‘Each sentence should stay within the same subject area of the story,’ and ‘The number of words in the Right and Wrong sentences should not differ by more than 2 words,’ and ‘When possible, the Right and Wrong sentences should try to keep a similar tone/sentiment as one another.’
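
The VADER criterion mentioned above can be illustrated with a small sketch. It assumes the compound scores have already been produced by a VADER analyzer and applies the conventional cut-offs (positive at 0.05 or above, negative at -0.05 or below, neutral in between); the function names and the sample scores are hypothetical and are not data from Sharma et al.

```python
def vader_polarity(compound):
    """Map a VADER compound score in [-1, 1] to a polarity using the conventional cut-offs."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def positive_share(compound_scores):
    """Fraction of sentences whose compound score counts as positive."""
    if not compound_scores:
        raise ValueError("need at least one score")
    positives = sum(vader_polarity(score) == "positive" for score in compound_scores)
    return positives / len(compound_scores)


# Hypothetical compound scores for four "right" story endings.
toy_right_endings = [0.62, 0.05, 0.0, -0.31]
print([vader_polarity(score) for score in toy_right_endings])
print(positive_share(toy_right_endings))
```

Comparing this share between right and wrong endings is one simple way to see the kind of skew described on the slide.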

"Bias" in emotion datasets
• Could deliberately steering the distribution within a dataset itself become a bias?
• The authors explain that "machine learning models cannot understand and exploit biases the way humans do" and argue that "biases should be removed"
• Humans are usually capable of recognizing biases when they cause more harm than good. However, machine learning models do not have a similar ability to detect and reason about biases. Therefore, if models learn from a biased dataset, they will make biased decisions. Consequently, reducing biases from datasets is crucial in increasing acceptance and trust in machine learning models. It is equally important to detect biases in datasets, especially in affective datasets, used to train models that emulate human affect or interact directly with them.

The paper introduced today
[Project Page]

Appendix

Crowdsourcing
• How were the workers selected?
  • Explained in "5 Dataset Statistics" in the Appendix
  • Good workers are paid a bonus
  • Poor workers are contacted and asked not to take this task again
  • The reward is set on the high side ($2.4 / h)

Treatment of emotions in (text) datasets (from the GoEmotions paper)
• 2.2 Emotion Taxonomy
• One of the main aspects distinguishing our dataset is its emotion taxonomy. The vast majority of existing datasets contain annotations for minor variations of the 6 basic emotion categories (joy, anger, fear, sadness, disgust, and surprise) proposed by Ekman (1992a) and/or along affective dimensions (valence and arousal) that underpin the circumplex model of affect (Russell, 2003; Buechel and Hahn, 2017).
• Recent advances in psychology have offered new conceptual and methodological approaches to capturing the more complex “semantic space” of emotion (Cowen et al., 2019a) by studying the distribution of emotion responses to a diverse array of stimuli via computational techniques. Studies guided by these principles have identified 27 distinct varieties of emotional experience conveyed by short videos (Cowen and Keltner, 2017), 13 by music (Cowen et al., in press), 28 by facial expression (Cowen and Keltner, 2019), 12 by speech prosody (Cowen et al., 2019b), and 24 by nonverbal vocalization (Cowen et al., 2018).

Art genres, styles, and emotions
• The supplement to the v1 paper analyzes how art genres and styles relate to emotions
• 27 art-styles (abstract, baroque, cubism, impressionism, etc.)
  • Figure on the right
• 45 genres (cityscape, landscape, portrait, still life, etc.)
  • "landscape" has the strongest annotator agreement and tends to be associated with positive emotions
