Paper Introduction / It is Okay to Not be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

by Yusuke Mori

Slide 1

Slide 1 text

It is Okay to Not be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection (ArtEmis Dataset v2)
shade-tree
Twitter: @shade_tree2112
Website: https://www.mori.ai
[Project Page]
The 11th All-Japan Computer Vision Study Group, "CVPR 2022 Reading Session (Part 1)"
2022/08/07

Slide 2

Slide 2 text

Opening remarks

Slide 3

Slide 3 text

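The presenter's position and perspective
• shade-tree
• Ph.D. (Information Science and Technology)
• Keywords: Creative Support, Natural Language Processing, Storytelling, Emotions
• 2021/10 - : Researcher at a company
  • Entertainment AI
• 2021/11 - : Part-time researcher at a university lab (side job)
  • Natural language processing, multimodal
• Favorite thing: "Doraemon"
Today, 08/07, is Nobita Nobi's birthday! Congratulations! (From "The Day I Was Born", collected in Tentomushi Comics vol. 2)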

Slide 4

Slide 4 text

shade-tree at the CV study group so far
(Slide thumbnail: "A Hierarchical Approach for Generating Descriptive Image Paragraphs", 2017-08-19, Presenter: shade-tree)
• 2017, CVPR reading session (part 2): "This is a 'CV study group', but I will mainly talk about language"
• 2021, CVPR reading session (part 2): the person who asks a novice question (in the literal sense), opening with "I don't really understand CV, but..."
• 2021, ICCV reading session (part 2): "Let's talk about language after all"
• 2022, "generative models only" session: No talk about language! And, by the way, no talk about computer vision either

Slide 5

Slide 5 text

• shade-tree
• Ph.D. (Information Science and Technology)
• Keywords: Creative Support, Natural Language Processing, Storytelling, Emotions
• 2021/10 - : Researcher at a company
  • Entertainment AI
• 2021/11 - : Part-time researcher at a university lab (side job)
  • Natural language processing, multimodal
And this time... I will talk about Vision & Language + Emotions

Slide 6

Slide 6 text

The paper introduced today

Slide 7

Slide 7 text

The paper introduced today ([Project Page])

Slide 8

Slide 8 text

Main part
Unless otherwise noted, figures are taken from the paper being introduced, its project page, and the paper on the previous version of the dataset.

Slide 9

Slide 9 text

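What kind of paper is this?
• Points out that "ArtEmis", a dataset that attaches explanations containing emotion to paintings, is biased, and proposes an improved v2
• The emotion labels in the v1 dataset were imbalanced; this is corrected through contrastive data collection
• Confirms the usefulness of contrastive data collection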

Slide 10

Slide 10 text

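Image captions that include emotion: Affective Image Captioning
• ArtEmis [Achlioptas et al., CVPR2021] is a large-scale dataset in which annotators explain in language what emotion an image evoked in them
• V&L + Emotions, aimed at understanding human intelligence from the side of emotion
Example of a "positive emotion" from ArtEmis v1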

Slide 11

Slide 11 text

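Image captions that include emotion: Affective Image Captioning
• ArtEmis [Achlioptas et al., CVPR2021] is a large-scale dataset in which annotators explain in language what emotion an image evoked in them
• V&L + Emotions, aimed at understanding human intelligence from the side of emotion
Example of a "negative emotion" from ArtEmis v1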

Slide 12

Slide 12 text

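Why ArtEmis was created
• Emotions play a central role in determining a person's internal state and, as a result, also affect behavior
• Emotions are strongly influenced by external stimuli, especially vision and language
Multimodal datasets for studying the relationship between sensory stimuli and emotion are therefore important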

Slide 13

Slide 13 text

A dataset targeting visual art
• The v1 paper gives two reasons
  • Art is often created with the intent of stirring the viewer's emotions
  • Art, especially abstract art, often cannot be given a simple description, so a detailed analysis of the painting's content and its effect on the viewer is required
• The ArtEmis dataset is built on top of the publicly available WikiArt dataset which contains 80,031 unique and carefully curated artworks from 1,119 artists (as downloaded in 2015). The artworks cover 27 art-styles (abstract, baroque, cubism, impressionism, etc.) and 45 genres (cityscape, landscape, portrait, still life, etc.)

Slide 14

Slide 14 text

However, ArtEmis v1 was biased
It is skewed toward positive emotions: Positive 62%, Negative 26%, and the rest "something else"
(Figure labels: Positive, Negative)

Slide 15

Slide 15 text

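Dataset bias and its effects
• When humans annotate a dataset, it is difficult to eliminate bias completely
  • Plous [2003] suggests that "biases and prejudices" optimize brain function and are essential to human evolution
• The presence of bias affects models trained on that dataset
  • Example: bias in the VQA dataset
  • With VQA1.0, models learn to attend only to the text, not to the image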

Slide 16

Slide 16 text

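Bias in ArtEmis v1
• When training on ArtEmis, a naive nearest-neighbor search performs well
• People tend to feel positive emotions toward paintings → the category distribution is skewed, and this affects what is learned
• In the caption labels:
  • Positive 62%, Negative 26% (the rest is "something else")
• Affective image captioning in ArtEmis is subjective
  • Different people should feel differently. Nevertheless, for many paintings no "opposite emotion" was ever evoked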

Slide 17

Slide 17 text

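Bias in ArtEmis v1
• The imbalance is extracted using a per-image emotional score
  • i denotes the i-th image; N_i is the total number of labels assigned to that image
  • The score is computed per image; an image whose absolute score exceeds 0.3 is considered biased
• 52,933 emotionally biased paintings
• Based on these, additional data is collected to complement ArtEmis v1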
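
The formula itself is shown on the slide as an image and is not reproduced in this text. As a minimal sketch, assuming the score is the share of positive labels minus the share of negative labels among the N_i labels of a painting (an assumption, not a quotation from the paper), the bias check could look like the following. The function names are hypothetical; the positive/negative grouping follows the ArtEmis emotion set, with "something else" counted in N_i only.

```python
from collections import Counter

# Emotion grouping of ArtEmis; "something else" belongs to neither group.
POSITIVE = {"amusement", "awe", "contentment", "excitement"}
NEGATIVE = {"anger", "disgust", "fear", "sadness"}


def emotional_score(labels):
    """ASSUMED form of the score: (#positive - #negative) / N_i for one painting."""
    if not labels:
        raise ValueError("a painting needs at least one label")
    counts = Counter(labels)
    positive = sum(counts[e] for e in POSITIVE)
    negative = sum(counts[e] for e in NEGATIVE)
    return (positive - negative) / len(labels)


def is_emotionally_biased(labels, threshold=0.3):
    """A painting is treated as biased when |score| exceeds the threshold (0.3 on the slide)."""
    return abs(emotional_score(labels)) > threshold


if __name__ == "__main__":
    skewed = ["awe", "awe", "contentment", "sadness", "something else"]
    balanced = ["awe", "sadness", "fear", "amusement"]
    print(emotional_score(skewed), is_emotionally_biased(skewed))
    print(emotional_score(balanced), is_emotionally_biased(balanced))
```

Under this assumed definition, a score near +1 or -1 means almost all annotators agreed on the polarity, which is exactly the kind of painting the contrastive collection targets.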

Slide 18

Slide 18 text

Contrastive Dataset
• Proposes an interface that corrects the dataset's bias
• Data such as the following was collected via Amazon Mechanical Turk (AMT)
  • Old: v1, New: Contrastive

Slide 19

Slide 19 text

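Data Collection Interface
• For a given query image, the 24 nearest images are shown
• The worker picks, from the 24 images, one that evokes the opposite emotion
  • A "No Image Available" option is provided for answering that none of them is suitable
• Image similarity is computed from features taken from the fc7 layer of VGG16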
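
The candidate retrieval step can be sketched as follows. This is only an illustration under stated assumptions: the feature vectors are taken as already extracted (for example, fc7 activations), Euclidean distance is used although the slide does not name the metric, and `nearest_neighbors` is a hypothetical helper name.

```python
import math


def nearest_neighbors(query_id, features, k=24):
    """Return the ids of the k paintings closest to query_id, excluding itself.

    features: dict mapping painting id -> feature vector (list of floats).
    Euclidean distance is an ASSUMPTION; the slide only says fc7 features are used.
    """
    query_vec = features[query_id]
    distances = [
        (math.dist(query_vec, vec), painting_id)
        for painting_id, vec in features.items()
        if painting_id != query_id
    ]
    distances.sort()  # closest first
    return [painting_id for _, painting_id in distances[:k]]


if __name__ == "__main__":
    # Toy 2-D "features" standing in for high-dimensional fc7 vectors.
    toy = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 3.0], "d": [5.0, 5.0]}
    print(nearest_neighbors("a", toy, k=2))
```

With real data, the 24 returned paintings would be the candidates shown to the worker next to the query image.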

Slide 20

Slide 20 text

Data Collection Interface
• The opposite emotion is selected from 4 options
• The worker writes down the reason
• For quality control, every response is reviewed

Slide 21

Slide 21 text

Collected Data Statistics
• 52,933 works of visual art with emotional bias
• With 5 submissions per painting, 260,533 instances were collected
  • Of these, "No Image Available" was chosen in only 3% (7,752)
  • The authors argue that, by observing carefully, workers can extract details that evoke the opposite emotion from most paintings
• Diversity
  • The entropy of the emotion labels of the K visual neighbors (K=20) is computed
  • ArtEmis v1: 0.805
  • ArtEmis v2: 0.855 (6% increase)
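
The diversity measure can be illustrated with a short sketch. It assumes plain Shannon entropy over the emotion labels found among a painting's visual neighbors; the slide does not state the logarithm base or any normalization, so the absolute values produced here are not meant to match 0.805 or 0.855. The function names are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels, base=math.e):
    """Shannon entropy of a list of emotion labels (log base is an ASSUMPTION)."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def relative_increase(old, new):
    """Relative change from old to new, e.g. 0.06 for a 6% increase."""
    return (new - old) / old


if __name__ == "__main__":
    uniform_neighbors = ["awe"] * 20
    mixed_neighbors = ["awe"] * 10 + ["fear"] * 10
    print(label_entropy(uniform_neighbors))        # no diversity
    print(label_entropy(mixed_neighbors, base=2))  # two equally likely labels
    print(relative_increase(0.805, 0.855))         # the reported v1 -> v2 change
```

The last line reproduces the arithmetic behind the "6% increase" on the slide: (0.855 - 0.805) / 0.805 is about 0.062.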

Slide 22

Slide 22 text

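Relationship between ArtEmis v2, v1, and the newly collected data
• The newly collected complementary data: Contrastive
• Contrastive merged with a random subset of ArtEmis v1 (the same number of samples as the complementary data) is called Combined
• This Combined set is named ArtEmis v2
• To ensure fairness, the datasets are adjusted to the same size in the experiments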

Slide 23

Slide 23 text

Qualitative analysis
• The query image is on the left; the selected nearest painting is on the right
• Old: the v1 utterance (explanation); New: the newly collected one
• the query painting evoked contentment emotion, and the nearest painting originally evoked contentment as well. However, by observing the painting, an annotator feels disgusted because of the green tone, which resembles mold.

Slide 24

Slide 24 text

Qualitative analysis
• The v1 caption is generic and could fit almost any image
• In v2, a detailed explanation specific to that image is given
(Figure labels: Old, New)

Slide 25

Slide 25 text

Quantitative Analysis
• Old (v1)
  • Positive 62%
  • Negative 26%
• Combined
  • Positive 47%
  • Negative 45%
Contrastive collection gathers "the emotion opposite to that of the query image"

Slide 26

Slide 26 text

Quantitative Analysis
• Old (v1)
  • Positive 62%
  • Negative 26%
• Combined
  • Positive 47%
  • Negative 45%
Contrastive collection gathers "the emotion opposite to that of the query image"

Slide 27

Slide 27 text

Further Analysis – Fine-grained Emotion Set
• Fine-grained emotions are also analyzed, drawing on "Semantic Space Theory" [Cowen & Keltner, 2020]
  • The traditional six emotions (anger, disgust, fear, happiness, sadness, and surprise) were used for a long time, but computational, open-ended analyses that take subjectivity into account showed that emotion is high-dimensional, with 25 or more varieties
  • Many emotional responses are not discrete but blend into one another systematically
• RoBERTa is fine-tuned on GoEmotions [Demszky et al., ACL2020], a text dataset built on this theory, and used to evaluate Combined and ArtEmis v1

Slide 28

Slide 28 text

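Further Analysis – Fine-grained Emotion Set
• Emotions are better balanced in Combined (upper figure)
• Looking at the correlations between labels, they are smaller in Combined, i.e. the individual emotions are more independent of one another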

Slide 29

Slide 29 text

Experiments – Methods
• Comparison of neural speakers (affective image captioning models)
• NN: K-nearest neighbors
  • With K=3, one caption is chosen at random from the 3 nearest neighbors
• SAT: Show, Attend and Tell
  • LSTM
• Meshed-Memory Transformers (M²)
• Modified version of the M² Transformer
  • To handle abstract paintings and the like, image patches are used instead of object features
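
The NN baseline is simple enough to sketch directly. The version below is a simplification under stated assumptions: each training item is a single (feature vector, caption) pair, distance is Euclidean, and `nn_speaker` is a hypothetical name; the slide only specifies K=3 and the random choice.

```python
import math
import random


def nn_speaker(query_vec, train_items, k=3, rng=random):
    """Pick one caption at random from the k training items nearest to the query.

    train_items: list of (feature_vector, caption) pairs (a simplification).
    rng: any object with a .choice method, e.g. random.Random(seed).
    """
    ranked = sorted(train_items, key=lambda item: math.dist(query_vec, item[0]))
    candidates = [caption for _, caption in ranked[:k]]
    return rng.choice(candidates)


if __name__ == "__main__":
    train = [
        ([0.0, 0.1], "the soft colors feel calming"),
        ([0.2, 0.0], "the open sky makes me feel free"),
        ([0.1, 0.2], "the dark corner looks threatening"),
        ([9.0, 9.0], "the crowd feels chaotic"),
    ]
    print(nn_speaker([0.0, 0.0], train, k=3, rng=random.Random(0)))
```

Because such a baseline never looks at the caption content, strong scores for it suggest that visually similar paintings carry near-interchangeable captions, which is the symptom of bias discussed for v1.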

Slide 30

Slide 30 text

Experiments – Datasets
• Training sets
  • Contrastive
  • ArtEmis (v1)
  • Combined
    • Contrastive + 260,533 random samples from ArtEmis
    • 65K captions are removed at random so that the size matches the 455K of ArtEmis
• Test sets
  • A subset of Combined (10% of its size)
  • ArtEmisC40
    • Data not included in ArtEmis (v1) but collected in the same way
    • At least 40 affective captions for each of 703 images
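
The construction of the Combined training set can be sketched as below. This is an illustration only: the function name, the seed handling, and the representation of captions as list items are assumptions, and the actual sampling procedure used by the authors is not described on the slide.

```python
import random


def build_combined(contrastive, artemis_v1, target_size, seed=0):
    """Merge Contrastive with an equally sized random subset of v1, then trim.

    contrastive, artemis_v1: lists of caption records.
    target_size: final number of captions (about 455K in the paper's setting).
    """
    rng = random.Random(seed)
    # Same number of v1 samples as there are contrastive samples.
    v1_subset = rng.sample(artemis_v1, len(contrastive))
    pool = list(contrastive) + v1_subset
    # Randomly drop the surplus so that the result has target_size items.
    return rng.sample(pool, min(target_size, len(pool)))


if __name__ == "__main__":
    contrastive = [("contrastive", i) for i in range(10)]
    v1 = [("v1", i) for i in range(30)]
    combined = build_combined(contrastive, v1, target_size=15)
    print(len(combined))
    # Size check with the numbers on the slide:
    print(2 * 260_533 - 65_000)
```

With the slide's numbers, 2 × 260,533 − 65,000 = 456,066, which is consistent with the stated target of roughly 455K captions.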

Slide 31

Slide 31 text

Experiments – Results (Combined test set)
• NN performs worse than it did on v1 → the influence of the bias has decreased
• The relative gain of SAT over NN (METEOR, ROUGE-L) improved from +28%, +29% to +65%, +63%

Slide 32

Slide 32 text

Experiments – Results (different training sets)
• SAT models trained on each dataset are compared on ArtEmisC40
• The model trained on Combined performs best → confirms that adding contrastive data to balance the dataset improves performance

Slide 33

Slide 33 text

Experiments – Results (per emotion analysis)
• Performance improves for every emotion
• The improvement is especially pronounced for emotions that were scarce in ArtEmis (v1)

Slide 34

Slide 34 text

(Figure panels: generation with SAT; generation with SAT conditioned on the input emotion)

Slide 35

Slide 35 text

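Conclusion
• Points out that "ArtEmis", a dataset that attaches explanations containing emotion to paintings, is biased, and proposes an improved v2
• The emotion labels in the v1 dataset were imbalanced; this is corrected through contrastive data collection
• Confirms the usefulness of contrastive data collection
• Future outlook
  • Affective datasets may carry biases beyond the one addressed here (for example, toward ethnic groups or minorities)
  • These could be mitigated with the contrastive data collection approach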

Slide 36

Slide 36 text

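The presenter's thoughts
• On the treatment of "emotion", recent findings from psychology, neuroscience, and cognitive science are referenced
  • A. S. Cowen, L. F. Barrett
• In NLP, particularly in emotion understanding for creativity-related domains, classical theories have often been cited. Interdisciplinary efforts are now beginning to update this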

Slide 37

Slide 37 text

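The presenter's thoughts
• "Emotions are strongly influenced by external stimuli, especially vision and language"
  ↑ Isn't the influence of sound other than language, such as music, also large?
• Some parts felt somewhat hard to read (or is that my language ability?)
  • The name "ArtEmis v2" appears only in the Abstract and Conclusion, with no explanation
    • Elsewhere, Combined and Contrastive are used
  • It does not come across that ArtEmis v1 is work by the same group. Perhaps the anonymization was not fully undone for the camera-ready version?
  • The vertical order of Old (v1) and New (Contrastive) differs from figure to figure
  • In places, the main text and the figures do not correspond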

Slide 38

Slide 38 text

"Bias" in emotion datasets
• For the Story Cloze Test as well, a sentiment bias has been pointed out in v1.0 [Sharma et al., ACL2018]
  • When workers write the "right / wrong" story endings, the right endings skew positive (VADER Sentiment >= 0.05)
• In v1.5, more instructions were given to workers at writing time, correcting the difference in distributions
  • The new restrictions were: 'Each sentence should stay within the same subject area of the story,' and 'The number of words in the Right and Wrong sentences should not differ by more than 2 words,' and 'When possible, the Right and Wrong sentences should try to keep a similar tone/sentiment as one another.'
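
The VADER criterion above can be made concrete with a small sketch. It takes precomputed compound scores as input rather than calling the VADER library, and it assumes the conventional symmetric cut-offs (>= 0.05 positive, <= -0.05 negative, otherwise neutral); the slide itself only gives the positive threshold. The function names are hypothetical.

```python
def sentiment_bucket(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a coarse sentiment class.

    The negative cut-off (-threshold) is the conventional choice and an
    ASSUMPTION here; the slide only states the positive one (>= 0.05).
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def positive_share(compound_scores, threshold=0.05):
    """Fraction of endings whose compound score counts as positive."""
    if not compound_scores:
        raise ValueError("need at least one score")
    positives = sum(1 for s in compound_scores if s >= threshold)
    return positives / len(compound_scores)


if __name__ == "__main__":
    # Made-up scores for illustration only, not data from the paper.
    right_endings = [0.6, 0.1, -0.3, 0.0]
    print([sentiment_bucket(s) for s in right_endings])
    print(positive_share(right_endings))
```

Comparing `positive_share` for the right and the wrong endings is one simple way to see the kind of distribution gap that the v1.5 instructions were meant to reduce.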

Slide 39

Slide 39 text

"Bias" in emotion datasets
• Could deliberately steering the distribution within a dataset itself end up becoming a bias?
• The authors explain that machine learning models cannot understand and make use of bias the way humans can, and argue that bias should be removed
  • Humans are usually capable of recognizing biases when they cause more harm than good. However, machine learning models do not have a similar ability to detect and reason about biases. Therefore, if models learn from a biased dataset, they will make biased decisions. Consequently, reducing biases from datasets is crucial in increasing acceptance and trust in machine learning models. It is equally important to detect biases in datasets, especially in affective datasets, used to train models that emulate human affect or interact directly with them.

Slide 40

Slide 40 text

The paper introduced today
[Project Page]

Slide 41

Slide 41 text

Appendix

Slide 42

Slide 42 text

Crowdsourcing
• How were the workers selected?
  • Explained in Section 5, Dataset Statistics, of the Appendix
  • Good workers are paid a bonus
  • Poor workers are contacted and asked not to take this task again
  • Pay is set on the high side (2.4 dollars / h)

Slide 43

Slide 43 text

Treatment of emotions in (text) datasets (from the GoEmotions paper)
• 2.2 Emotion Taxonomy
• One of the main aspects distinguishing our dataset is its emotion taxonomy. The vast majority of existing datasets contain annotations for minor variations of the 6 basic emotion categories (joy, anger, fear, sadness, disgust, and surprise) proposed by Ekman (1992a) and/or along affective dimensions (valence and arousal) that underpin the circumplex model of affect (Russell, 2003; Buechel and Hahn, 2017).
• Recent advances in psychology have offered new conceptual and methodological approaches to capturing the more complex "semantic space" of emotion (Cowen et al., 2019a) by studying the distribution of emotion responses to a diverse array of stimuli via computational techniques. Studies guided by these principles have identified 27 distinct varieties of emotional experience conveyed by short videos (Cowen and Keltner, 2017), 13 by music (Cowen et al., in press), 28 by facial expression (Cowen and Keltner, 2019), 12 by speech prosody (Cowen et al., 2019b), and 24 by nonverbal vocalization (Cowen et al., 2018).

Slide 44

Slide 44 text

Genres and styles of art, and emotion
• The supplement of the v1 paper analyzes how the genre and style of art relate to emotion
• 27 art-styles (abstract, baroque, cubism, impressionism, etc.)
  • Figure on the right
• 45 genres (cityscape, landscape, portrait, still life, etc.)
  • "landscape" has the strongest annotator agreement and tends to be associated with positive emotions