Slide 1

Neural Networks with Natural Language Explanations
Oana-Maria Camburu, Postdoctoral Researcher, University of Oxford
Talk at the National University of Singapore, Thursday, 28 October 2021

Slide 2

Outline
1. Introduction
2. e-SNLI: Natural Language Inference with Natural Language Explanations (NeurIPS’18)
3. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks (ICCV’21)
4. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations (ACL’20)
5. Summary and Open Questions
6. Q&A

Slide 3

Introduction

Deep neural networks have achieved state-of-the-art results in many areas, but they are still typically black boxes. Even when they reach high performance on test sets, they are notoriously prone to:
● relying on spurious correlations in datasets (Chen et al., 2016; Gururangan et al., 2018; McCoy et al., 2019)
● adversarial attacks (Szegedy et al., 2014; Moosavi-Dezfooli et al., 2017; Jia and Liang, 2017)
● exacerbating discrimination (Bolukbasi et al., 2016; Buolamwini and Gebru, 2018)

Why it matters: debugging and improvement, fairness and accountability, trust, acceptance.

https://www.wired.com/2016/10/understanding-artificial-intelligence-decisions/

D. Chen et al., A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task, ACL, 2016.
T. McCoy et al., Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, ACL, 2019.
S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018.
C. Szegedy et al., Intriguing Properties of Neural Networks, ICLR, 2014.
S. Moosavi-Dezfooli et al., Universal Adversarial Perturbations, CVPR, 2017.
R. Jia and P. Liang, Adversarial Examples for Evaluating Reading Comprehension Systems, EMNLP, 2017.
T. Bolukbasi et al., Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, NeurIPS, 2016.
J. Buolamwini and T. Gebru, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, FAT*, 2018.

Slide 4

Introduction Types of explanations

Slide 5

Introduction: Types of explanations
1. Feature-based
Example: “The plot was not interesting, but the actors were great.”
M. Ribeiro et al., "Why Should I Trust You?": Explaining the Predictions of Any Classifier, KDD, 2016.
S. Lundberg and S. Lee, A Unified Approach to Interpreting Model Predictions, NeurIPS, 2017.
M. Sundararajan et al., Axiomatic Attribution for Deep Networks, ICML, 2017.

Slide 6

Introduction: Types of explanations
1. Feature-based
2. Training-based
[Diagram: training-set instances that influenced the AI's prediction]
P. Koh and P. Liang, Understanding Black-box Predictions via Influence Functions, ICML, 2017.

Slide 7

Introduction: Types of explanations
1. Feature-based
2. Training-based
3. Concept-based
https://medium.com/intuit-engineering/navigating-the-sea-of-explainability-f6cc4631f473
B. Kim et al., Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), ICML, 2018.

Slide 8

Introduction: Types of explanations
1. Feature-based
2. Training-based
3. Concept-based
4. Surrogate models
A. Alaa and M. van der Schaar, Demystifying Black-box Models with Symbolic Metamodels, NeurIPS, 2019.

Slide 9

Introduction: Types of explanations
1. Feature-based
2. Training-based
3. Concept-based
4. Surrogate models
5. Natural language (in this talk)
. . .

Slide 10

Introduction

Q: “Why are you stopping?”
A: “I am stopping because there is a person crossing.”

Models that:
● learn from natural language explanations that justify the ground-truth labels at training time
● generate natural language explanations for their predictions at testing time

Slide 11

Introduction: Motivation
● Humans do not learn just from labeled examples. Heider (1958): people look for explanations to improve their understanding of someone or something so that they can derive a stable model that can be used for prediction and control.
● Human-friendly explanations. Kaur et al. (2020): “data scientists over-trust and misuse interpretability tools” and “few of our participants [197 data scientists] were able to accurately describe the visualizations output by these tools.”
● Explaining already-trained AI systems may help us spot problems, but there is no generic solution for guiding the systems into learning a correct decision-making process.

F. Heider, The Psychology of Interpersonal Relations, New York: Wiley, 1958.
H. Kaur et al., Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning, CHI, 2020.

Slide 12

Introduction: Ingredients
● Natural language explanations (NLEs)
● Models that can learn from natural language explanations and generate such explanations

Slide 13

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
● e-SNLI: one of the first and largest datasets of NLEs
● Two types of architectures for models with NLEs
● A glimpse into spurious correlations and NLEs

Slide 14

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. SNLI (Bowman et al., 2015) S. Bowman et al., A large annotated corpus for learning natural language inference, EMNLP, 2015.

Slide 15

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. e-SNLI

Slide 16

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

e-SNLI
● Train (~550k): 1 explanation per instance
● Dev and Test (~10k each): 3 explanations per instance
● Quality control
○ require annotators to highlight salient tokens and use them in the explanation
○ several in-browser checks and re-annotation of trivial explanations

Premise: A man in a blue shirt standing in front of a garage-like structure painted with geometric designs.
Hypothesis: A man is repainting a garage.
Label: Neutral
Explanation: It is not clear whether the man is repainting the garage or not.

Premise: A black race car starts up in front of a crowd of people.
Hypothesis: A man is driving down a lonely road.
Label: Contradiction
Explanation: A road can’t be lonely if there is a crowd of people.

Premise: Two women are embracing while holding to go packages.
Hypothesis: Two women are holding food in their hands.
Label: Entailment
Explanation: Holding to go packages implies that there is food in it.
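The released CSVs can be loaded directly. A minimal sketch, assuming the file and column names of the public repository (the training set ships as two files; names may differ across versions):

```python
# Minimal e-SNLI loading sketch; file/column names assumed from the public repo.
import pandas as pd

train = pd.concat([pd.read_csv("esnli_train_1.csv"),
                   pd.read_csv("esnli_train_2.csv")], ignore_index=True)
dev = pd.read_csv("esnli_dev.csv")  # dev/test also carry Explanation_2/_3

for _, row in train.head(3).iterrows():
    print(row["Sentence1"], "|", row["Sentence2"],
          "->", row["gold_label"], ":", row["Explanation_1"])
```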

Slide 17

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Models: Typical SNLI architecture
[Diagram: the premise and hypothesis are each passed through a sentence encoder, giving u and v; the feature vector (u, v, |u - v|, u * v) is fed to fully-connected layers that output the label.]
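For concreteness, a minimal PyTorch sketch of this baseline; the hyper-parameters are illustrative, not the paper's exact ones:

```python
# Sketch of the no-explanation baseline: BiLSTM-Max sentence encoders and an
# MLP over (u, v, |u - v|, u * v). Dimensions are illustrative.
import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=2048):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        h, _ = self.lstm(self.emb(tokens))   # (batch, seq_len, 2 * hid_dim)
        return h.max(dim=1).values           # max-pool over time

class NLIClassifier(nn.Module):
    def __init__(self, encoder, sent_dim=4096, n_labels=3):
        super().__init__()
        self.encoder = encoder
        self.mlp = nn.Sequential(nn.Linear(4 * sent_dim, 512), nn.Tanh(),
                                 nn.Linear(512, n_labels))

    def forward(self, premise, hypothesis):
        u, v = self.encoder(premise), self.encoder(hypothesis)
        feats = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
        return self.mlp(feats)               # label logits
```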

Slide 18

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Models: Predict-then-Explain
[Diagram: the same architecture as above; the label is predicted first from (u, v, |u - v|, u * v).]

Slide 19

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Models: Predict-then-Explain
[Diagram: the label is predicted from (u, v, |u - v|, u * v) as before; an explanation generator then produces the explanation from the same feature vector.]

Slide 20

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Models: Explain-then-Predict
[Diagram: the explanation generator first produces the explanation from (u, v, |u - v|, u * v); the explanation is then re-encoded by a sentence encoder, and the label is predicted from the encoded explanation alone.]

Slide 21

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Models
Sentence Encoder = BiLSTM-Max
Explanation Generator = LSTM, or LSTM with attention
[Diagram: the three variants side by side — No-Expl (label only), Predict-then-Explain, and Explain-then-Predict; a forward-pass sketch follows below.]
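A sketch of how the two wirings differ at the forward-pass level, reusing the encoder/classifier shapes from the sketch above; `dec` is a hypothetical stand-in for the LSTM explanation generator:

```python
# Sketch of the two wirings; enc = sentence encoder, clf/expl_clf = MLP heads,
# dec = explanation generator (stand-ins, names illustrative).
import torch

def predict_then_explain(enc, clf, dec, premise, hypothesis):
    u, v = enc(premise), enc(hypothesis)
    feats = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
    label_logits = clf(feats)      # 1) predict the label from the features
    explanation = dec(feats)       # 2) generate the explanation from them
    return label_logits, explanation

def explain_then_predict(enc, expl_clf, dec, premise, hypothesis):
    u, v = enc(premise), enc(hypothesis)
    feats = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
    explanation = dec(feats)                    # 1) generate the explanation
    label_logits = expl_clf(enc(explanation))   # 2) predict from it alone
    return label_logits, explanation
```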

Slide 22

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Results
Inter-annotator BLEU: 22.51
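As a reference point for such numbers, a sketch of an inter-annotator BLEU computation with NLTK, scoring each of the three explanations against the other two; the paper's exact tokenization and averaging may differ:

```python
# Sketch of inter-annotator BLEU over pre-tokenized explanation triples.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def inter_annotator_bleu(expl_triples):   # [[e1, e2, e3], ...], tokenized
    refs, hyps = [], []
    for e1, e2, e3 in expl_triples:
        for hyp, rest in [(e1, [e2, e3]), (e2, [e1, e3]), (e3, [e1, e2])]:
            hyps.append(hyp)
            refs.append(rest)             # score each against the other two
    return 100 * corpus_bleu(refs, hyps,
                             smoothing_function=SmoothingFunction().method1)

triples = [[["a", "dog", "is", "an", "animal"],
            ["dogs", "are", "animals"],
            ["a", "dog", "is", "a", "type", "of", "animal"]]]
print(inter_annotator_bleu(triples))
```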

Slide 23

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Results

Slide 24

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Spurious correlations
SNLI is notorious for spurious correlations: a hypothesis-only model predicts the label with 67% accuracy (Gururangan et al., 2018).
○ “tall”, “sad” → neutral
○ “animal”, “outside” → entailment
○ “sleeping”, negations → contradiction
[Diagram: the label classifier still reaches 67% accuracy with the premise removed.]
S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018.

Slide 25

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Spurious correlations
SNLI is notorious for spurious correlations: a hypothesis-only model predicts the label with 67% accuracy (Gururangan et al., 2018).
Can explanations rely on the same spurious correlations?
[Diagram: an explanation generator that sees only the hypothesis, with the premise removed.]
S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018.

Slide 26

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.

Spurious correlations
Can explanations rely on the same spurious correlations? Far less: while a hypothesis-only classifier reaches 67% label accuracy, a hypothesis-only explanation generator produces correct explanations only ~6% of the time (see the probe sketch below).
[Diagram: an explanation generator that sees only the hypothesis; 6% correct explanations.]
S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018.
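A sketch of the kind of hypothesis-only probe behind such numbers, in the spirit of Gururangan et al. (2018); this bag-of-words version is illustrative, not the exact model behind the 67%/6% figures:

```python
# Illustrative hypothesis-only probe: a bag-of-words classifier that never
# sees the premise. Accuracy well above the ~33% majority baseline signals
# label-revealing artifacts in the hypotheses.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def hypothesis_only_accuracy(train_hyps, train_labels, test_hyps, test_labels):
    probe = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    probe.fit(train_hyps, train_labels)
    return probe.score(test_hyps, test_labels)
```

Measuring the analogous number for explanations requires generating them from the hypothesis alone and judging their correctness manually, as done in the paper.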

Slide 27

e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Dataset and Code are available at https://github.com/OanaMariaCamburu/e-SNLI

Slide 28

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
🗃 e-SNLI-VE: the largest vision-language dataset with NLEs
📏 e-ViL: the first benchmark for vision-language tasks with NLEs
⚖ Evaluation of automatic metrics for NLEs
🏅 e-UG: state-of-the-art across 3 datasets

Slide 29

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

SNLI example
Premise: A man and woman getting married.
Hypothesis: A man and a woman inside a church.
Label: Neutral
Flickr30k caption: A man and woman getting married. (SNLI premises are Flickr30k captions, so each premise has a corresponding image; Xie et al., 2019.)
Xie et al., Visual Entailment: A Novel Task for Fine-Grained Image Understanding, arXiv, 2019.

Slide 30

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

SNLI-VE (Xie et al., 2019): the textual premise is replaced by the corresponding Flickr30k image.
Premise: [image] / Hypothesis: A man is repainting a garage. / Label: Neutral
Premise: [image] / Hypothesis: A man is driving down a lonely road. / Label: Contradiction
Premise: [image] / Hypothesis: Two women are holding food in their hands. / Label: Entailment
Xie et al., Visual Entailment: A Novel Task for Fine-Grained Image Understanding, arXiv, 2019.

Slide 31

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

e-SNLI-VE = SNLI-VE + e-SNLI + corrections
Premise: [image] / Hypothesis: A man is driving down a lonely road. / Label: Contradiction / Explanation: A road can’t be lonely if there is a crowd of people.
Premise: [image] / Hypothesis: Two women are holding food in their hands. / Label: Entailment / Explanation: Holding to go packages implies that there is food in it.
Premise: [image] / Hypothesis: A man is repainting a garage. / Label: Neutral → Contradiction (corrected) / Explanation: The man is just staying in front of the garage with no signs of repairing being done.

Slide 32

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

e-SNLI-VE = SNLI-VE + e-SNLI + corrections
Manual re-annotation of the neutrals in the dev and test sets, guided by a false-neutral tagger (keyword filters + a similarity filter).

Slide 33

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

e-SNLI-VE = SNLI-VE + e-SNLI + corrections (false-neutral tagger: keyword filters, similarity filter; manual re-annotation of neutrals in dev and test sets)
Premise: [image]
Hypothesis: A man and women inside a church.
Original label: Neutral
Caption 2/5: A man and a woman that is holding flowers smile in the sunlight.
Caption 4/5: A happy couple enjoying their open air wedding.

Slide 34

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

e-SNLI-VE = SNLI-VE + e-SNLI + corrections (false-neutral tagger: keyword filters, similarity filter; manual re-annotation of neutrals in dev and test sets)
Premise: [image]
Hypothesis: There is a person in the store.
Original label: Entailment
Explanation: It is already mentioned that someone is in the store.

Slide 35

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

e-SNLI-VE = SNLI-VE + e-SNLI + corrections (false-neutral tagger: keyword filters, similarity filter; manual re-annotation of neutrals in dev and test sets)
Premise: [image]
Hypothesis: A woman is painting a mural while another woman supervises.
Original label: Entailment
Explanation: A woman is painting a mural on the wall and there is another woman who supervises.
Textual premise: A woman painting a mural on the wall while another woman supervises.

Slide 36

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz e-SNLI-VE = SNLI-VE + e-SNLI + Corrections

Slide 37

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

📏 How do we evaluate NLEs?
● Automatic metrics?
● How many annotators? How many samples? What kind of annotators?
● Rating scheme: correct/incorrect; scale from 1 to 5; better/same/worse than ground truth; …
❌ Lack of a unified evaluation framework
Example — Q: Is the woman happy? Answer: Yes. Predicted NLE: She is throwing her hands in the air in celebration. Ground-truth NLE: She has a big smile on her face.

Slide 38

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

📏 e-ViL: The Benchmark
● A re-usable framework for evaluating NLEs
○ based on human evaluation
○ 300 samples per model-dataset pair
○ 3 annotators per example
○ ground-truth explanations are evaluated alongside the predicted ones
○ main question: “Given the image and the hypothesis/question, does the explanation justify the answer?”
○ answers: No / Weak No / Weak Yes / Yes
● Used to compare four models on three datasets (see the scoring sketch below)
○ datasets: e-SNLI-VE, VCR, VQA-X
○ 19,194 evaluations from 234 human participants
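A sketch of how these ratings can be aggregated, assuming a mapping No=0, Weak No=1/3, Weak Yes=2/3, Yes=1 and an overall score S_O = S_T * S_E (task score times explanation score); the paper's exact aggregation details may differ:

```python
# Sketch of e-ViL-style score aggregation under the assumed rating mapping.
RATING = {"no": 0.0, "weak no": 1 / 3, "weak yes": 2 / 3, "yes": 1.0}

def explanation_score(ratings_per_example):
    # ratings_per_example: one list of annotator ratings per evaluated example
    per_example = [sum(RATING[r] for r in rs) / len(rs)
                   for rs in ratings_per_example]
    return sum(per_example) / len(per_example)          # S_E

def evil_scores(task_accuracy, ratings_per_example):
    s_e = explanation_score(ratings_per_example)
    return {"S_T": task_accuracy, "S_E": s_e, "S_O": task_accuracy * s_e}

print(evil_scores(0.80, [["yes", "weak yes", "yes"],
                         ["no", "weak no", "weak no"]]))
```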

Slide 39

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

📏 e-ViL: The Datasets — e-SNLI-VE, VCR (Zellers et al., 2019), VQA-X (Park et al., 2018)
Example (e-SNLI-VE) — Premise: [image] / Hypothesis: The man and woman are about to go on a honeymoon. / Label: Neutral / Explanation: Not all couples go on a honeymoon right after getting married.
Park et al., Multimodal Explanations: Justifying Decisions and Pointing to the Evidence, CVPR, 2018.
Zellers et al., From Recognition to Cognition: Visual Commonsense Reasoning, CVPR, 2019.

Slide 40

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

📏 e-ViL: The Models
● PJ-X — Park et al., Multimodal Explanations: Justifying Decisions and Pointing to the Evidence, CVPR, 2018.
● FME — Wu and Mooney, Faithful Multimodal Explanation for Visual Question Answering, BlackboxNLP, 2019.
● RVT — Marasović et al., Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs, EMNLP Findings, 2020.
● e-UG (ours)

Slide 41

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

🏅 e-UG
[Diagram: UNITER produces contextualized embeddings of the image and question; a prediction head outputs the answer, and a language model generates the explanation conditioned on those embeddings.]
Chen et al., UNITER: Universal Image-Text Representation Learning, ECCV, 2020.
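e-UG couples a UNITER encoder with a GPT-2 decoder. A minimal sketch of the conditioning idea using Hugging Face transformers, where `uniter_embeds` is a stand-in for the real encoder output and the wiring is simplified:

```python
# Sketch: prepend a prefix of multimodal embeddings to GPT-2's token
# embeddings; the LM loss is computed on the explanation tokens only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

def nle_loss(uniter_embeds, explanation):
    # uniter_embeds: (1, k, 768) contextualized image+question embeddings
    ids = tok(explanation, return_tensors="pt").input_ids          # (1, n)
    expl_embeds = gpt2.transformer.wte(ids)                        # (1, n, 768)
    inputs = torch.cat([uniter_embeds, expl_embeds], dim=1)
    prefix_mask = torch.full(uniter_embeds.shape[:2], -100, dtype=torch.long)
    labels = torch.cat([prefix_mask, ids], dim=1)  # -100 = ignored positions
    return gpt2(inputs_embeds=inputs, labels=labels).loss

loss = nle_loss(torch.randn(1, 10, 768), "She has a big smile on her face.")
```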

Slide 42

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results

Slide 43

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results

Slide 44

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results

Slide 45

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results

Slide 46

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

Results: Can explanations increase task performance?
[Diagram: a VL model whose multi-modal feature vector both predicts the task and feeds an explanation module, with the explanation loss backpropagating into the features, vs. the same VL model trained on the task alone from the image and question.]

Slide 47

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz

Results: ⚖ Automatic metrics
● overall small correlation with human judgment
● in some cases, no significant correlation
● METEOR and BERTScore are best overall
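A sketch of how such a correlation can be measured, assuming per-example automatic-metric scores and human explanation scores are available (Spearman's rank correlation; the paper may report additional statistics):

```python
# Sketch: rank correlation between an automatic metric and human scores.
from scipy.stats import spearmanr

def metric_human_correlation(metric_scores, human_scores):
    rho, p = spearmanr(metric_scores, human_scores)
    return rho, p     # low rho => the metric poorly tracks human judgment

print(metric_human_correlation([0.2, 0.5, 0.9, 0.4], [0.0, 1/3, 1.0, 2/3]))
```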

Slide 48

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Dataset, Code, Evaluation Framework available at https://github.com/maximek3/e-ViL

Slide 49

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
● Models may generate inconsistent NLEs.
● An adversarial attack for detecting the generation of inconsistent NLEs (a novel seq2seq attack scenario).

Slide 50

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Models may generate inconsistent NLEs. Definition: A pair of instances for which a model generates two logically contradictory explanations forms an inconsistency.

Slide 51

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Examples of domains where inconsistencies matter: self-driving cars, question answering, visual question answering, recommender systems.

Slide 52

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

A model providing inconsistent explanations exhibits at least one of two undesired behaviours:
a) at least one of the explanations does not faithfully describe the decision-making process of the model, or
b) the model relied on a faulty decision-making process for at least one of the instances.
Q: Is there an animal in the image? A: Yes, because dogs are animals.
Q’: Is there a Husky in the image? A’: No, because dogs are not animals.
If both explanations are faithful to the decision-making process of the model (i.e., if (a) does not hold), then for the second instance (A’) the model relied on the faulty decision-making process that dogs are not animals.

Slide 53

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Goal: check whether models are robust against generating inconsistent natural language explanations.
Setup: a model m provides a prediction and a natural language explanation, e_m(x), for its prediction on the instance x. Find an instance x' such that e_m(x) and e_m(x') are inconsistent.
High-level approach:
(A) For an instance x and the explanation e_m(x), create a list of explanations that are inconsistent with e_m(x).
(B) For an inconsistent explanation i_e created at step (A), find an input x' such that e_m(x') = i_e.

Slide 54

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Context-free vs. context-dependent inconsistencies
Context-free: inconsistent no matter what the input is, e.g., explanations formed purely of background knowledge.
Q: Is there an animal in the image? A: Yes, because dogs are animals.
Q’: Is there a Husky in the image? A’: No, because dogs are not animals.
→ Inconsistent

Slide 55

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Context-free vs. context-dependent inconsistencies
Context-free: inconsistent no matter what the input is, e.g., explanations formed purely of background knowledge.
Q: Is there an animal in the image? A: Yes, because dogs are animals.
Q’: Is there a Husky in the image? A’: No, because dogs are not animals.
→ Inconsistent
Context-dependent: the inconsistency depends on parts of the input (here, the context is the image).
Q: Is there an animal in the image? A: Yes, there is a dog in the image.
Q’: Is there a Husky in the image? A’: No, there is no dog in the image.
→ Inconsistent (when asked about the same image)

Slide 56

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Context-free vs. context-dependent inconsistencies
Context-free (inconsistent regardless of input):
Q: Is there an animal in the image? A: Yes, because dogs are animals.
Q’: Is there a Husky in the image? A’: No, because dogs are not animals.
→ Inconsistent
Context-dependent:
Q: Is there an animal in the image? A: Yes, there is a dog in the image.
Q’: Is there a Husky in the image? A’: No, there is no dog in the image.
→ NOT inconsistent if the two questions are asked about different images.

Slide 57

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

High-level approach
(A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
(B) For an inconsistent statement i_e created at step (A), find the variable part x'_v of an input x' = (x_c, x'_v) such that e_m(x') = i_e. Here x_c is the part of the input kept fixed (e.g., the image) and x_v the variable part (e.g., the question).
Example — x: “Q: Is there an animal in the image?”; e_m(x): “Yes, because dogs are animals.”
(A) List of statements inconsistent with the explanation “dogs are animals”: “Dogs are not animals.”; “Not all dogs are animals.”; “A dog is not an animal.”; …
(B) Search for a question x'_v (e.g., “Q’: Is there a Husky in the image?”) that leads the model to generate i_e: “A’: ..., because dogs are not animals.”

Slide 58

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

High-level approach
(A) For an instance x (“Q: Is there an animal in the image?”) and the explanation e_m(x) (“Yes, because dogs are animals.”), create a list of statements inconsistent with e_m(x): “Dogs are not animals.”; “Not all dogs are animals.”; “A dog is not an animal.”; …
(B) For an inconsistent statement i_e created at step (A), find the variable part x'_v of an input x' such that e_m(x') = i_e — but how do we find such an x'_v?

Slide 59

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

High-level approach
(A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x). For a given task, one may define a set of logical rules to transform an explanation into an inconsistent counterpart (see the sketch below):
1. Negation: “A dog is an animal.” → “A dog is not an animal.”
2. Task-specific antonyms: “The car continues because it is green light.” → “The car continues because it is red light.”
3. Swap explanations of mutually exclusive labels:
Recommender(movie X, user U) = No, because “X is a horror.” / Recommender(movie Z, user U) = No, because “Z is a comedy.”
→ Recommender(movie Y, user U) = Yes, because “Y is a comedy.” / Recommender(movie K, user U) = Yes, because “K is a horror.”
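A minimal sketch of rules (1) and (2); the rule lists here are tiny illustrations, not the ones used in the paper:

```python
# Sketch: generate inconsistent counterparts by negation or antonym swap.
import re

ANTONYMS = {"green light": "red light"}          # task-specific pairs (toy)

def inconsistent_candidates(explanation):
    cands = []
    if " not " in f" {explanation} ":            # rule 1: drop a negation...
        cands.append(explanation.replace(" not ", " ", 1))
    else:                                        # ...or insert one
        cands.append(re.sub(r"\b(is|are)\b", r"\1 not", explanation, count=1))
    for a, b in ANTONYMS.items():                # rule 2: antonym swap
        if a in explanation:
            cands.append(explanation.replace(a, b))
    return cands

print(inconsistent_candidates("A dog is an animal."))
print(inconsistent_candidates("The car continues because it is green light."))
```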

Slide 60

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

High-level approach
(A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
(B) For an inconsistent statement i_e created at step (A), find the variable part x'_v of an input x' such that e_m(x') = i_e.

Slide 61

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

High-level approach
(B) Train a model, RevExpl, to go from an explanation e_m(x) (together with the fixed part x_c) back to the variable part of the input that caused m to generate that explanation:
m(x) = (pred(x), e_m(x))
RevExpl(x_c, e_m(x)) = x_v
[Diagram: m maps “Is there an animal in the image?” to “Yes, because dogs are animals.”; RevExpl maps “Dogs are animals.” back to “Is there an animal in the image?”]

Slide 62

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Approach (a code sketch of this loop follows below)
I. Train RevExpl(x_c, e_m(x)) = x_v.
II. For each explanation e = e_m(x):
a) Create the list I_e of statements inconsistent with e, using the logic rules: negation, task-specific antonyms, and swapping explanations of mutually exclusive labels.
b) For each e' in I_e, query RevExpl to get the variable part of a reverse input: x'_v = RevExpl(x_c, e').
c) Query m on the reverse input x' = (x_c, x'_v) and get the reverse explanation e_m(x').
d) Check whether e_m(x') is inconsistent with e_m(x), i.e., whether e_m(x') is in I_e.
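Putting the pieces together, a sketch of the attack loop; `model`, `rev_expl`, and `inconsistent_candidates` are stand-ins for the trained components and the rule set:

```python
# Sketch of the full attack loop over a dataset of (context, variable) inputs.
def find_inconsistencies(model, rev_expl, dataset):
    found = []
    for x_context, x_variable in dataset:            # e.g. (premise, hypothesis)
        _, e = model(x_context, x_variable)
        I_e = inconsistent_candidates(e)             # step II.a
        for e_prime in I_e:
            x_var_adv = rev_expl(x_context, e_prime) # step II.b: invert
            _, e_adv = model(x_context, x_var_adv)   # step II.c: re-query m
            if e_adv in I_e:                         # step II.d: check
                found.append((x_variable, e, x_var_adv, e_adv))
    return found
```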

Slide 63

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

High-level approach
(A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
(B) For an inconsistent statement i_e created at step (A), find an input x' such that e_m(x') = i_e.
Novel adversarial setup:
1) No predefined adversarial targets (label attacks do not have this issue).
2) At step (B), the model has to generate a full target sequence: the goal is to generate the exact explanation that was identified at step (A) as inconsistent with e_m(x). Existing attacks focus on the presence or absence of a very small number of tokens in the target sequence (Cheng et al., 2020; Zhao et al., 2018).
3) Adversarial inputs x' do not have to be paraphrases or small perturbations of the original input (though this can happen as a byproduct). Existing works focus on adversaries that are paraphrases or minor deviations from the original input (Belinkov and Bisk, 2018).

Slide 64

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

e-SNLI: x = (premise, hypothesis) = (x_c, x_v); we revert only the hypothesis.
To create the list of inconsistent explanations for any generated explanation, we use:
● negation: if the explanation contains “not” or “n’t”, we delete it;
● swapping explanations (the 3 labels are mutually exclusive), by identifying templates for each label:
Entailment: X is a type of Y; X implies Y; X is the same as Y; X is a rephrasing of Y; X is synonymous with Y; …
Neutral: not all X are Y; not every X is Y; just because X does not mean Y; X is not necessarily Y; X does not imply Y; …
Contradiction: cannot be X and Y at the same time; X is not Y; X is the opposite of Y; it is either X or Y; …
If e_m(x) neither contains a negation nor fits any template, we discard it (2.6% of the e-SNLI test set was discarded).

Slide 65

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

If e_m(x) matches a template of one label, create the list of inconsistent statements I_e by substituting the matched X and Y into the templates of the other two labels (see the sketch below).
Example: e_m(x) = “Dog is a type of animal.” matches the entailment template “X is a type of Y” with X = “dog” and Y = “animal”. Substituting X and Y into all the neutral and contradiction templates, we obtain the list of inconsistencies:
Neutral: not all dog are animal; not every dog is animal; just because dog does not mean animal; dog is not necessarily animal; dog does not imply animal; …
Contradiction: cannot be dog and animal at the same time; dog is not animal; dog is the opposite of animal; it is either dog or animal; …
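A sketch of this template machinery with abbreviated template lists; the paper's full lists and matching details differ:

```python
# Sketch: match an explanation against one label's templates to extract X and
# Y, then instantiate the other labels' templates as inconsistent statements.
import re

TEMPLATES = {
    "entailment":    ["{X} is a type of {Y}", "{X} implies {Y}"],
    "neutral":       ["not all {X} are {Y}", "{X} is not necessarily {Y}"],
    "contradiction": ["{X} is not {Y}", "it is either {X} or {Y}"],
}

def match(explanation, template):
    pattern = re.escape(template).replace(r"\{X\}", "(.+)").replace(r"\{Y\}", "(.+)")
    m = re.fullmatch(pattern, explanation.rstrip(". ").lower())
    return m.groups() if m else None

def inconsistent_statements(explanation):
    for label, temps in TEMPLATES.items():
        for t in temps:
            xy = match(explanation, t)
            if xy:
                return [other.format(X=xy[0], Y=xy[1])
                        for l, ts in TEMPLATES.items() if l != label
                        for other in ts]
    return []   # no template matched: discard the instance

print(inconsistent_statements("Dog is a type of animal."))
```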

Slide 66

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Best model from before: Expl-Pred-Att (64.27% correct explanations).
● RevExpl(premise, explanation) = hypothesis
○ same architecture as Expl-Pred-Att
○ 32.78% test accuracy (exact string match on the generated hypothesis)
● Manual annotation of 100 random reverse hypotheses finds 82% to be realistic
○ most unrealistic ones are due to a repeated token
● Success rate of our adversarial method in finding inconsistencies: 4.51% on the e-SNLI test set
○ 443 distinct pairs of inconsistent explanations

Slide 67

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Slide 68

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.

Manual scanning had no success and even pointed to robust explanations:
● first 50 instances of the test set
● explanations containing “woman”, “prisoner”, “snowboarding”
● manually created adversarial inputs (Carmona et al., 2018)
P: A bird is above water. H: A swan is above water. E: Not all birds are a swan.
P: A swan is above water. H: A bird is above water. E: A swan is a bird.
P: A small child watches the outside world through a window. H: A small toddler watches the outside world through a window. E: Not every child is a toddler.
P: A small toddler watches the outside world through a window. H: A small child watches the outside world through a window. E: A toddler is a small child.
V. Carmona et al., Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness, NAACL, 2018.

Slide 69

Summary
● e-SNLI and e-SNLI-VE: two large datasets of NLEs
● Models with NLEs
● A glimpse into spurious correlations and NLEs
● 📏 A benchmark for vision-language tasks with NLEs
● ⚖ Evaluation of automatic metrics for NLEs
● Inconsistencies of NLEs: an adversarial attack for detecting the generation of inconsistent NLEs (a novel seq2seq scenario)

Slide 70

Open Questions
● Faithfulness
● Explanations to increase task performance
● Zero/few-shot learning
● Automatic evaluation
● Usefulness of NLEs in increasing public trust and acceptance

Slide 71

Thank you! @oanacamb Questions?