
Neural Networks with Natural Language Explanations

wing.nus
November 04, 2021


In order for machine learning to garner widespread public adoption, models must be able to provide human-understandable and robust explanations for their decisions. In this talk, we focus on the emerging direction of building neural networks that learn from natural language explanations at training time and generate such explanations at testing time. We present e-SNLI, an extension of the large Stanford Natural Language Inference (SNLI) dataset with an additional layer of human-written natural language explanations for the entailment relations. We look at different types of architectures that incorporate these explanations into their training process and generate them at testing time. We then turn to a similar approach for vision-language models, introducing e-SNLI-VE, a large dataset of visual-textual entailment with natural language explanations. We also present e-ViL, a benchmark for natural language explanations in vision-language tasks, and e-UG, the current SOTA model for natural language explanation generation on such tasks. These large datasets of explanations open up a range of research directions for using natural language explanations both for improving models and for establishing trust in them. However, models trained on such datasets may nonetheless generate inconsistent explanations. An adversarial framework for sanity-checking models against generating such inconsistencies will be presented.

Seminar page: https://wing-nus.github.io/ir-seminar/speaker-oana
YouTube Video recording: https://www.youtube.com/watch?v=-bopzFou7jQ



Transcript

  1. Neural Networks with Natural Language Explanations Oana-Maria Camburu Postdoctoral Researcher

    University of Oxford Talk at National University of Singapore, Thursday 28th of October 2021
  2. Outline 1. Introduction 2. e-SNLI: Natural Language Inference with Natural

    Language Explanations (NeurIPS’18) 3. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks (ICCV’21) 4. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations (ACL’20) 5. Summary and Open Questions 6. Q&A
  3. Introduction Deep neural networks have been responsible for SOTA in

    many areas, but are still typically black-boxes. Even when they have high performance on test sets, they are notoriously prone to • relying on spurious correlations in datasets (Chen et al., 2016; Gururangan et al., 2018; McCoy et al., 2019) • adversarial attacks (Szegedy et al., 2014; Moosavi-Dezfooli et al., 2017; Jia and Liang, 2017) • exacerbating discrimination (Bolukbasi et al., 2016; Buolamwini and Gebru, 2018) https://www.wired.com/2016/10/understanding-artificial-intelligence-decisions/ D. Chen et al., A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task, ACL, 2016. T. McCoy et al., Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, ACL, 2019. S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018. C. Szegedy et al., Intriguing Properties of Neural Networks, ICLR, 2014. S. Moosavi-Dezfooli et al., Universal Adversarial Perturbations, CVPR, 2017. R. Jia and P. Liang, Adversarial Examples for Evaluating Reading Comprehension Systems, EMNLP, 2017. T. Bolukbasi et al., Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, NeurIPS, 2016. J. Buolamwini and T. Gebru, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, FAT, 2018. Debugging and Improvement Fairness and Accountability Trust Acceptance
  4. Introduction Types of explanations

  5. Introduction Types of explanations 1. Feature-based “The plot was not

    interesting, but the actors were great.” M. Ribeiro et al., "Why Should I Trust You?": Explaining the Predictions of Any Classifier, KDD, 2016. S. Lundberg and S. Lee, A Unified Approach to Interpreting Model Predictions, NeurIPS, 2017. M. Sundararajan, Axiomatic Attribution for Deep Networks, ICML, 2017.
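Feature-based explanations assign importance scores to parts of the input, such as the tokens of the review on this slide. As an illustration only (not LIME or SHAP themselves), here is a minimal occlusion-style sketch, with a toy lexicon scorer standing in for a trained classifier:

```python
# Occlusion-style feature attribution: score each token by how much the
# model output changes when that token is removed. `sentiment_score` is a
# toy lexicon stand-in for a real classifier, used only for illustration.

def sentiment_score(tokens):
    positive, negative = {"great", "interesting"}, {"not", "boring"}
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

def occlusion_attributions(tokens):
    """Importance of token i = score(full input) - score(input without token i)."""
    full = sentiment_score(tokens)
    return [(t, full - sentiment_score(tokens[:i] + tokens[i + 1:]))
            for i, t in enumerate(tokens)]

tokens = "the plot was not interesting but the actors were great".split()
attributions = occlusion_attributions(tokens)
# "not" gets a negative score (removing it raises the prediction);
# "interesting" and "great" get positive scores.
```

On the review from the slide, this highlights "great" and "not" as the tokens driving the prediction, which is the kind of token-level highlight that LIME-style tools produce with more principled sampling.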
  6. Introduction Types of explanations 1. Feature-based 2. Training-based Training set

    AI prediction P. Koh and P. Liang, Understanding Black-box Predictions via Influence Functions, ICML, 2017.
  7. Introduction Types of explanations 1. Feature-based 2. Training-based 3. Concept-based

    https://medium.com/intuit-engineering/navigating-the-sea-of-explainability-f6cc4631f473 B. Kim et al., Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), ICML, 2018
  8. Introduction Types of explanations 1. Feature-based 2. Training-based 3. Concept-based

    4. Surrogate models A. Alaa and M. van der Schaar, Demystifying Black-box Models with Symbolic Metamodels, NeurIPS, 2019
  9. Introduction Types of explanations 1. Feature-based 2. Training-based 3. Concept-based

    4. Surrogate models 5. Natural language (in this talk) . . .
  10. Introduction I am stopping because there is a person crossing.

    Models that • learn from natural language explanations that justify the ground-truth labels at training time • generate natural language explanations for their predictions at testing time Why are you stopping?
  11. Introduction Motivation • Humans do not learn just from labeled

    examples. Heider (1958): people look for explanations to improve their understanding of someone or something so that they can derive a stable model that can be used for prediction and control. • Human-friendly explanations. Kaur et al. (2020): “data scientists over-trust and misuse interpretability tools” and “few of our participants [197 data scientists] were able to accurately describe the visualizations output by these tools.” F. Heider, The Psychology of Interpersonal Relations, New York: Wiley, 1958. H. Kaur et al., Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning, CHI, 2020. Explaining already trained AI systems may help us spot problems, but there is no generic solution to guide the systems into learning correct decision-making processes.
  12. Introduction Ingredients Natural language explanations (NLEs) Models that can learn

    from natural language explanations and generate such explanations
  13. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. e-SNLI: one of the first and largest datasets of NLEs Two types of architectures for models with NLEs A glimpse into spurious correlations and NLEs
  14. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. SNLI (Bowman et al., 2015) S. Bowman et al., A large annotated corpus for learning natural language inference, EMNLP, 2015.
  15. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. e-SNLI
  16. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. e-SNLI • Train (~550k): 1 explanation per instance • Dev and Test (~10k): 3 explanations per instance • Quality control ◦ require annotators to highlight salient tokens and use them in the explanation ◦ several in-browser checks and re-annotation of trivial explanations Premise: A man in a blue shirt standing in front of a garage-like structure painted with geometric designs. Hypothesis: A man is repainting a garage Label: Neutral Explanation: It is not clear whether the man is repainting the garage or not. Premise: A black race car starts up in front of a crowd of people. Hypothesis: A man is driving down a lonely road. Label: Contradiction Explanation: A road can’t be lonely if there is a crowd of people. Premise: Two women are embracing while holding to go packages. Hypothesis: Two women are holding food in their hands. Label: Entailment Explanation: Holding to go packages implies that there is food in it.
  17. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Models Typical SNLI architecture Sentence Encoder Sentence Encoder Premise Hypothesis u v Fully-Connected Layers Label (u, v, |u - v|, u * v)
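The (u, v, |u - v|, u * v) combination on this slide can be written down directly. A minimal NumPy sketch of the classifier head (random weights, purely illustrative; the real model trains the encoder and fully connected layers end to end):

```python
import numpy as np

def combine_features(u, v):
    """InferSent-style combination (u, v, |u - v|, u * v) fed to the classifier."""
    return np.concatenate([u, v, np.abs(u - v), u * v])

rng = np.random.default_rng(0)
dim = 4  # toy embedding size; real sentence embeddings are much larger
u, v = rng.normal(size=dim), rng.normal(size=dim)
features = combine_features(u, v)  # shape: (4 * dim,)

# A toy fully-connected head mapping the combined features to the 3 NLI labels.
W = rng.normal(size=(3, 4 * dim))
logits = W @ features
label = ["entailment", "neutral", "contradiction"][int(np.argmax(logits))]
```

The |u - v| and u * v terms give the classifier explicit difference and agreement signals between premise and hypothesis, which is why this combination works better than concatenation alone.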
  18. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Models Predict-then-Explain Sentence Encoder Sentence Encoder Premise Hypothesis u v (u, v, |u - v|, u * v) Fully-Connected Layers Label
  19. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Models Predict-then-Explain Sentence Encoder Sentence Encoder Premise Hypothesis u v (u, v, |u - v|, u * v) Fully-Connected Layers Label Explanation Generator Explanation
  20. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Models Explain-then-Predict Sentence Encoder Sentence Encoder Premise Hypothesis u v (u, v, |u - v|, u * v) Fully-Connected Layers Label Explanation Generator Explanation Explanation Sentence Encoder
  21. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Models Sentence Encoder Explanation Generator = BiLSTM-Max = LSTM or LSTM with Attention Sentence Encoder Sentence Encoder Premise Hypothesis u v Fully-Connected Layers Label (u, v, |u - v|, u * v) No-Expl Sentence Encoder Sentence Encoder Premise Hypothesis u v (u, v, |u - v|, u * v) Fully-Connected Layers Label Explanation Generator Explanation Predict-then-Explain Premise Hypothesis u v (u, v, |u - v|, u * v) Fully-Connected Layers Label Explanation Generator Explanation Explanation Sentence Encoder Sentence Encoder Sentence Encoder Explain-then-Predict
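A BiLSTM-Max encoder runs the token sequence through a forward and a backward recurrent pass, concatenates the two hidden states per token, and max-pools element-wise over time. A toy NumPy sketch, where a plain tanh recurrence stands in for a real LSTM cell and all dimensions are made up:

```python
import numpy as np

def rnn_like(xs, dim, seed):
    """Toy recurrent stand-in for one LSTM direction: h_t = tanh(W x_t + U h_{t-1})."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(dim, xs.shape[1])) * 0.1
    U = rng.normal(size=(dim, dim)) * 0.1
    h, out = np.zeros(dim), []
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return np.stack(out)

def bilstm_max(xs, dim=8):
    """BiLSTM-Max: forward and backward passes, concatenated per token,
    then an element-wise max over the time dimension."""
    fwd = rnn_like(xs, dim, seed=0)
    bwd = rnn_like(xs[::-1], dim, seed=1)[::-1]
    return np.concatenate([fwd, bwd], axis=1).max(axis=0)

token_embeddings = np.random.default_rng(2).normal(size=(5, 16))  # 5 tokens, 16-dim
u = bilstm_max(token_embeddings)  # fixed-size sentence embedding, here 2 * dim = 16
```

The max-pooling step is what makes the embedding size independent of sentence length, so premises and hypotheses of different lengths map to comparable vectors.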
  22. e-SNLI: Natural Language Inference with Natural Language Explanations @ NeurIPS’18

    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Inter-annotator BLEU: 22.51 Results
  23. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Results
  24. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Spurious correlations SNLI is notorious for spurious correlations • Hypothesis → Label 67% (Gururangan et al., 2018) ◦ “tall”, “sad” → neutral ◦ “animal”, “outside” → entailment ◦ “sleeping”, negations → contradiction S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018. Sentence Encoder Sentence Encoder Premise Hypothesis u v Fully-Connected Layers Label 67% !!
  25. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Spurious correlations SNLI is notorious for spurious correlations • Hypothesis → Label 67% (Gururangan et al., 2018) ◦ “tall”, “sad” → neutral ◦ “animal”, “outside” → entailment ◦ “sleeping”, negations → contradiction S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018. Sentence Encoder Sentence Encoder Hypothesis u v Fully-Connected Layers Label 67% !! Can explanations rely on the same spurious correlations? Sentence Encoder Hypothesis v Explanation ? Explanation Generator Premise
  26. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Spurious correlations SNLI is notorious for spurious correlations • Hypothesis → Label 67% (Gururangan et al., 2018) ◦ “tall”, “sad” → neutral ◦ “animal”, “outside” → entailment ◦ “sleeping”, negations → contradiction S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018. Sentence Encoder Sentence Encoder Hypothesis u v Fully-Connected Layers Label 67% !! Can explanations rely on the same spurious correlations? Far less! Sentence Encoder Hypothesis v 6% Explanation Generator Premise Explanation
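The hypothesis-only artifact can be illustrated with a tiny cue-word probe that never sees the premise. The cue lists below echo the patterns reported by Gururangan et al. (2018); the classifier itself and its examples are a toy sketch, not the paper's model:

```python
# A hypothesis-only "artifact" probe in miniature: predict the NLI label from
# cue words alone, without ever looking at the premise. The cue lists mirror
# the artifacts on the slide; everything else here is illustrative.

CUES = {
    "contradiction": {"sleeping", "nobody", "no", "not"},
    "neutral": {"tall", "sad", "favorite"},
    "entailment": {"animal", "outside", "person"},
}

def hypothesis_only_predict(hypothesis):
    tokens = set(hypothesis.lower().split())
    for label, cues in CUES.items():
        if tokens & cues:
            return label
    return "entailment"  # majority-class fallback
```

That such a premise-blind rule beats chance by a wide margin on SNLI is exactly what makes the 67% hypothesis-only accuracy alarming, and why it matters that the explanations transfer these correlations far less (6%).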
  27. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18 O.

    Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom. Dataset and Code are available at https://github.com/OanaMariaCamburu/e-SNLI
  28. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz 🗃 e-SNLI-VE: the largest vision-language dataset with NLEs 📏 e-ViL: The first benchmark for vision-language tasks with NLEs ⚖ Evaluation of automatic metrics for NLEs 🏅 e-UG: State-of-the-art across 3 datasets
  29. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz SNLI Premise: A man and woman getting married. Hypothesis: A man and a woman inside a church. Label: Neutral Flickr30k Caption: A man and woman getting married. Xie et al., Visual Entailment: A Novel Task for Fine-Grained Image Understanding, 2019 (Xie et al., 2019)
  30. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz SNLI-VE (Xie et al., 2019) Premise: Hypothesis: A man is driving down a lonely road. Label: Contradiction Premise: Hypothesis: Two women are holding food in their hands. Label: Entailment Xie et al., Visual Entailment: A Novel Task for Fine-Grained Image Understanding, 2019 Premise: Hypothesis: A man is repainting a garage Label: Neutral
  31. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz e-SNLI-VE = SNLI-VE + e-SNLI + Corrections Premise: Hypothesis: A man is driving down a lonely road. Label: Contradiction Explanation: A road can’t be lonely if there is a crowd of people. Premise: Hypothesis: Two women are holding food in their hands. Label: Entailment Explanation: Holding to go packages implies that there is food in it. Premise: Hypothesis: A man is repainting a garage Label: Neutral Contradiction Explanation: The man is just staying in front of the garage with no signs of repairing being done.
  32. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz e-SNLI-VE = SNLI-VE + e-SNLI + Corrections Manual re-annotation of neutrals in dev and test sets False neutral tagger Keyword Filters Similarity Filter
  33. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz e-SNLI-VE = SNLI-VE + e-SNLI + Corrections Premise: Hypothesis: A man and women inside a church. Original Label: Neutral Caption 2/5: A man and a woman that is holding flowers smile in the sunlight. Caption 4/5: A happy couple enjoying their open air wedding. Manual re-annotation of neutrals in dev and test sets False neutral tagger Keyword Filters Similarity Filter
  34. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz e-SNLI-VE = SNLI-VE + e-SNLI + Corrections Premise: Hypothesis: There is a person in the store. Original Label: Entailment Explanation: It is already mentioned that someone is in the store. Manual re-annotation of neutrals in dev and test sets False neutral tagger Keyword Filters Similarity Filter
  35. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz e-SNLI-VE = SNLI-VE + e-SNLI + Corrections Manual re-annotation of neutrals in dev and test sets False neutral tagger Keyword Filters Similarity Filter Premise: Hypothesis: A woman is painting a mural while another woman supervises. Original Label: Entailment Explanation: A woman is painting a mural on the wall and there is another woman who supervises. Textual Premise: A woman painting a mural on the wall while another woman supervises.
  36. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz e-SNLI-VE = SNLI-VE + e-SNLI + Corrections
  37. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz 📏 How do we evaluate NLEs? • Automatic metrics? • How many annotators? • How many samples? • What kind of annotators? • correct/incorrect • Scale from 1 to 5 • better/same/worse than ground truth • … ❌ Lack of unified evaluation framework Q: Is the woman happy? Answer: Yes Predicted NLE: She is throwing her hands in the air in celebration. Ground-truth NLE: She has a big smile on her face.
  38. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz 📏 e-ViL: The Benchmark • A re-usable framework for evaluating NLEs ◦ Based on human evaluation ◦ 300 samples per model-dataset pair ◦ 3 annotators per example ◦ For every predicted explanation, the ground-truth explanation is also evaluated ◦ “Given the image and the hypothesis/question, does the explanation justify the answer?” ◦ No / Weak No / Weak Yes / Yes • Use it to compare four models on three datasets ◦ The datasets: e-SNLI-VE, VCR, VQA-X ◦ 19,194 evaluations from 234 human participants
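Ratings on the No / Weak No / Weak Yes / Yes scale have to be aggregated into a single explanation score per model. A sketch of one way to do this, assuming a linear 0, 1/3, 2/3, 1 mapping; the exact weighting used by e-ViL is defined in the paper, not here:

```python
# Aggregating 4-point explanation ratings into one score. The numeric mapping
# (No=0, Weak No=1/3, Weak Yes=2/3, Yes=1) is an assumption for this sketch.

RATING_VALUE = {"no": 0.0, "weak no": 1 / 3, "weak yes": 2 / 3, "yes": 1.0}

def explanation_score(ratings_per_example):
    """Mean over examples of the mean annotator rating for that example."""
    per_example = [
        sum(RATING_VALUE[r] for r in ratings) / len(ratings)
        for ratings in ratings_per_example
    ]
    return sum(per_example) / len(per_example)

ratings = [
    ["yes", "weak yes", "yes"],      # example 1, three annotators
    ["weak no", "no", "weak yes"],   # example 2, three annotators
]
score = explanation_score(ratings)  # a value in [0, 1]
```

Averaging per example first, then over examples, keeps examples with a different number of valid ratings from being over- or under-weighted.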
  39. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz 📏 e-ViL: The Datasets VCR (Zellers et al., 2019) e-SNLI-VE VQA-X (Park et al., 2018) Premise: Hypothesis: The man and woman are about to go on a honeymoon. Label: Neutral Explanation: Not all couples go on a honeymoon right after getting married. Park et al., Multimodal explanations: Justifying decisions and pointing to the evidence. In CVPR, 2018. Zellers et al., From recognition to cognition: Visual commonsense reasoning. In CVPR, 2019.
  40. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz 📏 e-ViL: The Models Park et al., Multimodal explanations: Justifying decisions and pointing to the evidence. CVPR 2018. Wu and Mooney, Faithful multimodal explanation for visual question answering. BlackboxNLP 2019. Marasović et al., Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs. EMNLP Findings 2020.
  41. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz 🏅e-UG Contextualized embeddings of image and question Answer Explanation Chen et al., UNITER: Universal image-text representation learning. ECCV 2020.
  42. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results
  43. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results
  44. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results
  45. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results
  46. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results VL Model Multi-modal feature vector Predict task Explanation module Explanation Backprop vs. VL Model Image + Question Multi-modal feature vector Predict task Image + Question Can explanations increase task performance?
  47. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Results ⚖ Automatic metrics Overall small correlation In some cases, no significant correlation METEOR and BERTScore are best overall
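Checking whether an automatic metric tracks human judgment comes down to a rank correlation between per-explanation metric scores and human ratings. A self-contained Spearman sketch on made-up numbers (it assumes no tied scores; a library routine with proper tie handling should be used in practice):

```python
# Spearman rank correlation between an automatic metric and human ratings.
# The score lists are illustrative numbers, not results from the paper.

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

metric_scores = [0.31, 0.45, 0.12, 0.58, 0.40]  # e.g. METEOR per explanation
human_scores = [0.5, 0.8, 0.2, 0.9, 0.6]        # e.g. mean annotator rating
rho = spearman(metric_scores, human_scores)
```

A rank correlation is the right tool here because metric scales (BLEU, METEOR, BERTScore) are not directly comparable to the human rating scale; only their orderings are.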
  48. e-ViL: A Dataset and Benchmark for Natural Language Explanations in

    Vision-Language Tasks @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz Dataset, Code, Evaluation Framework available at https://github.com/maximek3/e-ViL
  49. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Models may generate inconsistent NLEs. Adversarial attack for detecting the generation of inconsistent NLEs (novel seq2seq scenario).
  50. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Models may generate inconsistent NLEs. Definition: A pair of instances for which a model generates two logically contradictory explanations forms an inconsistency.
  51. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Examples of inconsistencies Self-Driving Cars Question Answering Visual Question Answering Recommender Systems
  52. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. A model providing inconsistent explanations can have either of the two undesired behaviours: a) at least one of the explanations is not faithfully describing the decision-making process of the model b) the model relied on a faulty decision-making process for at least one of the instances. Q: Is there an animal in the image? A: Yes, because dogs are animals. Q’: Is there a Husky in the image? A’: No, because dogs are not animals. If both explanations in A and A’ are faithful to the decision-making process of the model (i.e., if a) does not hold), then for the second instance (A’) the model relied on the faulty decision-making process that dogs are not animals.
  53. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Goal: Checking whether models are robust against generating inconsistent natural language explanations. Setup: Model m provides a prediction and a natural language explanation, e_m(x), for its prediction on the instance x. Find an instance x' such that e_m(x) and e_m(x') are inconsistent. High-level Approach (A) For an instance x and the explanation e_m(x), create a list of explanations that are inconsistent with e_m(x). (B) For an inconsistent explanation i_e created at step (A), find an input x' such that e_m(x') = i_e.
  54. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Context-free vs. Context-dependent Inconsistencies Q: Is there an animal in the image? A: Yes, because dogs are animals. Q’: Is there a Husky in the image? A’: No, because dogs are not animals. Context-free: inconsistency no matter what input, e.g., explanations formed by pure background knowledge. Inconsistent
  55. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Context-free vs. Context-dependent Inconsistencies Q: Is there an animal in the image? A: Yes, there is a dog in the image. Q’: Is there a Husky in the image? A’: No, there is no dog in the image. Q: Is there an animal in the image? A: Yes, because dogs are animals. Q’: Is there a Husky in the image? A’: No, because dogs are not animals. Context-free: inconsistency no matter what input, e.g., explanations formed by pure background knowledge. Context-dependent: inconsistency depends on parts of the input. Context Inconsistent Inconsistent
  56. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Context-free vs. Context-dependent Inconsistencies Q: Is there an animal in the image? A: Yes, there is a dog in the image. Q’: Is there a Husky in the image? A’: No, there is no dog in the image. Q: Is there an animal in the image? A: Yes, because dogs are animals. Q’: Is there a Husky in the image? A’: No, because dogs are not animals. Context-free: inconsistency no matter what input, e.g., explanations formed by pure background knowledge. Context-dependent: inconsistency depends on parts of the input. Inconsistent NOT Inconsistent
  57. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. High-level Approach (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x). (B) For an inconsistent statement i_e created at step (A), find the variable part x'_v of an input x' such that e_m(x') = i_e. Q: Is there an animal in the image? A: Yes, because dogs are animals. x : e_m(x) : Q’: Is there a Husky in the image? (B) Search for x'_v that leads the model to generate i_e. A’: ..., because dogs are not animals. : x' (A) List of explanations inconsistent with the explanation “dogs are animals”. Dogs are not animals. Not all dogs are animals. A dog is not an animal. … i_e x'_v x_v x_c
  58. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. High-level Approach (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x). (B) For an inconsistent statement i_e created at step (A), find the variable part x'_v of an input x' such that e_m(x') = i_e. Q: Is there an animal in the image? A: Yes, because dogs are animals. x : e_m(x) : A’: ..., because dogs are not animals. (A) List of explanations inconsistent with the explanation “dogs are animals”. Dogs are not animals. Not all dogs are animals. A dog is not an animal. … i_e x_v x_c ?
  59. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. High-level Approach (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x). For a given task, one may define a set of logical rules to transform an explanation into an inconsistent counterpart: 1. Negation: “A dog is an animal.” “A dog is not an animal.” 2. Task-specific antonyms: “The car continues because it is green light.” “The car continues because it is red light.” 3. Swap explanations of mutually exclusive labels: Recommender(movie X, user U) = No because “X is a horror.” Recommender(movie Z, user U) = No because “Z is a comedy.” Recommender(movie Y, user U) = Yes because “Z is a comedy.” Recommender(movie K, user U) = Yes because “K is a horror.”
  60. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. High-level Approach (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x). (B) For an inconsistent statement i_e created at step (A), find the variable part x'_v of an input x' such that e_m(x') = i_e.
  61. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. High-level Approach (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x). (B) For an inconsistent statement i_e created at step (A), find the variable part x'_v of an input x' such that e_m(x') = i_e. Train a model, RevExpl, to go from an explanation e_m(x) to the input that caused m to generate that explanation. Is there an animal in the image? Yes, because dogs are animals. Dogs are animals. m(x) = (pred(x), e_m(x)) RevExpl(x_c, e_m(x)) = x_v Is there an animal in the image?
  62. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Approach I. Train RevExpl(x_c, e_m(x)) = x_v II. For each explanation e = e_m(x): a) Create a list of statements that are inconsistent with e, call it I_e • by using logic rules: negation, task-specific antonyms, and swapping between explanations for mutually exclusive labels b) For each e' in I_e, query RevExpl to get the variable part of a reverse input: x'_v = RevExpl(x_c, e') c) Query m on the reverse input x' = (x_c, x'_v) and get the reverse explanation e_m(x') d) Check if e_m(x') is inconsistent with e_m(x) • by checking if e_m(x') is in I_e
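The steps I–II above can be sketched as a loop. The stub model, the stub reverse model, and the single negation rule below are toy assumptions made for illustration; only the control flow mirrors the slide:

```python
# The inconsistency attack in miniature, with lookup-table stubs standing in
# for the trained explainer m and the reverse model RevExpl.

def inconsistent_candidates(expl):
    """Step (a): rule-based inconsistent statements; here only negation."""
    if " not " in f" {expl} ":
        return [expl.replace("not ", "", 1)]
    words = expl.split()
    return [" ".join(words[:2] + ["not"] + words[2:])]  # insert "not" after two words

def find_inconsistencies(m, rev_expl, x_context, x_variable):
    _, e = m(x_context, x_variable)
    found = []
    for e_inconsistent in inconsistent_candidates(e):      # step (a)
        x_var_adv = rev_expl(x_context, e_inconsistent)    # step (b)
        _, e_adv = m(x_context, x_var_adv)                 # step (c)
        if e_adv in inconsistent_candidates(e):            # step (d)
            found.append((x_var_adv, e, e_adv))
    return found

# Toy stubs: a "model" that explains via a lookup table, and a reverse model.
EXPLANATIONS = {"Is there an animal?": "dogs are animals",
                "Is there a Husky?": "dogs are not animals"}
m = lambda ctx, q: ("yes", EXPLANATIONS[q])
rev = lambda ctx, e: "Is there a Husky?" if "not" in e else "Is there an animal?"

hits = find_inconsistencies(m, rev, "image", "Is there an animal?")
```

The key property of the setup is that RevExpl is only a search heuristic: whatever it proposes is re-checked by querying m itself in step (c), so false positives from the reverse model cannot produce spurious inconsistencies.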
  63. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. High-level Approach (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x). (B) For an inconsistent statement i_e created at step (A), find an input x' such that e_m(x') = i_e. Novel Adversarial Setup 1) No predefined adversarial targets (label attacks do not have this issue). 2) At step (B), the model has to generate a full target sequence: the goal is to generate the exact explanation that was identified at step (A) as inconsistent with the explanation e_m(x). Current attacks focus on the presence/absence of a very small number of tokens in the target sequence (Cheng et al., 2020; Zhao et al., 2018). 3) Adversarial inputs x' do not have to be a paraphrase or a small perturbation of the original input (can happen as a byproduct). Current works focus on adversaries being paraphrases or a minor deviation from the original input (Belinkov and Bisk, 2018).
  64. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. e-SNLI x = (premise, hypothesis). We reverse only the hypothesis. To create the list of inconsistent explanations for any generated explanation, we use: • negation: if the explanation contains “not” or “n’t”, we delete it • swapping explanations (the 3 labels are mutually exclusive) by identifying templates for each label: x_c x_v Entailment • X is a type of Y • X implies Y • X is the same as Y • X is a rephrasing of Y • X is synonymous with Y . . . Neutral • not all X are Y • not every X is Y • just because X does not mean Y • X is not necessarily Y • X does not imply Y . . . Contradiction • cannot be X and Y at the same time • X is not Y • X is the opposite of Y • it is either X or Y . . . If e_m(x) does not contain a negation or does not fit any template, we discard it (2.6% of the e-SNLI test set was discarded).
  65. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. If e_m(x) corresponds to a template from a label, then create the list of inconsistent statements I_e by replacing the associated X and Y in the templates of the other two labels. Example: e_m(x) = “Dog is a type of animal.” matches the entailment template “X is a type of Y” with X = “dog” and Y = “animal”. Replacing X and Y in all the neutral and contradiction templates, we obtain the list of inconsistencies: Neutral • not all dog are animal • not every dog is animal • just because dog does not mean animal • dog is not necessarily animal • dog does not imply animal . . . Contradiction • cannot be dog and animal at the same time • dog is not animal • dog is the opposite of animal • it is either dog or animal . . .
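The template matching on this slide can be sketched with regular expressions. The template lists here are truncated to a few entries per label, and the matcher is a simplified illustration of the procedure, not the paper's exact implementation:

```python
import re

# Template-based generation of inconsistent explanations: match an explanation
# against one label's templates, extract X and Y, then fill the templates of
# the other two labels. Template lists are truncated for illustration.

TEMPLATES = {
    "entailment": ["{X} is a type of {Y}", "{X} implies {Y}"],
    "neutral": ["not all {X} are {Y}", "{X} is not necessarily {Y}"],
    "contradiction": ["{X} is not {Y}", "it is either {X} or {Y}"],
}

def match_template(explanation):
    for label, templates in TEMPLATES.items():
        for t in templates:
            pattern = (re.escape(t)
                       .replace(r"\{X\}", "(?P<X>.+)")
                       .replace(r"\{Y\}", "(?P<Y>.+)"))
            m = re.fullmatch(pattern, explanation)
            if m:
                return label, m.group("X"), m.group("Y")
    return None

def inconsistent_statements(explanation):
    matched = match_template(explanation)
    if matched is None:
        return []  # non-matching explanations are discarded, as on the slide
    label, x, y = matched
    return [t.format(X=x, Y=y)
            for other, ts in TEMPLATES.items() if other != label
            for t in ts]

out = inconsistent_statements("dog is a type of animal")
```

The generated statements keep the slide's ungrammatical surface forms ("not all dog are animal") on purpose: they only need to be matched against the model's output in step (d), not to be fluent English.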
  66. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. • RevExpl(premise, explanation) = hypothesis ◦ same architecture as Expl-Pred-Att ◦ 32.78% test accuracy (exact string match for the generated hypothesis) • Manual annotation of 100 random reverse hypotheses shows 82% to be realistic ◦ most unrealistic ones are due to the repetition of a token • Success rate of our adversarial method for finding inconsistencies: 4.51% on the e-SNLI test set ◦ 443 distinct pairs of inconsistent explanations Best model from before: Expl-Pred-Att • 64.27% correct explanations
  67. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
  68. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language

    Explanations @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom. Manual scanning had no success and even points to robust explanations • first 50 instances of test • explanations including woman, prisoner, snowboarding • manually created adversarial inputs (Carmona et al., 2018) P: A bird is above water. H: A swan is above water. E: Not all birds are a swan. P: A small child watches the outside world through a window. H: A small toddler watches the outside world through a window. E: Not every child is a toddler. P: A swan is above water. H: A bird is above water. E: A swan is a bird. P: A small toddler watches the outside world through a window. H: A small child watches the outside world through a window. E: A toddler is a small child. V. Carmona et al., Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness, NAACL, 2018.
  69. Summary e-SNLI and e-SNLI-VE: two large datasets of NLEs Models

    with NLEs A glimpse into spurious correlations and NLEs 📏 A benchmark for vision-language tasks with NLEs ⚖ Evaluation of automatic metrics for NLEs Inconsistencies of NLEs Adversarial attack for detecting the generation of inconsistent NLEs (novel seq2seq scenario)
  70. Open Questions Faithfulness Explanations to increase task performance Zero/Few-Shot learning

    Automatic evaluation Usefulness of NLEs for increasing public trust and acceptance
  71. Thank you! @oanacamb Questions?