
Neural Networks with Natural Language Explanations

wing.nus
November 04, 2021


In order for machine learning to garner widespread public adoption, models must be able to provide human-understandable and robust explanations for their decisions. In this talk, we will focus on the emerging direction of building neural networks that learn from natural language explanations at training time and generate such explanations at testing time. We will see an extension of the large Stanford Natural Language Inference (SNLI) dataset with an additional layer of human-written natural language explanations for the entailment relations, called e-SNLI. We will see different types of architectures that incorporate these explanations into their training process and generate them at testing time. We will further see a similar approach for vision-language models, where we introduce e-SNLI-VE, a large dataset of visual-textual entailment with natural language explanations. We will also see e-ViL, a benchmark for natural language explanations in vision-language tasks, and e-UG, the current SOTA model for natural language explanation generation on such tasks. These large datasets of explanations open up a range of research directions for using natural language explanations both for improving models and for establishing trust in them. However, models trained on such datasets may nonetheless generate inconsistent explanations. An adversarial framework for sanity-checking models against generating such inconsistencies will be presented.

Seminar page: https://wing-nus.github.io/ir-seminar/speaker-oana
YouTube Video recording: https://www.youtube.com/watch?v=-bopzFou7jQ


Transcript

  1. Neural Networks with
    Natural Language Explanations
    Oana-Maria Camburu
    Postdoctoral Researcher
    University of Oxford
    Talk at National University of Singapore, Thursday 28th of October 2021


  2. Outline
    1. Introduction
    2. e-SNLI: Natural Language Inference with Natural Language Explanations (NeurIPS’18)
    3. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks (ICCV’21)
    4. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations (ACL’20)
    5. Summary and Open Questions
    6. Q&A


  3. Introduction
    Deep neural networks have been responsible for SOTA results in many areas, but they are still typically black boxes.
    Even when they achieve high performance on test sets, they are notoriously prone to
    ● relying on spurious correlations in datasets (Chen et al., 2016; Gururangan et al., 2018; McCoy et al., 2019)
    ● adversarial attacks (Szegedy et al., 2014; Moosavi-Dezfooli et al., 2017; Jia and Liang, 2017)
    ● exacerbating discrimination (Bolukbasi et al., 2016; Buolamwini and Gebru, 2018)
    https://www.wired.com/2016/10/understanding-artificial-intelligence-decisions/
    D. Chen et al., A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task, ACL, 2016.
    T. McCoy et al., Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, ACL, 2019.
    S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018.
    C. Szegedy et al., Intriguing Properties of Neural Networks, ICLR, 2014.
    S. Moosavi-Dezfooli et al., Universal Adversarial Perturbations, CVPR, 2017.
    R. Jia and P. Liang, Adversarial Examples for Evaluating Reading Comprehension Systems, EMNLP, 2017.
    T. Bolukbasi et al., Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, NeurIPS, 2016.
    J. Buolamwini and T. Gebru, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, FAT, 2018.
    Debugging and Improvement
    Fairness and Accountability
    Trust
    Acceptance


  4. Introduction
    Types of explanations


  5. Introduction
    Types of explanations
    1. Feature-based
    “The plot was not interesting, but the actors were great.”
    M. Ribeiro et al., "Why Should I Trust You?": Explaining the Predictions of Any Classifier, KDD, 2016.
    S. Lundberg and S. Lee, A Unified Approach to Interpreting Model Predictions, NeurIPS, 2017.
    M. Sundararajan, Axiomatic Attribution for Deep Networks, ICML, 2017.


  6. Introduction
    Types of explanations
    1. Feature-based
    2. Training-based
    Training set
    AI prediction
    P. Koh and P. Liang, Understanding Black-box Predictions via Influence Functions, ICML, 2017.


  7. Introduction
    Types of explanations
    1. Feature-based
    2. Training-based
    3. Concept-based
    https://medium.com/intuit-engineering/navigating-the-sea-of-explainability-f6cc4631f473
    B. Kim et al., Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), ICML, 2018


  8. Introduction
    Types of explanations
    1. Feature-based
    2. Training-based
    3. Concept-based
    4. Surrogate models
    A. Alaa and M. van der Schaar, Demystifying Black-box Models with Symbolic Metamodels, NeurIPS, 2019


  9. Introduction
    Types of explanations
    1. Feature-based
    2. Training-based
    3. Concept-based
    4. Surrogate models
    5. Natural language (in this talk)
    ...


  10. Introduction
    [Illustration: “Why are you stopping?” — “I am stopping because there is a person crossing.”]
    Models that
    ● learn from natural language explanations that justify the ground-truth labels at training time
    ● generate natural language explanations for their predictions at testing time


  11. Introduction
    Motivation
    ● Humans do not learn just from labeled examples. Heider (1958): people look for explanations to improve their understanding of someone or something so that they can derive a stable model that can be used for prediction and control.
    ● Human-friendly explanations. Kaur et al. (2020): “data scientists over-trust and misuse interpretability tools” and “few of our participants [197 data scientists] were able to accurately describe the visualizations output by these tools.”
    F. Heider, The Psychology of Interpersonal Relations, New York: Wiley, 1958.
    H. Kaur et al., Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning, CHI, 2020.
    Explaining already trained AI systems may help us spot problems, but there is no generic solution to guide the systems into learning a correct decision-making process.


  12. Introduction
    Ingredients
    ● Natural language explanations (NLEs)
    ● Models that can learn from natural language explanations and generate such explanations


  13. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    e-SNLI: one of the first and largest datasets of NLEs
    Two types of architectures for models with NLEs
    A glimpse into spurious correlations and NLEs


  14. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    SNLI (Bowman et al., 2015)
    S. Bowman et al., A large annotated corpus for learning natural language inference, EMNLP, 2015.


  15. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    e-SNLI


  16. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    e-SNLI
    ● Train (~550k): 1 explanation per instance
    ● Dev and Test (~10k): 3 explanations per instance
    ● Quality control
    ○ require annotators to highlight salient tokens and use them in the explanation
    ○ several in-browser checks and re-annotation of trivial explanations
    Premise: A man in a blue shirt standing in front of a garage-like structure painted with geometric designs.
    Hypothesis: A man is repainting a garage.
    Label: Neutral
    Explanation: It is not clear whether the man is repainting the garage or not.

    Premise: A black race car starts up in front of a crowd of people.
    Hypothesis: A man is driving down a lonely road.
    Label: Contradiction
    Explanation: A road can’t be lonely if there is a crowd of people.

    Premise: Two women are embracing while holding to go packages.
    Hypothesis: Two women are holding food in their hands.
    Label: Entailment
    Explanation: Holding to go packages implies that there is food in it.
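
(For concreteness, an e-SNLI instance can be pictured as the small record sketched below; the field names are illustrative and do not necessarily match the headers of the released dataset files.)

```python
# Illustrative shape of one e-SNLI training instance (field names are assumptions
# for exposition, not the exact column names of the released dataset files).
from dataclasses import dataclass
from typing import List

@dataclass
class ESNLIExample:
    premise: str
    hypothesis: str
    label: str                     # "entailment" | "neutral" | "contradiction"
    explanation: str               # one free-form NLE in train; three in dev/test
    highlighted_tokens: List[str]  # tokens the annotator marked as salient

example = ESNLIExample(
    premise="A black race car starts up in front of a crowd of people.",
    hypothesis="A man is driving down a lonely road.",
    label="contradiction",
    explanation="A road can't be lonely if there is a crowd of people.",
    highlighted_tokens=["crowd", "people", "lonely"],  # illustrative highlights
)
```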


  17. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Models
    Typical SNLI architecture
    [Diagram: Premise and Hypothesis → Sentence Encoders → u, v → (u, v, |u - v|, u * v) → Fully-Connected Layers → Label]

  18. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Models
    Predict-then-Explain
    [Diagram: Premise and Hypothesis → Sentence Encoders → u, v → (u, v, |u - v|, u * v) → Fully-Connected Layers → Label]


  19. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Models
    Predict-then-Explain
    [Diagram: Premise and Hypothesis → Sentence Encoders → u, v → (u, v, |u - v|, u * v) → Fully-Connected Layers → Label, with the same feature vector also feeding an Explanation Generator → Explanation]


  20. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Models
    Explain-then-Predict
    [Diagram: Premise and Hypothesis → Sentence Encoders → u, v → (u, v, |u - v|, u * v) → Explanation Generator → Explanation → Sentence Encoder → Fully-Connected Layers → Label]


  21. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Models
    Sentence Encoder = BiLSTM-Max; Explanation Generator = LSTM or LSTM with Attention
    [Diagrams of the three architectures:
    No-Expl: Premise and Hypothesis → Sentence Encoders → u, v → (u, v, |u - v|, u * v) → Fully-Connected Layers → Label
    Predict-then-Explain: as above, with the same feature vector also feeding an Explanation Generator → Explanation
    Explain-then-Predict: Premise and Hypothesis → Sentence Encoders → (u, v, |u - v|, u * v) → Explanation Generator → Explanation → Sentence Encoder → Fully-Connected Layers → Label]
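
(The two architectures differ only in the order of the two heads; a pseudocode-level sketch in which the encoder, classifier, and explanation generator are passed in as callables rather than being the paper's exact modules.)

```python
import torch

def combine(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Feature vector used throughout: (u, v, |u - v|, u * v)
    return torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)

def predict_then_explain(encode, classify, explain, premise, hypothesis):
    f = combine(encode(premise), encode(hypothesis))
    label_logits = classify(f)   # label predicted from the premise/hypothesis features
    explanation = explain(f)     # explanation decoded from the same features
    return label_logits, explanation

def explain_then_predict(encode, classify, explain, premise, hypothesis):
    f = combine(encode(premise), encode(hypothesis))
    explanation = explain(f)                       # explanation generated first
    label_logits = classify(encode(explanation))   # label predicted from the explanation alone
    return label_logits, explanation
```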


  22. e-SNLI: Natural Language Inference with Natural Language Explanations @ NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Inter-annotator BLEU: 22.51
    Results


  23. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Results


  24. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Spurious correlations
    SNLI is notorious for spurious correlations
    ● Hypothesis → Label 67% (Gururangan et al., 2018)
    ○ “tall”, “sad” → neutral
    ○ “animal”, “outside” → entailment
    ○ “sleeping”, negations → contradiction
    S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018.
    [Diagram: the SNLI classifier (Premise, Hypothesis → Sentence Encoders → Fully-Connected Layers → Label) annotated with “67% !!”]


  25. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Spurious correlations
    SNLI is notorious for spurious correlations
    ● Hypothesis → Label 67% (Gururangan et al., 2018)
    ○ “tall”, “sad” → neutral
    ○ “animal”, “outside” → entailment
    ○ “sleeping”, negations → contradiction
    S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018.
    [Diagram: a hypothesis-only classifier (Hypothesis → Sentence Encoder → Fully-Connected Layers → Label) annotated with “67% !!”]
    Can explanations rely on the same spurious correlations?
    [Diagram: an Explanation Generator that sees only the Hypothesis (Premise hidden), annotated with “?”]


  26. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Spurious correlations
    SNLI is notorious for spurious correlations
    ● Hypothesis → Label 67% (Gururangan et al., 2018)
    ○ “tall”, “sad” → neutral
    ○ “animal”, “outside” → entailment
    ○ “sleeping”, negations → contradiction
    S. Gururangan et al., Annotation Artifacts in Natural Language Inference Data, NAACL, 2018.
    [Diagram: a hypothesis-only classifier (Hypothesis → Sentence Encoder → Fully-Connected Layers → Label) annotated with “67% !!”]
    Can explanations rely on the same spurious correlations? Far less!
    [Diagram: an Explanation Generator that sees only the Hypothesis produces a correct explanation only 6% of the time]
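
(The hypothesis-only measurement can be reproduced with a simple probe that never sees the premise; the classifier below is an illustrative choice, not the model used in the cited work.)

```python
# Hypothesis-only probe: if a classifier that never sees the premise performs far
# above the ~34% majority-class baseline of SNLI, labels leak through the hypotheses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def hypothesis_only_accuracy(train_hyps, train_labels, test_hyps, test_labels):
    probe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    probe.fit(train_hyps, train_labels)   # premises are deliberately ignored
    return probe.score(test_hyps, test_labels)
```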


  27. e-SNLI: Natural Language Inference with Natural Language Explanations @NeurIPS’18
    O. Camburu, T. Rocktäschel, T. Lukasiewicz, P. Blunsom.
    Dataset and Code are available at
    https://github.com/OanaMariaCamburu/e-SNLI


  28. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    🗃 e-SNLI-VE: the largest vision-language dataset with NLEs
    📏 e-ViL: The first benchmark for vision-language tasks with NLEs
    ⚖ Evaluation of automatic metrics for NLEs
    🏅 e-UG: State-of-the-art across 3 datasets


  29. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    SNLI
    Premise: A man and woman getting married.
    Hypothesis: A man and a woman inside a church.
    Label: Neutral
    Flickr30k
    Caption: A man and woman getting married.
    (SNLI premises are Flickr30k captions, which enables the visual entailment task of Xie et al., 2019.)
    Xie et al., Visual Entailment: A Novel Task for Fine-Grained Image Understanding, 2019.


  30. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    SNLI-VE (Xie et al., 2019): the textual premise is replaced by the Flickr30k image it was derived from.
    Premise: [image]
    Hypothesis: A man is driving down a lonely road.
    Label: Contradiction

    Premise: [image]
    Hypothesis: Two women are holding food in their hands.
    Label: Entailment

    Premise: [image]
    Hypothesis: A man is repainting a garage.
    Label: Neutral
    Xie et al., Visual Entailment: A Novel Task for Fine-Grained Image Understanding, 2019.


  31. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    e-SNLI-VE = SNLI-VE + e-SNLI + Corrections
    Premise: [image]
    Hypothesis: A man is driving down a lonely road.
    Label: Contradiction
    Explanation: A road can’t be lonely if there is a crowd of people.

    Premise: [image]
    Hypothesis: Two women are holding food in their hands.
    Label: Entailment
    Explanation: Holding to go packages implies that there is food in it.

    Premise: [image]
    Hypothesis: A man is repainting a garage.
    Label: Neutral → Contradiction (corrected)
    Explanation: The man is just staying in front of the garage with no signs of repairing being done.


  32. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    e-SNLI-VE = SNLI-VE + e-SNLI + Corrections
    Manual re-annotation of
    neutrals in dev and test sets
    False neutral tagger
    Keyword Filters
    Similarity Filter


  33. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    e-SNLI-VE = SNLI-VE + e-SNLI + Corrections
    Premise: [image]
    Hypothesis: A man and women inside a church.
    Original Label: Neutral
    Caption 2/5: A man and a woman that is holding flowers smile in the sunlight.
    Caption 4/5: A happy couple enjoying their open air wedding.
    Manual re-annotation of neutrals in dev and test sets
    False neutral tagger
    Keyword Filters
    Similarity Filter


  34. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    e-SNLI-VE = SNLI-VE + e-SNLI + Corrections
    Premise: [image]
    Hypothesis: There is a person in the store.
    Original Label: Entailment
    Explanation: It is already mentioned that someone is in the store.
    Manual re-annotation of neutrals in dev and test sets
    False neutral tagger
    Keyword Filters
    Similarity Filter


  35. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    e-SNLI-VE = SNLI-VE + e-SNLI + Corrections
    Manual re-annotation of neutrals in dev and test sets
    False neutral tagger
    Keyword Filters
    Similarity Filter
    Premise: [image]
    Textual Premise: A woman painting a mural on the wall while another woman supervises.
    Hypothesis: A woman is painting a mural while another woman supervises.
    Original Label: Entailment
    Explanation: A woman is painting a mural on the wall and there is another woman who supervises.
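
(A rough sketch of what such automatic filters could look like; the phrase list, the similarity measure, and the threshold are illustrative assumptions, not the exact heuristics used to build e-SNLI-VE.)

```python
# Heuristic false-neutral tagger: flag instances whose explanation refers to the
# removed textual premise, or whose hypothesis nearly restates another caption.
# Phrase list, similarity measure, and threshold are illustrative assumptions.
import difflib

SUSPICIOUS_PHRASES = ["already mentioned", "the premise states", "as stated"]

def keyword_filter(explanation: str) -> bool:
    text = explanation.lower()
    return any(phrase in text for phrase in SUSPICIOUS_PHRASES)

def similarity_filter(hypothesis: str, other_captions: list, threshold: float = 0.8) -> bool:
    best = max(difflib.SequenceMatcher(None, hypothesis.lower(), c.lower()).ratio()
               for c in other_captions)
    return best >= threshold
```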


  36. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    e-SNLI-VE = SNLI-VE + e-SNLI + Corrections


  37. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    📏 How do we evaluate NLEs?
    ● Automatic metrics?
    ● How many annotators?
    ● How many samples?
    ● What kind of annotators?
    ● correct/incorrect
    ● Scale from 1 to 5
    ● better/same/worse than ground truth
    ● …
    ❌ Lack of unified evaluation framework
    Q: Is the woman happy?
    Answer: Yes
    Predicted NLE: She is throwing her hands in the air in celebration.
    Ground-truth NLE: She has a big smile on her face.


  38. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    📏 e-ViL: The Benchmark
    ● A re-usable framework for evaluating NLEs
    ○ Based on human evaluation
    ○ 300 samples per model–dataset pair
    ○ 3 annotators per example
    ○ Ground-truth explanations are evaluated alongside every predicted explanation
    ○ “Given the image and the hypothesis/question, does the explanation justify the answer?”
    ○ No / Weak No / Weak Yes / Yes
    ● Used to compare four models on three datasets
    ○ The datasets: e-SNLI-VE, VCR, VQA-X
    ○ 19,194 evaluations from 234 human participants
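
(A sketch of how the four-point judgments can be aggregated into a single explanation score; the numeric mapping below is an assumption for illustration, not necessarily the exact one used by the benchmark.)

```python
# Average human ratings per explanation, then over explanations.
# The mapping yes=1, weak yes=2/3, weak no=1/3, no=0 is an illustrative assumption.
RATING_TO_SCORE = {"no": 0.0, "weak no": 1 / 3, "weak yes": 2 / 3, "yes": 1.0}

def explanation_score(ratings_per_example):
    """ratings_per_example: one list of ratings (e.g. 3 annotators) per example."""
    per_example = [sum(RATING_TO_SCORE[r] for r in ratings) / len(ratings)
                   for ratings in ratings_per_example]
    return sum(per_example) / len(per_example)

print(explanation_score([["yes", "weak yes", "yes"], ["weak no", "no", "weak no"]]))
```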


  39. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    📏 e-ViL: The Datasets
    e-SNLI-VE, VCR (Zellers et al., 2019), VQA-X (Park et al., 2018)
    Premise: [image]
    Hypothesis: The man and woman are about to go on a honeymoon.
    Label: Neutral
    Explanation: Not all couples go on a honeymoon right after getting married.
    Park et al., Multimodal explanations: Justifying decisions and pointing to the evidence. In CVPR, 2018.
    Zellers et al., From recognition to cognition: Visual commonsense reasoning. In CVPR, 2019.


  40. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    📏 e-ViL: The Models
    Park et al., Multimodal explanations: Justifying decisions and pointing to the evidence. CVPR 2018.
    Wu and Mooney, Faithful multimodal explanation for visual question answering. BlackboxNLP 2019.
    Marasović et al., Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs. EMNLP Findings 2020.


  41. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    🏅 e-UG
    [Diagram: contextualized embeddings of the image and question feed the prediction of the Answer and the generation of the Explanation]
    Chen et al., UNITER: Universal image-text representation learning. ECCV 2020.


  42. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    Results


  43. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    Results


  44. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    Results


  45. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    Results


  46. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    Results
    Can explanations increase task performance?
    [Diagram: two training setups compared. (1) VL Model: Image + Question → multi-modal feature vector → task prediction. (2) The same model with an explanation module that generates the NLE from the feature vector and backpropagates its loss into the VL Model.]
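
(Training with explanations amounts to adding an NLE-generation loss to the task loss and backpropagating both into the vision-language backbone; a hedged sketch in which the weighting factor and tensor shapes are assumptions.)

```python
import torch
import torch.nn.functional as F

def joint_loss(task_logits, task_labels, expl_logits=None, expl_targets=None, alpha=1.0):
    """Task loss, optionally augmented with a token-level generation loss on the NLE."""
    loss = F.cross_entropy(task_logits, task_labels)
    if expl_logits is not None:
        # expl_logits: (batch, seq_len, vocab); expl_targets: (batch, seq_len)
        loss = loss + alpha * F.cross_entropy(
            expl_logits.reshape(-1, expl_logits.size(-1)), expl_targets.reshape(-1))
    return loss
```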


  47. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    Results
    ⚖ Automatic metrics
    Overall small correlation
    In some cases, no significant correlation
    METEOR and BERTScore are best overall
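
(Checking such correlations is straightforward once per-explanation human scores are available; the sketch below uses Spearman correlation, which is one reasonable choice rather than necessarily the statistic reported in the paper.)

```python
from scipy.stats import spearmanr

def metric_human_correlation(metric_scores, human_scores):
    """metric_scores, human_scores: per-explanation score lists of equal length."""
    rho, p_value = spearmanr(metric_scores, human_scores)
    return rho, p_value
```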


  48. e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
    @ICCV’21 M. Kayser, O. Camburu, L. Salewski, C. Emde, V. Do, Z. Akata, T. Lukasiewicz
    Dataset, Code, Evaluation Framework available at
    https://github.com/maximek3/e-ViL


  49. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Models may generate inconsistent NLEs.
    Adversarial attack for detecting the generation of inconsistent NLEs (novel seq2seq scenario).


  50. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Models may generate inconsistent NLEs.
    Definition: A pair of instances for which a model generates two logically contradictory explanations forms an inconsistency.


  51. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Examples of inconsistencies
    Self-Driving Cars Question Answering
    Visual Question Answering
    Recommender Systems


  52. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    A model providing inconsistent explanations exhibits at least one of two undesired behaviours:
    a) at least one of the explanations does not faithfully describe the decision-making process of the model, or
    b) the model relied on a faulty decision-making process for at least one of the instances.
    Q: Is there an animal in the image?  A: Yes, because dogs are animals.
    Q’: Is there a Husky in the image?  A’: No, because dogs are not animals.
    If both explanations in A and A’ are faithful to the decision-making process of the model (i.e., if a) does not hold), then for the second instance (A’) the model relied on the faulty decision-making process that dogs are not animals.


  53. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Goal: Checking if models are robust against generating inconsistent natural language explanations.
    Setup: Model m provides a prediction and a natural language explanation, e_m(x), for its prediction on the instance x. Find an instance x’ such that e_m(x) and e_m(x’) are inconsistent.
    High-level Approach
    (A) For an instance x and the explanation e_m(x), create a list of explanations that are inconsistent with e_m(x).
    (B) For an inconsistent explanation i_e created at step (A), find an input x’ such that e_m(x’) = i_e.


  54. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Context-free vs. Context-dependent Inconsistencies
    Q: Is there an animal in the image?  A: Yes, because dogs are animals.
    Q’: Is there a Husky in the image?  A’: No, because dogs are not animals.
    → Inconsistent
    Context-free: inconsistency no matter what the input is, e.g., explanations formed purely of background knowledge.


  55. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Context-free vs. Context-dependent Inconsistencies
    Context-free: inconsistency no matter what the input is, e.g., explanations formed purely of background knowledge.
    Q: Is there an animal in the image?  A: Yes, because dogs are animals.
    Q’: Is there a Husky in the image?  A’: No, because dogs are not animals.
    → Inconsistent
    Context-dependent: inconsistency depends on parts of the input (the context).
    Q: Is there an animal in the image?  A: Yes, there is a dog in the image.
    Q’: Is there a Husky in the image?  A’: No, there is no dog in the image.
    → Inconsistent (when both questions are asked about the same image)


  56. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Context-free vs. Context-dependent Inconsistencies
    Context-free: inconsistency no matter what the input is, e.g., explanations formed purely of background knowledge.
    Q: Is there an animal in the image?  A: Yes, because dogs are animals.
    Q’: Is there a Husky in the image?  A’: No, because dogs are not animals.
    → Inconsistent
    Context-dependent: inconsistency depends on parts of the input.
    Q: Is there an animal in the image?  A: Yes, there is a dog in the image.
    Q’: Is there a Husky in the image?  A’: No, there is no dog in the image.
    → NOT Inconsistent (when the questions are asked about different images)


  57. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    High-level Approach
    (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
    (B) For an inconsistent statement i_e created at step (A), find the variable part x’_v of an input x’ such that e_m(x’) = i_e.
    Example (the image is the fixed part x_c; the question is the variable part x_v):
    x: Q: Is there an animal in the image?    e_m(x): A: Yes, because dogs are animals.
    (A) List of explanations inconsistent with the explanation “dogs are animals”: Dogs are not animals. / Not all dogs are animals. / A dog is not an animal.
    (B) Search for the x’_v that leads the model to generate i_e: x’: Q’: Is there a Husky in the image?    e_m(x’): A’: ..., because dogs are not animals.


  58. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    High-level Approach
    (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
    (B) For an inconsistent statement i_e created at step (A), find the variable part x’_v of an input x’ such that e_m(x’) = i_e.
    x: Q: Is there an animal in the image?    e_m(x): A: Yes, because dogs are animals.
    (A) List of explanations inconsistent with the explanation “dogs are animals”: Dogs are not animals. / Not all dogs are animals. / A dog is not an animal.
    (B) x’_v = ?  such that the model generates i_e: A’: ..., because dogs are not animals.


  59. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    High-level Approach
    (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
    For a given task, one may define a set of logical rules to transform an explanation into an inconsistent counterpart:
    1. Negation: “A dog is an animal.” → “A dog is not an animal.”
    2. Task-specific antonyms: “The car continues because it is green light.” → “The car continues because it is red light.”
    3. Swap explanations of mutually exclusive labels:
       Recommender(movie X, user U) = No because “X is a horror.”    Recommender(movie Z, user U) = No because “Z is a comedy.”
       Recommender(movie Y, user U) = Yes because “Z is a comedy.”    Recommender(movie K, user U) = Yes because “K is a horror.”


  60. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    High-level Approach
    (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
    (B) For an inconsistent statement i_e created at step (A), find the variable part x’_v of an input x’ such that e_m(x’) = i_e.


  61. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    High-level Approach
    (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
    (B) For an inconsistent statement i_e created at step (A), find the variable part x’_v of an input x’ such that e_m(x’) = i_e.
    Train a model, RevExpl, to go from an explanation e_m(x) (together with the context x_c) back to the input part x_v that caused m to generate that explanation:
    m(x) = (pred(x), e_m(x))
    RevExpl(x_c, e_m(x)) = x_v
    Example: m answers “Is there an animal in the image?” with “Yes, because dogs are animals.”; RevExpl maps the explanation “Dogs are animals.” (plus the image) back to the question “Is there an animal in the image?”.

  62. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Approach
    I. Train RevExpl(x_c, e_m(x)) = x_v
    II. For each explanation e = e_m(x):
    a) Create a list of statements that are inconsistent with e, call it I_e
    ● by using logic rules: negation, task-specific antonyms, and swapping between explanations for mutually exclusive labels
    b) For each e’ in I_e, query RevExpl to get the variable part of a reverse input: x’_v = RevExpl(x_c, e’)
    c) Query m on the reverse input x’ = (x_c, x’_v) and get the reverse explanation e_m(x’)
    d) Check if e_m(x’) is inconsistent with e_m(x)
    ● by checking if e_m(x’) is in I_e
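
(Putting steps I and II together, the attack is a short loop over the model's explanations; a sketch with the trained components passed in as callables.)

```python
def find_inconsistencies(model, rev_expl, make_inconsistent, instances):
    """model(x) -> (prediction, explanation); rev_expl(x_c, e) -> x_v;
    make_inconsistent(e) -> list of statements inconsistent with e (step II.a)."""
    attacks = []
    for x_c, x_v in instances:                      # e.g. (premise, hypothesis) for e-SNLI
        _, e = model((x_c, x_v))
        inconsistent_set = make_inconsistent(e)     # II.a
        for e_target in inconsistent_set:
            new_x_v = rev_expl(x_c, e_target)       # II.b: reverse the explainer
            _, e_new = model((x_c, new_x_v))        # II.c: query m on the reverse input
            if e_new in inconsistent_set:           # II.d: inconsistency detected
                attacks.append(((x_c, x_v), e, (x_c, new_x_v), e_new))
    return attacks
```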


  63. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    High-level Approach
    (A) For an instance x and the explanation e_m(x), create a list of statements that are inconsistent with e_m(x).
    (B) For an inconsistent statement i_e created at step (A), find an input x’ such that e_m(x’) = i_e.
    Novel Adversarial Setup
    1) There are no predefined adversarial targets (label attacks do not have this issue).
    2) At step (B), the model has to generate a full target sequence: the goal is to generate the exact explanation that was identified at step (A) as inconsistent with the explanation e_m(x). Existing attacks focus on the presence/absence of a very small number of tokens in the target sequence (Cheng et al., 2020; Zhao et al., 2018).
    3) Adversarial inputs x’ do not have to be a paraphrase or a small perturbation of the original input (this can happen as a byproduct). Existing works focus on adversaries being paraphrases or minor deviations from the original input (Belinkov and Bisk, 2018).


  64. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    e-SNLI
    x = (premise, hypothesis), with x_c = premise and x_v = hypothesis. We reverse only the hypothesis.
    To create the list of inconsistent explanations for any generated explanation, we use:
    ● negation: if the explanation contains “not” or “n’t”, we delete it
    ● swapping explanations (the 3 labels are mutually exclusive) by identifying templates for each label:
    Entailment: X is a type of Y; X implies Y; X is the same as Y; X is a rephrasing of Y; X is synonymous with Y; ...
    Neutral: not all X are Y; not every X is Y; just because X does not mean Y; X is not necessarily Y; X does not imply Y; ...
    Contradiction: cannot be X and Y at the same time; X is not Y; X is the opposite of Y; it is either X or Y; ...
    If e_m(x) neither contains a negation nor fits any template, we discard it (2.6% of the e-SNLI test set was discarded).


  65. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    If e_m(x) matches a template of one label, then create the list of inconsistent statements I_e by replacing the associated X and Y in the templates of the other two labels.
    Example: e_m(x) = “Dog is a type of animal.” matches the entailment template “X is a type of Y” with X = “dog” and Y = “animal”.
    Replacing X and Y in all the neutral and contradiction templates, we obtain the list of inconsistencies:
    Neutral: not all dog are animal; not every dog is animal; just because dog does not mean animal; dog is not necessarily animal; dog does not imply animal; ...
    Contradiction: cannot be dog and animal at the same time; dog is not animal; dog is the opposite of animal; it is either dog or animal; ...
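
(A simplified sketch of step (A) for e-SNLI, combining the negation rule with template swapping; only the few templates listed above are included and the matching logic is deliberately minimal.)

```python
# Build statements logically inconsistent with a generated explanation (step A).
import re

NEUTRAL_TEMPLATES = ["not all {X} are {Y}", "{X} is not necessarily {Y}", "{X} does not imply {Y}"]
CONTRADICTION_TEMPLATES = ["{X} is not {Y}", "{X} is the opposite of {Y}", "it is either {X} or {Y}"]

def inconsistent_statements(explanation: str):
    out = []
    # Negation rule: delete an existing "not" / "n't", as on the previous slide.
    if " not " in explanation or "n't" in explanation:
        out.append(explanation.replace(" not ", " ").replace("n't", ""))
    # Template swap: if the explanation matches an entailment template, fill the
    # neutral and contradiction templates with the same X and Y.
    m = re.match(r"(?P<X>.+?) is a type of (?P<Y>.+?)\.?$", explanation.strip(), re.I)
    if m:
        x, y = m.group("X").lower(), m.group("Y").lower()
        out.extend(t.format(X=x, Y=y) for t in NEUTRAL_TEMPLATES + CONTRADICTION_TEMPLATES)
    return out

print(inconsistent_statements("Dog is a type of animal."))
```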


  66. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    ● RevExpl(premise, explanation) = hypothesis
    ○ same architecture as Expl-Pred-Att
    ○ 32.78% test accuracy (exact string match on the generated hypothesis)
    ● Manual annotation of 100 random reverse hypotheses found 82% to be realistic
    ○ the majority of unrealistic ones are due to repetition of a token
    ● Success rate of our adversarial method for finding inconsistencies: 4.51% on the e-SNLI test set
    ○ 443 distinct pairs of inconsistent explanations
    Best model from before: Expl-Pred-Att
    ● 64.27% correct explanations


  67. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.


  68. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
    @ACL’20 O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, P. Blunsom.
    Manual scanning had no success and even pointed to robust explanations
    ● first 50 instances of the test set
    ● explanations including “woman”, “prisoner”, “snowboarding”
    ● manually created adversarial inputs (Carmona et al., 2018)
    P: A bird is above water.  H: A swan is above water.  E: Not all birds are a swan.
    P: A swan is above water.  H: A bird is above water.  E: A swan is a bird.
    P: A small child watches the outside world through a window.  H: A small toddler watches the outside world through a window.  E: Not every child is a toddler.
    P: A small toddler watches the outside world through a window.  H: A small child watches the outside world through a window.  E: A toddler is a small child.
    V. Carmona et al., Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness, NAACL, 2018.


  69. Summary
    e-SNLI and e-SNLI-VE: two large datasets of NLEs
    Models with NLEs
    A glimpse into spurious correlations and NLEs
    📏 A benchmark for vision-language tasks with NLEs
    ⚖ Evaluation of automatic metrics for NLEs
    Inconsistencies of NLEs
    Adversarial attack for detecting the generation of inconsistent NLEs (novel seq2seq scenario)


  70. Open Questions
    Faithfulness
    Explanations to increase task performance
    Zero/Few-Shot learning
    Automatic evaluation
    NLEs' usefulness in increasing public trust and acceptance


  71. Thank you!
    @oanacamb
    Questions?
