
20191102_ACL2019_adversarial_examples_in_NLP_YoheiKIKUTA

yoppe
November 02, 2019

Transcript

  1. ACL2019

    Adversarial Examples in NLP
    2019/11/02

    @yohei_kikuta


  2. Papers covered
    • Generating Natural Language Adversarial Examples through
    Probability Weighted Word Saliency (long)

    • Generating Fluent Adversarial Examples for Natural Languages
    (short)

    • Robust Neural Machine Translation with Doubly Adversarial
    Inputs (long)

    • Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training,
    and Model Development for Multi-Hop QA (long)

  3. Paper notes
    • Generating Natural Language …:

    https://github.com/yoheikikuta/paper-reading/issues/41

    • Generating Fluent Adversarial …:

    https://github.com/yoheikikuta/paper-reading/issues/42

    • Robust Neural Machine …:

    https://github.com/yoheikikuta/paper-reading/issues/43

    • Avoiding Reasoning Shortcuts …:

    https://github.com/yoheikikuta/paper-reading/issues/44

  4. Adversarial ○○○
    • Adversarial Network
    Typically trains a generator and a discriminator against each other.
    This makes it possible to build high-performance generators, from which many extensions followed.

    • Adversarial Example
    Perturbs the input data so that the model misclassifies it.
    The interest is in the construction itself and in its properties (especially active in image recognition).

    • Adversarial Training
    Uses adv. examples as a regularizer.
    In images, to defend against adv. examples; in NLP, for generalization performance.

  5. Classifying adversarial examples
    White box

    The model structure and gradient information are available.

    Black box

    Only the model's inputs and outputs are available.

    ※ There is also a targeted / non-targeted distinction (whether or not the attack forces a specific wrong class).

    [Figure: input → output (softmax) diagrams for the two settings]

  6. Why adv. examples are hard in NLP
    • Unlike images, text is discrete

    ・"Slightly changing" the input is not a natural operation.

    ・The embedding space is differentiable, but extra work is needed to map a perturbation back to an input.

    • Unlike images, even small perturbations are easy for humans to notice

    ・Vocabulary errors stand out (e.g., mood → mooP).

    ・Grammar errors stand out (e.g., I was … → I is …).

    • It is easy to end up changing the meaning (e.g., knight → night)

  7. Typical adv. examples in NLP
    • Creating adv. examples

    HotFlip: character-based flips (e.g., moo"d" → moo"P")

    Genetic attack: a genetic algorithm based on word substitution

    (e.g., A "runner" wants… → A "racer" wants…)

    • Use in adv. training

    VAT: train on inputs perturbed in the embedding space

    (A minimal sketch of the word-substitution family of attacks follows below.)
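    As a rough illustration of the word-substitution family these attacks belong to, here is a hedged sketch of a greedy variant. It is not any one paper's algorithm; `model_prob(words, label)` (the victim classifier's probability for the true label) and `synonyms(word)` (a WordNet-style candidate source) are hypothetical helpers.

```python
# Minimal sketch of a greedy word-substitution attack (illustrative only).
def greedy_substitution_attack(words, label, model_prob, synonyms, max_swaps=10):
    """Swap one word at a time, keeping the swap that most lowers the
    probability of the true label, until the prediction flips."""
    words = list(words)
    for _ in range(max_swaps):
        base = model_prob(words, label)
        if base < 0.5:          # binary case: the prediction has flipped
            return words
        best_drop, best_pos, best_word = 0.0, None, None
        for i, w in enumerate(words):
            for cand in synonyms(w):
                trial = words[:i] + [cand] + words[i + 1:]
                drop = base - model_prob(trial, label)
                if drop > best_drop:
                    best_drop, best_pos, best_word = drop, i, cand
        if best_pos is None:    # no substitution lowers the probability
            break
        words[best_pos] = best_word
    return words
```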

  8. adv. examples in NLP at ACL2019
    • Creating adv. examples: making the choice of words in word substitution more efficient

    ・determine the replacement priority by combining word saliency with the predicted probability

    ・introduce a language model and use Metropolis-Hastings sampling

    • Use in adv. training: settings more specialized to the problem

    ・in machine translation, perturb both the encoder and the decoder inputs

    ・perturbations that prevent the 1-hop shortcuts present in 2-hop QA data

  9. Generating Natural Language Adversarial Examples through
    Probability Weighted Word Saliency (long)

    word-based, non-targeted, white box attack

  10. Summary
    [Figure: in "… was funny as …" (w_{i−1}, w_i, w_{i+1}), candidates such as mirthful / laughable / … are tried for w_i, giving "… was laughable as …"; candidates such as exist / equally are then tried at the next position, giving "… was laughable equally …"]

    Synonyms are collected from WordNet; for a Named Entity, the vocabulary set D − D_{y_true} is used instead.

    Among the synonyms, choose the one that brings the model closest to a mistake:

    ω*_i = arg max_{ω′_i} [ P(y_true | x) − P(y_true | x′_i) ]

    Positions are then scored and ordered by the saliency and the probability change combined:

    softmax(S(x))_i ⋅ ΔP*_i

    S(x, ω_i) = P(y_true | x) − P(y_true | x̂_i)
    ΔP*_i = P(y_true | x) − P(y_true | x*_i)

    where x̂_i replaces the i-th word with unknown and x*_i replaces it with ω*_i.

    Words are substituted in this order until the model's prediction flips.
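    A hedged Python sketch of this scoring, under stated assumptions: `prob(words, label)` returns the victim model's probability of the true label, and `best_synonym(words, i, label)` implements the arg max above; both are hypothetical helpers, and this is a reading of the slide rather than the authors' code.

```python
import math

UNK = "<unk>"

def pwws_order(words, label, prob, best_synonym):
    """Return (position, substitute) pairs in PWWS priority order."""
    base = prob(words, label)
    saliency, delta_p, best = [], [], []
    for i in range(len(words)):
        # word saliency: probability drop when the i-th word becomes unknown
        masked = words[:i] + [UNK] + words[i + 1:]
        saliency.append(base - prob(masked, label))
        # probability drop for the best synonym substitution at position i
        cand = best_synonym(words, i, label)
        swapped = words[:i] + [cand] + words[i + 1:]
        delta_p.append(base - prob(swapped, label))
        best.append(cand)
    # probability-weighted word saliency: softmax(saliency)_i * ΔP*_i
    z = sum(math.exp(s) for s in saliency)
    scores = [math.exp(s) / z * d for s, d in zip(saliency, delta_p)]
    order = sorted(range(len(words)), key=lambda i: -scores[i])
    return [(i, best[i]) for i in order]
```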

  11. Experimental results
    The proposed method is Probability Weighted Word Saliency (PWWS).

    Classification accuracy of each selected model on the original three datasets and the perturbed datasets under the different attacking methods (the Original column is accuracy on unperturbed samples; lower accuracy means a more effective attack):

    Dataset         Model        Original  Random  Gradient  TiWO    WS      PWWS
    IMDB            word-CNN     86.55%    45.36%  37.43%    10.00%  9.64%   5.50%
    IMDB            Bi-dir LSTM  84.86%    37.79%  14.57%    3.57%   3.93%   2.00%
    AG's News       char-CNN     89.70%    67.80%  72.14%    58.50%  62.45%  56.30%
    AG's News       word-CNN     90.56%    74.13%  73.63%    60.70%  59.70%  56.72%
    Yahoo! Answers  LSTM         92.00%    74.50%  73.80%    62.50%  62.50%  53.00%
    Yahoo! Answers  word-CNN     96.01%    82.09%  80.10%    69.15%  66.67%  57.71%

    Word replacement rate of each attacking method (the lower the rate, the better the attack retains the semantics of the text):

    Dataset         Model        Random  Gradient  TiWO    WS      PWWS
    IMDB            word-CNN     22.01%  20.53%    15.06%  14.38%  3.81%
    IMDB            Bi-dir LSTM  17.77%  12.61%    4.34%   4.68%   3.38%
    AG's News       char-CNN     27.43%  27.73%    26.46%  21.94%  18.93%
    AG's News       word-CNN     22.22%  22.09%    20.28%  20.21%  16.76%
    Yahoo! Answers  LSTM         40.86%  41.09%    37.14%  39.75%  35.10%
    Yahoo! Answers  word-CNN     31.68%  31.29%    30.06%  30.42%  25.43%

    Accuracy on each dataset:

    ・{2, 4, 10}-class classification, respectively

    ・high attack success rate on the binary IMDB task

    ・improves on the previous methods

    Proportion of replaced words:

    ・fewer replacements keep the text closer to the original, which is desirable

    ・improves on the previous methods

    Result tables quoted from https://www.aclweb.org/anthology/P19-1103/

  12. Concrete examples
    Examples from the IMDB dataset with the Bi-directional LSTM model (the original word is shown with its adversarial substitution in parentheses):

    Original Prediction: Positive (Confidence = 96.72%) → Adversarial Prediction: Negative (Confidence = 74.78%)
    "Ah man this movie was funny (laughable) as hell, yet strange. I like how they kept the shakespearian language in this movie, it just felt ironic because of how idiotic the movie really was. this movie has got to be one of troma's best movies. highly recommended for some senseless fun!"

    Original Prediction: Negative (Confidence = 72.40%) → Adversarial Prediction: Positive (Confidence = 69.03%)
    "The One and the Only! The only really good description of the punk movement in the LA in the early 80's. Also, the definitive documentary about legendary bands like the Black Flag and the X. Mainstream Americans' repugnant views about this film are absolutely hilarious (uproarious)! How can music be SO diversive in a country of supposed liberty...even 20 years after... find out!"

    An AG's News example:

    Original Prediction: Business (Confidence = 91.26%) → Adversarial Prediction: Sci/Tech (Confidence = 33.81%)
    "site security gets a recount at rock the vote. grassroots movement to register younger voters leaves publishing (publication) tools accessible"

    IMDB is easy, so replacing a few words is enough; the other datasets require many more replacements.

    Concrete examples quoted from https://www.aclweb.org/anthology/P19-1103/

  13. Generating Fluent Adversarial Examples for Natural Languages
    (short)

    word-based, targeted, white/black box attack

  14. Summary
    Metropolis-Hastings sampling over sentences, e.g. x: "empty trash cans …" → x′: "the trash cans …", with acceptance probability

    α(x′|x) = min { 1, [ π(x′) g(x|x′) ] / [ π(x) g(x′|x) ] }

    The stationary distribution is defined with a language model LM and the classifier C:

    π(x | ỹ) ∝ LM(x) ⋅ C(ỹ | x)

    The proposal distribution g(x′|x) is defined by word replacement, insertion, and deletion:

    g(x′|x) = p_r T^r_B(x′|x) + p_i T^i_B(x′|x) + p_d T^d_B(x′|x)

    T^r_B(x′|x) = π(ω_1, …, ω_{m−1}, ω_c(∈ Q), ω_{m+1}, …, ω_n | ỹ) / Σ_{ω∈Q} π(ω_1, …, ω_{m−1}, ω, ω_{m+1}, …, ω_n | ỹ)

    T^i_B(x′|x) inserts a random word and then applies replacement.

    T^d_B(x′|x) = 1 if x′ = x_{−m} (the sentence with the m-th word removed).

    Q is the set of the top-n words under a score S defined with forward and backward language models:

    Black box: S_B(ω|x) = LM(ω | x_{[1:m−1]}) ⋅ LM_back(ω | x_{[m+1:n]})

    White box: S_W(ω|x) = S_B(ω|x) ⋅ sim( ∂loss/∂e_m, e_m − e )

    ※ In the white box setting, insertion and deletion provide no usable gradient information, so only replacement is used.
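    For intuition, a minimal sketch of one M-H step under these definitions. Both helpers are hypothetical stand-ins, not the paper's code: `pi` is the unnormalized stationary density LM(x)·C(ỹ|x), and `propose` applies one replace/insert/delete move and reports the forward and backward proposal densities.

```python
import random

def mh_step(words, pi, propose):
    """Accept or reject one proposed word-level edit."""
    cand, g_fwd, g_bwd = propose(words)           # x', g(x'|x), g(x|x')
    ratio = (pi(cand) * g_bwd) / (pi(words) * g_fwd)
    alpha = min(1.0, ratio)                       # acceptance probability
    return cand if random.random() < alpha else words
```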

  15. Experimental results
    Success rates of the adv. attacks (b-: black box, w-: white box):

    ・Invok# is the number of model invocations (fewer is better)

    ・PPL is the language-model perplexity (smaller = more fluent, the authors argue)

    ・α is the Metropolis-Hastings acceptance ratio

    [Figure 3: invocation-success curves of the attacks on (a) IMDB and (b) SNLI]

    Task  Approach  Succ(%)  Invok#  PPL    α(%)
    IMDB  Genetic   98.7     1427.5  421.1  –
    IMDB  b-MHA     98.7     1372.1  385.6  17.9
    IMDB  w-MHA     99.9     748.2   375.3  34.4
    SNLI  Genetic   76.8     971.9   834.1  –
    SNLI  b-MHA     86.6     681.7   358.8  9.7
    SNLI  w-MHA     88.6     525.0   332.4  13.3

    Success rates of adv. attacks against adversarially trained models:

    Model                   Genetic  b-MHA  w-MHA   (attack succ %)
    Victim model            98.7     98.7   99.9
    + Genetic adv training  93.8     99.6   100.0
    + b-MHA adv training    93.0     95.7   99.7
    + w-MHA adv training    92.4     97.5   100.0

    ・the dataset is IMDB

    ・with the proposed method, even the earlier adv. attacks can be defended against a little

    ・fundamentally, though, adv. training has only a slight effect here

    Does adv. training improve the model's generalization (accuracy after adversarial training)?

    Model                   Train # = 10K  30K   100K
    Victim model            58.9           65.8  73.0
    + Genetic adv training  58.8           66.1  73.6
    + w-MHA adv training    60.0           66.9  73.5

    ・the dataset is SNLI

    ・the previous method helps only where training data is plentiful

    ・the proposed method also helps where training data is scarce

    Result tables quoted from https://www.aclweb.org/anthology/P19-1559/

  16. Concrete examples
    Adversarial examples generated on SNLI (the substituted words are quoted):

    Case 1
    Premise: three men are sitting on a beach dressed in orange with refuse carts in front of them.
    Hypothesis: empty trash cans are sitting on a beach. Prediction: <Contradiction>
    Genetic: "empties" trash cans are sitting on a beach. Prediction: <Entailment>
    b-MHA: "the" trash cans are sitting "in" a beach. Prediction: <Entailment>
    w-MHA: "the" trash cans are sitting on a beach. Prediction: <Entailment>

    Case 2
    Premise: a man is holding a microphone in front of his mouth.
    Hypothesis: a male has a device near his mouth. Prediction: <Entailment>
    Genetic: a "masculine" has a device near his mouth. Prediction: <Neutral>
    b-MHA: a man has a device near his "car". Prediction: <Neutral>
    w-MHA: a man has a device near his "home". Prediction: <Neutral>

    Concrete examples quoted from https://www.aclweb.org/anthology/P19-1559/

  17. Robust Neural Machine Translation with Doubly Adversarial Inputs
    (long)

    (sub)word-based, non-targeted, white box attack

  18. Summary
    Overall loss:

    ℒ(θ) = ℒ_clean(θ_mt) + ℒ_lm(θ^x_lm) + ℒ_robust(θ_mt) + ℒ_lm(θ^y_lm)

    Encoder side: source x = x_1, …, x_i, …, x_I with embeddings e(x) = e(x_1), …, e(x_i), …, e(x_I).
    Decoder side: target y = y_1, …, y_j, …, y_J and decoder input z = z_1, …, z_j, …, z_J with embeddings e(z).

    P(y | x; θ_mt) = Π^J_{j=1} P(y_j | z_{≤j}, h; θ_mt)

    AdvGen: replace a fixed fraction of the words (x_i → x′_i on the encoder side, z_j → z′_j on the decoder side)

    ・candidate vocabulary restricted based on Q

    ・replacement word chosen based on the translation loss

    Q_src(x_i, x) = P_lm(x_i | x_{<i}, x_{>i}; θ^x_lm)

    Q_trg(z_i, z) = λ P_lm(z_i | z_{<i}, z_{>i}; θ^y_lm) + (1 − λ) P(z_i | z_{<i}, x′; θ_mt)

    ∙′_i = arg max_{∙∈Q} sim( e(∙) − e(∙_i), ∇_{e(∙_i)}(−log P(y | ∙; θ_mt)) )  where ∙ ∈ {x, z}

    ℒ_clean(θ_mt) = (1/|S|) Σ_{(x,y)∈S} −log P(y | x; θ_mt)

    ℒ_robust(θ_mt) = (1/|S|) Σ_{(x,y)∈S} −log P(y | x′, z′; θ_mt)
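    A minimal sketch of the arg max replacement rule above, under assumptions: `emb` (a word-to-vector map) and `grad_i` (the gradient of the translation loss with respect to the embedding at the position being replaced) are hypothetical stand-ins, not the paper's code.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity with a small epsilon for numerical safety
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def advgen_replace(word_i, candidates, emb, grad_i):
    """Return the candidate maximizing sim(e(w) - e(x_i), grad_i)."""
    return max(candidates, key=lambda w: cosine(emb[w] - emb[word_i], grad_i))
```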

  19. Experimental results
    Comparison with various baselines on NIST Chinese-English translation:

    ・the metric is BLEU scores

    ・the top row is the vanilla Transformer

    ・* indicates the method trained using an extra corpus (back-translation)

    Method                    Model              MT06   MT02   MT03   MT04   MT05   MT08
    Vaswani et al. (2017)     Trans.-Base        44.59  44.82  43.68  45.60  44.57  35.07
    Miyato et al. (2017)      Trans.-Base        45.11  45.95  44.68  45.99  45.32  35.84
    Sennrich et al. (2016a)   Trans.-Base        44.96  46.03  44.81  46.01  45.69  35.32
    Wang et al. (2018)        Trans.-Base        45.47  46.31  45.30  46.45  45.62  35.66
    Cheng et al. (2018)       RNMT_lex.          43.57  44.82  42.95  45.05  43.45  34.85
    Cheng et al. (2018)       RNMT_feat.         44.44  46.10  44.07  45.61  44.06  34.94
    Cheng et al. (2018)       Trans.-Base_feat.  45.37  46.16  44.41  46.32  45.30  35.85
    Cheng et al. (2018)       Trans.-Base_lex.   45.78  45.96  45.51  46.49  45.73  36.08
    Sennrich et al. (2016b)*  Trans.-Base        46.39  47.31  47.10  47.81  45.69  36.43
    Ours                      Trans.-Base        46.95  47.06  46.48  47.39  46.58  37.38
    Ours + BackTranslation*   Trans.-Base        47.74  48.13  47.83  49.13  49.04  38.61

    WMT'14 English-German translation:

    ・a smaller margin than for Chinese-English, but the method still helps

    Method          Model        BLEU
    Vaswani et al.  Trans.-Base  27.30
    Vaswani et al.  Trans.-Big   28.40
    Chen et al.     RNMT+        28.49
    Ours            Trans.-Base  28.34
    Ours            Trans.-Big   30.01

    Result tables quoted from https://www.aclweb.org/anthology/P19-1425/

  20. Concrete examples
    Results on noisy input rather than adv. examples

    (randomly chosen words are simply replaced with words that are close in embedding space)

    Input & Noisy Input: [Chinese source sentence; garbled in extraction]
    Reference: this expressed the relationship of close friendship and cooperation between China and Russia and between our parliaments.
    Vaswani et al. on Input: this reflects the close friendship and cooperation between China and Russia and between the parliaments of the two countries.
    Vaswani et al. on Noisy Input: this reflects the close friendship and cooperation between the two countries and the two parliaments.
    Ours on Input: this reflects the close relations of friendship and cooperation between China and Russia and between their parliaments.
    Ours on Noisy Input: this embodied the close relations of friendship and cooperation between China and Russia and between their parliaments.

    (Table 5 in the paper: comparison of translation results of Transformer and the proposed model for an input and its perturbed input.)

    Concrete examples quoted from https://www.aclweb.org/anthology/P19-1425/

  21. Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training,
    and Model Development for Multi-Hop QA (long)

    A dataset-specific adv. attack that does not depend on the model

  22. Summary
    [Figure 1: a HotpotQA example with a reasoning shortcut, and the adversarial document that eliminates this shortcut to necessitate multi-hop reasoning. Question: "What was the father of Kasper Schmeichel voted to be by the IFFHS in 1992?" The golden reasoning-chain docs cover Kasper Schmeichel (son of Peter Schmeichel) and Peter Bolesław Schmeichel (voted the IFFHS World's Best Goalkeeper in 1992 and 1993); distractor docs cover Pelé and Kasper Hvidt; the adversarial doc describes "R. Bolesław Kelly", voted the IFFHS World's Best Defender in 1992 and 1993. Prediction: World's Best Goalkeeper (correct); prediction under adversary: IFFHS World's Best Defender.]

    HotpotQA questions were originally meant to be answered in stages:

    Kasper → (son of) → Peter → (voted as) → world's best GK

    In practice the answer can be produced just by matching the question text (a shortcut);

    a sampled inspection found that around half of the examples contain such a shortcut.

    An adv. Doc with the same shortcut structure, which leaves the original answer unchanged, is added:

    ・fetch a word close to the original answer in GloVe space and substitute it

    ・replace the title with one from another example so that the answer is not contradicted

    ・also pull in the original passage belonging to the newly used title

    ・swap the parts of the original passage that would affect the answer with generated answer-neutral sentences

    ・in the figure, the red-boxed sentence and the R. Bolesław Kelly sentence are swapped with sentences that do not affect the answer (input sentences not shown here)

    Using this data as the dev set sharply degrades model performance

    (showing that models had been answering via the shortcut).

    Building on this, a model that explicitly incorporates 2-hop reasoning is also proposed.

    Concrete examples quoted from https://www.aclweb.org/anthology/P19-1262/
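    A very rough sketch of the adversarial-document recipe above, mirroring the bullet points only. Every helper here is a hypothetical stand-in (`glove_neighbors`, `titles_pool`, `neutral_rewrite`), not the authors' code.

```python
def make_adversarial_doc(gold_text, answer, titles_pool,
                         glove_neighbors, neutral_rewrite):
    """Build a document that mimics the shortcut's surface form but can
    no longer support the original answer."""
    # step 1: substitute a GloVe-nearby word for the original answer
    fake_answer = glove_neighbors(answer)[0]
    text = gold_text.replace(answer, fake_answer)
    # step 2: take a title from another example so the real answer
    #         is not contradicted
    fake_title = titles_pool.pop()
    # steps 3-4: swap answer-relevant sentences for answer-neutral ones
    text = neutral_rewrite(text)
    return {"title": fake_title, "text": text}
```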

  23. Proposed model
    [Figure 3: a 2-hop bi-attention model with a control unit. Question and context pass through word/char embeddings, RNNs, bi-attention, and self-attention; the Context2Query attention is modeled as in Seo et al. (2017), and the output distribution cv of the control unit is used to bias the Query2Context attention. Bridge-entity supervision and start/end-index RNNs sit on top.]

    Details are omitted here, but at hop i the control unit adjusts which part of the question to attend to while taking the context into account. It imitates how a human answers a multi-step question: for the example in Fig. 1, a reader first looks for the name of "Kasper Schmeichel's father", then locates the answer by finding what "Peter Schmeichel" was "voted to be by the IFFHS in 1992". With S, J the lengths of the question and context, c_{i−1} the recurrent control state, u the contextualized question representation, and q the question vector, the control unit outputs a distribution cv over the question words and updates its state:

    cq_i = Proj[c_{i−1}; q];  ca_{i,s} = Proj(cq_i ⊙ u_s)
    cv_{i,s} = softmax(ca_{i,s});  c_i = Σ^S_{s=1} cv_{i,s} ⋅ u_s

    where Proj is a linear projection layer and ⊙ is element-wise multiplication.

    The query-to-context attention vector is derived as:

    m_j = max_{1≤s≤S} M_{s,j};  p_j = exp(m_j) / Σ^J_{j=1} exp(m_j);  q_c = Σ^J_{j=1} p_j h_j

    The question-aware context representation then passes through another BiLSTM (; is concatenation):

    h⁰_j = [h_j; cq_j; h_j ⊙ cq_j; cq_j ⊙ q_c];  h¹ = BiLSTM(h⁰)

    Self-attention is modeled on h¹ as BiAttn(h¹, h¹) to produce h²; a linear projection of h² gives the start-index logits for span prediction, the end-index logits come from h³ = BiLSTM(h²) followed by a linear projection, and a 3-way classifier on h³ predicts the answer type.

    Output heads:

    ・sentence-level supporting-facts prediction: predict whether each sentence is a supporting fact

    ・bridge-entity supervision: predict the entity that connects the supporting facts

    ・text span prediction: predict the answer

    Figure quoted from https://www.aclweb.org/anthology/P19-1262/
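    A hedged numpy sketch of the control-unit update shown above. The shapes are assumptions: u is (S, d) contextualized question words, q is (d,) the question vector, c_prev is (d,) the previous control state, and W_cq (d, 2d) and W_ca (d,) stand in for the two linear projections.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def control_unit(c_prev, q, u, W_cq, W_ca):
    cq = W_cq @ np.concatenate([c_prev, q])   # cq_i = Proj[c_{i-1}; q]
    ca = (cq * u) @ W_ca                      # ca_{i,s} = Proj(cq_i ⊙ u_s)
    cv = softmax(ca)                          # distribution over question words
    c = cv @ u                                # c_i = Σ_s cv_{i,s} · u_s
    return c, cv
```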

  24. Experimental results
    Results using adv. examples in train and dev respectively:

    ・the metric is Exact Match (EM)

    ・"sp" is sentence-level supporting-facts prediction

    ・models trained on the regular data degrade badly on the adv. dev set

    　(they break down once the shortcut is no longer usable)

    ・the proposed 2-hop model performs well

    Train            Reg    Reg    Adv    Adv
    Eval             Reg    Adv    Reg    Adv
    1-hop Base       42.32  26.67  41.55  37.65
    1-hop Base + sp  43.12  34.00  45.12  44.65
    2-hop            47.68  34.71  45.71  40.72
    2-hop + sp       46.41  32.30  47.08  46.87

    Table 1: EM scores after training on the regular data or on the adversarial training set ADD4DOCS-RAND, and evaluation on the regular dev set or the ADD4DOCS-RAND adv-dev set. "1-hop Base" and "2-hop" do not have sentence-level supporting-facts supervision.

    Adv. training validated on various adv. dev sets:

    ・each original example consists of 10 paragraphs

    ・{4 or 8} of them are made adversarial

    ・2 of the 10 are needed to derive the answer and are always kept

    ・the adversarial paragraphs are either randomly inserted or prepended {R or P}

    Model            A4D-R  A4D-P  A8D-R  A8D-P
    1-hop Base       37.65  37.72  34.14  34.84
    1-hop Base + sp  44.65  44.51  43.42  43.59
    2-hop            40.72  41.03  37.26  37.70
    2-hop + sp       46.87  47.14  44.28  44.44

    Table 2: EM scores on the 4 adversarial evaluation settings after training on ADD4DOCS-RAND. "-R" and "-P" represent random insertion and prepending; A4D and A8D stand for the ADD4DOCS and ADD8DOCS adv-dev sets.

    An ablation study validates the added components:

    ・both the control unit and bridge-entity supervision are effective

    Train           Regular  Regular  Adv      Adv
    Eval            Regular  Adv      Regular  Adv
    2-hop           47.68    34.71    45.71    40.72
    2-hop - Ctrl    46.12    32.46    45.20    40.32
    2-hop - Bridge  43.31    31.80    41.90    37.37
    1-hop Base      42.32    26.67    41.55    37.65

    Table 3: Ablation for the control unit and bridge-entity supervision (EM scores after training on the regular or adversarial ADD4DOCS-RAND data). Note that 1-hop Base is the same as 2-hop without both the control unit and bridge-entity supervision.

    Result tables quoted from https://www.aclweb.org/anthology/P19-1262/

  25. Paper notes (recap)
    • Generating Natural Language …:

    https://github.com/yoheikikuta/paper-reading/issues/41

    • Generating Fluent Adversarial …:

    https://github.com/yoheikikuta/paper-reading/issues/42

    • Robust Neural Machine …:

    https://github.com/yoheikikuta/paper-reading/issues/43

    • Avoiding Reasoning Shortcuts …:

    https://github.com/yoheikikuta/paper-reading/issues/44