
[SNLP2019] Generalized Data Augmentation for Low-Resource Translation

Shun Kiyono
September 20, 2019


Transcript

  1. Generalized Data
    Augmentation for
    Low-Resource Translation
    RIKEN AIP / Tohoku University, Inui-Suzuki Laboratory
    Shun Kiyono
    Generalized Data Augmentation for Low-Resource Translation
    Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, Graham Neubig
    Language Technologies Institute, Carnegie Mellon University
    {mengzhox, xiangk, aanastas, gneubig}@andrew.cmu.edu
    Abstract: Translation to or from low-resource languages (LRLs) poses
    challenges for machine translation in terms of both adequacy and fluency.
    [Thumbnail of Figure 1 from the paper: available vs. generated resources,
    LRL-ENG, route [1] ENG→LRL]
    Presenter
    Note: figures and tables without annotations are quoted from the paper.


  2. What kind of paper is this?
    • Background / problem
    • For low-resource language pairs, back-translation struggles to
    improve performance
    • Idea
    • Exploit data from a linguistically close high-resource language pair
    (e.g., Azerbaijani and Turkish)
    • Contributions
    • How best to use the high-resource language is non-trivial, so the
    paper exhaustively experiments with combinations of various methods
    • Substituting the high-resource language word by word into the
    low-resource language works well
    • Shows that monolingual data can be exploited by pivoting through
    the high-resource language
    Turkish and Azerbaijani are close (map annotation)


  3. Background: back-translation is powerful!
    • Back-translation (BT)
    • Uses sentences output by a reverse-direction translation model as
    new pseudo-parallel training data
    • Lets us exploit large, high-quality monolingual corpora!
    • The standard data-augmentation technique for machine translation
    • Performance scales with the amount of pseudo-parallel data
    [Figure from https://arxiv.org/abs/1808.09381 (Edunov et al., 2018):
    BLEU on newstest2012 vs. total training data (5M-29M) for back-translation
    variants (greedy, beam, top-10 sampling, beam+noise), shown alongside that
    paper's Big Transformer training details.]
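    To make the mechanism concrete, here is a minimal sketch of how
    back-translation produces pseudo-parallel data. The `reverse_model` object
    and its `translate` method are illustrative stand-ins, not the paper's code.

```python
# Minimal back-translation sketch: pair each authentic target sentence
# with a synthetic source produced by a target->source model.

def back_translate(target_monolingual, reverse_model):
    """Create pseudo-parallel data from target-side monolingual text."""
    pseudo_pairs = []
    for tgt_sentence in target_monolingual:
        # The reverse model translates target -> source.
        src_hypothesis = reverse_model.translate(tgt_sentence)
        # Synthetic source, authentic target.
        pseudo_pairs.append((src_hypothesis, tgt_sentence))
    return pseudo_pairs

# The forward model is then trained on real + pseudo pairs, e.g.:
# train(real_pairs + back_translate(mono_english, model_eng_to_lrl))
```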


  4. Background: in the low-resource case, back-translation is not so great…
    • The low-resource (LRL) setting
    • Here, "low-resource" means a few thousand to a few tens of
    thousands of sentence pairs
    • Gains from back-translation are limited
    • Sometimes performance even degrades…
    [Excerpt of Table 2 from the paper (rows 1-9):]

    Training Data                                      BLEU for X→ENG
                                                       AZE     BEL     GLG     SLK
                                                       (TUR)   (RUS)   (POR)   (CES)
    Results from Literature
       SDE (Wang et al., 2019)                         12.89   18.71   31.16   29.16
       many-to-many (Aharoni et al., 2019)             12.78   21.73   30.65   29.54
    Standard NMT
     1  {S_LE ∪ S_HE, T_LE ∪ T_HE} (supervised MT)     11.83   16.34   29.51   28.12
     2  {M_L, M_E} (unsupervised MT)                    0.47    0.18    1.15    0.75
    Standard Supervised Back-translation
     3  + {Ŝ^s_E→L, M_E}                               11.84   15.72   29.19   29.79
     4  + {Ŝ^s_E→H, M_E}                               12.46   16.40   30.07   30.60
    Augmentation from HRL-ENG
     5  + {Ŝ^s_H→L, T_HE} (supervised MT)              11.92   15.79   29.91   28.52
     6  + {Ŝ^u_H→L, T_HE} (unsupervised MT)            11.86   13.83   29.80   28.69
     7  + {Ŝ^w_H→L, T_HE} (word subst.)                14.87   23.56   32.02   29.60
     8  + {Ŝ^m_H→L, T_HE} (modified UMT)               14.72   23.31   32.27   29.55
     9  + {Ŝ^w_H→L ∪ Ŝ^m_H→L, T_HE ∪ T_HE}             15.24   24.25   32.30   30.00


  5. Idea: exploit a high-resource language pair
    • We want to make good use of a High-Resource Language (HRL)
    • However, how best to exploit the HRL is non-trivial
    • The paper exhaustively tries combinations of various methods and
    reports the results
    • (It does not propose a new methodology)
    • (Sharing the findings is the main contribution)
    • (So what does "Generalized Data Augmentation" mean?)
    [Diagram: LRL-ENG and HRL-ENG pairs]
    The HRL and LRL are assumed to be highly related in syntax and vocabulary.


  6. Overview of Generalized Data Augmentation
    [Figure 1 from the paper: available vs. generated resources, connected by
    the translation routes [1] ENG→LRL, [2] ENG→HRL, [3] HRL→LRL, and
    [4] HRL→LRL (after ENG→HRL).
    Caption: "With a low-resource language (LRL) and a related high-resource
    language (HRL), typical data augmentation scenarios use any available
    parallel data [b]…"]
    Plain back-translation struggles to improve performance…
    The paper tries combinations of these four routes ([1]-[4] in the figure)


  7. Method 1: HRL→LRL
    • A fair amount of HRL-ENG parallel data exists
    • Translating the HRL side into the LRL yields a pseudo LRL-ENG
    parallel corpus
    • Train on this mixed with the true parallel data (a sketch follows below)
    [Paper header and Figure 1 shown again; see slide 6.]
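    A minimal sketch of this augmentation route, assuming an `hrl_to_lrl`
    translation function and an `hrl_eng_pairs` bitext; both names are
    illustrative, not taken from the paper's code.

```python
# Method 1 sketch: translate only the HRL side of an HRL-ENG corpus,
# keeping the English side intact, to obtain pseudo LRL-ENG pairs.

def augment_from_hrl(hrl_eng_pairs, hrl_to_lrl):
    """Turn an HRL-ENG corpus into pseudo LRL-ENG pairs."""
    pseudo_lrl_eng = []
    for hrl_sentence, eng_sentence in hrl_eng_pairs:
        lrl_sentence = hrl_to_lrl(hrl_sentence)  # HRL -> LRL translation
        pseudo_lrl_eng.append((lrl_sentence, eng_sentence))
    return pseudo_lrl_eng

# Training then mixes real and pseudo data, e.g.:
# train(lrl_eng_pairs + augment_from_hrl(hrl_eng_pairs, hrl_to_lrl))
```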


  8. Method 2: ENG→HRL→LRL
    • Translate English into the LRL by pivoting through the HRL
    • Reuses the ENG→HRL back-translation model
    • Pseudo LRL-ENG parallel data can thus be obtained from English
    monolingual text (a sketch follows below)
    • The nice part: English monolingual corpora are practically unlimited
    [Paper header and Figure 1 shown again; see slide 6.]
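    A sketch of the pivot route, chaining the two steps above; `eng_to_hrl`
    and `hrl_to_lrl` are assumed translation functions, not the paper's code.

```python
# Method 2 sketch (ENG -> HRL -> LRL pivoting): English monolingual text
# becomes the target side; the pivoted LRL text becomes the source side.

def augment_from_eng(eng_monolingual, eng_to_hrl, hrl_to_lrl):
    """Pivot English text through the HRL to get pseudo LRL-ENG pairs."""
    pseudo_lrl_eng = []
    for eng_sentence in eng_monolingual:
        hrl_sentence = eng_to_hrl(eng_sentence)  # back-translation step
        lrl_sentence = hrl_to_lrl(hrl_sentence)  # word subst. or UMT step
        pseudo_lrl_eng.append((lrl_sentence, eng_sentence))
    return pseudo_lrl_eng
```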


  9. How do we translate HRL→LRL?
    • HRL→LRL translation is itself a low-resource problem
    • Assumption: the HRL and LRL are linguistically similar
    • So even unsupervised methods can be expected to reach reasonable
    translation quality

    → Use word-by-word substitution & unsupervised MT
    Data        Example Sentence                                                                                  Pivot BLEU
    S_LE (GLG)  Pero con todo, veste obrigado a agardar nas mans dunha serie de estraños moi profesionais.
    S_HE (POR)  Em vez disso, somos obrigados a esperar nas mãos de uma série de estranhos muito profissionais.   0.09
    Ŝ^w_H→L     En vez disso, somos obrigados a esperar nas mans de unha serie de estraños moito profesionais.    0.18
    Ŝ^m_H→L     En vez diso, somos obrigados a esperar nas mans dunha serie de estraños moi profesionais.         0.54
    T_LE        But instead, you are forced there to wait in the hands of a series of very professional strangers.

    Table 3: A POR-GLG pivoting example with corresponding pivot BLEU scores.
    Edits by word substitution or M-UMT are highlighted.

    (Callouts: Ŝ^w = word-by-word substitution; Ŝ^m = unsupervised MT)
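    As a concrete illustration of "pivot BLEU" (how close the pivoted
    pseudo-LRL sentence is to the true LRL sentence), here is a hedged sketch
    using the sacrebleu package and the Table 3 example. Note that sacrebleu
    reports scores on a 0-100 scale, while the table appears to use 0-1.

```python
# Pivot BLEU sketch: score the pivoted pseudo-LRL output (the M-UMT row
# of Table 3) against the true GLG sentence.
import sacrebleu

reference = ["Pero con todo, veste obrigado a agardar nas mans dunha "
             "serie de estraños moi profesionais."]
hypothesis = ["En vez diso, somos obrigados a esperar nas mans dunha "
              "serie de estraños moi profesionais."]

# corpus_bleu takes a list of hypotheses and a list of reference lists.
score = sacrebleu.corpus_bleu(hypothesis, [reference])
print(score.score)  # higher = pivoted text closer to the true LRL text
```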


  10. (1) Word-by-word substitution
    1. Train word vectors separately for each language, then learn a
    mapping W between the two spaces [Xing+2015]
    2. Add word pairs that are nearest neighbors in the mapped vector
    space to a dictionary
    3. Replace each word in the HRL text with its corresponding LRL word
    • Words with no corresponding entry are ignored
    (a sketch of steps 1-3 follows the excerpts below)
    [Paper excerpt] …method to obtain a bilingual dictionary between the two
    highly-related languages. Following Xing et al. (2015), we formulate the
    task of finding the optimal mapping between the source and target word
    embedding spaces as the Procrustes problem (Schönemann, 1966), which can
    be solved by singular value decomposition (SVD):

    $$\min_{W} \lVert WX - Y \rVert_F^2 \quad \text{s.t.} \quad W^\top W = I,$$

    where X and Y are the source and target word embedding spaces
    respectively. As a seed dictionary to provide supervision, we simply
    exploit identical words from the two languages. With the learned mapping
    W, we compute the distance between mapped source and target words with
    the CSLS similarity measure (Lample et al., 2018b). Moreover, to ensure
    the quality of the dictionary, a word pair is only added to the dictionary
    if both words are each other's closest neighbors. Adding an LRL word to
    the dictionary for every HRL word results in relatively poor
    per[formance]…
    [Figure from https://arxiv.org/abs/1710.04087 (Lample et al., ICLR 2018):
    toy illustration of aligning two word-embedding distributions X (English)
    and Y (Italian) by adversarially learning a rotation W, refining it via
    Procrustes over anchor points, and translating with W plus a distance
    metric.]
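    A minimal numpy sketch of steps 1-3 under stated simplifications: the
    orthogonal Procrustes solution via SVD follows the excerpt above, but
    plain cosine similarity replaces CSLS to keep the sketch short, and the
    embeddings are random toys rather than trained vectors.

```python
# Sketch: induce a bilingual dictionary with an SVD-solved Procrustes
# mapping plus mutual-nearest-neighbor filtering, then substitute words.
import numpy as np

def procrustes(X, Y):
    """Solve min_W ||WX - Y||_F  s.t.  W^T W = I  (Xing et al., 2015).

    X, Y: (dim, n_seed) matrices of seed-pair embeddings.
    """
    U, _, Vt = np.linalg.svd(Y @ X.T)
    return U @ Vt  # orthogonal mapping W

def induce_dictionary(src_emb, tgt_emb, W):
    """Keep (src, tgt) index pairs that are each other's nearest neighbor."""
    mapped = W @ src_emb
    mapped /= np.linalg.norm(mapped, axis=0, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=0, keepdims=True)
    sim = mapped.T @ tgt                    # cosine similarity matrix
    src2tgt = sim.argmax(axis=1)
    tgt2src = sim.argmax(axis=0)
    return {s: int(t) for s, t in enumerate(src2tgt) if tgt2src[t] == s}

def substitute(hrl_ids, dictionary):
    """Step 3: replace HRL word ids that have an entry; skip the rest."""
    return [dictionary.get(w, w) for w in hrl_ids]

# Toy usage: 8-dim embeddings, 50 words per language, 20 seed pairs.
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(8, 50))
tgt_emb = rng.normal(size=(8, 50))
W = procrustes(src_emb[:, :20], tgt_emb[:, :20])
dictionary = induce_dictionary(src_emb, tgt_emb, W)
pseudo_lrl = substitute([3, 1, 4, 1, 5], dictionary)
```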


  11. (2) Unsupervised MT
    • Almost identical to existing unsupervised MT methods
    • (The paper labels it "Modified UMT", but I could not work out what
    the modification is…)
    • (Probably the difference is that HRL words are first substituted
    into the LRL)
    • Losses are computed for denoising auto-encoding and iterative
    back-translation, and their weighted sum is the training objective
    (a code sketch follows the excerpt below)
    [Paper excerpt] …[word] substitution strategy (§3.1). Our initialization
    is comprised of a sequence of three steps:
    1. First, we use an induced dictionary to substitute HRL words in M_H to
    LRL ones, producing a pseudo-LRL monolingual dataset M̂_L.
    2. Second, we learn a joint word segmentation model on both M_L and M̂_L
    and apply it to both datasets.
    3. Third, we train a NMT model in an unsupervised fashion between M_L and
    M̂_L. The training objective L is a weighted sum of two loss terms for
    denoising auto-encoding and iterative back-translation:

    $$\mathcal{L} = \lambda_1 \Big[ \mathbb{E}_{x \sim M_L} \log P_{s \to s}\big(x \mid C(x)\big) + \mathbb{E}_{y \sim \hat{M}_L} \log P_{t \to t}\big(y \mid C(y)\big) \Big] + \lambda_2 \Big[ \mathbb{E}_{x \sim M_L} \log P_{t \to s}\big(x \mid u^*(y \mid x)\big) + \mathbb{E}_{y \sim \hat{M}_L} \log P_{s \to t}\big(y \mid u^*(x \mid y)\big) \Big]$$

    where u* denotes translations obtained with…
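    A hedged sketch of that objective as code: `model.nll`, the noise function
    `corrupt` (the C(.) in the equation), and `translate` are assumed
    interfaces for illustration, not the paper's implementation, and the
    lambda values are placeholders.

```python
# Weighted-sum UMT objective: denoising auto-encoding + iterative
# back-translation, as in the excerpt above. All interfaces are toys.
import torch

LAMBDA_1, LAMBDA_2 = 1.0, 1.0  # loss weights (placeholder values)

def umt_loss(model, x_batch, y_batch, corrupt, translate):
    # Denoising auto-encoding: reconstruct each sentence from its
    # corrupted version C(.) within the same language.
    dae = (model.nll(src=corrupt(x_batch), tgt=x_batch)
           + model.nll(src=corrupt(y_batch), tgt=y_batch))

    # Iterative back-translation: translate on the fly with the current
    # model (no gradient through u*), then learn to recover the original
    # sentence from its synthetic translation.
    with torch.no_grad():
        u_star_y = translate(model, x_batch, direction="s->t")
        u_star_x = translate(model, y_batch, direction="t->s")
    bt = (model.nll(src=u_star_y, tgt=x_batch)
          + model.nll(src=u_star_x, tgt=y_batch))

    return LAMBDA_1 * dae + LAMBDA_2 * bt
```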


  12. Experimental setup
    • Data: Multilingual TED corpus [Qi+2018]
    • Language pairs (LRL-HRL): AZE-TUR, BEL-RUS, GLG-POR, SLK-CES
    (see Table 1)
    • Wikipedia is used for the monolingual corpora
    • Model: Transformer (4 layers)
    • Baseline: multilingual NMT over the HRL and LRL
    Datasets        LRL (HRL)
                    AZE      BEL      GLG      SLK
                    (TUR)    (RUS)    (POR)    (CES)
    S_LE, T_LE      5.9K     4.5K     10K      61K
    S_HE, T_HE      182K     208K     185K     103K
    S_LH, T_LH      5.7K     4.2K     3.8K     44K
    M_L             2.02M    1.95M    1.98M    2M
    M_H             2M       2M       2M       2M
    M_E             2M / 200K

    Table 1: Statistics (number of sentences) of all datasets.
    [Paper excerpt, two truncated columns: "…directly translating ENG to LRL
    under the following three conditions: 1) HRL and LRL are related enough to
    allow for the induction of a high-quality bilingual dictionary; 2) There
    exists a relatively …", plus pre-processing notes: a joint segmentation
    model trained for each LRL-HRL pair with a 20K vocabulary per model,
    2M/200K sampled English sentences, and FastText embeddings. A segmentation
    sketch follows below.]
    Low resource: roughly a few thousand to a few tens of thousands of
    sentence pairs (callout)
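    A hedged sketch of the joint segmentation step mentioned in the excerpts:
    one subword model trained on the concatenated corpora of a pair and
    applied to both sides. It uses the real sentencepiece package; the file
    paths are placeholders.

```python
# Joint subword segmentation sketch: one model over both corpora of a
# pair, with the 20K vocabulary mentioned in the paper.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="lrl_corpus.txt,pseudo_lrl_corpus.txt",  # placeholder paths
    model_prefix="joint_seg",
    vocab_size=20000,
)

sp = spm.SentencePieceProcessor(model_file="joint_seg.model")
pieces = sp.encode("unha serie de estraños", out_type=str)
```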


  13. Experimental results: an impressively thorough effort
    Training Data                                         BLEU for X→ENG
                                                          AZE     BEL     GLG     SLK
                                                          (TUR)   (RUS)   (POR)   (CES)
    Results from Literature
        SDE (Wang et al., 2019)                           12.89   18.71   31.16   29.16
        many-to-many (Aharoni et al., 2019)               12.78   21.73   30.65   29.54
    Standard NMT
     1  {S_LE ∪ S_HE, T_LE ∪ T_HE} (supervised MT)        11.83   16.34   29.51   28.12
     2  {M_L, M_E} (unsupervised MT)                       0.47    0.18    1.15    0.75
    Standard Supervised Back-translation
     3  + {Ŝ^s_E→L, M_E}                                  11.84   15.72   29.19   29.79
     4  + {Ŝ^s_E→H, M_E}                                  12.46   16.40   30.07   30.60
    Augmentation from HRL-ENG
     5  + {Ŝ^s_H→L, T_HE} (supervised MT)                 11.92   15.79   29.91   28.52
     6  + {Ŝ^u_H→L, T_HE} (unsupervised MT)               11.86   13.83   29.80   28.69
     7  + {Ŝ^w_H→L, T_HE} (word subst.)                   14.87   23.56   32.02   29.60
     8  + {Ŝ^m_H→L, T_HE} (modified UMT)                  14.72   23.31   32.27   29.55
     9  + {Ŝ^w_H→L ∪ Ŝ^m_H→L, T_HE ∪ T_HE}                15.24   24.25   32.30   30.00
    Augmentation from ENG by pivoting
     10 + {Ŝ^w_E→H→L, M_E} (word subst.)                  14.18   21.74   31.72   30.90
     11 + {Ŝ^m_E→H→L, M_E} (modified UMT)                 13.71   19.94   31.39   30.22
    Combinations
     12 + {Ŝ^w_H→L ∪ Ŝ^w_E→H→L, T_HE ∪ M_E} (word subst.) 15.74   24.51   33.16   32.07
     13 + {Ŝ^w_H→L ∪ Ŝ^m_H→L, T_HE ∪ T_HE}
        + {Ŝ^w_E→H→L ∪ Ŝ^m_E→H→L, M_E ∪ M_E}              15.91   23.69   32.55   31.58

    Table 2: Evaluation of translation performance over four language pairs.
    Rows 1 and 2 show pre-training BLEU scores. Rows 3-13 show scores after
    fine-tuning. Statistically significantly best scores are highlighted
    (p < 0.05).
    (Callouts: published numbers from prior work; the baseline; the best
    data-augmentation scores — "you can get about this far")


  14. Analysis 1: using ordinary back-translation
    • ENG→LRL back-translation does not improve performance
    • It even hurts
    • Adding ENG→HRL data gives a slight improvement
    • For some corpora (BEL) the effect is limited
    • (The HRL-LRL similarity likely plays a role)
    [Table 2 shown again (rows 1-13); see slide 13.]
    Back-translation still doesn't cut it (callout)


  15. Analysis 2: results of HRL→LRL
    • Merely substituting the HRL side word by word improves performance
    dramatically
    • Unsupervised MT also helps, but only matches word substitution or
    falls below it
    • (A disappointing result, given how costly unsupervised MT is to train)
    • Combining word substitution & unsupervised MT improves performance
    further
    Word-by-word substitution is impressive (callout)
    [Table 2 shown again; see slide 13.]


  16. Analysis 3: the effect of ENG→HRL→LRL
    • Back-translating via the HRL makes it possible to exploit
    monolingual corpora effectively
    • The trend resembles the HRL→LRL case
    • word-level substitution > unsupervised MT
    Monolingual corpora also contribute to the performance gains (callout)
    [Table 2 shown again, with fragments of its caption and surrounding text;
    see slides 13 and 17.]


  17. Analysis 4: in the end, unsupervised MT doesn't pay off
    • The best performance comes from combining word-substituted HRL→LRL
    data with ENG→HRL→LRL data
    • Adding unsupervised-MT data on top of that makes performance worse…
    [Table 2 shown again; see slide 13.]
    Table 2 caption: Evaluation of translation performance over four language
    pairs. Rows 1 and 2 show pre-training BLEU scores. Rows 3-13 show scores
    after fine-tuning. Statistically significantly best scores are highlighted
    (p < 0.05).

    [Paper excerpt] …mixed fine-tuning strategy of Chu et al. (2017),
    fine-tuning the base model on the concatenation of the base and augmented
    datasets. For each setting, we perform a sufficient number of updates to
    reach convergence in terms of development perplexity. We use the
    performance on the development sets (as provided by the TED corpus) as our
    criterion for selecting the best model, both for augmentation…
    …(2019), indicating the difficulties of directly translating between LRL
    and ENG in an unsupervised fashion. Rows 3 and 4 show that standard
    supervised back-translation from English at best yields very modest
    improvements. Notable is the exception of SLK-ENG, which has more parallel
    data for training than other settings. In the case of BEL and GLG, it even
    leads to worse performance. Across…


  18. Why doesn't unsupervised MT work well?
    • The experiments consistently show word-level substitution >
    unsupervised MT
    • Is unsupervised MT actually translating properly?
    • → It does translate (pivot BLEU rises), but this does not feed into
    final performance (translation BLEU falls)
    • The authors attribute this to unsupervised MT being run after
    word-level substitution (I don't find that convincing…)
    [Figure annotations: S_HL and Ŝ^w_HL; translation quality between LRL and
    HRL; the final BLEU score]


  19. (Recap) What kind of paper is this?
    • Background / problem
    • For low-resource language pairs, back-translation struggles to
    improve performance
    • Idea
    • Exploit data from a linguistically close high-resource language pair
    (e.g., Azerbaijani and Turkish)
    • Contributions
    • How best to use the high-resource language is non-trivial, so the
    paper exhaustively experiments with combinations of various methods
    • Substituting the high-resource language word by word into the
    low-resource language works well
    • Shows that monolingual data can be exploited by pivoting through
    the high-resource language


  20. Impressions
    • The "generalized data augmentation" feel seems to be missing
    • What exactly is "generalized" here?
    • I would have liked more guideline-style information and experiments
    • Why is unsupervised MT so ineffective?
    • Its HRL→LRL translation quality is high, and yet…
    • Intuitively, higher-quality pseudo data should contribute more to
    performance
    • Could the problem be that the quality is only halfway high?
    • There are also reports that pseudo data works better when the model
    can tell it apart from real data (see the sketch after this list):
    • [Edunov+2018] Understanding Back-Translation at Scale
    • [Caswell+2019] Tagged Back-Translation
    • Figure 1 is excellent (it gives a bird's-eye view of the main ways of
    creating pseudo data)
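    A minimal sketch of the tagging idea from [Caswell+2019]: prepend a
    reserved token to synthetic sources so the model can distinguish pseudo
    data from real data. The <BT> token name is illustrative, not prescribed
    by that paper.

```python
# Tagged back-translation sketch: mark each synthetic source sentence
# with a reserved token; real pairs are left untagged.

BT_TAG = "<BT>"  # illustrative reserved token

def tag_pseudo_pairs(pseudo_pairs):
    """Prepend a tag to the synthetic source side of each pseudo pair."""
    return [(f"{BT_TAG} {src}", tgt) for src, tgt in pseudo_pairs]

# Both sets are then mixed for training, e.g.:
# train(real_pairs + tag_pseudo_pairs(back_translated_pairs))
```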
