機械翻訳コンペティション参加報告 (Report on Participating in a Machine Translation Competition)

Shun Kiyono
February 26, 2021

Presentation slides for the 6th Patent Information Symposium.


Transcript

  1. Report on Participating in a Machine Translation Competition
     Shun Kiyono
     RIKEN Center for Advanced Intelligence Project (AIP)
     Goal-Oriented Technology Research Group, Natural Language Understanding Team

  2. Acknowledgments
     • I thank Jun Suzuki, Sosuke Kobayashi, and Takumi Ito of Tohoku University for
       their help in shaping the ideas for this talk and polishing the slides.

  3. About me
     • Background
       • 2013-2019: Tohoku University (Bachelor's and Master's)
       • 2019- : RIKEN Center for Advanced Intelligence Project (AIP)
       • 2020- : Tohoku University (doctoral program)
     • Research so far
       • Abstractive summarization [BlackboxNLP 2018], [PACLIC 2018]
       • Fast, large-scale semi-supervised learning [AAAI 2019]
       • Grammatical error correction [EMNLP 2019], [TASLP 2020]
       • Machine translation?
     The core technology behind machine translation is shared across many tasks, so
     experience from other tasks carries over. In other words, as long as you have
     the enthusiasm, you'll be fine!

  4. Today's talk
     ① An introduction to the machine translation competition WMT (about 10% of the talk)
     ② The three things we learned by participating as the Tohoku University /
       RIKEN AIP / NTT CS Labs team (about 90% of the talk)

  5. The competition we entered: WMT
     • WMT: originally a workshop on machine translation
       • It became a full conference a few years ago
     • A variety of shared-task competitions are run alongside it:
       • News article translation
       • Translation of sentences from unknown domains
       • Unsupervised machine translation
       • Translation of biological and medical documents
       • Translation of chat messages
     We entered the news translation task, the one with the longest history and the
     fiercest competition.

  6. How the competition works (the Japanese-English case)
     ① System building (done by the team members)
       • Resources: a Japanese-English parallel corpus, a Japanese monolingual corpus,
         an English monolingual corpus, and the test data from previous years
       • Dataset preparation and preprocessing, then trial and error on a huge number of
         GPUs (100+ cards) until the system is complete
     ② System evaluation
       • The system translates the official test data (Japanese sentences → translations)
       • Outputs are judged by automatic evaluation (BLEU) and by human evaluation
         (a toy BLEU-scoring sketch follows this slide)
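     The automatic-evaluation step is essentially corpus-level BLEU. As a rough
     illustration (not the organizers' official pipeline), the sketch below scores a file
     of system outputs against a reference file with the sacreBLEU library; the file
     names are placeholders.

```python
# Minimal sketch of corpus-level BLEU scoring with sacreBLEU
# (hypothetical file names; WMT runs its own evaluation pipeline).
import sacrebleu

# One system output and one reference per line, aligned by line number.
with open("system_output.en") as f:
    hypotheses = [line.strip() for line in f]
with open("reference.en") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes the hypothesis list and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```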

  7. The participating teams
     • Us: the Tohoku University / RIKEN AIP / NTT team
       • Members: Ryuto Konno, Shun Kiyono, Takumi Ito, Makoto Morishita, Jun Suzuki
       • Affiliations: Tohoku University, RIKEN AIP, and NTT Communication Science
         Laboratories, which are regulars near the top of competitions such as WAT and
         WMT (including a top finish at WAT)
     • Other participants included Kyoto University, NICT, DeepMind, Facebook, the
       University of Edinburgh, NAVER, OPPO, Tencent, WeChat, DiDi, and more.

  8. Results: top ranks on the automatic evaluation metric (BLEU)
     German→English (1st place)
       Team                 BLEU
       Tohoku-AIP-NTT       43.8
       Huoshan_Translate    43.5
       OPPO                 43.2
       UEDIN                42.3
       Online-B             41.9
     English→German (1st place)
       Team                 BLEU
       Tohoku-AIP-NTT       38.8
       Tencent_Translation  38.6
       OPPO                 38.6
       Huoshan_Translate    38.2
       eTranslation         37.9
     Japanese→English (2nd place)
       Team                 BLEU
       NiuTrans             26.7
       Tohoku-AIP-NTT       25.5
       OPPO                 24.8
       NICT_Kyoto           22.8
       eTranslation         22.2
     English→Japanese (4th place)
       Team                 BLEU
       NiuTrans             28.4
       OPPO                 27.3
       ENMT                 25.9
       Tohoku-AIP-NTT       25.8
       NICT_Kyoto           23.9

  9. Human evaluation: first place in every language pair we entered
     [Excerpt from the official WMT20 News Translation Task human-evaluation results
     (Tables 12 and 13 of the Findings paper): systems are ordered by average z-score,
     systems within a cluster are considered tied (clusters determined by Wilcoxon
     rank-sum tests), and grayed entries use resources outside the constraints.
     Tohoku-AIP-NTT appears among the top-ranked systems for Japanese→English,
     German→English, English→German, and English→Japanese.]
     (When there is no statistically significant difference in the human evaluation,
     systems are treated as tied for first place.)

  10. Also strong on the linguistic-phenomena test suite
      A test suite on how systems handle linguistic phenomena such as multi-word
      expressions, named entities, function words, and verb tense [Avramidis+2020]:

      category                     items  Tohoku  Huoshan  UEdin  Onl-B  Onl-G  Onl-A  PROMT
      Ambiguity                       81    82.7     77.8   72.8   79.0   84.0   76.5   64.2
      Composition                     49    98.0     98.0   93.9   93.9   95.9   93.9   89.8
      Coordination & ellipsis         78    89.7     91.0   89.7   91.0   85.9   87.2   87.2
      False friends                   36    72.2     80.6   72.2   80.6   77.8   69.4   72.2
      Function word                   72    86.1     80.6   86.1   90.3   90.3   83.3   88.9
      LDD & interrogatives           174    89.1     86.2   85.1   83.3   86.8   77.6   81.0
      MWE                             80    80.0     75.0   71.3   77.5   77.5   71.3   70.0
      Named entity & terminology      89    92.1     84.3   87.6   82.0   82.0   88.8   87.6
      Negation                        20   100.0    100.0  100.0  100.0  100.0   95.0  100.0
      Non-verbal agreement            61    91.8     88.5   88.5   86.9   90.2   83.6   82.0
      Punctuation                     60    96.7     98.3   98.3   71.7   61.7  100.0   98.3
      Subordination                  180    90.6     88.3   91.1   91.1   92.2   88.9   90.0
      Verb tense/aspect/mood        4447    84.6     85.3   80.3   75.9   79.6   77.5   75.1
      Verb valency                    87    79.3     81.6   77.0   81.6   77.0   77.0   71.3
      micro-average                 5514    85.3     85.4   81.2   77.7   80.6   78.7   76.5
      macro-average                 5514    88.1     86.8   85.3   84.6   84.3   83.6   82.7
      BLEU                                  43.8     43.5   42.3   41.9   41.4   40.4   39.6

      (Table 5 of [Avramidis+2020]: accuracies (%) of successful translations; the full
      table covers 11 systems and 14 categories, only 7 systems are shown here.)
      Our system obtains the best macro-average score.
      (A small sketch after this slide contrasts micro- and macro-averaging.)
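      Because the verb tense/aspect/mood category contains 4,447 of the 5,514 items, it
      dominates the micro-average, which is why the macro-average is the more even-handed
      summary here. A small illustrative sketch (my own, not from the slides) using two
      rows of the table above:

```python
# Micro- vs. macro-averaged accuracy over test-suite categories (illustrative values).
categories = {
    # name: (number of items, accuracy in %)
    "Negation": (20, 100.0),
    "Verb tense/aspect/mood": (4447, 84.6),
}

total_items = sum(n for n, _ in categories.values())
# Micro-average: weight each category by its item count (big categories dominate).
micro = sum(n * acc for n, acc in categories.values()) / total_items
# Macro-average: every category counts equally, regardless of size.
macro = sum(acc for _, acc in categories.values()) / len(categories)

print(f"micro = {micro:.1f}%, macro = {macro:.1f}%")  # micro ≈ 84.7%, macro = 92.3%
```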

  11. Finding ①
      State-of-the-art NMT systems are an "unglamorous" grind

  12. The usual view of SMT vs. NMT
      SMT (statistical machine translation)
      [Figure: the classic SMT pipeline built from many separate modules: word aligners
       (GIZA++, MGIZA, FastAlign, Nile), language models (SRILM, KenLM, RNNLM), decoders
       (Moses, Joshua, Travatar, KyotoEBMT), and tuning methods (MERT, MIRA, PRO),
       passing N-best translation candidates between stages.]
      😩 A convoluted system made of many modules
      😩 Error propagation between modules hurts translation quality
      NMT (neural machine translation)
      [Figure: a single translation model trained end-to-end on a parallel corpus; a
       source sentence goes in and the target sentence is decoded.]
      😀 A single model can be trained end-to-end
      😀 No error propagation → high translation quality
      ※ Figures quoted from "ゼロから始めるニューラルネットワーク機械翻訳"
        (Neural Network Machine Translation from Scratch),
        https://www.slideshare.net/ToshiakiNakazawa/nlp2017-nmt-tutorial

  13. The usual view of SMT vs. NMT (continued)
      (Same SMT vs. NMT comparison as the previous slide.)
      In most situations this view is correct, but...
      with "state-of-the-art NMT" the situation is different.

  14. From everyday NMT to state-of-the-art NMT
      The NMT setup we are all used to:
      parallel corpus → (train) → translation model; test data → translation model → translations

  15. From everyday NMT to state-of-the-art NMT
      The competition system wraps many extra components around that core:
      • A back-translation model turns a monolingual corpus into a pseudo-parallel corpus
      • The translation model is trained on the parallel corpus plus the pseudo-parallel
        corpus, then adapted with a target-domain corpus
      • On the test data, the model outputs N candidate sentences instead of a single
        translation
      • The candidates are rescored by a battery of models (left-to-right and
        right-to-left translation models in both the forward and reverse directions, a
        masked language model, and a unidirectional language model) to produce the final
        translation result

  16. State-of-the-art NMT is a collection of "unglamorous" techniques
      • The test data is news-domain text, so we want to adapt the model to news data
        ⇛ fine-tuning
      • The translation model (Transformer) is extremely sensitive to its hyperparameters,
        and the know-how keeps evolving ⇛ hyperparameter tuning
      • The parallel corpus alone is not enough data; we want to create more from a
        monolingual corpus ⇛ data augmentation via back-translation
      • Two heads are better than one: we want to factor in other models' opinions when
        choosing the output ⇛ reranking
      • Training runs come out uneven, so train several models independently and let them
        vote ⇛ ensembling (a decoding sketch follows this slide)
      (Same system diagram as the previous slide, with each technique attached to the
      component it introduces.)
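      As a rough sketch of what ensembling means at decoding time (an illustration under
      an assumed model interface, not the team's actual code): average the members'
      next-token distributions at every decoding step.

```python
import numpy as np

# Minimal sketch of ensemble (greedy) decoding. Each model is assumed to expose a
# hypothetical next_token_probs(source, prefix) method returning a probability
# distribution over the shared vocabulary.
def ensemble_greedy_decode(models, source, eos_id, max_len=100):
    prefix = []
    for _ in range(max_len):
        # Average the probability distributions of all ensemble members.
        probs = np.mean([m.next_token_probs(source, prefix) for m in models], axis=0)
        token = int(np.argmax(probs))
        if token == eos_id:
            break
        prefix.append(token)
    return prefix  # token ids of the ensemble translation
```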

  17. State-of-the-art NMT is a collection of "unglamorous" techniques
      The translation model (Transformer) is extremely sensitive to its hyperparameters,
      and the know-how keeps evolving ⇛ hyperparameter tuning (details on the next slide)
      (Same system diagram as before.)

  18. Hyperparameter tuning
      • Model: Transformer [Vaswani+2017]
        • The de facto standard model of recent years; not using it is not a realistic option
        • We widen the feed-forward layers and deepen the model from 6 to 9 layers, aiming
          to make use of the larger amount of training data
      • Very large batch size [Ott+2018]
        • From the usual ~4,000 tokens to 512,000 tokens per update
        • Faster convergence and better generalization; empirically, training is also more stable
        • Realized with update delay (a.k.a. "ghost batches", i.e., gradient accumulation)
      • Large learning rate [Ott+2018]
        • Adam step size 0.0005 → 0.001
        • Faster convergence; the combination with the very large batch size is crucial
      • Checkpoint averaging
        • Save the model at regular intervals (e.g., every epoch or every 2k updates)
        • After training, average the saved checkpoints and use the average for inference
          (a minimal averaging sketch follows below)
        • Improves BLEU by roughly 0.1-0.2 points [Popel+2018]
      • Pre-layer-normalization
        • Apply LayerNorm before the feed-forward and attention sub-layers
        • Reported to stabilize the training of deep Transformers [Xiong+2020]
          (an illustrative layer sketch appears after the quoted excerpts below)
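      A minimal sketch of checkpoint averaging, assuming each checkpoint was saved as a
      plain PyTorch state_dict (file names are hypothetical; NMT toolkits usually ship an
      equivalent script):

```python
import torch

# Element-wise mean of the parameters of several saved checkpoints.
# For simplicity everything is cast to float before averaging.
paths = ["ckpt_epoch08.pt", "ckpt_epoch09.pt", "ckpt_epoch10.pt"]  # hypothetical

avg_state = None
for path in paths:
    state = torch.load(path, map_location="cpu")
    if avg_state is None:
        avg_state = {k: v.clone().float() for k, v in state.items()}
    else:
        for k, v in state.items():
            avg_state[k] += v.float()

avg_state = {k: v / len(paths) for k, v in avg_state.items()}
torch.save(avg_state, "ckpt_averaged.pt")  # load this state_dict for inference
```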
      [Excerpts quoted from [Vaswani+2017] (the Transformer architecture: stacks of
      N = 6 identical encoder/decoder layers whose sub-layers compute
      LayerNorm(x + Sublayer(x)) with d_model = 512) and from [Xiong+2020] (Figure 1
      contrasting Post-LN and Pre-LN Transformer layers; the paper argues that Pre-LN
      gradients are well-behaved at initialization, so the learning-rate warm-up stage
      can be removed).]
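      To make the Post-LN vs. Pre-LN distinction concrete: Post-LN computes
      LayerNorm(x + Sublayer(x)), while Pre-LN computes x + Sublayer(LayerNorm(x)).
      Below is an illustrative PyTorch encoder layer in the Pre-LN style (my own sketch,
      not the competition code):

```python
import torch.nn as nn

class PreLNEncoderLayer(nn.Module):
    """One Transformer encoder layer with pre-layer-normalization (illustrative)."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Pre-LN: normalize *before* each sub-layer, then add the residual.
        h = self.norm1(x)
        attn_out, _ = self.self_attn(h, h, h)
        x = x + self.dropout(attn_out)
        h = self.norm2(x)
        x = x + self.dropout(self.ff(h))
        return x
```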

  19. State-of-the-art NMT is a collection of "unglamorous" techniques
      The parallel corpus alone is not enough data; we want to create more training data
      from a monolingual corpus ⇛ data augmentation via back-translation
      (Same system diagram as before.)

  20. Data augmentation via back-translation
      • "More data, better performance"
        • The standing motto of natural language processing
        • With more parallel data, performance should improve
      • That said, how do we actually increase the data? → back-translation

  21. What is back-translation?
      • Back-translation (BT) [Sennrich+2016]
        • A method for generating a pseudo-parallel corpus from a monolingual corpus
        • The de facto standard way of augmenting data for NMT
        • A reverse translation model translates target-language sentences "back" into
          the source language
      • When building an English→Japanese model: a Japanese→English model translates a
        Japanese monolingual corpus into English, yielding an English-Japanese
        pseudo-parallel corpus.

  22. The back-translation process
      ① Train the back-translation model (Japanese→English) on the Japanese-English
        parallel corpus
      ② Translate the Japanese monolingual corpus with it to generate pseudo data:
        monolingual corpus (Ja) → translated corpus (En), giving an English-Japanese
        pseudo-parallel corpus (a sketch of this step follows this slide)
      ③ Train the English→Japanese translation model on the English-Japanese parallel
        corpus together with the pseudo-parallel corpus
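      The shape of step ②, as a minimal sketch: ja_en_model.translate and the file paths
      are placeholders for whatever toolkit and data are actually used.

```python
# Generate an English-Japanese pseudo-parallel corpus from a Japanese monolingual
# corpus with a trained Ja->En model (placeholder interface).

def back_translate(ja_en_model, mono_ja_path, out_en_path, out_ja_path, batch_size=64):
    with open(mono_ja_path) as f_in, \
         open(out_en_path, "w") as f_en, open(out_ja_path, "w") as f_ja:
        batch = []
        for line in f_in:
            batch.append(line.strip())
            if len(batch) == batch_size:
                _flush(ja_en_model, batch, f_en, f_ja)
                batch = []
        if batch:
            _flush(ja_en_model, batch, f_en, f_ja)

def _flush(model, ja_sentences, f_en, f_ja):
    # The synthetic English side is the model output; the Japanese side stays as-is.
    for ja, en in zip(ja_sentences, model.translate(ja_sentences)):
        f_en.write(en + "\n")
        f_ja.write(ja + "\n")
```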

  23. State-of-the-art NMT is a collection of "unglamorous" techniques
      Two heads are better than one: we want to factor in other models' opinions when
      choosing the output ⇛ reranking
      (Same system diagram as before.)

  24. Reranking picks a better translation out of the candidates
      • Without reranking:
        1. Beam search generates N candidate sentences
        2. The candidate with the highest model score is output
      • But the highest score does not necessarily mean the best translation
        • One of the other candidates may well be the better translation
      • Reranking: a post-processing step for finding that better translation
      Example from the slide: for the input "I have been extremely lucky.", the
      translation model produced candidates such as とても幸運でした / 非常に運が良かった。/
      極めて幸運であった / 私は本当に幸運でした / 私は、非常に幸運だった, with model scores
      ranging from 9.5 down to 1.1. Without reranking the top-scoring candidate is output,
      even though one of the lower-scoring candidates is the translation we actually want.

  25. Aiming for better translations with the models' collective knowledge
      ① Generate N candidate sentences with beam search
      ② Score the N candidates with each scoring module and sort them by the total score
        (a sketch of this score combination follows this slide)
      • Scoring modules include a unidirectional language model, a bidirectional language
        model, the back-translation (target→source) model, a reverse-direction translation
        model, and so on.
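      A minimal sketch of step ②, assuming each scorer exposes a score(source, candidate)
      method returning a log-probability-like value; the interface and weights are
      placeholders (in practice the weights are typically tuned on a development set).

```python
# Combine scores from several models and sort the candidates by the weighted total.

def rerank(source, candidates, scorers, weights=None):
    """Return candidates sorted from best to worst by weighted total score."""
    weights = weights or [1.0] * len(scorers)
    scored = []
    for cand in candidates:
        total = sum(w * s.score(source, cand) for w, s in zip(weights, scorers))
        scored.append((total, cand))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [cand for _, cand in scored]

# Usage: best = rerank(src, nbest, [r2l_model, reverse_model, masked_lm, uni_lm])[0]
```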

  26. Experimental results
      ID   Setting                           En→De  De→En  En→Ja  Ja→En
      (a)  Baseline                           42.4   42.0   19.7   21.6
      (b)  Baseline + back-translation        42.7   42.5   22.0   23.9
      (c)  (b) + fine-tuning                  44.9   42.3   23.1   24.4
      (d)  (c) × 4 (ensemble)                 45.5   42.8   23.9   25.4
      (e)  (d) + reranking                    45.7   43.8   24.9   26.2
      -    Previous year's winning system     44.9   42.8     -      -
      • The combination of all these techniques delivers the performance gain
      • A complex system is required to reach state-of-the-art performance

  27. Finding ②
      Enormous resources are required

  28. The system we built: everything gets bigger
      • More training data than usual: for English-German, adding back-translated data
        grows the training set by roughly an order of magnitude (see slide 36)
      • More model parameters
        • Usually: 6 layers each for the encoder and decoder; this time: 9 layers each
      • More models
        • Ensembling and reranking require many models: 8 models per language pair,
          32 models in total
      The resources needed to build the system grow accordingly.

  29. What you need: a serious GPU and time
      What it takes to build one model:
      • GPU: an NVIDIA V100 32GB (roughly ¥1,000,000 per card)
      • Time: 16 days
      "If it finishes in 16 days, I suppose I could wait..."
      "Except we actually need 32 models..."

  30. What you need: a serious GPU and a long time
      What it takes to build 32 models:
      • GPU: an NVIDIA V100 32GB (roughly ¥1,000,000 per card)
      • Time: 512 days
      "More than a year?! The competition will be over by then."
      "We parallelize to speed things up..."

  31. What you need: seriously impressive GPUs and a reasonable amount of time
      What it takes to build 32 models:
      • GPU: a DGX-2, a machine packed with 16 V100s (????万円)
      • Time: 32 days
      "So with ten DGX-2 machines it would be done in three days, right?"
      "Exactly, except this time the problem is..."

  32. What you need, in the end, is money
      • A DGX-2-class machine costs about 60 USD per hour on AWS
      • That means about 1,440 USD to train one model
      • And about 46,080 USD for 32 models, which is just under ¥5,000,000
      • Counting the trial and error along the way, the total works out to somewhere in
        the tens of millions of yen
        • This is only an estimate for renting GPUs on AWS; we used our institutions'
          own machines, so the actual amount differs
        (the arithmetic is spelled out after this slide)
      "So how much did you actually spend?" "That is confidential."
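      The arithmetic behind those figures, spelled out under the slide's assumptions
      (16 V100-days per model, a 16-GPU DGX-2-class machine at about 60 USD/hour, and an
      assumed exchange rate of roughly 105 JPY/USD):

```python
# Back-of-the-envelope cost estimate from the slide's assumptions.
V100_DAYS_PER_MODEL = 16       # one model needs ~16 days on a single V100
GPUS_PER_DGX2 = 16             # a DGX-2 packs 16 V100s, so ~1 day per model
DGX2_USD_PER_HOUR = 60         # rough AWS price assumed on the slide
N_MODELS = 32                  # 8 models per language pair x 4 pairs
JPY_PER_USD = 105              # assumed exchange rate (early 2021)

days_per_model = V100_DAYS_PER_MODEL / GPUS_PER_DGX2             # = 1 day on a DGX-2
usd_per_model = days_per_model * 24 * DGX2_USD_PER_HOUR          # = 1,440 USD
usd_total = usd_per_model * N_MODELS                             # = 46,080 USD

print(usd_per_model, usd_total)                # 1440.0 46080.0
print(usd_total * JPY_PER_USD / 1e4, "万円")    # ≈ 484万円, i.e. just under 5 million yen
```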

  33. (An aside) There is always something bigger
      • What it would take to build GPT-3
        • Figures quoted from https://lambdalabs.com/blog/demystifying-gpt-3/
      • GPU: an NVIDIA V100 32GB (roughly ¥1,000,000 per card)
      • Time: 355 years
      "The GPT-3 that my ancestors started training will finally be finished next week."
      "Are you sure you don't mean GPT-371, the one that came out the other day?"

  34. Finding ③
      Beyond the "unglamorous" NMT, a new world is coming into view

  35. The experimental results, once more
      ID   Setting                           En→De  De→En  En→Ja  Ja→En
      (a)  Baseline                           42.4   42.0   19.7   21.6
      (b)  Baseline + back-translation        42.7   42.5   22.0   23.9
      (c)  (b) + fine-tuning                  44.9   42.3   23.1   24.4
      (d)  (c) × 4 (ensemble)                 45.5   42.8   23.9   25.4
      (e)  (d) + reranking                    45.7   43.8   24.9   26.2
      -    Previous year's winning system     44.9   42.8     -      -

  36. Observation: does back-translation have no effect?
      • Back-translation grows the training data roughly tenfold
      • Yet for English-German the BLEU score barely improves
        • e.g., 42.4 → 42.7
      • The performance gain does not seem to match the effort...
      • Is the motto actually "more data, same performance"?
      (Same results table as slide 35.)

  37. The effect of back-translation cannot be measured with BLEU
      • On the effect of back-translation in state-of-the-art NMT:
        [Edunov+2020], [Bogoychev+2019]
      • BLEU: with back-translation ≈ without back-translation
      • Human evaluation: with back-translation > without back-translation
        • Training on the pseudo-parallel corpus makes the output more fluent [Edunov+2020]
      😀 Back-translation was not wasted effort after all
      😩 ...but we now live in a world where BLEU and human evaluation do not correlate

  38. Observation: the effect of reranking is thin
      • Reranking requires training several extra models on top of the ensemble
        • In other words, it costs considerably more money
      • Yet the BLEU score does not improve as much as hoped
        • e.g., 45.5 → 45.7
      • Once again, the gain does not seem to match the effort...
      (Same results table as slide 35.)

  39. Are we trying to solve an unsolvable problem?
      • Perhaps the source sentence and the candidates alone are not enough to decide
        which translation is best
        • Even a human cannot tell which candidate is the "good translation"
        • There simply is not enough information to make the call
        • What extra information would help? ⇛ context?
      Example: for one source sentence the translation system produced candidates such as
        和平プロセスに影響を及ぼしたくはない
        和平プロセスに影響を与えたくありません。
        和平プロセスに影響を及ぼして欲しくない
        和平プロセスに影響を与えたくないのです。
        和平プロセスに影響が出ないようにしたい。
        和平プロセスに影響を及ぼしたくありません
        和平プロセスに影響を与えることは望まない。
        和平プロセスに影響を与えることを望まない。
      All of these are plausible, and hard to rank without more context.

  40. Summary
      • Three things we learned from building a state-of-the-art NMT system:
        ① State-of-the-art NMT systems are an "unglamorous" grind
        ② They require enormous resources
        ③ Beyond the "unglamorous" NMT, a new world is coming into view
      • Even a Japanese organization can take on the world, given the resources!

  41. References
      • [Avramidis+2020]: Avramidis, E., Macketanz, V., Lommel, A., & Uszkoreit, H. (2018). Fine-grained Evaluation of
        Quality Estimation for Machine Translation Based on a Linguistically Motivated Test Suite. In Proceedings of the
        AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing (pp. 243–248). Association for
        Machine Translation in the Americas.
      • [Vaswani+2017]: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., &
        Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems 30
        (NIPS 2017) (pp. 5998–6008).
      • [Ott+2018]: Ott, M., Edunov, S., Grangier, D., & Auli, M. (2018). Scaling Neural Machine Translation. In
        Proceedings of the Third Conference on Machine Translation: Research Papers (pp. 1–9). Association for
        Computational Linguistics.
      • [Popel+2018]: Popel, M., & Bojar, O. (2018). Training Tips for the Transformer Model. The Prague Bulletin of
        Mathematical Linguistics, 110, 43–70.
      • [Xiong+2020]: Xiong, R., Yang, Y., He, D., Zheng, K., Zheng, S., Xing, C., Zhang, H., Lan, Y., Wang, L., &
        Liu, T.-Y. (2020). On Layer Normalization in the Transformer Architecture. In Proceedings of the 37th
        International Conference on Machine Learning (ICML 2020) (pp. 10524–10533). PMLR.
      • [Sennrich+2016]: Sennrich, R., Haddow, B., & Birch, A. (2016). Improving Neural Machine Translation Models with
        Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
        (Volume 1: Long Papers) (pp. 86–96). Association for Computational Linguistics.
      • [Edunov+2020]: Edunov, S., Ott, M., Ranzato, M., & Auli, M. (2020). On The Evaluation of Machine Translation
        Systems Trained With Back-Translation. In Proceedings of the 58th Annual Meeting of the Association for
        Computational Linguistics (pp. 2836–2846). Association for Computational Linguistics.
      • [Bogoychev+2019]: Bogoychev, N., & Sennrich, R. (2019). Domain, Translationese and Noise in Synthetic Data for
        Neural Machine Translation. arXiv preprint arXiv:1911.03362.
      • [Freitag+2020]: Freitag, M., Grangier, D., & Caswell, I. (2020). BLEU might be Guilty but References are not
        Innocent. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
        (pp. 61–71). Association for Computational Linguistics.