Slide 33
⚫ Is translation quality really good for low-resource languages too?
– X-ALMA outperforms the compared methods in most translation directions, so in relative terms: yes
– However, it is questionable whether COMET scores on low-resource languages can be trusted※
– The BLEU results suggest that many models produce low-quality translations in the "en→xx" directions
[Xu+ ’24b] X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
(5') Model Training Methods: Prior Work ② [Supplement]
[COMET-22] en-is en-no en-mk en-mg en-gu en-mr en-ne en-az en-ky en-uz avg.
NLLB-3.3B 84.6 88.9 88.8 81.6 87.2 74.3 76.5 86.9 88.1 89.8 84.7
LLaMAX3-Alpaca-8B 81.2 87.8 87.4 56.8 82.7 69.5 78.4 80.0 82.9 74.5 78.1
Aya-101 84.3 87.5 88.7 81.1 83.9 69.5 77.5 85.6 86.6 88.6 83.3
X-ALMA 87.2 90.8 90.6 82.1 88.9 76.5 84.7 88.4 88.8 90.1 86.8
[COMET-22] is-en no-en mk-en mg-en gu-en mr-en ne-en az-en ky-en uz-en avg.
NLLB-3.3B 64.2 80.7 84.3 63.3 90.2 87.0 89.7 77.5 81.6 60.7 77.9
LLaMAX3-Alpaca-8B 85.6 88.5 87.2 76.0 66.0 87.3 89.3 70.0 84.5 86.1 82.1
Aya-101 82.3 88.1 84.3 79.8 82.3 85.2 84.9 85.2 83.0 84.9 84.0
X-ALMA 87.2 89.5 88.4 81.9 90.3 88.6 90.7 87.0 85.7 87.4 87.7
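As a supplementary sketch (not from the slide or the paper), COMET-22 numbers like those above are typically produced with the unbabel-comet package; the checkpoint name Unbabel/wmt22-comet-da is the public WMT22 COMET-DA release, and the toy sentences are made-up illustrations:

```python
# Minimal COMET-22 scoring sketch; assumes `pip install unbabel-comet`.
from comet import download_model, load_from_checkpoint

# Fetch and load the WMT22 COMET-DA checkpoint (the metric reported above).
model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

# COMET-22 is reference-based: each item carries the source sentence,
# the MT output, and a reference. These en-is pairs are toy data only.
data = [
    {"src": "Thank you very much.",
     "mt": "Kærar þakkir.",
     "ref": "Þakka þér kærlega fyrir."},
]

# predict() returns per-segment scores plus a corpus-level average in
# roughly [0, 1]; the tables above report that average multiplied by 100.
output = model.predict(data, batch_size=8, gpus=0)
print(f"{output.system_score * 100:.1f}")
```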
[BLEU] en-is en-no en-mk en-mg en-gu en-mr en-ne en-az en-ky en-uz avg.
NLLB-3.3B 24.5 33.0 34.4 17.7 24.3 17.1 16.4 14.0 13.2 18.6 21.3
LLaMAX3-Alpaca-8B 18.3 28.0 29.3 2.4 13.7 10.1 10.7 7.3 7.9 6.8 13.5
Aya-101 20.9 26.9 30.7 16.1 15.6 10.3 10.5 11.5 10.4 12.0 16.5
X-ALMA 27.4 34.2 37.6 16.1 24.7 17.9 21.5 14.0 12.8 15.5 22.2
[BLEU] is-en no-en mk-en mg-en gu-en mr-en ne-en az-en ky-en uz-en avg.
NLLB-3.3B 16.2 32.1 37.1 13.5 42.3 34.0 38.0 15.1 20.1 5.3 25.4
LLaMAX3-Alpaca-8B 32.5 41.8 39.8 19.6 9.9 30.6 32.9 7.9 20.4 27.9 26.3
Aya-101 27.2 39.5 33.7 27.7 28.0 30.1 31.2 21.5 20.4 28.1 28.7
X-ALMA 37.4 45.7 45.6 29.2 40.6 37.7 41.4 25.5 24.7 33.5 36.1
Aggregated Flores results from Appendix E (Tables 7-15)
for the 10 low-resource languages
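Each avg. value is the simple mean over the 10 directions; for example, for NLLB-3.3B in the en→xx COMET-22 table, (84.6 + 88.9 + 88.8 + 81.6 + 87.2 + 74.3 + 76.5 + 86.9 + 88.1 + 89.8) / 10 = 84.67 ≈ 84.7, matching the table.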
10 low-resource languages:
Icelandic (is), Norwegian (no), Macedonian (mk), Malagasy (mg),
Gujarati (gu), Marathi (mr), Nepali (ne), Azerbaijani (az),
Kyrgyz (ky), Uzbek (uz)
※ The COMET-22 paper [Rei+ '22] evaluates the metric model itself
only on high-resource language pairs
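Likewise, the BLEU tables report corpus-level scores of the kind sacrebleu computes; a minimal sketch with the same toy data (the exact sacrebleu signature and tokenizer used in the paper are not shown on the slide and are assumptions here):

```python
# Minimal corpus-level BLEU sketch; assumes `pip install sacrebleu`.
import sacrebleu

hypotheses = ["Kærar þakkir."]               # one system output per segment (toy data)
references = [["Þakka þér kærlega fyrir."]]  # one reference stream covering all segments

# corpus_bleu takes the hypothesis list and a list of reference streams;
# the result is on the 0-100 scale used in the tables above.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"{bleu.score:.1f}")
```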