Modality-Preserving Phrase-Based Statistical Machine Translation

Masamichi Ideue, Masao Utiyama, Eiichiro Sumita and Kazuhide Yamamoto (Nagaoka University of Technology and NICT)

Purpose of our study
Japanese-to-English translation that preserves negation and question modality, using phrase-based SMT.
Input: 私はりんごが好きではありません。
Correct translation: I don't like apples.
MT translation: I like apples.
MT users would not be able to detect such a modality error.

Related Studies
• Class-Dependent Modeling for Dialog Translation [Finch et al., 2009]
• Discriminative Reranking for SMT using Various Global Features [Goh et al., 2010]
Our study focuses on characteristic modality words in negations and questions. Neither of these studies discussed which expressions influence modality.

Proposed Method
Add feature functions that take characteristic words of negation and question into account.

Added feature functions
The number of phrase pairs that include characteristic words of question (negation) in the Japanese phrase and the English phrase.
Input: 財布はどこにありますか？
Hypothesis: Where is the purse?

Characteristic Words Extraction
• Manual extraction
• Automatic extraction
  • Using the LLR (log-likelihood ratio) score
Characteristic words are extracted from a parallel corpus in the travel domain.

Manual Extraction (English)
Negation: not, 't, don, Don, haven, isn, No, won, wasn, doesn, didn, cannot, hadn
Question: Why, Will, What, Could, Is, How, Does, Can, Do, Are, Which, When, Where, Have, Did, Was, May

Manual Extraction (Japanese)
Negation: ない (nai), ません (masen)
Question: か。 (ka)
• Few characteristic words clearly express the modalities.
• Whether a word expresses modality tends to depend on the domain.

Automatic Extraction
• Automatic extraction is based on LLR.
• LLR is well suited to extracting characteristic words in the travel domain (Chujo et al., 2006).
The top N words in the ranking by LLR score are extracted as the characteristic words.
[Figure: question words ordered by LLR score (Will, Could, How, Can)]
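Selecting the characteristic words from the ranking amounts to a sort followed by a cut at N. This sketch assumes the LLR scores have already been computed into a dictionary; `top_n_words` and the alphabetical tie-breaking are hypothetical choices, and the default n = 30 follows the LLR30 setting reported in the experiments.

```python
from typing import Dict, List


def top_n_words(llr_scores: Dict[str, float], n: int = 30) -> List[str]:
    """Return the n words with the highest LLR scores.

    Ties are broken alphabetically so that the result is deterministic
    (an assumption; the slides do not say how ties are handled).
    """
    ranked = sorted(llr_scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


# Made-up scores for illustration only; they are not the paper's values.
toy_scores = {"will": 5.0, "could": 3.0, "how": 4.0, "can": 3.0, "the": 0.1}
print(top_n_words(toy_scores, n=3))  # ['will', 'how', 'can']
```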

Calculation of LLR
In the case of negation: if a word tends to occur only in negation, its LLR score becomes high.
Contingency table for a word w (a, b, c, d: occurrence frequency in each condition):
             Negation   Affirmation
  w          a          b              a+b
  not w      c          d              c+d
             a+c        b+d            n = a+b+c+d
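A sketch of the LLR score for the table above, assuming the standard log-likelihood-ratio (G²) statistic, 2 Σ O ln(O/E) over the four cells, where E is the expected count under independence; the slides do not show the exact formula, so this form is an assumption. G² is large whenever w is unevenly distributed, including when it is rarer in negation, so the hypothetical helper `leans_to_negation` additionally checks the direction, in line with the slide's point that words occurring in negation should score high.

```python
import math


def xlogx(x: float) -> float:
    """x * ln(x), using the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def llr(a: int, b: int, c: int, d: int) -> float:
    """G^2 log-likelihood ratio for the 2x2 table of a word w.

                 negation  affirmation
        w           a          b
        not w       c          d
    """
    n = a + b + c + d
    cells = xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
    rows = xlogx(a + b) + xlogx(c + d)
    cols = xlogx(a + c) + xlogx(b + d)
    return 2.0 * (cells - rows - cols + xlogx(n))


def leans_to_negation(a: int, b: int, c: int, d: int) -> bool:
    """True if w is relatively more frequent in negation than in affirmation:
    a / (a + c) > b / (b + d), rewritten without division as a*d > b*c."""
    return a * d > b * c


print(round(llr(10, 0, 0, 10), 4))                      # 27.7259
print(round(llr(0, 10, 10, 0), 4))                      # 27.7259 (same score, opposite direction)
print(leans_to_negation(10, 0, 0, 10), leans_to_negation(0, 10, 10, 0))  # True False
```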

Sentence type classification
To build the contingency table, we divided the sentences in the parallel corpus into types using the manually extracted English characteristic words.
  English                Japanese                  Type
  He is not an artist.   彼は芸術家ではない。      negation
  I like apples.         私はりんごが好きです。    affirmation
  Are you a doctor?      あなたは医者ですか。      question
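The division of the corpus into sentence types can be sketched with the manually extracted English words. Splitting "don't" into "don" and "'t", counting question words only at the start of the sentence (they are capitalized on the slide), and checking questions before negations are all assumptions for illustration; `sentence_type` is a hypothetical name.

```python
import re
from typing import List

# Manually extracted English characteristic words from the slides, lower-cased.
NEGATION_WORDS = {"not", "'t", "don", "haven", "isn", "no", "won", "wasn",
                  "doesn", "didn", "cannot", "hadn"}
QUESTION_WORDS = {"why", "will", "what", "could", "is", "how", "does", "can",
                  "do", "are", "which", "when", "where", "have", "did", "was", "may"}


def tokenize(sentence: str) -> List[str]:
    """Lower-case and split so that "don't" becomes ["don", "'t"]."""
    text = sentence.replace("\u2019", "'").lower()
    return re.findall(r"'t|\w+|[^\w\s]", text)


def sentence_type(english: str) -> str:
    tokens = tokenize(english)
    # Assumption: a question word counts only as the first token, and is checked before negation.
    if tokens and tokens[0] in QUESTION_WORDS:
        return "question"
    if any(token in NEGATION_WORDS for token in tokens):
        return "negation"
    return "affirmation"


for s in ["He is not an artist.", "I like apples.", "Are you a doctor?"]:
    print(s, "->", sentence_type(s))
```

The three sentences from the table come out as negation, affirmation and question.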

Extracted Words by LLR: English (Negation / Question)
do, any, there, have, this, don, long, it, isn, did, your, much, how, time, can, yet, any, but, know, worry, I, anything, it, so, afraid, understand, what, enough

Extracted Words by LLR: Japanese (Negation / Question)
か, どこ, 何, どう, いくら, は, いただけ, どの, 何時, あり, でしょ, もらえ, いかが, どんな, ませ, ない, ん, は, なかっ, あまり, まだ, あり, でき, じゃ, いいえ, そんなに, そんな, たく

Experiments
• SMT toolkit: Moses
• Tuning: Minimum Error Rate Training (MERT)
• Parallel corpus: Basic Travel Expression Corpus (BTEC) sentence pairs
• Test set: includes sentences for negation, question and affirmation
• Development set: built in the same way as the test set

Experiments
• Based on a preliminary evaluation with BLEU, N was set to 30 (LLR30).
• The baseline method uses no additional features.

Manual Evaluation
• To verify the effect of the proposed features on translation quality.
• To verify the accuracy for each modality.
We randomly extracted sentence pairs to test the methods for each modality.

Translation Quality
[Figure: number of sentences per grade (S, A, B, C, D; S, A and B counted as good) for Baseline (no additional features), Manual extraction and LLR]
All the methods have the same translation quality if S, A and B are regarded as good translations.

Accuracy of each modality
[Figure: percentage of outputs that preserved the modality of the input, for affirmation (Aff), negation (Neg) and question (Que); Baseline, Manual extraction, LLR]
• The proposed methods showed a marked improvement in negation modality.
• The accuracy of LLR was better than that of the baseline in all modalities.
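The accuracy above is a per-modality percentage. A minimal sketch, assuming each manually judged output is recorded as a pair (input modality, preserved or not); the function name and the labels are hypothetical, and the example judgements are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def modality_accuracy(judgements: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Percentage of outputs that preserved the input modality, per modality.

    Each judgement is (modality of the input, whether the output preserved it),
    e.g. ("neg", True); the labels come from manual evaluation.
    """
    total: Counter = Counter()
    preserved: Counter = Counter()
    for modality, kept in judgements:
        total[modality] += 1
        if kept:
            preserved[modality] += 1
    return {m: 100.0 * preserved[m] / total[m] for m in total}


# Made-up judgements for illustration only.
print(modality_accuracy([("neg", True), ("neg", False), ("que", True), ("aff", True)]))
# {'neg': 50.0, 'que': 100.0, 'aff': 100.0}
```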

Translation Example
Input (question): サーカスと動物園、どっちに行こうか。 (lit. "The circus and the zoo, which one shall we go to?")
Baseline: Let's go to the circus and the zoo. ×
Proposed method (manual extraction): Which one shall we go to, the circus and zoo? ○

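Translation Example
Input (question): キャンセルしてもかまいませんか。 (lit. "Would it be all right if I cancel?")
Baseline: May I cancel ○
Proposed method (manual extraction): I don't mind if you cancel it ×
ません (masen) is a negation word, but ませんか (masenka) marks a question, so we have to treat word combinations.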
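One simple way to treat word combinations (an assumption; the slides only state that it is necessary) is to list combinations such as ませんか next to the single words they contain and let the longest match win. The cue lexicon and `modality_cues` below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical cue lexicon: the combination ませんか is listed next to the
# single word ません that it contains, so that it can take precedence.
CUES: Dict[str, str] = {
    "ませんか": "question",
    "か。": "question",
    "ません": "negation",
    "ない": "negation",
}


def modality_cues(sentence: str, cues: Dict[str, str] = CUES) -> List[Tuple[str, str]]:
    """Scan left to right; at each position take the longest matching cue."""
    by_length = sorted(cues, key=len, reverse=True)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(sentence):
        for cue in by_length:
            if sentence.startswith(cue, i):
                found.append((cue, cues[cue]))
                i += len(cue)
                break
        else:  # no cue starts at position i
            i += 1
    return found


print(modality_cues("キャンセルしてもかまいませんか。"))  # [('ませんか', 'question')]
print(modality_cues("私はりんごが好きではありません。"))  # [('ません', 'negation')]
```

Because ませんか is consumed as a whole, the question input above no longer yields a separate negation cue for ません.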

Conclusion
• We proposed additional features that consider characteristic words for modality-preserving PBSMT.
• The proposed features produced more translations that preserved the modality of the input sentence than the baseline, without decreasing translation quality.
• Automatic extraction performed as well as or better than manual extraction.

Translation Example
Input (affirmation): やさしく打ってくださいね。
Proposed method (manual extraction): Please go easy ○
Proposed method (English side only): Please go easy, isn't it ×

Calculation of LLR
• Pr(D|H_indep) is the probability of the data under the null hypothesis that the occurrences of a word w in negative and affirmative sentences are independent of one another.
• Pr(D|H_dep) is the corresponding probability when the occurrences are dependent.
In the case of negation: if a word tends to occur only in negation, its LLR score becomes high.
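The two hypotheses can be written as binomial likelihoods: under H_indep the word w has one rate p = (a+b)/n in both sentence types, and under H_dep it has rate a/(a+c) in negation and b/(b+d) in affirmation. Assuming the usual definition LLR = −2 log(Pr(D|H_indep) / Pr(D|H_dep)), this sketch computes the score directly from those likelihoods; for a 2×2 table it equals the G² form 2 Σ O ln(O/E). The function names are hypothetical, and the table must not be empty (n > 0).

```python
import math


def log_lik(k: int, m: int, p: float) -> float:
    """log of p^k (1 - p)^(m - k); the binomial coefficient is dropped
    because it is identical under both hypotheses and cancels in the ratio."""
    value = 0.0
    if k > 0:
        value += k * math.log(p)
    if m - k > 0:
        value += (m - k) * math.log(1.0 - p)
    return value


def llr_from_hypotheses(a: int, b: int, c: int, d: int) -> float:
    """-2 log( Pr(D|H_indep) / Pr(D|H_dep) ) for the contingency table (a, b; c, d)."""
    n_neg, n_aff = a + c, b + d
    p = (a + b) / (n_neg + n_aff)          # H_indep: one rate of w for both sentence types
    p_neg = a / n_neg if n_neg else 0.0    # H_dep: a separate rate per sentence type
    p_aff = b / n_aff if n_aff else 0.0
    log_indep = log_lik(a, n_neg, p) + log_lik(b, n_aff, p)
    log_dep = log_lik(a, n_neg, p_neg) + log_lik(b, n_aff, p_aff)
    return -2.0 * (log_indep - log_dep)


print(round(llr_from_hypotheses(10, 0, 0, 10), 4))  # 27.7259
```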

Related Studies
• Class-Dependent Modeling for Dialog Translation [Finch et al., 2009]
  • Two models are trained: one for question sentences and one for other sentences.
• Discriminative Reranking for SMT using Various Global Features [Goh et al., 2010]
  • Probabilities of sentence types such as negations and questions are used.
Our study focuses on characteristic modality words in negations and questions. Neither of these studies discussed which expressions influence modality.

Slides by 自然言語処理研究室 (Natural Language Processing Laboratory)
