Modality-Preserving Phrase-Based Statistical Machine Translation

Masamichi Ideue, Masao Utiyama, Eiichiro Sumita and Kazuhide Yamamoto. Modality-Preserving Phrase-Based Statistical Machine Translation. Proceedings of the International Conference on Asian Language Processing (IALP 2012), pp.129-132 (2012.11)

Natural Language Processing Laboratory (自然言語処理研究室)

November 30, 2012

Transcript

  1. Purpose of our study

    Japanese-to-English translation that preserves negation and question modality with phrase-based SMT.
    Input: 私はりんごが好きではありません。
    Correct translation: I don’t like apples.
    MT translation: I like apples.
    MT users would not be able to detect such a modality error.
  2. Related Studies

    • Class-Dependent Modeling for Dialog Translation [Finch et al., 2009]
    • Discriminative Reranking for SMT using Various Global Features [Goh et al., 2010]
    Neither study discussed which expressions influence modality. Our study focuses on characteristic modality words in negations and questions.
  3. Added feature functions

    The number of phrase pairs that include characteristic question (or negation) words in the Japanese phrase and the English phrase.
    Input: 財布はどこにありますか？
    Hypothesis: Where is the purse?
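
The feature described above can be sketched as a simple count over the phrase pairs used in a hypothesis. This is a minimal illustration, not the authors' implementation: the function name, the phrase segmentation, and the small word sets are hypothetical, and it assumes a phrase pair is counted only when a characteristic word appears on both the Japanese and the English side.

```python
from typing import Iterable, Sequence, Set, Tuple

# A phrase pair is (Japanese tokens, English tokens). The shape is an assumption.
PhrasePair = Tuple[Sequence[str], Sequence[str]]


def count_modality_phrase_pairs(
    phrase_pairs: Iterable[PhrasePair],
    ja_words: Set[str],
    en_words: Set[str],
) -> int:
    """Count phrase pairs with a characteristic word on both sides."""
    count = 0
    for ja_phrase, en_phrase in phrase_pairs:
        has_ja = any(token in ja_words for token in ja_phrase)
        has_en = any(token in en_words for token in en_phrase)
        if has_ja and has_en:
            count += 1
    return count


# Hypothetical segmentation of the example on this slide.
pairs = [
    (["財布", "は"], ["the", "purse"]),
    (["どこ", "に"], ["Where"]),
    (["あり", "ます", "か"], ["is"]),
]
question_ja = {"か", "どこ"}      # illustrative subset, not the full list
question_en = {"Where", "Is"}    # illustrative subset, not the full list

feature_value = count_modality_phrase_pairs(pairs, question_ja, question_en)
```

In this toy segmentation only the pair (どこ に, Where) has a characteristic word on both sides, so the feature value is 1. A separate count would be kept for negation words.
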
  4. Characteristic Words Extraction

    Extract characteristic words from the parallel corpus in the travel domain.
    • Manual extraction
    • Automatic extraction, using the LLR (log-likelihood ratio) score
  5. Manual Extraction

    English
    Negation: not, 't, don, Don, haven, isn, No, won, wasn, doesn, didn, cannot, hadn
    Question: Why, Will, What, Could, Is, How, Does, Can, Do, Are, Which, When, Where, Have, Does, Did, Was, May
  6. Manual Extraction

    Japanese
    Negation: ない (nai), ません (masen)
    Question: か。 (ka)
    • Few characteristic words clearly express the modalities.
    • Whether a word expresses a modality tends to depend on the domain.
  7. Automatic Extraction

    • Automatic extraction is based on LLR.
    • LLR is convenient for extracting characteristic words in the travel domain (Chujo et al., 2006).
    Extract the top N words from the ranking by LLR score as the characteristic words.
    Example (question words, ordered by LLR score): Will, Could, How, Can
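
Selecting the top N words from an LLR ranking could look like the sketch below. The function name is hypothetical and the scores are made up purely for illustration; the slide's actual scores are not available here.

```python
from typing import Dict, List


def top_n_characteristic_words(llr_scores: Dict[str, float], n: int) -> List[str]:
    """Return the n words with the highest LLR score, best first."""
    ranked = sorted(llr_scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Made-up scores, for illustration only.
example_scores = {"the": 0.1, "Can": 2.0, "Will": 5.0, "How": 3.0, "Could": 4.0}
top_words = top_n_characteristic_words(example_scores, 4)
```
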
  8. Sentence type classification

    To build the contingency table, we divided the sentences in the parallel corpus using the manually extracted English characteristic words.
    English | Japanese | Type
    He is not an artist. | 彼は芸術家ではない。 | negation
    I like apples. | 私はりんごが好きです。 | affirmation
    Are you a doctor? | あなたは医者ですか。 | question
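
A rough sketch of this division step follows, using the manually extracted English words from slide 5. The tokenizer, the function names, and the rule that negation is checked before question are assumptions made for illustration; the slides do not specify them.

```python
import re
from typing import List

# Manually extracted English characteristic words (slide 5).
NEGATION_WORDS = {
    "not", "'t", "don", "Don", "haven", "isn", "No", "won",
    "wasn", "doesn", "didn", "cannot", "hadn",
}
QUESTION_WORDS = {
    "Why", "Will", "What", "Could", "Is", "How", "Does", "Can", "Do",
    "Are", "Which", "When", "Where", "Have", "Did", "Was", "May",
}


def tokenize(sentence: str) -> List[str]:
    """Split into alphabetic tokens, keeping "'t" as its own token (assumed scheme)."""
    return re.findall(r"'t|[A-Za-z]+", sentence)


def classify_sentence(english_sentence: str) -> str:
    """Label a sentence as negation, question, or affirmation."""
    tokens = tokenize(english_sentence)
    if any(token in NEGATION_WORDS for token in tokens):
        return "negation"  # assumption: negation takes priority over question
    if any(token in QUESTION_WORDS for token in tokens):
        return "question"
    return "affirmation"


labels = [
    classify_sentence("He is not an artist."),
    classify_sentence("I like apples."),
    classify_sentence("Are you a doctor?"),
]
```

The match is case-sensitive because the manual question list contains capitalized, sentence-initial forms.
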
  9. Extracted Words by LLR

    English
    Negation: don, isn, yet, any, but, know, worry, I, anything, it, so, afraid, understand, enough
    Question: do, any, there, have, this, long, it, did, your, much, how, time, can, what
  10. Extracted Words by LLR

    Japanese
    Negation: ませ, ない, ん, は, なかっ, あまり, まだ, あり, でき, じゃ, いいえ, そんなに, そんな, たく
    Question: か, どこ, 何, どう, いくら, は, いただけ, どの, 何時, あり, でしょ, もらえ, いかが, どんな
  11. Experiments

    SMT toolkit: Moses
    Tuning: Minimum Error Rate Training
    Parallel corpus: Basic Travel Expression Corpus (BTEC)
    Test set: includes sentences for negation, question, and affirmation
    Development set: built in the same way as the test set
  12. Experiments

    • Based on a preliminary evaluation with BLEU, N was set to 30 (LLR30).
    • The baseline method uses no additional features.
  13. Manual Evaluation

    • To verify the effect on translation quality of adding the proposed features.
    • To verify the accuracy for each modality.
    We randomly extracted pairs to test the methods for each modality.
  14. Translation Quality

    Table (number of sentences): rows are Baseline (no additional features), Manual extraction, and LLR; columns are Good (S, A, B), S, A, B, C, and D.
    All the methods have the same translation quality if S, A, and B are assumed to be good translations.
  15. Accuracy of each modality

    Table: percentage of outputs that preserved the modality of the input; rows are Baseline, Manual extraction, and LLR; columns are Aff., Neg., and Que.
    • The proposed methods showed a marked improvement in negation modality.
    • The accuracy of LLR was better than the accuracy of the baseline in all modalities.
  16. Translation Example

    Input (question): サーカスと動物園、どっちに行こうか。
    Proposed method (manual extraction): Which one shall we go to the circus and zoo? (O)
    Baseline: Let's go to the circus and the zoo. (X)
  17. Translation Example

    Input (question): キャンセルしてもかまいませんか。
    Proposed method (manual extraction): I don't mind if you cancel it. (X)
    Baseline: May I cancel? (O)
    masen: negation; masenka: question. We have to treat word combinations.
  18. Conclusion

    • We proposed additional features that consider characteristic words for modality-preserving PBSMT.
    • The proposed methods produced more translations that preserved the modality of the input sentence than the baseline, without a decrease in translation quality.
    • Automatic extraction performed the same as or better than manual extraction.
  19. LLR

  20. Translation Example

    Input (affirmation): やさしく打ってくださいね。
    Proposed method (manual extraction): Please go easy. (O)
    Proposed method (English side only): Please go easy, isn't it? (X)
  21. Calculation of LLR

    • Pr(D|H_indep) is the probability of the observed data under the null hypothesis that the occurrences of a word w in the negative and affirmative sentences are independent of one another.
    • Pr(D|H_dep) is the probability under the hypothesis that the occurrences are dependent.
    In the case of negation: if a word tends to occur in negation only, the LLR score becomes high.
  22. Calculation of LLR

    Contingency table:
          | Negation | Affirmation | Total
    w     | a        | b           | a + b
    not w | c        | d           | c + d
    Total | a + c    | b + d       | n
    a, b, c, d: occurrence frequency in each condition
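
The slides do not reproduce the LLR formula itself, so the sketch below assumes the standard log-likelihood ratio (G²) statistic for a 2×2 contingency table, computed from the counts a, b, c, d defined above with n = a + b + c + d. The function names are hypothetical, and the paper's exact definition may differ, for example by a constant factor.

```python
import math


def _xlogx(x: float) -> float:
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def llr(a: int, b: int, c: int, d: int) -> float:
    """Assumed G^2 statistic: 2 * ln(Pr(D|H_dep) / Pr(D|H_indep)).

    a: sentences of the target type (e.g. negation) containing w
    b: affirmative sentences containing w
    c: sentences of the target type not containing w
    d: affirmative sentences not containing w
    """
    n = a + b + c + d
    return 2.0 * (
        _xlogx(a) + _xlogx(b) + _xlogx(c) + _xlogx(d)
        - _xlogx(a + b) - _xlogx(c + d)   # row totals
        - _xlogx(a + c) - _xlogx(b + d)   # column totals
        + _xlogx(n)
    )


def occurs_mostly_in_target(a: int, b: int, c: int, d: int) -> bool:
    """Assumed direction check: w is relatively more frequent in the target type."""
    return a * (b + d) > b * (a + c)


independent_score = llr(10, 10, 10, 10)   # same rate in both sentence types
skewed_score = llr(30, 1, 70, 99)         # w occurs almost only in negation
```

Because this statistic is symmetric, a word that occurs almost only in affirmative sentences also gets a high score. The direction check is one assumed way to keep only words associated with the target modality.
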
  23. Related Studies

    • Class-Dependent Modeling for Dialog Translation [Finch et al., 2009]: two models are trained, one for question sentences and one for other sentences.
    • Discriminative Reranking for SMT using Various Global Features [Goh et al., 2010]: probabilities of sentence types such as negation and question are used.
    Neither study discussed which expressions influence modality. Our study focuses on characteristic modality words in negations and questions.