Natural Language Processing (11) Machine Translation (1)

自然言語処理研究室 (Natural Language Processing Laboratory)

November 29, 2013

Transcript

  1. 1 / 21 Natural Language Processing (11) Machine Translation (1)

    Kazuhide Yamamoto, Dept. of Electrical Engineering, Nagaoka University of Technology
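  2. 2 / 21 Translation is difficult (1)

    This train will take you to Tokyo.
    His words surprised me.
    This medicine will cure your cold.
    彼は雨に降られた。 ("He got caught in the rain"; literally "He was rained on.")
    「私の部屋に来てください」「すぐ行きます」 ("Please come to my room." "I'll be right there"; literally "I'll go right away.")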
  3. 4 / 21 Translation is difficult (3)

    • Translation is not a simple composition of its constituents.
      – Unlike arithmetic, where (3+5)*(9-7) = 8*2 = 16.
      – hot (熱い) + water (水) = hot water, yet Japanese says 湯 rather than 熱い水.
    • It depends on context and/or situation.
      – 「はい」 ("yes")
    • It also depends on one's cultural and social background.
      – This is unlike image or audio processing.
  4. 5 / 21 Machine Translation: Some terms

    • machine translation, MT – also called automatic translation
    • source language – the input language of the translator
    • target language – the output language of the translator
  5. 6 / 21 MT methods

    • syntactic transfer method – the syntactic structure of the source language is transferred into that of the target language.
    • semantic transfer method – the semantic representation of the source language is transferred into that of the target language.
    • interlingua method – an interlingua, a language-independent semantic representation, is defined; every input expression, in whichever source language, is first converted into the interlingua, and the target-language sentence is then generated from it.
  6. 7 / 21 The MT pyramid

    [Figure: the MT pyramid. Labels: input sentence, parsing, semantic analysis, interlingua, syntactic transfer, semantic transfer, (direct transfer), language universalization, output sentence.]
  7. 8 / 21 Direct translation

    • The direct translation method replaces each word of the source sentence with its target-language equivalent, without any analysis.
    • It may work well for closely related language pairs, such as Spanish and Portuguese, or Malay and Indonesian. Although it is regarded as an obsolete approach, it is attracting attention again (explained next time).
  8. 9 / 21 MT based on syntactic transfer

    • Machine translation based on syntactic transformation parses the input sentence and changes its structure according to transformation rules before replacing the words.
    • This is the approach most commonly used in commercial MT systems.
  9. 10 / 21 Syntactic transfer: example

    Source: He(S) likes(V) banana(O)
    Word equivalents: 彼(S) 好きだ(V) バナナ(O)
    Transfer rule: S V O ⇒ S (は) O (が) V
    Result: 彼はバナナが好きだ
  10. 11 / 21 MT based on semantic transfer

    • Machine translation based on semantic transformation has been partially introduced into commercial MT systems.
      – e.g., weather forecasts
    • Full semantic analysis of input sentences is still a difficult, unsolved problem.
    • Partial semantic analysis, particularly word sense disambiguation, is already implemented in some commercial systems.
  11. 12 / 21 Semantic transfer: example

    Source: He(S) likes(V) banana(O)
    Analysis rule: S ⇒ Agent, O ⇒ Theme (if V is “like”)
    Semantic representation: Predicate: like (好きだ), Agent: he (彼), Theme: banana (バナナ)
    Generation rule: Agent は Theme が Predicate
    Output: 彼はバナナが好きだ
  12. 13 / 21 Transfer-based method: problem

    The number of language pairs explodes combinatorially.
    • EU official languages: Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, (Russian: a main foreign language, not official), Slovak, Slovene, Spanish and Swedish
    • EU semi-official languages: Catalan, Galician, Basque
  13. 14 / 21 It's time for Interlingua!

    • Interlingual translation translates twice:
      – from the source language into the interlingua,
      – and from the interlingua into the target language.
    • At first sight this seems inefficient, but it is in fact efficient when the number of language pairs grows.
  14. 16 / 21 But what actually is interlingua?

    English, Esperanto, and an original semantic representation are the candidates for an interlingua.
    • English – has the largest number of speakers in the world. It is easy to develop a system since many people can use it. Fair, in a sense.
    • Esperanto – the most famous artificial language. Its grammar rules have few exceptions, so it is easy to implement.
    • any semantic representation – we can define whatever we like.
  15. 17 / 21 Interlingua is ...

    • As far as I know, a semantic representation is the most common choice of interlingua.
    • There are also attempts to use English as the interlingua.
    • However, all of these are still experimental.
      – Impossible? An unrealized dream?
      – Europeans cannot abandon the attempt, since they face serious problems in communicating across languages.
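  16. 18 / 21 Design of interlingua

    Designing an interlingua is not easy; it depends on culture and many other factors.
    • no vagueness – the interlingua must make explicit details such as direction (north/south), number, season, etc.
    • different granularity – Japanese: 氷 (ice) / 水 (water) / 湯 (hot water); English: water and ice; Malay: air (water)
    • looks similar but different – English: hip/waist; Japanese: 腰 (which overlaps both)
    • culture-dependent – こたつ (kotatsu, a heated low table), 浴衣 (yukata, a light cotton kimono), 畳 (tatami, a straw floor mat)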
  17. 19 / 21 Design of interlingua (cont'd)

    • Designing a complete interlingua is considered difficult, or even hopeless,
      – since it would have to include every concept in every language,
      – even the most fine-grained concepts found in only one language.
  18. 20 / 21 Practical solution of interlingua

    As mentioned, it is hard to design an interlingua in general, but an interlingua is still useful in a limited domain that does not depend on culture.
    • In some specific tasks we do not need to consider cultural differences, so we can design a language-independent representation.
      – flight reservation, sightseeing guides, business negotiation, and so on
    • Within a restricted region (such as the European countries) many concepts are shared or similar, which may make it possible to design common concepts.
  19. 21 / 21 Summary: today's key words

    • types of machine translation
    • difficulties of MT
    • interlingua and its problems
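The "hot water" example on the "Translation is difficult (3)" slide can be made concrete in a few lines of Python. This is a minimal sketch, assuming two hypothetical toy dictionaries (`WORD_DICT`, `PHRASE_DICT`) invented for illustration: composing the word translations gives 熱い水, whereas treating "hot water" as one unit gives the natural Japanese word 湯.

```python
# Toy illustration (hypothetical dictionaries): translation is not compositional.
WORD_DICT = {"hot": "熱い", "water": "水"}   # word-level equivalents
PHRASE_DICT = {("hot", "water"): "湯"}       # multiword unit with its own translation


def word_by_word(words: list[str]) -> str:
    """Concatenate the translation of each word independently."""
    return "".join(WORD_DICT[w] for w in words)


def phrase_first(words: list[str]) -> str:
    """Prefer a two-word phrase entry when one exists, else fall back to single words."""
    out = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in PHRASE_DICT:
            out.append(PHRASE_DICT[pair])
            i += 2
        else:
            out.append(WORD_DICT[words[i]])
            i += 1
    return "".join(out)


print(word_by_word(["hot", "water"]))  # 熱い水  (literal, unnatural)
print(phrase_first(["hot", "water"]))  # 湯
```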
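The method on the "Direct translation" slide amounts to one dictionary lookup per word. The sketch below assumes a tiny hypothetical Spanish-to-Portuguese lexicon (`ES_TO_PT`); it succeeds here only because these sentences have the same word order and agreement in both languages, the kind of closely related pair the slide mentions.

```python
# Direct translation: replace each source word with its dictionary equivalent,
# with no parsing or semantic analysis. The lexicon is a hypothetical toy example.
ES_TO_PT = {
    "el": "o",
    "la": "a",
    "libro": "livro",
    "casa": "casa",
    "es": "é",
    "nuevo": "novo",
    "grande": "grande",
}


def direct_translate(sentence: str, lexicon: dict[str, str]) -> str:
    """Translate word by word; unknown words are copied through unchanged."""
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


print(direct_translate("El libro es nuevo", ES_TO_PT))  # o livro é novo
print(direct_translate("La casa es grande", ES_TO_PT))  # a casa é grande
```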
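The rule on the "Syntactic transfer: example" slide, S V O ⇒ S (は) O (が) V, can be sketched as a reorder-then-replace step. This assumes the input already carries S/V/O labels (a real system would obtain them from a parser), and `LEXICON`, `transfer_svo_to_sov`, and `translate` are hypothetical names. Like the slide, the rule hard-codes が, which suits 好きだ; many other verbs mark their object with を instead.

```python
# Syntactic transfer sketch: reorder with the rule S V O => S (は) O (が) V,
# then replace words. Role labels are given directly instead of parsed.
LEXICON = {"he": "彼", "likes": "好きだ", "banana": "バナナ"}


def transfer_svo_to_sov(clause: dict[str, str]) -> list[str]:
    """Apply the transfer rule S V O => S は O が V (particles fit 好きだ-type predicates)."""
    return [clause["S"], "は", clause["O"], "が", clause["V"]]


def translate(clause: dict[str, str]) -> str:
    reordered = transfer_svo_to_sov(clause)                      # structure changes first
    return "".join(LEXICON.get(tok, tok) for tok in reordered)  # then word replacement


source = {"S": "he", "V": "likes", "O": "banana"}
print(translate(source))  # 彼はバナナが好きだ
```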
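Word sense disambiguation, mentioned on the "MT based on semantic transfer" slide, can be illustrated with a simplified Lesk-style heuristic: choose the translation whose signature words overlap most with the context. This is not how any particular commercial system works; the ambiguous word "bank", its two Japanese translations, and the signature sets are hypothetical.

```python
# Word sense disambiguation by context overlap (a simplified-Lesk-style heuristic).
# The sense inventory and signature words are hypothetical, chosen for illustration.
SENSES = {
    "bank": {
        "銀行": {"money", "account", "deposit", "loan"},   # financial institution
        "川岸": {"river", "water", "shore", "fishing"},    # riverbank
    }
}


def disambiguate(word: str, context: list[str]) -> str:
    """Return the translation whose signature shares the most words with the context."""
    ctx = {w.lower() for w in context}
    candidates = SENSES[word]
    return max(candidates, key=lambda sense: len(candidates[sense] & ctx))


print(disambiguate("bank", "I opened an account at the bank".split()))   # 銀行
print(disambiguate("bank", "we went fishing on the river bank".split()))  # 川岸
```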
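The "Semantic transfer: example" slide maps the sentence to a predicate with Agent and Theme roles and then generates Japanese with the rule Agent は Theme が Predicate. A minimal sketch under stated assumptions: the `Frame` class, the crude lemmatizer, and the lexicon are hypothetical, and only the verb "like" has an analysis rule, mirroring the slide's condition on the verb.

```python
from dataclasses import dataclass

# Semantic transfer sketch: subject -> Agent, object -> Theme when the verb is
# "like"; generation uses Agent は Theme が Predicate. Names are hypothetical.
LEXICON = {"he": "彼", "like": "好きだ", "banana": "バナナ"}


@dataclass
class Frame:
    predicate: str
    agent: str
    theme: str


def analyze(subject: str, verb: str, obj: str) -> Frame:
    """Map syntactic roles to semantic roles; only the verb 'like' is covered."""
    lemma = verb.removesuffix("s")  # crude lemmatization: likes -> like (Python 3.9+)
    if lemma != "like":
        raise ValueError(f"no analysis rule for verb {verb!r}")
    return Frame(predicate=lemma, agent=subject, theme=obj)


def generate_ja(frame: Frame) -> str:
    """Generation rule: Agent は Theme が Predicate."""
    return f"{LEXICON[frame.agent]}は{LEXICON[frame.theme]}が{LEXICON[frame.predicate]}"


frame = analyze("he", "likes", "banana")
print(frame)               # Frame(predicate='like', agent='he', theme='banana')
print(generate_ja(frame))  # 彼はバナナが好きだ
```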
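To quantify the "Transfer-based method: problem" slide, the snippet below counts translation directions among the languages listed there. It assumes one transfer system per ordered (source, target) pair; if a single system served both directions of a pair, the counts would halve.

```python
from itertools import permutations

# EU official languages as listed on the slide (Russian excluded: not official).
EU_OFFICIAL = [
    "Bulgarian", "Croatian", "Czech", "Danish", "Dutch", "English", "Estonian",
    "Finnish", "French", "German", "Greek", "Hungarian", "Irish", "Italian",
    "Latvian", "Lithuanian", "Maltese", "Polish", "Portuguese", "Romanian",
    "Slovak", "Slovene", "Spanish", "Swedish",
]
SEMI_OFFICIAL = ["Catalan", "Galician", "Basque"]

# One transfer system per ordered (source, target) pair.
official_pairs = list(permutations(EU_OFFICIAL, 2))
all_pairs = list(permutations(EU_OFFICIAL + SEMI_OFFICIAL, 2))

print(len(EU_OFFICIAL), len(official_pairs))             # 24 552
print(len(EU_OFFICIAL + SEMI_OFFICIAL), len(all_pairs))  # 27 702
```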
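The efficiency argument on the "It's time for Interlingua!" slide can be stated as a count. Assume n languages, one transfer system per ordered language pair, and, for the interlingua method, one source-to-interlingua module and one interlingua-to-target module per language:

```latex
\begin{aligned}
T(n) &= n(n-1) && \text{transfer systems, one per ordered language pair} \\
I(n) &= 2n && \text{interlingua modules: analysis and generation per language} \\
I(n) < T(n) &\iff 2n < n(n-1) \iff n > 3 && (n \ge 1) \\
T(24) &= 552, \qquad I(24) = 48
\end{aligned}
```

Under these assumptions the interlingua needs fewer components once there are more than three languages (at n = 3 both counts are 6), and the gap widens quadratically.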
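The "different granularity" point on the "Design of interlingua" slide appears as soon as lexicons are mapped onto interlingua concepts. In this sketch the concept labels (`ICE`, `WATER`, `HOT_WATER`) and both lexicons are hypothetical; English "water" maps to two concepts, so translating it into Japanese leaves 水 and 湯 as candidates that context must decide between.

```python
# Each lexicon maps a word to the interlingua concepts it can express.
# Concept labels are hypothetical; the interlingua uses the finest distinction.
JA = {"氷": {"ICE"}, "水": {"WATER"}, "湯": {"HOT_WATER"}}
EN = {"ice": {"ICE"}, "water": {"WATER", "HOT_WATER"}}


def analyze(word: str, lexicon: dict[str, set[str]]) -> set[str]:
    """Source word -> set of interlingua concepts it may denote."""
    return lexicon[word]


def generate(concepts: set[str], lexicon: dict[str, set[str]]) -> list[str]:
    """Interlingua concepts -> sorted target words that express any of them."""
    return sorted(w for w, cs in lexicon.items() if cs & concepts)


print(generate(analyze("water", EN), JA))  # ['水', '湯']  ambiguous: context must decide
print(generate(analyze("湯", JA), EN))     # ['water']
```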
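The "Practical solution of interlingua" slide suggests restricting the interlingua to a culture-independent domain such as flight reservation. A minimal sketch, assuming a hypothetical `FlightRequest` frame, city table, and sentence templates: the frame itself is language-independent, and each target language needs only its own generator.

```python
from dataclasses import dataclass

# Limited-domain interlingua sketch: a language-independent frame for a flight
# reservation, plus one small generator per target language (all hypothetical).
CITY_NAMES = {
    "TYO": {"en": "Tokyo", "ja": "東京"},
    "OSA": {"en": "Osaka", "ja": "大阪"},
}


@dataclass(frozen=True)
class FlightRequest:
    destination: str  # city code used as a language-independent key
    seats: int


def generate_en(req: FlightRequest) -> str:
    noun = "seat" if req.seats == 1 else "seats"
    city = CITY_NAMES[req.destination]["en"]
    return f"I would like to book {req.seats} {noun} on a flight to {city}."


def generate_ja(req: FlightRequest) -> str:
    city = CITY_NAMES[req.destination]["ja"]
    return f"{city}行きの便を{req.seats}席予約したいです。"


req = FlightRequest(destination="TYO", seats=2)
print(generate_en(req))  # I would like to book 2 seats on a flight to Tokyo.
print(generate_ja(req))  # 東京行きの便を2席予約したいです。
```

Adding another language then means writing one more generator (and an analyzer for input in that language), rather than a new transfer module for every existing language pair.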