2011 tekom conference

Presentation by CPSL and tauyou at the tekom annual conference. It presents a case study of a successful machine translation implementation at a mid-size Language Service Provider (LSP).

Transcript

  1. Implementation of a Machine Translation Engine at CPSL

    Speaker: Belén García-Ochoa (CPSL). Co-speaker: Diego Bartolomé (tauyou <language technology>).
  2. The speaker

    Belén García-Ochoa, Localization Director at CPSL. CPSL has been a Multilingual Service Provider since 1963. Headquarters in Barcelona, Spain. Other offices in Madrid (Spain), Germany, and the UK. CPSL staff includes over 50 people.
  3. The co-speaker

    Diego Bartolomé, CEO of tauyou <language technology>. tauyou has provided language technologies for the localization industry since 2006. Main clients: medium-sized LSPs. Headquarters in Barcelona.
  4. CPSL and Machine Translation

    Post-editing services provided to a software company for a huge project. Lots of translated words in a tight timeframe.
  5. Main difficulties found

    Lots of clients; different subject matters; different language combinations.
  6. Workaround

    Lots of clients: a list of the most appropriate clients for using the engine was created. Based on this list, we established the different subject matters and the different language combinations.
  7. Human post-editing vs. human translation

    The standard number of words that a translator can do per day is 2,500. The standard number of words that a reviewer of human translation can do per day is 12,000. On average, 8,000 words can be post-edited per day.
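
To make the throughput figures on slide 7 concrete, the following minimal sketch compares how many working days a single linguist would need for the same volume under each task. The daily rates come from the slide; the 100,000-word project size and the function name `days_needed` are assumptions chosen for illustration only.

```python
# Daily throughput figures quoted on slide 7 (words per day).
WORDS_PER_DAY = {
    "human translation": 2_500,
    "review of human translation": 12_000,
    "post-editing of MT output": 8_000,
}


def days_needed(word_count: int, words_per_day: int) -> float:
    """Return the working days one linguist needs at a given daily rate."""
    if words_per_day <= 0:
        raise ValueError("words_per_day must be positive")
    return word_count / words_per_day


# Hypothetical project size, not taken from the presentation.
PROJECT_WORDS = 100_000

for task, rate in WORDS_PER_DAY.items():
    print(f"{task}: {days_needed(PROJECT_WORDS, rate):.1f} days")

# Ratio of post-editing speed to translation speed, from the slide figures.
speedup = WORDS_PER_DAY["post-editing of MT output"] / WORDS_PER_DAY["human translation"]
print(f"post-editing vs. translation speed-up: {speedup:.1f}x")
```

At the quoted rates, post-editing is 3.2 times as fast as translating from scratch. The calculation covers linguist time only and leaves out any engine setup or training effort.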
  8. The tauyou solution

    Dedicated hybrid machine translation engine that is continuously customized. Corpus-based with rules for pre- and post-processing. Data confidentiality is guaranteed. Translation speed.
  9. Main characteristics

    Any type of document; glossary prioritization; fast domain creation/update; fully customizable; quality metrics computation; terminology extraction.
  10. Optimum domain creation

    Gather in-domain data; train the translation solution; enrich solution with related text; terminology prioritization; update the translation solution; add rules to enhance quality; weekly updates.
  11. CPSL workflow 1

    Optimize translation quality for a client: gather client data; train the translation solution; add rules to enhance quality; continuous improvement.
  12. CPSL workflow 2

    General-purpose translator: gather clients' data; add generic texts to provide a good sample; train the translation solution; add rules to enhance quality; periodic improvement.
  13. Other use cases

    Data creation and enhancement: user defined; unaligned translated documents; generic translations; optimum corpus/memories creation; rule-based extension/filtering.
  14. Quality metrics

    Detailed analysis of translated documents. Several customized parameters, including word error rate, number of word edits, tag differences, etc. Useful in machine translation but also in normal quality process.
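
Word error rate, one of the parameters listed on slide 14, is commonly defined as the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The sketch below implements that standard definition; it is not tauyou's implementation, and the sample sentences, the whitespace tokenization, and the choice to treat the post-edited text as the reference are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical sentences: raw MT output vs. its post-edited version.
mt_output = "the engine translate the document quick"
post_edited = "the engine translates the document quickly"
print(f"WER: {word_error_rate(post_edited, mt_output):.2f}")
```

In this example two of the six reference words were changed during post-editing, so the rate is 2/6. The numerator of the same computation is the "number of word edits" that the slide lists as a separate parameter.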
  15. Terminology extraction

    Unilingual and bilingual terminology lists. Customized according to position in the sentence, word type, number of words, etc. Feed the MT engine or tool for human translator.
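
Slide 15 does not describe how the terminology lists are produced. As one naive, frequency-based illustration of monolingual extraction filtered by number of words and by position (no stop word at the start or end of a candidate), consider the sketch below. The function `extract_terms`, its parameters, the stop-word list, and the sample text are all assumptions, not tauyou's method.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real list would be far longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "with"}


def extract_terms(text: str, max_words: int = 3, min_count: int = 2) -> list[tuple[str, int]]:
    """Return candidate terms (n-grams) with their counts, most frequent first."""
    counts = Counter()
    # Split on sentence-ending punctuation so n-grams never span sentences.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence)
        for n in range(1, max_words + 1):
            for start in range(len(words) - n + 1):
                gram = words[start:start + n]
                # Skip candidates that begin or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


# Hypothetical sample text.
sample = (
    "The translation memory stores approved segments. "
    "Each translation memory is tied to one domain. "
    "The MT engine is trained on the translation memory."
)
for term, count in extract_terms(sample):
    print(f"{count}  {term}")
```

The regular expression only matches unaccented Latin letters, so this sketch would need a different tokenizer for most languages. A bilingual list would additionally require aligned source and target segments, which this example does not cover.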
  16. The future

    Increase usage of translation memories; automatic domain classification; source text enhancement (spelling, grammar, structure, terminology ...); special words detection; new domains/language pairs creation.