Presentation by CPSL and tauyou at the tekom annual conference. It presents the case of a successful implementation of machine translation at a mid-size Language Service Provider (LSP).
Implementation of a Machine Translation Engine at CPSL
Belén García-Ochoa (CPSL)
Co-speaker: Diego Bartolomé (tauyou <language technology>)
Belén García-Ochoa
- [...] a Multilingual Service Provider since 1963
- Headquarters in Barcelona, Spain
- Other offices in Madrid (Spain), Germany and the UK
- CPSL staff includes over 50 people
Diego Bartolomé
- tauyou has provided language technologies for the localization industry since 2006
- Main clients: medium-sized LSPs
- Headquarters in Barcelona
A list of the most appropriate clients for using the engine was created. Based on this list, we established the different subject matters and the different language combinations.
[...] translation
- The standard output of a translator is 2,500 words per day.
- The standard output of a reviewer of human translation is 12,000 words per day.
- On average, 8,000 words can be post-edited per day.
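
To make the three daily figures easier to compare, the arithmetic can be sketched as follows. The job size of 100,000 words is a hypothetical value chosen for illustration, and the function name is an assumption; the slide does not say which steps are combined in a real project, so each step is computed on its own.

```python
# Daily throughput figures quoted on the slide (words per person per day).
TRANSLATION_WPD = 2_500
REVIEW_WPD = 12_000
POST_EDITING_WPD = 8_000


def days_needed(word_count: int, words_per_day: int) -> float:
    """Working days one person needs to process word_count words."""
    return word_count / words_per_day


job_words = 100_000  # hypothetical job size, not taken from the presentation

translation_days = days_needed(job_words, TRANSLATION_WPD)
review_days = days_needed(job_words, REVIEW_WPD)
post_editing_days = days_needed(job_words, POST_EDITING_WPD)

# Ratio between post-editing and translation throughput.
speedup = POST_EDITING_WPD / TRANSLATION_WPD

print(f"Translation:  {translation_days:.1f} days")
print(f"Review:       {review_days:.1f} days")
print(f"Post-editing: {post_editing_days:.1f} days")
print(f"Post-editing vs. translation throughput: {speedup:.1f}x")
```

With these figures, post-editing throughput is 3.2 times the translation throughput (8,000 / 2,500).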
The tauyou solution
- Engine that is continuously customized
- Corpus-based with rules for pre- and post-processing
- Data confidentiality is guaranteed
- Translation speed
Optimum domain creation
- train the translation solution
- enrich solution with related text
- terminology prioritization
- update the translation solution
- add rules to enhance quality
- weekly updates
CPSL workflow 2
- [...] clients data
- add generic texts to provide a good sample
- train the translation solution
- add rules to enhance quality
- periodic improvement
Other use cases
- [...] user defined
- unaligned translated documents
- generic translations
- optimum corpus/memories creation
- rule-based extension/filtering
Quality metrics
- Several customized parameters, including word error rate, number of word edits, tag differences, etc.
- Useful in machine translation but also in normal quality process
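
As an illustration of two of the parameters named here, the following is a minimal sketch of the number of word edits and the word error rate, computed with the standard word-level Levenshtein distance. It is not the tauyou implementation, which the slides do not describe; the function names, the whitespace tokenization, and the choice of the post-edited text as the reference are all assumptions.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the hypothesis into the reference."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1,      # word missing in hypothesis
                               current[j - 1] + 1,   # extra word in hypothesis
                               substitution))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word edits divided by the number of words in the reference."""
    ref_words = reference.split()  # assumption: plain whitespace tokenization
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical example: raw MT output compared with its post-edited version.
post_edited = "the engine is updated every week"
mt_output = "the engine is update weekly"
print(word_edit_distance(post_edited.split(), mt_output.split()))
print(word_error_rate(post_edited, mt_output))
```

Because the measure only compares two strings, the same sketch applies to a human translation and its reviewed version, which matches the point that such metrics are useful outside machine translation as well.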
Terminology extraction
- Customized according to position in the sentence, word type, number of words, etc.
- Feeds the MT engine or a tool for human translators
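
One simple way to approximate this kind of extraction is frequency-based counting of word sequences. The sketch below is an assumption-laden illustration, not the tauyou method: it models only the "number of words" parameter (`max_words`) and a crude stand-in for "word type" (a small hypothetical stopword list); position in the sentence is not modeled, and the tokenizer handles ASCII letters only.

```python
import re
from collections import Counter

# Hypothetical stopword list standing in for a real word-type filter.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in",
             "for", "is", "are", "with", "on"}


def extract_term_candidates(sentences, max_words=3, min_count=2):
    """Return (term, count) pairs for word sequences of up to max_words
    words that occur at least min_count times and do not start or end
    with a stopword."""
    counts = Counter()
    for sentence in sentences:
        # ASCII-only tokenization, kept simple for the sketch.
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
        for size in range(1, max_words + 1):
            for start in range(len(tokens) - size + 1):
                candidate = tokens[start:start + size]
                if candidate[0] in STOPWORDS or candidate[-1] in STOPWORDS:
                    continue
                counts[" ".join(candidate)] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Hypothetical input sentences.
sample = [
    "The translation memory is updated weekly.",
    "Export the translation memory before the update.",
]
for term, count in extract_term_candidates(sample):
    print(term, count)
```

The resulting candidate list could then be reviewed by a terminologist before it is fed to the MT engine or to a translator's tool.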
The future
- Automatic domain classification
- Source text enhancement: spelling, grammar, structure, terminology ...
- Special words detection
- New domains/language pairs creation