2012 GALA annual conference

Presentation given by tauyou at the 2012 GALA annual conference, during the event organized by TAUS for the MosesCore project.


  1. © 2012 outline

    before starting with machine translation; what happens when you go live; how to minimize the risks; practical hints + some numbers
  2. is machine translation for us?

    <LSP> <tauyou>: translation memories; open-source corpora; previous documents; documentation; alignment; websites of clients; public information; language-specific rules; programming of rules; TAUS data; terminology extraction

    <some issues>: minimum amount of data; need for data classification; language pairs
  3. for sure it is!

    <data cleaning + selection>: translation tables and language models; data and parameters for tuning; test measures

    <engines creation>: several + pruning afterwards

    <engine validation>: by professional translators

    <continuous improvement>: new files, new corpora, new rules, etc.
  4. the production process (I)

    statistical MT decoding; convert file format; segment text; NLP tasks; tokenize; rewrite source; lowercase
  5. the production process (II)

    statistical MT decoding; translated file; reformat; detokenize; rewrite target; uppercase; evaluate
  6. risk minimization

    <tauyou>: quality metrics computation

    <LSP>: time and cost analysis

    <LSP> + <tauyou>: track the evolution over time
  7. practical hints

    bigger clients; languages with highest translation volumes; with similar structure; with specific terminology/needs; MT-friendly translators; start moving
  8. some numbers

    more than 1,500 million words per month in Latin languages: ES, FR, PT, CA, GA, IT, RO

    EN as source or target is the star: ES, FR, DE, PT, IT, DA, SV, ZH, AR, JP...

    LSPs are translating +3 million words per month

    investment pays off if you translate +50,000 words per month
  9. Thanks! // Diego Bartolomé, PhD

    <address> C/ Les Planes 39 – 08201 Sabadell – Spain

    <phone> +34 93 711 29 96

    <cell> +34 670 331 225

    <email> [email protected]

    <www> tauyou.com
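
The two "production process" slides name pre-processing steps (segment text, tokenize, lowercase) and post-processing steps (uppercase, detokenize) around statistical MT decoding. The following is a minimal Python sketch of how such steps might be chained; it is an illustration under stated assumptions, not tauyou's implementation. All function names are hypothetical, the regular expressions are crude stand-ins for real tokenizers, and `decode` is a word-by-word dictionary lookup that stands in for an actual statistical decoder such as Moses.

```python
import re


def segment_text(text):
    # Split running text into segments after sentence-final punctuation.
    return re.split(r"(?<=[.!?])\s+", text.strip())


def tokenize(segment):
    # Separate words from punctuation marks (a crude stand-in tokenizer).
    return re.findall(r"\w+|[^\w\s]", segment)


def lowercase(tokens):
    return [tok.lower() for tok in tokens]


def decode(tokens, lexicon):
    # ASSUMPTION: placeholder for statistical MT decoding.
    # A real engine would search over phrase tables and language models.
    return [lexicon.get(tok, tok) for tok in tokens]


def uppercase(tokens):
    # Restore a capital letter at the start of the segment only.
    if not tokens:
        return tokens
    return [tokens[0][:1].upper() + tokens[0][1:]] + tokens[1:]


def detokenize(tokens):
    # Join tokens and re-attach punctuation to the preceding word.
    return re.sub(r"\s+([.,;:!?])", r"\1", " ".join(tokens))


def translate(text, lexicon):
    # Pre-process, decode, and post-process each segment in turn.
    output = []
    for segment in segment_text(text):
        tokens = lowercase(tokenize(segment))
        tokens = decode(tokens, lexicon)
        output.append(detokenize(uppercase(tokens)))
    return " ".join(output)
```

With a toy lexicon such as `{"the": "la", "house": "casa", "is": "es", "big": "grande"}`, calling `translate("The house is big.", lexicon)` walks one segment through every stage. File-format conversion, source and target rewriting rules, and evaluation are left out because the slides give no detail on them.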