2013 ALC Conference

10 Decisions to make before starting to use Machine Translation (MT).

Details on how to improve MT engines.

  1. Your Trained Moses SMT System doesn't work. What can you do?

    Diego Bartolome, CEO, tauyou <language technology>, [email protected], @diegobartolome
  2. Why Machine Translation?

    Strategic decision: increase sales; shorten delivery times; reduce costs; differentiation. Forced decision: clients ask for it!
  3. Decision 4: Domains

    Who is willing to pay? Where does your revenue come from? What are your key skills? What domains achieve good quality?
  4. Decision 5: Workflow

    Use MT as a secondary TM; bilingual pre-translated translation files; CAT tool integration; differentiated workflow.
  5. Decision 6: Feedback

    Qualitative; use updated TMs in new trainings; immediate (incremental) retraining; rule-based automatic post-editing; selective pre- and/or post-processing.
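Rule-based automatic post-editing can be as simple as an ordered list of search-and-replace rules applied to the MT output. The sketch below is only an illustration: the three rules (and the `e-mail` to `email` terminology preference) are hypothetical examples, not rules from the talk.

```python
import re

# Hypothetical example rules; a real rule set would come from translator feedback.
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),                 # no space before punctuation
    (re.compile(r" {2,}"), " "),                           # collapse repeated spaces
    (re.compile(r"\be-mail\b", re.IGNORECASE), "email"),   # assumed client terminology
]


def post_edit(text, rules=RULES):
    """Apply each (pattern, replacement) rule in order to the MT output."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    print(post_edit("Send the  e-mail , please ."))
```

Because the rules run in order, later rules see the output of earlier ones, so the order is part of the rule set.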
  6. Decision 8: Metrics

    SMT metrics: BLEU, NIST; feedback from translators; translation time vs. post-editing time; Word Error Rate (WER) or edit distance; cost reduction.
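As an illustration of the WER / edit distance metric, here is a minimal sketch that computes the word-level Levenshtein distance between the raw MT output and its post-edited version, divided by the length of the post-edited reference. Whitespace tokenization and the function names are assumptions made for this example.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # extra word in the hypothesis
                            curr[j - 1] + 1,      # word missing from the hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(hypothesis, reference):
    """Word Error Rate: word edits needed to reach the reference, per reference word."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    mt_output = "the cat sat on mat"
    post_edited = "the cat sat on the mat"
    print(f"WER: {wer(mt_output, post_edited):.3f}")
```

A WER of 0 means the translator accepted the MT output unchanged; higher values mean more post-editing effort.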
  7. Let's play with Moses

    Best resource to start: www.statmt.org/moses; TAUS tutorial: www.translationautomation.com; tauyou slides: www.speakerdeck.com/tauyoucom
  8. Everything is clear!

    Gather TMs and other linguistic assets; select domains; train systems. BLEU score is great … but … translation quality is awful.
  9. Why?

    Not enough data; too much data; unclean TMs; misalignments; difficult language pairs; selection of wrong parameters; suboptimal techniques.
  10. Some steps

    Maximum exploitation of existing assets; source content optimization; data selection and cleaning; improvement of the models; linguistic processing; continuous improvement.
  11. Linguistic assets

    Translation memory sharing (clients, partners, EU, UN, TAUS); relevant on-line data retrieval; advanced TM techniques (sub-segment matching, parts-of-speech replacement).
  12. Source optimization (I)

    Spell check; grammar check; style check; terminology check; client checklist; new doc; proposed doc + html report.
  13. Data selection + cleaning

    Clean translation memories (length, punctuation, terminology, …; inconsistencies, repetitions, ...); segment splitting; optimize weight of most frequent n-grams (validate their translations); add out-of-domain data.
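The length, punctuation and repetition checks above can be sketched as a simple filter over source/target segment pairs. This is a minimal example under stated assumptions: the thresholds (`max_words`, `max_ratio`), the punctuation set and the function name are illustrative choices, not values from the talk.

```python
END_PUNCT = ".!?:;"  # assumed set of sentence-final punctuation marks


def clean_tm(pairs, max_words=80, max_ratio=3.0):
    """Drop TM segment pairs that look empty, overlong, unbalanced or repeated."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_words or n_tgt > max_words:
            continue  # overlong segments are often misaligned
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # suspicious length ratio
        if (src[-1] in END_PUNCT) != (tgt[-1] in END_PUNCT):
            continue  # final punctuation present on one side only
        if (src, tgt) in seen:
            continue  # exact repetition
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    tm = [
        ("Press the button.", "Pulse el botón."),
        ("Press the button.", "Pulse el botón."),
        ("", "vacío"),
        ("OK", "Este segmento es demasiado largo para la fuente"),
        ("Save the file.", "Guarde el archivo"),
    ]
    for src, tgt in clean_tm(tm):
        print(src, "|||", tgt)
```

Suitable thresholds depend on the language pair; a ratio that is reasonable for English and Spanish may be too strict for pairs with very different word counts.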
  14. Models optimization

    Filter the translation tables (remove the garbage + tune weights); optimize language models (adapt them to the translation purpose); tune parameters correctly (tune set, test set, optimization parameters); improve recasing.
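One way to picture "filter the translation tables" is to prune low-probability entries from a plain-text phrase table. The sketch below assumes the usual `source ||| target ||| scores` text layout and further assumes that the third score is the direct phrase translation probability; check the layout of your own table before relying on that. The threshold, the per-source limit and the function name are hypothetical.

```python
from collections import defaultdict


def filter_phrase_table(lines, min_prob=0.01, top_n=20, score_index=2):
    """Keep, per source phrase, the top_n entries whose chosen score is >= min_prob."""
    by_source = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # malformed line
        try:
            prob = float(fields[2].split()[score_index])
        except (IndexError, ValueError):
            continue  # missing or non-numeric score
        if prob >= min_prob:
            by_source[fields[0]].append((prob, line))
    kept = []
    for entries in by_source.values():
        entries.sort(key=lambda entry: entry[0], reverse=True)
        kept.extend(line for _, line in entries[:top_n])
    return kept


if __name__ == "__main__":
    table = [
        "the house ||| la casa ||| 0.8 0.7 0.9 0.6",
        "the house ||| el casa ||| 0.1 0.1 0.005 0.1",
        "the house ||| la vivienda ||| 0.3 0.2 0.08 0.2",
    ]
    for entry in filter_phrase_table(table):
        print(entry)
```

This only illustrates the idea on a small in-memory list; a production table is large enough that streaming it from disk is the more realistic approach.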
  15. Linguistic processing

    In the source and/or target language: grammar checking; entities detection (proper nouns, alphanumeric words, ...); compound words splitting; sentence reordering.
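For alphanumeric words such as product codes, one common pre- and post-processing trick is to replace them with placeholders before translation and restore them afterwards, so the engine cannot mangle them. The sketch below is an assumption-laden illustration: the `__ENTn__` placeholder format, the definition of "alphanumeric" (at least one letter and one digit) and both function names are invented for this example.

```python
def mask_alphanumerics(sentence):
    """Replace tokens containing both letters and digits with placeholders."""
    mapping = {}
    out = []
    for token in sentence.split():
        core = token.strip(".,;:!?()")  # ignore surrounding punctuation
        has_digit = any(c.isdigit() for c in core)
        has_alpha = any(c.isalpha() for c in core)
        if has_digit and has_alpha:
            placeholder = f"__ENT{len(mapping)}__"
            mapping[placeholder] = core
            out.append(token.replace(core, placeholder, 1))
        else:
            out.append(token)
    return " ".join(out), mapping


def unmask(sentence, mapping):
    """Restore the original tokens after translation."""
    for placeholder, original in mapping.items():
        sentence = sentence.replace(placeholder, original)
    return sentence


if __name__ == "__main__":
    masked, mapping = mask_alphanumerics("Connect the RS-232 cable to port X200.")
    print(masked)
    print(unmask(masked, mapping))
```

In a real pipeline the masked sentence would go through the MT engine between the two calls, and the engine would need to pass the placeholders through untouched.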
  16. Life is about the people you meet and the things you create with them. So go out and start creating.

    Part of the Holstee Manifesto. Diego Bartolome, CEO, tauyou <language technology>, [email protected], @diegobartolome