2013 ALC Conference

10 Decisions to make before starting to use Machine Translation (MT).

Details on how to improve MT engines.

Transcript

  1. Your Trained Moses SMT System doesn't work. What can you do?

    Diego Bartolome, CEO tauyou <language technology> diego.bartolome@tauyou.com @diegobartolome
  2. Where are you now?

  4. Why Machine Translation?

    Strategic decision: Increase sales; Shorten delivery times; Reduce costs; Differentiation. Forced decision: Clients ask for it!
  5. Dare – change

  6. Welcome to the jungle

  7. Decision 1: Internal – external

    Core competence; Resources; ROI; Time to market
  9. MT Costs

    Internal development; Free tools; DIY solutions; Traditional pricing model; tauyou managed solution
  10. Decision 2: MT Type (I)

    Rule-based MT; Statistical MT; Hybrid MT
  11. Decision 2: MT Type (II) Do we really care?

  12. Decision 3: Languages (I) Source: translate.autodesk.com

  13. Decision 3: Languages (II) Source: Philipp Koehn

  14. Decision 4: Domains

    Who is willing to pay? Where does your revenue come from? What are your key skills? What domains achieve good quality?
  15. Decision 5: Workflow

    Use MT as a secondary TM; Bilingual pre-translated translation files; CAT tool integration; Differentiated workflow
  16. Decision 6: Feedback

    Qualitative; Use updated TMs in new trainings; Immediate (incremental) retraining; Rule-based automatic post-editing; Selective pre- and/or post-processing
  17. Decision 7: Post-editors

    What are the skills needed? Post-editing guidelines. How do we pay them?
  18. Decision 8: Metrics

    SMT metrics: BLEU, NIST; Feedback from translators; Translation time vs. post-editing time; Word Error Rate (WER) or Edit Distance; Cost reduction
  19. Decision 9: Business Model

  21. Decision 10: Start!

  22. Let's play with Moses

  23. Let's play with Moses

    Best resource to start: www.statmt.org/moses; TAUS tutorial: www.translationautomation.com; tauyou slides: www.speakerdeck.com/tauyoucom
  24. Everything is clear!

    Gather TMs and other linguistic assets; Select domains; Train systems; BLEU score is great … but … Translation quality is awful
  25. Why?

    Not enough data; Too much data; Unclean TMs; Misalignments; Difficult language pairs; Selection of wrong parameters; Suboptimal techniques
  27. Some steps

    Maximum exploitation of existing assets; Source content optimization; Data selection and cleaning; Improvement of the models; Linguistic processing; Continuous improvement
  28. Linguistic assets

    Translation memory sharing: Clients, Partners, EU, UN, TAUS; Relevant on-line data retrieval; Advanced TM techniques: Sub-segment matching, Parts of Speech replacement
  29. Source optimization (I)

    Spell check; Grammar check; Style check; Terminology check; Client checklist; new doc, proposed doc + html report
  30. Summarization

    % to reduce; Use translation memories: Project, Client, All; new doc, proposed doc + html report
  31. Data selection + cleaning

    Clean translation memories: Length, punctuation, terminology, …; Inconsistencies, repetitions, ...; Segment splitting; Optimize weight of most frequent n-grams; Validate their translations; Add out-of-domain data
  32. Models optimization

    Filter the translation tables: Remove the garbage + tune weights; Optimize language models: Adapt them to the translation purpose; Tune parameters correctly: Tune set, test set, optimization parameters; Improve recasing
  33. Linguistic processing

    In the source and/or target language; Grammar checking; Entities detection: Proper nouns, alphanumeric words, ...; Compound words splitting; Sentence reordering
  34. Life is about the people you meet and the things you create with them. So go out and start creating.

    Part of the Holstee Manifesto
  35. Thank you!
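
Slide 18 lists Word Error Rate (WER) or edit distance among the metrics. As a rough illustration, WER can be computed as the word-level Levenshtein distance between the raw MT output and its post-edited version, divided by the length of the post-edited reference. The function names below are hypothetical and whitespace tokenization is an assumption; a production setup would likely use a proper tokenizer.

```python
def edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance between two token sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, r in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, h in enumerate(hyp_tokens, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Assumption: both strings are tokenized by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `word_error_rate(post_edited, mt_output)` gives a per-segment score that can be averaged over a project and tracked alongside post-editing time.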
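
Slide 31 mentions cleaning translation memories by length and removing repetitions. A minimal sketch of such a filter over (source, target) segment pairs might look like the following. The function name `clean_tm` and the thresholds `max_len` and `max_ratio` are illustrative assumptions, not values from the talk, and punctuation or terminology checks are left out.

```python
def clean_tm(pairs, max_len=80, max_ratio=3.0):
    """Filter (source, target) segment pairs from a translation memory.

    Drops pairs with an empty side, overly long segments, a suspicious
    source/target length ratio (a common sign of misalignment), and exact
    duplicates. Thresholds are hypothetical defaults.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # empty side
        if n_src > max_len or n_tgt > max_len:
            continue  # too long to align reliably
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # length mismatch, possibly misaligned
        if (src, tgt) in seen:
            continue  # exact repetition
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

The right thresholds depend on the language pair, so they are worth checking on a sample before training.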
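
Slide 32 suggests filtering the translation tables to remove the garbage. One simple way to sketch this is to drop low-probability entries and cap the number of translations kept per source phrase. The code assumes a plain-text phrase table with `|||`-separated fields (source, target, scores) and that the score at `score_index` is the one to filter on. Both are assumptions to check against your own table, and the function name and thresholds are hypothetical.

```python
def filter_phrase_table(lines, min_prob=0.01, score_index=2, max_per_source=20):
    """Prune phrase-table lines of the assumed form 'src ||| tgt ||| s1 s2 s3 s4 ...'.

    Keeps entries whose selected score is at least min_prob, and at most
    max_per_source entries per source phrase (highest score first).
    """
    by_source = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # malformed line
        scores = [float(s) for s in fields[2].split()]
        if len(scores) <= score_index or scores[score_index] < min_prob:
            continue  # missing score or below threshold
        by_source.setdefault(fields[0], []).append(
            (scores[score_index], line.rstrip("\n"))
        )
    kept = []
    for entries in by_source.values():
        entries.sort(key=lambda e: e[0], reverse=True)
        kept.extend(text for _, text in entries[:max_per_source])
    return kept
```

Pruning of this kind trades a small amount of coverage for a smaller table, so it is worth re-checking quality on a held-out test set afterwards.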