2013 ALC Conference

10 Decisions to make before starting to use Machine Translation (MT).

Details on how to improve MT engines.

Transcript

  1. Your Trained Moses SMT System doesn't work. What can you do?
    Diego Bartolome, CEO tauyou
    [email protected]
    @diegobartolome

  2. Where are you now?

  3. Where are you now?

  4. Why Machine Translation?
    Strategic decision
    Increase sales
    Shorten delivery times
    Reduce costs
    Differentiation
    Forced decision
    Clients ask for it!

  5. Dare – change

  6. Welcome to the jungle

  7. Decision 1: Internal – external
    Core competence
    Resources
    ROI
    Time to market

  8. Decision 1: Internal – external
    Core competence
    Resources
    ROI
    Time to market

  9. MT Costs
    Internal development
    Free tools
    Do-it-yourself (DIY) solutions
    Traditional pricing model
    tauyou managed solution

  10. Decision 2: MT Type (I)
    Rule-based MT
    Statistical MT
    Hybrid MT

  11. Decision 2: MT Type (II)
    Do we really care?

  12. Decision 3: Languages (I)
    Source: translate.autodesk.com

  13. Decision 3: Languages (II)
    Source: Philipp Koehn

  14. Decision 4: Domains
    Who is willing to pay?
    Where does your revenue come from?
    What are your key skills?
    What domains achieve good quality?

  15. Decision 5: Workflow
    Use MT as a secondary TM
    Bilingual pre-translated translation files
    CAT tool integration
    Differentiated workflow

  16. Decision 6: Feedback
    Qualitative
    Use updated TMs in new trainings
    Immediate (incremental) retraining
    Rule-based automatic post-editing
    Selective pre- and/or post-processing
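
The "rule-based automatic post-editing" item above can be illustrated with a small sketch. The three rules below are hypothetical examples, not rules from the talk; in practice the rule list would be built from corrections that post-editors make repeatedly.

```python
import re

# Hypothetical rules (pattern, replacement), applied in order.
# A real rule set would be derived from recurring post-editor corrections.
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),                  # no space before punctuation
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), r"\1"),  # collapse an immediately repeated word
    (re.compile(r" {2,}"), " "),                            # squeeze runs of spaces
]


def post_edit(mt_output: str) -> str:
    """Apply every rule in order to one segment of raw MT output."""
    text = mt_output
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


print(post_edit("The the system  works , really ."))
```

Rules like the repeated-word one can also damage correct output (for example a legitimate "had had"), so each rule is worth validating against existing post-edited segments before it goes into production.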

  17. Decision 7: Post-editors
    What are the skills needed?
    Post-editing guidelines
    How do we pay them?

  18. Decision 8: Metrics
    SMT metrics: BLEU, NIST
    Feedback from translators
    Translation time vs. Post-editing time
    Word Error Rate (WER) or Edit Distance
    Cost reduction
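
As a rough illustration of the "Word Error Rate (WER) or Edit Distance" metric above, here is a minimal sketch that assumes whitespace tokenization and takes the post-edited segment as the reference and the raw MT output as the hypothesis. The function names are illustrative, not part of any toolkit mentioned in the talk.

```python
def edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance between two token sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_tok in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp_tokens, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


post_edited = "the cat sat on the mat"
raw_mt = "the cat sat on mat"
print(round(word_error_rate(post_edited, raw_mt), 3))
```

Because the score is normalized by the reference length, it can exceed 1.0 when the MT output is much longer than the post-edited text.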

  19. Decision 9: Business Model

  20. Decision 10: Start!

  21. Let's play with Moses

  22. Let's play with Moses
    Best resource to start
    www.statmt.org/moses
    TAUS tutorial
    www.translationautomation.com
    tauyou slides
    www.speakerdeck.com/tauyoucom

  23. Everything is clear!
    Gather TMs and other linguistic assets
    Select domains
    Train systems
    BLEU score is great
    … but …
    Translation quality is awful

  24. Why?
    Not enough data
    Too much data
    Unclean TMs
    Misalignments
    Difficult language pairs
    Selection of wrong parameters
    Suboptimal techniques

  25. Some steps
    Maximum exploitation of existing assets
    Source content optimization
    Data selection and cleaning
    Improvement of the models
    Linguistic processing
    Continuous improvement

  26. Linguistic assets
    Translation memory sharing
    Clients, Partners, EU, UN, TAUS
    Relevant on-line data retrieval
    Advanced TM techniques
    Sub-segment matching
    Part-of-speech replacement

  27. Source optimization (I)
    Spell check
    Grammar check
    Style check
    Terminology check
    Client checklist
    new doc → proposed doc + HTML report

  28. Summarization
    % to reduce
    Use translation memories
    Project
    Client
    All
    new doc → proposed doc + HTML report

  29. Data selection + cleaning
    Clean translation memories
    Length, punctuation, terminology, …
    Inconsistencies, repetitions, …
    Segment splitting
    Optimize weight of most frequent n-grams
    Validate their translations
    Add out-of-domain data
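
The cleaning criteria listed above (length, punctuation, repetitions) might look like this in a minimal sketch. The thresholds `max_ratio` and `max_words`, and the rule that both sides must end with the same sentence-final mark, are assumptions chosen for illustration; they suit languages that share punctuation conventions and would need adjusting otherwise.

```python
def clean_segments(pairs, max_ratio=3.0, max_words=100, check_punctuation=True):
    """Filter (source, target) translation-memory pairs with simple heuristics."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # empty side
        src_len, tgt_len = len(source.split()), len(target.split())
        if src_len > max_words or tgt_len > max_words:
            continue  # overly long segment
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue  # suspicious length ratio, often a misalignment
        if check_punctuation and source[-1] in ".?!:" and target[-1] != source[-1]:
            continue  # sentence-final punctuation does not match
        if (source, target) in seen:
            continue  # exact repetition
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


tm = [
    ("Hello world.", "Hola mundo."),
    ("Hello world.", "Hola mundo."),             # repetition
    ("", "x"),                                   # empty source
    ("Yes", "Sí , por supuesto que sí claro"),   # length ratio 1:7
    ("Is it ready?", "Está listo."),             # punctuation mismatch
]
print(clean_segments(tm))
```

Logging the rejected pairs instead of silently dropping them makes it easier to check whether a threshold is too aggressive for a given language pair.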

  30. Model optimization
    Filter the translation tables
    Remove the garbage + tune weights
    Optimize language models
    Adapt them to the translation purpose
    Tune parameters correctly
    Tune set, test set, optimization parameters
    Improve recasing
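
One way to picture "filter the translation tables" is the sketch below, which prunes a text-format phrase table whose fields are separated by ` ||| `. It assumes the third score is the direct phrase translation probability, which should be verified against the actual table; `top_n`, `min_prob` and `score_index` are illustrative parameters, not Moses options.

```python
from collections import defaultdict


def prune_phrase_table(lines, top_n=20, min_prob=0.001, score_index=2):
    """Keep at most top_n target options per source phrase, dropping low-probability ones."""
    by_source = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        fields = line.split(" ||| ")
        if len(fields) < 3:
            continue  # malformed entry
        scores = [float(value) for value in fields[2].split()]
        prob = scores[score_index]  # assumed: direct phrase translation probability
        if prob >= min_prob:
            by_source[fields[0]].append((prob, line))
    pruned = []
    for options in by_source.values():
        options.sort(key=lambda item: item[0], reverse=True)
        pruned.extend(line for _, line in options[:top_n])
    return pruned


table = [
    "the house ||| la casa ||| 0.5 0.4 0.6 0.3",
    "the house ||| casa ||| 0.1 0.1 0.2 0.1",
    "the house ||| el ||| 0.01 0.01 0.0001 0.01",
    "cat ||| gato ||| 0.9 0.9 0.9 0.9",
]
for entry in prune_phrase_table(table, top_n=1):
    print(entry)
```

This version holds the whole table in memory, which is fine for experiments but not for full-size tables; a streaming variant would be needed there.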

  31. Linguistic processing
    In the source and/or target language
    Grammar checking
    Entity detection
    Proper nouns, alphanumeric words, …
    Compound word splitting
    Sentence reordering
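
For the entity detection step above, a common pattern is to mask entities before translation and restore them afterwards. The sketch below handles only alphanumeric words (tokens that mix letters and digits, such as part numbers). The `__ENTn__` placeholder format is hypothetical, and the approach assumes the MT system passes placeholders through unchanged.

```python
import re

# Tokens containing at least one digit and at least one letter, e.g. "AB123" or "7G".
ALPHANUMERIC = re.compile(r"\b(?=\w*\d)(?=\w*[^\W\d_])\w+\b")


def mask_entities(sentence):
    """Replace alphanumeric words with numbered placeholders before translation."""
    entities = []

    def to_placeholder(match):
        entities.append(match.group(0))
        return f"__ENT{len(entities) - 1}__"

    return ALPHANUMERIC.sub(to_placeholder, sentence), entities


def unmask_entities(translation, entities):
    """Put the original entities back into the translated sentence."""
    for index, entity in enumerate(entities):
        translation = translation.replace(f"__ENT{index}__", entity)
    return translation


masked, found = mask_entities("Replace filter AB123 and valve 7G before 2013")
print(masked)
print(unmask_entities("Reemplace el filtro __ENT0__ y la válvula __ENT1__ antes de 2013", found))
```

Plain numbers such as "2013" are left untouched by this pattern; proper nouns would need a separate detector, for example a name list or a named-entity tagger.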

  32. Life is about the people you meet and
    the things you create with them.
    So go out and start creating
    Part of the Holstee Manifesto
    Diego Bartolome
    CEO tauyou
    [email protected]
    @diegobartolome

  33. Thank you!
    Diego Bartolome
    CEO tauyou
    [email protected]
    @diegobartolome
