2012 GALA annual conference

Presentation by tauyou at the 2012 GALA annual conference, at the event organized by TAUS for the MosesCore project.

Transcript

  1. © 2012 #1
    friendly machine translation
    Diego Bartolomé, CEO

  2. © 2012 #2
    outline
    before starting with machine translation
    what happens when you go live
    how to minimize the risks
    practical hints + some numbers

  3. © 2012 #3
    is machine translation for us?

    translation memories       open-source corpora
    previous documents         documentation alignment
    websites of clients        public information
    language-specific rules    programming of rules
    TAUS data                  terminology extraction

    minimum amount of data
    need for data classification
    language pairs

  4. © 2012 #4
    for sure it is!

    translation tables and language models
    data and parameters for tuning
    test measures

    several + pruning afterwards

    by professional translators

    new files, new corpora, new rules, etc.

  5. © 2012 #5
    the production process (I)
    statistical MT decoding
    convert file format
    segment text
    NLP tasks
    tokenize
    rewrite source
    lowercase
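
The pre-processing steps on this slide can be illustrated with a small sketch. It is not tauyou's implementation: the function names, the regex-based segmenter and tokenizer, the quote-normalization rule in `SOURCE_REWRITES`, and the order in which the steps run are all assumptions made for illustration. File-format conversion and the NLP tasks are left out.

```python
import re

# Hypothetical source-rewriting rules: normalize curly quotes to plain ones.
SOURCE_REWRITES = [
    (re.compile(r"[\u201c\u201d]"), '"'),
    (re.compile(r"[\u2018\u2019]"), "'"),
]


def segment(text):
    """Split raw text into sentence-like segments after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(segment_text):
    """Separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", segment_text)


def rewrite_source(line):
    """Apply each rewriting rule in turn to a tokenized line."""
    for pattern, replacement in SOURCE_REWRITES:
        line = pattern.sub(replacement, line)
    return line


def preprocess(text):
    """Segment, tokenize, rewrite and lowercase text for the decoder."""
    lines = []
    for seg in segment(text):
        line = " ".join(tokenize(seg))
        line = rewrite_source(line)
        lines.append(line.lower())
    return lines
```

Each returned line is a space-separated, lowercased token string, which is the kind of input a statistical decoder such as Moses typically expects.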

  6. © 2012 #6
    the production process (II)
    statistical MT decoding
    translated file
    reformat
    detokenize
    rewrite target
    uppercase
    evaluate
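
The post-processing side can be sketched in the same spirit. This is a simplified illustration under stated assumptions, not the production system: `detokenize`, `uppercase_first` and `postprocess` are hypothetical names, the punctuation rules cover only a few common marks, and capitalizing just the first letter stands in for a real recasing step. Target rewriting, reformatting and evaluation are not shown.

```python
import re


def detokenize(tokens):
    """Join tokens and re-attach punctuation to the neighbouring words."""
    text = " ".join(tokens)
    # Remove the space before closing punctuation and brackets.
    text = re.sub(r"\s+([.,;:!?)\]])", r"\1", text)
    # Remove the space after opening brackets.
    text = re.sub(r"([(\[])\s+", r"\1", text)
    return text


def uppercase_first(text):
    """Restore the sentence-initial capital letter."""
    return text[:1].upper() + text[1:]


def postprocess(decoder_output):
    """Turn a lowercased, tokenized decoder line into readable text."""
    return uppercase_first(detokenize(decoder_output.split()))
```

For example, `postprocess("hola , mundo ( prueba ) !")` gives `Hola, mundo (prueba)!`. Proper nouns and acronyms inside the sentence would still need a dedicated recaser or truecasing model.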

  7. © 2012 #7
    risk minimization

    quality metrics computation

    time and cost analysis
    +
    track the evolution over time
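
One way to make "quality metrics computation" and "track the evolution over time" concrete is a word-level edit rate between the raw MT output and its post-edited version, averaged per period. The slide does not say which metrics are used, so this is only an assumed example: it is a simplified TER-like measure without shifts, and the function names and the `(period, mt_output, post_edited)` record layout are hypothetical.

```python
from collections import defaultdict


def word_edit_distance(hyp, ref):
    """Levenshtein distance over whitespace-separated words."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # delete a word from hyp
                            curr[j - 1] + 1,      # insert a word into hyp
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(mt_output, post_edited):
    """Edits needed per post-edited word; lower means less post-editing."""
    ref_len = len(post_edited.split())
    if ref_len == 0:
        return 0.0 if not mt_output.split() else 1.0
    return word_edit_distance(mt_output, post_edited) / ref_len


def average_by_period(records):
    """Average the edit rate per period from (period, mt, post_edited) records."""
    scores = defaultdict(list)
    for period, mt_output, post_edited in records:
        scores[period].append(edit_rate(mt_output, post_edited))
    return {period: sum(v) / len(v) for period, v in scores.items()}
```

A falling average from one period to the next would suggest that translators are editing less of the MT output, which can then be set against the time and cost analysis.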

  8. © 2012 #8
    practical hints
    bigger clients
    languages
    with highest translation volumes
    with similar structure
    with specific terminology/needs
    MT-friendly translators
    start moving

  9. © 2012 #9
    some numbers
    more than 1,500 million words per month
    in Latin languages: ES, FR, PT, CA, GA, IT, RO
    EN as source or target is the star
    ES, FR, DE, PT, IT, DA, SV, ZH, AR, JP...
    LSPs are translating more than 3 million words per month
    investment pays off if you translate
    more than 50,000 words per month

  10. © 2012 #10
    Thanks!
    // Diego Bartolomé, PhD
    C/ Les Planes 39 – 08201 Sabadell – Spain
    +34 93 711 29 96
    +34 670 331 225
    [email protected]
    tauyou.com
