2013 GALA Miami conference

The Latin American market is composed of a mix of various Spanish dialects. If a company really wants to reach a specific audience in Latin America, it must use the right dialect. But how is it possible to translate marketing materials into four or five Spanish dialects without dramatically increasing costs? This session will discuss how a joint effort to create an MT engine for translating international Spanish into specific Latin American dialects (Spanish for Argentina, Chile, Colombia, Mexico, and Puerto Rico) made this challenge feasible, economical, and replicable.

Transcript

  1. An MT Case Study:
    Breaking into Latin American Markets
    on a Small Budget
    María Azqueta (SeproTec) & Diego Bartolomé (tauyou)

  2. Spanish Worldwide
    Spanish Language:
    • Also known as Castellano.
    • Latin-derived Romance language.
    • Spanish is one of the six official languages of
    the United Nations and an official language of
    the European Union.

  3. Spanish Worldwide

  4. Spanish Worldwide
    Native speakers (bar chart):
    • Mandarin Chinese: 955 million
    • Spanish: 407 million
    • English: 360 million
    • Hindi/Urdu: 311 million
    Second most spoken language by number of native speakers

  5. Spanish Worldwide
    • For demographic reasons, the percentage of the
    world’s population that speaks Spanish as a native
    language is increasing, while the percentage of
    Chinese and English speakers is decreasing.
    • Within three or four generations, 10% of the world’s
    population will communicate in Spanish.
    • In 2050, the United States will be the world’s
    foremost Spanish-speaking country.

  6. Spanish on the Internet
    • Spanish is the third most widely used language on
    the Net.
    • The use of Spanish on the Net has experienced a
    growth rate of 807.4% between 2000 and 2011.
    • Spain and Mexico are among the 20 countries with
    the highest number of internet users.
    • The demand for documents in Spanish is the fourth
    largest from among the world’s languages.

  7. Spanish Worldwide and its Differences
    High demand for translations into Spanish.
    But… is the same Spanish spoken
    everywhere?

  8. Spanish Worldwide and its Differences
    RAE (Royal Spanish Academy):
    – Created in the 18th century, it is widely seen as
    the arbiter of what is considered standard
    Spanish.
    – It produces authoritative dictionaries and
    grammar guides.
    – Although its decisions are not formally binding,
    they are widely followed in both Spain and Latin
    America.

  9. Spanish Worldwide and its Differences
    Different dialects and many differences:
    • Lexical variations
    • Grammatical differences
    • Idioms

  10. Spanish Worldwide and its Differences
    Market Trend:
    • ‘Neutral’ or ‘International’ Spanish
    • Latin American Spanish & European Spanish

  11. Why Adapt to the
    Local Spanish of Each Country?
    To reach different markets
    People are most likely to buy when a product is
    advertised in their dialect

  12. Why Adapt to the
    Local Spanish of Each Country?
    Client A (Gaming Industry)
    EN: Take a card from the deck
    ES: Coge una carta de la baraja

  13. Why Adapt to the
    Local Spanish of Each Country?
    ES: Coge una carta de la baraja
    AR: Agarrá una carta del mazo
    CL: Toma una carta del naipe
    CO: Coge una carta de la baraja
    MX: Saca una carta de la baraja
    PR: Coge una carta de la baraja
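
The renderings on this slide can be restated as a small lookup table, which makes it easy to see which markets actually diverge from the Spain version. This is only an illustrative sketch: the sentences are copied from the slide, while the locale codes (`es-ES`, `es-AR`, and so on) and the function name are assumptions introduced here, not something the deck specifies.

```python
# Renderings of "Take a card from the deck", copied from the slide.
# The locale codes used as keys are an assumed naming convention.
RENDERINGS = {
    "es-ES": "Coge una carta de la baraja",
    "es-AR": "Agarrá una carta del mazo",
    "es-CL": "Toma una carta del naipe",
    "es-CO": "Coge una carta de la baraja",
    "es-MX": "Saca una carta de la baraja",
    "es-PR": "Coge una carta de la baraja",
}


def locales_needing_adaptation(renderings, source="es-ES"):
    """Return the locales whose rendering differs from the source rendering."""
    base = renderings[source]
    return sorted(
        locale
        for locale, text in renderings.items()
        if locale != source and text != base
    )


print(locales_needing_adaptation(RENDERINGS))
```

For this one sentence, three of the five target markets need a change and two can reuse the Spain text as is; other sentences may of course split differently.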

  14. Why Adapt to the
    Local Spanish of Each Country?
    Coger (32 entries)
    http://rae.es/rae.html
    1. tr. Asir, agarrar o tomar. U. t. c. prnl. [To seize, grab, or take.]
    31. intr. vulg. Am. Realizar el acto sexual [vulgar, Americas: to have sexual intercourse]

  15. Advise Clients
    If you really want to break into a specific
    market, you must decide which country
    you want to target and localize your
    material for the different Spanish dialects
    spoken in each individual country.

  16. The Main Problems Clients Face

  17. Is there a cost-efficient solution
    on the market?

  18. tauyou MT Solution at SeproTec
    Hybrid machine translation since January 2011
    Languages: EN, ES, PT, GA, FR, IT…
    Domains: Legal, Technical…
    Glossaries and forbidden words lists
    Average translated words per month: 700,000
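
A forbidden words list can be pictured as a per-market check run over the output text. The sketch below is a guess at the general idea, not a description of the tauyou engine: the list contents, the locale codes, and the function name are all hypothetical, chosen here only because "coger" is vulgar in parts of Latin America.

```python
import re

# Hypothetical forbidden-word lists per target market (assumed contents).
FORBIDDEN = {
    "es-AR": {"coge", "coger"},
    "es-MX": {"coge", "coger"},
}


def find_forbidden(text, locale):
    """Return the forbidden words found in text for a locale, in order."""
    banned = FORBIDDEN.get(locale, set())
    words = re.findall(r"\w+", text.lower())
    return [word for word in words if word in banned]


print(find_forbidden("Coge una carta de la baraja", "es-MX"))
```

A real list would need to cover inflected forms more systematically than this two-entry example does.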

  19. Initial Brainstorming
    MT from
    EN > different ES dialects
    Extensive post-editing
    would be required

  20. Final Scope of the Project
    Human translation + revision
    English > Spanish (Spain)
    MT of Spanish (Spain) into
    Spanish from:
    • Argentina
    • Chile
    • Colombia
    • Mexico
    • Puerto Rico

  21. Initial Approach for Latin American MT
    Traditional Workflow
    1. Gather translation memories (EN → ES-XX)
    2. Add generic material
    3. Develop engine
    4. Add linguistic pre- and post-processing
    5. Improve quality over time

  22. Drawbacks
    Varying MT Quality
    Depending on the domain and dialect
    Initial Inconsistencies among Dialects
    Handled with glossaries
    Medium Post-Editing Effort
    Could be improved over time

  23. New Approach
    Translate EN to Standard ES
    Via standard high-quality human translation
    Convert Standard ES to Latin American Variants
    From Spanish to Spanish
    Better final quality is achieved

  24. Specifications
    Countries
    Argentina, Chile, Colombia, Mexico, Puerto Rico
    Internal Glossaries to Handle Lexical Variations
    It corrects agreement errors
    Idioms
    Grammatical Differences
    It adapts verb tenses
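
To make the glossary idea concrete, here is a minimal sketch of Spanish-to-Spanish lexical substitution. It is not the actual engine: the glossary entries are inferred from the card-game example earlier in the deck, the locale codes and function name are assumptions, and it handles only lexical swaps, leaving agreement and verb tenses untouched.

```python
import re

# Assumed per-market glossaries; keys must be lowercase.
# Entries are inferred from the deck's card-game example.
GLOSSARIES = {
    "es-AR": {"coge": "agarrá", "de la baraja": "del mazo"},
    "es-CL": {"coge": "toma", "de la baraja": "del naipe"},
    "es-MX": {"coge": "saca"},
}


def adapt(text, locale):
    """Replace glossary terms in text with the variant for the given locale."""
    glossary = GLOSSARIES.get(locale, {})
    if not glossary:
        return text
    # Try longer entries first so multi-word terms win over single words.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        found = match.group(0)
        target = glossary[found.lower()]
        # Keep a leading capital if the source term had one.
        if found[0].isupper():
            return target[0].upper() + target[1:]
        return target

    return pattern.sub(swap, text)


print(adapt("Coge una carta de la baraja", "es-AR"))
```

Markets without a glossary entry, such as Colombia in this example, get the source text back unchanged.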

  25. Testing the Prototype Engine
    Extraction of several texts (fashion, real
    estate, human resources, automobile)
    Sent to linguists and/or translators in
    each target country for localization
    The same localizations performed
    by the engine
    Comparison of human
    and machine localization results
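
The comparison step can be approximated with a word-level diff between the human and the machine output. The deck does not say how the comparison was actually done, so the following is only one plausible sketch using Python's standard `difflib`; the function name and the sample sentences are assumptions.

```python
from difflib import SequenceMatcher


def token_differences(human, machine):
    """Return (human_span, machine_span) pairs where the two texts differ."""
    human_tokens = human.split()
    machine_tokens = machine.split()
    matcher = SequenceMatcher(a=human_tokens, b=machine_tokens, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(human_tokens[i1:i2]), " ".join(machine_tokens[j1:j2]))
            )
    return differences


# Example: the engine left "de la baraja" unlocalized for Argentina.
print(token_differences(
    "Agarrá una carta del mazo",
    "Agarrá una carta de la baraja",
))
```

Each reported pair still needs a human decision, since a difference can mean the engine missed a term or that the human did.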

  26. First Bug Report
    Human vs. Machine
    MT: 7.78% error rate
    • Not all terms were localized
    • Agreement issues (masc./fem.; sing./pl.)
    • Verb tenses for Argentina

  27. First Bug Report
    Some terms were changed/localized by the
    engine, but not by the humans.
    (example)
    Human error or MT error?

  28. Testing the Prototype Engine
    A glossary was created by
    extracting the terms localized by the
    linguists/translators.
    This glossary was then sent to
    the same people who localized
    the texts to verify that all the
    terms were correctly localized
    and nothing was missing.

  29. Testing the Prototype Engine
    The glossary grew by 36.91%!
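
The deck gives only the relative growth, not the glossary sizes. Under the assumption that the growth is measured against the size before review, a short calculation shows what share of the final glossary came from the review round. The function name is illustrative.

```python
def share_added_in_review(growth_percent):
    """Share (in %) of the final glossary that was added during review."""
    growth = growth_percent / 100.0
    return growth / (1.0 + growth) * 100.0


print(round(share_added_in_review(36.91), 2))  # 26.96
```

In other words, if that assumption holds, a little over a quarter of the final glossary consisted of terms the linguists had not localized in their first pass.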

  30. Testing the Prototype Engine
    People can miss things.
    Although many different variants of Spanish
    exist, Spanish speakers understand many
    terms that are foreign to their own dialect
    when they read them in context,
    sometimes to the point of accepting them
    as their own. I believe that this may be
    due to the phenomenon of globalization
    and the internet.

  31. Latest Bug Report
    MT: 1.21% error rate
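
To put the two bug reports side by side, the relative improvement can be worked out from the two figures in the deck (7.78% and 1.21%). This assumes both rates were measured the same way on comparable texts, which the slides do not state; the function name is illustrative.

```python
def relative_reduction(before, after):
    """Relative reduction (in %) from an earlier rate to a later rate."""
    return (before - after) / before * 100.0


FIRST_REPORT = 7.78   # error rate in the first bug report, in %
LATEST_REPORT = 1.21  # error rate in the latest bug report, in %

print(round(relative_reduction(FIRST_REPORT, LATEST_REPORT), 1))  # 84.4
```

That is a reduction of roughly 84%, or an error rate about 6.4 times lower than in the first report.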

  32. Achievements
    Very little post-editing needed
    Reduced error rate
    Shortened deadlines
    Significant cost reduction

  33. Conclusions
    Human localization is not perfect.
    MT is not perfect either.
    Combining human and machine translation
    helps achieve high quality and reduce cost.

  34. Further Work
    Improving Glossaries
    Through a simple web interface for post-editing (PE)
    Extending Spanish Language Coverage
    More dialects
    Traductor.cervantes.es
    Incorporating more languages
    English, French and Portuguese

  35. Bibliography
    Yule, G. (2006). The Study of Language (3rd ed.).
    Cambridge University Press, New York.
    RAE
    Instituto Cervantes
    http://www.linguapress.com

  36. THANK YOU FOR
    YOUR TIME!
    María Azqueta
    Diego Bartolomé