2013 GALA Miami conference

The Latin American market is a mix of Spanish dialects. A company that really wants to reach a specific audience in Latin America must use the right dialect. But how can marketing materials be translated into four or five Spanish dialects without dramatically increasing costs? This session discusses how a joint effort to create an MT engine for translating international Spanish into specific Latin American dialects (Spanish for Argentina, Chile, Colombia, Mexico, and Puerto Rico) made this challenge feasible, economical, and replicable.

Transcript

  1. An MT Case Study: Breaking into Latin American Markets on a Small Budget

    María Azqueta (SeproTec) & Diego Bartolomé (tauyou)
  2. Spanish Worldwide

    Spanish Language:
    • Also known as Castellano.
    • Latin-derived Romance language.
    • Spanish is one of the six official languages of the United Nations and an official language of the European Union.
  3. Spanish Worldwide

    [Bar chart: native speakers of Mandarin Chinese, Spanish, English, and Hindi/Urdu; data labels 407 million, 311 million, 955 million, and 360 million]
    Second most spoken language by number of native speakers
  4. Spanish Worldwide

    • For demographic reasons, the percentage of the world’s population that speaks Spanish as a native language is increasing, while the percentage of Chinese and English speakers is decreasing.
    • Within three or four generations, […]% of the world’s population will communicate in Spanish.
    • In […]5[…], the United States will be the world’s foremost Spanish-speaking country.
  5. Spanish on the Internet

    • Spanish is the third most widely used language on the Net.
    • The use of Spanish on the Net grew by 807.4% between 2000 and 2011.
    • Spain and Mexico are among the 20 countries with the highest number of internet users.
    • The demand for documents in Spanish is the fourth largest among the world’s languages.
  6. Spanish Worldwide and its Differences

    High demand for translations into Spanish. But… is the same Spanish spoken everywhere?
  7. Spanish Worldwide and its Differences

    RAE (Royal Spanish Academy):
    – Created in the 18th century, it is widely seen as the arbiter of what is considered standard Spanish.
    – It produces authoritative dictionaries and grammar guides.
    – Although its decisions are not formally binding, they are widely followed in both Spain and Latin America.
  8. Why Adapt to the Local Spanish of Each Country?

    To reach different markets
    People are most likely to buy when a product is advertised in their dialect
  9. Why Adapt to the Local Spanish of Each Country?

    EN: Take a card from the deck
    ES: Coge una carta de la baraja
    Client A (Gaming Industry)
  10. Why Adapt to the Local Spanish of Each Country?

    ES: Coge una carta de la baraja
    AR: Agarrá una carta del mazo
    CL: Toma una carta del naipe
    CO: Coge una carta de la baraja
    MX: Saca una carta de la baraja
    PR: Coge una carta de la baraja
  11. Why Adapt to the Local Spanish of Each Country?

    Coger (32 entries), http://rae.es/rae.html
    Sense 1: tr. Asir, agarrar o tomar (to seize, grab, or take). U. t. c. prnl.
    Sense 31: intr. vulg. Am. Realizar el acto sexual (to perform the sexual act).
  12. Advise Clients

    If you really want to break into a specific market, you must decide which country you want to target and localize your material for the different Spanish dialects spoken in each individual country.
  13. tauyou MT Solution at SeproTec

    • Hybrid machine translation since January 2011
    • Languages: EN, ES, PT, GA, FR, IT…
    • Domains: Legal, Technical…
    • Glossaries and forbidden words lists
    • Average translated words per month: 700,000
  14. Final Scope of the Project

    Human translation + revision: English > Spanish (Spain)
    MT of Spanish (Spain) into Spanish from:
    • Argentina
    • Chile
    • Colombia
    • Mexico
    • Puerto Rico
  15. Initial Approach for Latin American MT

    Traditional Workflow
    1. Gather translation memories (EN → ES-XX)
    2. Add generic material
    3. Develop engine
    4. Add linguistic pre- and post-processing
    5. Improve quality over time
  16. Drawbacks

    • Varying MT quality: depending on the domain and dialect
    • Initial inconsistencies among dialects: handled with glossaries
    • Medium post-editing effort: could be improved over time
  17. New Approach

    • Translate EN to Standard ES: via standard high-quality human translation
    • Convert Standard ES to Latin American Variants: from Spanish to Spanish
    • Better final quality is achieved
  18. Specifications

    • Countries: Argentina, Chile, Colombia, Mexico, Puerto Rico
    • Internal glossaries to handle lexical variations
    • It corrects discordance
    • Idioms
    • Grammatical differences: it adapts verb tenses
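
The glossary-driven Spanish-to-Spanish step described in slides 17 and 18 might look roughly like the sketch below. It is only an illustration under stated assumptions: the `GLOSSARIES` table, the locale codes, and the `localize` function are hypothetical names, and the entries are taken from the card-game example in slide 10 rather than from the project's real glossaries. It covers lexical substitution only; agreement correction and verb-tense adaptation are not attempted.

```python
import re

# Hypothetical per-country glossaries, built from the card-game example in slide 10.
# Keys are standard (Spain) Spanish terms in lowercase; values are local terms.
GLOSSARIES = {
    "es-AR": {"coge": "agarrá", "de la baraja": "del mazo"},
    "es-CL": {"coge": "toma", "de la baraja": "del naipe"},
    "es-CO": {},
    "es-MX": {"coge": "saca"},
    "es-PR": {},
}


def localize(text, locale):
    """Replace standard Spanish terms with their local equivalents."""
    glossary = GLOSSARIES[locale]
    if not glossary:
        return text  # nothing to adapt for this country
    # Longest phrases first, so multi-word entries win over single words.
    phrases = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        source = match.group(0)
        target = glossary[source.lower()]
        # Keep an initial capital, e.g. at the start of a sentence.
        if source[0].isupper():
            target = target[0].upper() + target[1:]
        return target

    return pattern.sub(swap, text)


for code in GLOSSARIES:
    print(code, localize("Coge una carta de la baraja", code))
```

Keeping one small glossary per country means a term that is neutral in one market and unacceptable in another (such as "coger" in slide 11) can be handled without touching the other variants.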
  19. Testing the Prototype Engine

    • Extraction of several texts (fashion, real estate, human resources, automobile)
    • Sent to linguists and/or translators in each target country for localization
    • Performance of the same localizations by the engine
    • Comparison and contrasting of human and machine localization results
  20. First Bug Report

    • Not all terms were localized
    • Concordance issues (masc./fem.; sing./pl.)
    • Verb tenses for Argentina
    Human vs. Machine MT: 7.78% error rate
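
The slides report a 7.78% error rate for machine versus human localization but do not say how it was measured. As one possible reading, the sketch below computes a word-level error rate: the word edit distance between the human reference and the machine output, divided by the reference length. The function name `word_error_rate` and the choice of metric are assumptions made for illustration, not the project's actual method.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    # An empty reference is treated as error-free by convention here.
    return prev[-1] / len(ref) if ref else 0.0


human = "Agarrá una carta del mazo"        # human localization (slide 10, AR)
machine = "Agarrá una carta de la baraja"  # hypothetical engine output
print(f"{word_error_rate(human, machine):.2%}")
```

A metric like this treats every difference as a machine error, so it would also count the cases raised in slide 21, where the engine localized a term that the human reviewer missed.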
  21. First Bug Report

    Some terms were changed/localized by the engine, but not by the humans. (example)
    Human error or MT error?
  22. Testing the Prototype Engine

    A glossary was created by extracting the terms localized by the linguists/translators. This glossary was then sent to the same people who localized the texts to verify that all the terms were correctly localized and nothing was missing.
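
The glossary extraction described in slide 22 could be approximated by diffing each standard Spanish sentence against its human localization and collecting the spans that changed. The sketch below uses Python's standard `difflib` for this. The function name `extract_term_pairs` is hypothetical, and the slides do not state which tool was actually used, so treat this as one plausible approach rather than the project's procedure.

```python
from difflib import SequenceMatcher


def extract_term_pairs(standard, localized):
    """Return (standard term, local term) pairs where the two texts differ."""
    src = standard.split()
    tgt = localized.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" marks a span of source words swapped for target words.
        if tag == "replace":
            pairs.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return pairs


candidates = extract_term_pairs(
    "Coge una carta de la baraja",  # standard Spanish (slide 10, ES)
    "Agarrá una carta del mazo",    # human localization (slide 10, AR)
)
print(candidates)
```

Pairs produced this way are only candidates. As the slide notes, they would still go back to the linguists who localized the texts for verification.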
  23. Testing the Prototype Engine

    People can miss things. Although many different variants of Spanish exist, Spanish speakers understand many terms that are foreign to their own dialect when they read them in context, sometimes to the point of accepting them as their own. I believe that this may be due to the phenomenon of globalization and the internet.
  24. Conclusions

    Human localization is not perfect. MT is not perfect either. Combining human and machine translation helps achieve high quality and reduce cost.
  25. Further Work

    • Improving glossaries: through a simple web interface for PE (post-editing)
    • Extending Spanish language coverage: more dialects; Traductor.cervantes.es
    • Incorporating more languages: English, French and Portuguese
  26. Bibliography

    • Yule, G. (2006). The Study of Language, Third Edition. New York: Cambridge University Press.
    • RAE
    • Instituto Cervantes
    • http://www.linguapress.com