[mercari GEARS 2025] The Journey of User-Generated Content Translation

The Journey of User-Generated Content Translation: How Mercari Crosses Borders (and Uses LLMs)
Aymeric, Engineering Manager, Mercari

Aymeric, Engineering Manager
Aymeric joined Mercari in 2021 and has been leading Mercari's international expansion since 2023. Prior to joining Mercari, he worked in the automotive and defense industries.

What I'll Talk About
• Static content vs. user-generated content
• Understand the product and users
• Classic translation models vs. LLMs
• DeepL
• ChatGPT
• Gemini
• Scaling LLMs
• Non-AI features

Static Content vs. User Generated Content 

User-generated content
• Product title and description
• User profile
• Comments on products
• Buyer/Seller reviews
• Search queries

Understand the Product and Users 

• B2C: Translate once, sell many times
• C2C: Translate once, sell once → Larger impact on cost
• From Japanese to multiple other languages
• When to translate? When listed? When visited? On demand?
• And what happens when the content is updated?
• Some users try to game the search system
• Marketing also has requirements
• We chose a hybrid approach

Demo 

Caching translations
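The talk does not describe how the cache is built. As a minimal sketch, assuming translations are cached per source text and target language, keying on a hash of the text means an edited listing naturally misses the cache and is translated again. `TranslationCache` and `fake_translate` are hypothetical names used only for illustration.

```python
import hashlib
from typing import Callable, Dict, Tuple


class TranslationCache:
    """In-memory stand-in for a translation cache (hypothetical design)."""

    def __init__(self, translate: Callable[[str, str], str]) -> None:
        self._translate = translate
        self._store: Dict[Tuple[str, str], str] = {}
        self.misses = 0  # number of calls forwarded to the translation backend

    @staticmethod
    def _key(text: str, target_lang: str) -> Tuple[str, str]:
        # Hashing the source text means updated content gets a new key.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return digest, target_lang

    def get(self, text: str, target_lang: str) -> str:
        key = self._key(text, target_lang)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._translate(text, target_lang)
        return self._store[key]


def fake_translate(text: str, target_lang: str) -> str:
    # Placeholder for a real translation call (assumption for this sketch).
    return f"[{target_lang}] {text}"
```

In a C2C marketplace each item sells once, so a cache like this mainly saves cost when the same listing is viewed repeatedly before it sells.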

Classic Translation Models vs. LLMs 

• Classic translation services: DeepL / Google Translate
◦ Pay-as-you-go based on input characters
◦ High rate limits
◦ Low latency
◦ Additional features such as glossaries
◦ Consistent results
• Large Language Models (LLMs)
◦ Pay-as-you-go or reserved capacity, based on input / output tokens
◦ Stricter rate limits
◦ Inconsistent latency
◦ No glossary
◦ Inconsistent results

Models 

DeepL 

• We started around the time LLMs became common (~GPT-3)
• High-quality Japanese translation
• Price point similar to LLMs at the time
• High rate limits, low latency, safer choice
• Price point: 100 units
• A/B test results: +5.3% Buyer Conversion Rate

ChatGPT 

• Large language models got cheaper
• Opportunity to learn about LLMs and run them in production
• GPT-3.5 Turbo-0125
◦ Price point: 70 units
• GPT-4o mini
◦ Price point: 10 units
• A/B test results: no impact

The prompt
• Counts against input token cost
• Start simple, improve later

  * Original text will be delimited by ###\
  * Original text is in Japanese\
  * Your task is to translate it to Traditional Chinese
  ### <the product’s title or description>
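The prompt above can be assembled with a small helper. This is a sketch under assumptions: `build_prompt` and `prompt_overhead_chars` are hypothetical names, and the closing `###` delimiter is an assumption since the slide only shows the opening one. The overhead is measured in characters as a rough proxy, because real billing is per token and depends on the model's tokenizer.

```python
PROMPT_HEADER = (
    "* Original text will be delimited by ###\n"
    "* Original text is in Japanese\n"
    "* Your task is to translate it to {target_language}\n"
)


def build_prompt(text: str, target_language: str = "Traditional Chinese") -> str:
    """Wrap the listing text in the instruction header and ### delimiters."""
    header = PROMPT_HEADER.format(target_language=target_language)
    # The closing delimiter is an assumption; the slide shows only the opening one.
    return f"{header}###\n{text}\n###"


def prompt_overhead_chars(target_language: str = "Traditional Chinese") -> int:
    """Characters sent with every request regardless of the listing length."""
    return len(build_prompt("", target_language))
```

Because the instructions are resent with every request, a short title can cost less than the prompt that surrounds it, which is one reason to keep the prompt simple.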


Gemini 

• Motivated by engineering maintenance effort
◦ Mercari mainly uses GCP, not Microsoft Azure
• Same price point
• Gemini 1.5 Flash
• Price point: 1 unit 🎉
• A/B test result: no impact

• Models now get deprecated
• Gemini 1.5 Flash 001 to 002
• No need to A/B test, it's the same model


• Gemini 2.0 Flash Lite
• Latency?
• Price point: 1.2 units
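The price points quoted across the slides can be compared directly. This is a small worked calculation, assuming all of them are expressed in the same relative unit with DeepL as the 100-unit baseline. `savings_vs_baseline` is a hypothetical helper name.

```python
from typing import Dict

# Relative price points as quoted on the slides (units are relative, not currency).
PRICE_POINTS: Dict[str, float] = {
    "DeepL": 100.0,
    "GPT-3.5 Turbo-0125": 70.0,
    "GPT-4o mini": 10.0,
    "Gemini 1.5 Flash": 1.0,
    "Gemini 2.0 Flash Lite": 1.2,
}


def savings_vs_baseline(points: Dict[str, float], baseline: str = "DeepL") -> Dict[str, float]:
    """Percentage cost reduction of each option relative to the baseline."""
    base = points[baseline]
    return {name: round((1 - price / base) * 100, 1) for name, price in points.items()}


if __name__ == "__main__":
    for name, pct in savings_vs_baseline(PRICE_POINTS).items():
        print(f"{name}: {pct}% cheaper than DeepL")
```

Under that assumption, the move from Gemini 1.5 Flash to 2.0 Flash Lite is a small increase in relative cost compared with the drop from the DeepL baseline.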

• Gemini 2.0 Flash Lite output started with: "Here is the translation:"

• Time to change the prompt

  You are a Japanese-to-English translation API.
  1. **Task:** Translate the content of the user's <xb-text> tag.
  2. **Output:** Your entire response MUST be the result, wrapped in <xb-text> tags. Add no other text.
  <content of title or description>
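One way to use the tag-based prompt is to wrap the input in `<xb-text>` and read back only what sits between the tags in the response. The sketch below assumes that extraction step, which the slides do not show. `wrap_input` and `extract_translation` are hypothetical helper names, and no model API is called.

```python
import re
from typing import Optional

SYSTEM_PROMPT = (
    "You are a Japanese-to-English translation API.\n"
    "1. **Task:** Translate the content of the user's <xb-text> tag.\n"
    "2. **Output:** Your entire response MUST be the result, "
    "wrapped in <xb-text> tags. Add no other text."
)

_TAG_RE = re.compile(r"<xb-text>(.*?)</xb-text>", re.DOTALL)


def wrap_input(text: str) -> str:
    """Wrap the listing title or description in the tag named by the prompt."""
    return f"<xb-text>{text}</xb-text>"


def extract_translation(response: str) -> Optional[str]:
    """Return the text inside the first <xb-text> pair, or None if absent."""
    match = _TAG_RE.search(response)
    return match.group(1).strip() if match else None
```

With this shape, a stray preamble such as "Here is the translation:" before the tags is ignored, and a response without tags returns `None` so the caller can retry or fall back.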

Scaling LLMs 

• LLMs had restrictive pay-as-you-go rate limits
• Traffic is uneven
• Model providers offer pre-paid reserved computing resources
• Microsoft Azure Provisioned Throughput Units (PTU)
• Google Generative AI Scale Units (GSU)

Non-AI Features 

Non-AI features: Evaluating user experience
• Collect and review reports, then improve

Non-AI features: Glossary
• Why do you need it? カビゴン → Kabigon or Snorlax?
• It's complicated
• Tokenize
• English: Replace in text, then translate
• Traditional Chinese: Provide keywords in the prompt
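The two glossary strategies above can be sketched as follows. This is an illustration under assumptions, not Mercari's implementation: it uses plain substring matching instead of the tokenization step the slide mentions, and the glossary entries and function names are hypothetical.

```python
from typing import Dict, List

# Illustrative glossary entries (assumptions for this sketch).
GLOSSARY_EN: Dict[str, str] = {"カビゴン": "Snorlax"}
GLOSSARY_ZH_TW: Dict[str, str] = {"カビゴン": "卡比獸"}


def replace_terms(text: str, glossary: Dict[str, str]) -> str:
    """English strategy: substitute glossary terms before translating."""
    # Longest terms first so a short term never clobbers part of a longer one.
    for term in sorted(glossary, key=len, reverse=True):
        text = text.replace(term, glossary[term])
    return text


def glossary_hints(text: str, glossary: Dict[str, str]) -> List[str]:
    """Traditional Chinese strategy: list matching terms to add to the prompt."""
    return [f"{term} → {target}" for term, target in glossary.items() if term in text]
```

For example, `replace_terms("カビゴン ぬいぐるみ", GLOSSARY_EN)` leaves the model to translate only the remaining Japanese around "Snorlax", while `glossary_hints` returns lines that could be appended to the prompt as required keywords.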

Takeaways and Future Work 

• Takeaways
◦ Start from the user experience
◦ Start simple, iterate, and A/B test
◦ Newer models don't impact business metrics
◦ Monitor new models and expiry dates

• Future work
◦ Translate more user-generated content
◦ Improve quality of translations
◦ Reduce latency
◦ Improve search in different languages

Thank You!
Credits for the work to Amit Raj Baral and Christophe Labonne.
Read more details in this article
