[mercari GEARS 2025] Leveraging LLMs in Mercari Hallo

Leveraging LLMs in   Mercari Hallo  arr0w  Machine Learning 

arr0w      Souzoh → Gen AI Team → Mercari
Hallo  2024 New Grad  Machine Learning Engineer 

About Mercari Hallo  (March 2024 - December 2025) 

Concept of Mercari Hallo  だれでも、すぐに、かんたんに Partner Crew

12M+ Users in 15 Months Since Launch 

LLM Product Implementation in Mercari Hallo  • The on-demand work
domain is rich in structured/unstructured text and image data → Highly compatible with LLM    • Examples of text and image data in Mercari Hallo:  ◦ Job posting descriptions created by partners  ◦ Sales materials and business meeting data for partners    • LLM applications in Mercari Hallo:  ◦ Easy job posting creation  ◦ Job posting risk prediction   ◦ Sales Productivity Improvement 

• Background  ◦ Market conditions and service usage evaluation  
  • What We Accomplished  ◦ Validated AI-Native approach to on-demand work    • Technical expertise and infrastructure will power future AI initiatives across Mercari Group  Service Closing: December 18, 2025 

Easy Job Posting Creation 

Easy Job Posting Creation (かんたん求人作成 )    株式会社メルカリ Appeal Point
  Partner company without knowledge of job posting creation can create high-quality job posting easily.  • Automatically generate job details simply by selecting the required ﬁelds (job category, role, beneﬁts…)  Partner simply selects items and inputs appeal points and job duties in free text style. 

• Background  ◦ Enable all businesses to consistently create attractive
job postings that facilitate matching and capture interest  ◦ Reduce job posting creation costs for partners  ◦ Limitations of template (low diversity, etc.)    • Easy Job Posting Creation (かんたん求人作成 )  ◦ Just answering selective questions (beneﬁts, category, etc.,)  → LLM generates original job posting content in seconds.   ◦ This enables easy creation of job postings personalized to each partner company.  Easy Job Posting Creation (かんたん求人作成 ) 

• System Architecture  ◦ Calling OpenAI API in Real-time from
Go Backend              • Prompt Creation  ◦ Leveraged few-shot prompting to create prompts that consistently generate high-quality job postings  Implementation  GraphQL Server (Go) Cloud SQL Partner Company Retrieve job information such as business name, work location, etc. Generate job posting draft using LLM based on obtained information

• Main motivations for Easy Job Posting Creation:  Enable creation
of attractive "better job postings"  → Defined job posting quality in 3 levels through interviews with sales staff and analysis of past job postings    • Criteria for defining job posting quality (3 levels):  ◦ Whether information about job content, workplace atmosphere, and expected candidate profile is included  ◦ Whether information about benefits (dress code, meals ,etc)   ◦ Considerations for readability such as emojis, section titles, and other formatting elements, etc.  Easy Job Posting Creation 

• LLM as a Judge [1]   ◦ Evaluate job
posting quality in 3 levels using LLM as a Judge  ◦ Prompt for “LLM as a Judge”:   Modiﬁed the original paper's prompt for job postings eval  • Quality Comparison: Easy Job Posting vs Manual Creation        • Limitation:   ◦ Validates quality consistency based on internal standards, not matching rates.   Experiments on Job Quality Evaluation by “LLM as a Judge”    Easy Job Positing  Manual Creation  Score Average (100 cases)  2.34  1.32 

Job posting risk prediction 

Job Posting Review at Mercari Hallo  • "Safety & Security"
(あんしん・あんぜん ) commitment:   Only legally compliant job postings published to ensure crew’s peace of mind.    • Strict review criteria:  ◦ Is job content appropriate? Any inappropriate expressions?  ◦ All postings undergo rigorous review    • Dual-check system:   Human + LLM review for faster, more accurate screening. 

Why LLMs Excel at Risk Prediction  • Strong contextual understanding: 
◦ High performance in various NLP tasks like text classiﬁcation.    • High explainability:  ◦ Can output reasons for high risk in natural language.    • Rapid adaptation to new risks  ◦ No labeled training data required  ◦ Simply update prompts—fast & ﬂexible 

Job Posting Risk Prediction Using LLMs  • Many tech companies
have already adopted LLM-based risk prediction:  ◦ Spotify: Music content fraud detection  ◦ Google: Ad content fraud detection    • Mercari Hallo's approach:  ◦ Real-time risk prediction at job posting creation  ◦ Tech Stack: GKE, Cloud Pub/Sub, LLM API   ◦ High-risk items flagged for human review first   → Faster, more efficient screening 

• The balancing act:  ◦ False Negatives (misses):  Major business
risk → Recall is the top priority KPI  ◦ False Positives (false alarms):   Poor user experience → Precision must be maintained    • Key point:   Prompt quality management is critical for eﬀective risk prediction  Challenges in Risk Prediction Using LLMs 

PromptOps Strategy Powering LLM Features 

• Multiple LLM features live in production at Mercari Hallo 
→ Managing 50+ types of prompts!    • Prompt quality = Direct business impact 😱  → Quality management & continuous improvement system    • Solution:  ◦ Prompt quality monitoring system  ◦ Automated evaluation infrastructure  → Ensuring consistent prompt quality  PromptOps at Mercari Hallo 

Automated Prompt Evaluation Infrastructure  • Tech Stack:   ◦ Cloud
Composer, LiteLLM, BigQuery … etc  • PdMs & ML Engineers can easily validate prompt quality by inputting prompts through the Airﬂow UI without writing code   

Automated Prompt Evaluation Infrastructure    • Submit prompts for validation
through Cloud Composer (Apache Airﬂow)  

Automated Prompt Evaluation Infrastructure  • Prompt Evaluation Job (Python) executes: 
Loads evaluation dataset from GCS & Runs prompt evaluation  • Eval Job uses LiteLLM to enable easy validation of multiple LLMs. 

• Upon completion:   ◦ Evaluation results → Slack notiﬁcation 
◦ Input/output for each case → Stored in BigQuery  Automated Prompt Evaluation Infrastructure   

Automated Prompt Evaluation Infrastructure    • Datasets are regularly updated
using Cloud Composer as well. 

Continuous Monitoring & Improvement Cycle  • Continuous monitoring:  ◦ Looker
Studio dashboards to track key KPIs  ◦ Weekly cross-functional reviews (PdM/Engineer/Biz)    • PDCA cycle for improvement:  ◦ Identify degraded prompts via dashboard  → Tune using automated evaluation infrastructure    • Cost management:  ◦ SRE-led LLM API cost monitoring 

Conclusion 

• Mercari Hallo (Mar 2024 - Dec 2025)   ◦
Served 12M+ users with LLM-powered features  ◦ Easy Job Posting Creation – Quality postings for everyone  ◦ Job Posting Risk Prediction – Safety & security    • Key Learnings from Building an AI-Native Product  → PromptOps infrastructure is critical for AI-Native products    • These experiences and infrastructure live on  → Contributing to AI adoption across Mercari Group  Conclusion 

Thank You! 

[mercari GEARS 2025] Leveraging LLMs in Mercari...

[mercari GEARS 2025] Leveraging LLMs in Mercari Hallo

mercari PRO

More Decks by mercari

Other Decks in Technology

Featured

Transcript

Leveraging LLMs in   Mercari Hallo  arr0w  Machine Learning

arr0w      Souzoh → Gen AI Team → Mercari

About Mercari Hallo  (March 2024 - December 2025)

4

Concept of Mercari Hallo  だれでも、すぐに、かんたんに Partner Crew

12M+ Users in 15 Months Since Launch

LLM Product Implementation in Mercari Hallo  • The on-demand work

• Background  ◦ Market conditions and service usage evaluation

Easy Job Posting Creation

Easy Job Posting Creation (かんたん求人作成 )    株式会社メルカリ Appeal Point

• Background  ◦ Enable all businesses to consistently create attractive

• System Architecture  ◦ Calling OpenAI API in Real-time from

• Main motivations for Easy Job Posting Creation:  Enable creation

• LLM as a Judge [1]   ◦ Evaluate job

Job posting risk prediction

Job Posting Review at Mercari Hallo  • "Safety & Security"

Why LLMs Excel at Risk Prediction  • Strong contextual understanding:

Job Posting Risk Prediction Using LLMs  • Many tech companies

• The balancing act:  ◦ False Negatives (misses):  Major business

PromptOps Strategy Powering LLM Features

• Multiple LLM features live in production at Mercari Hallo

Automated Prompt Evaluation Infrastructure  • Tech Stack:   ◦ Cloud

Automated Prompt Evaluation Infrastructure    • Submit prompts for validation

Automated Prompt Evaluation Infrastructure  • Prompt Evaluation Job (Python) executes:

• Upon completion:   ◦ Evaluation results → Slack notiﬁcation

Automated Prompt Evaluation Infrastructure    • Datasets are regularly updated

Continuous Monitoring & Improvement Cycle  • Continuous monitoring:  ◦ Looker

Conclusion

• Mercari Hallo (Mar 2024 - Dec 2025)   ◦

Thank You!