Upgrade to Pro — share decks privately, control downloads, hide ads and more …

[mercari GEARS 2025] Leveraging LLMs in Mercari...

Avatar for mercari mercari PRO
November 14, 2025

[mercari GEARS 2025] Leveraging LLMs in Mercari Hallo

Avatar for mercari

mercari PRO

November 14, 2025
Tweet

More Decks by mercari

Other Decks in Technology

Transcript

  1. arr0w
 
 
 Souzoh → Gen AI Team → Mercari

    Hallo
 2024 New Grad
 Machine Learning Engineer

  2. 4

  3. LLM Product Implementation in Mercari Hallo
 • The on-demand work

    domain is rich in structured/unstructured text and image data → Highly compatible with LLM
 
 • Examples of text and image data in Mercari Hallo:
 ◦ Job posting descriptions created by partners
 ◦ Sales materials and business meeting data for partners
 
 • LLM applications in Mercari Hallo:
 ◦ Easy job posting creation
 ◦ Job posting risk prediction 
 ◦ Sales Productivity Improvement

  4. • Background
 ◦ Market conditions and service usage evaluation 


    
 • What We Accomplished
 ◦ Validated AI-Native approach to on-demand work
 
 • Technical expertise and infrastructure will power future AI initiatives across Mercari Group
 Service Closing: December 18, 2025

  5. Easy Job Posting Creation (かんたん求人作成 )
 
 株式会社メルカリ Appeal Point

    
 Partner company without knowledge of job posting creation can create high-quality job posting easily.
 • Automatically generate job details simply by selecting the required fields (job category, role, benefits…)
 Partner simply selects items and inputs appeal points and job duties in free text style.

  6. • Background
 ◦ Enable all businesses to consistently create attractive

    job postings that facilitate matching and capture interest
 ◦ Reduce job posting creation costs for partners
 ◦ Limitations of template (low diversity, etc.)
 
 • Easy Job Posting Creation (かんたん求人作成 )
 ◦ Just answering selective questions (benefits, category, etc.,)
 → LLM generates original job posting content in seconds. 
 ◦ This enables easy creation of job postings personalized to each partner company.
 Easy Job Posting Creation (かんたん求人作成 )

  7. • System Architecture
 ◦ Calling OpenAI API in Real-time from

    Go Backend
 
 
 
 
 
 
 • Prompt Creation
 ◦ Leveraged few-shot prompting to create prompts that consistently generate high-quality job postings
 Implementation
 GraphQL Server (Go) Cloud SQL Partner Company Retrieve job information such as business name, work location, etc. Generate job posting draft using LLM based on obtained information
  8. • Main motivations for Easy Job Posting Creation:
 Enable creation

    of attractive "better job postings"
 → Defined job posting quality in 3 levels through interviews with sales staff and analysis of past job postings
 
 • Criteria for defining job posting quality (3 levels):
 ◦ Whether information about job content, workplace atmosphere, and expected candidate profile is included
 ◦ Whether information about benefits (dress code, meals ,etc) 
 ◦ Considerations for readability such as emojis, section titles, and other formatting elements, etc.
 Easy Job Posting Creation

  9. • LLM as a Judge [1] 
 ◦ Evaluate job

    posting quality in 3 levels using LLM as a Judge
 ◦ Prompt for “LLM as a Judge”: 
 Modified the original paper's prompt for job postings eval
 • Quality Comparison: Easy Job Posting vs Manual Creation
 
 
 
 • Limitation: 
 ◦ Validates quality consistency based on internal standards, not matching rates. 
 Experiments on Job Quality Evaluation by “LLM as a Judge”
 
 Easy Job Positing
 Manual Creation
 Score Average (100 cases)
 2.34
 1.32

  10. Job Posting Review at Mercari Hallo
 • "Safety & Security"

    (あんしん・あんぜん ) commitment: 
 Only legally compliant job postings published to ensure crew’s peace of mind.
 
 • Strict review criteria:
 ◦ Is job content appropriate? Any inappropriate expressions?
 ◦ All postings undergo rigorous review
 
 • Dual-check system: 
 Human + LLM review for faster, more accurate screening.

  11. Why LLMs Excel at Risk Prediction
 • Strong contextual understanding:


    ◦ High performance in various NLP tasks like text classification.
 
 • High explainability:
 ◦ Can output reasons for high risk in natural language.
 
 • Rapid adaptation to new risks
 ◦ No labeled training data required
 ◦ Simply update prompts—fast & flexible

  12. Job Posting Risk Prediction Using LLMs
 • Many tech companies

    have already adopted LLM-based risk prediction:
 ◦ Spotify: Music content fraud detection
 ◦ Google: Ad content fraud detection
 
 • Mercari Hallo's approach:
 ◦ Real-time risk prediction at job posting creation
 ◦ Tech Stack: GKE, Cloud Pub/Sub, LLM API 
 ◦ High-risk items flagged for human review first 
 → Faster, more efficient screening

  13. • The balancing act:
 ◦ False Negatives (misses):
 Major business

    risk → Recall is the top priority KPI
 ◦ False Positives (false alarms): 
 Poor user experience → Precision must be maintained
 
 • Key point: 
 Prompt quality management is critical for effective risk prediction
 Challenges in Risk Prediction Using LLMs

  14. • Multiple LLM features live in production at Mercari Hallo


    → Managing 50+ types of prompts!
 
 • Prompt quality = Direct business impact 😱
 → Quality management & continuous improvement system
 
 • Solution:
 ◦ Prompt quality monitoring system
 ◦ Automated evaluation infrastructure
 → Ensuring consistent prompt quality
 PromptOps at Mercari Hallo

  15. Automated Prompt Evaluation Infrastructure
 • Tech Stack: 
 ◦ Cloud

    Composer, LiteLLM, BigQuery … etc
 • PdMs & ML Engineers can easily validate prompt quality by inputting prompts through the Airflow UI without writing code
 

  16. Automated Prompt Evaluation Infrastructure
 • Prompt Evaluation Job (Python) executes:


    Loads evaluation dataset from GCS & Runs prompt evaluation
 • Eval Job uses LiteLLM to enable easy validation of multiple LLMs.

  17. • Upon completion: 
 ◦ Evaluation results → Slack notification


    ◦ Input/output for each case → Stored in BigQuery
 Automated Prompt Evaluation Infrastructure
 

  18. Continuous Monitoring & Improvement Cycle
 • Continuous monitoring:
 ◦ Looker

    Studio dashboards to track key KPIs
 ◦ Weekly cross-functional reviews (PdM/Engineer/Biz)
 
 • PDCA cycle for improvement:
 ◦ Identify degraded prompts via dashboard
 → Tune using automated evaluation infrastructure
 
 • Cost management:
 ◦ SRE-led LLM API cost monitoring

  19. • Mercari Hallo (Mar 2024 - Dec 2025) 
 ◦

    Served 12M+ users with LLM-powered features
 ◦ Easy Job Posting Creation – Quality postings for everyone
 ◦ Job Posting Risk Prediction – Safety & security
 
 • Key Learnings from Building an AI-Native Product
 → PromptOps infrastructure is critical for AI-Native products
 
 • These experiences and infrastructure live on
 → Contributing to AI adoption across Mercari Group
 Conclusion