Auto-evaluation of ranking model by LLM

Discover a Python framework for automating offline evaluation of search engine rankings with an LLM. By enabling search engineers and data scientists to run offline evaluations rapidly, the method reduces human bias, shortens evaluation cycles, and lets ranking algorithms be tuned quickly, leading to better search performance and more relevant results for users.

LegalOn Technologies, Inc.

October 30, 2023

Transcript

  1. Who am I - Machine Learning Engineer for Search and GenAI at LegalOn Technologies - Our product: help users review their contracts and identify potential risks - My job: help our users retrieve contracts or articles - Build better ranking models for legal documents
  2. Process of building a ranking model - Ideas (read papers) - Build a PoC model - Benchmark x 100 - On a set of static datasets - Offline evaluation x 5 - Submit queries on the production dataset - Ask domain experts which model is better (baseline model or new model) - Implementation - Online evaluation - A/B testing - Multi-armed bandit, or - Reinforcement learning - …
  3. Cost of evaluation - Benchmark - CI/CD or MLOps - A few minutes to 1 hour - Offline evaluation - Web interface or spreadsheet - A few hours to days (even a week) - Online evaluation - A/B testing - Two weeks to a month - Multi-armed bandit - A few weeks for exploration - Reinforcement learning - A few weeks for warming up
  4. Offline evaluation - For n queries - Top k results of model A - Top k results of model B - Your colleague: - Score of model A: 5 - Score of model B: 10 - Aggregate and calculate the win rate - Model A: 50% - Tie: 10% - Model B: 40% - Model B is the new model - Go back and change the parameters
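
A minimal sketch in Python of the aggregation step above, assuming per-query scores for both models have already been collected (from a colleague or, later, from an LLM); the name win_rate and the example scores are illustrative:

    from collections import Counter

    def win_rate(scores_a, scores_b):
        """Turn per-query scores for two models into win/tie rates."""
        assert len(scores_a) == len(scores_b), "one score per query for each model"
        outcomes = Counter()
        for a, b in zip(scores_a, scores_b):
            if a > b:
                outcomes["model_a"] += 1
            elif b > a:
                outcomes["model_b"] += 1
            else:
                outcomes["tie"] += 1
        n = len(scores_a)
        return {key: outcomes[key] / n for key in ("model_a", "tie", "model_b")}

    # Ten queries scored by a reviewer; yields model A 50%, tie 10%, model B 40%
    print(win_rate([5, 3, 4, 5, 2, 4, 5, 3, 4, 4],
                   [4, 4, 3, 4, 2, 5, 4, 4, 3, 5]))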
  5. LLM evaluation - Top k results of model A - Top k results of model B - LLM: - Score of model A: 5 - Score of model B: 10 - Calculate the win rate - Next step: - Integrate it into your MLOps pipeline - Repeat the evaluation 100 times with different hyper-parameters
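
To make the LLM-as-judge step concrete, here is a rough sketch using the OpenAI chat completions API. The prompt wording, the gpt-4o-mini model name, and the judge helper are assumptions for illustration, not the deck's actual framework:

    import json
    from openai import OpenAI  # official OpenAI Python SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Hypothetical judging prompt; tune the rubric for your domain
    PROMPT = (
        "You are evaluating a legal document search engine.\n"
        "Query: {query}\n\n"
        "Top results from model A:\n{results_a}\n\n"
        "Top results from model B:\n{results_b}\n\n"
        "Rate the relevance of each model's results on a 0-10 scale.\n"
        'Respond with JSON only: {{"score_a": <int>, "score_b": <int>}}'
    )

    def judge(query, results_a, results_b):
        """Ask the LLM to score both models' top-k results for one query."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed judge model; any capable chat model works
            temperature=0,        # deterministic judging reduces noise across runs
            messages=[{
                "role": "user",
                "content": PROMPT.format(
                    query=query,
                    results_a="\n".join(results_a),
                    results_b="\n".join(results_b),
                ),
            }],
        )
        scores = json.loads(response.choices[0].message.content)
        return scores["score_a"], scores["score_b"]

Feeding these per-query scores into the win_rate aggregation above closes the loop: the whole comparison runs unattended, so it can live in a CI/CD or MLOps pipeline and be repeated across hyper-parameter sweeps.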