Slide 1

Slide 1 text

Evaluation of ranking model
Petrie Wong

Slide 2

Slide 2 text

Who am I
- Machine Learning Engineer for Search and GenAI at LegalOn Technologies
- Our product: Help users review their contracts and identify potential risks
- My job: Help our users retrieve contracts or articles
- Build better ranking models for legal documents

Slide 3

Slide 3 text

Process of building a ranking model
- Ideas (read papers)
- Build a PoC model
- Benchmark x 100
  - On a set of static datasets
- Offline evaluation x 5
  - Submit queries on production dataset
  - Ask domain experts which model is better (baseline model or new model)
- Implementation
- Online evaluation
  - A/B testing
  - Multi-armed bandit, or
  - Reinforcement learning
- …
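The benchmark step is the cheapest one to automate. A minimal sketch, assuming each query in the static dataset comes with graded relevance labels and that NDCG@k is the metric (the slides do not name one). The function names and the label format are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the ranking divided by DCG of the best possible ordering.

    Simplification: the ideal ordering is built only from the labels of the
    returned results, not from every relevant document in the corpus.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def benchmark(ranked_relevances_per_query, k=10):
    """Mean NDCG@k across all queries of one static dataset."""
    scores = [ndcg_at_k(rels, k) for rels in ranked_relevances_per_query]
    return sum(scores) / len(scores)
```

Each inner list holds the relevance labels of one model's results, in ranked order, for one query. Running `benchmark` for the baseline and for the new model on the same dataset gives two numbers to compare.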

Slide 4

Slide 4 text

Cost of evaluation
- Benchmark
  - CI/CD or MLOps
  - Few minutes to 1 hour
- Offline evaluation
  - Web interface or spreadsheet
  - Few hours to days (even a week)
- Online evaluation
  - A/B testing
    - Two weeks to a month
  - Multi-armed bandit, or
    - Few weeks for exploration
  - Reinforcement learning
    - Few weeks for warming up
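To show where the exploration time of a multi-armed bandit goes, here is a minimal sketch that assumes an epsilon-greedy policy and a click as the reward signal. Neither is stated in the slides. `EpsilonGreedyRouter`, `simulate`, and the click rates are hypothetical.

```python
import random

class EpsilonGreedyRouter:
    """Routes each request to one ranking model and learns from click feedback."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in arms}
        self.rewards = {arm: 0.0 for arm in arms}

    def mean_reward(self, arm):
        """Average observed reward for an arm (0.0 before any traffic)."""
        return self.rewards[arm] / self.pulls[arm] if self.pulls[arm] else 0.0

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.pulls, key=self.mean_reward)

    def update(self, arm, reward):
        """Record the reward (1.0 for a click, 0.0 otherwise) for one request."""
        self.pulls[arm] += 1
        self.rewards[arm] += reward

def simulate(click_rates, rounds, epsilon=0.1, seed=0):
    """Simulate traffic against made-up per-model click-through rates."""
    rng = random.Random(seed)
    router = EpsilonGreedyRouter(list(click_rates), epsilon=epsilon, seed=seed)
    for _ in range(rounds):
        arm = router.choose()
        clicked = rng.random() < click_rates[arm]
        router.update(arm, 1.0 if clicked else 0.0)
    return router.pulls
```

With real traffic, every call to `update` needs a real user interaction, which is why the estimates take weeks rather than minutes to settle.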

Slide 6

Slide 6 text

Offline evaluation
- For n queries
  - Top k results of model A
  - Top k results of model B
  - Your colleague:
    - Score of model A: 5
    - Score of model B: 10
- Aggregate and calculate the win rate
  - Model A: 50%
  - Tie: 10%
  - Model B: 40%
- Model B is the new model
  - Go back and change the parameters
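The aggregation step can be sketched as follows, assuming each query yields one pair of scores (model A, model B) from the reviewer. `win_rates` and the sample scores are hypothetical.

```python
from collections import Counter

def win_rates(score_pairs):
    """Return the share of queries won by model A, tied, and won by model B.

    score_pairs: one (score_a, score_b) tuple per query.
    """
    if not score_pairs:
        raise ValueError("need at least one scored query")
    outcomes = Counter(
        "A" if a > b else "B" if b > a else "tie" for a, b in score_pairs
    )
    n = len(score_pairs)
    return {label: outcomes[label] / n for label in ("A", "tie", "B")}

# Ten made-up queries: A wins 5, 1 tie, B wins 4.
example = [(10, 5)] * 5 + [(7, 7)] + [(5, 10)] * 4
print(win_rates(example))
```

The made-up scores reproduce the 50% / 10% / 40% split on the slide.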

Slide 7

Slide 7 text

Solution?

Slide 8

Slide 8 text

Ask your colleague to work harder!

Slide 9

Slide 9 text

import openai

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

LLM evaluation
- Top k results of model A
- Top k results of model B
- LLM:
  - Score of model A: 5
  - Score of model B: 10
- Calculate the win rate
- Next step:
  - Integrate it into your MLOps
  - Repeat the evaluation 100 times with different hyper-parameters
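A minimal sketch of this loop, assuming the LLM is wrapped in a `judge` callable that takes a prompt string and returns the model's reply as text. The prompt wording, the `A=<score> B=<score>` reply format, and all function names are hypothetical, not the speaker's actual setup. In practice `judge` would wrap a chat-completion API call.

```python
import re

PROMPT_TEMPLATE = """You are judging two search result lists for the same query.
Query: {query}

Results from model A:
{results_a}

Results from model B:
{results_b}

Rate each list from 0 to 10 for relevance.
Answer exactly in the form: A=<score> B=<score>"""

def numbered(docs):
    """Format documents as a numbered list, one per line."""
    return "\n".join(f"{i}. {doc}" for i, doc in enumerate(docs, 1))

def build_prompt(query, results_a, results_b):
    return PROMPT_TEMPLATE.format(
        query=query, results_a=numbered(results_a), results_b=numbered(results_b)
    )

def parse_scores(reply):
    """Extract (score_a, score_b) from the judge's reply, or fail loudly."""
    match = re.search(r"A\s*=\s*(\d+(?:\.\d+)?)\s+B\s*=\s*(\d+(?:\.\d+)?)", reply)
    if match is None:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return float(match.group(1)), float(match.group(2))

def llm_evaluate(queries, rank_a, rank_b, judge, k=5):
    """Score both models' top-k results for every query with the LLM judge."""
    pairs = []
    for query in queries:
        prompt = build_prompt(query, rank_a(query)[:k], rank_b(query)[:k])
        pairs.append(parse_scores(judge(prompt)))
    return pairs

def win_rate_b(pairs):
    """Fraction of queries where model B scored strictly higher than model A."""
    return sum(b > a for a, b in pairs) / len(pairs)
```

Because `judge` is just a function, the same code runs in a CI job with a real LLM client or in a unit test with a stub, and it can be called once per hyper-parameter setting.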

Slide 12

Slide 12 text

Pay the bill

Slide 13

Slide 13 text

We're hiring! Join our Team! https://recruit.legalontech.jp/engineer/