Slide 25
© ideas engineering 2023
1. Classify user query types
2. Manually write a few hundred user prompts with their expected correct answers → test set
3. Classify response types
4. Use the existing system to answer the test set
5. Use judge LLM(s) to compare generated answers against expected answers
6. Get a score
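The steps above can be sketched as a small evaluation loop. This is a minimal illustration under stated assumptions, not the actual HeyBild implementation: the names TestCase, evaluate, exact_match_judge and toy_system are hypothetical, the judge is a string-match placeholder standing in for a real judge LLM call, and the score is assumed to be plain accuracy (the slide does not say which metric is used). Step 3 (classifying response types) is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TestCase:
    query_type: str  # step 1: category assigned to the user query
    prompt: str      # step 2: user prompt
    expected: str    # step 2: expected correct answer


def exact_match_judge(generated: str, expected: str) -> bool:
    """Placeholder judge (assumption): case-insensitive exact match.

    A real judge would prompt an LLM to decide whether the two answers agree.
    """
    return generated.strip().lower() == expected.strip().lower()


def evaluate(
    test_set: List[TestCase],
    answer_fn: Callable[[str], str],
    judge_fn: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Run the system on the test set and return accuracy per query type."""
    if not test_set:
        raise ValueError("test set is empty")

    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for case in test_set:
        generated = answer_fn(case.prompt)        # step 4: existing system answers
        ok = judge_fn(generated, case.expected)   # step 5: judge compares answers
        total[case.query_type] += 1
        passed[case.query_type] += int(ok)

    # step 6: aggregate verdicts into scores (accuracy is an assumption)
    scores = {qt: passed[qt] / total[qt] for qt in total}
    scores["overall"] = sum(passed.values()) / sum(total.values())
    return scores


# Toy usage with a canned "system" in place of the real one.
test_set = [
    TestCase("factual", "What is 2 + 2?", "4"),
    TestCase("factual", "Capital of France?", "Paris"),
    TestCase("summary", "Summarise: the sky is blue.", "The sky is blue."),
]
canned_answers = {
    "What is 2 + 2?": "4",
    "Capital of France?": "Berlin",  # deliberately wrong
    "Summarise: the sky is blue.": "the sky is blue.",
}


def toy_system(prompt: str) -> str:
    return canned_answers[prompt]


scores = evaluate(test_set, toy_system, exact_match_judge)
print(scores)
```

Reporting a score per query type alongside the overall score makes it easier to see which kinds of queries the system handles poorly, which is one reason to classify queries before building the test set.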
HeyBild eval
Ideas Review