Slide 33
LLMs Are Flawed.
Evaluation Matters.
No One Evaluates LLMs Well.
Evaluating with Ground Truth is Hard.
Evaluating with Annotators is Hard.
Evaluating with LLMs is Hard.
Evaluating with User Preferences is Hard.
All We Can Do is Our Best.
yo dawg i heard yu like LLMs & eval, so i put eval LLMs in yr LLM eval