Slide 21
Compare: evaluate multiple Runs side by side
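from ranx import Qrels, Run, compare

# Relevance judgments (ground truth) in TREC format
qrels = Qrels.from_file("qrels.trec", kind="trec")
# One Run per embedding model under comparison
run_1 = Run.from_file("run_openai-text-embedding-ada-002.trec", kind="trec")
run_2 = Run.from_file("run_openai-text-embedding-3-large.trec", kind="trec")
# Score every run on the same metrics; differences are flagged as significant when p < max_p
report = compare(qrels, runs=[run_1, run_2], metrics=["hit_rate@5", "mrr"], max_p=0.01)
report  # a notebook renders the table below; in a script, use print(report)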
#    Model                          Hit Rate@5   MRR
---  -----------------------------  ----------   ------
a    openai-text-embedding-ada-002  0.850        0.734
b    openai-text-embedding-3-large  0.906ᵃ       0.814ᵃ

ᵃ = statistically significant improvement over run a (p < 0.01, the max_p threshold)
Pass multiple Runs to compare() as a list (runs=[run_1, run_2])
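To make the columns and the ᵃ marker concrete, here is a minimal, dependency-free sketch of what such a comparison computes: per-query Hit Rate@5 and reciprocal rank averaged over the judged queries, plus a paired randomization test on the per-query reciprocal ranks. The toy qrels and runs, and the helper names hit_at_k, reciprocal_rank, per_query and paired_randomization_test, are all hypothetical. This is not ranx's implementation, and ranx's own significance tests may differ in detail.

```python
import random
from statistics import mean

# Hypothetical toy qrels: query id -> set of relevant doc ids
qrels = {"q1": {"d1"}, "q2": {"d3"}, "q3": {"d2"}}

# Hypothetical toy runs: query id -> doc ids ranked best-first
run_a = {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2", "d4"], "q3": ["d3", "d4", "d2"]}
run_b = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d1", "d2"], "q3": ["d2", "d3", "d4"]}


def hit_at_k(ranking, relevant, k):
    """1.0 if any relevant doc is in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranking[:k]) else 0.0


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant doc; 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def per_query(run, metric):
    """Score every judged query in a fixed order; missing queries get an empty ranking."""
    return [metric(run.get(q, []), rel) for q, rel in sorted(qrels.items())]


def paired_randomization_test(x, y, n_iter=10_000, seed=0):
    """Two-sided p-value: how often random per-query swaps of the two systems'
    scores give a mean difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(y) - mean(x))
    extreme = 0
    for _ in range(n_iter):
        # Under the null hypothesis the two systems are exchangeable on each query
        diff = sum((b - a) if rng.random() < 0.5 else (a - b) for a, b in zip(x, y))
        # Small tolerance so float rounding does not drop ties with the observed value
        if abs(diff / len(x)) >= observed - 1e-12:
            extreme += 1
    # Counting the observed assignment itself (+1) keeps the estimate above zero
    return (extreme + 1) / (n_iter + 1)


hits_a = per_query(run_a, lambda r, rel: hit_at_k(r, rel, 5))
hits_b = per_query(run_b, lambda r, rel: hit_at_k(r, rel, 5))
rr_a = per_query(run_a, reciprocal_rank)
rr_b = per_query(run_b, reciprocal_rank)
p = paired_randomization_test(rr_a, rr_b)

print(f"Hit Rate@5  a={mean(hits_a):.3f}  b={mean(hits_b):.3f}")
print(f"MRR         a={mean(rr_a):.3f}  b={mean(rr_b):.3f}")
print(f"MRR b vs a: p={p:.3f}, significant at max_p=0.01: {p < 0.01}")
```

With these toy numbers run b wins on both metrics (Hit Rate@5 1.000 vs 0.667, MRR 1.000 vs 0.278). With only three queries, however, just 2 of the 8 possible swap patterns are as extreme as the observed one, so p lands near 0.25 and no superscript would be earned at max_p=0.01. A marker like ᵃ therefore depends on the number of evaluated queries as well as on the size of the gain.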