Information Retrieval Performance Measurement Using Extrapolated Precision (DataPhilly)

Clustify
October 26, 2016

Performance measures like the F1-score make strong assumptions about the trade-off between recall and precision that are not a good fit for some contexts like e-discovery. This presentation advocates comparing performance at different recall levels, when necessary, by using a novel method for extrapolating a single precision-recall point to a different level of recall. With this approach, the constant-performance contours are a parameterized family of reference precision-recall curves.
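
To make the extrapolation idea concrete, the sketch below fits a member of a one-parameter family of reference precision-recall curves through an observed (recall, precision) point and then evaluates that curve at the target recall. The curve family used here, P(R; c) = c / (c + (1 - c)R), is a hypothetical stand-in chosen only for illustration; the talk's actual model curves are not reproduced in this description and may differ.

```python
def extrapolate_precision(recall, precision, target_recall):
    """Extrapolate an observed (recall, precision) point to a target recall.

    Uses a hypothetical one-parameter family of reference curves,
        P(R; c) = c / (c + (1 - c) * R),
    which is decreasing in R and increasing in c, so each observed point
    lies on exactly one curve (a constant-performance contour).
    """
    if not (0 < recall <= 1 and 0 < precision <= 1 and 0 < target_recall <= 1):
        raise ValueError("recall, precision, and target_recall must be in (0, 1]")
    # Solve P = c / (c + (1 - c) * R) for the curve parameter c.
    c = precision * recall / (1 - precision + precision * recall)
    # Evaluate the fitted reference curve at the target recall.
    return c / (c + (1 - c) * target_recall)


# Example: observed 80% precision at 50% recall, extrapolated to 75% recall
# (prints roughly 0.73 under this illustrative curve family).
print(extrapolate_precision(recall=0.5, precision=0.8, target_recall=0.75))
```

Because the family is monotonically decreasing in recall and ordered by the single parameter c, each curve can serve as a constant-performance contour in the sense described above; any family with those properties would be used the same way.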


Transcript

  1. Information Retrieval Performance Measurement Using Extrapolated Precision, or How to Extrapolate From One Data Point (Bill Dimm, DataPhilly, October 25, 2016)
  2. Precision's Relationship with Cost
     • Precision is meaningful – it is inversely proportional to the number of documents that must be reviewed to reach recall R: n = ρNR/P, where ρ is the prevalence of relevant documents and N is the collection size (see the cost example after the transcript)
  3. Extrapolation Limitations
     • P < 0.99
     • R < 0.99
     • P ≥ 2ρ / (1 + ρ + R(1 - ρ)) (see the validity check after the transcript)
  4. Summary
     • Proportionality dictates recall – need a performance measure that is less sensitive to recall
     • Extrapolate the precision-recall point to the target recall level using model curves
     • The model precision-recall curves are constant-performance contours
     • When close to the target recall, the performance measure is inversely proportional to review cost
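
As a numerical illustration of slide 2's cost relationship: reaching recall R in a collection of N documents with prevalence ρ means finding ρNR relevant documents, so at precision P the review set is n = ρNR/P documents. The collection size, prevalence, and recall target below are made-up example values.

```python
def docs_to_review(prevalence, collection_size, recall, precision):
    """Number of documents reviewed to reach the recall target: n = rho * N * R / P."""
    return prevalence * collection_size * recall / precision


# Hypothetical numbers: 1,000,000 docs, 1% prevalence, 75% recall target.
# Halving precision doubles the review cost, which is why precision matters.
for p in (0.8, 0.4):
    n = docs_to_review(prevalence=0.01, collection_size=1_000_000,
                       recall=0.75, precision=p)
    print(f"precision={p:.1f} -> review {n:,.0f} docs")
```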
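The conditions on slide 3 translate directly into a small validity check; ρ is again the prevalence of relevant documents, and the example inputs are hypothetical.

```python
def extrapolation_valid(precision, recall, prevalence):
    """Check the limitations from slide 3:
    P < 0.99, R < 0.99, and P >= 2*rho / (1 + rho + R*(1 - rho))."""
    lower_bound = 2 * prevalence / (1 + prevalence + recall * (1 - prevalence))
    return precision < 0.99 and recall < 0.99 and precision >= lower_bound


# Examples with hypothetical values at 1% prevalence.
print(extrapolation_valid(precision=0.8, recall=0.5, prevalence=0.01))   # True
print(extrapolation_valid(precision=0.01, recall=0.5, prevalence=0.01))  # False: precision below the lower bound
```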