
03 LL4IR Architecture

LiLa'16
March 20, 2016

Transcript

  1. LL4IR Architecture. Anne Schuth (Blendle / University of Amsterdam, The Netherlands), Krisztian Balog (University of Stavanger, Norway). Tutorial at ECIR 2016 in Padua, Italy.
  2. Overview
     • Overall goal: make information retrieval evaluation more realistic
     • Evaluate retrieval methods in a live setting, with real users in their natural task environments
     • Focus: medium- to large-sized organizations with a fair amount of search volume
     • Such organizations typically lack their own R&D department, but would gain much from improved retrieval approaches
     • Or they would like to collaborate with academic researchers
  3. Key idea
     • Focus on frequent (head) queries
     • Enough traffic on them (both real-time and historical)
     • Ranked result lists can be generated offline
     • An API orchestrates all data exchange between live sites and experimental systems
     • Head First: Living Labs for Ad-hoc Search Evaluation. Balog et al. CIKM'14.
  4. Methodology
     • Queries, candidate documents, and historical search and click data are made available
     • Rankings are generated for each query and uploaded through an API (a sketch of this workflow follows below)
     • When any of the test queries is fired, the live site requests rankings from the API and interleaves them with those of the production system
     • Participants get detailed feedback on user interactions (clicks)
     • The ultimate measure is the number of "wins" against the production system
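A minimal sketch, in Python with the requests library, of the participant side of this workflow: fetch the queries and candidate documents, upload a ranking per query, and later read the click feedback. The base URL, endpoint paths, participant key, and JSON field names are illustrative assumptions; the actual interface is described in the API documentation linked on the Resources slide.

```python
import requests

# Assumed values for illustration; the real base URL and key come from the lab organizers.
API = "http://api.example-living-labs.org/api"
KEY = "MY-PARTICIPANT-KEY"

def get_json(path):
    """GET a JSON resource from the (assumed) Living Labs style API."""
    r = requests.get(f"{API}/{path}", timeout=30)
    r.raise_for_status()
    return r.json()

def upload_run(qid, ranked_docids):
    """Upload a ranking for one query (endpoint path and payload shape are assumptions)."""
    payload = {"qid": qid, "doclist": [{"docid": d} for d in ranked_docids]}
    r = requests.put(f"{API}/participant/run/{KEY}/{qid}", json=payload, timeout=30)
    r.raise_for_status()

def my_ranker(docids):
    """Placeholder for the experimental ranker; here it simply keeps the given order."""
    return list(docids)

# 1. Fetch the frequent (head) queries and their candidate documents.
queries = get_json(f"participant/query/{KEY}")["queries"]
for q in queries:
    doclist = get_json(f"participant/doclist/{KEY}/{q['qid']}")["doclist"]
    candidates = [d["docid"] for d in doclist]

    # 2. Rank the candidates offline and upload the ranking through the API.
    upload_run(q["qid"], my_ranker(candidates))

# 3. Later, collect click feedback per query (aggregated for test queries).
for q in queries:
    feedback = get_json(f"participant/feedback/{KEY}/{q['qid']}")
    print(q["qid"], feedback)
```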
  5. Methodology (architecture diagram): participants (researchers) and sites communicate only through the Living Labs API. Sites provide their queries and candidate documents and, for each incoming query, request a ranking and report the resulting clicks; participants retrieve the queries and documents, upload rankings, and receive the click feedback.
  6. Limitations
     • Head queries only: a considerable portion of traffic, but only popular information needs
     • Lack of context: no knowledge of the searcher's location, previous searches, etc.
     • No real-time feedback: the API provides detailed feedback, but it is not immediate
     • Limited control: experimentation is limited to single searches, where results are interleaved with those of the production system; there is no control over the entire result list
     • Ultimate measure of success: search is only a means to an end, it is not the ultimate goal
  7. Evaluation
     • Train queries
       - 'Immediate' feedback
       - Raw and aggregated feedback
     • Test queries
       - No updates during the test period
       - Feedback after the test period
       - Only aggregated feedback
     • Metric: Team Draft Interleaving; fraction of wins against production (a sketch follows below)
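A minimal sketch of how Team Draft Interleaving combines an experimental ranking with the production ranking, and how one impression is scored from the observed clicks. This is an illustrative implementation of the general method, not the lab's own code; the document identifiers and the example click are made up.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=10):
    """Team Draft Interleaving: the two rankers take turns (in random order per
    round) drafting their highest-ranked document not yet in the interleaved
    list. Returns the interleaved list and the team ('A' or 'B') of each doc."""
    interleaved, teams = [], {}
    while len(interleaved) < length and (ranking_a or ranking_b):
        # Randomize which team drafts first in this round.
        order = [("A", ranking_a), ("B", ranking_b)]
        random.shuffle(order)
        for team, ranking in order:
            while ranking and ranking[0] in teams:
                ranking.pop(0)            # skip documents already drafted
            if ranking and len(interleaved) < length:
                doc = ranking.pop(0)
                interleaved.append(doc)
                teams[doc] = team
    return interleaved, teams

def impression_outcome(teams, clicked_docs):
    """Credit each click to the team that drafted the clicked document;
    the ranker with more credited clicks wins this impression."""
    a = sum(1 for d in clicked_docs if teams.get(d) == "A")
    b = sum(1 for d in clicked_docs if teams.get(d) == "B")
    if a > b:
        return "win"      # experimental system beats production
    if b > a:
        return "loss"
    return "tie"

# Example: experimental ranking (team A) vs. production ranking (team B) for one query.
experimental = ["d3", "d1", "d7", "d2", "d9"]
production = ["d1", "d2", "d3", "d4", "d5"]
shown, teams = team_draft_interleave(list(experimental), list(production))
print(shown)
print(impression_outcome(teams, clicked_docs=["d3"]))  # the click on d3 credits its drafting team
```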
  8. Test/Train Queries/Periods (diagram): sets of frequent queries and candidate documents, organized by period (train/test) and query type (train/test). Depending on the combination, the diagram distinguishes three cases:
     • Feedback available, individual feedback, updates possible
     • Feedback available, no individual feedback, updates possible
     • No feedback available, no individual feedback, updates not possible
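From the aggregated feedback, the per-impression outcomes are typically condensed into the "fraction of wins" mentioned on the Evaluation slide. A common convention, assumed here rather than taken from the lab's official scoring, is wins / (wins + losses) with ties ignored, so that 0.5 means on par with the production system.

```python
def aggregate_outcome(impressions):
    """Aggregate per-impression results ('win', 'loss', 'tie' for the experimental
    system against production) into a single score: wins / (wins + losses)."""
    wins = sum(1 for r in impressions if r == "win")
    losses = sum(1 for r in impressions if r == "loss")
    if wins + losses == 0:
        return 0.5  # no informative impressions; treat as on par (assumption)
    return wins / (wins + losses)

# Example: 6 wins, 4 losses, 3 ties over the test period -> 0.6
print(aggregate_outcome(["win"] * 6 + ["loss"] * 4 + ["tie"] * 3))
```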
  9. Resources
     • API source code: https://bitbucket.org/living-labs/ll-api
     • Documentation: http://doc.living-labs.net/en/latest/
     • Dashboard
       - CLEF: http://dashboard.living-labs.net/
       - TREC: http://dashboard.trec-open-search.org/