03 LL4IR Architecture

Anne Schuth (Blendle / University of Amsterdam, The Netherlands)
Krisztian Balog (University of Stavanger, Norway)
Tutorial at ECIR 2016 in Padua, Italy

Overview
• Overall goal: make information retrieval evaluation more realistic
• Evaluate retrieval methods in a live setting with real users in their natural task environments
• Focus: medium to large sized organizations with fair amount of search volume
  • Typically lack their own R&D department, but would gain much from improved approaches
  • Or, would like to collaborate with academic researchers

Key idea
• Focus on frequent (head) queries
  • Enough traffic on them (both real-time and historical)
  • Ranked result lists can be generated offline
• An API orchestrates all data exchange between live sites and experimental systems
• Head First: Living Labs for Ad-hoc Search Evaluation. Balog et al. CIKM’14.
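As a minimal sketch of what "frequent (head) queries" could mean in practice, the snippet below picks the queries that occur at least a given number of times in a historical query log. The function name `head_queries`, the normalization (lowercasing and trimming whitespace), and the `min_count` threshold are all assumptions made for illustration; the slides do not say how sites select their head queries.

```python
from collections import Counter


def head_queries(query_log, min_count):
    """Return queries seen at least `min_count` times, most frequent first.

    `query_log` is an iterable of raw query strings (hypothetical input format).
    """
    # Assumption: treat queries that differ only in case/whitespace as identical.
    counts = Counter(q.strip().lower() for q in query_log)
    return [q for q, n in counts.most_common() if n >= min_count]


log = ["laptop", "Laptop ", "phone", "laptop", "phone", "tablet"]
print(head_queries(log, min_count=2))
```

Because the selected queries are known in advance, rankings for them can be computed offline and uploaded before any user issues the query.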

Methodology
• Queries, candidate documents, historical search and click data made available
• Rankings are generated for each query and uploaded through an API
• When any of the test queries is fired, the live site requests rankings from the API and interleaves them with those of the production system
• Participants get detailed feedback on user interactions (clicks)
• Ultimate measure is the number of “wins” against the production system
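The interleaving step above can be sketched with Team Draft Interleaving, the method named later in this deck. This is a generic textbook version, not the code used by the Living Labs API or by any participating site; the function name, the team labels `"participant"` and `"production"`, and the `length` parameter are assumptions for illustration.

```python
import random


def team_draft_interleave(participant, production, length, rng=random):
    """Interleave two rankings; return (documents, team label per position)."""
    rankings = {"participant": participant, "production": production}
    picks = {"participant": 0, "production": 0}
    interleaved, teams = [], []
    while len(interleaved) < length:
        # Documents each team could still contribute (not yet shown).
        remaining = {
            team: [doc for doc in ranking if doc not in interleaved]
            for team, ranking in rankings.items()
        }
        if not remaining["participant"] and not remaining["production"]:
            break  # both rankings exhausted
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks["participant"] < picks["production"]:
            team = "participant"
        elif picks["production"] < picks["participant"]:
            team = "production"
        else:
            team = rng.choice(["participant", "production"])
        if not remaining[team]:
            # Fall back to the other team if this one has nothing left.
            team = "production" if team == "participant" else "participant"
        interleaved.append(remaining[team][0])
        teams.append(team)
        picks[team] += 1
    return interleaved, teams


docs, teams = team_draft_interleave(
    ["d1", "d2", "d3"], ["d2", "d4", "d1"], length=4, rng=random.Random(0)
)
print(list(zip(docs, teams)))
```

Recording the team label for every position is what later allows each click to be credited to either the participant or the production system.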

Methodology
[Figure: data exchange between Sites, the Living Labs API, and Participants (researchers). Arrow labels: query, ranking, click, Queries, Documents.]

Limitations
• Head queries only: Considerable portion of traffic, but only popular info needs
• Lack of context: No knowledge of the searcher’s location, previous searches, etc.
• No real-time feedback: API provides detailed feedback, but it’s not immediate
• Limited control: Experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list
• Ultimate measure of success: Search is only a means to an end, it is not the ultimate goal

Further details in the paper

Evaluation
• Train queries
  • ‘Immediate’ feedback
  • Raw and aggregated feedback
• Test queries
  • No updates during test period
  • Feedback after test period
  • Only aggregated feedback
• Metric: Team Draft Interleaving
  • Fraction of wins against production
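A rough sketch of how the "fraction of wins against production" could be computed from click feedback on interleaved result lists. It assumes each impression records which team contributed each position and which positions were clicked, and that ties are left out of the fraction (wins divided by wins plus losses). Both the data layout and the handling of ties are assumptions here, not something stated on the slide.

```python
def impression_result(teams, clicked_positions):
    """Classify one impression as 'win', 'loss', or 'tie' for the participant.

    `teams` holds the team label per result position; `clicked_positions`
    holds the zero-based positions that received a click.
    """
    participant = sum(1 for i in clicked_positions if teams[i] == "participant")
    production = sum(1 for i in clicked_positions if teams[i] == "production")
    if participant > production:
        return "win"
    if participant < production:
        return "loss"
    return "tie"


def outcome(results):
    """Fraction of wins among non-tied impressions (assumption: ties ignored)."""
    wins = results.count("win")
    losses = results.count("loss")
    if wins + losses == 0:
        return None  # no decisive impressions yet
    return wins / (wins + losses)


layout = ["participant", "production", "participant", "production"]
impressions = [
    (layout, [0]),     # click on a participant document
    (layout, [1]),     # click on a production document
    (layout, [0, 1]),  # one click each
    (layout, [2, 0]),  # two participant clicks
]
results = [impression_result(teams, clicks) for teams, clicks in impressions]
print(results, outcome(results))
```

Under these assumptions, an outcome above 0.5 would mean the participant's ranking won more decisive impressions than the production system.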

Test/Train Queries/Periods
[Figure: grid of Period (Train, Test) against Query Type (Train, Test), over sets of frequent queries and their candidate documents. Cell labels, in the order they appear:]
• Feedback available, individual feedback, update possible
• Feedback available, no individual feedback, update possible
• No feedback available, no individual feedback, update not possible

Resources
• API source code
  • https://bitbucket.org/living-labs/ll-api
• Documentation
  • http://doc.living-labs.net/en/latest/
• Dashboard
  • CLEF: http://dashboard.living-labs.net/
  • TREC: http://dashboard.trec-open-search.org/

Open Source
• https://bitbucket.org/living-labs/ll-api
• Please report issues here!

Documentation

Guide for CLEF

Guide for TREC
