Unsupervised Context Retrieval for Long-tail Entities

Darío Garigliotti, Dyaa Albakour, Miguel Martinez, and Krisztian Balog
IAI, University of Stavanger, Norway + Signal AI, UK
The 2019 ACM SIGIR International Conference on the Theory of Information Retrieval, Santa Clara, CA

Motivation
• Monitoring entities in media streams often relies on rich entity representations, like structured information available in a knowledge base.
• Long-tail entities are hard to monitor, due to their limited, if not entirely missing, representation in the reference knowledge base.
Unsupervised Context Retrieval for Long-tail Entities - ICTIR 2019

Problem Statement
• Given a (long-tail) entity e
• And a set of contexts (here, sentences)
• Context retrieval problem: rank each context c in the set according to how likely it is that e is actually mentioned in c

• Example: context retrieval for the entity Isai (an investment fund)
- Isai (Investment fund): "Capital firm Isai just raised a new $175 million fund."
- Isai (Movie): "S.J. Surya's Isai begins with a curious disclaimer."


Context Retrieval
e = Isai
C = { "Capital firm Isai just raised a new $175 million fund.", "S.J. Surya's Isai begins with a curious disclaimer.", … }

Approach
Context Retrieval: e = Isai, C = { "Capital firm Isai just raised a new $175 million fund.", "S.J. Surya's Isai begins with a curious disclaimer.", … }

Approach
Support Entity Ranking (SER): importance of a support entity ẽ for e
ẽ_1 = Dorm Room Fund, ẽ_2 = Fundica, ẽ_3 = Venture Partners, …

Approach
Support Context Ranking (SCR): importance of a support context c̃ for a support entity ẽ
C̃_1 = { c̃_1,1 , c̃_1,2 , … }
C̃_2 = { "Fundica held the finals of their Roadshow.", … }
C̃_3 = { c̃_3,1 , c̃_3,2 , … }

Approach
Context-to-Context Ranking (CCR): importance of a support context c̃ for c, given that an alias of e is mentioned in c


Approach
• Our framework makes it possible to estimate P(c|e), the probability that the alias mentioned in a context c refers to the long-tail entity e:

P(c \mid e) = \sum_{\tilde{e}} \sum_{\tilde{c}} \underbrace{P(c \mid e, \tilde{c})}_{\text{CCR}} \, \underbrace{P(\tilde{c} \mid \tilde{e})}_{\text{SCR}} \, \underbrace{P(\tilde{e} \mid e)}_{\text{SER}}
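As a minimal sketch of how the factorization of P(c|e) above could be computed, the following Python sums the product of the three components over all support entities and their support contexts. The function names, the dictionary layout, and every number in the toy data are illustrative assumptions, not taken from the slides.

```python
def score_context(c, e, p_ser, p_scr, p_ccr):
    """Sum P(c|e,sc) * P(sc|se) * P(se|e) over support entities and contexts.

    p_ser: dict mapping support entity -> P(se|e)          (SER)
    p_scr: dict mapping support entity -> {sc: P(sc|se)}   (SCR)
    p_ccr: callable (c, e, sc) -> P(c|e,sc)                (CCR)
    """
    total = 0.0
    for se, w_entity in p_ser.items():
        for sc, w_context in p_scr.get(se, {}).items():
            total += p_ccr(c, e, sc) * w_context * w_entity
    return total


def rank_contexts(contexts, e, p_ser, p_scr, p_ccr):
    """Return (score, context) pairs sorted by decreasing score."""
    scored = [(score_context(c, e, p_ser, p_scr, p_ccr), c) for c in contexts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy data: all identifiers and probabilities below are made up.
p_ser = {"Fundica": 0.6, "Dorm Room Fund": 0.4}
p_scr = {"Fundica": {"sc_1": 1.0}, "Dorm Room Fund": {"sc_2": 1.0}}
ccr_table = {
    ("c_fund", "sc_1"): 0.5, ("c_fund", "sc_2"): 0.25,
    ("c_movie", "sc_1"): 0.1, ("c_movie", "sc_2"): 0.05,
}


def p_ccr(c, e, sc):
    # Hypothetical lookup; a real estimator would compare the two texts.
    return ccr_table.get((c, sc), 0.0)


ranking = rank_contexts(["c_movie", "c_fund"], "Isai", p_ser, p_scr, p_ccr)
```

With these toy numbers the fund-related context is ranked above the movie-related one, because it is closer to the support contexts of both support entities.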

Component Estimators
• Component: Support Entity Ranking
- Basic: BM25 (k1=1.2, b=0.8), using the description of e as a query over the opening_text field of each Wikipedia article in an index
- Pop: the basic score is multiplied by the popularity of the support entity
- Types: from basic, entities not having common types with e are removed
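The three SER variants above can be illustrated with a small, self-contained sketch. It assumes whitespace tokenization and a non-negative (Lucene-style) IDF, neither of which is specified in the slides; only k1=1.2 and b=0.8 come from the source. The entity names, texts, popularity values, and types are invented for the example.

```python
import math
from collections import Counter

K1, B = 1.2, 0.8  # parameter values reported on the slide


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=K1, b=B):
    """Score each document in docs (id -> text) against the query with BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    doc_freq = Counter(t for toks in tokenized.values() for t in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            # Assumed IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


def ser_pop(basic, popularity):
    """Pop variant: multiply the basic score by the entity's popularity."""
    return {ent: s * popularity.get(ent, 0.0) for ent, s in basic.items()}


def ser_types(basic, types_of, target_types):
    """Types variant: drop entities sharing no type with the target entity."""
    return {ent: s for ent, s in basic.items()
            if types_of.get(ent, set()) & target_types}


# Toy index of opening texts (invented).
opening_texts = {
    "Fund A": "a venture capital investment fund for startups",
    "Film B": "a tamil drama film about a music composer",
}
description = "investment fund raising venture capital"

basic = bm25_scores(description, opening_texts)
pop = ser_pop(basic, {"Fund A": 2.0, "Film B": 10.0})
typed = ser_types(basic, {"Fund A": {"Organisation"}, "Film B": {"Film"}},
                  {"Organisation"})
```

In practice the basic score would come from a search engine over a Wikipedia index rather than from an in-memory scorer like this one.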

Component Estimators
• Component: Context-to-Context Ranking
- Retrieval score (BM25) of c with c̃ as a query over a context index
- Semantic (cosine) similarity between the term-averaged word2vec vectors for c and c̃
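The semantic CCR estimator can be sketched as follows: each context is represented by the average of its term vectors, and two contexts are compared by cosine similarity. The two-dimensional vectors below are made-up stand-ins for word2vec embeddings, and skipping out-of-vocabulary terms is an assumption.

```python
import math


def average_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_similarity(context, support_context, embeddings):
    """Cosine similarity between the term-averaged vectors of two contexts."""
    u = average_vector(context.lower().split(), embeddings)
    v = average_vector(support_context.lower().split(), embeddings)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


# Toy embeddings (invented); real ones would be loaded from a word2vec model.
toy_embeddings = {
    "fund": [1.0, 0.0],
    "capital": [0.9, 0.1],
    "film": [0.0, 1.0],
    "director": [0.1, 0.9],
}

sim_related = semantic_similarity("capital fund", "fund", toy_embeddings)
sim_unrelated = semantic_similarity("capital fund", "film director", toy_embeddings)
```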

Test Collection
• 165 long-tail entities, 73 of which did not have a corresponding Wikipedia article by 2018-10-01
• For each entity e, 5k- contexts (from a proprietary collection of news articles) with an alias of e are ranked with each combination of estimators
• Top 20 contexts per ranking are pooled, leading to 4,536 contexts annotated with binary relevance
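The pooling step can be sketched as taking the union of the top-k contexts of every ranking produced for an entity. The function name and the toy rankings are hypothetical; only the cutoff of 20 comes from the slide.

```python
def pool_top_k(rankings, k=20):
    """Union of the top-k contexts across all rankings (run name -> ranked list)."""
    pool = set()
    for ranked_contexts in rankings.values():
        pool.update(ranked_contexts[:k])
    return pool


# Toy rankings for one entity, one per estimator combination (invented).
toy_rankings = {
    "run_a": ["c1", "c2", "c3"],
    "run_b": ["c2", "c4", "c1"],
}
toy_pool = pool_top_k(toy_rankings, k=2)
```

Because rankings overlap, the pool is usually much smaller than k times the number of rankings, which keeps the annotation effort manageable.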

Experimental Results
• RQ: What is the best way to estimate each component in our approach?
- Basic SER setting outperforms its pop and types variants, with high significance, in terms of MAP and MRR, for each CCR setting
- The semantic definition leads to a more robust CCR
• RQ: How does our approach perform for context retrieval?
- Our method outperforms the baseline [1] with high significance
• RQ: How does it perform for entities with and without a corresponding representation in Wikipedia?
- It outperforms the baseline [1] in both subsets, and in particular is robust for the long-tail entities

[1] Roi Blanco and Hugo Zaragoza. 2010. Finding Support Sentences for Entities. In Proc. of SIGIR. 339–346.
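For readers unfamiliar with the two measures reported above, here is a minimal sketch of MAP and MRR over binary relevance judgments. It assumes average precision is normalized by the number of judged relevant contexts per entity; the rankings and judgments are invented.

```python
def average_precision(ranking, relevant):
    """Mean of precision@i over the ranks i at which a relevant context appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, context in enumerate(ranking, start=1):
        if context in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant context, or 0 if none is retrieved."""
    for rank, context in enumerate(ranking, start=1):
        if context in relevant:
            return 1.0 / rank
    return 0.0


def mean_over_entities(metric, rankings, judgments):
    """Average a per-entity metric over all entities (entity -> ranking)."""
    values = [metric(rankings[e], judgments[e]) for e in rankings]
    return sum(values) / len(values)


# Toy rankings and binary judgments for two entities (invented).
toy_rankings = {"e1": ["c1", "c2", "c3", "c4"], "e2": ["c2", "c1"]}
toy_judgments = {"e1": {"c1", "c3"}, "e2": {"c1"}}

map_score = mean_over_entities(average_precision, toy_rankings, toy_judgments)
mrr_score = mean_over_entities(reciprocal_rank, toy_rankings, toy_judgments)
```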

Get the paper: Unsupervised Context Retrieval for Long-tail Entities
