Slide 1

Slide 1 text

DAT630 Entity linking II. Faegheh Hasibi | University of Stavanger 09/11/2016

Slide 2

Slide 2 text

Recap
• Entity linking is the task of linking free text to entities
• Entity linking is generally performed in a pipeline of three steps:
  • Mention detection: identifying candidate mention-entity pairs
  • Entity ranking: ranking the entities of each mention
  • Disambiguation: selecting one entity or none for a mention

Slide 3

Slide 3 text

Entity linking evaluation
Mid-level evaluation:
• Only for evaluating the first two steps
• Rank-based metrics: Recall@k, P@1, MAP, etc.
End-to-end evaluation:
• Set-based metrics: precision, recall, F-measure
Entity linking performance is evaluated using set-based metrics.

Slide 4

Slide 4 text

Entity linking evaluation
[Figure: the passage below shown twice, once with the ground truth annotations and once with the system annotations; the two annotation sets are labelled A and Â]
Košice is the biggest city in eastern Slovakia and in 2013 was the European Capital of Culture together with Marseille, France. It is situated on the river Hornád at the eastern reaches of the Slovak Ore Mountains, near the border with Hungary.

Slide 5

Slide 5 text

Entity linking evaluation
[Figure: the passage below shown twice, once with the ground truth annotations and once with the system annotations; the two annotation sets are labelled A and Â]
Košice is the biggest city in eastern Slovakia and in 2013 was the European Capital of Culture together with Marseille, France. It is situated on the river Hornád at the eastern reaches of the Slovak Ore Mountains, near the border with Hungary.

Slide 6

Slide 6 text

Entity linking evaluation
Matching criteria
• Both mention and entity should be considered
• There are two variations:
  1. Perfect match: the linked entity and the mention offsets must both match the gold standard
  2. Relaxed match: the linked entity must match; it is sufficient if the mention overlaps with the gold standard
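The two matching criteria can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the `Annotation` type, the half-open character offsets, and the example offsets are all assumptions made for the sketch.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """A mention-entity pair (hypothetical representation for this sketch)."""
    start: int   # character offset where the mention begins (inclusive)
    end: int     # character offset where the mention ends (exclusive)
    entity: str  # identifier of the linked entity


def perfect_match(system: Annotation, gold: Annotation) -> bool:
    """Entity and mention offsets must both be identical."""
    return system == gold


def relaxed_match(system: Annotation, gold: Annotation) -> bool:
    """Entity must be identical; the mentions only need to overlap."""
    overlaps = system.start < gold.end and gold.start < system.end
    return system.entity == gold.entity and overlaps


# Assumed offsets in "Košice is the biggest city in eastern Slovakia ..."
gold = Annotation(38, 46, "Slovakia")          # mention "Slovakia"
wide = Annotation(30, 46, "Slovakia")          # mention "eastern Slovakia"
wrong = Annotation(38, 46, "Eastern_Slovakia")  # same span, other entity

print(perfect_match(wide, gold), relaxed_match(wide, gold))
print(perfect_match(wrong, gold), relaxed_match(wrong, gold))
```

The system mention "eastern Slovakia" fails the perfect match but passes the relaxed one, while a wrong entity fails both regardless of the offsets.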

Slide 7

Slide 7 text

Entity linking evaluation
Matching criteria:
• Both mention and entity should be considered
• Perfect match: the linked entity and the mention must exactly match the gold standard
• Relaxed match: the linked entity must match; it is sufficient if the mention overlaps with the gold standard
Aggregation:
• Metrics are computed over a collection of documents
• Micro-averaged: aggregated across mentions
• Macro-averaged: aggregated across documents

Slide 8

Slide 8 text

Evaluation metrics
Micro-averaged:
• computed across all the mention-entity pairs
Macro-averaged:
• computed for each document and then averaged over all documents
F1 score: $F_1 = \frac{2 \cdot P \cdot R}{P + R}$
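The difference between the two aggregation schemes is easiest to see in code. The sketch below is an illustration under several assumptions that are not stated on the slides: annotations are hashable (mention, entity) tuples compared by perfect match, empty denominators yield 0.0, and the macro-averaged F1 is computed from the macro-averaged precision and recall. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Annotation = Tuple[str, str]                      # (mention, entity)
Doc = Tuple[Set[Annotation], Set[Annotation]]     # (system, gold)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(correct: int, n_system: int, n_gold: int) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 from raw counts."""
    p = correct / n_system if n_system else 0.0
    r = correct / n_gold if n_gold else 0.0
    return p, r, f1(p, r)


def micro_average(docs: List[Doc]) -> Tuple[float, float, float]:
    """Pool the counts of all documents, then compute the metrics once."""
    correct = sum(len(system & gold) for system, gold in docs)
    n_system = sum(len(system) for system, _ in docs)
    n_gold = sum(len(gold) for _, gold in docs)
    return prf(correct, n_system, n_gold)


def macro_average(docs: List[Doc]) -> Tuple[float, float, float]:
    """Compute the metrics per document, then average over documents."""
    if not docs:
        return 0.0, 0.0, 0.0
    per_doc = [prf(len(s & g), len(s), len(g)) for s, g in docs]
    p = sum(x[0] for x in per_doc) / len(per_doc)
    r = sum(x[1] for x in per_doc) / len(per_doc)
    return p, r, f1(p, r)


# Toy collection: two documents with made-up annotations.
docs = [
    ({("m1", "a"), ("m2", "b"), ("m3", "c"), ("m4", "d")},
     {("m1", "a"), ("m2", "b")}),
    ({("m5", "x")},
     {("m5", "x"), ("m6", "y")}),
]
print(micro_average(docs))
print(macro_average(docs))
```

In this toy collection the long first document dominates the micro-average, whereas the macro-average weights both documents equally.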

Slide 9

Slide 9 text

Exercise Entity linking evaluation

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Macro-averaged metrics

Slide 15

Slide 15 text

Micro-averaged metrics

Slide 16

Slide 16 text

Entity linking in practice

Slide 17

Slide 17 text

TAGME system
• A very popular entity linking system
• Designed for annotating short texts
• Method:
  • Mention detection: builds a dictionary from Wikipedia; uses keyphraseness for filtering
  • Entity ranking: uses relatedness weighted by commonness
  • Disambiguation: pruning by threshold
Accessible at http://tagme.di.unipi.it/
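To make the statistics named above concrete, the sketch below computes keyphraseness and commonness from link counts, using their usual definitions, and applies a threshold. It is not TAGME's implementation and does not use its API: the function names and the toy counts are hypothetical, the relatedness-based scoring is omitted, and the threshold is applied to the commonness values purely for illustration.

```python
from typing import Dict


def keyphraseness(times_linked: int, times_occurring: int) -> float:
    """Fraction of a phrase's occurrences in which it is link anchor text."""
    return times_linked / times_occurring if times_occurring else 0.0


def commonness(anchor_targets: Dict[str, int]) -> Dict[str, float]:
    """For one mention: the share of its links that point to each entity."""
    total = sum(anchor_targets.values())
    if total == 0:
        return {}
    return {entity: count / total for entity, count in anchor_targets.items()}


def prune(scores: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only the candidates whose score reaches the threshold."""
    return {key: score for key, score in scores.items() if score >= threshold}


# Hypothetical toy counts: how often one mention links to each entity.
targets = {"Entity_A": 9, "Entity_B": 1}
prior = commonness(targets)
kept = prune(prior, threshold=0.2)

print(keyphraseness(30, 120))
print(prior)
print(kept)
```

A higher threshold trades recall for precision: fewer annotations survive, but those that do are more likely to be correct.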

Slide 18

Slide 18 text

TAGME system Pruning threshold

Slide 19

Slide 19 text

Entity linking in queries

Slide 20

Slide 20 text

Entity linking in queries
Query: the governator movie
[Figure label: person]

Slide 21

Slide 21 text

Entity linking in queries
Query: the governator movie
[Figure label: movie]

Slide 22

Slide 22 text

Entity linking in queries
Query: france world cup 1998
Two interpretations:
• {France, FIFA world cup}
• {France national football team, FIFA world cup}

Slide 23

Slide 23 text

Entity linking in queries
Input:
• Search queries (short and noisy text fragments)
• Limited (or even no) context is provided
Requirements:
• Should be done fast
• Multiple interpretations
Detecting entity linking interpretations of the query, where each interpretation consists of a set of mention-entity pairs.

Slide 24

Slide 24 text

Approach
Similar pipeline approach
• Should consider the trade-off between efficiency and effectiveness
• The entity ranking step plays an important role
• Entity relatedness features are less important here
  • each query mostly contains one or two entities
  • textual similarity features are more effective
[Figure: query → Mention detection → Entity ranking → Disambiguation → annotated query]

Slide 25

Slide 25 text

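Evaluation
Query: france world cup 1998
[Figure: interpretation sets of the ground truth (Î) and of the system annotation (I)]
Interpretation sets shown on the slide:
• {France, FIFA world cup}
• {France football team, FIFA world cup}
• {France, FIFA world cup}
• {FIFA world cup}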

Slide 26

Slide 26 text

Evaluation
Evaluating a single query:
$$P = \frac{|I \cap \hat{I}|}{|I|}, \qquad R = \frac{|I \cap \hat{I}|}{|\hat{I}|}, \qquad F = \frac{2 \cdot P \cdot R}{P + R}$$
Evaluating multiple queries:
$$F = \frac{2 \cdot P \cdot R}{P + R}$$
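The single-query formulas translate directly into set operations. The sketch below rests on assumptions that go beyond the slides: Î is read as the ground truth and I as the system output (precision divides by |I|), each interpretation is reduced to a frozenset of entity identifiers, empty sets yield 0.0, and multiple queries are handled by averaging P and R over queries before computing F. The assignment of interpretation sets to ground truth and system in the example is also assumed.

```python
from typing import FrozenSet, List, Set, Tuple

Interpretation = FrozenSet[str]   # one interpretation = a set of entities


def f_measure(p: float, r: float) -> float:
    """F = 2PR / (P + R), defined as 0.0 when both are zero."""
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate_query(system: Set[Interpretation],
                   gold: Set[Interpretation]) -> Tuple[float, float, float]:
    """An interpretation counts as correct only if it matches as a whole."""
    common = len(system & gold)
    p = common / len(system) if system else 0.0   # assumed convention
    r = common / len(gold) if gold else 0.0       # assumed convention
    return p, r, f_measure(p, r)


def evaluate_queries(pairs: List[Tuple[Set[Interpretation], Set[Interpretation]]]
                     ) -> Tuple[float, float, float]:
    """Average P and R over queries, then compute F (assumed aggregation)."""
    if not pairs:
        return 0.0, 0.0, 0.0
    results = [evaluate_query(system, gold) for system, gold in pairs]
    p = sum(x[0] for x in results) / len(results)
    r = sum(x[1] for x in results) / len(results)
    return p, r, f_measure(p, r)


# Query "france world cup 1998"; the set assignment below is assumed.
gold = {
    frozenset({"France", "FIFA_World_Cup"}),
    frozenset({"France_national_football_team", "FIFA_World_Cup"}),
}
system = {
    frozenset({"France", "FIFA_World_Cup"}),
    frozenset({"FIFA_World_Cup"}),
}
print(evaluate_query(system, gold))
```

Note that the system interpretation {FIFA world cup} earns no partial credit: matching is at the level of whole interpretation sets, not individual entities.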