Knowledge Base: Beam, Andrew L., et al. "Clinical Concept Embeddings Learned from Massive Sources of Medical Data." arXiv:1804.01486 (2018).
(e.g. [Gillick et al., ‘19]: 9 million entities) • They cannot cope with new domains or with new KBs for those domains. • Prior-based candidate generation works only in the general domain. [Gillick et al., ‘19]
for candidates. • This cannot solve the cold-start problem, where no annotations exist yet. • Also, previous studies only link against Wikipedia. (If mentions are linked to Wikipedia, the prior can be utilized.)
recommendation systems are trained, does this improve the annotation process for humans? Human-In-The-Loop Approach (proposed) ▪ The main focus is the candidate ranking step, after candidates have been generated. ▪ All entities in the reference KB are assumed to have a title and a description.
ACL ‘18] character N-gram features. ・TF-IDF over character N-grams (N=1–5) + L2 normalization + cosine similarity. • In the From-Zero-to-Hero paper, they leveraged BERT’s WordPiece tokenization. • Still searching for other examples ...
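The TF-IDF character N-gram + cosine similarity step above can be sketched as follows. This is a minimal pure-Python illustration, not the papers' implementation; function names (`char_ngrams`, `tfidf_vectors`, `cosine`) and the plain `log(N/df)` IDF weighting are our own choices.

```python
import math
from collections import Counter

def char_ngrams(text, n_min=1, n_max=5):
    """All character n-grams of lengths n_min..n_max (lowercased)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]

def tfidf_vectors(texts, n_min=1, n_max=5):
    """TF-IDF vectors as sparse dicts, L2-normalized."""
    tfs = [Counter(char_ngrams(t, n_min, n_max)) for t in texts]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())            # document frequency per n-gram
    n_docs = len(texts)
    vecs = []
    for tf in tfs:
        vec = {g: c * math.log(n_docs / df[g]) for g, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vecs.append({g: w / norm for g, w in vec.items()})
    return vecs

def cosine(u, v):
    """Dot product of two L2-normalized sparse vectors = cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(g, 0.0) for g, w in u.items())
```

Usage: vectorize the mention together with all candidate titles, then rank candidates by `cosine` against the mention's vector.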
variance in surface forms, Avg. Amb. is lower. • For WWO and 1641, fuzzy search is conducted, which results in an increase in Avg. Cand.: different surface forms stand for the same name.
are trained, does this improve the annotation process for humans? • To validate this research question, they run the following evaluations. • Evaluation 1: Performance of the recommender ・vs. non-interactive ranking performance • Evaluation 2: Simulating user annotation • Evaluation 3: Real users’ annotation performance ・Speed and the need to type search queries
the gold is included in the candidates, that method can identify the gold with high performance. High Recall: does the CG method catch the gold entity among its candidates? If recall is high, the gold is in the candidate set with high confidence. For noisy text, use Levenshtein distance.
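The two notions above (Levenshtein distance for noisy surface forms, and recall of candidate generation) can be sketched as below. This is a generic illustration under our own naming (`levenshtein`, `recall_at_k`), not the paper's code.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via DP,
    keeping only the previous row of the DP table."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def recall_at_k(golds, candidate_lists, k):
    """Fraction of mentions whose gold entity appears in the top-k candidates."""
    hits = sum(gold in cands[:k] for gold, cands in zip(golds, candidate_lists))
    return hits / len(golds)
```

A CG method with high `recall_at_k` leaves the ranker a fair chance; with noisy OCR-like text, ranking candidates by ascending `levenshtein` distance to the mention is a simple fuzzy-search baseline.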
is due to the fact that Wikidata has short entity descriptions. • Levenshtein ML for WWO and 1641 arises because, to create the Human-in-the-Loop setting, they build the KB from mentions in the documents. • Sentence cosine similarity is useful across all three datasets.