
Using Linked Data to Mine RDF from Wikipedia’s Tables [POSTER]

Emir Muñoz
February 25, 2014


In 7th ACM Web Search and Data Mining Conference (WSDM 2014), New York City, New York, 24-28 February




ACKNOWLEDGEMENTS: This work was supported in part by Fujitsu (Ireland) Ltd., by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004, and by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.

Emir Muñoz, Fujitsu (Ireland) Limited, Galway
Aidan Hogan, DCC, Universidad de Chile
Alessandra Mileo, INSIGHT, NUI Galway

MOTIVATION

(Figure label: player …)

SPARQL QUERY:
SELECT ?player WHERE { ?player dbp:currentclub dbr:Manchester_United_F.C } …

INCOMPLETE RESULTS!

PROPOSAL

(Figure labels: dbo:birthPlace, dbp:position, dbp:position, dbp:position, dbp:currentclub, …)

SUGGESTED TRIPLES:
(1) dbr:David_de_Gea dbo:birthPlace dbr:Spain .
(2) dbr:Fabio_Pereira_de_Silva dbo:birthPlace dbr:Brazil .
(3) dbr:Fabio_Pereira_de_Silva dbp:currentclub dbr:Manchester_United_F.C .

WIKITABLE SURVEY

(Figures: TABLE TAXONOMY; DISTRIBUTIONS)

RESULTS

(1) EXTRACTED 34.9 MILLION UNIQUE & NOVEL TRIPLES FROM 1.14 MILLION WIKITABLES (8 MACHINES: 4 GB RAM, 2.2 GHZ SINGLE CORE; 12 DAYS)

(2) INITIAL EVALUATION: (MANUAL ANNOTATION; THREE JUDGES; 750 TRIPLES EACH)

(3) MACHINE LEARNING CLASSIFIERS: (CONSENSUS GOLD STANDARD; VARIETY OF FEATURES)
    BAGGING DECISION TREES, FROM 1.14 MILLION WIKITABLES: 7.9 MILLION TRIPLES; 81.5% PREC., 78.1% ACC., 77.4% REC.
    SUPPORT VECTOR MACHINES, FROM 1.14 MILLION WIKITABLES: 15.3 MILLION TRIPLES; 72.4% PREC., 72.6% ACC., 75.8% REC.
    …
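
The PROPOSAL example above can be read as follows: if a relation already holds in the knowledge base between the entities of two columns in some row of a table, the same relation is suggested for the other rows of that table. The Python sketch below illustrates that reading only; it is not the authors' implementation. The function name `suggest_triples`, the three-column table, and the choice of which triples are already in the toy knowledge base are all assumptions made for illustration, and only the entity and property names come from the poster.

```python
from collections import defaultdict
from itertools import permutations


def suggest_triples(rows, kb):
    """Suggest triples for table rows from relations already in the KB.

    rows: list of equal-length tuples of entity identifiers (one per cell).
    kb:   set of (subject, predicate, object) triples.
    Hypothetical helper: a reading of the poster's example, not its method.
    """
    # Index the KB so predicates between two entities can be looked up.
    by_pair = defaultdict(set)
    for s, p, o in kb:
        by_pair[(s, o)].add(p)

    suggestions = set()
    n_cols = len(rows[0]) if rows else 0
    for i, j in permutations(range(n_cols), 2):
        # Predicates that hold from column i to column j in at least one row.
        candidates = set()
        for row in rows:
            candidates |= by_pair.get((row[i], row[j]), set())
        # Propose each candidate for every row where it is not yet known.
        for row in rows:
            for p in candidates:
                triple = (row[i], p, row[j])
                if triple not in kb:
                    suggestions.add(triple)
    return suggestions


# Toy table (assumed layout): player, country, club.
ROWS = [
    ("dbr:David_de_Gea", "dbr:Spain", "dbr:Manchester_United_F.C"),
    ("dbr:Fabio_Pereira_de_Silva", "dbr:Brazil", "dbr:Manchester_United_F.C"),
]

# Toy knowledge base (assumed contents, not a DBpedia snapshot).
KB = {
    ("dbr:David_de_Gea", "dbo:birthPlace", "dbr:Spain"),
    ("dbr:David_de_Gea", "dbp:currentclub", "dbr:Manchester_United_F.C"),
}

suggested = suggest_triples(ROWS, KB)
for triple in sorted(suggested):
    print(" ".join(triple), ".")
```

Under these assumptions the sketch proposes the two triples for dbr:Fabio_Pereira_de_Silva. Candidates generated this way can be wrong, which is consistent with the RESULTS section filtering them through manual evaluation and machine learning classifiers.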