The presentation for our entry to the Linked Data Mining Challenge 2016, organized by the Know@LOD workshop at ESWC 2016.
A Hybrid Method for Rating Prediction Using Linked Data Features and Text Reviews
Semih, Y., Emir M., Pasquale, M., Erdogan, D., Halife, K.
Linked Data Mining Challenge 2016 - Know@LOD
ESWC 2016, Heraklion, Crete, Greece
“What makes a good/bad album of music? Can Linked Open Data help with the classification of music albums as ‘good’ or ‘bad’?”
1. Hypotheses
What is our intuition?

Bands vs. Singers
Bands are more successful than single artists
rdf:type of dbo:artist

Music genres
Some genres are more popular than others
dbo:genre
http://hpo.org/two-things-you-need-to-know-about-genre-hopping/

Language
Albums in English are more likely to be popular
dbo:language

Runtime
Longer albums tend to be more popular
dbo:runtime

Reviews
Words used for good albums differ from the ones used for bad albums
http://www.youtube.com

Award winners
Albums of award-winning artists are likely to be more successful
#awards of dbo:artist
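
The properties named in the hypotheses above could be requested for one album with a single SPARQL query. The following is a minimal sketch, not the authors' actual query: the function name `build_album_query` and the example album URI are hypothetical, and the award count is left out because the slides do not name the property used for it. The code only builds the query string and does not contact any endpoint.

```python
# Sketch only: builds a SPARQL query string for the properties named in the
# hypotheses. It does not send the query anywhere.
PREFIXES = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
"""


def build_album_query(album_uri):
    """Return a SPARQL query string for one album URI (hypothetical helper)."""
    # Each property sits in its own OPTIONAL block so that an album missing
    # one of them still yields a result row.
    body = """
SELECT ?artistType ?genre ?language ?runtime WHERE {{
  OPTIONAL {{ <{uri}> dbo:artist ?artist . ?artist rdf:type ?artistType . }}
  OPTIONAL {{ <{uri}> dbo:genre ?genre . }}
  OPTIONAL {{ <{uri}> dbo:language ?language . }}
  OPTIONAL {{ <{uri}> dbo:runtime ?runtime . }}
}}
""".format(uri=album_uri)
    return PREFIXES + body


# Hypothetical album URI, used only for illustration.
query = build_album_query("http://dbpedia.org/resource/Example_Album")
print(query)
```

The resulting string could then be sent to any SPARQL endpoint that holds the DBpedia data.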

2. Datasets and Method
Datasets
◎ Training dataset: 1,280 album URIs
◎ Test dataset: 320 album URIs
◎ DBpedia
◎ Metacritic.com
Method

3. Experiments and Results
Experimental Setup
◎ Python 3.5.1
◎ Beautiful Soup Library 4.4.0 (Web Scraping)
◎ scikit-learn Library 0.17 (Data Mining)
◎ Jena Fuseki (LD Caching)
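
Classifiers such as those in scikit-learn expect numeric feature vectors, so categorical Linked Data values (for example the objects of dbo:genre) have to be turned into numbers first. The slides do not say which encoding was used; the sketch below assumes a simple one-hot encoding and uses only the standard library. The helper names and the two toy album records are hypothetical and are not taken from the challenge data.

```python
def build_vocabulary(albums):
    """Give every (property, value) pair seen in the data a column index."""
    pairs = sorted({(prop, val)
                    for album in albums
                    for prop, vals in album.items()
                    for val in vals})
    return {pair: index for index, pair in enumerate(pairs)}


def encode(album, vocabulary):
    """Return a 0/1 vector with a 1 for each (property, value) the album has."""
    row = [0] * len(vocabulary)
    for prop, vals in album.items():
        for val in vals:
            index = vocabulary.get((prop, val))
            if index is not None:  # pairs unseen at vocabulary time are skipped
                row[index] = 1
    return row


# Hypothetical toy records, invented for illustration only.
albums = [
    {"dbo:genre": ["dbr:Rock_music"], "dbo:language": ["dbr:English_language"]},
    {"dbo:genre": ["dbr:Jazz", "dbr:Rock_music"], "dbo:language": []},
]
vocabulary = build_vocabulary(albums)
rows = [encode(album, vocabulary) for album in albums]
print(rows)
```

Rows built this way have one column per distinct (property, value) pair and can be passed to a classifier as a plain list of lists.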
Different Classifiers and Different Feature Sets

Feature Set   Linear SVM  KNN     RBF SVM  Dec. Tree  Rand. Forest  AdaBoost  Naïve Bayes
LD            76.64%      60.47%  48.05%   72.66%     53.91%        75.00%    76.41%
LDA           54.53%      52.58%  54.69%   54.45%     48.91%        54.53%    52.89%
LD+LDA        76.72%      60.23%  48.05%   72.66%     52.34%        75.00%    76.41%
TEXT          85.00%      50.00%  47.27%   67.27%     52.81%        78.91%    68.44%
LD+LDA+TEXT   87.81%      52.81%  47.27%   72.03%     52.58%        82.50%    77.19%

+ = 90% test set
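
To make the table easier to read, the short sketch below copies its accuracy values into a dictionary and picks the best classifier for each feature set. The numbers are exactly those on the slide; the variable and function names are hypothetical, and the code only summarizes the reported results without re-running any experiment.

```python
# Column order as on the slide.
CLASSIFIERS = ["Linear SVM", "KNN", "RBF SVM", "Dec. Tree",
               "Rand. Forest", "AdaBoost", "Naive Bayes"]

# Accuracy in percent, copied from the results table.
ACCURACY = {
    "LD":          [76.64, 60.47, 48.05, 72.66, 53.91, 75.00, 76.41],
    "LDA":         [54.53, 52.58, 54.69, 54.45, 48.91, 54.53, 52.89],
    "LD+LDA":      [76.72, 60.23, 48.05, 72.66, 52.34, 75.00, 76.41],
    "TEXT":        [85.00, 50.00, 47.27, 67.27, 52.81, 78.91, 68.44],
    "LD+LDA+TEXT": [87.81, 52.81, 47.27, 72.03, 52.58, 82.50, 77.19],
}


def best_per_feature_set(table):
    """Map each feature set to its (classifier, accuracy) with the top score."""
    best = {}
    for feature_set, scores in table.items():
        top = max(range(len(scores)), key=lambda i: scores[i])
        best[feature_set] = (CLASSIFIERS[top], scores[top])
    return best


best = best_per_feature_set(ACCURACY)
for feature_set in sorted(best):
    classifier, score = best[feature_set]
    print("{:12s} {:12s} {:.2f}%".format(feature_set, classifier, score))
```

In this table the linear SVM is the best classifier for every feature set except LDA alone, and the combined LD+LDA+TEXT set gives the highest score overall (87.81%).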

Thanks!
Any questions?
Find our solution on GitHub
https://github.com/semihyumusak/KNOW2016