Slide 1

Slide 1 text

A Hybrid Method for Rating Prediction Using Linked Data Features and Text Reviews
Semih Y., Emir M., Pasquale M., Erdogan D., Halife K.
Linked Data Mining Challenge 2016, Know@LOD at ESWC 2016, Heraklion, Crete, Greece

Slide 2

Slide 2 text

What makes a music album good or bad? Can Linked Open Data help classify music albums as “good” or “bad”?

Slide 3

Slide 3 text

1. Hypotheses: What is our intuition?

Slide 4

Slide 4 text

Bands vs. Singers: Bands are more successful than solo artists (feature: rdf:type of dbo:artist)
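A minimal sketch of turning this hypothesis into a binary feature, assuming the rdf:type values of the album's dbo:artist have already been retrieved as full URIs. Treating dbo:Band as the marker of a group is our assumption, and `is_band` is a hypothetical helper name.

```python
DBO = "http://dbpedia.org/ontology/"


def is_band(artist_types):
    """Return 1 if any rdf:type of the artist is dbo:Band, else 0.

    Assumption: groups are typed dbo:Band in DBpedia, while solo
    artists carry types such as dbo:MusicalArtist or dbo:Person.
    """
    return int(DBO + "Band" in set(artist_types))


print(is_band([DBO + "Band", DBO + "Organisation"]))     # 1
print(is_band([DBO + "MusicalArtist", DBO + "Person"]))  # 0
```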

Slide 5

Slide 5 text

Music Genres: Some genres are more popular than others (feature: dbo:genre)
http://hpo.org/two-things-you-need-to-know-about-genre-hopping/
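Because an album can have several dbo:genre values, one plausible encoding is a multi-hot vector over the genres seen in the training albums. The sketch assumes genres arrive as lists of identifiers; the helper names and the shortened `dbr:` sample values are hypothetical.

```python
def build_genre_vocabulary(albums_genres):
    """Collect a sorted list of every genre seen in the training albums."""
    return sorted({g for genres in albums_genres for g in genres})


def genre_vector(genres, vocabulary):
    """Multi-hot vector: 1 at each position whose genre the album has.

    Genres outside the vocabulary (e.g. seen only in test data) are ignored.
    """
    present = set(genres)
    return [int(g in present) for g in vocabulary]


train = [["dbr:Rock_music", "dbr:Pop_music"], ["dbr:Jazz"], ["dbr:Rock_music"]]
vocab = build_genre_vocabulary(train)
print(vocab)                                                  # ['dbr:Jazz', 'dbr:Pop_music', 'dbr:Rock_music']
print(genre_vector(["dbr:Rock_music", "dbr:Folk_music"], vocab))  # [0, 0, 1]
```

Building the vocabulary from training albums only keeps the feature layout fixed when test albums bring unseen genres.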

Slide 6

Slide 6 text

Language: Albums in English are more likely to be popular (feature: dbo:language)
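dbo:language values may appear either as resource URIs or as plain literals, so a binary "English" feature needs some normalization first. A sketch under that assumption; `is_english` is a hypothetical name, and the accepted spellings are our choice.

```python
def is_english(language_values):
    """Return 1 if any dbo:language value denotes English, else 0.

    Assumption: values are either resource URIs such as
    http://dbpedia.org/resource/English_language or plain literals
    such as "English"; both are normalized before comparison.
    """
    for value in language_values:
        # Keep only the local name of a URI, e.g. "English_language".
        local = value.rstrip("/").rsplit("/", 1)[-1]
        if local.replace("_", " ").lower() in {"english", "english language"}:
            return 1
    return 0


print(is_english(["http://dbpedia.org/resource/English_language"]))  # 1
print(is_english(["http://dbpedia.org/resource/French_language"]))   # 0
```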

Slide 7

Slide 7 text

Runtime: Longer albums tend to be more popular (feature: dbo:runtime)
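dbo:runtime can be missing for some albums, so a numeric feature needs a parsing step and an imputation rule. This sketch assumes the literal is a number of seconds and fills gaps with the median of the observed values; both choices, and the function names, are assumptions rather than the authors' method.

```python
from statistics import median


def runtime_minutes(raw):
    """Parse a dbo:runtime literal into minutes, or None if missing/invalid.

    Assumption: the value is a number of seconds; check the datatype
    your endpoint returns before relying on this conversion.
    """
    try:
        return float(raw) / 60.0
    except (TypeError, ValueError):
        return None


def impute_runtimes(raw_values):
    """Replace missing runtimes with the median of the observed ones."""
    parsed = [runtime_minutes(v) for v in raw_values]
    observed = [m for m in parsed if m is not None]
    fill = median(observed) if observed else 0.0
    return [fill if m is None else m for m in parsed]


print(impute_runtimes(["2400.0", None, "3600", "n/a"]))  # [40.0, 50.0, 60.0, 50.0]
```

On a real train/test split, the median would be computed on the training albums only and reused for the test albums.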

Slide 8

Slide 8 text

Reviews: Reviews of good albums use different words than reviews of bad albums
http://www.youtube.com
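One simple way to check this hypothesis is to score each word by its smoothed log-odds of appearing in good versus bad album reviews. The sketch uses add-alpha smoothing on toy review strings; it is not necessarily how the TEXT features in the results were built, and all names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def log_odds(good_reviews, bad_reviews, alpha=1.0):
    """Add-alpha smoothed log ratio of word probabilities, good vs. bad.

    Positive scores mark words more typical of good-album reviews,
    negative scores words more typical of bad-album reviews.
    """
    good = Counter(w for r in good_reviews for w in tokenize(r))
    bad = Counter(w for r in bad_reviews for w in tokenize(r))
    vocab = set(good) | set(bad)
    g_total = sum(good.values()) + alpha * len(vocab)
    b_total = sum(bad.values()) + alpha * len(vocab)
    return {
        w: math.log((good[w] + alpha) / g_total) - math.log((bad[w] + alpha) / b_total)
        for w in vocab
    }


scores = log_odds(["a stunning, masterful record", "stunning songwriting"],
                  ["a dull, forgettable record", "dull and tired"])
print(max(scores, key=scores.get))  # stunning
```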

Slide 9

Slide 9 text

Award Winners: Albums by award-winning artists are more likely to be successful (feature: number of awards of dbo:artist)
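The award feature can be read as the number of distinct awards held by the album's artists. A sketch, assuming the awards come from dbo:award and are already grouped per artist URI; `award_count` and the sample artists are hypothetical.

```python
def award_count(artist_awards, album_artists):
    """Number of distinct awards held by the album's artists.

    artist_awards maps an artist URI to the award URIs found for it
    (assumed to come from dbo:award); artists without an entry add nothing.
    """
    awards = set()
    for artist in album_artists:
        awards.update(artist_awards.get(artist, ()))
    return len(awards)


awards = {
    "dbr:Artist_A": ["dbr:Grammy_Award", "dbr:Brit_Award"],
    "dbr:Artist_B": ["dbr:Grammy_Award"],
}
print(award_count(awards, ["dbr:Artist_A", "dbr:Artist_B"]))  # 2
print(award_count(awards, ["dbr:Unknown_Artist"]))            # 0
```

Counting distinct awards avoids double-counting an award shared by several artists on the same album; summing per-artist counts would be an equally defensible alternative.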

Slide 10

Slide 10 text

2. Datasets and Method

Slide 11

Slide 11 text

Datasets
◎ Training dataset: 1,280 album URIs
◎ Test dataset: 320 album URIs
◎ DBpedia
◎ Metacritic.com
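The DBpedia part of the features could be fetched with one SPARQL query per album URI. A sketch, assuming the properties named on the hypothesis slides plus dbo:award for the award count; `QUERY_TEMPLATE`, `album_feature_query` and the example URI are hypothetical, and this is not the query the authors used.

```python
# Characters that may not appear inside a SPARQL IRI reference <...>.
FORBIDDEN = set('<>"{}|^`\\ ')

QUERY_TEMPLATE = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?artist ?artistType ?genre ?language ?runtime ?award WHERE {
  OPTIONAL {
    <%(album)s> dbo:artist ?artist .
    OPTIONAL { ?artist rdf:type ?artistType }
    OPTIONAL { ?artist dbo:award ?award }
  }
  OPTIONAL { <%(album)s> dbo:genre ?genre }
  OPTIONAL { <%(album)s> dbo:language ?language }
  OPTIONAL { <%(album)s> dbo:runtime ?runtime }
}"""


def album_feature_query(album_uri):
    """Fill the template for one album, rejecting strings that are not safe IRIs."""
    if FORBIDDEN & set(album_uri):
        raise ValueError("not a safe IRI: %r" % album_uri)
    return QUERY_TEMPLATE % {"album": album_uri}


print(album_feature_query("http://dbpedia.org/resource/Example_Album"))
```

Each OPTIONAL can multiply the result rows (every genre is paired with every award, for instance), so the values of each variable should be deduplicated after the query returns.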

Slide 12

Slide 12 text

Method

Slide 13

Slide 13 text

3. Experiments and Results

Slide 14

Slide 14 text

Experimental Setup
◎ Python 3.5.1
◎ Beautiful Soup 4.4.0 (web scraping)
◎ scikit-learn 0.17 (data mining)
◎ Jena Fuseki (Linked Data caching)
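The setup caches Linked Data locally (with Jena Fuseki) so that repeated lookups do not hit the public endpoint again. The sketch below only illustrates that caching idea with an in-memory dictionary and an optional JSON file; it does not reproduce Fuseki, and `QueryCache` and `fake_fetch` are hypothetical.

```python
import json
import os


class QueryCache:
    """Memoize query results by query text, optionally persisted to JSON.

    `fetch` is any function mapping a query string to JSON-serializable
    results (e.g. a call to a SPARQL endpoint).
    """

    def __init__(self, fetch, path=None):
        self.fetch = fetch
        self.path = path
        self.store = {}
        if path and os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.store = json.load(fh)

    def get(self, query):
        if query not in self.store:
            # Only contact the remote source on a cache miss.
            self.store[query] = self.fetch(query)
            if self.path:
                with open(self.path, "w", encoding="utf-8") as fh:
                    json.dump(self.store, fh)
        return self.store[query]


calls = []


def fake_fetch(query):
    """Stand-in for a remote endpoint that records every call."""
    calls.append(query)
    return {"rows": len(query)}


QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
cache = QueryCache(fake_fetch)
cache.get(QUERY)
cache.get(QUERY)  # served from the cache
print(len(calls))  # 1
```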

Slide 15

Slide 15 text

Different Classifiers and Different Feature Sets

Feature Set   Linear SVM  KNN     RBF SVM  Dec. Tree  Rand. Forest  AdaBoost  Naïve Bayes
LD            76.64%      60.47%  48.05%   72.66%     53.91%        75.00%    76.41%
LDA           54.53%      52.58%  54.69%   54.45%     48.91%        54.53%    52.89%
LD+LDA        76.72%      60.23%  48.05%   72.66%     52.34%        75.00%    76.41%
TEXT          85.00%      50.00%  47.27%   67.27%     52.81%        78.91%    68.44%
LD+LDA+TEXT   87.81%      52.81%  47.27%   72.03%     52.58%        82.50%    77.19%

+ = 90% test set
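The comparison in the results table can be checked mechanically: for each feature set, pick the classifier with the highest score, then the best combination overall. The numbers are transcribed from the table; the names `RESULTS` and `best_per_feature_set` are hypothetical, and the code is only an illustration of reading the table.

```python
CLASSIFIERS = ["Linear SVM", "KNN", "RBF SVM", "Dec. Tree",
               "Rand. Forest", "AdaBoost", "Naïve Bayes"]

# Scores (%) per feature set, in the column order of CLASSIFIERS.
RESULTS = {
    "LD":          [76.64, 60.47, 48.05, 72.66, 53.91, 75.00, 76.41],
    "LDA":         [54.53, 52.58, 54.69, 54.45, 48.91, 54.53, 52.89],
    "LD+LDA":      [76.72, 60.23, 48.05, 72.66, 52.34, 75.00, 76.41],
    "TEXT":        [85.00, 50.00, 47.27, 67.27, 52.81, 78.91, 68.44],
    "LD+LDA+TEXT": [87.81, 52.81, 47.27, 72.03, 52.58, 82.50, 77.19],
}


def best_per_feature_set(results):
    """Map each feature set to the (classifier, score) pair with the top score."""
    return {fs: max(zip(CLASSIFIERS, scores), key=lambda cs: cs[1])
            for fs, scores in results.items()}


best = best_per_feature_set(RESULTS)
overall = max(best.items(), key=lambda item: item[1][1])
print(overall)  # ('LD+LDA+TEXT', ('Linear SVM', 87.81))

# Points gained with Linear SVM when TEXT is added to LD+LDA.
gain = round(RESULTS["LD+LDA+TEXT"][0] - RESULTS["LD+LDA"][0], 2)
print(gain)  # 11.09
```

Read this way, the table shows Linear SVM as the strongest classifier for every feature set except LDA alone, and the text-review features contributing the largest single improvement.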

Slide 16

Slide 16 text

Thanks! Any questions?
Find our solution on GitHub: https://github.com/semihyumusak/KNOW2016