A Hybrid Method for Rating Prediction Using Linked Data Features and Text Reviews

Presentation of our entry to the Linked Data Mining Challenge 2016, organized by the Know@LOD workshop at ESWC 2016.

Emir Muñoz

May 30, 2016

Transcript

  1. A Hybrid Method for Rating Prediction Using Linked Data Features and Text Reviews
    Semih, Y., Emir M., Pasquale, M., Erdogan, D., Halife, K.
    Linked Data Mining Challenge 2016 - Know@LOD ESWC 2016, Heraklion, Crete, Greece
  2. What makes a good/bad album of music? Can Linked Open Data help with the classification of music albums as “good” or “bad”?
  3. 1. Hypotheses: What is our intuition?

  4. Bands vs. Singers: Bands are more successful than solo artists
    rdf:type of dbo:artist
  5. Music genres: Some genres are more popular than others
    dbo:genre
    http://hpo.org/two-things-you-need-to-know-about-genre-hopping/
  6. Language: Albums in English are more likely to be popular
    dbo:language
  7. Runtime: Longer albums tend to be more popular
    dbo:runtime

  8. Reviews: Words used for good albums differ from the ones used for bad albums
    http://www.youtube.com
  9. Award winners: Albums of award-winning artists are likely to be more successful
    Number of awards of dbo:artist
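The Linked Data hypotheses on slides 4 to 7 each name a DBpedia property, so they can be read as a single lookup per album. The sketch below builds such a query as a string. It is an illustration only, not the query used in the challenge entry: the function name `build_album_query`, the use of `OPTIONAL` blocks, and the example album URI are assumptions. The award count from slide 9 is left out because the slide does not name the property it is derived from.

```python
# Hypothetical helper: assembles a SPARQL query for the DBpedia properties
# named on the hypothesis slides. The query shape is an assumption.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)


def build_album_query(album_uri):
    """Return a SPARQL SELECT query string for one album URI."""
    album = "<" + album_uri + ">"
    lines = [
        "SELECT ?artistType ?genre ?language ?runtime WHERE {",
        # Slide 4: rdf:type of the album's dbo:artist (band vs. singer).
        "  OPTIONAL { " + album + " dbo:artist ?artist . ?artist rdf:type ?artistType . }",
        # Slide 5: music genre.
        "  OPTIONAL { " + album + " dbo:genre ?genre . }",
        # Slide 6: language.
        "  OPTIONAL { " + album + " dbo:language ?language . }",
        # Slide 7: runtime.
        "  OPTIONAL { " + album + " dbo:runtime ?runtime . }",
        "}",
    ]
    return PREFIXES + "\n".join(lines)


if __name__ == "__main__":
    # Placeholder URI for illustration; not taken from the challenge data.
    print(build_album_query("http://dbpedia.org/resource/Example_Album"))
```

Each property sits in its own `OPTIONAL` block so that an album with a missing value still returns a row. The resulting string could be sent to any SPARQL endpoint, for example a local cache of DBpedia.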
  10. 2. Datasets and Method

  11. Datasets
    ◎ Training dataset: 1,280 album URIs
    ◎ Test dataset: 320 album URIs
    ◎ DBpedia
    ◎ Metacritic.com
  12. Method

  13. 3. Experiments and Results

  14. Experimental Setup
    ◎ Python 3.5.1
    ◎ Beautiful Soup Library 4.4.0 (Web Scraping)
    ◎ scikit-learn Library 0.17 (Data Mining)
    ◎ Jena Fuseki (LD Caching)
  15. Different Classifiers and Different Feature Sets

    Feature Set | Linear SVM | KNN    | RBF SVM | Dec. Tree | Rand. Forest | AdaBoost | Naïve Bayes
    LD          | 76.64%     | 60.47% | 48.05%  | 72.66%    | 53.91%       | 75.00%   | 76.41%
    LDA         | 54.53%     | 52.58% | 54.69%  | 54.45%    | 48.91%       | 54.53%   | 52.89%
    LD+LDA      | 76.72%     | 60.23% | 48.05%  | 72.66%    | 52.34%       | 75.00%   | 76.41%
    TEXT        | 85.00%     | 50.00% | 47.27%  | 67.27%    | 52.81%       | 78.91%   | 68.44%
    LD+LDA+TEXT | 87.81%     | 52.81% | 47.27%  | 72.03%    | 52.58%       | 82.50%   | 77.19%

    + = 90% test set
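To make the comparison on slide 15 easier to read, the short sketch below copies the reported percentages into a dictionary and picks the best classifier for each feature set. The numbers come from the slide; the names `SCORES`, `best_classifier` and `gain` are hypothetical, and the code treats the values as comparable scores without assuming how they were measured.

```python
# Scores copied from slide 15, one list per feature set,
# in the same classifier order as the slide's column headers.
CLASSIFIERS = ["Linear SVM", "KNN", "RBF SVM", "Dec. Tree",
               "Rand. Forest", "AdaBoost", "Naïve Bayes"]

SCORES = {
    "LD":          [76.64, 60.47, 48.05, 72.66, 53.91, 75.00, 76.41],
    "LDA":         [54.53, 52.58, 54.69, 54.45, 48.91, 54.53, 52.89],
    "LD+LDA":      [76.72, 60.23, 48.05, 72.66, 52.34, 75.00, 76.41],
    "TEXT":        [85.00, 50.00, 47.27, 67.27, 52.81, 78.91, 68.44],
    "LD+LDA+TEXT": [87.81, 52.81, 47.27, 72.03, 52.58, 82.50, 77.19],
}


def best_classifier(feature_set):
    """Return (classifier name, score) with the highest score in a row."""
    scores = SCORES[feature_set]
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSIFIERS[best], scores[best]


def gain(classifier, base, extended):
    """Percentage-point difference between two feature sets for one classifier."""
    col = CLASSIFIERS.index(classifier)
    return round(SCORES[extended][col] - SCORES[base][col], 2)


if __name__ == "__main__":
    for feature_set in SCORES:
        name, score = best_classifier(feature_set)
        print("%-12s best: %s (%.2f%%)" % (feature_set, name, score))
    print("Linear SVM, TEXT -> LD+LDA+TEXT: %+.2f points"
          % gain("Linear SVM", "TEXT", "LD+LDA+TEXT"))
```

Read this way, Linear SVM has the highest score for every feature set except LDA alone, and adding the Linked Data and LDA features to the text features raises its score from 85.00% to 87.81%.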
  16. Thanks! Any questions? Find our solution on GitHub: https://github.com/semihyumusak/KNOW2016