A Hybrid Method for Rating Prediction Using Linked Data Features and Text Reviews

The presentation for our entry to the Linked Data Mining Challenge 2016, organized by the Know@LOD workshop at ESWC 2016.

Emir Muñoz

May 30, 2016

Transcript

  1. A Hybrid Method for Rating
    Prediction Using Linked Data
    Features and Text Reviews
    Semih Y., Emir M., Pasquale M., Erdogan D., Halife K.
    Linked Data Mining Challenge 2016 - Know@LOD
    ESWC 2016, Heraklion, Crete, Greece

  2. What makes a good/bad album of
    music?
    Can Linked Open Data help with the
    classification of music albums as
    “good” or “bad”?

  3. 1.
    Hypotheses
    What is our intuition?

  4. Bands vs.
    Singers
    Bands are more
    successful than single
    artists → rdf:type of
    dbo:artist

  5. Music
    genres
    Some genres are more
    popular than others →
    dbo:genre
    http://hpo.org/two-things-you-need-to-know-about-genre-hopping/

  6. Language
    Albums in English are
    more likely to be
    popular →
    dbo:language

  7. Runtime
    Longer albums tend to
    be more popular →
    dbo:runtime

  8. Reviews
    Words used for good
    albums differ from the
    ones used for bad
    albums
    http://www.youtube.com
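
The intuition on this slide can be illustrated with a minimal sketch, assuming a simple word-count comparison between the two classes. The review snippets below are invented for illustration and are not taken from Metacritic, and the function names are hypothetical; this is not the feature extraction used in the actual solution.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(good_reviews, bad_reviews, top_n=2):
    """Return the words most over-represented in each class.

    The score of a word is (count in good reviews) - (count in bad reviews).
    Ties are broken alphabetically so the result is deterministic.
    """
    good = Counter(w for review in good_reviews for w in tokenize(review))
    bad = Counter(w for review in bad_reviews for w in tokenize(review))
    vocab = set(good) | set(bad)
    scores = {w: good[w] - bad[w] for w in vocab}
    most_good = sorted(vocab, key=lambda w: (-scores[w], w))[:top_n]
    most_bad = sorted(vocab, key=lambda w: (scores[w], w))[:top_n]
    return most_good, most_bad


# Invented toy reviews (assumption, for illustration only).
GOOD = ["A brilliant and inventive record", "Brilliant songwriting, a stunning album"]
BAD = ["A dull and forgettable record", "Dull production, a tedious album"]

good_words, bad_words = distinctive_words(GOOD, BAD)
print(good_words, bad_words)
```

Words shared equally by both classes (such as "record" or "album") score zero, so only class-specific vocabulary surfaces at either end of the ranking.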

  9. Award
    winners
    Albums of award-winning
    artists are likely to be
    more successful → number of
    awards of dbo:artist
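
The DBpedia properties named in the hypotheses (dbo:artist with its rdf:type, dbo:genre, dbo:language, dbo:runtime) could be fetched per album with a SPARQL query. The following is a sketch under stated assumptions, not the query used in the actual solution: the function name and the example album URI are illustrative, and the award count is left out because the slides do not name the property behind it.

```python
def build_feature_query(album_uri):
    """Build a SPARQL query for the DBpedia properties named in the hypotheses.

    Every pattern is OPTIONAL so an album missing one property still
    returns its other values.
    """
    subject = "<" + album_uri + ">"
    lines = [
        "PREFIX dbo: <http://dbpedia.org/ontology/>",
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "SELECT ?artistType ?genre ?language ?runtime WHERE {",
        "  OPTIONAL { " + subject + " dbo:artist ?artist . ?artist rdf:type ?artistType . }",
        "  OPTIONAL { " + subject + " dbo:genre ?genre . }",
        "  OPTIONAL { " + subject + " dbo:language ?language . }",
        "  OPTIONAL { " + subject + " dbo:runtime ?runtime . }",
        "}",
    ]
    return "\n".join(lines)


# Illustrative album URI (assumption; any album URI from the dataset would do).
query = build_feature_query("http://dbpedia.org/resource/Abbey_Road")
print(query)
```

Because genre and artist type can be multi-valued, the result set may contain several rows per album; these would need to be aggregated before being used as features.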

  10. 2.
    Datasets and Method

  11. Datasets
    ◎ Training dataset: 1,280 album URIs
    ◎ Test dataset: 320 album URIs
    ◎ DBpedia
    ◎ Metacritic.com

  12. Method

  13. 3.
    Experiments and
    Results

  14. Experimental Setup
    ◎ Python 3.5.1
    ◎ Beautiful Soup Library 4.4.0 (Web
    Scraping)
    ◎ scikit-learn Library 0.17 (Data Mining)
    ◎ Jena Fuseki (LD Caching)
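
As a rough illustration of how a locally cached Linked Data store might be queried from Python, the sketch below builds a SPARQL-protocol request using only the standard library. The endpoint URL, including the dataset name "dbpedia", is a hypothetical assumption about a local Fuseki setup and is not taken from the slides; the request is constructed but not sent.

```python
import urllib.parse
import urllib.request

# Assumption: a local Fuseki server hosting a dataset named "dbpedia" (hypothetical).
FUSEKI_ENDPOINT = "http://localhost:3030/dbpedia/sparql"


def make_sparql_request(query, endpoint=FUSEKI_ENDPOINT):
    """Build (but do not send) a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


request = make_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(request.full_url)
# To execute against a running server: urllib.request.urlopen(request).read()
```

Querying a local copy rather than the public DBpedia endpoint avoids repeated remote calls when features are extracted for every album.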

  15. Different Classifiers and Different Feature Sets
    Feature Set  Linear SVM  KNN     RBF SVM  Dec. Tree  Rand. Forest  AdaBoost  Naïve Bayes
    LD           76.64%      60.47%  48.05%   72.66%     53.91%        75.00%    76.41%
    LDA          54.53%      52.58%  54.69%   54.45%     48.91%        54.53%    52.89%
    LD+LDA       76.72%      60.23%  48.05%   72.66%     52.34%        75.00%    76.41%
    TEXT         85.00%      50.00%  47.27%   67.27%     52.81%        78.91%    68.44%
    LD+LDA+TEXT  87.81%      52.81%  47.27%   72.03%     52.58%        82.50%    77.19%
    Result on the test set: 90%
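
To make the comparison easier to read, the following sketch picks the best-scoring classifier for each feature set from the figures on this slide. It assumes the seven values in each row follow the order of the classifier headers as listed (Linear SVM, KNN, RBF SVM, Dec. Tree, Rand. Forest, AdaBoost, Naïve Bayes); the variable and function names are illustrative.

```python
# Column order is assumed to match the classifier headers on the slide.
CLASSIFIERS = ["Linear SVM", "KNN", "RBF SVM", "Dec. Tree",
               "Rand. Forest", "AdaBoost", "Naïve Bayes"]

# Scores in percent, copied from the slide.
RESULTS = {
    "LD":          [76.64, 60.47, 48.05, 72.66, 53.91, 75.00, 76.41],
    "LDA":         [54.53, 52.58, 54.69, 54.45, 48.91, 54.53, 52.89],
    "LD+LDA":      [76.72, 60.23, 48.05, 72.66, 52.34, 75.00, 76.41],
    "TEXT":        [85.00, 50.00, 47.27, 67.27, 52.81, 78.91, 68.44],
    "LD+LDA+TEXT": [87.81, 52.81, 47.27, 72.03, 52.58, 82.50, 77.19],
}


def best_per_feature_set(results):
    """Map each feature set to its (best classifier, best score) pair."""
    best = {}
    for feature_set, scores in results.items():
        top = max(scores)
        best[feature_set] = (CLASSIFIERS[scores.index(top)], top)
    return best


best = best_per_feature_set(RESULTS)
overall = max(best.items(), key=lambda item: item[1][1])
for feature_set, (classifier, score) in best.items():
    print(feature_set, classifier, score)
print("overall:", overall)
```

Under that column assumption, Linear SVM leads on every feature set except LDA alone, and the combined LD+LDA+TEXT features give the highest score in the table.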

  16. Thanks!
    Any questions?
    Find our solution on GitHub
    https://github.com/semihyumusak/KNOW2016
