Should We Care About Content? Recommending by Proxy with Big Metadata

Ben Fields
October 01, 2012

When constructing a music recommender system, which is more important: a musicological understanding of the music catalog, or the number of times two particular songs were played one after the other and were ‘liked’? Going further, if a system knows the latter, does the former even matter? Do machines that predict behavior need to learn to listen, or is observing behavior enough?
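
The behavioural signal in the question above is easy to state in code: count how often one track directly follows another. A minimal sketch, assuming listening sessions are plain lists of track IDs already filtered to ‘liked’ plays; the function names are hypothetical and nothing here is taken from the talk itself.

```python
from collections import Counter, defaultdict


def count_transitions(sessions):
    """Count how often each track is immediately followed by another track.

    Assumes each session is a list of track IDs already filtered to plays
    the listener 'liked' (a hypothetical preprocessing step).
    """
    follows = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            if current != nxt:  # skip a track repeated back-to-back
                follows[current][nxt] += 1
    return follows


def recommend_next(follows, track, k=3):
    """Return up to k tracks that most often followed `track`."""
    return [t for t, _ in follows.get(track, Counter()).most_common(k)]


sessions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["x", "a", "c"],
]
follows = count_transitions(sessions)
print(recommend_next(follows, "a"))
```

No audio or musicological feature appears anywhere: the recommendation for "a" comes entirely from what listeners played next.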

Transcript

  1. Should we care
    about content?
    Recommending by proxy with big metadata
    Ben Fields
    @alsothings

  2. Goal:

  3. how to make a
    recommender

  4. not so much how

  5. Rather: ‘what’

  6. what data should be used
    to make a recommender?

  7. First, Examples

  8. Image

  9. http://www.flickr.com/photos/dexxus/5652914929/

  10. http://www.flickr.com/photos/msojka/5285298402/

  11. http://www.flickr.com/photos/pinkertons/7730547912/

  12. http://www.flickr.com/photos/nicholas_t/626009491/

  13. http://www.flickr.com/photos/dexxus/5652914929/

  14. http://www.flickr.com/photos/dexxus/5652914929/ http://www.flickr.com/photos/msojka/5285298402/
    http://www.flickr.com/photos/pinkertons/7730547912/ http://www.flickr.com/photos/nicholas_t/626009491/

  15. http://www.flickr.com/photos/dexxus/5652914929/ http://www.flickr.com/photos/msojka/5285298402/
    http://www.flickr.com/photos/pinkertons/7730547912/ http://www.flickr.com/photos/nicholas_t/626009491/
    Same Camera Model
    Colour Palette Distance
    Location
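
Slide 15 groups the photos by three kinds of metadata rather than by what the pictures show. One way to fold those signals into a single similarity score is a weighted average. The sketch below assumes palettes have already been extracted as equal-length, aligned lists of RGB colours; the weights, the 100 km location scale, and the camera names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[int, int, int]


@dataclass
class Photo:
    camera_model: str   # e.g. read from EXIF
    palette: List[RGB]  # dominant colours, assumed pre-extracted and aligned
    lat: float
    lon: float


def palette_distance(p: List[RGB], q: List[RGB]) -> float:
    """Mean Euclidean RGB distance between aligned palette entries, scaled to [0, 1]."""
    max_dist = math.sqrt(3 * 255 ** 2)
    dists = [math.dist(a, b) for a, b in zip(p, q)]
    return sum(dists) / len(dists) / max_dist


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def photo_similarity(x: Photo, y: Photo, weights=(1.0, 1.0, 1.0), scale_km: float = 100.0) -> float:
    """Weighted average of three metadata signals, each in [0, 1]."""
    same_camera = 1.0 if x.camera_model == y.camera_model else 0.0
    palette_sim = 1.0 - palette_distance(x.palette, y.palette)
    # Nearby photos score close to 1; the score decays with distance.
    location_sim = math.exp(-haversine_km(x.lat, x.lon, y.lat, y.lon) / scale_km)
    w_cam, w_pal, w_loc = weights
    score = w_cam * same_camera + w_pal * palette_sim + w_loc * location_sim
    return score / (w_cam + w_pal + w_loc)


a = Photo("Camera X", [(200, 30, 30), (20, 20, 20)], 51.50, -0.12)
b = Photo("Camera X", [(190, 40, 35), (25, 25, 25)], 51.51, -0.13)
c = Photo("Camera Y", [(10, 120, 240), (250, 250, 250)], 40.71, -74.01)
print(photo_similarity(a, b), photo_similarity(a, c))
```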

  16. Music

  17. A small playlist

  18. What should follow
    this?
    Meat Loaf -
    I'd Do Anything for Love

  19. maybe?
    Beethoven’s Piano Sonata No. 14
    in C-sharp minor, Op. 27, No. 2

  20. how about?
    Journey - Don’t Stop Believin’

  21. A small playlist
    Meat Loaf -
    I'd Do Anything for Love
    Beethoven
    Journey

  22. Recommending?

  23. prediction of opinions

  24. prediction of opinions
    about things

  25. prediction of Actions
    on things
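
Predicting actions means learning from what people did rather than what they said. A minimal sketch, assuming the raw input is a log of (user, item) play events. The `min_plays` threshold is hypothetical, and the popularity baseline is a standard reference point for implicit-feedback recommenders, not a method from the talk.

```python
from collections import Counter, defaultdict


def actions_by_user(events, min_plays=1):
    """Collapse (user, item) play events into the set of items each user acted on.

    min_plays is a hypothetical threshold: an item counts as an 'action' only
    if the user played it at least that many times.
    """
    counts = Counter(events)
    acted = defaultdict(set)
    for (user, item), n in counts.items():
        if n >= min_plays:
            acted[user].add(item)
    return dict(acted)


def popular_unseen(acted, user, k=2):
    """Baseline: predict the most widely acted-on items this user has not touched."""
    popularity = Counter(item for items in acted.values() for item in items)
    seen = acted.get(user, set())
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:k]


events = [
    ("u1", "a"), ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "c"),
    ("u3", "a"), ("u3", "c"), ("u3", "d"),
]
acted = actions_by_user(events)
print(popular_unseen(acted, "u1"))
```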

  26. Metadata?

  27. Data About Data

  28. Content descriptors
    are metadata

  29. Ratings are metadata

  30. But it isn’t always
    neat and tidy

  31. Human-curated
    content descriptors
    are metadata

  32. In the wild

  40. Contests

  41. Netflix Prize

  42. Netflix Prize
    500K people’s ratings of 18K movies.
    Take these and predict the ratings of movies
    that haven’t been rated.
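
The Netflix Prize task is rating prediction, and the Prize scored entries by root-mean-square error (RMSE) on held-out ratings. A common first model in Netflix Prize work is a baseline predictor, r̂_ui = μ + b_u + b_i: the global mean plus a per-user and a per-item offset. The sketch below is a minimal version, assuming ratings arrive as (user, item, rating) triples; the shrinkage constants are illustrative rather than taken from any team's submission.

```python
import math
from collections import defaultdict


def fit_baseline(ratings, lam_item=25.0, lam_user=10.0):
    """Fit r_hat(u, i) = mu + b_u + b_i from (user, item, rating) triples.

    Item offsets are estimated first, then user offsets on what remains;
    lam_item and lam_user shrink offsets for sparsely rated items/users
    toward zero (illustrative values, not tuned).
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    item_sum, item_n = defaultdict(float), defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - mu
        item_n[i] += 1
    b_i = {i: item_sum[i] / (lam_item + item_n[i]) for i in item_sum}
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_i[i]
        user_n[u] += 1
    b_u = {u: user_sum[u] / (lam_user + user_n[u]) for u in user_sum}
    return mu, b_u, b_i


def predict(model, user, item):
    """Unknown users or items fall back to a zero offset."""
    mu, b_u, b_i = model
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)


def rmse(model, held_out):
    """Root-mean-square error over (user, item, rating) triples."""
    squared = [(predict(model, u, i) - r) ** 2 for u, i, r in held_out]
    return math.sqrt(sum(squared) / len(squared))


train = [("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4), ("u2", "m2", 2)]
model = fit_baseline(train)
print(predict(model, "u1", "m1"), rmse(model, train))
```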

  43. BellKor

  44. gradient boosted
    decision trees
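
In the Netflix Prize, gradient boosted decision trees were reportedly used mainly to blend the outputs of many separate rating predictors into one score. The sketch below shows the core idea on squared loss using depth-one trees (stumps) in pure Python: each round fits a stump to the current residuals and adds a damped copy of it. It assumes each row of `X` holds the outputs of some hypothetical base predictors; a real system would use deeper trees and a library implementation.

```python
def fit_stump(X, residuals):
    """Find the single feature/threshold split that best reduces squared error."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lmean, rmean)
    if best is None:  # no valid split: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    _, f, threshold, lmean, rmean = best
    return (f, threshold, lmean, rmean)


def stump_predict(stump, row):
    f, threshold, lmean, rmean = stump
    return lmean if row[f] <= threshold else rmean


def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on squared loss: each round fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def gbdt_predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


# Hypothetical blend: each row holds the outputs of two base predictors for one rating
X = [[1.2, 1.0], [2.1, 2.4], [2.8, 3.1], [4.1, 3.7], [4.9, 5.2]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
model = fit_gbdt(X, y)
print(gbdt_predict(model, [3.0, 3.0]))
```

The small learning rate trades more rounds for a smoother fit; with squared loss each damped step can only lower the training error.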

  45. Any Value in
    Content?

  46. No.

  47. Million Song Dataset (Kaggle)

  48. Million Song Dataset Challenge (Kaggle)
    The partial listening history of 1M people:
    predict the tracks that are missing.
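
The challenge scored each ranked list of predicted tracks with mean average precision truncated at 500 items. A minimal sketch of that metric, assuming rankings and held-out sets are keyed by user ID, ranked lists contain no duplicates, and each user's average precision is normalised by the smaller of the cutoff and the number of held-out tracks. The helper names are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k=500):
    """AP over the top-k of `ranked`, normalised by min(k, number of relevant tracks)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for position, track in enumerate(ranked[:k], start=1):
        if track in relevant:
            hits += 1
            total += hits / position  # precision at this cut-off
    return total / min(k, len(relevant))


def mean_average_precision(rankings, held_out, k=500):
    """Average AP over every user with held-out tracks; missing rankings score 0."""
    scores = [average_precision_at_k(rankings.get(u, []), rel, k) for u, rel in held_out.items()]
    return sum(scores) / len(scores)


rankings = {"u1": ["a", "x", "b"], "u2": ["c"]}
held_out = {"u1": {"a", "b"}, "u2": {"d"}}
print(mean_average_precision(rankings, held_out))
```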

  49. Collaborative
    filtering
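
Collaborative filtering needs nothing but who listened to what. A minimal item-based sketch, assuming binary listening data (user ID mapped to a set of track IDs) and plain cosine similarity between tracks. Competition entries and production systems typically tune the similarity and weighting, so this shows the principle only.

```python
import math
from collections import defaultdict


def item_cosine(user_items):
    """Cosine similarity between tracks, each treated as a binary vector over users."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    for a in item_users:
        for b in item_users:
            if a == b:
                continue
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                sims[a][b] = overlap / math.sqrt(len(item_users[a]) * len(item_users[b]))
    return sims


def recommend(user_items, sims, user, k=3):
    """Score unseen tracks by summed similarity to the tracks the user already played."""
    seen = user_items.get(user, set())
    scores = defaultdict(float)
    for item in seen:
        for other, s in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"a", "d"},
}
sims = item_cosine(user_items)
print(recommend(user_items, sims, "u1"))
```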

  50. Any Value in
    Content?

  51. No.

  52. so?

  53. moar data >
    deeper analysis

  54. Simple use of metadata from far and wide
    gets you further than deep understanding
    of core data.

  55. But it doesn’t
    help you explain
    things to the
    user.

  56. http://www.netflixprize.com/community/viewtopic.php?id=1537
    http://www.kaggle.com/c/msdchallenge
    http://mir-in-action.blogspot.co.uk/2012/09/collaborative-filtering-still-rules.html
    http://www.dtic.upf.edu/~ocelma/PhD/
    http://www.slideshare.net/plamere/music-recommendation-and-discovery
    Ben Fields
    @alsothings
    Questions?
