Should We Care About Content? Recommending by Proxy with Big Metadata

Ben Fields @alsothings

how to make a recommender

not so much how

Rather: ‘what’

what data should be used to make a recommender?

First, Examples

http://www.flickr.com/photos/dexxus/5652914929/

http://www.flickr.com/photos/msojka/5285298402/

http://www.flickr.com/photos/pinkertons/7730547912/

http://www.flickr.com/photos/nicholas_t/626009491/

Same Camera Model
Colour Palette Distance
Location
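A minimal sketch of how the labels above could become proxy signals for relating two photos, reading them as three separate signals: whether the camera model matches, the distance between colour palettes, and the distance between locations. The `Photo` fields, the use of a single mean colour as the "palette", and the sample values are all assumptions made for illustration; none of them come from the slides.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Photo:
    # Hypothetical metadata record; field names are illustrative only.
    camera_model: str
    mean_rgb: tuple  # average colour as (r, g, b), each 0-255
    lat: float       # latitude in degrees
    lon: float       # longitude in degrees


def same_camera(a, b):
    """True when both photos report the same camera model."""
    return a.camera_model == b.camera_model


def palette_distance(a, b):
    """Euclidean distance between the two mean colours."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a.mean_rgb, b.mean_rgb)))


def location_distance_km(a, b):
    """Great-circle (haversine) distance in km, assuming a 6371 km Earth radius."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


# Two made-up photos, not the Flickr images linked above.
photo_a = Photo("ExampleCam 100", (0, 0, 0), 0.0, 0.0)
photo_b = Photo("ExampleCam 100", (3, 4, 0), 0.0, 1.0)

print(same_camera(photo_a, photo_b))
print(palette_distance(photo_a, photo_b))
print(location_distance_km(photo_a, photo_b))
```

None of these functions looks at the pixels in any depth; each one relates the photos through data about the photos.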

A small playlist

What should follow this?
Meat Loaf - I'd Do Anything for Love

maybe?
Beethoven's Piano Sonata No. 14 in C Sharp Minor, Op. 27, No. 2

how about? Journey - Don’t Stop Believin’

A small playlist
Meat Loaf - I'd Do Anything for Love
Beethoven
Journey

Recommending?

prediction of opinions

prediction of opinions about things

prediction of actions on things

Metadata?

Data About Data

Content descriptors are metadata

Ratings are metadata

But it isn’t always neat and tidy

Human-curated content descriptors are metadata

In the wild

Contests

Netflix Prize

500K people’s ratings of 18K movies
Take these and predict the ratings of movies that haven’t been rated
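To make the task concrete, here is a minimal sketch of a simple baseline rating predictor: the global mean rating plus an offset per movie and an offset per user. It uses only the ratings themselves and nothing about the content of the movies. This is not the method of any contest entry; the function names and the toy ratings are assumptions made for illustration.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Fit global mean, per-item offsets and per-user offsets.

    ratings: list of (user, item, rating) tuples.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Item offset: how far an item's ratings sit from the global mean.
    item_dev = defaultdict(list)
    for _, item, r in ratings:
        item_dev[item].append(r - mu)
    b_i = {item: sum(d) / len(d) for item, d in item_dev.items()}

    # User offset: what remains once the item offset is removed.
    user_dev = defaultdict(list)
    for user, item, r in ratings:
        user_dev[user].append(r - mu - b_i[item])
    b_u = {user: sum(d) / len(d) for user, d in user_dev.items()}

    return mu, b_u, b_i


def predict(model, user, item):
    """Predict a rating; unseen users or items fall back to a zero offset."""
    mu, b_u, b_i = model
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)


# Toy data, invented for this example.
toy_ratings = [
    ("ann", "A", 5), ("ann", "B", 3),
    ("bob", "A", 4), ("bob", "C", 2),
]
model = fit_baseline(toy_ratings)
print(predict(model, "ann", "C"))  # ann has not rated C
```

A baseline like this is a starting point that stronger models refine, and it shows how far ratings alone can go before any content is considered.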

BellKor

gradient boosted decision trees

Any Value in Content?

Million Song Dataset Challenge (Kaggle)

The partial listening history of 1M people
Predict the tracks that are missing

Collaborative filtering
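A minimal sketch of item-based collaborative filtering on listening histories, assuming each history is just a set of track names. Two tracks count as similar when many of the same people listened to both (cosine similarity over listener sets), and a user's candidate tracks are scored by summed similarity to what they have already heard. This is one common variant, not necessarily what any challenge entry used; the histories below are invented.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(histories):
    """Cosine similarity between tracks, based on shared listeners."""
    listeners = defaultdict(set)
    for user, tracks in histories.items():
        for track in tracks:
            listeners[track].add(user)

    sims = defaultdict(dict)
    for a in listeners:
        for b in listeners:
            if a == b:
                continue
            overlap = len(listeners[a] & listeners[b])
            if overlap:
                sims[a][b] = overlap / sqrt(len(listeners[a]) * len(listeners[b]))
    return sims


def recommend(histories, user, k=3):
    """Rank unheard tracks by summed similarity to the user's heard tracks."""
    sims = item_similarities(histories)
    heard = histories[user]
    scores = defaultdict(float)
    for track in heard:
        for other, s in sims[track].items():
            if other not in heard:
                scores[other] += s
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Invented listening histories, reusing artist names from the small playlist.
histories = {
    "u1": {"Meat Loaf", "Journey"},
    "u2": {"Meat Loaf", "Journey"},
    "u3": {"Meat Loaf", "Beethoven"},
    "u4": {"Meat Loaf"},
}
print(recommend(histories, "u4"))
```

No audio is analysed anywhere in this sketch; the only input is who listened to what.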

Any Value in Content?

moar data > deeper Analysis

Simple use of metadata from far and wide gets you further than deep understanding of core data

But it doesn’t help you explain things to the user.

http://www.netflixprize.com/community/viewtopic.php?id=1537
http://www.kaggle.com/c/msdchallenge
http://mir-in-action.blogspot.co.uk/2012/09/collaborative-filtering-still-rules.html
http://www.dtic.upf.edu/~ocelma/PhD/
http://www.slideshare.net/plamere/music-recommendation-and-discovery

Ben Fields @alsothings

Questions?
