on mobile apps
• Users come to share secrets, make confessions, find others to connect to
• No need to create an account
• Engagement through replies, direct messages, “hearts”
• Millions of users & hundreds of millions of whispers
Whisper @ LA DS Meetup, 2015/03/23
repo is difficult:
• Need a lot of images
• Still need a source to populate the repo
• Cannot simply use a search engine
different strategies:
• Fixed list
• Sentiment analysis
• Keyword extraction: cut the text into phrases and score them using tf-idf, POS tags, etc.
• Learn from previous searches
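The keyword-extraction strategy above can be illustrated with a minimal sketch. It scores single terms with a smoothed tf-idf only; the phrase cutting and POS tagging mentioned on the slide are left out. The stopword list, the tiny reference corpus, and the function names are all assumptions made for illustration, not the production setup.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "i", "my", "to", "of", "and", "is", "in", "it", "at"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def document_frequencies(corpus):
    """Count in how many documents each term appears."""
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return df


def top_keywords(text, df, n_docs, k=3):
    """Rank the terms of `text` by smoothed tf-idf against the reference corpus."""
    tf = Counter(tokenize(text))
    scores = {
        term: count * math.log((1 + n_docs) / (1 + df.get(term, 0)))
        for term, count in tf.items()
    }
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy reference corpus (assumed); real whispers would be used instead.
corpus = [
    "I miss my dog",
    "I love my job",
    "my dog loves the beach",
]
df = document_frequencies(corpus)
keywords = top_keywords("my dog at the beach", df, len(corpus), k=2)
```

Here "beach" outranks "dog" because it appears in fewer reference documents, which is the behavior one would want when picking a search term for a background image.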
from Image Repository
[Diagram: Image Repo; offline processing steps: generate dictionary; for each term, query 3rd party if needed; remove low quality images]
how quickly images can be loaded
• Remove images too big or too small (in addition to query parameters)
• Text detection: images with text make poor Whisper backgrounds
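A rough sketch of the two filters above, assuming image metadata is already available. The thresholds, the `ImageMeta` record, and the `has_text` flag (taken to be the output of a separate text-detection step) are hypothetical; the talk does not state the actual limits or detector.

```python
from dataclasses import dataclass

# Hypothetical size limits in pixels; the real thresholds are not given in the talk.
MIN_SIDE_PX = 320
MAX_SIDE_PX = 2048


@dataclass
class ImageMeta:
    url: str
    width: int
    height: int
    has_text: bool  # assumed result of an upstream text-detection step


def is_usable(img):
    """Keep images whose sides are within the limits and that contain no text."""
    sides_ok = all(MIN_SIDE_PX <= side <= MAX_SIDE_PX for side in (img.width, img.height))
    return sides_ok and not img.has_text


candidates = [
    ImageMeta("https://example.com/a.jpg", 640, 640, False),
    ImageMeta("https://example.com/b.jpg", 64, 64, False),      # too small
    ImageMeta("https://example.com/c.jpg", 4000, 3000, False),  # too big
    ImageMeta("https://example.com/d.jpg", 800, 600, True),     # contains text
]
usable = [img.url for img in candidates if is_usable(img)]
```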
exact same content is not efficient. Engagement and interest depend on matching users’ preferences to content, i.e. personalization.
Requirements: fast and able to work with little data
preferences, inferred / implicit information
• Content features
• Model training, testing, feedback delay between rec. and user actions
• …
Business
• Ability to override algorithmic decisions for special cases
• Insights into quality and performance of the algorithms
• Ability to rapidly A/B test new ideas
• …
Platform
• Data stores, unified user and item features
• Throughput of the rec. engine, timeouts
• Code reuse and testing
• …
user based on their activity (created/liked/available user properties)
• Preferred categories
• Preferred languages
• Keywords
• User device
• Whether or not the user is “new”
• …
• …
High Coverage
• Popular in location
• Recently popular
• Popular with new users
• …
Combiner
• Merge results, deciding on the right ordering
• If there are not enough results, use fallback methods to backfill
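One possible shape for the combiner, as a sketch only. The round-robin interleaving, the de-duplication, and the names `combine`, `personalized`, `popular_nearby`, and `recently_popular` are assumptions; the talk does not say how the ordering is actually decided.

```python
def combine(ranked_lists, fallback, n):
    """Interleave ranked candidate lists without duplicates, then backfill up to n items."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in ranked_lists), default=0)
    # Round-robin: take the i-th item of each list in turn (assumed ordering policy).
    for i in range(longest):
        for lst in ranked_lists:
            if i < len(lst) and lst[i] not in seen:
                seen.add(lst[i])
                merged.append(lst[i])
    # Backfill from the fallback source only while results are still missing.
    for item in fallback:
        if len(merged) >= n:
            break
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged[:n]


personalized = ["w1", "w2"]
popular_nearby = ["w2", "w3"]
recently_popular = ["w9", "w1", "w8", "w7"]  # fallback source
feed = combine([personalized, popular_nearby], recently_popular, n=5)
```

In this toy run the two primary sources yield only three distinct whispers, so the last two slots are backfilled from the fallback list.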
giant document]
2. Pre-processing [lowercase, remove stopwords, etc.]
3. Vectorization [bag of words into vectors]
4. Dimensionality reduction [autoencoder maps 5K+ dimensions into ~100]
5. Similarity calculation [top k users via cosine similarity]
6. Recommendation [collect whispers from similar users]
7. Feedback [regenerate model with new activity]
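Steps 3, 5, and 6 of the pipeline above can be sketched in a few lines. This toy version works directly on sparse bag-of-words counts and skips the autoencoder of step 4 entirely; the user documents, the `liked` mapping, and all function names are made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Step 3: turn a user's document into sparse term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(user, docs, k):
    """Step 5: the k users most similar to `user`, ties broken by name."""
    target = bag_of_words(docs[user])
    scored = [
        (cosine(target, bag_of_words(text)), other)
        for other, text in docs.items()
        if other != user
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def recommend(user, docs, liked, k):
    """Step 6: collect whispers liked by similar users that `user` has not liked yet."""
    recs = []
    for other in top_k_similar(user, docs, k):
        for whisper in liked[other]:
            if whisper not in liked[user] and whisper not in recs:
                recs.append(whisper)
    return recs


user_docs = {
    "alice": "dog beach summer",
    "bob": "dog beach walk",
    "carol": "exam stress school",
}
liked = {"alice": ["w1"], "bob": ["w1", "w2"], "carol": ["w3"]}
neighbors = top_k_similar("alice", user_docs, k=1)
recs = recommend("alice", user_docs, liked, k=1)
```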
embedding for users and Whispers.
• Learn a score function f(u, w) that gives the score of a whisper w for a user u. Ex:
  $$f(u, w) = U_u \cdot W_w$$
• Define a rank function that ranks all whispers for all users:
  $$\mathrm{rank}(u, w) = \sum_{k \in \mathcal{W},\, k \neq w} \mathbb{I}\{ f(u, k) > f(u, w) \}$$
  where $\mathcal{W}$ is the set of all whispers.
*Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.
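To make the two definitions concrete, here is a small sketch that evaluates the dot-product score and the exact rank on toy embeddings. The two-dimensional vectors and the dictionary layout are assumptions for illustration; real embeddings would be learned.

```python
def score(U, W, u, w):
    """f(u, w): dot product of the user embedding and the whisper embedding."""
    return sum(a * b for a, b in zip(U[u], W[w]))


def rank(U, W, u, w):
    """Number of other whispers that score strictly higher than w for user u."""
    target = score(U, W, u, w)
    return sum(1 for k in W if k != w and score(U, W, u, k) > target)


# Toy embeddings (assumed values).
U = {"u1": [1.0, 0.0]}
W = {"w1": [0.9, 0.1], "w2": [0.2, 0.8], "w3": [0.5, 0.5]}
ranks = {w: rank(U, W, "u1", w) for w in W}
```

A rank of 0 means no other whisper scores higher for that user. Computing it exactly requires scoring every whisper, which is what makes it costly at scale.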
using the template:
  $$\mathrm{err}(f(x), y) = L(\mathrm{rank}(x, y))$$
where L is a non-decreasing loss function and rank is the actual rank.
• For large datasets like ours, it is computationally expensive to obtain exact ranks of items.
• Idea: online learning to rank, using the Weighted Approximate-Rank Pairwise (WARP) loss
• Then use stochastic gradient descent for optimization
• Extension to the basic model: use like-minded user metrics to make sure similar users have similar embeddings.
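The sampling idea behind WARP can be sketched as follows: instead of computing the exact rank, draw negatives until one violates the margin, estimate the rank from the number of draws, and take one SGD step on the hinge loss weighted by L of that estimate. This is a simplified illustration under assumed choices (harmonic weights for L, a margin of 1, a fixed learning rate, no embedding normalization), not the system described in the talk.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_weight(k):
    """L(k) = 1 + 1/2 + ... + 1/k, one common non-decreasing choice (assumed here)."""
    return sum(1.0 / i for i in range(1, k + 1))


def warp_step(U, W, u, pos, negatives, lr=0.05, margin=1.0, rng=random):
    """One WARP-style update for user u and positive whisper pos.

    Returns the estimated rank used for the update, or 0 if no violator was found.
    """
    pos_score = dot(U[u], W[pos])
    candidates = list(negatives)
    rng.shuffle(candidates)  # sample negatives without replacement
    for trials, neg in enumerate(candidates, start=1):
        if dot(U[u], W[neg]) + margin > pos_score:
            # Finding a violator after few trials suggests a poor (high) rank.
            est_rank = len(candidates) // trials
            step = lr * rank_weight(est_rank)
            u_vec, p_vec, n_vec = U[u][:], W[pos][:], W[neg][:]
            # Gradient step on the hinge loss: margin - f(u, pos) + f(u, neg).
            U[u] = [a + step * (p - n) for a, p, n in zip(u_vec, p_vec, n_vec)]
            W[pos] = [p + step * a for p, a in zip(p_vec, u_vec)]
            W[neg] = [n - step * a for n, a in zip(n_vec, u_vec)]
            return est_rank
    return 0


# Toy embeddings (assumed values); "liked" is the positive whisper.
U = {"u1": [0.1, 0.1]}
W = {"liked": [0.1, 0.0], "other1": [0.0, 0.1], "other2": [0.1, 0.1]}
gap_before = dot(U["u1"], W["liked"]) - dot(U["u1"], W["other1"])
estimated = warp_step(U, W, "u1", "liked", ["other1", "other2"], rng=random.Random(0))
```

Repeating such steps over observed (user, whisper) pairs pushes liked whispers above the sampled negatives without ever scoring the full catalogue for a single update.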
for New Users
[Architecture diagram: Rec Group for Users w/Churn Risk; inputs: User Context, DAO, …, User W.; Tier 1 and Tier 2, each with three Method, Filter, Sort chains; Merger; Group Sort]
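The diagram labels suggest chains of a candidate method, a filter, and a sort, grouped into tiers and merged. The sketch below is one guess at how such a recommendation group could be wired. The rule that Tier 2 is consulted only when Tier 1 does not fill the request, along with every function and field name, is an assumption and is not confirmed by the slides.

```python
def run_chain(method, keep, sort_key, user):
    """One Method, Filter, Sort chain: fetch candidates, filter, then order them."""
    return sorted((c for c in method(user) if keep(user, c)), key=sort_key)


def run_group(tiers, user, n):
    """Merge chain outputs tier by tier, skipping duplicates (assumed merge policy)."""
    merged, seen = [], set()
    for tier in tiers:
        for method, keep, sort_key in tier:
            for item in run_chain(method, keep, sort_key, user):
                if item["id"] not in seen:
                    seen.add(item["id"])
                    merged.append(item)
        if len(merged) >= n:
            break  # assumption: lower tiers are only used when results are missing
    return merged[:n]


# Toy data and hypothetical methods, filters, and sort keys.
WHISPERS = [
    {"id": "w1", "lang": "en", "hearts": 5},
    {"id": "w2", "lang": "es", "hearts": 9},
    {"id": "w3", "lang": "en", "hearts": 7},
    {"id": "w4", "lang": "en", "hearts": 1},
]


def popular(user):
    return [w for w in WHISPERS if w["hearts"] >= 5]


def everything(user):
    return list(WHISPERS)


def same_language(user, item):
    return item["lang"] == user["lang"]


def by_hearts(item):
    return -item["hearts"]


tiers = [
    [(popular, same_language, by_hearts)],     # Tier 1
    [(everything, same_language, by_hearts)],  # Tier 2
]
user = {"id": "u1", "lang": "en"}
feed_ids = [w["id"] for w in run_group(tiers, user, n=3)]
```

Keeping each stage a plain callable makes it easy to swap a method or filter per recommendation group, for example for new users versus users at risk of churning.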
offline jobs, we simplify the online calculation requirements.
• The current system can handle more than 500 queries per second with a response time of less than 1 second per query.