Ulas Bardak, Maarten Bosma, Rohan Monga - Data Science @Whisper - LA Data Science Meetup - March 2015

Data Science LA
March 24, 2015

Transcript

  1. Data Science at Whisper
    Ulas Bardak, Maarten Bosma, Rohan Monga, Mark Hsiao, Nick Stucky-Mack
    Presented at Data Science, LA Meetup. March 23rd, 2015.
  2. A little background on Whisper
    • Anonymous social network focused on mobile apps
    • Users come to share secrets, make confessions, find others to connect to
    • No need to create an account
    • Engagement through replies, direct messages, “hearts”
    • Millions of users & hundreds of millions of whispers
    Whisper @ LA DS Meetup, 2015/03/23
  3. High Level Usage Patterns
    [Diagram. Labels: App Launch, Recommended Whispers, Recommendation Engine, User + Content Models, User Engagement, Whisper Create, Suggest Image. Flows: Creation Flow, Interaction Flow]
  4. Some Problems We Are Tackling
    Content Understanding
    • Spam detection
    • Language detection
    • Content quality prediction
    • Content classification
    • Image suggestion
    User Understanding
    • Spammer detection
    • Personalization
    • Similar user detection
    • Churn prediction
    Overall
    • A/B testing
    • Reporting
  5. Language Detection

  6. Content Quality Prediction

  7. Image Suggestion From Text
    Maarten Bosma
  8. Whisper Creation Flow
    [Flowchart] User creates text content → system suggests one image → OK? If yes, the whisper is created. If no, more suggestions are shown → OK? If yes, the whisper is created. If no, the user searches or uploads.
  9. When Whisper First Started…
    • No image suggestions
    • Users had to type in a search phrase after they created whispers.
  10. Image Suggest Goals
    • 5 second create
    • Support “mood” set in the whisper
    • High quality images
    • High variation in suggested images
  11. Where do we get the images?
    • Building an image repo is difficult:
      • Need a lot of images
      • Still need a source to populate the repo
    • Cannot simply use a search engine
  13. Pipeline
    [Flowchart] Start creating whisper → generate search terms → terms cached? If yes, read images from the cache. If no, query the external (3rd party) source.
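
The cache check in this pipeline can be sketched as below. This is a minimal illustration, not Whisper's code: `images_for_terms`, `fake_search`, and the example URLs are hypothetical, and storing external results back into the cache is an assumption the slide does not state.

```python
from typing import Callable, Dict, List


def images_for_terms(
    terms: List[str],
    cache: Dict[str, List[str]],
    query_external: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Return image URLs per search term, querying externally only on a cache miss."""
    results = {}
    for term in terms:
        if term not in cache:
            # "Terms cached?" -> No: ask the external source.
            # Assumption: the answer is kept so the next lookup is a hit.
            cache[term] = query_external(term)
        # "Terms cached?" -> Yes: read images from the cache.
        results[term] = cache[term]
    return results


# Hypothetical stand-in for the third-party image search.
calls = []


def fake_search(term: str) -> List[str]:
    calls.append(term)
    return [f"https://example.com/{term}/{i}.jpg" for i in range(2)]


cache = {"sunset": ["https://example.com/sunset/cached.jpg"]}
found = images_for_terms(["sunset", "rain"], cache, fake_search)
```

Only the uncached term reaches `fake_search`, which is the point of the cache branch in the flowchart.
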
  14. How do we get the search terms?
    We use four different strategies:
    • Fixed list
    • Sentiment analysis
    • Keyword extraction: cut the text into phrases and score them using tf-idf, POS tags, etc.
    • Learn from previous searches
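
A rough sketch of the keyword extraction strategy, assuming phrases are cut at stopwords and scored by the average tf-idf of their words. The stopword list, the background corpus, and all function names are made up for illustration, and the POS-tag signal mentioned on the slide is left out.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the real one).
STOPWORDS = {"i", "my", "the", "a", "an", "and", "to", "is", "of", "in", "so"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def candidate_phrases(text):
    """Cut the text into phrases, using stopwords as the cut points."""
    phrases, current = [], []
    for tok in tokenize(text):
        if tok in STOPWORDS:
            if current:
                phrases.append(tuple(current))
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(tuple(current))
    return phrases


def make_idf(corpus):
    """Smoothed inverse document frequency learned from a background corpus."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return lambda word: math.log((1 + n) / (1 + df[word])) + 1.0


def rank_phrases(text, idf):
    """Score each phrase by the mean tf-idf of its words, best first."""
    tf = Counter(tokenize(text))
    scored = {}
    for phrase in candidate_phrases(text):
        scored[" ".join(phrase)] = sum(tf[w] * idf(w) for w in phrase) / len(phrase)
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


background = ["i love my dog", "i miss my dog so much", "my job is the worst", "i love the rain"]
ranked = rank_phrases("I miss the ocean and my dog", make_idf(background))
best_term = ranked[0][0]  # the rarest word in the background corpus wins
```

Words that are rare in the background corpus score highest, so they become the first search term candidates.
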
  15. Learning Using Similar Whispers
    Top-n similar whispers (cosine similarity on tf-idf weighted bag of words)
    • We only use image search terms that worked before
    Good, but not great…
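
The similar-whisper lookup might look roughly like this, assuming a history of (whisper text, search term that worked) pairs. The sample history, the whitespace tokenizer, and the function names are hypothetical; only the cosine similarity over tf-idf weighted bags of words comes from the slide.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) and the idf table for a list of texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vectors = [{w: tf * idf[w] for w, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_terms(new_text, history, n=2):
    """Reuse the search terms of the top-n most similar past whispers."""
    vectors, idf = tfidf_vectors([text for text, _ in history])
    counts = Counter(new_text.lower().split())
    query = {w: tf * idf[w] for w, tf in counts.items() if w in idf}
    sims = [cosine(query, vec) for vec in vectors]
    order = sorted(range(len(history)), key=lambda i: -sims[i])
    return [history[i][1] for i in order[:n] if sims[i] > 0]


# Hypothetical history of whispers and the image search terms that worked for them.
history = [
    ("i miss my dog", "puppy"),
    ("my job is stressful", "office"),
    ("rainy days make me sad", "rain window"),
]
terms = suggest_terms("my dog ran away", history, n=1)
```

A new whisper that shares no words with the history gets no suggestion, which hints at why this approach is "good, but not great".
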
  16. From Similar Whispers to Similar Terms
    • Represent each term as a vector
    • Faster, more scalable
    • Fixed vocabulary
  17. New Pipeline
    [Flowchart] Start creating whisper → generate search terms → read images from the image repository (Image Repo).
    Offline processing: generate dictionary → for each term, query 3rd party if needed → remove low quality images.
  18. Low Quality Image Detection
    • Remove dead images
    • Check how quickly images can be loaded
    • Remove images too big or too small (in addition to query parameters)
    • Text detection
      • Images with text make poor Whisper backgrounds
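
The checks above could be combined into a single filter along these lines. The `ImageInfo` record and every threshold are invented for the example, and `has_text` stands in for the output of a separate text detector that is not shown.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    url: str
    reachable: bool   # False for dead links
    load_ms: float    # measured time to load the image
    width: int
    height: int
    has_text: bool    # result of a separate text detector (not shown here)


def is_usable(img, max_load_ms=800.0, min_side=320, max_side=4096):
    """Apply the checks in order; all thresholds are illustrative guesses."""
    if not img.reachable:
        return False                      # dead image
    if img.load_ms > max_load_ms:
        return False                      # loads too slowly
    if min(img.width, img.height) < min_side:
        return False                      # too small
    if max(img.width, img.height) > max_side:
        return False                      # too big
    if img.has_text:
        return False                      # text makes a poor background
    return True


candidates = [
    ImageInfo("https://example.com/a.jpg", True, 120.0, 1024, 768, False),
    ImageInfo("https://example.com/b.jpg", False, 0.0, 1024, 768, False),
    ImageInfo("https://example.com/c.jpg", True, 2500.0, 1024, 768, False),
    ImageInfo("https://example.com/d.jpg", True, 90.0, 64, 64, False),
    ImageInfo("https://example.com/e.jpg", True, 90.0, 800, 600, True),
]
kept = [img.url for img in candidates if is_usable(img)]
```
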
  19. Text Detection in Images
    • Developed in-house
    • Stroke based feature detection
  20. Future Work
    • Learn which domains are likely to contain good images
    • Combine different image sources
    • Better image quality computations
    • Other search term prediction strategies
    • i18n
  21. Personalization
    Rohan Monga

  22. Why do we need recommendations?
    Problem: Showing every user the exact same content is not efficient. Engagement and interest depend on matching users’ preferences to content, i.e. personalization.
    Requirements: Fast and able to work with little data
  23. Recommendation Engine - Concerns
    Algorithmic
    • User’s past actions, explicit preferences, inferred / implicit information
    • Content features
    • Model training, testing, feedback delay between rec. and user actions
    • …
    Business
    • Ability to override algorithmic decisions for special cases
    • Insights into quality, performance of the algorithms
    • Ability to rapidly A/B test new ideas
    • …
    Platform
    • Data stores, unified user and item features
    • Throughput of the rec. engine, timeouts
    • Code reuse and testing
    • …
  24. Well that complicates things…
    … let’s see if we can build something not overly complicated.
  25. Recommendation Engine
    Start out by building a profile for each user based on their activity (created/liked/available user properties)
    • Preferred categories
    • Preferred languages
    • Keywords
    • User device
    • Whether or not the user is “new”
    • …
  26. Recommendation Engine
    High Personalization
    • Like-minded users
    • Collaborative filtering
    • …
    High Coverage
    • Popular in location
    • Recently popular
    • Popular with new users
    • …
    Combiner
    • Merge results, deciding on the right ordering
    • If not enough results, use fallback methods to backfill.
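
One possible shape for the combiner is sketched below. The slide does not say how "the right ordering" is decided, so the round-robin interleaving here is purely an assumption, as are the function name and the whisper ids.

```python
from itertools import zip_longest


def combine(method_results, fallback, k):
    """Interleave ranked lists from several methods, drop duplicates, then backfill."""
    merged, seen = [], set()
    # Round-robin: take the 1st item of every method, then the 2nd, and so on.
    for position in zip_longest(*method_results):
        for item in position:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    # Not enough results: backfill from a high-coverage fallback method.
    for item in fallback:
        if len(merged) >= k:
            break
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged[:k]


like_minded = ["w1", "w2"]
collaborative = ["w2", "w3"]
recently_popular = ["w3", "w4", "w5"]   # used only as the fallback here
feed = combine([like_minded, collaborative], recently_popular, k=4)
```

The personalized methods supply three distinct whispers, so one slot is filled from the popular list.
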
  27. Like-minded User Calculations
    1. Agglomeration [Convert the user into a giant document]
    2. Pre-processing [Lowercase, remove stopwords, etc.]
    3. Vectorization [Bag of words into vectors]
    4. Dimensionality reduction [Autoencoder maps 5K+ into ~100]
    5. Similarity calculation [Top k users via cosine similarity]
    6. Recommendation [Collect whispers from similar users]
    7. Feedback [Regenerate model with new activity]
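
A toy walk through steps 1, 2, 3, 5 and 6 is sketched below. The autoencoder of step 4 is deliberately skipped, so similarity is computed on raw word counts, and the user names, whispers, and stopword list are all made up.

```python
import math
from collections import Counter

STOPWORDS = {"i", "my", "is", "the", "a", "and"}   # illustrative only


def user_document(whispers):
    """Steps 1-2: join a user's whispers into one document, lowercase, drop stopwords."""
    words = " ".join(whispers).lower().split()
    return [w for w in words if w not in STOPWORDS]


def cosine(a, b):
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_users(target, vectors, k):
    """Step 5: the k most similar other users, ignoring users with zero similarity."""
    scored = [(cosine(vectors[target], v), u) for u, v in vectors.items() if u != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [u for s, u in scored[:k] if s > 0]


def recommend(target, activity, k=1):
    # Step 3: bag-of-words vectors. Step 4 (dimensionality reduction) is omitted here.
    vectors = {u: Counter(user_document(ws)) for u, ws in activity.items()}
    seen = set(activity[target])
    recs = []
    # Step 6: collect whispers from similar users that the target has not seen.
    for user in top_k_users(target, vectors, k):
        for whisper in activity[user]:
            if whisper not in seen and whisper not in recs:
                recs.append(whisper)
    return recs


activity = {
    "ann": ["I love my dog", "my dog is sick"],
    "bob": ["walking the dog today", "dog park fun"],
    "cat": ["exam stress again", "I hate exams"],
}
recs = recommend("ann", activity, k=1)
```

Step 7 would amount to rebuilding `vectors` as new activity arrives.
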
  28. Collaborative Filtering
    • We want to learn a low dimensional embedding for users and Whispers.
    • Learn a score function f(u,w) that gives scores of whispers given a user. Ex:
      $f(u, w) = U_u \cdot W_w$
    • Define a rank function that ranks all whispers for all users:
      $\mathrm{rank}(u, w) = \sum_{k \in W,\, k \neq w} I\{ f(u, k) > f(u, w) \}$
    *Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.
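
The score and rank functions translate directly into code. In this sketch the embeddings are hand-picked two-dimensional toy values rather than learned ones, and the ids are hypothetical.

```python
def score(U, W, u, w):
    """f(u, w): dot product of the user embedding and the whisper embedding."""
    return sum(a * b for a, b in zip(U[u], W[w]))


def rank(U, W, u, w):
    """Count the other whispers that score strictly higher than w for user u."""
    target = score(U, W, u, w)
    return sum(1 for k in W if k != w and score(U, W, u, k) > target)


# Toy embeddings (made up); in practice these would be learned.
U = {"u1": [1.0, 0.0]}
W = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.5, 0.5]}

ranks = {w: rank(U, W, "u1", w) for w in W}
```

A rank of 0 means no other whisper outscores it for that user. Computing this exactly needs a pass over every whisper for each (user, whisper) pair.
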
  29. Collaborative Filtering
    • We can then define an error function using the template:
      $\mathrm{err}(f(x), y) = L(\mathrm{rank}(x, y))$
      where L is a non-decreasing loss function and rank is the actual rank.
    • For large datasets like ours, it is computationally expensive to obtain exact ranks of items.
    • Idea: Online learning to rank - utilize Weighted Approximate Rank Pairwise Loss
    • Then use stochastic gradient descent for optimization
    • Extension to basic model: Use like-minded user metrics to make sure similar users have similar embeddings.
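
A simplified sketch of one Weighted Approximate Rank Pairwise (WARP) update with stochastic gradient descent, loosely following the cited Weston et al. paper. Several choices are assumptions made for brevity: L(k) is the harmonic sum 1 + 1/2 + ... + 1/k, negatives are sampled without replacement, the margin is 1, and the like-minded-user extension is not included. All ids and starting values are made up.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_weight(k):
    """L(k) = 1 + 1/2 + ... + 1/k, a non-decreasing function of the rank."""
    return sum(1.0 / j for j in range(1, k + 1))


def warp_step(U, W, user, pos, negatives, rng, lr=0.05, margin=1.0):
    """One WARP update for a (user, positive whisper) pair.

    Returns the estimated rank, or 0 if no margin-violating negative was found.
    """
    pos_score = dot(U[user], W[pos])
    candidates = list(negatives)
    rng.shuffle(candidates)  # simplification: sample negatives without replacement
    for trials, neg in enumerate(candidates, start=1):
        if dot(U[user], W[neg]) + margin > pos_score:
            # Few trials needed to find a violator implies a high (bad) rank.
            est_rank = len(candidates) // trials
            step = lr * rank_weight(est_rank)
            u_old = list(U[user])
            # Gradient step on the hinge loss: margin - f(u, pos) + f(u, neg).
            U[user] = [u + step * (p - n) for u, p, n in zip(u_old, W[pos], W[neg])]
            W[pos] = [p + step * u for p, u in zip(W[pos], u_old)]
            W[neg] = [n - step * u for n, u in zip(W[neg], u_old)]
            return est_rank
    return 0


rng = random.Random(0)
U = {"u": [0.1, -0.1]}
W = {"liked": [0.1, 0.1], "other1": [-0.1, 0.1], "other2": [0.1, -0.1]}

first_rank = warp_step(U, W, "u", "liked", ["other1", "other2"], rng)
for _ in range(200):
    warp_step(U, W, "u", "liked", ["other1", "other2"], rng)
```

The exact rank is never computed. The number of samples needed to find a violating negative serves as a cheap estimate, which is what makes the loss usable online.
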
  30. Current Architecture
    [Diagram. Labels: Rec Group for New Users, Rec Group for Users w/Churn Risk, User Context, DAO, …, User W.; Tier 1 and Tier 2, each with three Method → Filter → Sort chains; Merger; Group Sort]
  31. Performance
    • By offloading most of the difficult calculations to offline jobs, we simplify the online calculation requirements.
    • The current system can handle more than 500 queries per second with a response time of less than 1 second per query.
  32. Future directions
    • Use more implicit signals on top of explicit user actions
    • Personalize the way methods are used for each user by employing multi-armed bandits (MABs)
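
As an illustration of the bandit idea, an epsilon-greedy bandit could choose which recommendation method to use for a user. The slide names no specific bandit algorithm, so epsilon-greedy, the arm names, and the 0/1 engagement reward are all assumptions.

```python
import random
from collections import defaultdict


class EpsilonGreedy:
    """Minimal epsilon-greedy bandit; one instance could be kept per user."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = defaultdict(int)
        self.total_reward = defaultdict(float)

    def mean(self, arm):
        pulls = self.pulls[arm]
        return self.total_reward[arm] / pulls if pulls else 0.0

    def choose(self):
        untried = [a for a in self.arms if self.pulls[a] == 0]
        if untried:
            return untried[0]                      # try every method once first
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)      # explore
        return max(self.arms, key=self.mean)       # exploit the best method so far

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.total_reward[arm] += reward


# Hypothetical arms: one per recommendation method.
bandit = EpsilonGreedy(
    ["like_minded", "collaborative_filtering", "recently_popular"], epsilon=0.0
)
first = bandit.choose()
bandit.update("like_minded", 0.0)              # user ignored the recommendation
bandit.update("collaborative_filtering", 1.0)  # user engaged
bandit.update("recently_popular", 0.0)
choice = bandit.choose()
```

With `epsilon=0.0` the bandit is purely greedy, which keeps the example deterministic. A small positive epsilon would keep some exploration going.
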
  34. Our Technology Stack for DS
  35. Thank You for Listening! Questions?
    • For more info: http://www.whisper.sh - We are hiring ;-)
    • Contact me at ulas@whisper.sh
    • Try out the app for yourself!