Ulas Bardak, Maarten Bosma, Rohan Monga - Data Science @Whisper - LA Data Science Meetup - March 2015

Data Science LA
March 24, 2015

Transcript

  1. Data Science at Whisper
    ULAS BARDAK, MAARTEN BOSMA, ROHAN MONGA,
    MARK HSIAO, NICK STUCKY-MACK
    Presented at Data Science, LA Meetup. March 23rd, 2015.

  2. A little background on Whisper
    u  Anonymous Social Network
    focused on mobile apps
    u  Users come to share secrets,
    make confessions, find others to
    connect to
    u  No need to create an account
    u  Engagement through replies,
    direct messages, “hearts”
    u  Millions of users & hundreds of
    millions of whispers
    Whisper @ LA DS Meetup, 2015/03/23

  3. High Level Usage Patterns
    [Diagram. Boxes: App Launch, Recommended Whispers, Recommendation Engine, User + Content Models, User Engagement, Whisper Create, Suggest Image. Flows: Creation Flow, Interaction Flow.]

  4. Some Problems We Are Tackling
    Content Understanding
    •  Spam detection
    •  Language detection
    •  Content quality prediction
    •  Content classification
    •  Image Suggestion
    User Understanding
    •  Spammer detection
    •  Personalization
    •  Similar user detection
    •  Churn prediction
    Overall
    •  A/B testing
    •  Reporting

  5. Language Detection

  6. Content Quality Prediction

  7. Image Suggestion From Text
    Maarten Bosma

  8. Whisper Creation Flow
    1.  User creates text content
    2.  System suggests one image. OK? Yes: Whisper is created
    3.  No: more suggestions shown. OK? Yes: Whisper is created
    4.  No: user searches or uploads

  9. When Whisper First Started…
    u  No image suggestions
    u  Users had to type in a search
    phrase after they created
    whispers.

  10. Image Suggest Goals
    u  5 second create
    u  Support “mood” set in the whisper
    u  High quality images
    u  High variation in suggested images

  11. Where do we get the images?
    u  Building an image repo is
    difficult:
    u  Need a lot of images
    u  Still need a source to populate
    the repo
    u  Cannot simply use a search
    engine

  12. Where do we get the images?
    u  Building an image repo is
    difficult:
    u  Need a lot of images
    u  Still need a source to populate
    the repo
    u  Cannot simply use a search
    engine

  13. Pipeline
    Start Creating Whisper → Generate Search Terms → Terms cached?
    •  Yes: Read images from cache (Cache)
    •  No: Query ext. source (3rd Party)
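The cache check in this pipeline can be sketched as a simple read-through lookup. This is a minimal illustration only: the in-memory dict used as the cache, the function name `images_for_terms`, and the caller-supplied `query_external` stand-in for the third-party source are all assumptions, not details from the talk.

```python
from typing import Callable, Dict, List


def images_for_terms(
    terms: List[str],
    cache: Dict[str, List[str]],
    query_external: Callable[[str], List[str]],
) -> List[str]:
    """Return image URLs for the terms, querying the external source only on a cache miss."""
    images: List[str] = []
    for term in terms:
        if term not in cache:
            # Cache miss: ask the (hypothetical) third-party source and store the result.
            cache[term] = query_external(term)
        # Cache hit, or freshly filled entry: read the images from the cache.
        images.extend(cache[term])
    return images


# Usage with a fake external source that records every term it is asked for.
calls: List[str] = []


def fake_source(term: str) -> List[str]:
    calls.append(term)
    return [f"https://example.invalid/{term}/1.jpg"]


cache: Dict[str, List[str]] = {}
first = images_for_terms(["rain", "lonely"], cache, fake_source)
second = images_for_terms(["rain"], cache, fake_source)
```

The second call is served entirely from the cache, so the fake source is queried once per distinct term.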

  14. How do we get the search terms?
    We use four different strategies:
    •  Fixed list
    •  Sentiment analysis
    •  Keyword extraction: cut into phrases, score them using tf-idf, POS tags, etc.
    •  Learn from previous searches

  15. Learning Using Similar Whispers
    Top-n similar whispers (cosine similarity on tf-idf weighted bag of words)
    •  We only use image search terms that worked before
    Good, but not great…
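The similar-whisper lookup can be sketched with sparse tf-idf vectors and cosine similarity. This is an illustrative sketch under several assumptions: whitespace tokenization, idf fitted on the past whispers plus the new one, and a hypothetical list of (whisper text, search term that worked) pairs as the history.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_terms(new_text: str, past: List[Tuple[str, str]], n: int = 2) -> List[str]:
    """Return the search terms of the n past whispers most similar to new_text."""
    # Simplification: the new whisper is included when fitting idf.
    vectors = tfidf_vectors([text for text, _ in past] + [new_text])
    query, candidates = vectors[-1], vectors[:-1]
    ranked = sorted(range(len(past)), key=lambda i: -cosine(query, candidates[i]))
    return [past[i][1] for i in ranked[:n]]


past_whispers = [
    ("i miss my dog so much", "sad dog"),
    ("best day at the beach", "beach sunset"),
    ("my dog is my best friend", "dog friend"),
]
terms = suggest_terms("i really miss my dog", past_whispers, n=2)
```

Comparing the new whisper against every past whisper is linear in the history size, which is one reason a term-level representation scales better.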

  16. From Similar Whispers to Similar Terms
    u  Represent each term as a vector
    u  Faster, more scalable
    u  Fixed vocabulary

  17. New Pipeline
    Online: Start Creating Whisper → Generate Search Terms → Read images from Image Repository
    Offline Processing: Generate Dictionary → For each term, query 3rd party if needed → Remove low quality images → Image Repo

  18. Low Quality Image Detection
    u  Remove dead images
    u  Check how quickly images can be loaded
    u  Remove images too big or too small (in addition to
    query parameters)
    u  Text detection
    u  Images with text make poor Whisper backgrounds

  19. Text Detection in Images
    u  Developed in-house
    u  Stroke based feature detection

  20. Future Work
    u  Learn which domains are likely to contain good images
    u  Combine different image sources
    u  Better image quality computations
    u  Other search term prediction strategies
    u  i18n

  21. Personalization
    Rohan Monga

  22. Why do we need recommendations?
    Problem:
    Showing every user the exact same content is not
    efficient. Engagement and interest depend on matching
    users’ preferences to content, i.e. personalization.
    Requirements:
    Fast and able to work with little data

  23. Recommendation Engine - Concerns
    Algorithmic
    •  User’s past actions, explicit preferences, inferred / implicit information
    •  Content features
    •  Model training, testing, feedback delay between rec. and user actions.
    •  …
    Business
    •  Ability to override algorithmic decisions for special cases
    •  Insights into quality, performance of the algorithms
    •  Ability to rapidly AB test new ideas
    •  …
    Platform
    •  Data Stores, unified user and item features
    •  Throughput of the rec. engine, timeouts
    •  Code reuse and testing
    •  …

  24. Well that complicates things…
    … let’s see if we can build something not overly complicated.

  25. Recommendation Engine
    Start out by building a profile for each user based on their
    activity (created/liked/available user properties)
    u  Preferred categories
    u  Preferred languages
    u  Keywords
    u  User device
    u  Whether or not the user is “new”
    u  …

  26. Recommendation Engine
    High Personalization
    •  Like-minded users
    •  Collaborative Filtering
    •  …
    High Coverage
    •  Popular in location
    •  Recently popular
    •  Popular with new users
    •  …
    Combiner
    •  Merge results, deciding on the right ordering
    •  If not enough results, use fallback methods to backfill.
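A combiner of this kind can be sketched as a deduplicating merge over ranked lists. The ordering policy here (personalized results first, then high-coverage results, then fallback) is an assumption made for illustration; the talk does not say how the ordering is decided, and the function name and whisper ids are hypothetical.

```python
from typing import Iterable, List, Sequence


def combine(
    personalized: Sequence[str],
    high_coverage: Sequence[str],
    fallback: Iterable[str],
    limit: int,
) -> List[str]:
    """Merge ranked whisper ids without duplicates; fallback results backfill any shortfall."""
    merged: List[str] = []
    seen = set()
    for source in (personalized, high_coverage, fallback):
        for whisper_id in source:
            if len(merged) == limit:
                return merged
            if whisper_id not in seen:
                seen.add(whisper_id)
                merged.append(whisper_id)
    return merged


feed = combine(
    personalized=["w1", "w2"],
    high_coverage=["w2", "w3"],
    fallback=["w9", "w8"],
    limit=4,
)
```

The fallback list is only consumed when the first two sources together produce fewer than `limit` distinct whispers.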

  27. Like-minded User Calculations
    1.  Agglomeration [Convert the user into a giant document]
    2.  Pre-processing [Lowercase, remove stopwords, etc.]
    3.  Vectorization [Bag of words into vectors]
    4.  Dimensionality reduction [Autoencoder maps 5K+ into ~100]
    5.  Similarity calculation [Top k users via cosine similarity]
    6.  Recommendation [Collect whispers from similar users]
    7.  Feedback [Regenerate model with new activity]
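Steps 1 to 3, 5, and 6 above can be sketched in a few lines. This sketch deliberately skips step 4 (the autoencoder) and compares raw bag-of-words vectors instead; the stopword list, user names, and function names are hypothetical. Step 7 would correspond to rerunning `recommend` on updated activity.

```python
import math
from collections import Counter
from typing import Dict, List

STOPWORDS = {"i", "my", "the", "a", "is", "to", "and"}  # tiny illustrative list


def user_document(whispers: List[str]) -> List[str]:
    # Steps 1-2: merge a user's whispers into one document, lowercase, drop stopwords.
    tokens = " ".join(whispers).lower().split()
    return [t for t in tokens if t not in STOPWORDS]


def vectorize(tokens: List[str]) -> Dict[str, int]:
    # Step 3: bag-of-words counts as a sparse vector.
    return dict(Counter(tokens))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user_id: str, activity: Dict[str, List[str]], k: int = 1) -> List[str]:
    vectors = {u: vectorize(user_document(w)) for u, w in activity.items()}
    # Step 5: top-k most similar other users by cosine similarity.
    others = [u for u in vectors if u != user_id]
    neighbours = sorted(others, key=lambda u: -cosine(vectors[user_id], vectors[u]))[:k]
    # Step 6: collect whispers from similar users that this user did not write.
    own = set(activity[user_id])
    return [w for u in neighbours for w in activity[u] if w not in own]


activity = {
    "alice": ["I love hiking", "mountains are calm"],
    "bob": ["hiking in the mountains"],
    "carol": ["coding all night"],
}
recommended = recommend("alice", activity, k=1)
```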

  28. Collaborative Filtering
    u  We want to learn a low dimensional embedding for users
    and Whispers.
    u  Learn a score function f(u,w) that gives scores of whispers
    given a user. Ex:
    u  Define a rank function that ranks all whispers for all users
    *Jason Weston, Samy Bengio, and Nicolas Usunier. Large scale image
    annotation: learning to rank with joint word-image embeddings.
    Machine learning, 81(1):21–35, 2010.
    Whisper @ LA DS Meetup, 2015/03/23
    f u,w
    ( )=U
    u
    ⋅W
    w
    rank u,w
    ( )= Ι f u,k
    ( )> f u,w
    ( )
    { }
    k∈w,k≠w
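As a small worked example of the score and rank functions on this slide: the score is a dot product of a user embedding and a whisper embedding, and the rank counts how many other whispers score strictly higher. The two-dimensional embeddings and whisper ids below are made up for illustration.

```python
from typing import Dict, List

Embedding = List[float]


def score(user_vec: Embedding, whisper_vec: Embedding) -> float:
    # f(u, w) = U_u . W_w
    return sum(a * b for a, b in zip(user_vec, whisper_vec))


def rank(user_vec: Embedding, whisper_id: str, whispers: Dict[str, Embedding]) -> int:
    # Number of other whispers k with f(u, k) > f(u, w); 0 means w is ranked first.
    target = score(user_vec, whispers[whisper_id])
    return sum(
        1
        for k, vec in whispers.items()
        if k != whisper_id and score(user_vec, vec) > target
    )


user = [1.0, 0.0]
whispers = {
    "a": [0.9, 0.1],
    "b": [0.2, 0.8],
    "c": [0.5, 0.5],
}
ranks = {w: rank(user, w, whispers) for w in whispers}
```

Computing the exact rank of one whisper requires scoring every other whisper, which is the cost that motivates approximate ranking on large catalogs.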

  29. Collaborative Filtering
    u  We can then define an error function using the template:
    where L is a non-decreasing loss function and rank is the actual rank.
    u  For large datasets like ours, it is computationally expensive to obtain
    exact ranks of items.
    u  Idea: Online learning to rank - utilize Weighted Approximate Rank Pairwise Loss
    u  Then use stochastic gradient descent for optimization
    u  Extension to basic model: Use like-minded user metrics to make sure
    similar users have similar embeddings.
    err f x
    ( ), y
    ( )= L rank x, y
    ( )
    ( )
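A single stochastic update with the Weighted Approximate Rank Pairwise (WARP) loss can be sketched as follows. This follows the general recipe from the Weston et al. line of work as I understand it: sample negatives until one violates the margin, estimate the rank from the number of samples needed, weight the update by L(rank), and take a gradient step on the hinge loss. The learning rate, margin, the harmonic choice of L, and all names are assumptions; this is not the presenters' implementation, and it omits the like-minded-user extension.

```python
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank_weight(rank: int) -> float:
    # L(rank) = sum_{j=1..rank} 1/j: non-decreasing in the rank.
    return sum(1.0 / j for j in range(1, rank + 1))


def warp_step(
    user: Vector,
    items: List[Vector],
    positive: int,
    rng: random.Random,
    lr: float = 0.05,
    margin: float = 1.0,
) -> bool:
    """One WARP update for a (user, positive item) pair; returns True if an update was made."""
    negatives = [i for i in range(len(items)) if i != positive]
    pos_score = dot(user, items[positive])
    for trials in range(1, len(negatives) + 1):
        neg = rng.choice(negatives)
        if dot(user, items[neg]) + margin > pos_score:
            # Approximate the rank from how many samples it took to find a violator.
            weight = rank_weight(len(negatives) // trials)
            old_user = list(user)
            for d in range(len(user)):
                # Gradient step on the hinge loss margin - f(u, pos) + f(u, neg).
                user[d] += lr * weight * (items[positive][d] - items[neg][d])
                items[positive][d] += lr * weight * old_user[d]
                items[neg][d] -= lr * weight * old_user[d]
            return True
    return False


user = [1.0, 0.0]
items = [[0.0, 1.0], [1.0, 0.0]]  # item 0 is the positive, item 1 the negative
gap_before = dot(user, items[0]) - dot(user, items[1])
updated = warp_step(user, items, positive=0, rng=random.Random(0))
gap_after = dot(user, items[0]) - dot(user, items[1])
```

After the step, the positive item's score has moved closer to (and eventually above) the sampled negative's score, without ever computing an exact rank.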

  30. Current Architecture
    [Architecture diagram. Labels: User Context, DAO, User, W., Rec Group for New Users, Rec Group for Users w/Churn Risk. Each group has Tier 1 and Tier 2, each with three Method → Filter → Sort chains, followed by a Merger and a Group Sort.]

  31. Performance
    u  By offloading most of the difficult calculations to offline
    jobs, we simplify the online calculation requirements.
    u  The current system can handle more than 500 queries
    per second with a response time of less than 1 second
    per query.

  32. Future directions
    u  Use more implicit signals on top of explicit user actions
    u  Personalize the way methods are used for each user by
    employing MABs
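One simple multi-armed bandit policy that could choose among recommendation methods is epsilon-greedy, sketched below. The talk only names MABs as future work, so the choice of epsilon-greedy, the arm names, and the reward definition (for example 1.0 when a recommended whisper gets a heart) are all assumptions.

```python
import random
from typing import List


class EpsilonGreedy:
    def __init__(self, arms: List[str], epsilon: float, rng: random.Random) -> None:
        self.epsilon = epsilon
        self.rng = rng
        self.pulls = {arm: 0 for arm in arms}
        self.mean_reward = {arm: 0.0 for arm in arms}

    def choose(self) -> str:
        # Explore a random arm with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.mean_reward, key=self.mean_reward.get)

    def update(self, arm: str, reward: float) -> None:
        # Incremental running mean of the rewards observed for this arm.
        self.pulls[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.pulls[arm]


# epsilon=0.0 means pure exploitation, which keeps this example reproducible.
bandit = EpsilonGreedy(
    ["like_minded", "collaborative_filtering", "recently_popular"],
    epsilon=0.0,
    rng=random.Random(0),
)
first_choice = bandit.choose()
bandit.update("like_minded", 0.0)
bandit.update("collaborative_filtering", 1.0)
later_choice = bandit.choose()
```

Keeping one bandit per user, or per user segment, would be one way to personalize which method is used; a nonzero epsilon keeps some traffic on the other methods so their estimates stay current.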

  33. Whisper @ LA DS Meetup, 2015/03/23

  34. Our Technology Stack for DS

  35. Thank You for Listening!
    Questions?
    u  For more info:
    u  http://www.whisper.sh - We are hiring ;-)
    u  Contact me at [email protected]
    u  Try out the app for yourself!