A social music service, powered by ElasticSearch

ElasticSearch is one of the main technologies behind serendip.me, the social music service. It enables the application's main features: generating the music feed and providing user recommendations.
You can find some more details about the technology behind serendip in this blog post: http://rore.im/posts/building-serendip/

Rotem Hermon

July 28, 2014

Transcript

  1. Stack
     • Scala (and Java)
     • akka
     • Play (web app and API)
     • MongoDB
     • ElasticSearch
  2. The Pump (akka actors)
     Twitter API • Facebook API • URL Expander • Music Service Filters • Importer • Meta Data Enrichers • ElasticSearch
  3. Post
     An item containing a music link.
     {
       "postid" : "0972cd80-01bb-11e4-b21c-123136519c3",
       "network" : "serendip",
       "postDate" : "2014-07-01T01:00:04.000Z",
       "txt" : "#airing \"Door Gunner\"Performed by Herb Hutchinson Written by Jeffrey Deitelbaum #rockradio ROCK INSTRUMENTAL http://srndp.me/ahMFkTQ",
       "lang" : "en",
       "uid" : ["tw_...", "fb_...", "sd_..."],
       "service" : "serendip",
       "clip" : ["yt_at1kaxrmOR8"],
       ...
     }
  4. Post
     • ~25M Posts/month
     • Data continuously increasing: Using monthly indexes!
     • Searches are always within a time frame: Search only on the needed indexes! (e.g. posts-514, posts-614, posts-714)
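The index selection described on this slide can be sketched as a small helper. This is only an illustration: the naming scheme (month number without zero padding followed by a two-digit year) is inferred from the example names posts-514, posts-614, posts-714, and the function name `monthly_indexes` is hypothetical.

```python
from datetime import date
from typing import List


def monthly_indexes(start: date, end: date, prefix: str = "posts") -> List[str]:
    """Return one index name per calendar month touched by [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # Assumed naming: month number (no zero padding) + two-digit year.
        names.append(f"{prefix}-{month}{year % 100:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


print(monthly_indexes(date(2014, 5, 15), date(2014, 7, 28)))
# → ['posts-514', 'posts-614', 'posts-714']
```

The resulting names can be joined with commas and used as the index part of a search request, so that only the months inside the time frame are searched.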
  5. User
     A social network user (Facebook/Twitter/Serendip)
     {
       "network" : "serendip",
       "id" : "4dd0e2775c6b09a536aee1ab",
       "name" : "Rotem Hermon",
       "dsc" : "Non-social media amateur",
       "country" : "Israel",
       "city" : "Tel Aviv Yaffo",
       "connectedAccounts" : ["tw_...", "fb_..."],
       "lastUpdate" : "2014-06-30T09:00:00.000Z",
       "postCount" : 710,
       "rockOnCount" : 249,
       "reairCount" : 93
     }
  6. The Feed
     • Requirements:
       ◦ Combine music from several sources (friends, preferred artists, recommendations)
       ◦ Reactive to user input and actions
     This means generating the feed in real-time.
  7. The Feed Algorithm
     • A collection of “strategies” (e.g. “friends”, “preferred artists”, “suggested users”)
       ◦ A strategy considers most recent user actions
     • Strategies are dynamically combined in every feed fetch
       ◦ This translates to searches on posts in Elasticsearch
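One way to picture "strategies combined into a search" is a set of functions that each contribute one clause to a single bool query. This is a minimal sketch, not the actual serendip implementation: the strategy functions, the context keys (`friend_ids`, `preferred_artists`) and the `artist` field are hypothetical, while `uid` and `postDate` come from the Post slide.

```python
from typing import Callable, Dict, List

Clause = Dict[str, object]
Strategy = Callable[[Dict[str, list]], Clause]


def friends_strategy(ctx: Dict[str, list]) -> Clause:
    # Posts shared by accounts the user follows ("uid" is a field of the post).
    return {"terms": {"uid": ctx["friend_ids"]}}


def preferred_artists_strategy(ctx: Dict[str, list]) -> Clause:
    # "artist" is a hypothetical name for clip metadata indexed with the post.
    return {"terms": {"artist": ctx["preferred_artists"]}}


def build_feed_query(ctx: Dict[str, list], strategies: List[Strategy], size: int = 20) -> dict:
    """Combine the clauses of all active strategies into one search body."""
    clauses = [strategy(ctx) for strategy in strategies]
    return {
        "size": size,
        # A post qualifies if at least one strategy matches it.
        "query": {"bool": {"should": clauses, "minimum_should_match": 1}},
        "sort": [{"postDate": {"order": "desc"}}],
    }


ctx = {"friend_ids": ["tw_1", "fb_2"], "preferred_artists": ["Herb Hutchinson"]}
body = build_feed_query(ctx, [friends_strategy, preferred_artists_strategy])
```

Because the context is rebuilt on every fetch, a recent user action (for example a newly preferred artist) changes the query the next time the feed is requested.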
  8. The Feed Algorithm
     • A post is indexed with needed data from other objects:
       ◦ User details (e.g. location)
       ◦ Clip metadata (artist, genre, description, language)
     • So all required data for a strategy search is contained in the posts index
     • Cons: space (data is duplicated), integrity (data may not be recent)
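The denormalization described here could look roughly like the following. It is a sketch under assumptions: the function name and the nested keys `user` and `clipMeta` are hypothetical, and only the copied attributes (location, artist, genre, description, language) are taken from the slide.

```python
def denormalize_post(post: dict, user: dict, clip: dict) -> dict:
    """Return a copy of the post enriched with user and clip data for indexing."""
    doc = dict(post)  # shallow copy, the original post is left untouched
    # User details needed by strategy searches (e.g. location).
    doc["user"] = {"country": user.get("country"), "city": user.get("city")}
    # Clip metadata needed by strategy searches.
    doc["clipMeta"] = {
        key: clip.get(key) for key in ("artist", "genre", "description", "language")
    }
    return doc


post = {"postid": "p1", "uid": ["tw_1"], "clip": ["yt_at1kaxrmOR8"]}
user = {"country": "Israel", "city": "Tel Aviv Yaffo", "name": "Rotem Hermon"}
clip = {"artist": "Herb Hutchinson", "genre": "rock"}
doc = denormalize_post(post, user, clip)
```

The copied values are a snapshot taken at indexing time, which is where the integrity cost comes from: if the user later changes location, already indexed posts keep the old value until they are reindexed.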
  9. The Feed Algorithm
     Same for creating “stations” by artists or genre: All required data is indexed under the post.
  10. Recommendations
     • “Music Soulmates” - find users with matching musical taste
     • Common solutions - using machine learning, hadoop, M/R jobs
     • We’re a small startup. We already have enough systems on our plate
     • Can we do it with the existing system?
  11. Recommendations
     • Data preparation:
       ◦ When importing posts, constantly calculate top shared artists for users
       ◦ Top artists are found using faceted search on posts shared by the user
       ◦ Mark “spammers” (e.g. a lot of shares of only a single artist)
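The data preparation step can be sketched as a terms facet request plus a simple check on the counts it returns. Treat this as an illustration only: the `artist` field, the facet name `top_artists`, and the spammer thresholds are assumptions, not values from the talk. The request body uses the facets API that Elasticsearch offered at the time; newer versions replace facets with aggregations.

```python
from typing import List, Tuple


def top_artists_request(user_id: str, size: int = 10) -> dict:
    """Search body asking for the artists a user shared most often."""
    return {
        "size": 0,  # only the facet counts are needed, not the posts
        "query": {"term": {"uid": user_id}},
        # "artist" is a hypothetical field holding the clip's artist.
        "facets": {"top_artists": {"terms": {"field": "artist", "size": size}}},
    }


def looks_like_spammer(
    artist_counts: List[Tuple[str, int]],
    min_shares: int = 50,         # assumed threshold
    max_top_share: float = 0.9,   # assumed threshold
) -> bool:
    """Flag users whose shares are dominated by a single artist."""
    total = sum(count for _, count in artist_counts)
    if total < min_shares:
        return False  # too little activity to judge
    top = max(count for _, count in artist_counts)
    return top / total >= max_top_share


print(looks_like_spammer([("A", 95), ("B", 5)]))              # → True
print(looks_like_spammer([("A", 30), ("B", 30), ("C", 40)]))  # → False
```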
  12. Recommendations
     • Finding “Music Soulmates”:
       ◦ Search for users with matching “top artists”
       ◦ Use scoring to surface users with most matches
       ◦ Use boosting to tweak the results (e.g. prefer users from the same country, active users, recent activity)
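A soulmate search along these lines might be expressed as a bool query on the users index. This is a hedged sketch: the `topArtists` and `spammer` fields are hypothetical names for the prepared data, the boost value is arbitrary, and the fields are assumed to be indexed without analysis so that term queries match whole values. `id` and `country` come from the User slide.

```python
from typing import List


def soulmates_query(user_id: str, top_artists: List[str], country: str, size: int = 10) -> dict:
    """Search body for users who share top artists with the given user."""
    # One clause per artist: every additional match contributes to the score.
    artist_clauses = [{"term": {"topArtists": artist}} for artist in top_artists]
    return {
        "size": size,
        "query": {
            "bool": {
                # At least one shared artist is required.
                "must": [{"bool": {"should": artist_clauses, "minimum_should_match": 1}}],
                # Optional clause: same country only raises the score.
                "should": [{"term": {"country": {"value": country, "boost": 2.0}}}],
                # Exclude the user themselves and marked spammers.
                "must_not": [{"term": {"id": user_id}}, {"term": {"spammer": True}}],
            }
        },
    }


query = soulmates_query("4dd0e2775c6b09a536aee1ab", ["Artist A", "Artist B"], "Israel")
```

Other preferences mentioned on the slide, such as active users or recent activity, could be added as further optional clauses in the same way.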
  13. Scaling
     • Current setup:
       ◦ 4 X m2.2xlarge
       ◦ Most CPU - pump imports (indexing + facets)
     • Scaling:
       ◦ More nodes, bigger nodes, IOPS optimization
       ◦ Indexing optimizations (use parent-child for frequently updated fields)