Building an activity feed in Ruby, a game of trade-offs

Creating an activity feed is always challenging; creating one that serves requests for 2 million users (and growing) is even more fun.

Working on Eight's activity feed (https://8card.net/), I came to realize that there is no such thing as a "right way" to build a feed, just lots of trade-offs to consider.

So I thought it might be interesting to share a couple of lessons and tricks I have learned along the way.

Carlos Donderis

September 05, 2018

Transcript

  1. Building an activity feed in Ruby, a game of trade-offs

    Agenda
    1. About me
    2. About Eight
    3. What is an activity feed?
    4. Why should I build an activity feed?
    5. How do I build an activity feed?
    6. Open Q&A
  2. About me

    Carlos Donderis, Engineer at Sansan
    Rubyist for about 10 years. Also play with Crystal, Python, JavaScript. Learning about Elixir and Go.
    Hobbies: Photography, Karate, Swordsmanship.
  3. About Eight

    Eight is a service for organizing and digitizing your contacts’ business cards.
    How does it work?
    ❏ Take a picture of your business card.
    ❏ Eight extracts the data from your business card and transforms it into your profile within Eight’s network.
    ❏ Scan or take pictures of your contacts’ business cards.
    ❏ Your contacts’ information will be added to your network.
    ❏ If any of your contacts is already an Eight user, the activity related to that contact will be displayed in your activity feed.
  4. Eight is free

    We do have a premium version for:
    ❏ Individuals
    ❏ Companies
    https://8card.net
  5. Sansan is hiring! (It’s OK if you don’t speak Japanese)

    Ruby, React, Python, C#, Kotlin, Swift
    https://hrmos.co/pages/sansan/jobs/1000354
  6. What is an activity feed?

    “An activity stream is a list of recent activities performed by an individual, typically on a single website.”
    Wikipedia: https://en.wikipedia.org/wiki/Activity_stream
  7. Activity feeds can take several shapes, but there are some common elements:

    ❏ Actor: “Matt” in “Matt liked your post”
    ❏ Action: “liked” in “Matt liked your post”
    ❏ Target: “your post” in “Matt liked your post”
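
The actor / action / target triple maps naturally onto a small value object. The following is a minimal sketch, not Eight's data model: the `Activity` struct and its field names are assumptions made for illustration.

```ruby
# Hypothetical value object: the field names are illustrative only.
Activity = Struct.new(:actor, :action, :target, keyword_init: true) do
  # Renders the activity as a feed sentence.
  def to_s
    "#{actor} #{action} #{target}"
  end
end

activity = Activity.new(actor: "Matt", action: "liked", target: "your post")
puts activity.to_s # prints "Matt liked your post"
```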
  8. Actions can be:

    ❏ Generated by users organically
      ❏ Tweets, pictures, check-ins, posts …
    ❏ Generated in batches
      ❏ News
    ❏ A mixture of both
      ❏ Recommendations of user posts
      ❏ Targeted news
  9. Targets can be:

    ❏ Posts
    ❏ Comments
    ❏ Pictures
    In general, anything your service allows users to interact with. Feature-rich activity feeds tend to have more targets.
  10. Why should I build an activity feed?

    “If you deliver the right information to your users wrapped in the right shape, chances are that they will come back for more.”
    ❏ Increase user engagement
    ❏ Increase user retention
    ❏ Increase DAU / MAU
    ❏ Increase ROI (Monetization)
  11. Why did we build an activity feed on Eight?

    ❏ We wanted to deliver relevant information about each user’s connections.
    ❏ We believe that this way Eight provides extra value.
    ❏ We wanted to enable a channel of communication between users.
    ❏ We also wanted to enable an additional source of monetization for Eight.
  12. How do I build an activity feed?

    Step 1: Identify the kind of feed you want to build
  13. Step 1: Identify the kind of feed you want to build

    Kinds of feeds:
    ❏ Rich features: Facebook
    ❏ Reduced features: Twitter
    ❏ Minimal features: GitHub
  14. Step 1: Identify the kind of feed you want to build

    Why is this important? The kind of feed you are building will impact:
    ❏ The user experience
    ❏ The kind of traffic you will have
    ❏ The underlying architecture
  15. Step 1: Identify the kind of feed you want to build

    Why is this important? Because the kind of feed you are building will impact the user experience:
    ➔ Do you really want to build another Facebook?
      ◆ Are your users going to use it?
    ➔ Mentions, hashtags and check-ins should be treated as first-class citizens.
      ◆ These features are complex and have lots of implications.
      ◆ Planning and design should be done ASAP.
    ➔ To innovate or not?
      ◆ Designing a totally new user experience through feeds might be challenging.
    ➔ Sorted by timestamp or ranked?
    ➔ Flat or aggregated?
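
To make the "flat or aggregated" choice concrete: a flat feed shows one entry per activity, while an aggregated feed collapses activities that share an action and a target. The sketch below is an illustration only; the `aggregate` helper and the hash keys are hypothetical.

```ruby
# Hypothetical helper: collapses activities with the same action and target
# into a single feed line, e.g. "Matt and 1 other liked your post".
def aggregate(activities)
  activities.group_by { |a| [a[:action], a[:target]] }.map do |(action, target), group|
    actors = group.map { |a| a[:actor] }.uniq
    others = actors.size - 1
    who = if others.zero?
            actors.first
          else
            "#{actors.first} and #{others} other#{'s' if others > 1}"
          end
    "#{who} #{action} #{target}"
  end
end

flat = [
  { actor: "Matt", action: "liked", target: "your post" },
  { actor: "Ana",  action: "liked", target: "your post" },
  { actor: "Ken",  action: "commented on", target: "your post" }
]
puts aggregate(flat)
```

The flat list has three entries; the aggregated version has two, at the cost of extra grouping work on either the write or the read path.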
  16. Step 1: Identify the kind of feed you want to build

    Why is this important? Because the kind of feed you are building will impact the kind of traffic you will have:
    ➔ How do you expect your users to engage with your feed?
      ◆ Read only: GitHub
      ◆ Heavy on reads: LinkedIn
      ◆ Heavy on writes: Twitter
      ◆ Both: Facebook
    ➔ Does it need to be real time?
      ◆ Real-timish?
    ➔ Who creates the content?
      ◆ Users
      ◆ Batches
      ◆ Both
  17. Step 1: Identify the kind of feed you want to build

    Why is this important? Because the kind of feed you are building will impact the underlying architecture:
    ➔ Real time or not?
      ◆ Language
      ◆ Framework
      ◆ App server
    ➔ Persistence
      ◆ Do you need persistence?
      ◆ Optimized for reads
      ◆ High throughput for writes
      ◆ Both
    ➔ Monolithic vs. microservices
    ➔ How much data do you need to process?
      ◆ Batch-generated content
      ◆ Influencers
      ◆ Inactive users
  18. Step 2: Build it!

    Once all the decisions have been made, you are ready to begin building your feed! But where to start?
  19. Step 2: Build it! Some conclusions:

    • There is not much specific, up-to-date information about how to build an activity feed.
    • There are some good resources though, such as:
      ◦ Yahoo Research paper
        ▪ http://jeffterrace.com/docs/feeding-frenzy-sigmod10-web.pdf
      ◦ Stream
        ▪ https://getstream.io/ (You can actually outsource your feed!)
      ◦ LinkedIn Engineering Blog
        ▪ https://engineering.linkedin.com/blog
  20. Step 2: Build it! But how to build a feed the Right Way™?

    • It turns out that there is no single right way to build a feed.
    • There are many right ways to build a feed.
    • And lots of trade-offs to consider.
  21. Step 2: Build it! Key concept: Fan out

    “In message-oriented middleware solutions, fan-out is a messaging pattern used to model an information exchange that implies the delivery (or spreading) of a message to one or multiple destinations possibly in parallel, and not halting the process that executes the messaging to wait for any response to that message.”
    Wikipedia: https://en.wikipedia.org/wiki/Fan-out_(software)
  22. Step 2: Build it! Fan out on write

    Data is distributed as soon as some content is created.
    Good:
    ❏ Optimizes the read time for followers
    ❏ Good fit for simple feeds
    ❏ Allows data denormalization
    Bad:
    ❏ Heavy on writes (influencer effect)
    ❏ Features of feature-rich feeds can be complex to implement
    ❏ Expensive if you allow data updates
    ❏ Tends to generate waste (dead users)
    ❏ Can get challenging to scale
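
A minimal in-memory sketch of fan out on write follows. It is not Eight's implementation: the `WriteFanOutFeed` class, its method names and the hash-based storage are assumptions standing in for a real datastore and queue.

```ruby
# Hypothetical in-memory model; a real system would write to a datastore.
class WriteFanOutFeed
  def initialize(followers)
    @followers = followers                     # { actor => [follower, ...] }
    @feeds = Hash.new { |h, k| h[k] = [] }     # { user => [post, ...] }
  end

  # Write path: one insert per follower, so the cost grows with the audience.
  def publish(actor, body, at: Time.now)
    post = { actor: actor, body: body, at: at }
    @followers.fetch(actor, []).each { |follower| @feeds[follower].unshift(post) }
    post
  end

  # Read path: the feed is already materialized, newest first
  # (assuming posts are published in chronological order).
  def feed_for(user, limit: 20)
    @feeds.fetch(user, []).first(limit)
  end
end

feed = WriteFanOutFeed.new("matt" => %w[ana ken])
feed.publish("matt", "hello")
feed.publish("matt", "second post")
puts feed.feed_for("ana").map { |post| post[:body] }.inspect
```

Note how `publish` does all the work: an actor with 10,000 followers triggers 10,000 inserts, including inserts for users who never open the app.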
  23. Step 2: Build it! Fan out on read

    Data is generated on demand.
    Good:
    ❏ Easier to implement (sort of)
    ❏ Good fit if you allow data updates
    ❏ Generates less waste
    Bad:
    ❏ Slow. Not a good fit for a real-timish feed
    ❏ Heavy on reads (and writes?)
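
For contrast, here is a minimal in-memory sketch of fan out on read. Again, the `ReadFanOutFeed` class and its storage are hypothetical and only illustrate where the work happens.

```ruby
# Hypothetical in-memory model; a real system would query a datastore.
class ReadFanOutFeed
  def initialize(following)
    @following = following                     # { user => [actor, ...] }
    @posts = Hash.new { |h, k| h[k] = [] }     # { actor => [post, ...] }
  end

  # Write path: a single insert, regardless of audience size.
  def publish(actor, body, at: Time.now)
    post = { actor: actor, body: body, at: at }
    @posts[actor] << post
    post
  end

  # Read path: gather and sort the posts of every followed actor on demand.
  def feed_for(user, limit: 20)
    @following.fetch(user, [])
              .flat_map { |actor| @posts.fetch(actor, []) }
              .sort_by { |post| post[:at] }
              .reverse
              .first(limit)
  end
end

feed = ReadFanOutFeed.new("ana" => %w[matt ken])
feed.publish("matt", "older", at: Time.at(1))
feed.publish("ken", "newer", at: Time.at(2))
puts feed.feed_for("ana").map { |post| post[:body] }.inspect
```

Writes are cheap and updates touch a single copy, but every read pays for the merge and sort, which is why this approach tends to be slower for the reader.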
  24. Step 2: Build it! Mixed fan out

    Mixes the previous two methods. For example, it can:
    ❏ Behave as a write fan out for active users, and perform a read fan out for users who become active again after a long time.
    ❏ Behave as a write fan out up to a certain number of followers.
    ❏ Behave as a write fan out only for top-ranked content, and as a read fan out for low-ranked content.
    Good:
    ❏ Best of both worlds
    Bad:
    ❏ More complex to implement
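
One of the mixed strategies above, switching on follower count, can be sketched as follows. The `MixedFanOutFeed` class, the `push_limit` parameter and its default are assumptions chosen for illustration, not values from the talk.

```ruby
# Hypothetical in-memory model of a follower-count threshold.
class MixedFanOutFeed
  def initialize(followers, push_limit: 1000)
    @followers = followers                       # { actor => [follower, ...] }
    @push_limit = push_limit                     # assumed cut-off, tune per service
    @inboxes = Hash.new { |h, k| h[k] = [] }     # pushed posts, per reader
    @outboxes = Hash.new { |h, k| h[k] = [] }    # pulled posts, per big actor
  end

  def publish(actor, body, at: Time.now)
    post = { actor: actor, body: body, at: at }
    audience = @followers.fetch(actor, [])
    if audience.size <= @push_limit
      audience.each { |follower| @inboxes[follower] << post }  # fan out on write
    else
      @outboxes[actor] << post                                 # defer to read time
    end
    post
  end

  def feed_for(user, limit: 20)
    # Fan out on read, but only for the few actors above the threshold.
    big_actors = @followers.select { |_, fs| fs.size > @push_limit && fs.include?(user) }.keys
    pulled = big_actors.flat_map { |actor| @outboxes.fetch(actor, []) }
    (@inboxes.fetch(user, []) + pulled).sort_by { |post| post[:at] }.reverse.first(limit)
  end
end

feed = MixedFanOutFeed.new({ "star" => %w[ana ken li], "matt" => %w[ana] }, push_limit: 2)
feed.publish("star", "from the influencer", at: Time.at(1))
feed.publish("matt", "from a friend", at: Time.at(2))
puts feed.feed_for("ana").map { |post| post[:body] }.inspect
```

The extra branch in both `publish` and `feed_for` is the "more complex to implement" cost: the two paths must agree on which actors are pushed and which are pulled.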
  25. About Eight’s feed:

    ❏ Eight’s feed is feature-rich.
    ❏ The backend is 90% Ruby.
    ❏ We use Rails for most of our APIs.
    ❏ Most of Eight’s architecture is backed by AWS.
      ❏ DynamoDB
      ❏ Aurora
      ❏ SQS
    ❏ Eight’s feed is timestamp based.
    ❏ Content is generated both by batches and by users.
    ❏ All of a user’s posts are available through pagination.
    ❏ Users are allowed to like, comment on and share posts.
    ❏ Users can restrict the privacy of all or some of their posts.
    ❏ Users can ban, hide or block content from other users.
    ❏ Users can tag companies when sharing links, or mention users.
  26. How does Eight’s feed work? Writes

    ❏ Eight’s feed uses a fan out on write.
    ❏ Eight’s feed is real-timish, but not real time.
    ❏ Data is semi-denormalized.
      ❏ Update: single source of truth
      ❏ Item: relation between post and users + metadata
      ❏ Decoration: stored in RDB
    ❏ For the fan out, we use a custom-made batch service + SQS.
  27. How does Eight’s feed work? Writes: challenges

    ❏ Posts from users with many contacts are expensive.
      ❏ Some of Eight’s users have more than 10,000 connections.
      ❏ Popular companies can also generate content.
    ❏ Tags and mentions can affect the scope of the deliveries.
    ❏ DynamoDB autoscaling is sometimes not fast enough.
      ❏ Need to provision manually for batch-generated content.
    ❏ DynamoDB writes can get expensive due to secondary indexes.
    ❏ Need to enqueue more workers manually when SQS gets clogged.
    ❏ We generate data even for inactive users.
  28. How does Eight’s feed work? Reads

    ❏ Query DynamoDB for the raw feed.
      ❏ We have a DynamoDB wrapper that takes care of:
        ❏ Queries
        ❏ Retries
        ❏ Throttling
    ❏ Decorate it with RDB data.
    ❏ Return the response.
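
The read steps above (query, retry on throttling, decorate, return) can be sketched in plain Ruby. This is not Eight's wrapper and it does not call the AWS SDK: `ThrottledError`, `with_retries`, `read_feed` and the lambda-based store are all hypothetical stand-ins, and exponential backoff is an assumed retry policy.

```ruby
# Hypothetical error raised by the store when read capacity is exhausted.
class ThrottledError < StandardError; end

# Retries the block with exponential backoff when the store is throttled.
def with_retries(max_attempts: 3, base_delay: 0.05, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue ThrottledError
    raise if attempts >= max_attempts
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Fetches the raw feed items, then merges in decoration data keyed by post id.
def read_feed(user_id, feed_store:, decorations:)
  items = with_retries { feed_store.call(user_id) }
  items.map { |item| item.merge(decorations.fetch(item[:post_id], {})) }
end

raw_store = ->(_user_id) { [{ post_id: 1 }, { post_id: 2 }] }
decorations = { 1 => { author_name: "Matt" } }
puts read_feed(42, feed_store: raw_store, decorations: decorations).inspect
```

Once `max_attempts` is exhausted the error propagates to the caller, which mirrors the "no read capacity means no feed" failure mode unless the caller has a fallback.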
  29. How does Eight’s feed work? Reads: challenges

    ❏ We need to query 4 DynamoDB tables per post/request.
    ❏ Running out of read capacity on DynamoDB = no feed.
    ❏ Mix that data with decoration data from the RDB.
    ❏ Filter information based on user and device.
      ❏ Device type, app version …
    ❏ Caching is complex and sometimes useless.
      ❏ No DAX access from the AWS SDK for Ruby.
      ❏ Dalli#get_multi performance does not seem to be very good.
  30. Roadmap for Eight Feed v2

    ❏ Transition from a write fan out to a mixed one.
      ❏ Fan out on write for active users and on read for inactive users
    ❏ Flexible follow/unfollow flow through channels.
      ❏ Make data accessible to anyone who wants to follow an actor
      ❏ Remove unwanted data from your feed with just one click
    ❏ Personalized feed.
      ❏ We want to deliver the most relevant content first
      ❏ Then fall back to timestamp
    ❏ Performance optimization
      ❏ We want to be able to provide responses in the 100 ms range
      ❏ … in Ruby
      ❏ Bring Redis into the stack.
    ❏ Increase scalability
    And more...
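
One possible reading of "most relevant content first, then fall back to timestamp" is a two-key sort. The sketch below assumes a hypothetical `:score` field on each post; how such a score would be computed is outside the scope of the talk.

```ruby
# Hypothetical ordering: highest relevance score first; posts without a score
# count as 0 and are ordered among themselves by recency.
def order_feed(posts)
  posts.sort_by { |post| [-post.fetch(:score, 0), -post[:at].to_f] }
end

posts = [
  { id: 1, score: 0.2, at: Time.at(30) },
  { id: 2, at: Time.at(20) },
  { id: 3, score: 0.9, at: Time.at(10) },
  { id: 4, at: Time.at(40) }
]
puts order_feed(posts).map { |post| post[:id] }.inspect # prints [3, 1, 4, 2]
```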
  31. Lessons learned

    ❏ Start with one data store and change it once it becomes obsolete.
      ❏ Migrations are scary, but there are great tools out there that will help you.
      ❏ AWS Athena, Data Pipeline, Lambda
    ❏ Ruby can be fast if used right.
      ❏ Be careful with memory bloat.
      ❏ ActiveSupport, ActiveRecord...
    ❏ Using many external services will slow down your development environment.
      ❏ Emulate DynamoDB, SQS, Kinesis, Lambda …
    ❏ Learn the basics and then experiment.
      ❏ It’s fine not to get it right on the first try.
      ❏ Better to spend one week prototyping than one month designing
      ❏ … and it is so much fun.
    ❏ Keep yourself updated.
      ❏ AWS, GCP, Azure: there is new technology every week.
      ❏ Serverless Aurora, anyone?