Building an activity feed in Ruby, a game of trade-offs

Creating an activity feed is always challenging; creating one that serves requests for 2 million users (and growing) is even more fun.

While working on Eight's activity feed (https://8card.net/), I came to realize that there is no such thing as a "right way" to build a feed, just lots of trade-offs to consider.

So I thought it might be interesting to share a couple of lessons and tricks I have learned along the way.

Carlos Donderis

September 05, 2018

Transcript

  1. Building an activity feed in Ruby
    a game of trade-offs

  2. Building an activity feed in Ruby, a game of trade-offs
    1. About me
    2. About Eight
    3. What is an activity feed?
    4. Why should I build an activity feed?
    5. How do I build an activity feed?
    6. Open Q&A
    Agenda

  3. About me
    Carlos Donderis
    Engineer at Sansan
    Rubyist for about 10 years.
    Also play with Crystal, Python, JavaScript.
    Learning about Elixir and Go.
    Hobbies: Photography, Karate, Swordsmanship.

  4. Eight is a service for organizing and digitizing your
    contacts’ business cards.
    How does it work?
    ❏ Take a picture of your business card.
    ❏ Eight extracts the data from your business card
    and transforms it into your profile within Eight’s
    network.
    ❏ Scan / take pictures of your contacts’ business cards.
    ❏ Your contacts’ information will be added to your
    network.
    ❏ If any of your contacts is already an Eight user,
    the activity related to that contact will be
    displayed in your activity feed.
    About Eight

  5. Eight is free
    We do have a premium version for:
    ❏ Individuals
    ❏ Companies
    https://8card.net
    About Eight

  6. Sansan is hiring!
    (it’s ok if you don’t speak Japanese)
    Ruby, React, Python, C#, Kotlin, Swift
    https://hrmos.co/pages/sansan/jobs/1000354

  7. “An activity stream is a list of recent activities performed by an
    individual, typically on a single website”
    Wikipedia: https://en.wikipedia.org/wiki/Activity_stream
    What is an activity feed?

  8. Facebook, Twitter, Instagram
    What is an activity feed?

  9. GitHub, Mercari, others...
    What is an activity feed?

  10. Activity feeds can take several shapes, but there are some common
    elements:
    ❏ Actor
    ❏ “Matt” in “Matt liked your post”
    ❏ Action
    ❏ “liked” in “Matt liked your post”
    ❏ Target
    ❏ “your post” in “Matt liked your post”
    What is an activity feed?
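
The three common elements above can be modeled as a small value object. This is a minimal sketch, not something shown in the talk: the `Activity` name and its fields are assumptions chosen for illustration.

```ruby
# Hypothetical value object; the class and field names are illustrative only.
Activity = Struct.new(:actor, :action, :target, keyword_init: true) do
  # Render the activity the way a feed entry might read.
  def to_s
    "#{actor} #{action} #{target}"
  end
end

activity = Activity.new(actor: "Matt", action: "liked", target: "your post")
puts activity.to_s
```

Keeping actor, action and target as separate fields is what later makes filtering, ranking or aggregating activities possible.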

  11. Actors can be:
    ❏ Users
    ❏ AI
    ❏ Service
    What is an activity feed?

  12. What is an activity feed?
    Actions can be:
    ❏ Generated by users organically
    ❏ Tweets, pictures, check-ins, posts …
    ❏ Generated in batches
    ❏ News
    ❏ A mixture of both
    ❏ Recommendations of user posts
    ❏ Targeted news

  13. Targets can be:
    ❏ Posts
    ❏ Comments
    ❏ Pictures
    In general, anything your service will allow users to interact with.
    Feature-rich activity feeds tend to have more targets.
    What is an activity feed?

  14. Why should I build an activity feed?

  15. “If you deliver the right information to your users wrapped in the right
    shape, chances are that they will come back for more.”
    ❏ Increase user engagement
    ❏ Increase user retention
    ❏ Increase DAU / MAU
    ❏ Increase ROI (Monetization)
    Why should I build an activity feed?

  16. Why did we build an activity feed on Eight?
    ❏ We wanted to deliver relevant information related to each user
    connection.
    ❏ We believe this way, Eight provides extra value.
    ❏ We wanted to enable a channel of communication between users.
    ❏ We wanted also to enable an additional source of monetization for
    Eight.
    Why should I build an activity feed?

  17. Step 1: Identify the kind of feed you want to build
    How do I build an activity feed?

  18. Step 1: Identify the Kind of feed you want to build
    Kinds of Feeds
    ❏ Rich features: Facebook
    ❏ Reduced features: Twitter
    ❏ Minimal features: GitHub
    How do I build an activity feed?

  19. How do I build an activity feed?
    Step 1: Identify the Kind of feed you want to build
    Why is this important?
    The kind of feed you are building will impact:
    ❏ The user experience
    ❏ The kind of traffic you will have
    ❏ The underlying architecture

  20. Step 1: Identify the Kind of feed you want to build
    Why is this important?
    Because the kind of feed you are building will impact:
    ❏ The user experience
    ❏ The kind of traffic you will have
    ❏ The underlying architecture
    ➔ Do you really want to build another
    Facebook?
    ◆ Are your users going to use it?
    ➔ Mentions, hashtags and check-ins should be
    treated as first-class citizens.
    ◆ These features are complex, with lots of
    implications.
    ◆ Planning and design should be done
    as early as possible.
    ➔ To innovate or not?
    ◆ Designing a totally new user
    experience through feeds might be
    challenging.
    ➔ Sorted by timestamp or ranked?
    ➔ Flat or aggregated?
    How do I build an activity feed?
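
To make the "flat or aggregated" question concrete, here is a minimal sketch of aggregation, assuming activities are plain hashes with `:actor`, `:action` and `:target` keys. The `aggregate` and `summarize` helpers are hypothetical names, not part of the talk. A flat feed would show one entry per activity; an aggregated feed groups entries that share the same action and target.

```ruby
# Hypothetical helper: turn a list of actor names into a short summary.
def summarize(actors)
  case actors.size
  when 1 then actors.first
  when 2 then "#{actors[0]} and #{actors[1]}"
  else "#{actors[0]} and #{actors.size - 1} others"
  end
end

# Group activities that share the same action and target into one feed line.
def aggregate(activities)
  activities.group_by { |a| [a[:action], a[:target]] }.map do |(action, target), group|
    actors = group.map { |a| a[:actor] }.uniq
    "#{summarize(actors)} #{action} #{target}"
  end
end

flat = [
  { actor: "Matt", action: "liked", target: "your post" },
  { actor: "Ana",  action: "liked", target: "your post" },
  { actor: "Ken",  action: "liked", target: "your post" },
  { actor: "Ana",  action: "commented on", target: "your photo" }
]
puts aggregate(flat)
```

Aggregation shortens the feed, but it also means a feed entry is no longer a single stored record, which affects how the feed is written and paginated.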

  21. ➔ How do you expect your users to engage
    with your feed?
    ◆ Read only: GitHub
    ◆ Heavy on reads: LinkedIn
    ◆ Heavy on writes: Twitter
    ◆ Both: Facebook
    ➔ Does it need to be real time?
    ◆ Real-timish?
    ➔ Who creates the content?
    ◆ Users
    ◆ Batches
    ◆ Both
    Step 1: Identify the Kind of feed you want to build
    Why is this important?
    Because the kind of feed you are building will impact:
    ❏ The user experience
    ❏ The kind of traffic you will have
    ❏ The underlying architecture
    How do I build an activity feed?

  22. ➔ Realtime or not?
    ◆ Language
    ◆ Framework
    ◆ App server
    ➔ Persistence
    ◆ Do you need persistence?
    ◆ Optimized for reads
    ◆ High throughput for writes
    ◆ Both
    ➔ Monolithic vs Microservices
    ➔ How much data do you need to process?
    ◆ Batch generated content
    ◆ Influencers
    ◆ Inactive users
    How do I build an activity feed?
    Step 1: Identify the Kind of feed you want to build
    Why is this important?
    Because the kind of feed you are building will impact:
    ❏ The user experience
    ❏ The kind of traffic you will have
    ❏ The underlying architecture

  23. How do I build an activity feed?
    Step 2: Build it!

  24. Step 2: Build it!
    Once all the decisions have been made, you are ready to begin building your feed!
    But where to start?
    How do I build an activity feed?

  25. How do I build an activity feed?
    Step 2: Build it!

  26. Step 2: Build it!
    Some conclusions:
    ● There is not much specific, up-to-date information about how to build an activity feed
    ● There are some good resources though, such as:
    ○ Yahoo Research Paper
    ■ http://jeffterrace.com/docs/feeding-frenzy-sigmod10-web.pdf
    ○ Stream
    ■ https://getstream.io/ (You can actually outsource your feed!)
    ○ LinkedIn Engineering Blog
    ■ https://engineering.linkedin.com/blog
    How do I build an activity feed?

  27. Step 2: Build it!
    But how to build a feed the Right Way™ ?
    ● Turns out that there is no single right way to build a feed
    ● There are many right ways to build a feed
    ● And lots of trade-offs to consider
    How do I build an activity feed?

  28. Step 2: Build it!
    Key concept: Fan Out
    “In message-oriented middleware solutions, fan-out is a messaging pattern used to model an information exchange that
    implies the delivery (or spreading) of a message to one or multiple destinations possibly in parallel, and not halting the
    process that executes the messaging to wait for any response to that message.”
    Wikipedia: https://en.wikipedia.org/wiki/Fan-out_(software)
    How do I build an activity feed?

  29. Step 2: Build it!
    Fan out on write
    Data is distributed as soon as some content is created.
    Good:
    ❏ Optimizes the read time for followers
    ❏ Good fit for simple feeds
    ❏ Allows data denormalization
    Bad:
    ❏ Heavy on writes. (Influencer effect)
    ❏ Rich feed features can be complex to implement
    ❏ Expensive if you allow data updates.
    ❏ Tends to generate waste (dead users)
    ❏ Can get challenging to scale
    How do I build an activity feed?
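
Fan out on write can be sketched in memory as follows. This is an illustration under stated assumptions, not Eight's implementation: `WriteFanOutFeed` is a hypothetical class, and plain hashes stand in for the follower graph and the per-user feed store.

```ruby
# In-memory sketch of fan out on write (hypothetical class, not a library API).
class WriteFanOutFeed
  def initialize(followers)
    @followers = followers                           # author => [follower, ...]
    @feeds = Hash.new { |feeds, user| feeds[user] = [] } # user => precomputed feed
  end

  # Write path: copy the post into every follower's feed at creation time.
  def publish(author, text, at: Time.now)
    post = { author: author, text: text, at: at }
    @followers.fetch(author, []).each { |follower| @feeds[follower].unshift(post) }
    post
  end

  # Read path: a single lookup, because the work was already done on write.
  def read(user, limit: 20)
    @feeds[user].first(limit)
  end
end

feed = WriteFanOutFeed.new({ "alice" => %w[bob carol] })
feed.publish("alice", "Hello from Eight")
puts feed.read("bob").map { |post| post[:text] }
```

The cost of `publish` grows with the number of followers, which is the "influencer effect" listed above, and every follower gets a copy whether or not they ever open the app.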

  30. Step 2: Build it!
    Fan out on read
    Data is generated on demand.
    Good:
    ❏ Easier to implement (sort of)
    ❏ Good fit if you allow data updates
    ❏ Generates less waste
    Bad:
    ❏ Slow. Not a good fit for a real-timish feed
    ❏ Heavy on reads (and writes?)
    How do I build an activity feed?
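
For contrast, fan out on read might look like the sketch below. `ReadFanOutFeed` is a hypothetical class and the hashes are stand-ins for real storage. Writing is one cheap append, and the feed is assembled only when someone asks for it.

```ruby
# In-memory sketch of fan out on read (hypothetical class, not a library API).
class ReadFanOutFeed
  def initialize(following)
    @following = following                              # reader => [author, ...]
    @posts = Hash.new { |posts, author| posts[author] = [] } # author => own posts
  end

  # Write path: one append, no matter how many followers the author has.
  def publish(author, text, at: Time.now)
    post = { author: author, text: text, at: at }
    @posts[author] << post
    post
  end

  # Read path: gather posts from everyone the reader follows, newest first.
  def read(user, limit: 20)
    @following.fetch(user, [])
              .flat_map { |author| @posts[author] }
              .sort_by { |post| post[:at] }
              .reverse
              .first(limit)
  end
end

feed = ReadFanOutFeed.new({ "bob" => %w[alice carol] })
feed.publish("alice", "first", at: Time.at(100))
feed.publish("carol", "second", at: Time.at(200))
puts feed.read("bob").map { |post| post[:text] }
```

Every call to `read` repeats the gather-and-sort work, which is why this approach tends to be slow for readers who follow many authors.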

  31. Step 2: Build it!
    Mixed fan out
    Mixes the previous two methods.
    ❏ Behaves as a write fan out for active users and performs a read fan out for users who
    become active again after a long time.
    ❏ Behaves as a write fan out up to a certain number of followers.
    ❏ Behaves as a write fan out only for top-ranked content, and as a read fan out for
    low-ranked content.
    Good:
    ❏ Best of both worlds
    Bad:
    ❏ More complex to implement
    How do I build an activity feed?
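
One of the mixed strategies above, write fan out up to a certain number of followers, could be sketched like this. `MixedFanOutFeed` and its `push_limit` parameter are assumptions made for the example; the other two strategies would swap the condition in `publish` for an activity or ranking check.

```ruby
# In-memory sketch of a mixed fan out with a follower threshold (hypothetical).
class MixedFanOutFeed
  def initialize(followers, push_limit:)
    @followers = followers    # author => [follower, ...]
    @push_limit = push_limit  # above this audience size, posts are pulled on read
    @feeds = Hash.new { |feeds, user| feeds[user] = [] }      # pushed posts
    @outbox = Hash.new { |outbox, author| outbox[author] = [] } # pulled posts
  end

  def publish(author, text, at: Time.now)
    post = { author: author, text: text, at: at }
    audience = @followers.fetch(author, [])
    if audience.size <= @push_limit
      audience.each { |follower| @feeds[follower] << post } # fan out on write
    else
      @outbox[author] << post                               # defer to read time
    end
    post
  end

  def read(user, limit: 20)
    pulled = @outbox
             .select { |author, _posts| @followers.fetch(author, []).include?(user) }
             .values
             .flatten(1)
    (@feeds[user] + pulled).sort_by { |post| post[:at] }.reverse.first(limit)
  end
end

feed = MixedFanOutFeed.new({ "celebrity" => %w[a b c], "friend" => %w[a] }, push_limit: 2)
feed.publish("friend", "hi", at: Time.at(1))
feed.publish("celebrity", "big news", at: Time.at(2))
puts feed.read("a").map { |post| post[:text] }
```

The extra branch in both `publish` and `read` is the "more complex to implement" cost: two code paths now have to produce one consistent feed.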

  32. About Eight feed:
    ❏ Eight’s feed is feature rich.
    ❏ Backend is 90% Ruby.
    ❏ We use Rails for most of our APIs.
    ❏ Most of Eight’s architecture is backed by AWS.
    ❏ DynamoDB.
    ❏ Aurora.
    ❏ SQS.
    ❏ Eight’s feed is timestamp based.
    ❏ Content is generated both by batches and by users.
    ❏ All of a user’s posts are available through pagination.
    ❏ Users are allowed to like, comment on and share posts.
    ❏ Users can restrict the privacy of all or some of their posts.
    ❏ Users can ban, hide or block content from other users.
    ❏ Users can tag companies when sharing links or mention users
    How do I build an activity feed?

  33. How does Eight’s feed work?
    Writes
    ❏ Eight’s feed uses fan out on write.
    ❏ Eight’s feed is real-timish, but not real time.
    ❏ Data is semi-denormalized.
    ❏ Update: Single source of truth
    ❏ Item: Relation between post and users + metadata
    ❏ Decoration: stored in RDB
    ❏ For the fan out, we use a custom-made batch service + SQS.
    How do I build an activity feed?
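
The semi-denormalized layout described above might be pictured as follows. This is a guess at the shape of the data, not Eight's schema: `FeedUpdate`, `FeedItem`, their fields and `fan_out_items` are all hypothetical names. The point is that the post body lives in exactly one record, while each recipient gets a lightweight item pointing at it.

```ruby
# Hypothetical records: the update is the single source of truth, and each
# item only relates one recipient to that update, plus some metadata.
FeedUpdate = Struct.new(:id, :author_id, :body, keyword_init: true)
FeedItem   = Struct.new(:user_id, :update_id, :created_at, keyword_init: true)

# Fan out on write: one small item per recipient, never a copy of the body.
def fan_out_items(update, recipient_ids, now: Time.now)
  recipient_ids.map do |user_id|
    FeedItem.new(user_id: user_id, update_id: update.id, created_at: now)
  end
end

update = FeedUpdate.new(id: 1, author_id: 10, body: "Changed jobs!")
items = fan_out_items(update, [20, 21, 22])

# Editing the post touches only the update; the items stay valid.
update.body = "Changed jobs! (edited)"
puts items.size
```

With this split, an edit is one write instead of one write per recipient, at the price of an extra lookup when the feed is read.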

  34. How does Eight’s feed work?
    Writes => Challenges?
    ❏ Posts from users with many contacts are expensive
    ❏ Some of Eight’s users have more than 10,000 connections
    ❏ Popular companies can also generate content
    ❏ Tags and mentions can affect the scope of the deliveries.
    ❏ DynamoDB auto scaling is sometimes not fast enough
    ❏ Need to provision manually for batch-generated content
    ❏ DynamoDB writes can get expensive due to secondary indexes
    ❏ Need to add more workers manually when SQS gets clogged
    ❏ We generate data even for inactive users.
    How do I build an activity feed?
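
One common way to keep a post with 10,000 recipients from becoming a single huge unit of work is to split the recipients into fixed-size jobs before queueing them. The sketch below only builds the job payloads; it does not talk to SQS. The `fan_out_jobs` helper and the batch size of 500 are assumptions, not values from the talk.

```ruby
require "json"

# Hypothetical batch size; a real value would depend on worker throughput.
DEFAULT_BATCH_SIZE = 500

# Split recipients into bounded jobs, each serialized as a queue message body.
def fan_out_jobs(post_id, recipient_ids, batch_size: DEFAULT_BATCH_SIZE)
  recipient_ids.each_slice(batch_size).map do |slice|
    JSON.generate({ post_id: post_id, recipient_ids: slice })
  end
end

jobs = fan_out_jobs(42, (1..10_000).to_a)
puts jobs.size
```

Bounded jobs let more workers share one large delivery, and a failed job only needs its own slice retried rather than the whole fan out.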

  35. How does Eight’s feed work?
    Reads
    ❏ Query DynamoDB for raw feed
    ❏ We have a DynamoDB wrapper that takes care of:
    ❏ Queries
    ❏ Retries
    ❏ Throttling
    ❏ Decorate it with RDB data
    ❏ Return response
    How do I build an activity feed?
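
The read path above (query, retry on throttling, decorate, respond) can be sketched without any AWS dependency. Everything here is illustrative: `ThrottledError`, `with_retries` and `decorate` are hypothetical names, and the block passed to `with_retries` stands in for a real DynamoDB query.

```ruby
# Hypothetical error raised by the storage layer when it is throttled.
class ThrottledError < StandardError; end

# Retry the block with exponential backoff when the store reports throttling.
def with_retries(attempts: 3, base_delay: 0.05)
  tries = 0
  begin
    tries += 1
    yield
  rescue ThrottledError
    raise if tries >= attempts
    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end

# Merge relational "decoration" data into the raw feed items by post id.
def decorate(raw_items, decorations)
  raw_items.map { |item| item.merge(decorations.fetch(item[:post_id], {})) }
end

calls = 0
raw = with_retries(base_delay: 0) do
  calls += 1
  raise ThrottledError if calls < 3 # simulate two throttled attempts
  [{ post_id: 1 }, { post_id: 2 }]
end
puts decorate(raw, { 1 => { author_name: "Matt" } }).inspect
```

Backoff only helps with short bursts; if read capacity is exhausted for longer than the retry budget, the request still fails.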

  36. How does Eight’s feed work?
    Reads => Challenges?
    How do I build an activity feed?

  37. How does Eight’s feed work?
    Reads => Challenges?
    How do I build an activity feed?

  38. How does Eight’s feed work?
    Reads => Challenges?
    ❏ We need to query 4 DynamoDB tables per post/request
    ❏ Running out of read capacity on DynamoDB = no feed
    ❏ Mix that data with decorated information from RDB
    ❏ Filter information based on user and device
    ❏ Device type, app version …
    ❏ Caching is complex and sometimes useless
    ❏ No DAX support in aws-sdk for Ruby
    ❏ Dalli#get_multi performance does not seem to be very good.
    How do I build an activity feed?

  39. How do we solve all these
    challenges?

  40. Eight Feed v2

  41. Eight Feed v2

  42. ❏ Transition from a write fan out to a mixed one.
    ❏ Fan out on write for active users and on read for inactive users
    ❏ Flexible follow/unfollow flow through channels.
    ❏ Make data accessible to anyone who wants to follow an actor
    ❏ Remove unwanted data from your feed with just one click
    ❏ Personalized feed.
    ❏ We want to deliver the most relevant content first
    ❏ Then fall back to timestamp
    ❏ Performance optimization
    ❏ We want to be able to provide responses in the 100 ms range
    ❏ … in Ruby
    ❏ Bring Redis into the stack.
    ❏ Increase scalability
    And more...
    Roadmap for Eight Feed v2
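
"Most relevant content first, then fall back to timestamp" can be read as a two-level sort. The sketch below assumes each post is a hash with an optional `:score` and a `:at` time; how the score is computed is out of scope, and `order_feed` is a hypothetical helper, not the v2 design.

```ruby
# Sort by relevance score (highest first); posts without a score count as 0,
# and ties are broken by recency (newest first).
def order_feed(posts)
  posts.sort_by { |post| [-post.fetch(:score, 0), -post[:at].to_f] }
end

posts = [
  { id: 1, at: Time.at(300) },
  { id: 2, score: 0.9, at: Time.at(100) },
  { id: 3, at: Time.at(400) },
  { id: 4, score: 0.9, at: Time.at(200) }
]
puts order_feed(posts).map { |post| post[:id] }.inspect
```

When no post has a score, the ordering degrades to a plain timestamp feed, so ranking can be introduced gradually.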

  43. Currently under heavy development
    Roadmap for Eight Feed v2

  44. ❏ Start with one data store and change it once it becomes obsolete.
    ❏ Migrations are scary, but there are great tools out there that will help you.
    ❏ AWS Athena, Data pipeline, Lambda
    ❏ Ruby can be fast if used right.
    ❏ Be careful with memory bloat.
    ❏ ActiveSupport, ActiveRecord...
    ❏ Using many external services will slow down your development environment
    ❏ Emulate DynamoDB, SQS, Kinesis, Lambda …
    ❏ Learn the basics and then experiment
    ❏ It’s fine not to get it right on the first try
    ❏ Better to spend one week prototyping than one month designing
    ❏ … and it is so much fun.
    ❏ Keep yourself updated
    ❏ AWS, GCP, Azure: there is new technology every week
    ❏ Serverless Aurora anyone?
    Lessons learned

  45. And probably the most important:
    ENJOY!
    Lessons learned

  46. Thank you

  47. Questions?
