Slide 1

Slide 1 text

Building an activity feed in Ruby, a game of trade-offs

Slide 2

Slide 2 text

Building an activity feed in Ruby, a game of trade-offs. Agenda: 1. About me 2. About Eight 3. What is an activity feed? 4. Why should I build an activity feed? 5. How do I build an activity feed? 6. Open Q&A

Slide 3

Slide 3 text

About me: Carlos Donderis. Engineer at Sansan. Rubyist for about 10 years. Also play with Crystal, Python, JavaScript. Learning about Elixir and Go. Hobbies: Photography, Karate, Swordsmanship.

Slide 4

Slide 4 text

About Eight: Eight is a service for organizing and digitizing your contacts’ business cards. How does it work? ❏ Take a picture of your business card. ❏ Eight extracts the data from your business card and turns it into your profile within Eight’s network. ❏ Scan or take pictures of your contacts’ business cards. ❏ Your contacts’ information will be added to your network. ❏ If any of your contacts is already an Eight user, the activity related to that contact will be displayed in your activity feed.

Slide 5

Slide 5 text

About Eight: Eight is free. We do have a premium version for: ❏ Individuals ❏ Companies https://8card.net

Slide 6

Slide 6 text

Sansan is hiring! (it’s ok if you don’t speak Japanese) Ruby, React, Python, C#, Kotlin, Swift https://hrmos.co/pages/sansan/jobs/1000354

Slide 7

Slide 7 text

“An activity stream is a list of recent activities performed by an individual, typically on a single website” Wikipedia: https://en.wikipedia.org/wiki/Activity_stream What is an activity feed?

Slide 8

Slide 8 text

What is an activity feed? Facebook, Twitter, Instagram

Slide 9

Slide 9 text

What is an activity feed? GitHub, Mercari, others...

Slide 10

Slide 10 text

What is an activity feed? Activity feeds can take several shapes, but there are some common elements: ❏ Actor ❏ “Matt” in “Matt liked your post” ❏ Action ❏ “liked” in “Matt liked your post” ❏ Target ❏ “your post” in “Matt liked your post”
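As a minimal sketch of the three elements above, an activity can be modelled as a small value object. The `Activity` name and its fields are assumptions made for illustration; the slides do not prescribe a data model.

```ruby
# Hypothetical value object: the field names are illustrative assumptions.
Activity = Struct.new(:actor, :verb, :target, :created_at, keyword_init: true) do
  # Renders the activity as a feed sentence: actor, then action, then target.
  def to_s
    "#{actor} #{verb} #{target}"
  end
end

activity = Activity.new(
  actor: "Matt",
  verb: "liked",
  target: "your post",
  created_at: Time.now
)

puts activity
```

Keeping actor, action, and target as separate fields is what later makes filtering, aggregation, and ranking possible.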

Slide 11

Slide 11 text

Actors can be: ❏ Users ❏ AI ❏ Service What is an activity feed?

Slide 12

Slide 12 text

What is an activity feed? Actions can be: ❏ Generated by users organically ❏ Tweets, pictures, check-ins, posts … ❏ Generated in batches ❏ News ❏ A mixture of both ❏ Recommendations of user posts ❏ Targeted news

Slide 13

Slide 13 text

What is an activity feed? Targets can be: ❏ Posts ❏ Comments ❏ Pictures In general, anything your service allows users to interact with. Feature-rich activity feeds tend to have more targets.

Slide 14

Slide 14 text

Why should I build an activity feed?

Slide 15

Slide 15 text

“If you deliver the right information to your users wrapped in the right shape, chances are that they will come back for more.” ❏ Increase user engagement ❏ Increase user retention ❏ Increase DAU / MAU ❏ Increase ROI (Monetization) Why should I build an activity feed?

Slide 16

Slide 16 text

Why should I build an activity feed? Why did we build an activity feed on Eight? ❏ We wanted to deliver relevant information related to each user’s connections. ❏ We believe that, this way, Eight provides extra value. ❏ We wanted to enable a channel of communication between users. ❏ We also wanted to enable an additional source of monetization for Eight.

Slide 17

Slide 17 text

Step 1: Identify the kind of feed you want to build How do I build an activity feed?

Slide 18

Slide 18 text

How do I build an activity feed? Step 1: Identify the kind of feed you want to build. Kinds of feeds: ❏ Rich features: Facebook ❏ Reduced features: Twitter ❏ Minimal features: GitHub

Slide 19

Slide 19 text

How do I build an activity feed? Step 1: Identify the Kind of feed you want to build Why is this important? The kind of feed you are building will impact: ❏ The user experience ❏ The kind of traffic you will have ❏ The underlying architecture

Slide 20

Slide 20 text

How do I build an activity feed? Step 1: Identify the kind of feed you want to build. Why is this important? Because the kind of feed you are building will impact: ❏ The user experience ❏ The kind of traffic you will have ❏ The underlying architecture ➔ Do you really want to build another Facebook? ◆ Are your users going to use it? ➔ Mentions, hashtags, and check-ins should be treated as first-class citizens. ◆ These features are complex and have many implications. ◆ Plan and design them as early as possible. ➔ To innovate or not? ◆ Designing a totally new user experience through feeds might be challenging. ➔ Sorted by timestamp or ranked? ➔ Flat or aggregated?
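The "flat or aggregated" question can be made concrete with a small sketch. The `Event` struct, the integer timestamps, and the "and N others" wording are assumptions for illustration only, not something the slides specify.

```ruby
# Hypothetical event record; created_at is an integer timestamp for simplicity.
Event = Struct.new(:actor, :verb, :target, :created_at, keyword_init: true)

# Flat feed: one entry per activity, newest first.
def flat_feed(events)
  events.sort_by { |event| -event.created_at }
end

# Aggregated feed: activities sharing a verb and a target collapse into one entry.
def aggregated_feed(events)
  events.group_by { |event| [event.verb, event.target] }.map do |(verb, target), group|
    actors = group.sort_by { |event| -event.created_at }.map(&:actor).uniq
    others = actors.size - 1
    label = others.zero? ? actors.first : "#{actors.first} and #{others} other#{'s' if others > 1}"
    { text: "#{label} #{verb} #{target}", latest_at: group.map(&:created_at).max }
  end.sort_by { |entry| -entry[:latest_at] }
end

events = [
  Event.new(actor: "Matt", verb: "liked", target: "your post", created_at: 100),
  Event.new(actor: "Ana", verb: "liked", target: "your post", created_at: 120),
  Event.new(actor: "Ken", verb: "liked", target: "your post", created_at: 130),
  Event.new(actor: "Ana", verb: "commented on", target: "your photo", created_at: 110)
]

aggregated_feed(events).each { |entry| puts entry[:text] }
```

The flat version is trivial to page through by timestamp; the aggregated version needs a grouping key and a rule for which timestamp represents the group, which is why the decision is worth making early.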

Slide 21

Slide 21 text

How do I build an activity feed? Step 1: Identify the kind of feed you want to build. Why is this important? Because the kind of feed you are building will impact: ❏ The user experience ❏ The kind of traffic you will have ❏ The underlying architecture ➔ How do you expect your users to engage with your feed? ◆ Read only: GitHub ◆ Heavy on reads: LinkedIn ◆ Heavy on writes: Twitter ◆ Both: Facebook ➔ Does it need to be real time? ◆ Real-timish? ➔ Who creates the content? ◆ Users ◆ Batches ◆ Both

Slide 22

Slide 22 text

How do I build an activity feed? Step 1: Identify the kind of feed you want to build. Why is this important? Because the kind of feed you are building will impact: ❏ The user experience ❏ The kind of traffic you will have ❏ The underlying architecture ➔ Real time or not? ◆ Language ◆ Framework ◆ App server ➔ Persistence ◆ Do you need persistence? ◆ Optimized for reads ◆ High throughput for writes ◆ Both ➔ Monolith vs microservices ➔ How much data do you need to process? ◆ Batch-generated content ◆ Influencers ◆ Inactive users

Slide 23

Slide 23 text

How do I build an activity feed? Step 2: Build it!

Slide 24

Slide 24 text

How do I build an activity feed? Step 2: Build it! Once all the decisions have been made, you are ready to begin building your feed! But where to start?

Slide 25

Slide 25 text

How do I build an activity feed? Step 2: Build it!

Slide 26

Slide 26 text

How do I build an activity feed? Step 2: Build it! Some conclusions: ● There is not much specific, up-to-date information about how to build an activity feed ● There are some good resources though, such as: ○ Yahoo Research Paper ■ http://jeffterrace.com/docs/feeding-frenzy-sigmod10-web.pdf ○ Stream ■ https://getstream.io/ (You can actually outsource your feed!) ○ LinkedIn Engineering Blog ■ https://engineering.linkedin.com/blog

Slide 27

Slide 27 text

How do I build an activity feed? Step 2: Build it! But how do you build a feed the Right Way™? ● It turns out that there is no single right way to build a feed ● There are many right ways to build a feed ● And lots of trade-offs to consider

Slide 28

Slide 28 text

How do I build an activity feed? Step 2: Build it! Key concept: Fan out “In message-oriented middleware solutions, fan-out is a messaging pattern used to model an information exchange that implies the delivery (or spreading) of a message to one or multiple destinations possibly in parallel, and not halting the process that executes the messaging to wait for any response to that message.” Wikipedia: https://en.wikipedia.org/wiki/Fan-out_(software)

Slide 29

Slide 29 text

How do I build an activity feed? Step 2: Build it! Fan out on write: data is distributed to followers as soon as content is created. Good: ❏ Optimizes the read time for followers ❏ Good fit for simple feeds ❏ Allows data denormalization Bad: ❏ Heavy on writes (influencer effect) ❏ Features of feature-rich feeds can be complex to implement ❏ Expensive if you allow data updates ❏ Tends to generate waste (dead users) ❏ Can get challenging to scale
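A minimal in-memory sketch of fan out on write follows. The `WriteFanOutFeed` class and its hashes are hypothetical stand-ins for a real data store and are only meant to show where the work happens.

```ruby
# In-memory stand-in for real storage; every name here is hypothetical.
class WriteFanOutFeed
  def initialize(followers)
    @followers = followers # author => [follower, ...]
    @timelines = Hash.new { |hash, user| hash[user] = [] }
  end

  # The write does the heavy lifting: one copy of the post per follower.
  def publish(author, post)
    @followers.fetch(author, []).each do |follower|
      @timelines[follower].unshift(post)
    end
  end

  # The read is a cheap lookup of a precomputed list.
  def timeline(user, limit: 20)
    @timelines[user].first(limit)
  end
end

feed = WriteFanOutFeed.new("carlos" => %w[ana ken])
feed.publish("carlos", { author: "carlos", body: "Hello", at: 1 })
feed.publish("carlos", { author: "carlos", body: "Second post", at: 2 })

puts feed.timeline("ana").map { |post| post[:body] }.inspect
```

The cost of `publish` grows with the number of followers, which is the influencer effect listed above, and an edit to a post would have to touch every copy.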

Slide 30

Slide 30 text

How do I build an activity feed? Step 2: Build it! Fan out on read: the feed is generated on demand, when a user requests it. Good: ❏ Easier to implement (sort of) ❏ Good fit if you allow data updates ❏ Generates less waste Bad: ❏ Slow. Not a good fit for real-timish feeds ❏ Heavy on reads (and writes?)
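For contrast, here is a minimal in-memory sketch of fan out on read. Again, `ReadFanOutFeed` and its hashes are hypothetical stand-ins for real storage.

```ruby
# In-memory stand-in for real storage; every name here is hypothetical.
class ReadFanOutFeed
  def initialize(following)
    @following = following # user => [author, ...]
    @posts = Hash.new { |hash, author| hash[author] = [] }
  end

  # The write is cheap: the post is stored once, under its author.
  def publish(author, post)
    @posts[author] << post
  end

  # The read does the heavy lifting: gather, merge, and sort on every request.
  def timeline(user, limit: 20)
    @following.fetch(user, [])
              .flat_map { |author| @posts[author] }
              .sort_by { |post| -post[:at] }
              .first(limit)
  end
end

feed = ReadFanOutFeed.new("ana" => %w[carlos ken])
feed.publish("carlos", { body: "Carlos 1", at: 1 })
feed.publish("ken", { body: "Ken 1", at: 2 })
feed.publish("carlos", { body: "Carlos 2", at: 3 })

puts feed.timeline("ana").map { |post| post[:body] }.inspect
```

Because each post exists only once, updates are simple, but every read pays for the merge across all followed authors.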

Slide 31

Slide 31 text

How do I build an activity feed? Step 2: Build it! Mixed fan out: mixes the previous two methods. ❏ Behaves as a write fan out for active users, and performs a read fan out for users who become active again after a long time. ❏ Behaves as a write fan out up to a certain number of followers. ❏ Behaves as a write fan out only for top-ranked content, and as a read fan out for low-ranked content. Good: ❏ Best of both worlds Bad: ❏ More complex to implement
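The first variant (write fan out for active users, read fan out for everyone else) could be sketched as below. The `MixedFanOutFeed` class, the `active_users` set, and the `precomputed?` helper are all assumptions for illustration; how "active" is decided is left out.

```ruby
require "set"

# In-memory stand-in for real storage; every name here is hypothetical.
class MixedFanOutFeed
  def initialize(followers:, active_users:)
    @followers = followers   # author => [follower, ...]
    @active = active_users   # users considered active
    @posts = Hash.new { |hash, author| hash[author] = [] }   # source of truth
    @timelines = Hash.new { |hash, user| hash[user] = [] }   # active users only
  end

  # Every post is stored once; copies are pushed only to active followers.
  def publish(author, post)
    @posts[author] << post
    @followers.fetch(author, []).each do |follower|
      @timelines[follower].unshift(post) if @active.include?(follower)
    end
  end

  # Active users read a precomputed list; inactive users get it built on demand.
  def timeline(user, limit: 20)
    return @timelines[user].first(limit) if @active.include?(user)

    authors = @followers.select { |_author, list| list.include?(user) }.keys
    authors.flat_map { |author| @posts[author] }
           .sort_by { |post| -post[:at] }
           .first(limit)
  end

  # True when a timeline has been materialized for the user.
  def precomputed?(user)
    @timelines.key?(user)
  end
end

feed = MixedFanOutFeed.new(
  followers: { "carlos" => %w[ana ken] },
  active_users: Set["ana"]
)
feed.publish("carlos", { body: "Hello", at: 1 })
feed.publish("carlos", { body: "Second post", at: 2 })

puts feed.timeline("ken").map { |post| post[:body] }.inspect
```

Both users see the same feed, but nothing is stored for the inactive one, which addresses the "waste" problem at the cost of two code paths to maintain.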

Slide 32

Slide 32 text

How do I build an activity feed? About the Eight feed: ❏ Eight has a feature-rich feed. ❏ The backend is 90% Ruby. ❏ We use Rails for most of our APIs. ❏ Most of Eight’s architecture is backed by AWS. ❏ DynamoDB. ❏ Aurora. ❏ SQS. ❏ The Eight feed is timestamp based. ❏ Content is generated by both batches and users. ❏ All of a user’s posts are available through pagination. ❏ Users can like, comment on, and share posts. ❏ Users can restrict the privacy of all or some of their posts. ❏ Users can ban, hide, or block content from other users. ❏ Users can tag companies when sharing links, or mention users.

Slide 33

Slide 33 text

How do I build an activity feed? How does the Eight feed work? Writes ❏ The Eight feed uses fan out on write. ❏ The Eight feed is real-timish, but not real time. ❏ Data is semi-denormalized. ❏ Update: single source of truth ❏ Item: relation between a post and users, plus metadata ❏ Decoration: stored in the RDB ❏ For the fan out, we use a custom-made batch service + SQS
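One way to read "semi-denormalized" is sketched below: the post body lives in a single record, while each follower gets a lightweight item that points at it. This is an interpretation for illustration, not Eight's actual schema; the `Update` and `Item` fields are assumptions.

```ruby
# Hypothetical records; the real attributes are not given in the slides.
Update = Struct.new(:id, :author_id, :body, keyword_init: true)
Item   = Struct.new(:user_id, :update_id, :delivered_at, keyword_init: true)

# One Update per post (the single source of truth).
updates = { 1 => Update.new(id: 1, author_id: 7, body: "We are hiring") }

# One Item per recipient, produced by the fan out.
items = [
  Item.new(user_id: 100, update_id: 1, delivered_at: 10),
  Item.new(user_id: 101, update_id: 1, delivered_at: 10)
]

# Editing the post touches one record; every feed item still points at it.
updates[1].body = "We are hiring Rubyists"

feed_for = lambda do |user_id|
  items.select { |item| item.user_id == user_id }
       .map { |item| updates.fetch(item.update_id).body }
end

puts feed_for.call(100).inspect
```

The trade-off is that each read needs an extra lookup to resolve the item into its content.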

Slide 34

Slide 34 text

How do I build an activity feed? How does the Eight feed work? Writes => Challenges? ❏ Posts from users with many contacts are expensive ❏ Some Eight users have more than 10,000 connections ❏ Popular companies can also generate content ❏ Tags and mentions can affect the scope of the deliveries. ❏ DynamoDB autoscaling is sometimes not fast enough ❏ Need to provision manually for batch-generated content ❏ DynamoDB writes can get expensive due to secondary indexes ❏ Need to enqueue more workers manually when SQS gets clogged ❏ We generate data even for non-active users.
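To give a feel for the cost of one post from a well-connected user, the sketch below splits the deliveries into batches of 10, the maximum number of entries SQS accepts in a single SendMessageBatch call. The `delivery_batches` helper and the message shape are assumptions; no AWS call is made here.

```ruby
SQS_BATCH_LIMIT = 10 # SendMessageBatch accepts at most 10 entries per call

# Builds the batches a fan out would have to enqueue for one post.
# Hypothetical helper: the entry format is illustrative only.
def delivery_batches(post_id, follower_ids, batch_size: SQS_BATCH_LIMIT)
  follower_ids.each_slice(batch_size).map do |slice|
    slice.map do |follower_id|
      {
        id: "#{post_id}-#{follower_id}",
        body: { post_id: post_id, user_id: follower_id }
      }
    end
  end
end

batches = delivery_batches(42, (1..10_000).to_a)
puts "#{batches.size} batch calls for a single post"
```

A single post to 10,000 connections therefore means 1,000 enqueue calls before any feed item is written, which is where the write pressure and clogged queues come from.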

Slide 35

Slide 35 text

How do I build an activity feed? How does the Eight feed work? Reads ❏ Query DynamoDB for the raw feed ❏ We have a DynamoDB wrapper that takes care of: ❏ Queries ❏ Retries ❏ Throttling ❏ Decorate it with RDB data ❏ Return the response
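The read steps above can be sketched in plain Ruby: fetch the raw feed with retries on throttling, then decorate it with relational data. `ThrottledError`, `with_retries`, and `decorate` are hypothetical names; the actual wrapper is not shown in the slides, and the fake query below stands in for a DynamoDB call.

```ruby
# Hypothetical error raised by the storage layer when it is throttled.
class ThrottledError < StandardError; end

# Retries the block with exponential backoff when the store signals throttling.
def with_retries(max_attempts: 3, base_delay: 0.05, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue ThrottledError
    raise if attempts >= max_attempts

    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Joins raw feed items with display data from the relational store.
def decorate(items, profiles)
  items.map do |item|
    item.merge(author_name: profiles.fetch(item[:author_id], "Unknown"))
  end
end

# Fake query that is throttled twice before succeeding.
calls = 0
raw_items = with_retries(sleeper: ->(_seconds) {}) do
  calls += 1
  raise ThrottledError, "read capacity exceeded" if calls < 3

  [{ post_id: 1, author_id: 7 }]
end

feed = decorate(raw_items, { 7 => "Matt" })
puts feed.inspect
```

Injecting the `sleeper` keeps the example fast to run; in a real service the default would actually wait between attempts.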

Slide 36

Slide 36 text

How do I build an activity feed? How does the Eight feed work? Reads => Challenges?

Slide 37

Slide 37 text

How do I build an activity feed? How does the Eight feed work? Reads => Challenges?

Slide 38

Slide 38 text

How do I build an activity feed? How does the Eight feed work? Reads => Challenges? ❏ We need to query 4 DynamoDB tables per post/request ❏ Running out of read capacity on DynamoDB = no feed ❏ Mix that data with decorated information from the RDB ❏ Filter information based on user and device ❏ Device type, app version… ❏ Caching is complex and sometimes useless ❏ No access to DAX from the aws-sdk for Ruby ❏ Dalli#get_multi performance does not seem to be very good.

Slide 39

Slide 39 text

How do we solve all these challenges?

Slide 40

Slide 40 text

Eight Feed v2

Slide 41

Slide 41 text

Eight Feed v2

Slide 42

Slide 42 text

Roadmap for Eight Feed v2 ❏ Transition from a write fan out to a mixed one. ❏ Fan out on write for active users and on read for inactive users ❏ Flexible follow/unfollow flow through channels. ❏ Make data accessible to anyone who wants to follow an actor ❏ Remove unwanted data from your feed with just one click ❏ Personalized feed. ❏ We want to deliver the most relevant content first ❏ Then fall back to timestamp ❏ Performance optimization ❏ We want to be able to provide responses in the 100 ms range ❏ … in Ruby ❏ Bring Redis into the stack. ❏ Increase scalability And more...
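"Most relevant first, then fall back to timestamp" could look like the sketch below. The `:score` field and the `rank` helper are assumptions; how relevance is computed is outside the scope of the slides.

```ruby
# Hypothetical ranking: items with a relevance score come first (highest first,
# ties broken by recency); unscored items fall back to newest-first order.
def rank(items)
  scored, unscored = items.partition { |item| item[:score] }

  scored.sort_by { |item| [-item[:score], -item[:at]] } +
    unscored.sort_by { |item| -item[:at] }
end

items = [
  { id: :a, score: 0.9, at: 1 },
  { id: :b, score: nil, at: 5 },
  { id: :c, score: 0.9, at: 3 },
  { id: :d, score: 0.2, at: 9 },
  { id: :e, score: nil, at: 7 }
]

puts rank(items).map { |item| item[:id] }.inspect
```

Keeping the timestamp as a tie-breaker and fallback means the feed degrades to the current chronological behaviour whenever scores are missing.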

Slide 43

Slide 43 text

Currently under heavy development Roadmap for Eight Feed v2

Slide 44

Slide 44 text

Lessons learned ❏ Start with one data store and change it once it becomes obsolete. ❏ Migrations are scary, but there are great tools out there that will help you. ❏ AWS Athena, Data Pipeline, Lambda ❏ Ruby can be fast if used right. ❏ Be careful with memory bloat. ❏ ActiveSupport, ActiveRecord... ❏ Using many external services will slow down your development environment ❏ Emulate DynamoDB, SQS, Kinesis, Lambda … ❏ Learn the basics and then experiment ❏ It’s fine not to get it right on the first try ❏ Better to spend one week prototyping than one month designing ❏ … and it is so much fun. ❏ Keep yourself updated ❏ AWS, GCP, Azure: there is new technology every week ❏ Serverless Aurora, anyone?

Slide 45

Slide 45 text

Lessons learned And probably the most important: ENJOY!

Slide 46

Slide 46 text

Thank you

Slide 47

Slide 47 text

Questions?