Rebuilding Twitter with Cassandra and Ruby

Presented at WIDS 2014 in Hong Kong by Matthew Rudy Jacobs

Check out my Twitter implementation https://github.com/matthewrudy/twissandra-rb

Matthew Rudy Jacobs

March 28, 2014

Transcript

  1. Relational Approach

    class User < ActiveRecord::Base
      has_many :friendships
      has_many :friends, through: :friendships
      has_many :tweets
      has_many :timeline_tweets, through: :friends, source: :tweets
    end

    me = User.find_by(username: "me")
    me.timeline_tweets.order("created_at DESC").limit(20)
  2. 10 TB of data?

    • Heroku Postgres has a 1 TB limit
    • RDS Postgres has a 3 TB limit
    • Amazon I2 8xlarge has a 6 TB limit
    • buy a massive 12 TB disk array?
  3. Shard it?

    [Diagram: a Client and a Tweet id generator in front of three shards, Tweets#1, Tweets#2 and Tweets#3, with tweets routed by id % 3 == 0, id % 3 == 1 and id % 3 == 2]
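The modulo routing on the sharding slide can be sketched in a few lines of Ruby. The shard count of 3 comes from the slide; the method names `shard_for` and `moved_ids` are hypothetical, and the resharding count only illustrates one well-known drawback of modulo sharding, which may or may not be the speaker's own objection.

```ruby
# Route a tweet id to one of N shards by modulo, as on the slide (N = 3).
def shard_for(tweet_id, shard_count = 3)
  tweet_id % shard_count
end

# Hypothetical helper: the ids that land on a different shard when the
# shard count changes from old_count to new_count.
def moved_ids(ids, old_count, new_count)
  ids.reject { |id| shard_for(id, old_count) == shard_for(id, new_count) }
end

ids = (0...12).to_a
puts ids.group_by { |id| shard_for(id) }.inspect
puts "moved when growing 3 -> 4 shards: #{moved_ids(ids, 3, 4).size} of #{ids.size}"
```

In this small run, 9 of the 12 ids change shard when a fourth shard is added, so growing the cluster means moving most of the data.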
  4. NO!

  5. CREATE TABLE

    CREATE TABLE tweets (
      id timeuuid,
      user_id uuid,
      body text,
      mentions set<uuid>,

      PRIMARY KEY (id)
    );

    CREATE INDEX ON tweets (user_id);
  6. Timelines

    CREATE TABLE timeline (
      user_id uuid,
      tweet_id timeuuid,

      PRIMARY KEY (user_id, tweet_id)
    ) WITH CLUSTERING ORDER BY (tweet_id DESC);

    SELECT * FROM timeline
    WHERE user_id = 1337
      AND tweet_id < minTimeuuid('2013-02-02 10:00+0000');
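To make the effect of the compound primary key more concrete, here is a minimal in-memory sketch in Ruby: `user_id` selects a partition, and rows inside it stay sorted by `tweet_id`, newest first. The class `TimelineTable` and its methods are hypothetical, and plain integers stand in for timeuuids; this only models the ordering and the range query, not how Cassandra stores data.

```ruby
# In-memory stand-in for the timeline table: one partition per user_id,
# rows inside a partition kept sorted by tweet_id, newest first.
# Integer ids stand in for timeuuids (an assumption to keep this short).
class TimelineTable
  def initialize
    @partitions = Hash.new { |hash, user_id| hash[user_id] = [] }
  end

  def insert(user_id, tweet_id)
    rows = @partitions[user_id]
    return if rows.include?(tweet_id)   # (user_id, tweet_id) is the primary key
    rows << tweet_id
    rows.sort!.reverse!                 # CLUSTERING ORDER BY (tweet_id DESC)
  end

  # Rough equivalent of: WHERE user_id = ? AND tweet_id < ? LIMIT ?
  def before(user_id, cutoff, limit: 20)
    @partitions.fetch(user_id, []).select { |id| id < cutoff }.first(limit)
  end
end

timeline = TimelineTable.new
[105, 101, 103, 102, 104].each { |id| timeline.insert(1337, id) }
p timeline.before(1337, 104, limit: 2)   # => [103, 102]
```

Because the rows are already stored newest first, reading "the 20 tweets before this point" is a slice of one partition rather than a sort at query time.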
  7. Types

    • integer - 1234567
    • float - 123.456
    • text - 'abc123'
    • uuid - {756716f7-2e54-4715-9f00-91dcbea6cf50}
    • timeuuid - NOW()
    • timestamp - '2013-06-13 11:00:00'
  8. Collections

    • Sets - {'dog', 'cat', 'elephant'}
    • Lists - ['AMZN', 'AAPL', 'FB']
    • Maps - {'GOOG': 1200, 'AAPL': 512}
  9. Sets

    CREATE TABLE friends (
      user_id uuid,
      friend_ids set<uuid>,

      PRIMARY KEY (user_id)
    );

    UPDATE friends SET friend_ids = friend_ids + {8007} WHERE user_id = 1337;
  10. Counters

    CREATE TABLE user_tweet_counts (
      user_id INT,
      tweet_count COUNTER,

      PRIMARY KEY (user_id)
    );

    UPDATE user_tweet_counts SET tweet_count = tweet_count + 1 WHERE user_id = 1337;

    INSERT INTO tweets (id, user_id, body) VALUES (NOW(), 1337, 'some tweet');
  11. Distributed

    [Diagram: data ranges A to H spread across the cluster's nodes, each range stored on more than one node]

    replication factor = 3
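The idea of a replication factor of 3 can be sketched as a simple ring placement in Ruby: each range goes to one node and to the next two nodes around the ring. The range names A to H and the replication factor come from the slide; the four node names and the `replicas_for` helper are hypothetical. Cassandra itself places data by hashing partition keys onto tokens, so treat this as a simplified model only.

```ruby
RANGES = %w[A B C D E F G H].freeze
NODES  = %w[node1 node2 node3 node4].freeze   # hypothetical node names
REPLICATION_FACTOR = 3

# Place a range on a starting node and on the next nodes around the ring.
def replicas_for(range_index, nodes: NODES, rf: REPLICATION_FACTOR)
  Array.new(rf) { |i| nodes[(range_index + i) % nodes.size] }
end

PLACEMENT = RANGES.each_with_index.map { |name, i| [name, replicas_for(i)] }.to_h

PLACEMENT.each { |name, nodes| puts "#{name}: #{nodes.join(', ')}" }

# Invert the mapping to see what each node stores.
NODES.each do |node|
  held = PLACEMENT.select { |_name, nodes| nodes.include?(node) }.keys
  puts "#{node} holds #{held.join(' ')}"
end
```

With 8 ranges, 4 nodes and 3 copies of each range, every node ends up holding 6 ranges, and any single node can be lost without losing a range.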
  12. Multi-region

    [Diagram: master-master replication between a Singapore cluster and a Virginia cluster, each holding ranges A to H, with an HK user and an SF user as clients]
  13. Processes

    • Create a User
    • Send a Tweet
    • Read my Timeline
    • Follow a User
    • Unfollow a User
  14. Send a Tweet

    • create a Tweet for Me
    • add Tweet to my Followers' Timelines
    • add Tweet to *mentioned* Users' Timelines
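The three steps of sending a tweet amount to fan-out on write. A minimal in-memory sketch in Ruby follows; the hashes stand in for the tweets, userline, timeline and followers tables, and `new_store` and `send_tweet` are hypothetical names. Writing to the userline is an assumption based on the Entities slide, since the Send a Tweet slide does not mention it.

```ruby
# Hypothetical in-memory stand-ins for the tweets, userline, timeline
# and followers tables. Lists are kept newest first.
def new_store
  {
    tweets:    {},
    userline:  Hash.new { |hash, user_id| hash[user_id] = [] },
    timeline:  Hash.new { |hash, user_id| hash[user_id] = [] },
    followers: Hash.new { |hash, user_id| hash[user_id] = [] }
  }
end

def send_tweet(store, tweet_id:, user_id:, body:, mentions: [])
  # 1. create a Tweet for Me (and record it in my userline: an assumption)
  store[:tweets][tweet_id] = { user_id: user_id, body: body, mentions: mentions }
  store[:userline][user_id].unshift(tweet_id)

  # 2 and 3. add the Tweet to followers' and mentioned users' timelines,
  # once per recipient even if a follower is also mentioned
  recipients = (store[:followers][user_id] + mentions).uniq
  recipients.each { |recipient| store[:timeline][recipient].unshift(tweet_id) }
  recipients
end

store = new_store
store[:followers][:me] = [:alice, :bob]
recipients = send_tweet(store, tweet_id: 1, user_id: :me,
                        body: "hello @carol", mentions: [:carol, :alice])
p recipients   # => [:alice, :bob, :carol]
```

The cost of a write grows with the number of followers, which is the price paid for making timeline reads a single-partition lookup.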
  15. Read my Timeline

    • find Tweets from my Timeline
    • load the Tweet details
    • load the User details
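Reading a timeline is then three lookups joined in the application rather than in the database. The Ruby sketch below uses hypothetical in-memory hashes (`TIMELINE`, `TWEETS`, `USERS`) with made-up sample rows in place of the real tables, and `read_timeline` is an illustrative name.

```ruby
# Hypothetical sample data standing in for the timeline, tweets and users tables.
TIMELINE = { me: [3, 2, 1] }.freeze   # tweet ids, newest first
TWEETS = {
  1 => { user_id: :alice, body: "first" },
  2 => { user_id: :alice, body: "second" },
  3 => { user_id: :bob,   body: "third" }
}.freeze
USERS = {
  alice: { username: "alice" },
  bob:   { username: "bob" }
}.freeze

def read_timeline(user_id, limit: 20)
  # 1. find Tweets from my Timeline
  tweet_ids = TIMELINE.fetch(user_id, []).first(limit)
  # 2. load the Tweet details
  tweets = tweet_ids.map { |id| TWEETS.fetch(id).merge(id: id) }
  # 3. load the User details, once per distinct author
  author_ids = tweets.map { |tweet| tweet[:user_id] }.uniq
  authors = author_ids.map { |id| [id, USERS.fetch(id)] }.to_h
  tweets.map { |tweet| tweet.merge(username: authors.fetch(tweet[:user_id])[:username]) }
end

read_timeline(:me, limit: 2).each do |tweet|
  puts "@#{tweet[:username]}: #{tweet[:body]}"
end
```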
  16. Follow a User

    • add User to my Friends
    • add Me to the User's Followers
    • add User's Tweets to my Timeline
  17. Unfollow a User

    • remove User from my Friends
    • remove Me from User's Followers
    • remove User's Tweets from my Timeline
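Follow and unfollow mirror each other: both update the two sides of the relationship and then backfill or clean up the timeline. A minimal in-memory Ruby sketch is shown here; the class `SocialStore` and its methods are hypothetical. Each timeline row remembers who wrote the tweet, which is presumably what the `tweet_user_id` column in the Timeline table is for, since it lets unfollow find the rows to remove.

```ruby
require "set"

# Hypothetical in-memory model of the friends, followers, userline and
# timeline tables.
class SocialStore
  attr_reader :friends, :followers, :userline, :timeline

  def initialize
    @friends   = Hash.new { |hash, user| hash[user] = Set.new }
    @followers = Hash.new { |hash, user| hash[user] = Set.new }
    @userline  = Hash.new { |hash, user| hash[user] = [] }   # user => tweet ids they wrote
    @timeline  = Hash.new { |hash, user| hash[user] = {} }   # user => { tweet_id => tweet_user_id }
  end

  def follow(me, user)
    friends[me] << user                                               # add User to my Friends
    followers[user] << me                                             # add Me to the User's Followers
    userline[user].each { |tweet_id| timeline[me][tweet_id] = user }  # backfill my Timeline
  end

  def unfollow(me, user)
    friends[me].delete(user)
    followers[user].delete(me)
    timeline[me].delete_if { |_tweet_id, tweet_user_id| tweet_user_id == user }
  end

  # Tweet ids in my timeline, newest first (larger id means newer here).
  def timeline_ids(me)
    timeline[me].keys.sort.reverse
  end
end

store = SocialStore.new
store.userline[:alice] = [1, 3]
store.userline[:bob] = [2]
store.follow(:me, :alice)
store.follow(:me, :bob)
p store.timeline_ids(:me)   # => [3, 2, 1]
store.unfollow(:me, :alice)
p store.timeline_ids(:me)   # => [2]
```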
  18. Entities

    • Users
    • Tweets
    • Userline - tweets from the user
    • Timeline - tweets for the user to see
    • Friends - people the user follows
    • Followers - people who follow the user
  19. Users

    • id - used for references
    • username - used for display - UNIQUE
    • metadata - location, photos and stuff
  20. Tweets

    • id - used for references
    • user_id - the user who tweeted
    • body - the text of the tweet
    • mentions - a list of users mentioned
  21. Userline

    • user_id - the user who tweeted
    • tweet_id - id of the tweet - UNIQUE
    • timestamp - when it was tweeted
  22. Timeline

    • user_id - the user who sees this
    • tweet_id - id of the tweet - UNIQUE
    • timestamp - when it was tweeted
  23. Friends

    • user_id - the user
    • friend_id - the person they follow
    • timestamp - when they were followed
  24. Followers

    • user_id - the user
    • follower_id - the person who follows them
    • timestamp - when they were followed
  25. Users

    • create - insert into database
    • find_by_username - find @matthewrudy
    • find_all_by_id - find all users for a set of tweets
    • UNIQUE by username
  26. Timeline

    • add_tweet_for_user
    • find_all_by_user
    • ORDER BY timestamp DESC
    • UNIQUE by {user, tweet}
    • delete_all_by_tweet_user
  27. Users

    CREATE TABLE users (
      id uuid,
      username text,
      location text,

      PRIMARY KEY (id)
    );

    CREATE INDEX ON users (username);
  28. Tweets

    CREATE TABLE tweets (
      id timeuuid, // unique id with timestamp
      user_id uuid,
      body text,
      mentions set<uuid>,

      PRIMARY KEY (id)
    );
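The "unique id with timestamp" comment refers to the fact that a timeuuid is a version 1 UUID, which carries a 60-bit timestamp counted in 100-nanosecond steps from 1582-10-15. The Ruby sketch below packs a time into that layout and reads it back. Both helper names are hypothetical, and the clock sequence and node fields are fixed at zero purely to keep the example deterministic; a real generator fills them so that ids created at the same instant stay unique.

```ruby
# 100-nanosecond intervals between 1582-10-15 (the UUID epoch) and 1970-01-01.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

# Hypothetical helper: build a version 1 UUID string for a given Time.
def timeuuid_from_time(time, clock_seq: 0, node: 0)
  ticks = (time.to_i * 10_000_000) + (time.nsec / 100) + UUID_EPOCH_OFFSET
  time_low = ticks & 0xFFFFFFFF
  time_mid = (ticks >> 32) & 0xFFFF
  time_hi  = ((ticks >> 48) & 0x0FFF) | 0x1000   # version 1 in the top 4 bits
  clock    = (clock_seq & 0x3FFF) | 0x8000       # RFC 4122 variant bits
  format("%08x-%04x-%04x-%04x-%012x", time_low, time_mid, time_hi, clock, node)
end

# Hypothetical helper: recover the embedded timestamp from a version 1 UUID.
def time_from_timeuuid(uuid)
  time_low, time_mid, time_hi = uuid.split("-").first(3)
  ticks = ((time_hi.hex & 0x0FFF) << 48) | (time_mid.hex << 32) | time_low.hex
  unix_ticks = ticks - UUID_EPOCH_OFFSET
  Time.at(unix_ticks / 10_000_000, (unix_ticks % 10_000_000) * 100, :nsec).utc
end

tweeted_at = Time.utc(2014, 3, 28, 10, 0, 0)
id = timeuuid_from_time(tweeted_at)
puts id
puts time_from_timeuuid(id)
```

Because the creation time can be recovered from the id, the tables that cluster by `tweet_id` get time ordering without a separate timestamp column.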
  29. Userline

    CREATE TABLE userline (
      user_id uuid,
      tweet_id timeuuid,

      PRIMARY KEY (user_id, tweet_id)
    )
    WITH CLUSTERING ORDER BY (tweet_id DESC);
  30. Timeline

    CREATE TABLE timeline (
      user_id uuid,
      tweet_id timeuuid,
      tweet_user_id uuid,

      PRIMARY KEY (user_id, tweet_id)
    )
    WITH CLUSTERING ORDER BY (tweet_id DESC);
  31. Friends

    CREATE TABLE friends (
      user_id uuid,
      friend_id uuid,
      timestamp timestamp,

      PRIMARY KEY (user_id, friend_id)
    );
  32. Followers

    CREATE TABLE followers (
      user_id uuid,
      follower_id uuid,
      timestamp timestamp,

      PRIMARY KEY (user_id, follower_id)
    );