Intro to Cassandra and the Python driver

Tyler Hobbs
February 27, 2014

Presented at the Austin Web Python Users Group on February 27th, 2014.

There's a video recording of the talk here: http://capitalfactory.lifesize.com/videos/video/322/?access_token=shr00000003227541148304880759831091078996279

Transcript

  1. Five years of Cassandra

    [Timeline figure: releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ..., 2.0, and DSE on an axis with ticks at Jul-08, Jul-09, May-10, Feb-11, Dec-11, Oct-12, and Jul-13.]
  2. Application/Use Case

    • Social Signals: like/want/own features for eBay product and item pages
    • Hunch taste graph for eBay users and items
    • Many time series use cases

    Why Cassandra?

    • Multi-datacenter
    • Scalable
    • Write performance
    • Distributed counters
    • Hadoop support
  3. Application/Use Case

    • Adobe AudienceManager: web analytics, content management, and online advertising

    Why Cassandra?

    • Low-latency
    • Scalable
    • Multi-datacenter
    • Tuneable consistency
  4. Application/Use Case

    • Logging
    • Notifications

    Why Cassandra?

    • Efficient writes
    • Durable
    • Scalable
    • High availability
  5. [Diagram with "Memory" and "Hard drive" panels: a write( , ) call applied to rows k1 and k2, which hold columns c1:v and c2:v.]
  6. [Diagram with "Memory" and "Hard drive" panels: a further write( , ) call adds column c3:v to row k1 alongside the existing k1 and k2 rows.]
  7. [Chart: VLDB benchmark (RWS). Throughput (ops/sec, 0 to 80,000) against number of nodes (0 to 12) for Cassandra, HBase, Redis, and MySQL.]
  8. [Chart: Endpoint benchmark (RW). Throughput (ops/sec, 0 to 35,000) against number of nodes (1 to 32) for Cassandra, HBase, and MongoDB.]
  9. Ease of use

    CREATE TABLE users (
      id uuid PRIMARY KEY,
      name text,
      state text,
      birth_date int
    );
    CREATE INDEX ON users(state);
    SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
  10. Partitioning: primary key determines placement*

    jim     age: 36   car: camaro   gender: M
    carol   age: 37   car: subaru   gender: F
    johnny  age: 12                 gender: M
    suzy    age: 10                 gender: F
  11. Murmur hash of each primary key (PK)

    PK      Murmur hash
    jim     5e02739678...
    carol   a9a0198010...
    johnny  f4eb27cea7...
    suzy    78b421309e...

    The Murmur* hash operation yields a 64-bit number for keys of any size.
  12. Key hashes and node ranges

    jim     5e02739678...
    carol   a9a0198010...
    johnny  f4eb27cea7...
    suzy    78b421309e...

    Node  Start            End
    A     0xc000000000..1  0x0000000000..0
    B     0x0000000000..1  0x4000000000..0
    C     0x4000000000..1  0x8000000000..0
    D     0x8000000000..1  0xc000000000..0
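
The range table can be read as a lookup: hash the key, then find the node whose range contains the resulting token. What follows is a minimal Python sketch of that lookup, not Cassandra's implementation. It assumes an unsigned 64-bit ring as drawn on the slide, and it uses MD5 truncated to 64 bits purely as a stand-in for Murmur. The names `toy_token`, `owner`, and `RING` are illustrative and are not part of Cassandra or the driver.

```python
import bisect
import hashlib

# Inclusive upper end of each node's range, taken from the table on the slide.
# Node A's range wraps around the top of the ring and ends at 0.
RING = [
    (0x0000000000000000, "A"),
    (0x4000000000000000, "B"),
    (0x8000000000000000, "C"),
    (0xC000000000000000, "D"),
]
RANGE_ENDS = [end for end, _ in RING]


def toy_token(key):
    """Map a key of any size to a 64-bit number.

    Assumption: MD5 truncated to 8 bytes stands in for Murmur here,
    so these tokens will not match the ones a real cluster computes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(token):
    """Return the node whose (start, end] range contains the token."""
    index = bisect.bisect_left(RANGE_ENDS, token)
    # Tokens above the last range end wrap around to node A.
    return RING[index % len(RING)][1]


# Tokens padded out from the hash prefixes shown on the slide.
slide_tokens = {
    "jim": 0x5E02739678000000,
    "carol": 0xA9A0198010000000,
    "johnny": 0xF4EB27CEA7000000,
    "suzy": 0x78B421309E000000,
}
placement = {name: owner(token) for name, token in slide_tokens.items()}
```

With these ranges, jim and suzy fall in node C's range, carol in D's, and johnny in A's wrap-around range.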
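  17. Virtual nodes

    [Diagram: ring ownership for Node A, Node B, Node C, and Node D, shown without vnodes (one range each: A, B, C, D) and with vnodes (several smaller ranges per node, such as A, A’, A’’).]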
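
One way to picture the difference is to give each node several tokens instead of one, so that its ownership is scattered around the ring. This is a rough Python sketch under stated assumptions: tokens are drawn at random from a seeded generator in both cases, three tokens per node is an arbitrary choice, and `build_ring` and `owner` are hypothetical helper names rather than anything from Cassandra.

```python
import bisect
import random
from collections import Counter


def build_ring(nodes, tokens_per_node, seed=0):
    """Give every node `tokens_per_node` random 64-bit tokens.

    Returns a sorted list of (token, node) pairs; each pair marks the
    inclusive end of one range owned by that node.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    ring = [
        (rng.getrandbits(64), node)
        for node in nodes
        for _ in range(tokens_per_node)
    ]
    return sorted(ring)


def owner(ring, token):
    """Find the first range end at or after the token, wrapping at the top."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token)
    return ring[index % len(ring)][1]


nodes = ["A", "B", "C", "D"]
without_vnodes = build_ring(nodes, tokens_per_node=1)  # one range per node
with_vnodes = build_ring(nodes, tokens_per_node=3)     # three ranges per node
ranges_per_node = Counter(node for _, node in with_vnodes)
```

Here `with_vnodes` holds twelve ranges, three per node, interleaved in token order.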
  18. #CASSANDRAEU Race condition

    SELECT name FROM users WHERE username = 'pmcfadin';
    (0 rows)
    SELECT name FROM users WHERE username = 'pmcfadin';
  19. Race condition

    SELECT name FROM users WHERE username = 'pmcfadin';
    (0 rows)
    SELECT name FROM users WHERE username = 'pmcfadin';
    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00');
    (0 rows)
  20. Race condition

    SELECT name FROM users WHERE username = 'pmcfadin';
    (0 rows)
    SELECT name FROM users WHERE username = 'pmcfadin';
    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00');
    (0 rows)
    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ea24e13ad9...', '2011-06-20 13:50:01');
  21. Race condition

    SELECT name FROM users WHERE username = 'pmcfadin';
    (0 rows)
    SELECT name FROM users WHERE username = 'pmcfadin';
    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00');
    (0 rows)
    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ea24e13ad9...', '2011-06-20 13:50:01');
    This one wins
  22. Lightweight transactions

    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00')
      IF NOT EXISTS;
  23. Lightweight transactions

    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00')
      IF NOT EXISTS;

     [applied]
    -----------

    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ea24e13ad9...', '2011-06-20 13:50:01')
      IF NOT EXISTS;
  24. Lightweight transactions

    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00')
      IF NOT EXISTS;

     [applied]
    -----------

    INSERT INTO users (username, name, password, created_date)
      VALUES ('pmcfadin', 'Patrick McFadin', 'ea24e13ad9...', '2011-06-20 13:50:01')
      IF NOT EXISTS;

     [applied] | username | created_date   | name
    -----------+----------+----------------+-----------------
         False | pmcfadin | 2011-06-20 ... | Patrick McFadin
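
The contrast between a plain INSERT and INSERT ... IF NOT EXISTS can be mimicked in a few lines of ordinary Python. This is a single-process toy, assuming a dictionary in place of the users table. It shows only the visible outcome (who wins, and what the caller gets back), not the Paxos rounds a cluster runs to reach it. `ToyUsers` and its methods are hypothetical names.

```python
class ToyUsers:
    """In-memory stand-in for the users table, keyed by username."""

    def __init__(self):
        self.rows = {}

    def insert(self, username, **columns):
        # Plain INSERT behaves as an upsert: a later write silently
        # replaces whatever an earlier write stored.
        self.rows[username] = dict(columns, username=username)

    def insert_if_not_exists(self, username, **columns):
        # Conditional INSERT: applied only when no row exists yet.
        # Returns (applied, existing_row), echoing the [applied] column.
        existing = self.rows.get(username)
        if existing is not None:
            return False, existing
        self.rows[username] = dict(columns, username=username)
        return True, None


# Race condition: both clients saw 0 rows, both insert, the later one wins.
plain = ToyUsers()
plain.insert("pmcfadin", password="ba27e03fd9...")
plain.insert("pmcfadin", password="ea24e13ad9...")

# Lightweight transaction: the second insert is rejected and told why.
guarded = ToyUsers()
first = guarded.insert_if_not_exists("pmcfadin", password="ba27e03fd9...")
second = guarded.insert_if_not_exists("pmcfadin", password="ea24e13ad9...")
```

In the plain case the stored password is the second client's. In the guarded case the first insert is applied, and the second comes back as not applied together with the row that already exists.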
  25. Paxos

    • All operations are quorum-based
    • Each replica sends information about unfinished operations to the leader during prepare
    • Paxos Made Simple
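
"Quorum-based" means each step needs a majority of the replicas. A small Python sketch of the arithmetic, with `quorum` and `tolerated_failures` as illustrative helper names:

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


# (quorum size, tolerated failures) for a few replication factors.
sizes = {rf: (quorum(rf), tolerated_failures(rf)) for rf in (1, 3, 5)}
```

With three replicas a quorum is two, so one replica can be unavailable; with five replicas a quorum is three, so two can be.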
  26. Details

    • 4 round trips vs 1 for normal updates
    • Paxos state is durable
    • Immediate consistency with no leader election or failover
    • ConsistencyLevel.SERIAL
    • http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
  27. Use with caution

    • Great for 1% of your application
    • Eventual consistency is your friend
    • http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
  28. Connecting

    from cassandra.cluster import Cluster

    # Contact points used to discover the rest of the cluster
    cluster = Cluster(['192.168.1.1', '192.168.1.2'])
    session = cluster.connect()
    rows = session.execute("SELECT * FROM users.accounts")
  29. Prepared Statements

    # Prepare once, then execute many times with different parameters
    fetch_user = session.prepare("SELECT * FROM accounts WHERE id=?")
    for user_id in users_to_fetch:
        user = session.execute(fetch_user, [user_id])
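  30. Async Queries

    fetch = session.prepare("SELECT * FROM accounts WHERE id=?")
    futures = []
    for user_id in users_to_fetch:
        # execute_async returns immediately with a future
        future = session.execute_async(fetch, [user_id])
        futures.append(future)
    # result() blocks until the corresponding query has finished
    users = [f.result()[0] for f in futures]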
  31. Async Queries

    import logging

    logger = logging.getLogger(__name__)

    def process_user(query_results):
        # do some work
        ...

    def log_error(query_exception):
        logger.error("Bad stuff: %s", query_exception)

    future = session.execute_async("SELECT username, id FROM users.accounts")
    # First callback runs on success, second on failure
    future.add_callbacks(process_user, log_error)
  32. Tracing

    • Breakdown of how long each step of the query took
    • Trace covers coordinator and replica nodes
  33. Tracing

    from cassandra.query import SimpleStatement

    query = SimpleStatement("SELECT * FROM accounts WHERE id=%s")
    session.execute(query, [user_id], trace=True)
    print(query.trace)
  34. Followers

    CREATE TABLE followers (
      username text,
      follower text,
      since timestamp,
      PRIMARY KEY (username, follower)
    )
  35. Followers

    CREATE TABLE followers (
      username text,
      follower text,
      since timestamp,
      PRIMARY KEY (username, follower)
    )

    • Partition by username
    • Cluster by follower
    • SELECT follower FROM followers WHERE username=?
  36. Tweets

    CREATE TABLE tweets_by_user (
      username text,
      tweet_id timeuuid,
      body text,
      PRIMARY KEY (username, tweet_id)
    ) WITH CLUSTERING ORDER BY (tweet_id DESC)
  37. Tweets

    CREATE TABLE tweets_by_user (
      username text,
      tweet_id timeuuid,
      body text,
      PRIMARY KEY (username, tweet_id)
    ) WITH CLUSTERING ORDER BY (tweet_id DESC)

    • Partition by user
    • Cluster by timeuuid (timestamp)
    • SELECT * FROM tweets_by_user WHERE username=? LIMIT 100
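
The effect of partitioning by user and clustering by tweet_id in descending order can be modelled with a dictionary of sorted lists. This is a toy Python sketch, assuming plain integers in place of timeuuid values. `ToyTweetsByUser` is a hypothetical class, not part of the driver.

```python
import bisect
from collections import defaultdict


class ToyTweetsByUser:
    """Toy model: one partition per username, rows kept in tweet_id DESC order."""

    def __init__(self):
        # username -> sorted list of (-tweet_id, body)
        self.partitions = defaultdict(list)

    def insert(self, username, tweet_id, body):
        # Store the negated id so ascending list order means newest first.
        bisect.insort(self.partitions[username], (-tweet_id, body))

    def select(self, username, limit=100):
        # Reads one partition and takes a prefix; no sorting at read time.
        rows = self.partitions.get(username, [])[:limit]
        return [(-negated_id, body) for negated_id, body in rows]


table = ToyTweetsByUser()
table.insert("jim", 1, "first")
table.insert("jim", 3, "third")
table.insert("jim", 2, "second")
table.insert("carol", 5, "hello")
latest = table.select("jim", limit=2)
```

Because rows are kept in order as they are written, the LIMIT query is just a slice from the front of one partition.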
  38. Tweets by Followed

    CREATE TABLE tweets_by_followed (
      username text,
      tweet_id timeuuid,
      author_username text,
      body text,
      PRIMARY KEY (username, tweet_id)
    ) WITH CLUSTERING ORDER BY (tweet_id DESC)
  39. Tweets by Followed

    CREATE TABLE tweets_by_followed (
      username text,
      tweet_id timeuuid,
      author_username text,
      body text,
      PRIMARY KEY (username, tweet_id)
    ) WITH CLUSTERING ORDER BY (tweet_id DESC)

    • Denormalize, append to each follower’s timeline
    • Extra writes, cheaper reads
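
The trade-off on this slide, extra writes in exchange for cheaper reads, can be sketched as fan-out on write. The Python below is an in-memory illustration only. It assumes integer tweet ids posted in increasing order, and the helpers `follow`, `post_tweet`, and `timeline` are hypothetical names standing in for the corresponding INSERT and SELECT statements.

```python
from collections import defaultdict

followers = defaultdict(set)             # username -> set of followers
tweets_by_user = defaultdict(list)       # username -> [(tweet_id, body)], newest first
tweets_by_followed = defaultdict(list)   # username -> [(tweet_id, author, body)], newest first


def follow(follower, username):
    followers[username].add(follower)


def post_tweet(author, tweet_id, body):
    """Write once to the author's own table, then once per follower's timeline."""
    tweets_by_user[author].insert(0, (tweet_id, body))
    for follower in followers[author]:
        tweets_by_followed[follower].insert(0, (tweet_id, author, body))
    return 1 + len(followers[author])  # number of writes performed


def timeline(username, limit=100):
    """Reading a timeline touches a single partition, with no joins."""
    return tweets_by_followed[username][:limit]


follow("carol", "jim")
follow("suzy", "jim")
writes = post_tweet("jim", 1, "hello")
post_tweet("jim", 2, "again")
carol_timeline = timeline("carol")
```

One tweet from jim costs three writes here (his own row plus one per follower), while carol's timeline read is a single lookup.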