Slide 1

Slide 1 text

Cassandra + Python
Tyler Hobbs
Cassandra committer, drivers eng at DataStax
[email protected] @tylhobbs
©2014 DataStax

Slide 2

Slide 2 text

Five years of Cassandra
Timeline: Jul-08, Jul-09, May-10, Feb-11, Dec-11, Oct-12, Jul-13
Releases: 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ... 2.0; DSE

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Application/Use Case
• Social Signals: like/want/own features for eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases
Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support
ACE

Slide 5

Slide 5 text

Time series data

Slide 6

Slide 6 text

Multi-datacenter support

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Distributed counters

Slide 9

Slide 9 text

Hadoop support

Slide 10

Slide 10 text

Application/Use Case
• Adobe AudienceManager: web analytics, content management, and online advertising
Why Cassandra?
• Low-latency
• Scalable
• Multi-datacenter
• Tuneable consistency
ACE

Slide 11

Slide 11 text

Low latency

Slide 12

Slide 12 text

Bootstrapping

Slide 14

Slide 14 text

Bootstrapping
(Diagram: four pairs of nodes labeled s and d)

Slide 17

Slide 17 text

Tuneable consistency
• (We’ll come back to this)

Slide 18

Slide 18 text

Application/Use Case
• Logging
• Notifications
Why Cassandra?
• Efficient writes
• Durable
• Scalable
• High availability
ACE

Slide 19

Slide 19 text

Durable + efficient writes
write(k1, c1:v1)
Memory: Memtable
Hard drive: Commit log

Slide 20

Slide 20 text

write(k1, c1:v)
Memory (Memtable): k1 c1:v
Hard drive (Commit log): k1 c1:v

Slide 21

Slide 21 text

write(k1, c2:v)
Memory (Memtable): k1 c1:v c2:v
Hard drive (Commit log): k1 c1:v | k1 c2:v

Slide 22

Slide 22 text

write(k2, c1:v c2:v)
Memory (Memtable): k1 c1:v c2:v | k2 c1:v c2:v
Hard drive (Commit log): k1 c1:v | k1 c2:v | k2 c1:v c2:v

Slide 23

Slide 23 text

write(k1, c1:v c3:v)
Memory (Memtable): k1 c1:v c2:v c3:v | k2 c1:v c2:v
Hard drive (Commit log): k1 c1:v | k1 c2:v | k2 c1:v c2:v | k1 c1:v c3:v

Slide 24

Slide 24 text

flush
Memory: Memtable flushed to disk, then cleanup
Hard drive (SSTable): k1 c1:v c2:v c3:v | k2 c1:v c2:v
index / BF
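The write path walked through on the preceding slides can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: the ToyStore class and its attribute names are hypothetical, the commit log is a plain list rather than a file, and the index and Bloom filter are left out.

```python
import json


class ToyStore:
    """Toy write path: append to commit log, update memtable, flush to SSTable."""

    def __init__(self):
        self.commit_log = []  # stands in for the append-only log on disk
        self.memtable = {}    # in memory: row key -> {column: value}
        self.sstables = []    # immutable, sorted snapshots "on disk"

    def write(self, key, columns):
        # 1. Append to the commit log first so the write is durable.
        self.commit_log.append(json.dumps({"key": key, "columns": columns}))
        # 2. Merge the new columns into the row held in the memtable.
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # 3. Write the memtable out as a sorted, immutable SSTable.
        sstable = tuple(
            (key, tuple(sorted(cols.items())))
            for key, cols in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        # 4. Cleanup: flushed data no longer needs the memtable or the log.
        self.memtable = {}
        self.commit_log = []


store = ToyStore()
store.write("k1", {"c1": "v"})
store.write("k1", {"c2": "v"})
store.write("k2", {"c1": "v", "c2": "v"})
store.write("k1", {"c1": "v", "c3": "v"})
log_entries_before_flush = len(store.commit_log)
store.flush()
```

Every write costs one sequential append plus an in-memory update, which is why writes are both durable and cheap; the sort happens once, at flush time.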

Slide 25

Slide 25 text

High availability
• 99.9999% availability on Cassandra
• (We’ll come back to this, too)

Slide 26

Slide 26 text

Core values
• Massive scalability
• High performance
• Ease of use
• Reliability/
Cassandra HBase Redis MySQL

Slide 27

Slide 27 text

VLDB benchmark (RWS)
(Chart: throughput in ops/sec, 0 to 80000, against number of nodes, 0 to 12, for Cassandra, HBase, Redis, and MySQL)

Slide 28

Slide 28 text

Endpoint benchmark (RW)
(Chart: throughput in ops/sec, 0 to 35000, against number of nodes, 1 to 32, for Cassandra, HBase, and MongoDB)

Slide 29

Slide 29 text

Ease of use
CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date int
);
CREATE INDEX ON users(state);
SELECT * FROM users
WHERE state='Texas' AND birth_date > 1950;

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Classic partitioning (SPOF)
(Diagram: client, router, partition 1, partition 2, partition 3, partition 4)

Slide 32

Slide 32 text

(Not a theoretical problem)
https://speakerdeck.com/mitsuhiko/a-year-of-mongodb
http://aphyr.com/posts/288-the-network-is-reliable

Slide 33

Slide 33 text

Fully distributed, no SPOF
(Diagram: Client; nodes labeled p1, p1, p1, p3, p6)

Slide 34

Slide 34 text

Partitioning
Primary key determines placement*
jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12  gender: M
suzy    age: 10  gender: F

Slide 35

Slide 35 text

PK      Murmur Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
Murmur* hash operation yields a 64-bit number for keys of any size.

Slide 36

Slide 36 text

The “token ring”
Node A, Node B, Node C, Node D

Slide 37

Slide 37 text

jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...

Node  Start            End
A     0xc000000000..1  0x0000000000..0
B     0x0000000000..1  0x4000000000..0
C     0x4000000000..1  0x8000000000..0
D     0x8000000000..1  0xc000000000..0
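Looking up the owner of a key in the range table above amounts to a binary search over the range end tokens. A minimal sketch, with several assumptions: tokens are treated as unsigned 64-bit integers, the elided range bounds are expanded to 16 hex digits, and the truncated hashes from the slide are zero-padded by a hypothetical pad() helper purely for illustration (the real tokens have more digits).

```python
from bisect import bisect_left

# Inclusive end token of each node's range, in ring order.
RING = [
    (0x0000000000000000, "A"),
    (0x4000000000000000, "B"),
    (0x8000000000000000, "C"),
    (0xC000000000000000, "D"),
]
END_TOKENS = [end for end, _ in RING]


def owner(token):
    """Return the node whose (start, end] range contains the token."""
    index = bisect_left(END_TOKENS, token)
    # Tokens past the last end token wrap around to the first node.
    return RING[index % len(RING)][1]


def pad(prefix_hex):
    """Hypothetical helper: zero-pad a truncated hex token to 16 digits."""
    return int(prefix_hex.ljust(16, "0"), 16)


placement = {
    name: owner(pad(prefix))
    for name, prefix in [
        ("jim", "5e02739678"),
        ("carol", "a9a0198010"),
        ("johnny", "f4eb27cea7"),
        ("suzy", "78b421309e"),
    ]
}
```

Under these assumptions jim and suzy land on node C, carol on node D, and johnny wraps around to node A.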

Slide 42

Slide 42 text

Replication
carol   a9a0198010...
Node A, Node B, Node C, Node D
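One simple way to picture replica placement is to start at the node that owns the token and keep walking clockwise around the ring. The sketch below assumes that simple scheme (no rack or datacenter awareness) and a replication factor of 3, neither of which is stated on the slide; the replicas() function is hypothetical.

```python
RING_ORDER = ["A", "B", "C", "D"]  # nodes in clockwise token order


def replicas(primary, replication_factor):
    """Primary replica plus the next nodes clockwise around the ring."""
    start = RING_ORDER.index(primary)
    count = min(replication_factor, len(RING_ORDER))
    return [RING_ORDER[(start + i) % len(RING_ORDER)] for i in range(count)]


# carol's token (a9a0198010...) falls in node D's range.
carol_replicas = replicas("D", 3)
```

With four nodes and three replicas, carol's row would live on D, A, and B in this model.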

Slide 45

Slide 45 text

Virtual nodes
Without vnodes: Node A, Node B, Node C, Node D
With vnodes: C’’ A’’ D’ C’ A’ D A B’ C B
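The difference between the two rings can be sketched by giving each node several randomly placed tokens instead of one. This is an illustration only: build_ring() and ranges_per_node() are hypothetical names, tokens are unsigned 64-bit integers, and the fixed seed just keeps the sketch repeatable.

```python
import random
from collections import Counter

NODES = ["A", "B", "C", "D"]


def build_ring(tokens_per_node, seed=0):
    """Give each node `tokens_per_node` random tokens; return the sorted ring."""
    rng = random.Random(seed)
    ring = {}
    for node in NODES:
        assigned = 0
        while assigned < tokens_per_node:
            token = rng.getrandbits(64)
            if token not in ring:  # skip the (unlikely) duplicate token
                ring[token] = node
                assigned += 1
    return sorted(ring.items())


def ranges_per_node(ring):
    """Count how many token ranges each node owns."""
    return dict(Counter(node for _, node in ring))


without_vnodes = build_ring(tokens_per_node=1)
with_vnodes = build_ring(tokens_per_node=8)
```

With many small ranges per node, a joining or leaving node exchanges data with many peers rather than only its ring neighbors.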

Slide 46

Slide 46 text

A closer look at reads
(Diagram: Client, Coordinator, and three nodes at 40% busy, 90% busy, 30% busy)

Slide 51

Slide 51 text

Consistency levels
(Diagram: Client, Coordinator, and three nodes at 40% busy, 90% busy, 30% busy)

Slide 56

Slide 56 text

Consistency levels
• ONE
• QUORUM
• LOCAL_QUORUM
• EACH_QUORUM
• ALL
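Each level translates into a number of replica responses the coordinator waits for, where a quorum is a majority of the replicas involved. A small sketch of that arithmetic; the required_acks() function and the two-datacenter keyspace with replication factor 3 in each are assumptions made for the example.

```python
def quorum(replica_count):
    """Majority of a replica set."""
    return replica_count // 2 + 1


def required_acks(level, rf_by_dc, local_dc):
    """Replica responses the coordinator waits for at a consistency level."""
    total = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(total)
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        # A quorum in every datacenter; this is the total across all of them.
        return sum(quorum(rf) for rf in rf_by_dc.values())
    if level == "ALL":
        return total
    raise ValueError("unknown consistency level: %s" % level)


rf = {"dc1": 3, "dc2": 3}  # hypothetical keyspace: RF 3 in each datacenter
acks = {
    level: required_acks(level, rf, local_dc="dc1")
    for level in ["ONE", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"]
}
```

When the acknowledgements required for a write plus those required for a read exceed the replication factor, every read overlaps the latest write on at least one replica.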

Slide 57

Slide 57 text

#CASSANDRAEU
Race condition
SELECT name FROM users WHERE username = 'pmcfadin';

Slide 58

Slide 58 text

Race condition
Client 1:
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)
Client 2:
SELECT name FROM users WHERE username = 'pmcfadin';

Slide 59

Slide 59 text

Race condition
Client 1:
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)
INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00');
Client 2:
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)

Slide 60

Slide 60 text

Race condition
Client 1:
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)
INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00');
Client 2:
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)
INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ea24e13ad9...', '2011-06-20 13:50:01');

Slide 61

Slide 61 text

Race condition
Client 1:
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)
INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00');
Client 2:
SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)
INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ea24e13ad9...', '2011-06-20 13:50:01');
This one wins

Slide 62

Slide 62 text

Lightweight transactions
INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

Slide 63

Slide 63 text

Lightweight transactions
INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------

INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ea24e13ad9...', '2011-06-20 13:50:01')
IF NOT EXISTS;

Slide 64

Slide 64 text

Lightweight transactions
INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------

INSERT INTO users (username, name, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', 'ea24e13ad9...', '2011-06-20 13:50:01')
IF NOT EXISTS;

 [applied] | username | created_date   | name
-----------+----------+----------------+-----------------
     False | pmcfadin | 2011-06-20 ... | Patrick McFadin
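The contrast between the racy read-then-insert and the conditional insert can be simulated in memory. This is only a sketch of the semantics: the Users class is hypothetical, and a local lock stands in for the Paxos round that Cassandra actually uses to make the check and the write a single step.

```python
import threading


class Users:
    """In-memory stand-in for the users table, keyed by username."""

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()  # stand-in for Paxos, not how Cassandra does it

    def select(self, username):
        return self.rows.get(username)

    def insert(self, username, row):
        # Plain insert is an upsert: the last write silently wins.
        self.rows[username] = row

    def insert_if_not_exists(self, username, row):
        # Check and write as one atomic step; returns (applied, existing_row).
        with self._lock:
            existing = self.rows.get(username)
            if existing is not None:
                return False, existing
            self.rows[username] = row
            return True, None


# Racy version: both clients read before either one writes.
racy = Users()
first_sees = racy.select("pmcfadin")
second_sees = racy.select("pmcfadin")
racy.insert("pmcfadin", {"password": "ba27e03fd9..."})
racy.insert("pmcfadin", {"password": "ea24e13ad9..."})

# Conditional version: the second insert is rejected and sees the existing row.
safe = Users()
first = safe.insert_if_not_exists("pmcfadin", {"password": "ba27e03fd9..."})
second = safe.insert_if_not_exists("pmcfadin", {"password": "ea24e13ad9..."})
```

In the racy run both clients believe they created the account and the first password is overwritten; in the conditional run the second client gets applied = False along with the row that beat it.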

Slide 65

Slide 65 text

Paxos
• All operations are quorum-based
• Each replica sends information about unfinished operations to the leader during prepare
• Paxos Made Simple

Slide 66

Slide 66 text

Details
• 4 round trips vs 1 for normal updates
• Paxos state is durable
• Immediate consistency with no leader election or failover
• ConsistencyLevel.SERIAL
• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

Slide 67

Slide 67 text

Use with caution
• Great for 1% of your application
• Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis

Slide 68

Slide 68 text

Cassandra Questions?

Slide 69

Slide 69 text

Python Driver
• github.com/datastax/python-driver
• pip install cassandra-driver
• Apache License

Slide 70

Slide 70 text

Connecting
from cassandra.cluster import Cluster

cluster = Cluster(['192.168.1.1', '192.168.1.2'])
session = cluster.connect()

rows = session.execute(
    "SELECT * FROM users.accounts")

Slide 71

Slide 71 text

Connecting
cluster = Cluster(['192.168.1.1', '192.168.1.2'])
Provide contact points, and the rest of the cluster is automatically discovered.

Slide 72

Slide 72 text

Executing Queries
rows = session.execute(
    "SELECT username, id FROM users.accounts")

# Rows expose columns as attributes.
for row in rows:
    print(row.username, row.id)

Slide 73

Slide 73 text

Executing Queries
rows = session.execute(
    "SELECT username, id FROM users.accounts")

# Rows can also be unpacked like tuples.
for username, id in rows:
    print(username, id)

Slide 74

Slide 74 text

Executing Queries
rows = session.execute("""
    INSERT INTO accounts (id, username)
    VALUES (%s, %s)
    """, (123, "j'doe"))

Slide 75

Slide 75 text

Prepared Statements
fetch_user = session.prepare(
    "SELECT * FROM accounts WHERE id=?")

for id in users_to_fetch:
    user = session.execute(fetch_user, [id])

Slide 76

Slide 76 text

Async Queries
future = session.execute_async(
    "SELECT username, id FROM users.accounts")

# do some other work...

rows = future.result()

Slide 77

Slide 77 text

Async Queries
fetch = session.prepare(
    "SELECT * FROM accounts WHERE id=?")

futures = []
for id in users_to_fetch:
    future = session.execute_async(fetch, [id])
    futures.append(future)

users = [f.result()[0] for f in futures]

Slide 78

Slide 78 text

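Async Queries
import logging

logger = logging.getLogger(__name__)

def process_user(query_results):
    # do some work ...
    pass

def log_error(query_exception):
    # Pass the exception as a format argument so it appears in the message.
    logger.error("Bad stuff: %s", query_exception)

future = session.execute_async(
    "SELECT username, id FROM users.accounts")
future.add_callbacks(process_user, log_error)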

Slide 79

Slide 79 text

Tracing
• Breakdown of how long each step of the query took
• Trace covers coordinator and replica nodes

Slide 80

Slide 80 text

Tracing
from cassandra.query import SimpleStatement

query = SimpleStatement(
    "SELECT * FROM accounts WHERE id=%s")
session.execute(query, [user_id], trace=True)
print(query.trace)

Slide 81

Slide 81 text

Python Driver Questions?

Slide 82

Slide 82 text

Twissandra
• Twitter clone using Cassandra
• github.com/twissandra/twissandra
• ‘cql3’ branch

Slide 83

Slide 83 text

Followers
CREATE TABLE followers (
    username text,
    follower text,
    since timestamp,
    PRIMARY KEY (username, follower)
)

Slide 84

Slide 84 text

Followers
CREATE TABLE followers (
    username text,
    follower text,
    since timestamp,
    PRIMARY KEY (username, follower)
)
• Partition by username
• Cluster by follower
• SELECT follower FROM followers WHERE username=?

Slide 85

Slide 85 text

Tweets
CREATE TABLE tweets_by_user (
    username text,
    tweet_id timeuuid,
    body text,
    PRIMARY KEY (username, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC)

Slide 86

Slide 86 text

Tweets
CREATE TABLE tweets_by_user (
    username text,
    tweet_id timeuuid,
    body text,
    PRIMARY KEY (username, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC)
• Partition by user
• Cluster by timeuuid (timestamp)
• SELECT * FROM tweets_by_user WHERE username=? LIMIT 100
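The effect of partitioning by user and clustering by tweet_id in descending order can be modeled with a dictionary of sorted lists. This is a rough in-memory analogy, not driver code: add_tweet() and latest_tweets() are hypothetical, and plain integers stand in for timeuuid values.

```python
from collections import defaultdict

# partition key (username) -> list of (tweet_id, body) rows
tweets_by_user = defaultdict(list)


def add_tweet(username, tweet_id, body):
    rows = tweets_by_user[username]
    rows.append((tweet_id, body))
    # Keep the partition ordered newest first, like CLUSTERING ORDER BY DESC.
    rows.sort(key=lambda row: row[0], reverse=True)


def latest_tweets(username, limit=100):
    # One partition, already in order: the read is a simple prefix slice.
    return tweets_by_user[username][:limit]


add_tweet("alice", 1, "first")
add_tweet("alice", 3, "third")
add_tweet("alice", 2, "second")
newest_two = latest_tweets("alice", limit=2)
```

Because rows are stored in the order the query wants them, fetching the latest 100 tweets touches one partition and needs no sorting at read time.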

Slide 87

Slide 87 text

Tweets by Followed
CREATE TABLE tweets_by_followed (
    username text,
    tweet_id timeuuid,
    author_username text,
    body text,
    PRIMARY KEY (username, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC)

Slide 88

Slide 88 text

Tweets by Followed
CREATE TABLE tweets_by_followed (
    username text,
    tweet_id timeuuid,
    author_username text,
    body text,
    PRIMARY KEY (username, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC)
• Denormalize, append to each follower’s timeline
• Extra writes, cheaper reads
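The fan-out-on-write trade-off can be sketched with two in-memory tables. The helper names follow(), post_tweet(), and timeline() are hypothetical (not taken from Twissandra), integers stand in for timeuuid values, and tweets are assumed to arrive in increasing tweet_id order.

```python
from collections import defaultdict

followers = defaultdict(set)             # username -> set of follower names
tweets_by_followed = defaultdict(list)   # username -> [(tweet_id, author, body)]


def follow(follower, username):
    followers[username].add(follower)


def post_tweet(author, tweet_id, body):
    # Extra writes: one copy of the tweet per follower timeline.
    for follower in followers[author]:
        # Newest first, assuming tweet ids only ever increase.
        tweets_by_followed[follower].insert(0, (tweet_id, author, body))


def timeline(username, limit=100):
    # Cheaper reads: a single-partition lookup, no gathering across authors.
    return tweets_by_followed[username][:limit]


follow("bob", "alice")
follow("carol", "alice")
post_tweet("alice", 1, "hi")
post_tweet("alice", 2, "again")
bob_timeline = timeline("bob")
```

Posting costs one write per follower, but each timeline read stays a single ordered slice however many accounts the user follows.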