
Cassandra. Bass. Databass.

αλεx π
April 16, 2013

Transcript

  1. Consistent? Y U NO?
     Better than available? Also, what's your slave replication lag?
     Also, are your caches, search index, and all denormalizations consistent at all times?
     Is EVERYTHING you do happening with the user watching it?
  2. JOIN / UNION? Y U NO?
     Yeah, across a multi-terabyte cluster.
     Also: availability? Cascading updates? Read performance?
  3. Transactions? Y U NO?
     Sure, just wait a couple of minutes until I roll back the previous one.
     Or go buy Oracle. NOW!
  4. World expectations
     • Amount of data is growing
     • Frequency/rate of incoming data is growing
     • Demand for scalable services is growing
     • Application backends get larger
     • Data loss and corruption is unacceptable
     • Variety of data is growing
  5. World expectations
     • Building scalable backends is hard, no matter which tech you use
     • Lack of tool knowledge will bite you
     • Not knowing your numbers makes your choices blind
     • Eventual consistency is eventual
     • Conflict resolution strategies matter
     • Picking a nosql data store will require you to write more backend code
  6. Challenges
     • Hundreds, thousands of requests per second
     • Write (and read) heavy applications
     • How to make sense of data
     • Eliminate single points of failure
     • Predictable latency
  7. Challenges
     • Predictable latency
     • Horizontal scaling
     • Continuous availability
     • Reducing system complexity
     • Avoiding data loss (accepting writes)
     • Storing data in a meaningful way
     • Preparing data for reading
  8. Things to ponder before joining the nosql camp
     • Application data model
     • Read and write patterns
     • What can be shown immediately, what can wait (think: dashboards vs shopping carts)
     • How to build a backend that'll make sure it all keeps ticking
     • Learn how your clustered client works
     • Know your potential problems with consistency
  9. Possibilities
     • Will vary depending on:
       • the size of your DB
       • load on the database
       • frequency of written data
  10. Possibilities
      • keep “ultra-hot”, rather small datasets in memory
      • make it possible to reconstruct your data
      • write increments (changes)
      • reconstruct final views from history
      • for data that doesn't tolerate inconsistency, make sure to use appropriate data structures
      • avoid a single point of failure at all costs
  11. Use Cases
      • Session store
      • Time series, logging
      • Business intelligence backend
      • Caching/speed layer (denormalized, read-ready)
      • Round-robin/capped collections
  12. Cassandra, under the hood
      • Consistent hashing
      • Virtual nodes
      • SSTables
      • Partitioners
      • Hinted handoff
      • Node/data balancing
      • Gossip protocol
      • Conflict resolution
  13. Under The Hood / Partitioners
      • determine how data is distributed across nodes
      • as the number of nodes grows, a minimal amount of data is affected
      • Murmur3, Random (consistent hashing)
      • Ordered Partitioner
        • allows ordered scans by partition key
        • difficult load balancing
        • hotspots because of sequential writes
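      With the Murmur3/Random partitioners, rows are placed by the token of their
      partition key, so range scans run in token order rather than key order. A
      minimal CQL sketch over the posts table that appears later in this deck:

        SELECT userid, entry_title
        FROM posts
        WHERE token(userid) > token('user1')  -- scans the token ring, not raw keys
        LIMIT 100;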
  14. Under The Hood / VNodes
      • many token ranges per node
      • each node owns small ranges, which reduces load
      • does not require token generation and assignment
      • helps to avoid rebalancing
      • rebuilding a dead node is faster, since data is copied from more nodes
      • major improvements for heterogeneous clusters
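      Vnodes are switched on per node in cassandra.yaml (available since 1.2);
      a minimal excerpt, value illustrative:

        # cassandra.yaml
        num_tokens: 256   # many small ranges per node; leave initial_token unset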
  15. Under The Hood / CommitLog
      • crash recovery log for data
      • by default, mode is `periodic`
      • move commitlog to a different drive (to reduce contention with SSTables)
      • configure commitlog size
        • smaller will improve replay time
        • flushes infrequently-updated CFs to disk
      • (slight) chance of losing written data, if all replicas go down during the sync timeout period
      • check `batch` instead, to sync before ack'ing the write
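      The knobs above live in cassandra.yaml; an excerpt with illustrative values:

        commitlog_directory: /var/lib/cassandra/commitlog  # ideally its own drive
        commitlog_sync: periodic             # ack first, fsync every period
        commitlog_sync_period_in_ms: 10000
        # commitlog_sync: batch              # fsync before ack'ing writes
        # commitlog_sync_batch_window_in_ms: 50
        commitlog_total_space_in_mb: 4096    # on overflow, coldest CFs get flushed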
  16. Under The Hood / Memtable
      • in-memory cache with key/column structure
      • per ColumnFamily
      • sorted by key and written sequentially
      • flushed to an SSTable when full, in the background
  17. Under The Hood / SSTable
      • Sorted String Table
      • Immutable
      • has a BloomFilter, for read optimization
      • reads (potentially) require combining row fragments from multiple SSTables and unflushed memtables
  18. Under The Hood / SSTable
      • has an index (data location by key)
      • holds the data itself
      • multiple per CF
      • merged after a threshold is reached
  19. Under The Hood / SSTable / Compaction
      • merges SSTables
      • merges changes for the same key across different SSTables
      • discards tombstones, reclaims space
      • refreshes the SSTable index (changed addresses)
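      The compaction strategy is configurable per table in CQL3; a sketch with
      illustrative parameters:

        ALTER TABLE posts
        WITH compaction = { 'class' : 'SizeTieredCompactionStrategy',
                            'min_threshold' : 4 }  -- merge once 4 similar-sized SSTables exist
        AND gc_grace_seconds = 864000;             -- how long tombstones stick around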
  20. Under The Hood / Hinted Handoff
      • by default, sloppy quorum is off
      • toggled by the coordinator when a target node is unavailable
      • the hint is replayed to the node when it comes back
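      Hinted handoff is controlled from cassandra.yaml; an excerpt with
      illustrative values:

        hinted_handoff_enabled: true
        max_hint_window_in_ms: 10800000   # stop storing hints after 3h of downtime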
  21. Under The Hood / Read Path / On Coordinator
      • client connects to a coordinator node
      • technically, any node you connect to may be a coordinator
      • coordinator determines the nodes responsible for the key
      • responsible nodes are sorted by proximity
  22. Under The Hood / Read Path / On the node
      • node checks the memtable for data
      • if not found, data is read from SSTables
  23. Under The Hood / Read Path / On the node / Bloom filters
      • using bloom filters
        • false positives are possible, but not false negatives
      • determine if an SSTable contains the key, avoiding additional disk seeks
      • uses the index to locate data fast
      • data is returned to the client
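      The bloom filter false-positive rate is tunable per table; lower values
      trade memory for fewer wasted SSTable reads. A CQL3 sketch, value illustrative:

        ALTER TABLE posts
        WITH bloom_filter_fp_chance = 0.01;  -- ~1% false positives, larger filter in RAM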
  24. Under The Hood / Write Path
      • pretty much the same as the read path
      • data is first written to the CommitLog
      • then, data is written to the Memtable
      • if there aren't enough nodes to receive a write, the coordinator takes a hinted write
  25. Under The Hood / Conflict Resolution
      • Riak guys like to mention the lack of vector clocks
      • Yeah, it's timestamp-based
      • Grab yourself some ntp
      • or a TAAS (timestamp as a service)
      • but keep in mind that...
      • writes are column-based, not row-based (see the next slides)
  26. Under The Hood / Conflict Resolution / Column-based writes

      CREATE KEYSPACE new_cql_keyspace ...

      CREATE TABLE posts (
        content     text,
        entry_title text,
        userid      text,
        PRIMARY KEY (userid));

      INSERT INTO posts (userid, entry_title, content)
      VALUES ('user1', 'title1', 'content1')
      USING TIMESTAMP 1365885294994;
  27. Under The Hood / Conflict Resolution / Column-based writes

       userid | content  | entry_title
      --------+----------+-------------
       user1  | content1 | title1
  28. Under The Hood / Conflict Resolution / Column-based writes

      UPDATE posts USING TIMESTAMP 1365885294995
      SET entry_title = 'new title'
      WHERE userid = 'user1';
  29. Under The Hood / Conflict Resolution / Column-based writes

      get posts[user1];
      => (column=content,     value=636f6e74656e7431,   timestamp=1365885294994) ;; OLD
      => (column=entry_title, value=6e6577207469746c65, timestamp=1365885294995) ;; NEW
  30. My favorite C* features / Data Types
      • There are actual data types, yes!
      • ASCII, String
      • Boolean
      • all kinds of numbers
      • Binary type
      • Date type
      • Counters
  31. My favorite C* features / Data Types
      • CompositeType (Int32Type, DateType)
      • Sets #{1, 2, 3}
      • Lists [1, 2, 3]
      • Maps {:a “value1” :b “value2”}
      • Downside: no indexes on collections
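      CQL3 collection columns, as a minimal sketch (the users table and its
      columns are hypothetical):

        CREATE TABLE users (
          userid   text PRIMARY KEY,
          emails   set<text>,
          top_tags list<text>,
          settings map<text, text>
        );

        UPDATE users
        SET emails = emails + {'alex@example.com'},  -- add to a set
            settings['lang'] = 'en'                  -- set a single map entry
        WHERE userid = 'user1';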
  32. My favorite C* features / Keys
      • Compound primary keys
        • consist of more than one column
        • allow a wide range of queries
      • Composite partition keys
        • the key itself consists of 2 values
      • Clustering order
        • allows on-disk sorting of columns
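      A sketch of a compound primary key with an explicit clustering order (the
      events table here is hypothetical):

        CREATE TABLE events (
          sensor_id text,        -- partition key: decides node placement
          ts        timestamp,   -- clustering column: on-disk sort order
          value     double,
          PRIMARY KEY (sensor_id, ts)
        ) WITH CLUSTERING ORDER BY (ts DESC);  -- newest first within a partition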
  33. My favorite C* features / Keys
      • Partition key and Clustering columns
        • the first key is the partition key
        • it determines node placement
        • the rest are clustering columns
      • insert, update, delete ops for the same partition key are atomic and isolated
      • values that share a partition key are stored on the same node(s)
  34. My favorite C* features / Querying
      • Work best when you index data properly
      • Range queries
      • “IN” queries
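      Both query shapes, against the hypothetical events table sketched above:

        -- range query within a single partition, served in clustering order
        SELECT ts, value FROM events
        WHERE sensor_id = 'sensor1'
          AND ts >= '2013-04-01' AND ts < '2013-04-16';

        -- IN query over several partition keys
        SELECT ts, value FROM events
        WHERE sensor_id IN ('sensor1', 'sensor2');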
  35. My favorite C* features / MapReduce
      • Not quite built-in
      • But Cassandra provides nice Hadoop integration
      • We use Cascading/Cascalog & Hadoop
  36. My favorite C* features / TTL
      • Useful for caches
      • Specify how long your data lives
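      TTL in CQL3, reusing the posts table from earlier; the numbers are
      illustrative:

        -- this row expires (turns into a tombstone) after one day
        INSERT INTO posts (userid, entry_title, content)
        VALUES ('user2', 'title2', 'content2')
        USING TTL 86400;

        -- remaining lifetime of a column, in seconds
        SELECT TTL(content) FROM posts WHERE userid = 'user2';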
  37. LIMIT
      Remember those “How do I paginate in Cassandra” StackOverflow questions in the early days?
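      A token-based pagination sketch in CQL3; 'user99' stands in for the last
      key seen on the previous page:

        -- first page
        SELECT userid, entry_title FROM posts LIMIT 100;

        -- next page: continue past the last partition key we saw
        SELECT userid, entry_title FROM posts
        WHERE token(userid) > token('user99')
        LIMIT 100;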
  38. My favorite C* features / Batch Operations
      • Saves network round-trips
      • Updates in a BATCH on the same partition key are performed atomically and in isolation
      • Supports only UPDATE, INSERT and DELETE
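      A minimal CQL3 batch against the posts table; every statement shares the
      partition key 'user1', so the whole batch applies atomically and in isolation:

        BEGIN BATCH
          INSERT INTO posts (userid, entry_title, content)
            VALUES ('user1', 'title2', 'content2');
          UPDATE posts SET content = 'content3' WHERE userid = 'user1';
        APPLY BATCH;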
  39. My favorite C* features / Prepared Statements
      • Parse once, execute many times
      • Major speed-up
      • Very simple to work with / serialize complex data structures and binary data
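      The statement is sent to the server once with ? placeholders, then executed
      repeatedly with bound values; a sketch of such a statement:

        -- parsed once server-side; only the bound values
        -- travel over the wire on each execution
        INSERT INTO posts (userid, entry_title, content) VALUES (?, ?, ?);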
  40. My favorite C* features / Prepared Statements
      • Reduces communication overhead between teams
      • Easy to see the table structure
      • Easy to see what / how to query
  41. My favorite C* features / Conclusions
      • Beware of
        • Schema changes
        • Network failures
        • Storage (or hardware/software) upgrades
      • Know how to handle them with your store
  42. My favorite C* features / Conclusions
      • Good drivers (FINALLY), after years of Hector/Thrift
      • Binary CQL protocol, new possibilities
      • Internals became way more powerful with 1.2
      • Ease of use/development
      • Still some work to do (Hadoop integration is still on Thrift)
  43. My favorite C* features / Conclusions
      • Awesome database
      • Predictable
      • Know your data before using anything
      • Know your tool before using it
      • Make conscious choices
  44. My favorite C* features / Conclusions
      • Forget about normalized data
      • Plan against data usage
      • Have a powerful backend to conform to the store
      • Determine the best caching strategy
      • Use windowed operations (EEP) for pre-aggregation, instead of believing the realtime ad-hoc myth