NoSQL Survey

NoSQL a datastore survey by: Sean McKibben’); DROP TABLE Students;--
Monday, March 18, 13

’Tis Himself
Sean McKibben
VP Engineering, Push IO
@graphex
-10 year plan:
Designer => Engineer
C# => JRuby
Flex => Ember.js
Coke => Mexican Coke
Cat => Dog
Beer => Whiskey
Steelcase => Herman Miller
Volkswagen => Subaru

NoSQL RDBMS ACID

RDBMS
Relational Database Management System
Great for relational data
You’ll probably still use it for some stuff

RDBMS is well understood
tooling
skills
concepts
people
[Diagram: example relational schema; each table lists column_name / data_type pairs. Models and tables: Staff (staff); FieldOperation (field_operations); Relationship (relationships; belongs_to :contact, belongs_to :account); Account (accounts; has_many :relationships; Versioned); Contact (contacts; has_many :relationships; Versioned); Assignment (assignments); OperationalLog (operational_logs); Distributor (distributors; has_many :relationships; Versioned); Relationship (distributor_relationships; belongs_to :contact, belongs_to :account)]

Scaling up RDBMS
RDBMS can scale, but you start losing its benefits very quickly.

Scaling up RDBMS
Sharding/partitioning
can be complex
relating across shards
difficult reliability
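To make "relating across shards" concrete, here is a minimal sketch of hash-based sharding, assuming plain in-memory dicts stand in for database servers. The names `NUM_SHARDS`, `shard_for`, `put`, `get`, and `find_by` are hypothetical and exist only for this illustration. A lookup by key touches one shard, but a query on any other field has to visit every shard.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(key):
    """Pick a shard from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def put(key, row):
    shards[shard_for(key)][key] = row


def get(key):
    # Single-shard lookup: cheap, because the key tells us where to go.
    return shards[shard_for(key)].get(key)


def find_by(field, value):
    # Scatter-gather: a non-key query must ask every shard.
    return [row for shard in shards for row in shard.values()
            if row.get(field) == value]


put("user:1", {"name": "ann", "account": "acme"})
put("user:2", {"name": "bob", "account": "acme"})
put("user:3", {"name": "cat", "account": "zeta"})
acme_users = sorted(row["name"] for row in find_by("account", "acme"))
```

Note that changing `NUM_SHARDS` in this naive scheme remaps most keys, which is part of why resharding is painful.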

Scaling up RDBMS
Master/slave
Read replicas
Shared-nothing
How much do you rely on ACID?

ACID
Atomicity - all or nothing
Consistency - every transaction moves the database from one valid state to another
Isolation - concurrent changes result in the same end state as serial changes
Durability - committed data is not lost during failures
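As a small illustration of atomicity and consistency, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` function, and the `balances` helper are hypothetical examples, not something from the talk. A transfer that would break the `CHECK` constraint is rolled back as a whole, so the credit that had already been applied disappears too.

```python
import sqlite3

# Hypothetical two-account ledger, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was rolled back.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances untouched, which is the "all or nothing" property in action.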

The Distributed Jungle

NoSQL CAP Concurrency Paxos CRDT

CAP
AKA Brewer’s Theorem
Hypothesized by Eric Brewer at the 2000 Symposium on Principles of Distributed Computing
Primarily about distributed systems, but can apply to single system scenarios as well.

CAP
Consistency: Concurrent requests from multiple clients would return the same results.
Availability: Every request receives a success/failure response.
Partition Tolerance: System continues to perform even with part of the system failing or failure of the network between system elements.

CAP
CA: Probably bad, because you can write to it but if a partition exists it won't re-sync the state.
CP: Always consistent between all nodes, but goes down when any part of it goes down.
AP: Nodes can be partitioned but the system will remain available, and data will re-sync when the partition is removed.

Diagram from Nathan Hurst’s Blog http://blog.nahurst.com/visual-guide-to-nosql-systems

Paxos
Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
Tends to be a high latency process
Few complete implementations
The primary goal is to deal with edge cases

CRDT
Convergent/Commutative Replicated Data Types
INRIA Paper http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf
"Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, monotonic DAGs, and sequences. It discusses some properties needed to implement non-trivial CRDTs."

CRDT
Counters, CRDT style, are not 1+1 = 2
http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf
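To show why a CRDT counter is "not 1+1 = 2", here is a minimal sketch of a state-based grow-only counter (often called a G-Counter). The class name `GCounter` and its methods are illustrative choices, not taken from the slides. Each replica only increments its own slot, the value is the sum of all slots, and merging takes the per-replica maximum, so replaying or reordering merges never double counts.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = (
            self.counts.get(self.replica_id, 0) + amount)

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Max is commutative, associative, and idempotent, so replicas
        # converge no matter how often or in what order they merge.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment()
b.increment()
a.merge(b)
b.merge(a)
a.merge(b)  # merging the same state again changes nothing
```

A naive design that ships "the total so far" and adds on merge would count the second merge twice; keeping per-replica slots is what avoids that.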

That was boring.
Let’s look at NoSQL as one approach to scaling your solution. This tends to be the quickest path to scalability if you’re trying to make use of the work of others for implementations of Paxos or CRDTs. Most datastores have known CAP behavior.

That was boring.
So what elements are you looking for in a NoSQL data store? Depends on you:

Brogrammer: We're at 2.1MM MAUs Brah, so I need ROFLScale NoSQL FOR SPEED

Neckbeard: This operation will cost O(log(2n)^2)

Statistician —30-56-99 are correct. Limited 4 and 8 are missing.

Let’s Start With Redis

Not really distributed
"Data structure server"
Single threaded, very fast in-memory data store written in ANSI C

Playground of data structures
Keys (basic access of values)
Strings (can be numbers, can increment/decrement)
Hashes (hash values can be numbers, can increment/decrement)
Lists (basically an array)
Sets (great for de-duplication)
Sorted Sets (each element has a score for sorting)
Pub/Sub (can be used for coordination if you don’t mind SPOF)
Transactions (define your own blocking set of commands)
Server-Side Scripting (use Lua to control atomicity)

Great documentation site.
Lists time complexity of every operation.
You can type into the example boxes and the commands will execute.

Your data must fit in RAM
Can be durable-ish
Need to be careful about master, slave and RDB vs AOF
Sentinel could provide some degree of oversight
Cluster would be great... but I'm not getting my hopes up

Overall, most people use it like a shared cache or coordinator.
Can be used for locking/mutex stuff.

Brogrammer
Brogrammer love Redis. It eats teh data real nice from my node.js servorama.

Neckbeard
Neckbeard appreciates that the time complexity is listed in the documentation, and that it was written in ANSI C in a single thread without all that fancy threading whatnot.

Statistician
Statistician has, with reservations, applied some data collection algorithms using Redis, but typically produces outputs that need no data store.

Riak: Dynamo-style
Other Dynamo-style stores:
Cassandra
Voldemort
DynamoDB
http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

Very AP, with good availability, partition tolerance, and node recovery.

"eventually consistent" is sometimes challenging
Existence operations are difficult (usually poll a few times if you want to be sure something is really not there)

Consistent hashing allows for great rebalancing with very little effort
Operationally quite nice, allows for tuning, monitoring and migrations.
Never List Keys
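The rebalancing claim can be illustrated with a generic consistent-hashing ring. This is a textbook-style sketch, not Riak's actual ring implementation (Riak uses a fixed number of partitions claimed by nodes); `HashRing`, `vnodes`, and the node names are hypothetical. Each node owns many points on a hash circle, a key belongs to the first point clockwise from its hash, and adding a node only takes over the keys that land on the new node's points.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes  # points per node; more points, smoother spread
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`; no key shuffles between the three original nodes, which is what keeps rebalancing cheap compared with `hash(key) % node_count`.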

The Ring
Shared data structure
Basically a CRDT communicated between nodes
Ring size must be selected carefully!

Buckets, Keys, and Values
The data store is pretty easy to understand conceptually.
Keys are partitioned into buckets.
Values are stored as binary with a content-type.
Secondary indexes are good, but may not ROFLScale.
Never List Keys

Can't (yet) do a simple counter (unless you really get your feet wet with CRDTs yourself)
Riak DT should be available soon

Riak Search is going away
Yokozuna: Riak with Solr integration is due out soon.
Until then: 2i and MapReduce (Erlang or JS) are about the only ways to get data out of Riak.

Brogrammer
Brogrammer can store some JSON up in The Riak Clizoud, but all this Ring stuff cuts in to happy hour time. Also, haven’t you seen the movie? 7 days bro.

Neckbeard
1.3 - Usefulness: Riak SHOULD eventually be a useful datastore, however I MUST be prepared to change my implementation such that eventual consistency won't invalidate the primary theory of the algorithm. (RFC 30981)

Statistician
Statistician is very underwhelmed with Riak due to the difficulty of data collection and aggregation.

Is anyone from 10Gen here?
All information contained in this presentation is non-factual opinions of a friend of a cousin of the presenter.

MongoDB: Document-oriented
Master-slave setup, auto-sharding and replication
Locking based consistency model
SAFE mode?? JOURNAL mode?

Can be a honey-pot, delivering good features until it hits a certain scale or level of reliability.
Strong consistency when reading from a master, not when reading from a replica.

Supports a number of features like 2d geospatial indexing, secondary indexing, counters, sets.
Generally easy to implement...

Controversy exists:
Broken by Design: MongoDB Fault Tolerance http://hackingdistributed.com/2013/01/29/mongo-ft/
"Don't store anything you can't afford to lose"

Brogrammer
I got my Mongoose feeding my JSON to the BSON storage last week, bro!

Statistician
Seriously? We just discovered that 28.3% of last week's request data was lost because Brogrammer forgot to call getLastError in his implementation of the purchasing feature!

Neckbeard
Ok, first of all, let's consider what amount of this data we may at some point need, and divide that by the amount we just lost. Luckily I’ve been keeping the server logs so we just need to write some regex to repopulate our data store.

HBase: Based on BigTable
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
A giant table ordered by rowkey.
Billions of rows, millions of columns.

Table => Rowkey => Column family => Column name => Timestamp = value.

Columns are part of a column family.
Values can have versions.
Data is stored as byte arrays, so can be any type.
Designing serialization/deserialization approaches can sometimes be challenging.
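The data model described above can be pictured as nested sorted maps. The toy class below is an in-memory sketch only, not the HBase client API; `TinyTable` and its methods are hypothetical names. It stores raw bytes keyed by rowkey, `family:qualifier` column, and timestamp, and leaves serialization (here `struct.pack`) to the caller, which is exactly where the "challenging" part shows up.

```python
import struct
from collections import defaultdict


class TinyTable:
    """Toy model: rowkey -> 'family:qualifier' -> timestamp -> bytes."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, rowkey, column, value, ts):
        # Values are opaque bytes; each write adds a timestamped version.
        self._rows[rowkey][column][ts] = value

    def get(self, rowkey, column, ts=None):
        """Return the newest version, or the newest at or before ts."""
        versions = self._rows.get(rowkey, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self):
        # Rows come back in byte-wise (lexical) rowkey order.
        return sorted(self._rows)


table = TinyTable()
table.put(b"myrow2", "cf1:alpha", b"foo", ts=1)
table.put(b"myrow1", "cf1:alpha", b"foo", ts=1)
table.put(b"myrow1", "cf1:alpha", b"bar", ts=2)
# The caller picks an encoding: here a big-endian 64-bit integer.
table.put(b"myrow1", "cf1:hits", struct.pack(">q", 42), ts=2)
hits = struct.unpack(">q", table.get(b"myrow1", "cf1:hits"))[0]
```

Reading with `ts=1` returns the older `b"foo"` version, while a plain `get` returns the latest `b"bar"`.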

Rows are lexically ordered, and the entry point is the lowest value.
Regions are determined from the rowkey.
Using a simple timestamp as a rowkey hotspots the region server, and makes it difficult to access the most recent data.
Typical solution is to prefix the timestamp, and use a timestamp that counts down instead of up.
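One way to sketch the "prefix plus countdown timestamp" idea is shown below. The key layout, `NUM_BUCKETS`, and `make_rowkey` are assumptions made for illustration, not a layout prescribed by HBase. A small hash-derived bucket byte spreads different sources across regions, and subtracting the timestamp from a maximum value makes the newest row for a source sort first in a lexical scan.

```python
import struct
import zlib

MAX_TS = 2**63 - 1   # largest signed 64-bit value
NUM_BUCKETS = 8      # hypothetical number of prefix buckets


def make_rowkey(source_id, ts_millis):
    """Build bucket byte + source id + NUL separator + reversed timestamp."""
    source = source_id.encode("utf-8")
    # Stable hash so the same source always lands in the same bucket.
    bucket = zlib.crc32(source) % NUM_BUCKETS
    # Counts down as time moves forward, so newer rows sort earlier.
    reversed_ts = MAX_TS - ts_millis
    return (struct.pack(">B", bucket) + source + b"\x00"
            + struct.pack(">q", reversed_ts))


older = make_rowkey("sensor-1", 1363564800000)
newer = make_rowkey("sensor-1", 1363564860000)
first_in_scan = sorted([older, newer])[0]
```

Big-endian packing matters here: it makes byte-wise comparison agree with numeric comparison for the non-negative reversed timestamps, so `first_in_scan` is the newer row.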

HBase only recommends a few column families per table, so you have to be pretty frugal.
Columns are sparse, so you can have tons of columns in one row, and only a few in the next row.
myrow1: cf1:alpha=foo cf1:bravo=bar cf1:charlie=baz
myrow2: cf1:alpha=foo cf1:charlie=baz

Relatively complex set up
Requires ZooKeeper to coordinate region servers
Can have multiple hot masters which ZooKeeper will switch between
Allows mostly ACID stuff at the row level
Fast writes and good read times

Brogrammer Just like my old job busting out Excel sheets!

Neckbeard
I’ve developed an algorithm that distributes our row keys into, at maximum, 649 different regions, so our load will be spread over our 8 servers very well.

Statistician
My Hadoop MapReduce job was able to supply its input from the 8.9B HBase rows to determine that our average cost per page view is $0.00000378

Others
HyperDex
Memcache
EHCache
DynamoDB
CouchDB
RethinkDB
LevelDB
Neo4j

Push IO is hiring DevOps and Devs! [email protected]
Q&A
