NoSQL Survey

An overview of some SQL vs NoSQL decision points, and a survey of some key NoSQL data stores.

Sean McKibben

March 18, 2013

Transcript

  1. ’Tis Himself: Sean McKibben (@graphex), VP Engineering, Push IO. 10-year plan: Designer => Engineer; C# => JRuby; Flex => Ember.js; Coke => Mexican Coke; Cat => Dog; Beer => Whiskey; Steelcase => Herman Miller; Volkswagen => Subaru. Monday, March 18, 13
  2. RDBMS: Relational Database Management System. Great for relational data. You’ll probably still use it for some stuff.
  3. RDBMS is well understood: tooling, skills, concepts, people. [Diagram: a relational schema of models and their tables (Staff, FieldOperation, Assignment, OperationalLog, plus versioned Account, Contact and Distributor models that each has_many :relationships through Relationship models that belongs_to :contact and belongs_to :account).]
  4. Scaling up RDBMS: an RDBMS can scale, but you start losing its benefits very quickly.
  5. ACID
     - Atomicity: all or nothing
     - Consistency: every transaction leaves the data in a valid state, so all clients get results that satisfy the database's rules
     - Isolation: concurrent changes result in the same end state as serial changes
     - Durability: data not lost during failures
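As a rough illustration of the "all or nothing" point, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `make_db`, `transfer` and `balances` helpers, and the sample names are all hypothetical and not from the talk. A transfer that would overdraw an account violates a `CHECK` constraint, and the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit fails the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "alice", "bob", 500)` on a fresh database returns `False` and leaves both balances untouched, even though the credit to `bob` ran before the failing debit.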
  6. CAP, AKA Brewer’s Theorem. Hypothesized by Eric Brewer at the 2000 Symposium on Principles of Distributed Computing. Primarily about distributed systems, but can apply to single-system scenarios as well.
  7. CAP
     - Consistency: concurrent requests from multiple clients return the same results.
     - Availability: every request receives a success/failure response.
     - Partition tolerance: the system continues to perform even when part of it fails or the network between system elements fails.
  8. CAP
     - CA: probably bad, because you can write to it, but if a partition exists it won't re-sync the state.
     - CP: always consistent between all nodes, but goes down when any part of it goes down.
     - AP: nodes can be partitioned but the system will remain available, and data will re-sync when the partition is removed.
  9. Paxos: a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
     - Tends to be a high-latency process
     - Few complete implementations
     - The primary goal is to deal with edge cases
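To make the idea of "agreeing on one result" a little more concrete, here is a toy, single-process sketch of single-decree Paxos (one value, no real network). The `Acceptor` class, the `propose` function and the `up` flag used to simulate a failed node are illustrative names, not part of the talk or of any library. Real implementations also need persistence, retries and unique ballot allocation, which are omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot this acceptor promised
    accepted_ballot: int = -1             # ballot of the value it accepted, if any
    accepted_value: Optional[str] = None
    up: bool = True                       # False simulates a crashed/unreachable node

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots, report any accepted value."""
        if not self.up or ballot <= self.promised:
            return None
        self.promised = ballot
        return (self.accepted_ballot, self.accepted_value)

    def accept(self, ballot, value):
        """Phase 2: accept the value unless a higher ballot was promised."""
        if not self.up or ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot = ballot
        self.accepted_value = value
        return True


def propose(acceptors, ballot, value):
    """Try to get `value` chosen; return the chosen value or None on failure."""
    majority = len(acceptors) // 2 + 1
    replies = (a.prepare(ballot) for a in acceptors)
    promises = [r for r in replies if r is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest ballot instead of its own value.
    prev_ballot, prev_value = max(promises, key=lambda p: p[0])
    if prev_ballot >= 0:
        value = prev_value
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

With three acceptors, a first `propose(acceptors, 1, "A")` gets `"A"` chosen; a later proposer with a higher ballot and a different value still ends up with `"A"`, which is the safety property the protocol exists to provide. The two round trips per decision are also where the latency cost comes from.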
  10. CRDT: Convergent/Commutative Replicated Data Types. INRIA paper: http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf
      Abstract: "Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, monotonic DAGs, and sequences. It discusses some properties needed to implement non-trivial CRDTs."
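One of the simplest state-based container types with both add and remove is commonly called a two-phase set. The sketch below is an illustration under that assumption, with a hypothetical `TwoPhaseSet` class; it is not code from the paper. Each replica keeps a set of added items and a set of removed items (tombstones), and merging is just set union, so replicas converge no matter the order in which they exchange state. The trade-off is that a removed item can never be re-added.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseSet:
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)   # tombstones; never shrink

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only items this replica has seen can be removed.
        if item in self.added:
            self.removed.add(item)

    def value(self):
        """Items that were added and not (yet) removed."""
        return self.added - self.removed

    def merge(self, other):
        """Union both components; commutative, associative and idempotent."""
        return TwoPhaseSet(self.added | other.added, self.removed | other.removed)
```

For example, if replica `a` adds `"x"` and `"y"` while replica `b` adds `"y"` and `"z"` and then removes `"y"`, both `a.merge(b)` and `b.merge(a)` end up with the value `{"x", "z"}`.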
  11. That was boring. Let’s look at NoSQL as one approach to scaling your solution. This tends to be the quickest path to scalability if you’re trying to make use of the work of others for implementations of Paxos or CRDTs. Most datastores have known CAP behavior.
  12. That was boring. So what elements are you looking for in a NoSQL data store? Depends on you:
  13. Redis: not really distributed. A "data structure server": a single-threaded, very fast in-memory data store written in ANSI C.
  14. Playground of data structures
     - Keys (basic access of values)
     - Strings (can be numbers, can increment/decrement)
     - Hashes (hash values can be numbers, can increment/decrement)
     - Lists (basically an array)
     - Sets (great for de-duplication)
     - Sorted Sets (each element has a score for sorting)
     - Pub/Sub (can be used for coordination if you don’t mind SPOF)
     - Transactions (define your own blocking set of commands)
     - Server-Side Scripting (use Lua to control atomicity)
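A few of these structures can be sketched in Python. The method names and argument orders used here (`incr`, `sadd`, `zincrby`, `zrevrange`) follow the redis-py client as far as I know; the key names, the `record_view` and `top_pages` helpers, and the `InMemoryStandIn` class are all hypothetical. The stand-in only exists so the example runs without a Redis server, and it mimics just enough behavior for this sketch.

```python
from collections import defaultdict


class InMemoryStandIn:
    """Hypothetical stand-in mimicking four redis-py methods (not a real client)."""

    def __init__(self):
        self.strings = defaultdict(int)
        self.sets = defaultdict(set)
        self.zsets = defaultdict(dict)

    def incr(self, name, amount=1):
        self.strings[name] += amount
        return self.strings[name]

    def sadd(self, name, *values):
        before = len(self.sets[name])
        self.sets[name].update(values)
        return len(self.sets[name]) - before     # number of new members

    def zincrby(self, name, amount, value):
        self.zsets[name][value] = self.zsets[name].get(value, 0.0) + amount
        return self.zsets[name][value]

    def zrevrange(self, name, start, end, withscores=False):
        # Highest score first; only non-negative `end` or -1 is handled here.
        ranked = sorted(
            self.zsets[name].items(), key=lambda kv: (kv[1], kv[0]), reverse=True
        )
        page = ranked[start:None if end == -1 else end + 1]
        return page if withscores else [member for member, _ in page]


def record_view(client, page, user):
    """String as counter, set for de-duplication, sorted set as leaderboard."""
    total = client.incr("views:total")
    is_new_visitor = client.sadd("visitors", user) == 1
    client.zincrby("views:by_page", 1, page)
    return total, is_new_visitor


def top_pages(client, n):
    """Return the n most-viewed pages with their scores."""
    return client.zrevrange("views:by_page", 0, n - 1, withscores=True)
```

Against a real server you would pass something like `redis.Redis(decode_responses=True)` instead of the stand-in; without `decode_responses`, redis-py returns members as bytes.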
  15. Great documentation site. Lists the time complexity of every operation. You can type into the example boxes and the commands will execute.
  16. Your data must fit in RAM. Can be durable-ish. Need to be careful about master, slave and RDB vs AOF. Sentinel could provide some degree of oversight. Cluster would be great... but I'm not getting my hopes up.
  17. Overall, most people use it like a shared cache or coordinator. Can be used for locking/mutex stuff.
  18. Brogrammer: Brogrammer love Redis. It eats teh data real nice from my node.js servorama.
  19. Neckbeard: Neckbeard appreciates that the time complexity is listed in the documentation, and that it was written in ANSI C in a single thread without all that fancy threading whatnot.
  20. Statistician: Statistician has reservedly applied some data collection algorithms using Redis, but typically produces outputs that need no data store.
  21. Riak: Dynamo-style. Other Dynamo-style stores: Cassandra, Voldemort, DynamoDB. Dynamo paper: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
  22. "Eventually consistent" is sometimes challenging. Existence operations are difficult (usually poll a few times if you want to be sure something is really not there).
  23. Consistent hashing allows for great rebalancing with very little effort. Operationally quite nice, allows for tuning, monitoring and migrations. Never List Keys.
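The rebalancing property can be shown with a generic consistent-hash ring. This is a textbook-style sketch with virtual nodes, not Riak's actual ring implementation; the `HashRing` class, the node names and the choice of SHA-1 with 64 virtual nodes per server are assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes      # virtual points per physical node
        self._points = []         # sorted list of (hash, node) tuples
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text):
        """Map a string to a 64-bit position on the ring."""
        digest = hashlib.sha1(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key):
        """Walk clockwise from the key's position to the next point."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, (self._hash(key),))
        return self._points[idx % len(self._points)][1]
```

When a fourth node joins a three-node ring, the only keys that change owner are the ones that move to the new node; every other key stays where it was. With naive `hash(key) % n` placement, most keys would move.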
  24. The Ring: a shared data structure, basically a CRDT communicated between nodes. Ring size must be selected carefully!
  25. Buckets, Keys, and Values: the data store is pretty easy to understand conceptually. Keys are partitioned into buckets. Values are stored as binary with a content-type. Secondary indexes are good, but may not ROFLScale. Never List Keys.
  26. Can't (yet) do a simple counter (unless you really get your feet wet with CRDTs yourself). Riak DT should be available soon.
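For a sense of what rolling your own counter involves, here is a minimal grow-only counter (often called a G-Counter) as a state-based CRDT. It is a sketch with a hypothetical `GCounter` class and is not the Riak DT API. Each node only increments its own slot, and merging takes the per-node maximum, so concurrent increments on different nodes are never lost or double-counted.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}          # node id -> increments seen from that node

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Take the per-node maximum; safe to repeat and to apply in any order."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

A counter that also supports decrements can be built from two of these, one for increments and one for decrements, with the value being their difference.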
  27. Riak Search is going away. Yokozuna (Riak with Solr integration) is due out soon. Until then, 2i and MapReduce (Erlang or JS) are about the only ways to get data out of Riak.
  28. Brogrammer: Brogrammer can store some JSON up in The Riak Clizoud, but all this Ring stuff cuts in to happy hour time. Also, haven’t you seen the movie? 7 days bro.
  29. Neckbeard: 1.3 - Usefulness: Riak SHOULD eventually be a useful datastore, however I MUST be prepared to change my implementation such that eventual consistency won't invalidate the primary theory of the algorithm. (RFC 30981)
  30. Statistician: Statistician is very underwhelmed with Riak due to the difficulty of data collection and aggregation.
  31. MongoDB: Is anyone from 10gen here? All information contained in this presentation is non-factual opinions of a friend of a cousin of the presenter.
  32. Can be a honey-pot, delivering good features until it hits a certain scale or level of reliability. Strong consistency when reading from a master, not when reading from a replica.
  33. Supports a number of features like 2d geospatial indexing, secondary indexing, counters, sets. Generally easy to implement...
  34. Brogrammer: I got my Mongoose feeding my JSON to the BSON storage last week, bro!
  35. Statistician: Seriously? We just discovered that 28.3% of last week's request data was lost because Brogrammer forgot to call getLastError in his implementation of the purchasing feature!
  36. Neckbeard: Ok, first of all, let's consider what amount of this data we may at some point need, and divide that by the amount we just lost. Luckily I’ve been keeping the server logs so we just need to write some regex to repopulate our data store.
  37. HBase: Columns are part of a column family. Values can have versions. Data is stored as byte arrays, so it can be any type. Creating serialization/deserialization approaches can sometimes be challenging.
  38. Rows are lexically ordered, and the entry point is the lowest value. Regions are determined from the rowkey. Using a simple timestamp as a rowkey hotspots the region server, and makes it difficult to access the most recent data. The typical solution is to put a prefix in front of the timestamp, and to use a timestamp that counts down instead of up.
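One way to picture that rowkey layout: a short hashed prefix spreads writes across regions, and a timestamp subtracted from a large constant makes newer rows sort first. The sketch below is an illustration only; the `row_key` function, the `|` separator, the 16 buckets and the use of MD5 for bucketing are all assumptions, not anything prescribed by HBase or the talk.

```python
import hashlib

MAX_MILLIS = 2**63 - 1    # largest signed 64-bit value, used to reverse time
NUM_BUCKETS = 16          # hypothetical number of prefix buckets


def row_key(source_id: str, timestamp_ms: int) -> bytes:
    """Build a rowkey of the form <bucket>|<source_id>|<reversed timestamp>."""
    digest = hashlib.md5(source_id.encode("utf-8")).digest()
    # The same source always lands in the same bucket, so its rows stay
    # contiguous, while different sources spread across the key space.
    bucket = int.from_bytes(digest[:2], "big") % NUM_BUCKETS
    # Counting down: a later timestamp yields a smaller number, and zero
    # padding keeps numeric order and lexical order in agreement.
    reversed_ts = MAX_MILLIS - timestamp_ms
    return f"{bucket:02d}|{source_id}|{reversed_ts:019d}".encode("utf-8")
```

With this layout, a scan starting at the `<bucket>|<source_id>|` prefix returns that source's most recent rows first, because the newest row has the lexically lowest key.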
  39. HBase only recommends a few column families per table, so you have to be pretty frugal. Columns are sparse, so you can have tons of columns in one row, and only a few in the next row.
     myrow1: cf1:alpha=foo cf1:bravo=bar cf1:charlie=baz
     myrow2: cf1:alpha=foo cf1:charlie=baz
  40. Relatively complex setup. Requires ZooKeeper to coordinate region servers. Can run multiple masters (one active, the rest on standby), which ZooKeeper will switch between. Allows mostly ACID stuff at the row level. Fast writes and good read times.
  41. Neckbeard: I’ve developed an algorithm that distributes our row keys into, at maximum, 649 different regions, so our load will be spread over our 8 servers very well.
  42. Statistician: My Hadoop MapReduce job was able to supply its input from the 8.9B HBase rows to determine that our average cost per page view is $0.00000378