
Non-Relational Storage for Relational Person


This deck is a copy of a presentation delivered at TechEd 2012. It is a 101 for people who are comfortable with relational stores (SQL Server, MySQL, PostgreSQL, Oracle, DB2) and curious about non-relational stores (popularly known as NoSQL).
Please also see cmohan's blog articles - http://cmohan.tumblr.com/post/20274650560/part-2-the-nosql-hoopla-what-is-nonsensql-about-it
http://cmohan.tumblr.com/post/20317650254/part-3-the-nosql-hoopla-what-is-nonsensql-about-it

Govind Kanshi

March 29, 2012

Transcript

  5. Demo – access a database instance via
    1. Tool – a query analyzer
    2. Application – which binds to a client library and sends commands
    Use the SQL dialect to create schema; use the SQL dialect to insert/retrieve data.
    Indexes help retrieve the data. The tool helps in understanding the query plan; query optimizers are standard.
    Transactions – SELECT vs UPDATE. Why? To isolate and provide durable updates/inserts; locks and two-phase commit provide ACID.
    Monitoring – DMVs, perf counters, logs, profilers, ETW.
    Management – backup, restore, cleanup, ETL, integrity of data.
    Summary:
    1. ODBC/OLEDB – because they are a standard, data stores can provide similar-looking libraries for use in client programs.
    2. The connection over TCP/IP and the serialization of data/commands to TDS are all hidden.
    3. The SQL DSL is king.
    4. Transactions are a given – the developer does not have to worry about inconsistency; he just has to deal with the error message.
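    The whole relational ritual above fits in a few lines. A minimal sketch using Python's built-in sqlite3 (table and column names are invented for illustration; any relational store behaves the same way):

        import sqlite3

        conn = sqlite3.connect(":memory:")   # stand-in for any relational store
        cur = conn.cursor()

        # Use the SQL dialect to create schema and an index.
        cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
        cur.execute("CREATE INDEX ix_customer ON orders(customer)")

        # Transactions isolate and make updates/inserts durable; roll back on error.
        try:
            cur.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", ("alice", 42.0))
            cur.execute("UPDATE orders SET total = total * 1.1 WHERE customer = ?", ("alice",))
            conn.commit()
        except sqlite3.Error:
            conn.rollback()
            raise

        # The tooling can show which plan the query optimizer chose.
        for row in cur.execute("EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?", ("alice",)):
            print(row)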
  6. Data type – store/retrieve structured, schematized data vs blobs and XML.
    HA/DR – understood very well; handles machine, location, and upgrade downtime. SLAs drive the engagement.
    Scale (a sketch of the sharding option follows this list):
    - Resolve issues locally using DMVs/query analysis and scale CPU/memory/IO
    - Cache the most frequently used items elsewhere
    - Table partitioning
    - Sharding / federated DB
    - Read/write separation
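    Sharding in its crudest form is just deterministic routing by key. A minimal sketch (the shard names and the modulo scheme are illustrative, not any product's API):

        import hashlib

        SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]   # hypothetical nodes

        def shard_for(key):
            """Route a key to a shard by hashing it (naive modulo sharding)."""
            h = int(hashlib.md5(key.encode()).hexdigest(), 16)
            return SHARDS[h % len(SHARDS)]

        print(shard_for("customer:42"))   # the same key always lands on the same shard

    Note the weakness: adding a fourth shard changes the modulus and remaps almost every key, which is what consistent hashing (slide 19) fixes.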
  7. What we ideally want is storage for different types of data, scale of data, availability in case of failure and, of course, as much consistency as we can get.
    • Data can't fit on one node (availability/scale) and SANs are expensive, so replicate and partition data over many servers.
    • Reliability/availability – nodes, network, and hardware will fail (and SANs are expensive), so replicate data to multiple nodes and remain available, with no single point of failure.
    Different data types for different domains – different data stores.
    ** Tradeoff: income tax submissions over the years increase monotonically – how do you take care of that? Reshuffling and schema changes are challenging given the data is partitioned on year.
    Availability goes down as more partitions are added: three partitions at 90% each give 90% × 90% × 90% ≈ 73% overall – you need redundant storage.
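    The arithmetic behind that 90% claim, plus the flip side that motivates replication (numbers are illustrative):

        def series_availability(per_node, nodes):
            """Every partition must be up, so availability multiplies down."""
            return per_node ** nodes

        def replicated_availability(per_node, replicas):
            """Only one replica must be up, so unavailability multiplies down."""
            return 1 - (1 - per_node) ** replicas

        print(series_availability(0.90, 3))       # ~0.729 - partitioning alone hurts
        print(replicated_availability(0.90, 3))   # ~0.999 - redundancy wins it back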
  9. Key-value store
    What is it – a key-value data model (dictionary).
    Context of use – store caches, lookups, shopping carts.
    Who uses it – all web apps, mostly at the service layer.
    Examples – Amazon Dynamo, Voldemort, Tokyo Cabinet, Azure storage.
    Document store
    What is it – schema-less JSON-style documents.
    Context of use – store a low-friction JSON representation of an "object".
    Who uses it – Foursquare uses MongoDB for check-ins.
    Examples – CouchDB, MongoDB.
    Column-oriented data stores (BigTable clones)
    Store and process data by column and row; data warehousing, large volumes of concurrent, available writes.
    Who uses it – Adobe uses HBase for storage and analysis.
    Examples – Google's BigTable/AppEngine Datastore, Cassandra, HBase.
    Graph store
    Store, retrieve, and explore relationships between entities; for when data needs to be modelled as a graph.
    Who uses it – news firms use variations to explore relationships and find connections.
    Examples – Neo4j, GraphDB, HypergraphDB, Stig, Intellidimension.
    Search store
  10. Store and retrieve information which needs to be found quickly; search at large scale is the main focus.
    Who uses it – e-commerce stores, internal knowledge stores, job sites. Example – Solr.
    Memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
    Dynamo pioneered the idea of eventual consistency as a way to achieve higher availability and scalability: data fetched is not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.
    BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.
  11. What is it – a key-value data model (dictionary).
    Context of use – store caches, lookups, shopping carts.
    Who uses it – all web apps, mostly at the service layer.
    Examples – Amazon Dynamo, Voldemort, Tokyo Cabinet, Azure storage.
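    The entire model really is a dictionary: opaque key in, opaque value out. An in-process sketch of the shopping-cart case (a real store adds persistence and replication; the key naming is just a convention made up here):

        cart_store = {}   # key -> value, nothing more

        def add_to_cart(user_id, item):
            cart_store.setdefault("cart:" + user_id, []).append(item)

        def get_cart(user_id):
            # Lookup is by exact key only - no secondary predicates, no joins.
            return cart_store.get("cart:" + user_id, [])

        add_to_cart("42", "book")
        add_to_cart("42", "pen")
        print(get_cart("42"))   # ['book', 'pen']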
  12. Azure storage is a great option for many scenarios with scale requirements; it fills the other end of the spectrum compared to memory-based stores. Redis too can do persistence, but on demand (last I checked the SAVE command, there was nothing like a WAL – write-ahead log).
    Speaking notes:
    "Indexing" (sic) is on the key. Query is API- or REST-based. Transactions are either lease- or timestamp-based.
    Failover depends on whether the store is memory-based or persistence-based; Riak handles failover via read/write quorums.
    HA is through replicas for Azure storage. Redis does not have replicas; Riak does.
    Scale in Azure storage is based on the account; Redis scales vertically as of now; Riak adds nodes dynamically.
    Joins – left as an exercise (if you need a join, you need to rethink the model).
  13. Similar to collections/dictionaries/associative arrays/caches. (A sketch of the Redis datatypes follows this item.)
    Redis – Redis lists are useful as a simple FIFO work queue.
    The set datatype has all of the union, intersection, and difference operations available across set keys; common use cases include de-duplicating items for work queues.
    Hashes are useful as a way of gathering similar kinds of data together: a hash can store a row from a database table, with each column an entry, which allows retrieval via the various hash access methods and ordering via SORT.
    Redis also has the zset, or sorted set, datatype: each member in the set has an associated float/double score, which produces an ordering over all members, and which you can query by member, score, or rank. Common use cases include priority queues, tag clouds, timeouts, and rate limiting.
    See also Dynamo, its open-source counterpart Voldemort from LinkedIn, and Yahoo!'s PNUTS.
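    A minimal sketch of those datatypes with the redis-py client (assumes a local server on the default port; key names are made up):

        import redis

        r = redis.Redis(host="localhost", port=6379, decode_responses=True)

        # List as a simple FIFO work queue: push on one end, pop off the other.
        r.rpush("jobs", "resize:img1", "resize:img2")
        print(r.lpop("jobs"))                    # 'resize:img1'

        # Set for de-duplication: SADD returns 0 if the member already exists.
        if r.sadd("seen-jobs", "resize:img1"):
            print("first time seeing this job")

        # Hash as a table row, one entry per column.
        r.hset("user:42", mapping={"name": "alice", "plan": "pro"})
        print(r.hgetall("user:42"))

        # Sorted set: the score orders the members, giving a priority queue.
        r.zadd("todo", {"low-priority": 3.0, "urgent": 1.0})
        print(r.zrange("todo", 0, -1))           # ['urgent', 'low-priority']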
  14. Document store
    What is it – schema-less JSON-style documents.
    Context of use – store a low-friction JSON representation of an "object".
    Who uses it – Foursquare uses MongoDB for check-ins.
    Examples – CouchDB, MongoDB.
  15. Trivial demo to show that, at the end of the day, a document store too is a "key-value" store, with the added ability to index on attributes. It has replication and sharding.
    Speaking notes:
    "Indexing" (sic) is on the key and allowed on attributes. Query is API- or REST-based. Transactions depend on the implementation.
    Failover depends on whether the store is memory-based or persistence-based. Riak handles failover via read/write quorums; MongoDB has replica sets; CouchDB believes in fail-fast.
    HA for MongoDB is through replicas; MongoDB scales via sharding on top of replicas.
    Joins/stored procedures – map/reduce through JavaScript.
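    The demo in sketch form, using pymongo (assumes a local mongod; the collection and field names are invented for illustration):

        from pymongo import MongoClient

        client = MongoClient("localhost", 27017)
        checkins = client.demo.checkins

        # Documents are schema-less JSON-style objects, keyed by _id.
        checkins.insert_one({"user": "alice", "venue": "cafe", "ts": 1333000000})

        # Unlike a plain key-value store, we can index and query on attributes.
        checkins.create_index("user")
        for doc in checkins.find({"user": "alice"}):
            print(doc)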
  16. Examples
    Posterous – stores the posts in Riak with Bitcask (tried Redis – could not fit the data into RAM; MongoDB – difficult to operate).
    Archiving at Craigslist:
    - 2.2B historical posts, semi-structured
    - relatively large blobs: avg 2 KB, max > 4 MB
    Check-ins at Foursquare – small location-based updates, sharded on user; geo-spatial indexing.
    Partitioning – range-based in MongoDB. Think about availability and data type.
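    For the MongoDB range-partitioning point, a sketch of the admin commands involved (database/collection names are hypothetical, and this must target a mongos router, not a plain mongod):

        from pymongo import MongoClient

        router = MongoClient("localhost", 27017)   # assumed mongos endpoint

        router.admin.command("enableSharding", "demo")
        # Range-partition the check-ins on the user field: documents with
        # nearby shard-key values land in the same chunk, hence the same shard.
        router.admin.command("shardCollection", "demo.checkins", key={"user": 1})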
  17. Column-oriented data stores (BigTable clones)
    Store and process data by column and row; data warehousing, large volumes of concurrent, available writes. Adobe uses HBase for storage and analysis. Examples – Google's BigTable/AppEngine Datastore, Cassandra, HBase.
    Example: with 5 columns of 10 bytes each and 10 rows, the table is 500 bytes. To evaluate "price > 20" a column store only needs to load the 50 bytes of the price column, not all 500.
    Daniel Abadi:
    • http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf – are row-stores the same as column-stores?
    • http://db.csail.mit.edu/projects/cstore/vldb.pdf – C-Store
    • And of course the mother of them all – http://labs.google.com/papers/bigtable-osdi06.pdf
    • http://labs.google.com/papers/gfs-sosp2003.pdf
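    The 50-vs-500-bytes point in sketch form (pure illustration of the layout, not any engine's storage format):

        # Ten rows of five fields, as a row store would hold them.
        rows = [{"id": i, "sku": i, "qty": 1, "price": float(i * 5), "tax": 0.1}
                for i in range(10)]

        # Column layout: one contiguous array per column.
        columns = {name: [r[name] for r in rows] for name in rows[0]}

        # "price > 20" touches only the price column (~50 of ~500 bytes)
        # instead of deserializing every full row.
        matching = [i for i, p in enumerate(columns["price"]) if p > 20]
        print([columns["id"][i] for i in matching])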
  18. "Indexing" (sic) is on the row key, allowed on attributes. Query is API- or REST-based. Transactions depend on the implementation.
    Failover – read/write quorums. HA is possible through replicas. Scale is via node addition.
    Joins/stored procedures – keep a column family to store related columns together.
    I am not very sure about Hypertable, but it reportedly beats Cassandra handily on performance.
  19. Based on Google BigTable – very similar to relational stores, yet different. (A consistent-hashing sketch follows this item.)
    Examples:
    Facebook – messages (HBase) vs wall vs profile; simple consistency, sharding, load balancing, replication. Messages in HBase, Cassandra for inbox search.
    Twitter – HBase replicates data from MySQL; Hadoop analytics.
    Netflix – migrated from Oracle to Cassandra to store customer profiles, the movie watch log, and complex usage logging; no locking (*payments excepted).
    Benefits – DR via async inter-datacenter replication; no downtime for schema changes; eventual or strong consistency.
    Cassandra sharding – consistent hashing. HBase sharding – range partitioning.
    Reading – http://dbmsmusings.blogspot.in/2010/03/distinguishing-two-major-types-of_29.html
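    A minimal consistent-hash ring of the kind Cassandra's sharding is built on (sketch only: a real ring adds virtual nodes and replication):

        import bisect
        import hashlib

        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        class Ring:
            def __init__(self, nodes):
                self._points = sorted((_hash(n), n) for n in nodes)
                self._hashes = [h for h, _ in self._points]

            def node_for(self, key):
                # Walk clockwise to the first node at or past the key's hash.
                i = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
                return self._points[i][1]

        ring = Ring(["node-a", "node-b", "node-c"])
        print(ring.node_for("user:42"))
        # Adding a node only remaps the keys between it and its neighbour,
        # unlike modulo sharding, which reshuffles almost everything.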
  20. Scaling – http://www.listware.net/201102/neo4j-user/84469-neo4j-cache-sharding-blog-post.html
    FlockDB is much simpler than other graph databases such as Neo4j because it tries to solve fewer problems. It scales horizontally and is designed for online, low-latency, high-throughput environments such as websites.
    Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom) and secondary indices. As of April 2010, the Twitter FlockDB cluster stored 13+ billion edges and sustained peak traffic of 20k writes/second and 100k reads/second.
    https://github.com/twitter/flockdb
  21. Reed Specialist Recruitment
    Data – hundreds of millions of documents; hundreds of fields in the DB, which evolve; multiple languages.
    Extract data from config, résumés, and the Oracle DB and insert it into Solr. Query via facets.
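    What "query via facets" looks like on the wire, sketched against Solr's HTTP API (the core name and field names are invented; assumes a local Solr):

        import requests

        resp = requests.get(
            "http://localhost:8983/solr/resumes/select",
            params={
                "q": "skills:python",       # full-text match on an indexed field
                "facet": "true",
                "facet.field": "language",  # bucket the matches per language
                "wt": "json",
            },
        )
        data = resp.json()
        print(data["response"]["numFound"])
        print(data["facet_counts"]["facet_fields"]["language"])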
  22. Why this discussion – databases provide peace of mind by being ACID compliant. NoSQL stores provide scale and availability by loosening the consistency requirements across N nodes at the same time. NoSQL stores (purely memory-based) can at worst lose data and can be inconsistent.
    Reference for the above – http://nosql.mypopescu.com/post/12466059249/anonymous-post-dont-use-mongodb
    Amazon's Dynamo paper – http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
    ACID vs. BASE (cf. [Bre00, slide 13]).
    Another approach – how ACID can be fixed without abandoning it: http://db.cs.yale.edu/determinism-vldb10.pdf
    "The root of these problems lies in the isolation property within ACID. In particular, the serializability property (which is the standard isolation level for fully ACID systems) guarantees that execution of a set of transactions occurs in a manner equivalent to some sequential, non-concurrent execution of those transactions, even
  23. if what actually happens under the hood is highly threaded and parallelized. So if three transactions (let's call them A, B and C) are active at the same time on an ACID system, it will guarantee that the resulting database state will be the same as if it had run them one by one. No promises are made, however, about which particular order of execution it will be equivalent to: A-B-C, B-A-C, A-C-B, etc. This obviously causes problems for replication. If a set of (potentially non-commutative) transactions is sent to two replicas of the same system, the two replicas might each execute the transactions in a manner equivalent to a different serial order, allowing the replicas' states to diverge."
    Two concepts distinguish databases and non-relational stores:
    ACID – properties that guarantee transactions are reliable and durable.
    BASE – basically available, soft state, eventually consistent. In certain ultra-scale web services, availability and partition tolerance are more important than consistency.
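    The quoted divergence is easy to reproduce in miniature: two replicas each execute the same two non-commutative transactions in a different, yet individually serializable, order (a toy model, not any real engine):

        def credit_10(balance):            # transaction A
            return balance + 10

        def add_5pct_interest(balance):    # transaction B - does not commute with A
            return balance * 1.05

        replica_1 = replica_2 = 100.0
        for txn in (credit_10, add_5pct_interest):   # serial order A, B
            replica_1 = txn(replica_1)
        for txn in (add_5pct_interest, credit_10):   # serial order B, A
            replica_2 = txn(replica_2)

        print(replica_1, replica_2)   # 115.5 vs 115.0 - the replicas diverged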
  26. Tunable consistency
    Of course we will not get into latency vs consistency here and will leave Abadi to handle it – http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
    Eventual consistency refers to a strategy used by many distributed systems to improve query and update latencies and, in a more limited way, to provide stronger availability to a system than could otherwise be attained.
    A lot of parameters come into play when trying to predict or model the performance of distributed systems: How many nodes can die before data is lost? How many can die without affecting the usability of the system? Are there any single points of failure? Can the system be used if, for some period of time, half of the nodes can't see the other half? How fast can I get data out of the system? How fast can it accept new data?
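    "Tunable" usually cashes out as quorum arithmetic, Dynamo-style: with N replicas, read quorum R and write quorum W, a read is guaranteed to overlap the latest write whenever R + W > N. A sketch of the rule (not any product's API):

        def is_strongly_consistent(n, r, w):
            """Reads see the latest write when read and write quorums overlap."""
            return r + w > n

        for r, w in [(1, 1), (1, 3), (2, 2), (1, 2)]:
            n = 3
            kind = "strong" if is_strongly_consistent(n, r, w) else "eventual"
            print("N=%d R=%d W=%d -> %s" % (n, r, w, kind))
        # Lowering R or W trades consistency for latency and availability -
        # that is the tunable knob.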
  28. Questions to answer:
    - how to model your data
    - what data access patterns your application will need
    - how you deal with data integrity/consistency (what happens if two applications need to access the same data in read/write mode?)
    - what the final complexity, performance, and scalability of the solution are, based on the decisions you made on the points above.
    Access mechanisms – REST vs API vs SQL-like mechanisms (e.g. Cassandra's).
    Joins? Mostly not – document DBs allow nesting and transactions; columnar stores allow storing columns together and spreading them.
    Other axes along which to compare:
    Data model – KV, column. API – single tuple, range. Partitioning – ordered/random. Optimized for reads or writes. Versions – version/timestamp. Replication – quorum/filesystem. Architecture – decentralized or hierarchical. Data-center aware?
  34. Eventual consistency demo – http://www.eecs.berkeley.edu/~pbailis/projects/pbs/#
    Paper – http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-4.pdf
    • Fay Chang, Jeffrey Dean, et al. 2008. Bigtable: A Distributed Storage System for Structured Data.
    • Giuseppe DeCandia, et al. 2007. Dynamo: Amazon's Highly Available Key-Value Store.
    • Designs, Lessons and Advice from Building Large Distributed Systems – Jeff Dean (Google Fellow).
    • The Anatomy of the Google Architecture (Ed Austin).
    • Dynamo: Amazon's Highly Available Key-Value Store (Jagrut Sharma).
    • Ricardo Vilaça, Francisco Cruz, and Rui Oliveira. 2010. On the Expressiveness and Trade-offs of Large Scale Tuple Stores.
    • Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the WWW, STOC '97.
    • Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications, Stoica et al., SIGCOMM '01.
    • The Chubby Lock Service for Loosely-Coupled Distributed Systems, Mike Burrows, OSDI '06.
    • Paxos Made Simple, Leslie Lamport.