
Non-Relational Storage for Relational Person

This deck is a copy of a presentation delivered at TechEd 2012. It is a 101 for people who are comfortable with relational stores (SQL Server, MySQL, PostgreSQL, Oracle, DB2) about non-relational stores (popularly known as NoSQL).
Please also see cmohan's blog articles:
http://cmohan.tumblr.com/post/20274650560/part-2-the-nosql-hoopla-what-is-nonsensql-about-it
http://cmohan.tumblr.com/post/20317650254/part-3-the-nosql-hoopla-what-is-nonsensql-about-it

Govind Kanshi

March 29, 2012

Transcript


  5. Demo
    Access of database instance via
    1. Tool – query analyzer
    2. Application – which binds to library and sends commands
    Use SQL dialect to create
    Use SQL dialect to execute insert/retrieve data
    Indexes help to retrieve the data
    Tool helps in understanding the query plan
    Query Optimizers are standard
    Transactions
    SELECT vs UPDATE
    Why? Isolate & provide durable updates/inserts
    Use locks and two-phase commit to provide ACID
    Monitoring – DMVs, perf counters, logs, profilers, ETW
    Management – backup, restore, cleanup, ETL, data integrity
    Summary
    1. ODBC/OLEDB usage – because of a standard, data stores can provide similar-looking libraries which can be
    used in client programs
    2. Connection over TCP/IP, and serialization of data/commands to TDS, are all hidden
    3. The SQL DSL is king
    4. Transactions are a given – the developer does not have to worry about inconsistency; he just has to deal with the error
    message
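    The flow described above (create, insert, transact, handle the error) can be sketched with SQLite standing in for a relational store; the table and values are illustrative, not from the deck:

```python
import sqlite3

# In-memory database stands in for a server instance (a sketch only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

# Transaction: either both inserts commit, or neither does.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("INSERT INTO orders (id, amount) VALUES (1, 10.0)")
        conn.execute("INSERT INTO orders (id, amount) VALUES (1, 20.0)")  # duplicate key
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back; we only handle the error

rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]  # 0: nothing committed
```

    The point of slide item 4 is visible here: the developer never sees the half-done state, only the error.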
    5

    View Slide

  6. Data type
    - Store/retrieve structured, schematized data vs blobs, XMLs
    HA/DR
    - Well understood: handles machine, location, and upgrade downtime
    - SLAs drive the engagement
    Scale
    - Resolve issues locally using DMVs/query analysis and scale CPU/memory/IO
    - Cache most frequently used items elsewhere
    - Table partitioning
    - Sharding / federated DB
    - Read/write separation
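    The sharding idea above can be sketched minimally; the node names and key format are hypothetical, and a deterministic checksum is used in place of a real placement scheme:

```python
import zlib

# Hypothetical three-node cluster; rows are placed by a hash of the key.
NODES = ["node-a", "node-b", "node-c"]

def shard_for(key: str) -> str:
    # Simple modulo placement: deterministic, but adding a node
    # remaps most keys, which is the weakness sharding schemes
    # like consistent hashing were designed to fix.
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

home = shard_for("user:42")  # always the same node for the same key
```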

  7. What we want ideally is storage of different types of data, scale of data, availability in case of failure,
    and of course as much consistency as possible.
    • Data can’t fit on one node (availability/scale), SANs expensive
    So replicate and partition data over many servers
    • Reliability/Availability – nodes, network, hardware will fail (SANs expensive)
    Replicate data to multiple nodes and remain available
    No single point of failure
    Different data types for different domains – different data stores
    ** Tradeoff
    Income tax submissions over the years increase monotonically; how to take care of it? Reshuffling and schema
    changes are challenging considering it is partitioned on year.
    Availability goes down as more partitions are added: 90% × 90% × 90% – you need redundant storage
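    The availability arithmetic above can be made concrete; the 90% per-node figure is the slide's own, the replica count is an assumed example:

```python
# All partitions must be up for the full dataset to be available,
# so availability multiplies (and shrinks) with each partition added.
node_availability = 0.90
partitions = 3
dataset_availability = node_availability ** partitions  # 0.9 * 0.9 * 0.9 = 0.729

# Redundant storage: with 2 replicas, a partition is down only if
# BOTH replicas are down at once.
replicas = 2
partition_avail = 1 - (1 - node_availability) ** replicas  # 0.99 per partition
replicated_availability = partition_avail ** partitions    # ~0.97 for the dataset
```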


  9. KeyValue-
    What is it -Key-value data model (dictionary)
    Context of using it – Store cache, lookup, shopping cart
    Who uses it – All web apps, mostly on service layer
    Example - E.g. Amazon Dynamo, Voldemort, Tokyo Cabinet, Azure storage
    Document Store
    What is it -Schema-less JSON-style documents
    Context of using it – Store low friction json representation of “object”
    Who uses it – Foursquare uses MongoDB for checkin
    Example - E.g. CouchDB, MongoDB
    Column-Oriented Data Stores (BigTable clones)
    What is it - Store & process data by column & row
    Context of using it – Data warehousing, large volumes of concurrent, available writes
    Who uses it – Adobe uses HBase for storing/doing analysis
    Example - E.g. Google’s BigTable/AppEngine Datastore, Cassandra, HBase
    Graph Store
    What is it - Store, retrieve, and explore relationships between entities
    Context of using it – When data needs to be modelled as a graph
    Who uses it – News firms use variations to explore relationships and find connections
    Example - E.g. Neo4j, GraphDB, HyperGraphDB, Stig, Intellidimension
    Search Store

  10. Store and retrieve information which needs to be found quickly
    Search at large scale is the main focus
    Ecommerce stores, Internal knowledge stores, Job sites
    Solr
    Memcached demonstrated that in-memory indexes
    can be highly scalable, distributing and replicating
    objects over multiple nodes.
    Dynamo pioneered the idea of eventual
    consistency as a way to achieve higher availability
    and scalability: data fetched are not guaranteed to
    be up-to-date, but updates are guaranteed to be
    propagated to all nodes eventually.
    BigTable demonstrated that persistent record
    storage could be scaled to thousands of nodes, a
    feat that most of the other systems aspire to.

  11. What is it -Key-value data model (dictionary)
    Context of using it – Store cache, lookup, shopping cart
    Who uses it – All web apps, mostly on service layer
    Example - E.g. Amazon Dynamo, Voldemort, Tokyo Cabinet, Azure storage

  12. Azure storage is a great option for many scenarios with scale requirements. It fills the other end when
    compared to memory-based stores. Redis too can do persistence, but it is on demand (last I checked,
    the SAVE command; there is nothing like a WAL – write-ahead log).
    Speaking notes
    “Indexing” (sic) is on key.
    Query is API or REST based
    Transactions are either lease/timestamp based.
    Failover depends on whether it is memory based or persistence based. Riak handles failover via
    read/write quorums.
    HA is through replicas for Azure storage. Redis does not have replicas. Riak does have replicas.
    Scale in Azure storage is based on account; Redis – vertical scale as of now; Riak – dynamic node
    addition.
    Joins – (left as an exercise: if you need joins, you need to re-think the model)

  13. Similar to collections/dictionaries/associative arrays/caches
    Redis –
    Redis lists are useful as a simple FIFO work queue.
    The set datatype has all of the union, intersection, and difference operations available across set
    keys. Common use-cases include de-duplicating items for work queues.
    Hashes are useful as a way of gathering similar kinds of data together; a hash can store a row from a
    database table with each column an entry, which allows for sorting and retrieval via SORT and the
    various hash access methods.
    Redis has the zset or sorted set data type – here each member in the set also has an associated
    float/double score, which produces an ordering over all keys in the sorted set, and which you can
    query by member, score, or rank. Some common use cases include priority queues, tag clouds,
    timeouts, and rate limiting.
    Dynamo, or its open-source counterpart from LinkedIn, Voldemort
    PNUTS
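    The zset semantics described above can be mimicked in a few lines of pure Python; the class and method names echo Redis commands but are purely illustrative, not the redis-py API:

```python
import bisect

class SortedSet:
    """Minimal mimic of a Redis sorted set: members with float scores,
    queryable by member or by rank (ascending score)."""
    def __init__(self):
        self.scores = {}    # member -> score
        self.ordered = []   # sorted list of (score, member)

    def zadd(self, member, score):
        if member in self.scores:
            self.ordered.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zscore(self, member):
        return self.scores.get(member)

    def zrange(self, start, stop):
        # members by ascending rank; stop is inclusive, like Redis ZRANGE
        return [m for _, m in self.ordered[start:stop + 1]]

# Priority-queue use case: lowest score = most urgent job.
zs = SortedSet()
zs.zadd("job:low", 5.0)
zs.zadd("job:urgent", 1.0)
zs.zadd("job:normal", 3.0)
next_two = zs.zrange(0, 1)
```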

  14. Document Store
    What is it -Schema-less JSON-style documents
    Context of using it – Store low friction json representation of “object”
    Who uses it – Foursquare uses MongoDB for checkin
    Example - E.g. CouchDB, MongoDB

  15. Trivial demo to show that, at the end of the day, a document store too is a “key-value” store, with the
    ability to index on attributes. It has replication and sharding.
    Speaking notes
    “Indexing” (sic) is on key, allowed on attributes
    Query is API or REST based
    Transactions – depends on implementation
    Failover depends on whether it is memory based or persistence based. Riak handles failover via
    read/write quorums. MongoDB has replica sets. CouchDB believes in fail-fast.
    HA is through replicas for MongoDB.
    Scale in MongoDB is via replicas/sharding.
    Joins/Stored procs – MapReduce through JavaScript.
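    The "key-value store plus attribute indexes" point above can be sketched with plain dicts; the field names and the index-per-attribute layout are illustrative, not MongoDB's actual API or storage format:

```python
# A document store is "key -> document", plus secondary indexes on
# attributes so queries need not scan every document.
documents = {}     # _id -> document (the key-value core)
city_index = {}    # attribute value -> set of _ids (secondary index)

def insert(doc):
    documents[doc["_id"]] = doc
    city_index.setdefault(doc["city"], set()).add(doc["_id"])

def find_by_city(city):
    # index lookup instead of a full scan
    return [documents[i] for i in city_index.get(city, set())]

insert({"_id": 1, "user": "ana", "city": "Pune"})
insert({"_id": 2, "user": "raj", "city": "Delhi"})
insert({"_id": 3, "user": "lee", "city": "Pune"})
pune_docs = find_by_city("Pune")
```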

  16. Examples
    Posterous – stores the posts in Riak with Bitcask (tried Redis – could not fit data into RAM; MongoDB –
    difficult to operate)
    Archiving at Craigslist –
    - 2.2B historical posts, semi-structured
    - Relatively large blobs: avg 2KB, max > 4 MB
    Checkin at Foursquare
    Small location based updates, sharded on user
    Geo-spatial indexing
    Partitioning – Range based in MongoDB
    Availability
    Data type

  17. Column-Oriented Data Stores (Big Table clones)
    Store & process data by column- & row
    Datawarehousing, large concurrent available writes
    Adobe uses Hbase for storing/doing analysis
    E.g. Google’s BigTable/AppEngine Datastore, Cassandra, Hbase
    Example: 5 columns of 10 bytes each and 10 rows
    = 500 bytes in total
    If you need to read rows where price > 20,
    a column store only has to load the price column – one fifth of the data (100 bytes instead of 500)
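    The layout arithmetic above can be checked directly; the table contents below are made-up values for illustration:

```python
# Slide's arithmetic: 5 columns of 10 bytes each, 10 rows.
columns, bytes_per_value, rows = 5, 10, 10
row_store_scan = columns * bytes_per_value * rows   # 500 bytes: a row store reads every column
column_store_scan = bytes_per_value * rows          # 100 bytes: just the price column

# Same idea with data: a column store keeps "price" contiguous, so a
# predicate touches only that one column.
table = [(i, f"item{i}", 10 + 3 * i) for i in range(rows)]  # (id, name, price) rows
price_column = [r[2] for r in table]                        # columnar projection
matching_rows = [i for i, p in enumerate(price_column) if p > 20]
```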
    Daniel Abadi
    • http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf – Are row-stores same as column
    stores
    • http://db.csail.mit.edu/projects/cstore/vldb.pdf – C-Store
    • Of course, the mother of them all - http://labs.google.com/papers/bigtable-osdi06.pdf
    • http://labs.google.com/papers/gfs-sosp2003.pdf

  18. “Indexing” (sic) is on row key, allowed on attributes
    Query is API or rest based
    Transactions – depends on implementation
    Failover – read/write quorums.
    HA is possible through replicas.
    Scale is via node addition.
    Joins/Stored procs – keep a column family to store related columns together.
    Not very sure of Hypertable, but it reportedly blows Cassandra out of the water on performance.
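    The read/write quorum rule mentioned for failover (used by Dynamo-style stores such as Riak and Cassandra) is just an overlap condition, sketched here:

```python
# With N replicas, a read touching R nodes and a write touching W nodes
# are guaranteed to overlap on at least one node whenever R + W > N,
# so every read sees at least one copy of the latest write.
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    return r + w > n

strict = quorum_overlaps(3, 2, 2)   # classic N=3, R=2, W=2: overlapping
sloppy = quorum_overlaps(3, 1, 2)   # R=1, W=2: a read may miss the write
```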

  19. Based on Google BigTable
    Very similar to relational stores, yet different
    Examples
    Facebook – messages (HBase) vs wall vs profile
    Simple consistency, sharding, load balancing, replication
    Messages in HBase, Cassandra for inbox search
    Twitter – HBase replicates data from MySQL, Hadoop analytics
    Netflix migrated from Oracle to Cassandra
    Stores customer profile, movie watch log, complex usage logging
    No locking (*except payment)
    Benefits
    DR – async inter-data-center replication
    No downtime for schema changes
    Eventual or strong consistency
    Cassandra sharding – consistent hashing
    HBase sharding – range partitioning
    Reading - http://dbmsmusings.blogspot.in/2010/03/distinguishing-two-major-types-of_29.html
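    The two sharding schemes named above can be sketched side by side; node names, key formats, and range boundaries are all hypothetical:

```python
import bisect
import hashlib

def h(key: str) -> int:
    # Stable hash onto a 32-bit ring (MD5 here purely for determinism).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32

class HashRing:
    """Consistent hashing (Cassandra-style): each node owns an arc of the
    ring; a key belongs to the first node clockwise from its hash."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        i = bisect.bisect(self.ring, (h(key), ""))
        return self.ring[i % len(self.ring)][1]  # wrap around the ring

def range_partition(key: str, boundaries) -> int:
    """Range partitioning (HBase-style): boundaries like ["g", "p"] send
    keys < "g" to shard 0, keys < "p" to shard 1, the rest to shard 2."""
    return bisect.bisect(boundaries, key)

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:1")
```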

  20. Scaling
    http://www.listware.net/201102/neo4j-user/84469-neo4j-cache-sharding-blog-post.html
    FlockDB
    FlockDB is much simpler than other graph databases such as neo4j because it tries to solve fewer
    problems. It scales horizontally and is designed for on-line, low-latency, high throughput environments
    such as web-sites.
    Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom) and secondary
    indices. As of April 2010, the Twitter FlockDB cluster stores 13+ billion edges and sustains peak traffic
    of 20k writes/second and 100k reads/second.
    https://github.com/twitter/flockdb

  21. Reed Specialist Recruitment
    Data –
    Hundreds of millions of documents
    Hundreds of fields in the DB, which evolve
    Multiple languages
    Extract data from config, résumés, and the Oracle DB and insert into Solr.
    Query via facets

  22. Why this discussion –
    Databases provide peace of mind by being ACID compliant. NoSQL stores provide scale and
    availability by loosening the consistency requirements across nodes at the same
    time.
    NoSQL stores (purely memory based) can at worst lose data and can be inconsistent.
    Reference for the above - http://nosql.mypopescu.com/post/12466059249/anonymous-post-dont-use-mongodb
    Amazon’s Dynamo paper
    http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
    ACID vs. BASE (cf. [Bre00, slide 13])
    Other approach
    How ACID can be fixed without abandoning it
    http://db.cs.yale.edu/determinism-vldb10.pdf
    “The root of these problems lies in the isolation property within ACID. In particular,
    the serializability property (which is the standard isolation level for fully ACID
    systems) guarantees that execution of a set of transactions occurs in a manner
    equivalent to some sequential, non-concurrent execution of those transactions, even
    if what actually happens under the hood is highly threaded and parallelized. So if
    three transactions (let's call them A, B and C) are active at the same time on an ACID
    system, it will guarantee that the resulting database state will be the same as if it had
    run them one-by-one. No promises are made, however, about which particular order
    execution it will be equivalent to: A-B-C, B-A-C, A-C-B, etc.
    This obviously causes problems for replication. If a set of (potentially non-
    commutative) transactions is sent to two replicas of the same system, the two
    replicas might each execute the transactions in a manner equivalent to a different
    serial order, allowing the replicas' states to diverge.

    Two concepts which distinguish databases and non-relational stores:
    ACID – properties that guarantee transactions are reliable and durable
    In certain ultra-scale web services, availability and partition tolerance are more
    important than consistency.
    BASE – Basically Available, Soft state, Eventual consistency
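    The quoted point about non-commutative transactions can be shown with two toy "transactions"; the operations and values are invented for illustration:

```python
# Two transactions that do not commute: applying them in different
# serial orders (each perfectly serializable on its own replica)
# leaves the replicas in different states.
def add_ten(x):   # transaction A
    return x + 10

def double(x):    # transaction B
    return x * 2

start = 5
replica_1 = double(add_ten(start))   # serial order A, B -> (5 + 10) * 2 = 30
replica_2 = add_ten(double(start))   # serial order B, A -> (5 * 2) + 10 = 20
diverged = replica_1 != replica_2    # the replicas' states have diverged
```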

  24. Brewer's CAP theorem helps choose the “tuning factor” in a distributed store


  27. Tunable Consistency
    Of course we will not get into latency vs. consistency and will leave it to Abadi -
    http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
    Eventual consistency refers to a strategy used by many distributed systems
    to improve query and update latencies, and in a more limited way, to
    provide stronger availability to a system than could otherwise be attained.
    There are a lot of parameters which come into play when trying to predict or model
    the performance of distributed systems. How many nodes can die before data is lost?
    How many can die without affecting the usability of the system? Are there any single
    points of failure? Can the system be used if, for some period of time, half of the
    nodes can’t see the other half? How fast can I get data out of the system? How fast
    can it accept new data?


  29. How to model your data
    What data access patterns will your application need
    How you deal with data integrity/consistency (what happens if two applications
    need to access the same data in read/write mode?)
    What is the final complexity, performance, scalability of the solution, based on the
    decisions you’ve made on the above points.
    Access mechanisms – REST vs API vs SQL-like mechanisms (Cassandra)
    Joins? – mostly not – document DBs allow nesting and transactions; columnar stores
    allow storing columns together and spreading them
    Other ways to skin it
    Data model – KV, column
    API – single tuple, range
    Partition (ordered/random)
    Optimized for reads/writes
    Versions – version/timestamp
    Replication – quorum/filesystem
    Arch – decentralized, hierarchical
    Data center aware?
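    The "Versions – version/timestamp" item above can be sketched as last-write-wins reconciliation, a common (and lossy) strategy in eventually consistent stores; the value format is a made-up example:

```python
# Each replica tags its value with a version counter; on sync,
# the higher version wins and the other write is silently discarded
# (which is exactly why last-write-wins can lose data).
def merge(local, remote):
    # values are (version, payload) pairs; keep the newer one
    return local if local[0] >= remote[0] else remote

replica_a = (3, "cart=[book]")
replica_b = (5, "cart=[book,pen]")
winner = merge(replica_a, replica_b)
```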

  30. Supplement the relational store with an appropriate non-relational store.
    Datomic &


  36. Eventual consistency demo - http://www.eecs.berkeley.edu/~pbailis/projects/pbs/#
    Paper - http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-4.pdf
    Fay Chang, Jeffrey Dean, et al. 2008. Bigtable: A Distributed Storage System for
    Structured Data.
    Giuseppe DeCandia, et al. 2007. Dynamo: Amazon's Highly Available Key-Value Store.
    Designs, Lessons and Advice from Building Large Distributed Systems –
    Jeff Dean (Google Fellow)
    • The Anatomy of the Google Architecture (Ed Austin)
    • Dynamo: Amazon’s Highly Available Key-Value Store (Jagrut Sharma)
    • Ricardo Vilaça, Francisco Cruz, and Rui Oliveira. 2010. On the Expressiveness and
    Trade-offs of Large Scale Tuple Stores.
    • Consistent Hashing and Random Trees: Distributed Caching
    Protocols for Relieving Hot Spots on the WWW, STOC ’97
    • Chord: A Scalable Peer-to-Peer Lookup Service for Internet
    Applications, Stoica et al., SIGCOMM ’01
    • The Chubby Lock Service for Loosely Coupled Distributed Systems,
    Mike Burrows, OSDI ’06
    • Paxos Made Simple, Leslie Lamport