
NoSQL Overview

A high-level overview of my thoughts on approaching the NoSQL space. Given at the Progressive.NET meetup hosted by Valtech Stockholm.

Approx. 60 minutes.

Mårten Gustafson

May 12, 2011

Transcript

  1. NoSQL
    Mårten Gustafson
    Progressive.Net @ Valtech Stockholm
    2011-05-12
    Thursday, May 12, 2011

  2. Not Only SQL

  6. “NoSQL is a movement promoting a loosely
    defined class of non-relational data stores
    that break with a long history of relational
    databases” - Wikipedia
    • Not one single technique
    • Not one type of data
    • Not one type of use case

  7. The way I see it

  8. Families
    • Graph
    • Key/Value
    • Document

  11. Flavors
    • Stand-alone
      • Isolated instances are common
      • Might have slave replication
    • Distributed
      • “Cluster” is the default mode of operation
      • No master node
      • Multiple nodes may fail without interrupting service (implies storage distribution)
    • Embedded
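
The distributed flavor above (no master node, several nodes may fail without interrupting service) generally depends on spreading each key and its replicas across nodes. One common way to do that, described in the Dynamo paper, is consistent hashing. The sketch below is a simplified version that rests on a few assumptions: MD5 as the ring hash, eight virtual nodes per physical node, and hypothetical node names. It is not the ring implementation used by Riak, Voldemort or Cassandra.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on a 128-bit ring (MD5 is an assumption)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent-hashing ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes_per_node=8):
        self.vnodes_per_node = vnodes_per_node
        self._positions = []  # sorted ring positions of all virtual nodes
        self._owners = {}     # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node claims several scattered points on the ring.
        for i in range(self.vnodes_per_node):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove_node(self, node):
        # Only the removed node's points disappear; all other points stay put.
        self._positions = [p for p in self._positions if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def preference_list(self, key, n=3):
        """Return up to n distinct physical nodes responsible for `key`.

        Walk clockwise from the key's position and skip virtual nodes whose
        physical node was already chosen, so replicas land on different nodes.
        """
        wanted = min(n, len(set(self._owners.values())))
        start = bisect.bisect(self._positions, ring_position(key))
        chosen = []
        for offset in range(len(self._positions)):
            if len(chosen) == wanted:
                break
            pos = self._positions[(start + offset) % len(self._positions)]
            if self._owners[pos] not in chosen:
                chosen.append(self._owners[pos])
        return chosen


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    replicas = ring.preference_list("user:42")
    print("user:42 ->", replicas)
    ring.remove_node(replicas[0])
    # The surviving replicas keep their place; one other node fills the gap.
    print("after removing", replicas[0], "->", ring.preference_list("user:42"))
```

A key's replicas are found by walking the ring from the key's position. As a result, removing or adding a node only moves the keys next to that node's points, and the rest of the cluster keeps its data.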

  12. Disclaimer
    I don’t know every product by heart
    I’m trying to illustrate my broad reasoning
    The NoSQL space tends to be crammed with religious zealots

  13. Example of my reasoning
    Flavor / Family   Graph   Key/Value                    Document
    Stand-alone       Neo4J   Redis                        CouchDB, MongoDB
    Distributed       -       Riak, Voldemort, Cassandra   -
    Embedded          Neo4J   Tokyo Cabinet, LevelDB       -

  17. priorities & trade-offs
    For me, (No)SQL is very much about trade-offs

  18. Does it fit with current data structures?
    Don’t underestimate the exercise of making your data “fit” a particular NoSQL product

  19. Ease of adoption?
    Client libraries?
    Does it require driver libraries?

  20. Indices or some sort of search?
    What access patterns do you have today? Tomorrow?
    What kind of reports will customers or management require?

  21. Does it speak HTTP?
    For us at Hitta.se this is important, since almost everything we do is HTTP-based

  22. Availability and redundancy?
    What kinds of availability?
    How does it handle node failures? Network partitions?

  23. Can you monitor it?
    How and with what?

  24. Performance?
    Does performance scale with additional nodes?

  25. Ease of scaling in and out?
    What’s required to add additional nodes?
    How do you remove a node temporarily or permanently?

  26. Commercial support available?

  27. Does it run on your preferred OS?

  28. Is it properly packaged?
    Proper install packages?
    Sane defaults in terms of service accounts and privileges?

  29. Do you understand it?
    Don’t underestimate this

  30. Can you kill it without losing data?
    Is your data really durable on disk? (Assuming that’s what you need.)

  31. For example...
    • I work at Hitta.se
    • We love availability
    • We like “easy” scalability

  33. availability
    +
    scalability

  34. availability
    +
    scalability
    =
    multi-master

  35. availability
    +
    scalability
    =
    storage distribution

  36. availability
    +
    scalability
    =
    replication

  37. availability
    +
    scalability
    =
    add & remove nodes

  38. availability
    +
    scalability
    =
    tune behavior per use case
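
In Dynamo-style stores, much of "tuning behavior per use case" comes down to three numbers. N is the number of replicas per key, R is how many replicas must answer a read, and W is how many must acknowledge a write. Riak, for example, lets you set these per bucket or per request. The sketch below only captures the arithmetic and assumes a strict quorum model; sloppy quorums and hinted handoff weaken the guarantee. The class and profile names are hypothetical. When R + W > N, every read quorum overlaps every write quorum. Choosing a smaller R or W gives up that overlap in exchange for availability and latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    n: int  # replicas per key
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self):
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must be between 1 and n")

    @property
    def read_write_overlap(self) -> bool:
        # Strict quorums: every read set intersects every write set.
        return self.r + self.w > self.n

    @property
    def read_failures_tolerated(self) -> int:
        # Replicas that may be down while reads still succeed.
        return self.n - self.r

    @property
    def write_failures_tolerated(self) -> int:
        # Replicas that may be down while writes still succeed.
        return self.n - self.w


# Hypothetical per-use-case profiles, for illustration only.
PROFILES = {
    "favor availability": QuorumConfig(n=3, r=1, w=1),
    "balanced": QuorumConfig(n=3, r=2, w=2),
    "cheap consistent reads": QuorumConfig(n=3, r=1, w=3),
}

if __name__ == "__main__":
    for name, cfg in PROFILES.items():
        print(f"{name}: overlap={cfg.read_write_overlap}, "
              f"reads survive {cfg.read_failures_tolerated} down, "
              f"writes survive {cfg.write_failures_tolerated} down")
```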

  39. availability
    +
    scalability
    =
    ?

  40. availability
    +
    scalability
    =
    Riak & CouchDB
    For us, so far, the answer has been Riak & CouchDB

  47. Riak and CouchDB
    Riak: Dynamo-inspired key/value store
      • Data that must be available as soon as possible on all nodes
      • Data that requires storage distribution
    CouchDB: Document database
      • Data that changes less frequently and is OK to replicate “manually”
      • Data that might be local to a single node

  48. Common factors
    Riak & CouchDB
    Good packaging

  49. Common factors
    Riak & CouchDB
    Monitorable (lots of stats)

  50. Common factors
    Riak & CouchDB
    Easy configuration

  51. Common factors
    Riak & CouchDB
    Reliable
    Append-only disk structures
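
Append-only structures help reliability because existing records are never overwritten in place, so a crash can at worst leave a half-written record at the end of the file. The sketch below illustrates that idea using a made-up record format: a 4-byte length, a 4-byte CRC32, then the payload. It is not the on-disk format of CouchDB or of any Riak backend. The `fsync` call also ties back to the question of whether data is really durable on disk.

```python
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct(">II")  # payload length, CRC32 of payload (big-endian)


def append_record(path: str, payload: bytes) -> None:
    """Append one record and force it to disk before returning."""
    header = HEADER.pack(len(payload), zlib.crc32(payload))
    with open(path, "ab") as f:
        f.write(header + payload)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to persist it to the device


def read_records(path: str) -> list[bytes]:
    """Return all intact records, stopping at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):
            break  # header written, payload cut short by a crash
        payload = data[start:end]
        if zlib.crc32(payload) != crc:
            break  # corrupt record: treat the rest as an unfinished tail
        records.append(payload)
        offset = end
    return records


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.log")
    for doc in (b'{"id": 1}', b'{"id": 2}'):
        append_record(path, doc)
    # Simulate a crash mid-write: a header promising 100 bytes, only 3 written.
    with open(path, "ab") as f:
        f.write(HEADER.pack(100, 0) + b"abc")
    print(read_records(path))  # [b'{"id": 1}', b'{"id": 2}']
```

On recovery, everything up to the last intact record is kept. The usual price of never updating in place is periodic compaction, which rewrites only the live records to a new file.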

  52. Common factors
    Riak & CouchDB
    HTTP API

  53. Common factors
    Riak & CouchDB
    They embrace HTTP

  55. All your webish skillz and tools apply...
    • Proxies
    • Load balancers
    • Caches
    • HTTP client libs (ETag, If-Modified-Since, etc.)
    • Language-, platform- and OS-neutral
    • MIME / Content-Type
    Important for us, as it requires no “drivers” and allows us to serve binary content with its MIME type
    No, I don’t like WS-*
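
Because both stores speak plain HTTP, ordinary caching headers work without a database driver. CouchDB, for instance, sends a document's revision as its ETag. It answers a matching `If-None-Match` with `304 Not Modified`. The sketch below shows that revalidation flow using only the Python standard library. To stay self-contained, it runs a tiny stand-in server in-process rather than a real CouchDB. The path `/db/doc1` and the revision `1-abc` are hypothetical.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical document and revision, standing in for a CouchDB document.
DOC = {"_id": "doc1", "_rev": "1-abc", "name": "example"}
ETAG = f'"{DOC["_rev"]}"'


class StandInHandler(BaseHTTPRequestHandler):
    """Serves one JSON document and honours If-None-Match (stand-in server)."""

    def do_GET(self):
        if self.path != "/db/doc1":
            self.send_error(404)
            return
        if self.headers.get("If-None-Match") == ETAG:
            self.send_response(304)  # unchanged: client reuses its cached copy
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        body = json.dumps(DOC).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("ETag", ETAG)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def conditional_get(host, port, path, etag=None):
    """GET `path`, revalidating with If-None-Match when we hold an ETag."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    headers = {"If-None-Match": etag} if etag else {}
    conn.request("GET", path, headers=headers)
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    return resp.status, resp.getheader("ETag"), body


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), StandInHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    status, etag, body = conditional_get(host, port, "/db/doc1")
    print(status, etag)       # first fetch returns the full document
    status, _, body = conditional_get(host, port, "/db/doc1", etag)
    print(status, len(body))  # revalidation returns 304 with no body
    server.shutdown()
    server.server_close()
```

Proxies and caches placed in front of the database can apply the same ETag logic. That is part of why the web tooling listed above carries over.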

  56. Common factors
    Riak & CouchDB
    Able to store and serve complete web apps

  57. Go do!
    • Test one or more NoSQL thingys
    • Get familiar with Brewer’s CAP theorem
    • Get familiar with the Dynamo paper

  58. Thx.
    Mårten Gustafson
    @martengustafson
    http://marten.gustafson.pp.se/