
NoSQL

An outline of the state of NoSQL sometime in early 2014

Chris Aves

May 13, 2014

Transcript

  1. SQL • Table for each class of object • Row for each object instance • Columns of attributes for a given entity • Defined schema • Relations defined between tables • Indexes • Usually normalised
  2. Tables
     Blog Posts
     ID | Title | Content
     1 | “Welcome to my blog” | “Hello readers..”
     2 | “Sorry for not posting” | “It’s been a while..”
     Comments
     ID | Post ID | Comment
     1 | 1 | “Yay!”
     2 | 1 | “Congratulations”
     3 | 2 | “Yay! You’re back!”
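
The two tables above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table and column names (`blog_posts`, `comments`, `post_id`) are assumptions derived from the slide, and the query simply follows the relation between the tables with a join.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_posts (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        content TEXT NOT NULL
    );
    CREATE TABLE comments (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES blog_posts(id),
        comment TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO blog_posts (id, title, content) VALUES (?, ?, ?)",
    [(1, "Welcome to my blog", "Hello readers.."),
     (2, "Sorry for not posting", "It's been a while..")],
)
conn.executemany(
    "INSERT INTO comments (id, post_id, comment) VALUES (?, ?, ?)",
    [(1, 1, "Yay!"), (2, 1, "Congratulations"), (3, 2, "Yay! You're back!")],
)

# The relation between the tables is followed with a join on post_id.
comments_per_post = conn.execute("""
    SELECT p.title, COUNT(c.id)
    FROM blog_posts AS p
    LEFT JOIN comments AS c ON c.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
```

Here `comments_per_post` holds one `(title, comment count)` pair per blog post.
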
  3. Characteristics of NoSQL • Schema-free • Easy replication • Simple API • Eventual consistency • Humongous data
  4. Schema-free • No need to pre-define what attributes get stored for each class of object • Any object can have any attributes • No migrations • Better performance
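
A minimal sketch of what "any object can have any attributes" might look like, using a plain Python list of dicts as a hypothetical stand-in for a schema-free collection. The `tags` and `draft` attributes and the helper name are assumptions made up for illustration.

```python
# A hypothetical in-memory "collection": a plain list of dicts.
posts = []

posts.append({"id": 1, "title": "Welcome to my blog"})
# A later object carries extra attributes; nothing had to be declared first.
posts.append({"id": 2, "title": "Sorry for not posting",
              "tags": ["apology"], "draft": True})


def attribute_names(collection):
    """Return every attribute name that appears in at least one object."""
    names = set()
    for obj in collection:
        names.update(obj)
    return sorted(names)


# Readers must tolerate missing attributes, for example with a default.
drafts = [p["id"] for p in posts if p.get("draft", False)]
```

The trade-off is that the checks a schema would have enforced move into the code that reads the data.
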
  5. Easy replication • Just create more servers • Servers are given specific responsibilities • Not the same as sharding
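
To illustrate why replication is not the same as sharding, here is a rough sketch under simple assumptions: replication gives every server a full copy, while sharding places each key on exactly one server. The hash-modulo placement and the `post:N` keys are illustrative choices, not how any particular database does it.

```python
import hashlib


def replicate(records, server_count):
    """Replication: every server ends up with a full copy of the data."""
    return [dict(records) for _ in range(server_count)]


def shard(records, server_count):
    """Sharding: each key lives on exactly one server, chosen by a stable hash."""
    servers = [{} for _ in range(server_count)]
    for key, value in records.items():
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        servers[int(digest, 16) % server_count][key] = value
    return servers


# Hypothetical records used only for this example.
records = {
    "post:1": "Welcome to my blog",
    "post:2": "Sorry for not posting",
    "post:3": "Back again",
}
replicas = replicate(records, 3)
shards = shard(records, 3)
```
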
  6. Simple API • Easier to learn than the query languages for relational databases • Allows database to be exposed through commonly available mechanisms, such as REST • Leads to a proliferation in clients, tools and add-ons
  7. Humongous data • For example: Google indexes one trillion URLs • Some NoSQL systems have built-in functionality for aggregating and reducing data • Can also defer to external systems (e.g. Hadoop)
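
The "aggregating and reducing" idea can be sketched as a tiny map/reduce pass in plain Python. This is a toy, single-process illustration of the pattern, not the API of any NoSQL system or of Hadoop. The function names are assumptions.

```python
from collections import defaultdict


def map_phase(documents):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


word_counts = reduce_phase(
    map_phase(["Yay!", "Congratulations", "Yay! You're back!"])
)
```

In a real deployment the map and reduce steps would run on many servers, each over its own slice of the data.
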
  8. Eventual consistency • Really only an issue with replicated systems • Cannot guarantee that what you read back will match what you most recently wrote • Sometimes this doesn’t matter
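
A toy model of eventual consistency, assuming one primary and one replica where replication only happens when `sync()` is called. The class and method names are hypothetical. Real systems replicate in the background, but the stale read looks the same.

```python
class ReplicatedStore:
    """Toy primary/replica pair; replication is applied only when sync() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def sync(self):
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write("title", "Welcome to my blog")
stale = store.read_from_replica("title")  # replica has not caught up yet
store.sync()
fresh = store.read_from_replica("title")  # replica now matches the primary
```
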
  9. Some NoSQLs • Key-Value: Riak and Redis • Columnar: HBase • Document: MongoDB and CouchDB • Graph: Neo4j • SQL: PostgreSQL
  10. Columnar • Column data is stored together (not rows) • Columns are inexpensive to add • Columns added on row by row basis • Each row can have its own columns
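
One way to picture the columnar layout is a mapping per column, keyed by row. The sketch below is an in-memory assumption for illustration only (the `put`/`get_row` helpers and the `mood` column are made up). It shows a column appearing on first use and a row that has a column others lack.

```python
# Column-oriented layout: one mapping per column, keyed by row key.
columns = {}


def put(row_key, column, value):
    """Store a cell; a column springs into existence on first use."""
    columns.setdefault(column, {})[row_key] = value


def get_row(row_key):
    """Reassemble a row from whichever columns hold a value for it."""
    return {
        name: cells[row_key]
        for name, cells in columns.items()
        if row_key in cells
    }


put("post:1", "title", "Welcome to my blog")
put("post:2", "title", "Sorry for not posting")
put("post:2", "mood", "apologetic")  # only this row has a "mood" column

titles = columns["title"]  # a whole column is read without touching the others
```
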
  11. Document • Stores anything, i.e. a document • Each document has a unique ID and a set of values • Documents can contain nested structures
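
As a sketch of the document model, the blog post from the earlier tables could be stored as one nested document with its comments inside it. The `save` helper and the in-memory `documents` dict are hypothetical stand-ins for a document database. The ID is generated with `uuid` purely for illustration.

```python
import json
import uuid

documents = {}  # hypothetical store: document ID -> document


def save(doc):
    """Store a document under a freshly generated unique ID and return the ID."""
    doc_id = str(uuid.uuid4())
    # Round-trip through JSON to show the document is plain nested data.
    documents[doc_id] = json.loads(json.dumps(doc))
    return doc_id


post_id = save({
    "title": "Welcome to my blog",
    "content": "Hello readers..",
    "comments": [{"text": "Yay!"}, {"text": "Congratulations"}],
})
first_comment = documents[post_id]["comments"][0]["text"]
```
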
  12. Graph • Interconnected data • Nodes • Relationships between nodes • Quick to find data by navigating these nodes and relationships
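
Navigating nodes and relationships can be sketched as a breadth-first walk over an adjacency list. The node names and relationship types below are invented for the example, and this is plain Python rather than any graph database's query language.

```python
from collections import deque

# Hypothetical data: node -> list of (relationship, other node).
relationships = {
    "alice": [("FOLLOWS", "bob")],
    "bob": [("FOLLOWS", "carol"), ("WROTE", "post:1")],
    "carol": [("COMMENTED_ON", "post:1")],
    "post:1": [],
}


def reachable(start, max_hops):
    """Breadth-first walk returning reachable nodes with their hop counts."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relationship, neighbour in relationships[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


within_two = reachable("alice", 2)
```
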
  13. Characteristics (Redis) • Stored in memory and on filesystem • Journaled • Very quick • Can be scripted with Lua • Clustered (in development)
  14. What can Redis store? • Strings: Any data (including binary) up to 512MB • Lists: Strings sorted by insertion order, up to 4 billion per list • Sets: Unordered collection of unique strings • Hashes: Mapping between string fields and string values • Sorted Sets: Ordered by a ‘score’
  15. Clients • ActionScript, C, C#, C++, Clojure, Common Lisp, D, Dart, Emacs Lisp, Erlang, Fancy, Go, Haskell, haXe, Io, Java, Lua, Node.js, Objective-C, Perl, PHP, Pure Data, Python, Ruby, Scala, Scheme, Smalltalk, Tcl • Plus: dozens of tools built on top of Redis
  16. Redis in real life • Analytics tracking for multi-domain e-Commerce site • Tracks catalogue views for individual domain and month, and all domains for month • Same for searches, product views, referers, sale value • For tracking, uses just ZINCRBY on Sorted Sets
  17. Keys are important • product:views:domain_id:period • catalogue:views:domain_id:period • catalogue:searches:domain_id:period • track:referers:domain_id:period • Each contains Sorted Set of Key-Value pairs • Key is something like a product_id, or a name • Value is number of events
    • product:views:mysite.com:2013-01 => { “teddy bear” => 4, “book” => 3, “poster” => 1 }
    • product:views:mysite.com:2013-02 => { “book” => 7, “poster” => 1, “cutlery set” => 4 }
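
The tracking scheme above can be sketched without a Redis server by mimicking `ZINCRBY` on a dict of dicts. This is a stand-in, not a Redis client. The `all` segment used for the cross-domain key, and the helper names, are assumptions, since the slides do not show how the all-domains key is spelled.

```python
from collections import defaultdict

# Stand-in for Redis: key -> {member: score}. Not a Redis client.
sorted_sets = defaultdict(dict)


def zincrby(key, increment, member):
    """Mimic ZINCRBY: add increment to member's score, starting from 0."""
    scores = sorted_sets[key]
    scores[member] = scores.get(member, 0) + increment
    return scores[member]


def track_product_view(domain, period, product):
    """Count one view under the per-domain key and an assumed 'all' key."""
    zincrby(f"product:views:{domain}:{period}", 1, product)
    zincrby(f"product:views:all:{period}", 1, product)


def top(key, count):
    """Highest scores first, similar in spirit to ZREVRANGE ... WITHSCORES
    (ties here are broken alphabetically, which differs from Redis)."""
    ranked = sorted(sorted_sets[key].items(),
                    key=lambda item: (-item[1], item[0]))
    return ranked[:count]


for product, views in [("teddy bear", 4), ("book", 3), ("poster", 1)]:
    for _ in range(views):
        track_product_view("mysite.com", "2013-01", product)

best_sellers = top("product:views:mysite.com:2013-01", 2)
```

Because every event is a single increment on a predictable key, reporting becomes a matter of reading the right key for a domain and period.
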
  18. Characteristics (HBase) • Column families for fine-grained performance tuning • Designed for GB and TB of data • Clustered • Versioning, Compression and Filtering • No Sorting or Indexing except for Keys • No Datatypes
  19. Protocols • JRuby-based Shell • Java API • REST • Thrift: provides access via Ruby, PHP, Python etc.
  20. Wiki Example
     Keys (title) | Family “text” | Family “revision”
     Row (page) “First page” | “”: “...” | “author”: “...”, “comment”: “...”
     Row (page) “Second page” | “”: “...” | “author”: “...”, “comment”: “...”
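
The wiki layout above maps naturally onto nested dictionaries: row key, then `family:qualifier`, then value. The sketch below is an in-memory assumption for illustration and does not use an HBase client. The `put` and `get_family` helpers are hypothetical.

```python
# row key -> "family:qualifier" -> value; an in-memory stand-in, not HBase.
wiki = {}


def put(row_key, family, qualifier, value):
    """Store one cell under a row key and a family:qualifier column name."""
    wiki.setdefault(row_key, {})[f"{family}:{qualifier}"] = value


def get_family(row_key, family):
    """Return {qualifier: value} for one column family of one row."""
    prefix = family + ":"
    return {
        column[len(prefix):]: value
        for column, value in wiki.get(row_key, {}).items()
        if column.startswith(prefix)
    }


# The "text" family uses an empty qualifier, as in the slide.
put("First page", "text", "", "...")
put("First page", "revision", "author", "...")
put("First page", "revision", "comment", "...")

revision = get_family("First page", "revision")
```
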
  21. massive_record • A Ruby client for HBase: https://github.com/CompanyBook/massive_record • More like a data mapper than just a Client, so you get relations, validations, callbacks, finders
  22. Characteristics (MongoDB) • Indexes and Sorting • No Joins • Deep queries • Journaled • Supports MapReduce • Easy replication • Easy sharding
  23. Clients • C, C++, C#, Erlang, Java, JavaScript, Node.js, Perl, PHP, Python, Ruby, Scala • ActionScript, Clojure, ColdFusion, D, Dart, Delphi, Entity, Factor, Fantom, F#, Go, Groovy, Lisp, MATLAB, Objective-C, Opa, PowerShell, Prolog, R, Racket, Smalltalk • REST
  24. MongoDB in real life • Custom e-Commerce system • Users, Sales and Line Items all stored in MongoDB • Line Items are embedded in a Sale • Why?
  25. MongoDB in real life • Each Line Item is a customisable product • A product may have more than one customisable component • Customisable components can be of same customisation as other components in Line Item • Customisation varies according to the type of component
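
A hedged sketch of what a Sale with embedded Line Items might look like as a nested document. Every field name and value here is hypothetical, since the slides do not show the real schema. The point is only that components whose customisation varies by type fit in one document without extra tables.

```python
# Hypothetical Sale document; field names are illustrative, not from the
# original system.
sale = {
    "_id": "sale-1",
    "user_id": "user-1",
    "line_items": [
        {
            "product": "photo book",
            "components": [
                {"type": "cover", "customisation": {"text": "Holiday 2013"}},
                {"type": "page",
                 "customisation": {"layout": "two-up", "photos": 2}},
                {"type": "page",
                 "customisation": {"layout": "two-up", "photos": 2}},
            ],
        },
    ],
}


def components_of_type(sale_doc, component_type):
    """Collect every component of a given type across embedded Line Items."""
    return [
        component
        for item in sale_doc["line_items"]
        for component in item["components"]
        if component["type"] == component_type
    ]


pages = components_of_type(sale, "page")
```
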
  26. Seven Databases in Seven Weeks “We find MongoDB to be a much more natural answer to many common problem scopes for application-driven datasets than relational databases.”
  27. Characteristics (Neo4j) • Built on Nodes and Relationships • Both can have arbitrary Properties • Can cope with tens of billions of Nodes • Indexes as a separate service, uses Lucene • Distributed • Replication
  28. Clients • Spring, Java, Ruby, PHP, .NET, Python, Node.js, Clojure, Django, Perl, Scala, Grails, Haskell, Go • REST
  29. Better than SQL? • More indexing options (e.g. B-Tree) • More data types (e.g. arrays, hashes, money) • Extensions (e.g. add your own data types) • Views • Rules • Full text search