NoSQL

What's SQL?

SQL
• Table for each class of object
• Row for each object instance
• Columns of attributes for a given entity
• Defined schema
• Relations defined between tables
• Indexes
• Usually normalised

Tables

Blog Posts
ID | Title                   | Content
1  | “Welcome to my blog”    | “Hello readers..”
2  | “Sorry for not posting” | “It’s been a while..”

Comments
ID | Post ID | Comment
1  | 1       | “Yay!”
2  | 1       | “Congratulations”
3  | 2       | “Yay! You’re back!”
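As a rough illustration of the two tables and the relation between them, the sketch below rebuilds them in an in-memory SQLite database using Python's standard library. The table and column names (`blog_posts`, `comments`, `post_id`) are assumptions made for this example; the slide only shows the data.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_posts (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        content TEXT NOT NULL
    );
    CREATE TABLE comments (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES blog_posts(id),
        comment TEXT NOT NULL
    );
    CREATE INDEX comments_post_id ON comments(post_id);
""")
conn.executemany("INSERT INTO blog_posts VALUES (?, ?, ?)", [
    (1, "Welcome to my blog", "Hello readers.."),
    (2, "Sorry for not posting", "It's been a while.."),
])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
    (1, 1, "Yay!"),
    (2, 1, "Congratulations"),
    (3, 2, "Yay! You're back!"),
])

# Join across the relation to fetch the comments for post 1.
rows = conn.execute("""
    SELECT c.comment
    FROM comments AS c
    JOIN blog_posts AS p ON p.id = c.post_id
    WHERE p.id = ?
    ORDER BY c.id
""", (1,)).fetchall()
comments_for_first_post = [r[0] for r in rows]
```

The schema has to exist before any row is inserted, which is the "defined schema" point from the previous slide.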

What's NoSQL?

No Not SQL

Next Generation Databases

Not Only SQL
• Storage mechanisms that have SQL-like characteristics
and / or
• Non-SQL-like characteristics

Not Only SQL
• Non-relational
• Distributed
• Scalable

Characteristics of NoSQL
• Schema-free
• Easy replication
• Simple API
• Eventual consistency
• Humongous data

Schema-free
• No need to pre-define what attributes get stored for each class of object
• Any object can have any attributes
• No migrations
• Better performance

Easy replication
• Just create more servers
• Servers are given specific responsibilities
• Not the same as sharding
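The difference between replication and sharding can be sketched in a few lines. This is a toy model, not how any particular database does it: the server names are made up, and the hash-modulo placement is just one simple sharding scheme.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical server names

def replicate(key, value, stores):
    """Replication: every server receives a full copy of the write."""
    for name in SERVERS:
        stores[name][key] = value

def shard_for(key):
    """Sharding: a stable hash picks the single server that owns the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

def shard(key, value, stores):
    stores[shard_for(key)][key] = value

replicated = {name: {} for name in SERVERS}
sharded = {name: {} for name in SERVERS}
for key in ["post:1", "post:2", "post:3", "post:4"]:
    replicate(key, "...", replicated)
    shard(key, "...", sharded)

# Replication: each server holds all four keys.
copies_per_server = [len(replicated[name]) for name in SERVERS]
# Sharding: the four keys are split across servers, each stored once.
total_sharded_keys = sum(len(sharded[name]) for name in SERVERS)
```

Replication buys redundancy and read capacity; sharding buys storage and write capacity. Many systems combine the two.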

Simple API
• Easier to learn than the query languages for relational databases
• Allows database to be exposed through commonly available mechanisms, such as REST
• Leads to a proliferation in clients, tools and add-ons

Humongous data
• For example: Google indexes one trillion URLs
• Some NoSQL systems have built-in functionality for aggregating and reducing data
• Can also defer to external systems (e.g. Hadoop)
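"Aggregating and reducing" usually means some form of map/reduce. The sketch below shows the general shape of that technique in plain Python on a single machine; the event data and function names are hypothetical, and a real system would run the map and reduce phases across many servers.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical page-view events; in practice these would be spread across servers.
events = [
    {"domain": "a.example", "views": 3},
    {"domain": "b.example", "views": 5},
    {"domain": "a.example", "views": 2},
]

def map_phase(event):
    """Emit a (key, value) pair for each input record."""
    return (event["domain"], event["views"])

def reduce_phase(values):
    """Combine all values emitted for one key into a single result."""
    return reduce(lambda total, v: total + v, values, 0)

# Group the emitted pairs by key, then reduce each group.
grouped = defaultdict(list)
for key, value in map(map_phase, events):
    grouped[key].append(value)

views_by_domain = {key: reduce_phase(values) for key, values in grouped.items()}
```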

Eventual consistency
• Really only an issue with replicated systems
• Cannot guarantee that what you read back will match what you most recently wrote
• Sometimes this doesn’t matter
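A toy model makes the stale-read problem concrete. The `Primary` and `Replica` classes below are invented for this sketch and do not correspond to any specific database; the explicit `sync()` call stands in for asynchronous replication lag.

```python
class Replica:
    """A read-only copy that only changes when the primary syncs to it."""
    def __init__(self):
        self.data = {}

class Primary:
    """Accepts writes immediately and ships them to replicas later."""
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = []  # writes not yet applied to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def sync(self):
        """Apply queued writes to every replica, in order."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

replica = Replica()
primary = Primary([replica])
primary.write("title", "Welcome to my blog")

stale_read = replica.data.get("title")  # replica has not caught up yet
primary.sync()
fresh_read = replica.data.get("title")  # replica is now consistent
```

Between the write and the sync, a client reading from the replica sees the old state. Whether that matters depends on the data: a page-view counter can tolerate it, an account balance probably cannot.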

How many NoSQL databases are there?

120 nosql-database.org August 2011

150 nosql-database.org

Some NoSQLs
• Key-Value: Riak and Redis
• Columnar: HBase
• Document: MongoDB and CouchDB
• Graph: Neo4j
• SQL: PostgreSQL

Key-Value
• Essentially hashtables
• Great performance
• Limited usage scenarios

Columnar
• Column data is stored together (not rows)
• Columns are inexpensive to add
• Columns are added on a row-by-row basis
• Each row can have its own columns
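A simplified model of the idea, assuming nothing about any real storage engine: values are grouped by column rather than by row, so adding a column for one row touches no other row. The row keys and values are hypothetical.

```python
from collections import defaultdict

# column name -> {row key -> value}; values for one column sit together.
columns = defaultdict(dict)

def put(row_key, column, value):
    """Adding a column is just a new entry; no other row is touched."""
    columns[column][row_key] = value

def get_row(row_key):
    """Reassemble a row from whichever columns happen to hold it."""
    return {name: cells[row_key] for name, cells in columns.items() if row_key in cells}

put("row1", "title", "First page")
put("row1", "author", "chris")       # hypothetical values
put("row2", "title", "Second page")
put("row2", "comment", "typo fix")   # only row2 has this column

titles = columns["title"]  # all titles, without reading any other column
row2 = get_row("row2")
```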

Document
• Stores anything, i.e. a document
• Each document has a unique ID and a set of values
• Documents can contain nested structures
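A minimal sketch of a document store, using an in-memory dict as a stand-in. The `insert` helper and the use of random UUIDs as IDs are assumptions for illustration, not any product's API.

```python
import json
import uuid

documents = {}  # hypothetical in-memory store: document ID -> document

def insert(doc):
    """Store a document under a generated unique ID and return that ID."""
    doc_id = uuid.uuid4().hex
    documents[doc_id] = doc
    return doc_id

post_id = insert({
    "title": "Welcome to my blog",
    "content": "Hello readers..",
    "comments": [{"text": "Yay!"}, {"text": "Congratulations"}],  # nested structure
})
# A second document with entirely different fields; nothing enforces a schema.
note_id = insert({"body": "Remember to post more often"})

first_comment = documents[post_id]["comments"][0]["text"]
as_json = json.dumps(documents[post_id])  # documents serialise naturally to JSON
```

Compare this with the two-table blog example earlier: the comments live inside the post, so reading a post with its comments needs no join.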

Graph
• Interconnected data
• Nodes
• Relationships between nodes
• Quick to find data by navigating these nodes and relationships

Examples! Redis HBase MongoDB Neo4j PostgreSQL

Redis www.redis.io Key-Value


Characteristics
• Stored in memory and on filesystem
• Journaled
• Very quick
• Can be scripted with Lua
• Clustered (in development)

What can Redis store?
• Strings: Any data (including binary) up to 512MB
• Lists: Strings sorted by insertion order, up to 4 billion per list
• Sets: Unordered collection of unique strings
• Hashes: Mapping between string fields and string values
• Sorted Sets: Ordered by a ‘score’

Commands

Clients
• ActionScript, C, C#, C++, Clojure, Common Lisp, D, Dart, emacs lisp, Erlang, Fancy, Go, Haskell, haXe, Io, Java, Lua, Node.js, Objective-C, Perl, PHP, Pure Data, Python, Ruby, Scala, Scheme, Smalltalk, Tcl
• Plus: dozens of tools built on top of Redis

Try it! try.redis.io

redis-rb • A Ruby client for Redis: https://github.com/redis/redis-rb

Redis in real life
• Analytics tracking for multi-domain e-Commerce site
• Tracks catalogue views for individual domain and month, and all domains for month
• Same for searches, product views, referers, sale value
• For tracking, uses just ZINCRBY on Sorted Sets

Keys are important
• product:views:domain_id:period
• catalogue:views:domain_id:period
• catalogue:searches:domain_id:period
• track:referers:domain_id:period
• Each contains a Sorted Set of Key-Value pairs
• Key is something like a product_id, or a name
• Value is the number of events
• product:views:mysite.com:2013-01 => { “teddy bear” => 4, “book” => 3, “poster” => 1 }
• product:views:mysite.com:2013-02 => { “book” => 7, “poster” => 1, “cutlery set” => 4 }
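The tracking scheme can be sketched without a Redis server by modelling each Sorted Set as a dict of member to score. The `zincrby` function below only mimics the semantics of the real ZINCRBY command (increment a member's score, creating it at zero if absent); `track_product_view` and the `all` key segment for the all-domains total are assumptions, since the slides do not show how that key is named.

```python
from collections import defaultdict

# Stand-in for Redis: key -> sorted set, modelled as {member: score}.
sorted_sets = defaultdict(lambda: defaultdict(float))

def zincrby(key, increment, member):
    """Mimic ZINCRBY: add `increment` to the member's score, creating it if absent."""
    sorted_sets[key][member] += increment
    return sorted_sets[key][member]

def track_product_view(domain, period, product):
    """Record one view for the domain and period, plus an all-domains total."""
    zincrby(f"product:views:{domain}:{period}", 1, product)
    zincrby(f"product:views:all:{period}", 1, product)  # "all" is an assumed key part

for product in ["teddy bear", "book", "book", "poster"]:
    track_product_view("mysite.com", "2013-01", product)

january = dict(sorted_sets["product:views:mysite.com:2013-01"])
```

Because the key encodes domain and period, recording an event is a single increment and no read is needed first.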

Reporting by duration
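The content of this slide is not in the transcript, so the following is only a guess at how a report over several periods could be built from the monthly keys. It sums each member's score across the requested months, which is what the Redis ZUNIONSTORE command does by default; here the Sorted Sets are plain dicts holding the example data from the previous slide, and `report` is a hypothetical helper.

```python
from collections import Counter

# Monthly sorted sets from the previous slide, modelled as {member: score}.
monthly = {
    "product:views:mysite.com:2013-01": {"teddy bear": 4, "book": 3, "poster": 1},
    "product:views:mysite.com:2013-02": {"book": 7, "poster": 1, "cutlery set": 4},
}

def report(domain, periods):
    """Sum scores per member across periods, as ZUNIONSTORE does by default."""
    totals = Counter()
    for period in periods:
        totals.update(monthly.get(f"product:views:{domain}:{period}", {}))
    return totals

totals = report("mysite.com", ["2013-01", "2013-02"])
top_product, top_views = totals.most_common(1)[0]
```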

HBase hbase.apache.org Columnar


Characteristics
• Column families for fine-grained performance tuning
• Designed for GB and TB of data
• Clustered
• Versioning, Compression and Filtering
• No Sorting or Indexing except for Keys
• No Datatypes

Protocols
• JRuby-based Shell
• Java API
• REST
• Thrift: provides access via Ruby, PHP, Python etc.

Try it! brew install hbase

Wiki Example
Keys (title)             | Family “text” | Family “revision”
Row (page) “First page”  | “”: “...”     | “author”: “...”, “comment”: “...”
Row (page) “Second page” | “”: “...”     | “author”: “...”, “comment”: “...”
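To show how the wiki layout and the versioning feature fit together, here is a small in-memory model. It is not the HBase API: `put` and `get` are hypothetical helpers, cells are addressed as `family:qualifier` (so the empty qualifier in the “text” family becomes `text:`), the values are invented, and every write keeps its timestamp so older versions stay readable.

```python
from collections import defaultdict

# row key -> "family:qualifier" -> list of (timestamp, value), newest first
table = defaultdict(lambda: defaultdict(list))

def put(row, column, value, timestamp):
    """Add a new version of a cell rather than overwriting the old one."""
    cells = table[row][column]
    cells.append((timestamp, value))
    cells.sort(reverse=True)

def get(row, column, versions=1):
    """Return up to `versions` values for a cell, newest first."""
    return [value for _, value in table[row][column][:versions]]

put("First page", "text:", "Hello", timestamp=1)
put("First page", "revision:author", "chris", timestamp=1)
put("First page", "revision:comment", "initial import", timestamp=1)
put("First page", "text:", "Hello, world", timestamp=2)  # an edit to the page

latest_text = get("First page", "text:")
history = get("First page", "text:", versions=2)
```

Values are stored as plain strings here, in keeping with the "No Datatypes" point: HBase itself only sees bytes.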

massive_record
• A Ruby client for HBase: https://github.com/CompanyBook/massive_record
• More like a data mapper than just a client, so you get relations, validations, callbacks, finders

MongoDB www.mongodb.org Document


Characteristics
• Indexes and Sorting
• No Joins
• Deep queries
• Journaled
• Supports MapReduce
• Easy replication
• Easy sharding

Clients
• C, C++, C#, Erlang, Java, JavaScript, Node.js, Perl, PHP, Python, Ruby, Scala
• ActionScript, Clojure, ColdFusion, D, Dart, Delphi, Entity, Factor, Fantom, F#, Go, Groovy, Lisp, MatLab, Objective-C, Opa, PowerShell, Prolog, R, Racket, Smalltalk
• REST

Try it! try.mongodb.org

mongoid • An Object-Document-Mapper framework for MongoDB in Ruby: https://github.com/mongoid/mongoid

MongoDB in real life
• Custom e-Commerce system
• Users, Sales and Line Items all stored in MongoDB
• Line Items are embedded in a Sale
• Why?

MongoDB in real life
• Each Line Item is a customisable product
• A product may have more than one customisable component
• Components can have the same customisation as other components in the Line Item
• Customisation varies according to the type of component
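A Sale with embedded Line Items might look roughly like the document below. Every field name and value is hypothetical, since the slides do not show the real schema; the point is only that components of different types carry differently shaped customisation data, which is awkward to model as fixed relational columns.

```python
# A hypothetical Sale document with its Line Items embedded.
sale = {
    "_id": "sale-1001",
    "user_id": "user-42",
    "line_items": [
        {
            "product": "photo book",
            "price": 20.0,
            "components": [
                {"type": "cover", "customisation": {"text": "Our holiday"}},
                {"type": "page", "customisation": {"image": "beach.jpg"}},
                {"type": "page", "customisation": {"image": "beach.jpg"}},
            ],
        },
        {
            "product": "poster",
            "price": 10.0,
            "components": [{"type": "print", "customisation": {"size": "A2"}}],
        },
    ],
}

# One read returns the sale with everything needed to fulfil it; no joins.
total = sum(item["price"] for item in sale["line_items"])
page_images = [
    component["customisation"]["image"]
    for item in sale["line_items"]
    for component in item["components"]
    if component["type"] == "page"
]
```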

Seven Databases in Seven Weeks
“We find MongoDB to be a much more natural answer to many common problem scopes for application-driven datasets than relational databases.”

Neo4j www.neo4j.org Graph


Characteristics
• Built on Nodes and Relationships
• Both can have arbitrary Properties
• Can cope with tens of billions of Nodes
• Indexes as a separate service, uses Lucene
• Distributed
• Replication

Clients
• Spring, Java, Ruby, PHP, .NET, Python, Node.js, Clojure, Django, Perl, Scala, Grails, Haskell, Go
• REST

[Figure: family-tree graph. Person nodes (Harry, Oliver, Lily, Emily, Jack, Ruby) connected by FATHER and MOTHER relationships.]
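The family-tree figure does not survive in this transcript, so the relationships in the sketch below are assumed; only the names and the FATHER and MOTHER relationship types come from the slide. It shows the basic graph operation of finding data by walking relationships from node to node, in plain Python rather than through any Neo4j API.

```python
# Hypothetical family graph: each relationship is (start node, type, end node).
relationships = [
    ("harry", "FATHER", "oliver"),  # read as: harry's father is oliver (assumed)
    ("harry", "MOTHER", "lily"),
    ("oliver", "FATHER", "jack"),
    ("oliver", "MOTHER", "emily"),
]

def follow(node, rel_type):
    """Return the nodes reached from `node` via relationships of one type."""
    return [end for start, rel, end in relationships
            if start == node and rel == rel_type]

def paternal_line(node):
    """Walk FATHER relationships until there are none left."""
    line = []
    fathers = follow(node, "FATHER")
    while fathers:
        node = fathers[0]
        line.append(node)
        fathers = follow(node, "FATHER")
    return line

line = paternal_line("harry")
```

A graph database stores the relationships so that each hop is a direct lookup, instead of the list scan used in this sketch.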

Try it! brew install neo4j

PostgreSQL www.postgresql.org SQL

Better than SQL?
• More indexing options (e.g. B-Tree, Hash, GiST, GIN)
• More data types (e.g. arrays, hashes, money)
• Extensions (e.g. add your own data types)
• Views
• Rules
• Full text search

Polyglot
• MySQL
• Redis
• MongoDB

Alternative to Polyglot
• ArangoDB
• RethinkDB

www.arangodb.org

www.rethinkdb.com


More Decks by Chris Aves
