NoSQL - An Introduction

NoSQL Now we know what it’s not... what is it?

What are we running from?
• Relational databases are the de facto standard for storing data in a web application.
• A lot of the time, that data isn’t really relational at all.
• RDBMSs have lots of rules that can impact performance.

Rules? What Rules?
• Classic relational databases follow the ACID rules:
• Atomicity
• Consistency
• Isolation
• Durability

Atomicity
• If any part of the update fails, it all fails.
• Databases have to be able to lock tables and rows for operations, which can block or delay other incoming requests.
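
A minimal sketch of the all-or-nothing idea in plain Ruby, with no real database involved. The `Ledger` class and its `transfer` method are hypothetical names for illustration; a real RDBMS gets this behavior from locks and logging rather than from copying a hash.

```ruby
# Hypothetical in-memory ledger illustrating atomicity: an update either
# applies completely or leaves the data exactly as it was.
class Ledger
  attr_reader :balances

  def initialize(balances)
    @balances = balances
  end

  # Move `amount` between accounts. If any step fails, restore the snapshot.
  def transfer(from, to, amount)
    snapshot = @balances.dup
    @balances[from] -= amount
    raise ArgumentError, "insufficient funds" if @balances[from] < 0
    @balances[to] += amount
    true
  rescue ArgumentError
    @balances = snapshot # roll back the partial update
    false
  end
end

ledger = Ledger.new("alice" => 50, "bob" => 10)
ledger.transfer("alice", "bob", 80) # fails, nothing changes
ledger.transfer("alice", "bob", 20) # succeeds, both balances change
```

The first transfer subtracts from one account before it notices the problem, so the rollback is what keeps the half-finished update from being visible.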

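Consistency
• After a transaction, all copies of the data must be consistent with each other (my interpretation; strictly, ACID consistency means every transaction leaves the database in a valid state that satisfies its rules and constraints).
• Replication across lots of shards is expensive, especially if there’s locking involved.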

Isolation
• Data involved in a transaction must be inaccessible to other operations.
• Remember the thing about locked rows and tables?
• It’s a bummer.

Durability
• Once a user is notified that a transaction has completed, the data must stay accessible, even after a crash, and all integrity constraints must have been met.

I come not to bury MySQL...
• Relational databases are great for a lot of uses.
• If you have data that’s actually relational, you need transactions and joins, and you have a limited number of data types, then an RDBMS will work for you.

But...
• RDBMSs have been treated like hammers and used for things they’re not good at and weren’t designed for.
• Like the web...

Thus were born...
• Key-Value Stores
• Wide-Column Stores
• Document Stores/Databases
• Graph Databases

All thrown together & clumsily dubbed... NoSQL

Which, despite its negative sound, supposedly means: “Not Only SQL”

Yeah, I don’t believe it either...

Key-Value
Just what it sounds like. You set a Key to a Value and can then retrieve it.

Key-Value Benefits
• Simple
• High performance (usually) because there are no transactions or relations, so it’s a simple bucket and lookup.
• Extremely flexible
• Commonly used as caches in front of slower resources (like MySQL - bazinga!)
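
A rough sketch of the "cache in front of a slower resource" idea, in plain Ruby. A `Hash` stands in for the key-value store and the block stands in for the slow query; `ReadThroughCache` is a hypothetical name, not a real library class.

```ruby
# Hypothetical read-through cache: a Hash stands in for memcached or Redis,
# and the block passed to `fetch` stands in for a slow database query.
class ReadThroughCache
  def initialize
    @store = {}
  end

  # Return the cached value for `key`, computing and storing it on a miss.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  # Drop a key so the next fetch goes back to the slow resource.
  def delete(key)
    @store.delete(key)
  end
end

cache = ReadThroughCache.new
slow_lookups = 0
2.times do
  cache.fetch("user:42") { slow_lookups += 1; { "name" => "Ada" } }
end
# slow_lookups is 1 here: the second fetch was served from the cache
```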

Popular Players
• memcached - in memory only; an extremely efficient hashing algorithm lets you scale easily to hundreds of nodes.
• Redis - persistent, slightly more complex than memcached (has support for lists, sets and hashes) but still highly performant.
• Riak - The Rails Machine guys love it. Jesse?
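
To make the hashing point concrete, here is a simplified sketch of how a client might map a key to one of several cache nodes. The node names are made up, and the modulo scheme is an assumption chosen for brevity; many real clients use consistent hashing instead, so that adding a node moves fewer keys.

```ruby
require "zlib"

# Hypothetical node list; a real deployment would list its own host:port pairs.
NODES = ["cache-a:11211", "cache-b:11211", "cache-c:11211"].freeze

# Pick a node for a key by hashing it. Every client that shares the same
# node list computes the same answer, so the servers never need to coordinate.
def node_for(key, nodes = NODES)
  nodes[Zlib.crc32(key) % nodes.size]
end

node_for("user:42") # always the same node for the same key
```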

My Uses
• memcached: Read-through cache for Rails with cache-money.
• Redis: persistent cache for results from our algorithm, partitioned by version and instance.

Wide Column
• Family of databases modeled on either Google’s BigTable or Amazon’s Dynamo.
• Pick two out of three from the CAP theorem in order to get horizontal scalability.
• Data stored by column instead of by row.
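
A rough illustration of the row-versus-column point, using made-up records in plain Ruby. Real wide-column stores group columns into families and still key them by row, which this sketch does not model.

```ruby
# The same three records laid out two ways (hypothetical data).
rows = [
  { id: 1, artist: "Artist A", plays: 120 },
  { id: 2, artist: "Artist B", plays: 75 },
  { id: 3, artist: "Artist A", plays: 30 }
]

# Row-oriented: each record is kept together (the `rows` array above).
# Column-oriented: each attribute is kept together.
columns = rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |row, cols|
  row.each { |name, value| cols[name] << value }
end

# Summing one attribute only has to read that one column.
total_plays = columns[:plays].sum
```

This is why the column layout suits the "you know your query ahead of time" case: an aggregate over one attribute skips everything else in the record.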

CAP?
• Consistency: All clients always have the same view of the data.
• Availability: Each client can always read and write.
• Partition Tolerance: The system works well despite physical network partitions.

Use cases
• Making sense out of large amounts of data where you know your query scenario ahead of time.
• Large = 100s of millions of records.
• Data-mining log files and other sources of similar data.

Big Players
• HBase
• Cassandra
• Hypertable
• Amazon’s SimpleDB
• Google’s BigTable (the granddaddy of all of them)

Graph Databases
• Store nodes, edges and properties
• Think of them as Things, Connections and Properties
• Good for storing properties and relationships.
• Honestly, I don’t fully understand them... anyone?
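
For anyone else who finds graph databases fuzzy, a toy version of the Things, Connections and Properties model might help. This is plain Ruby with a hypothetical `TinyGraph` class and made-up data, not the API of any of the graph databases named in this talk.

```ruby
# Hypothetical in-memory graph: nodes and edges both carry properties.
class TinyGraph
  def initialize
    @nodes = {}
    @edges = []
  end

  # A "Thing" with its properties.
  def add_node(id, properties = {})
    @nodes[id] = properties
  end

  # A typed "Connection" between two things, with its own properties.
  def add_edge(from, to, type, properties = {})
    @edges << { from: from, to: to, type: type, properties: properties }
  end

  # Ids of nodes reachable from `id` over one edge of the given type.
  def neighbors(id, type)
    @edges.select { |e| e[:from] == id && e[:type] == type }.map { |e| e[:to] }
  end
end

graph = TinyGraph.new
graph.add_node(:alice, name: "Alice")
graph.add_node(:bob, name: "Bob")
graph.add_node(:mongodb, name: "MongoDB")
graph.add_edge(:alice, :bob, :knows, context: "work")
graph.add_edge(:alice, :mongodb, :likes)
graph.neighbors(:alice, :knows) # => [:bob]
```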

The Players
• Neo4j
• FlockDB
• HyperGraphDB

Document Stores
• Short on relationships, tall on rich data types.
• Big on eventual consistency and flexible schemas.
• Hybrid of traditional RDBMS and Key-Value stores.

Use Cases
• Content Management Systems
• Applications with rapid partial updates
• Anything you don’t need joins or transactions for that you would normally use an RDBMS for.

The Players
• CouchDB
• MongoDB
• Terrastore

MongoDB
• Support for rich data types: arrays, hashes, embedded documents, etc.
• Support for adding and removing things from arrays and embedded documents ($addToSet, for example).
• Map/Reduce support and strong indexes
• Regular expression support in queries
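
The array updates are easier to picture with a plain-Ruby stand-in. The two functions below only imitate what MongoDB's `$addToSet` and `$pull` update operators do to an array field; they work on an ordinary `Hash`, talk to no database, and are hypothetical helpers written for this sketch.

```ruby
# Imitates $addToSet: append the value only if the array lacks it.
def add_to_set(doc, field, value)
  doc[field] ||= []
  doc[field] << value unless doc[field].include?(value)
  doc
end

# Imitates $pull: remove every occurrence of the value from the array.
def pull(doc, field, value)
  doc[field].delete(value) if doc[field]
  doc
end

post = { "title" => "NoSQL", "tags" => ["mongodb"] }
add_to_set(post, "tags", "redis")
add_to_set(post, "tags", "redis") # no duplicate is added
pull(post, "tags", "mongodb")
# post["tags"] is now ["redis"]
```

In MongoDB itself the point is that these run on the server as a single partial update, so the client never has to read the whole document, change it and write it back.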

Design Considerations
• Embedded Documents - Use only if the embedded document will always be selected with the parent.
• Indexes - MongoDB punishes you much earlier for missing indexes than MySQL.
• Document size - Currently (as of this talk; later releases raised the limit to 16MB), documents are limited to 4MB, which should be large enough, but if it’s not...

Real-World MongoDB
• We use MongoDB heavily at MIS.
• Statistics application and reporting
• Top-secret new application
• Web crawler and indexer
• CMS

Real-World Example
Let’s do tags. Everything is taggable now, right?

The MySQL Way

Schema

And to get a “thing’s” tags?
SELECT `tags`.*
FROM `tags`
INNER JOIN `taggings` ON `tags`.id = `taggings`.tag_id
WHERE ((`taggings`.taggable_id = 237)
  AND (`taggings`.taggable_type = 'Song'))

Yuck! That’s a lot of pain for something so simple. And I didn’t even show you finding things with tag “x”, or how to set and unset tags on a “thing”. Ouch.

The MongoDB Way
Using MongoMapper and Rails 3

class Post
  include MongoMapper::Document

  key :title, String
  key :body, String
  key :tags, Array

  ensure_index :tags
end

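Let’s Make This Easy...

# Add a tag in memory and, for saved records, atomically in MongoDB.
def add_tag(tag)
  tag = Post.clean_tag(tag)
  # Skip duplicates locally so the in-memory array matches add_to_set.
  self.tags << tag unless self.tags.include?(tag)
  self.add_to_set(:tags => tag) unless self.new_record?
end

# Remove a tag in memory and, for saved records, atomically in MongoDB.
def remove_tag(tag)
  tag = Post.clean_tag(tag)
  self.tags.delete(tag)
  self.pull(:tags => tag) unless self.new_record?
end

# Normalize one tag: trim, lowercase, hyphenate spaces, drop everything else.
def self.clean_tag(str)
  str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
end

# Split a comma-separated string into cleaned tags.
def self.clean_tags(str)
  str.split(",").map { |t| self.clean_tag(t) }
end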
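
The two cleaning helpers don't touch the database, so they can be tried on their own. This sketch lifts them into a standalone module so it runs without MongoMapper; the module name `TagCleaner` is hypothetical and the sample strings are made up.

```ruby
# The cleaning rules from the slide, moved into a module so they can run
# without MongoMapper or a database connection.
module TagCleaner
  # Trim, lowercase, turn spaces into hyphens, drop any other characters.
  def self.clean_tag(str)
    str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
  end

  # Split a comma-separated string and clean each piece.
  def self.clean_tags(str)
    str.split(",").map { |t| clean_tag(t) }
  end
end

TagCleaner.clean_tag("  Hip Hop! ")         # => "hip-hop"
TagCleaner.clean_tags("Rock, Hip Hop, R&B") # => ["rock", "hip-hop", "rb"]
```

Note that punctuation is dropped rather than replaced, so "R&B" becomes "rb".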

Demo Time
Sorry if you’re looking at this later, but it’s console time!

Why I Love MongoDB
• Document model fits how I build web apps.
• For most apps, I don’t need transactions.
• Eventual consistency is actually OK.
• Partial updates and arrays make things that are a pain in SQL-land absolutely painless.
• It’s just smart enough without getting in the way.

What’s NoSQL, really?
• The right tool for the job.
• We’ve got lots of options for storing application data.
• The key is picking the one that solves our real problem.
• And if an RDBMS is the right tool, that’s OK too.

Questions?

Further Reading
• Visual NoSQL: http://blog.nahurst.com/visual-guide-to-nosql-systems
• MongoDB: http://mongodb.org
• MongoMapper: http://mongomapper.com/

Thanks!
• Kevin Lawver
• @kplawver
• [email protected]
• http://kevinlawver.com
