Slide 1

Slide 1 text

NoSQL Now we know what it’s not... what is it?

Slide 2

Slide 2 text

What are we running from? • Relational databases are the de facto standard for storing data in a web application. • A lot of times, that data isn’t really relational at all. • RDBMSs have lots of rules that can impact performance.

Slide 3

Slide 3 text

Rules? What Rules? • Classic relational databases follow the ACID rules: • Atomicity • Consistency • Isolation • Durability

Slide 4

Slide 4 text

Atomicity • If any part of the update fails, it all fails. • Databases have to be able to lock tables and rows for operations, which can block or delay other incoming requests.
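
A minimal sketch of the all-or-nothing idea, assuming a toy in-memory store. TinyStore and its methods are made up for illustration and are not a real database API. The Mutex stands in for the locking mentioned above: while one transaction runs, every other caller waits.

```ruby
# Hypothetical in-memory "table"; not a real database API.
class TinyStore
  def initialize(rows)
    @rows = rows
    @lock = Mutex.new
  end

  # Runs the block against a deep copy and commits only if the block
  # finishes without raising. The lock makes other callers wait.
  def transaction
    @lock.synchronize do
      working = Marshal.load(Marshal.dump(@rows))
      yield working
      @rows = working
    end
  end

  def read(key)
    @lock.synchronize { @rows[key] }
  end
end

store = TinyStore.new("alice" => 100, "bob" => 50)

begin
  store.transaction do |rows|
    rows["alice"] -= 30
    # The second half of the transfer fails, so the debit must not stick.
    raise ArgumentError, "unknown account" unless rows.key?("carol")
    rows["carol"] += 30
  end
rescue ArgumentError
  # Nothing was committed.
end

puts store.read("alice") # prints 100
```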

Slide 5

Slide 5 text

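Consistency • A transaction must take the database from one valid state to another, with every constraint still satisfied. • My looser interpretation: after a transaction, all copies of the data must be consistent with each other. • Replication across lots of shards is expensive, especially if there’s locking involved.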

Slide 6

Slide 6 text

Isolation • Data involved in a transaction must be inaccessible to other operations. • Remember the thing about locked rows and tables? • It’s a bummer.

Slide 7

Slide 7 text

Durability • Once a user is notified that a transaction has completed, the data must stay accessible (even after a crash) and all integrity constraints must have been met.

Slide 8

Slide 8 text

I come not to bury MySQL... • Relational databases are great for a lot of uses. • If your data is actually relational, you need transactions and joins, and you have a limited number of data types, then an RDBMS will work for you.

Slide 9

Slide 9 text

But... • RDBMSs have been treated like hammers and used for things they’re not good at and weren’t designed for. • Like the web...

Slide 10

Slide 10 text

Thus were born... • Key-Value Stores • Wide-Column Stores • Document Stores/Databases • Graph Databases

Slide 11

Slide 11 text

All thrown together & clumsily dubbed...

Slide 12

Slide 12 text

NoSQL

Slide 13

Slide 13 text

Which, despite its negative sound, supposedly means: “Not Only SQL”

Slide 14

Slide 14 text

Yeah, I don’t believe it either...

Slide 15

Slide 15 text

Key-Value Just what it sounds like. You set a Key to a Value and can then retrieve it.

Slide 16

Slide 16 text

Key-Value Benefits • Simple • High performance (usually) because there are no transactions or relations so it’s a simple bucket and lookup. • Extremely flexible • Commonly used as caches in front of slower resources (like MySQL - bazinga!)
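
The "cache in front of a slower resource" pattern can be sketched roughly like this. A plain Hash plays the role of memcached or Redis, and SlowSource is a made-up stand-in for something like a SQL query; neither class is a real client library.

```ruby
# Stand-in for a slow backing store such as a SQL query (hypothetical).
class SlowSource
  attr_reader :calls

  def initialize(data)
    @data = data
    @calls = 0
  end

  def fetch(key)
    @calls += 1
    @data[key]
  end
end

# Read-through cache: answer from the bucket if we can, otherwise ask
# the slow source once and remember the result.
class ReadThroughCache
  def initialize(source)
    @source = source
    @bucket = {}
  end

  def get(key)
    return @bucket[key] if @bucket.key?(key)
    @bucket[key] = @source.fetch(key)
  end
end

source = SlowSource.new("user:1" => "Kevin")
cache = ReadThroughCache.new(source)

first = cache.get("user:1")
second = cache.get("user:1")

puts source.calls # prints 1: the second read never touched the slow source
```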

Slide 17

Slide 17 text

Popular Players • memcached - in memory only; clients hash each key to pick a server, which lets you scale easily to hundreds of nodes. • Redis - persistent, slightly more complex than memcached (it supports lists, sets and hashes as values) but still highly performant. • Riak - The Rails Machine guys love it. Jesse?
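
A rough sketch of how a client might spread keys across cache nodes by hashing the key. The host names are invented, and the CRC32-modulo scheme is only the simplest possible choice, not necessarily what any particular memcached client does.

```ruby
require "zlib"

# Hypothetical cache hosts, for illustration only.
NODES = ["cache-a:11211", "cache-b:11211", "cache-c:11211"].freeze

# Hash the key and use the result to pick a node. The servers never
# talk to each other; the client alone decides where a key lives.
def node_for(key, nodes = NODES)
  nodes[Zlib.crc32(key) % nodes.size]
end

puts node_for("user:1")
puts node_for("user:2")
```

One caveat with plain modulo: adding or removing a node remaps most keys, which is why many clients offer consistent hashing instead.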

Slide 18

Slide 18 text

My Uses • memcached: Read-through cache for Rails with cache-money. • redis: persistent cache for results from our algorithm, partitioned by version and instance.

Slide 19

Slide 19 text

Wide Column • Family of databases modeled on either Google’s BigTable or Amazon’s Dynamo. • Pick two out of three from the CAP theorem in order to get horizontal scalability. • Data stored by column instead of by row.
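
To make "by column instead of by row" a little more concrete, here is a toy sketch with invented sample data. It only illustrates the layout idea; real wide-column stores group columns into families keyed by row and are a good deal more involved.

```ruby
# Row-oriented: each record keeps all of its fields together.
rows = [
  { id: 1, artist: "Low",  plays: 10 },
  { id: 2, artist: "Wire", plays: 25 },
  { id: 3, artist: "Can",  plays: 7 }
]

# Column-oriented: one array per column, so a scan over a single
# column never has to touch the others.
columns = rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |row, cols|
  row.each { |name, value| cols[name] << value }
end

total_plays = columns[:plays].sum
puts total_plays # prints 42
```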

Slide 20

Slide 20 text

CAP? • Consistency: All clients always have the same view of the data. • Availability: Each client can always read and write. • Partition Tolerance: The system works well despite physical network partitions.

Slide 21

Slide 21 text

Use cases • Making sense out of large amounts of data where you know your query scenario ahead of time. • Large = 100s of millions of records. • Data-mining log files and other sources of similar data.

Slide 22

Slide 22 text

Big Players • HBase • Cassandra • Hypertable • Amazon’s SimpleDB • Google’s BigTable (the granddaddy of all of them)

Slide 23

Slide 23 text

Graph Databases • Store nodes, edges and properties • Think of them as Things, Connections and Properties • Good for storing properties and relationships. • Honestly, I don’t fully understand them... anyone?
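
For what it’s worth, the nodes/edges/properties model can be sketched in a few lines. TinyGraph is a made-up in-memory class with invented sample data, not the API of Neo4j or any other graph database.

```ruby
# Hypothetical in-memory graph: Things (nodes), Connections (edges),
# and Properties on both.
class TinyGraph
  Node = Struct.new(:id, :properties)
  Edge = Struct.new(:from, :to, :label, :properties)

  def initialize
    @nodes = {}
    @edges = []
  end

  def add_node(id, properties = {})
    @nodes[id] = Node.new(id, properties)
  end

  def add_edge(from, to, label, properties = {})
    @edges << Edge.new(from, to, label, properties)
  end

  # Follow outgoing edges with the given label from one node.
  def neighbors(id, label)
    @edges.select { |e| e.from == id && e.label == label }
          .map { |e| @nodes[e.to] }
  end
end

graph = TinyGraph.new
graph.add_node(:kevin, name: "Kevin")
graph.add_node(:jesse, name: "Jesse")
graph.add_node(:mongo, name: "MongoDB")
graph.add_edge(:kevin, :jesse, :knows, via: "meetup")
graph.add_edge(:kevin, :mongo, :uses)

names = graph.neighbors(:kevin, :knows).map { |n| n.properties[:name] }
puts names.inspect # prints ["Jesse"]
```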

Slide 24

Slide 24 text

The Players • Neo4j • FlockDB • HyperGraphDB

Slide 25

Slide 25 text

Document Stores • Short on relationships, tall on rich data types. • Big on eventual consistency and flexible schemas. • Hybrid of traditional RDBMS and Key-Value stores.
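
Eventual consistency, very roughly: a write lands on one node right away and reaches the others a bit later. The Primary and Replica classes below are invented for illustration and leave out everything that makes real replication hard.

```ruby
class Replica
  def initialize
    @data = {}
  end

  def apply(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end
end

class Primary
  def initialize(replica)
    @data = {}
    @replica = replica
    @pending = []
  end

  # The write is acknowledged immediately; replication happens later.
  def write(key, value)
    @data[key] = value
    @pending << [key, value]
  end

  # Ship queued writes to the replica.
  def replicate!
    @pending.each { |key, value| @replica.apply(key, value) }
    @pending.clear
  end
end

replica = Replica.new
primary = Primary.new(replica)

primary.write("title", "NoSQL")
stale = replica.read("title")  # the replica has not caught up yet
primary.replicate!
fresh = replica.read("title")

puts stale.inspect # prints nil
puts fresh.inspect # prints "NoSQL"
```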

Slide 26

Slide 26 text

Use Cases • Content Management Systems • Applications with rapid partial updates • Anything you would normally use an RDBMS for but don’t need joins or transactions for.

Slide 27

Slide 27 text

The Players • CouchDB • MongoDB • Terrastore

Slide 28

Slide 28 text

MongoDB • Support for rich data types: arrays, hashes, embedded documents, etc • Support for adding and removing things from arrays and embedded documents (addToSet, for example). • Map/Reduce support and strong indexes • Regular expression support in queries
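
To show what addToSet-style updates mean, here is a plain-Ruby sketch that mimics the semantics on a Hash. The add_to_set and pull helpers below are local illustrations only; in MongoDB the equivalent operators run on the server against the stored document.

```ruby
# A document with an array field and an embedded document.
post = {
  "title"  => "NoSQL",
  "tags"   => ["mongodb"],
  "author" => { "name" => "Kevin" }
}

# Append the value only if the array does not already contain it.
def add_to_set(doc, field, value)
  doc[field] ||= []
  doc[field] << value unless doc[field].include?(value)
  doc
end

# Remove every occurrence of the value from the array.
def pull(doc, field, value)
  (doc[field] || []).delete(value)
  doc
end

add_to_set(post, "tags", "nosql")
add_to_set(post, "tags", "nosql") # no duplicate is added
pull(post, "tags", "mongodb")

puts post["tags"].inspect # prints ["nosql"]
```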

Slide 29

Slide 29 text

Design Considerations • Embedded Documents - Use only if the embedded document will always be selected with the parent. • Indexes - MongoDB punishes you much earlier for missing indexes than MySQL. • Document size - Currently, documents are limited to 4MB, which should be large enough, but if it’s not...

Slide 30

Slide 30 text

Real-World MongoDB • We use MongoDB heavily at MIS. • Statistics application and reporting • Top-secret new application • Web crawler and indexer • CMS

Slide 31

Slide 31 text

Real-World Example Let’s do tags. Everything is taggable now, right?

Slide 32

Slide 32 text

The MySQL Way

Slide 33

Slide 33 text

Schema

Slide 34

Slide 34 text

And to get a “thing’s” tags?

SELECT `tags`.*
FROM `tags`
INNER JOIN `taggings` ON `tags`.id = `taggings`.tag_id
WHERE ((`taggings`.taggable_id = 237)
  AND (`taggings`.taggable_type = 'Song'))

Slide 35

Slide 35 text

Yuck! That’s a lot of pain for something so simple. And I didn’t even show you finding things with tag “x”. Or how to set and unset tags on a “thing”. Ouch.

Slide 36

Slide 36 text

The MongoDB Way Using MongoMapper and Rails 3

Slide 37

Slide 37 text

class Post
  include MongoMapper::Document

  key :title, String
  key :body, String
  key :tags, Array

  ensure_index :tags
end

Slide 38

Slide 38 text

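Let’s Make This Easy...

# Clean the tag, update the in-memory array (skipping duplicates), and
# push the change to MongoDB if the post has already been saved.
def add_tag(tag)
  tag = Post.clean_tag(tag)
  self.tags << tag unless self.tags.include?(tag)
  self.add_to_set(:tags => tag) unless self.new_record?
end

def remove_tag(tag)
  tag = Post.clean_tag(tag)
  self.tags.delete(tag)
  self.pull(:tags => tag) unless self.new_record?
end

# Lowercase, turn spaces into hyphens, drop everything else.
def self.clean_tag(str)
  str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
end

# Split a comma-separated string into cleaned tags.
def self.clean_tags(str)
  str.split(",").map { |t| self.clean_tag(t) }
end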
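
If you’re reading this without the demo, here is roughly what the tag cleaning does. The helpers are copied into a standalone module (TagCleaner is a made-up name) so the sketch runs without MongoMapper or a database.

```ruby
# Standalone copy of the tag helpers, for illustration only.
module TagCleaner
  def self.clean_tag(str)
    str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
  end

  def self.clean_tags(str)
    str.split(",").map { |t| clean_tag(t) }
  end
end

puts TagCleaner.clean_tag("  Ruby on Rails! ")
# prints ruby-on-rails

puts TagCleaner.clean_tags("NoSQL, Mongo DB, C#").inspect
# prints ["nosql", "mongo-db", "c"]
```

Note that punctuation is dropped outright, so "C#" and "C" end up as the same tag.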

Slide 39

Slide 39 text

Demo Time Sorry if you’re looking at this later, but it’s console time!

Slide 40

Slide 40 text

Why I Love MongoDB • Document model fits how I build web apps. • For most apps, I don’t need transactions. • Eventual consistency is actually OK. • Partial updates and arrays make things that are a pain in SQL-land absolutely painless. • It’s just smart enough without getting in the way.

Slide 41

Slide 41 text

What’s NoSQL, really? • The right tool for the job. • We’ve got lots of options for storing application data. • The key is picking the one that solves our real problem. • And if an RDBMS is the right tool, that’s OK too.

Slide 42

Slide 42 text

Questions?

Slide 43

Slide 43 text

Further Reading • Visual NoSQL: http://blog.nahurst.com/visual-guide-to-nosql-systems • MongoDB: http://mongodb.org • MongoMapper: http://mongomapper.com/

Slide 44

Slide 44 text

Thanks! • Kevin Lawver • @kplawver • [email protected] • http://kevinlawver.com