MongoDB Introduction - Mongo Hamburg 2011

An Introduction Monday, July 4, 2011

Let’s Face It ... SQL Sucks. For some problems at least.

I know! Let’s use an ORM! Congratulations: Now we’ve got 2 problems!

Stuffing an object graph into a relational model is like fitting a square peg into a round hole.

Sure, we can use an ORM. But who are we really fooling? ... and who/what are we going to wake up next to in the morning?

This is the SQL Model
mysql> select * from book;
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | The Demon-Haunted World: Science as a Candle in the Dark |
|  2 | Cosmos                                                   |
|  3 | Programming in Scala                                     |
+----+----------------------------------------------------------+
3 rows in set (0.00 sec)
mysql> select * from bookauthor;
+---------+-----------+
| book_id | author_id |
+---------+-----------+
|       1 |         1 |
|       2 |         1 |
|       3 |         2 |
|       3 |         3 |
|       3 |         4 |
+---------+-----------+
5 rows in set (0.00 sec)
mysql> select * from author;
+----+-----------+------------+-------------+-------------+---------------+
| id | last_name | first_name | middle_name | nationality | year_of_birth |
+----+-----------+------------+-------------+-------------+---------------+
|  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
|  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
|  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
|  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
+----+-----------+------------+-------------+-------------+---------------+
4 rows in set (0.00 sec)

Joins are great and all ...
• Potentially organizationally messy
• Structure of a single object is NOT immediately clear to someone glancing at the shell data
• We have to flatten our object out into three tables
• 7 separate inserts just to add “Programming in Scala”
• Once we turn the relational data back into objects ...
• We still need to convert it to data for our frontend
• I don’t know about you, but I have better things to do with my time.
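The "7 separate inserts" count can be reproduced with a small sketch. The helper `flattenBook` and the `{ table, row }` shape are hypothetical names made up for illustration; the table and column names follow the SQL model shown earlier, and the starting ids are assumed to continue from the rows already in those tables.

```javascript
// Hypothetical helper: flatten one book object into rows for the three
// tables (book, author, bookauthor) from the SQL model.
function flattenBook(book, nextIds) {
  const inserts = [];
  const bookId = nextIds.book++;
  inserts.push({ table: "book", row: { id: bookId, title: book.title } });
  for (const author of book.authors) {
    const authorId = nextIds.author++;
    // One row for the author itself ...
    inserts.push({ table: "author", row: { id: authorId, ...author } });
    // ... and one row in the join table linking it to the book.
    inserts.push({ table: "bookauthor", row: { book_id: bookId, author_id: authorId } });
  }
  return inserts;
}

const programmingInScala = {
  title: "Programming in Scala",
  authors: [
    { first_name: "Martin", last_name: "Odersky", nationality: "DE", year_of_birth: 1958 },
    { first_name: "Lex", last_name: "Spoon" },
    { first_name: "Bill", last_name: "Venners" },
  ],
};

// Assumed starting ids: two books and one author already exist.
const inserts = flattenBook(programmingInScala, { book: 3, author: 2 });
console.log(inserts.length); // 1 book + 3 authors + 3 link rows = 7
```

In a document store the same object would be written as a single document, which is the contrast the following slides draw.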

The Same Data in MongoDB
> db.books.find().forEach(printjson)
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda3"),
    "title" : "The Demon-Haunted World: Science as a Candle in the Dark",
    "author" : [
        {
            "first_name" : "Carl",
            "last_name" : "Sagan",
            "middle_name" : "Edward",
            "year_of_birth" : 1934
        }
    ]
}
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda4"),
    "title" : "Cosmos",
    "author" : [
        {
            "first_name" : "Carl",
            "last_name" : "Sagan",
            "middle_name" : "Edward",
            "year_of_birth" : 1934
        }
    ]
}

The Same Data in MongoDB (Part 2)
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "title" : "Programming in Scala",
    "author" : [
        {
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ]
}

Access to the embedded objects is integral
> db.books.find({"author.first_name": "Martin", "author.last_name": "Odersky"})
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "title" : "Programming in Scala",
    "author" : [
        {
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ]
}
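One subtlety of the dotted-path query above is worth a sketch. The plain-JavaScript emulation below is a simplification, not the server's implementation, and the function names `matchesAll` and `elemMatch` are made up for illustration. It models two behaviours: each dotted condition is checked independently against the array, so different elements may satisfy different conditions, whereas a `$elemMatch`-style check requires one element to satisfy all of them.

```javascript
// Simplified emulation of matching "arrayField.subField" against an
// array of embedded documents (handles exactly one level of nesting).
function matchesDotted(doc, path, value) {
  const [arrayField, subField] = path.split(".");
  return (doc[arrayField] || []).some((el) => el[subField] === value);
}

// Every condition must hold, but each one may be satisfied by a
// different array element.
function matchesAll(doc, query) {
  return Object.entries(query).every(([path, value]) => matchesDotted(doc, path, value));
}

// $elemMatch-style check: a single element must satisfy all conditions.
function elemMatch(doc, arrayField, conditions) {
  return (doc[arrayField] || []).some((el) =>
    Object.entries(conditions).every(([key, value]) => el[key] === value));
}

const scalaBook = {
  title: "Programming in Scala",
  author: [
    { first_name: "Martin", last_name: "Odersky" },
    { first_name: "Lex", last_name: "Spoon" },
    { first_name: "Bill", last_name: "Venners" },
  ],
};

// Matches: one element is Martin Odersky.
console.log(matchesAll(scalaBook, { "author.first_name": "Martin", "author.last_name": "Odersky" }));
// Also matches: "Martin" and "Venners" come from different elements.
console.log(matchesAll(scalaBook, { "author.first_name": "Martin", "author.last_name": "Venners" }));
// Does not match: no single author is named Martin Venners.
console.log(elemMatch(scalaBook, "author", { first_name: "Martin", last_name: "Venners" }));
```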

As is manipulation of the embedded data
> db.books.update({"author.first_name": "Bill", "author.last_name": "Venners"},
... {$set: {"author.$.company": "Artima, Inc."}})
> db.books.update({"author.first_name": "Martin", "author.last_name": "Odersky"},
... {$set: {"author.$.company": "Typesafe, Inc."}})
> db.books.findOne({"title": /Scala$/})
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "author" : [
        {
            "company" : "Typesafe, Inc.",
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "company" : "Artima, Inc.",
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ],
    "title" : "Programming in Scala"
}

NoSQL Really Means... non-relational, next-generation operational datastores and databases ... focus on the “non-relational” bit.

no joins + no complex transactions
• Horizontally Scalable Architectures
• New Data Models

[Chart: scalability & performance vs. depth of functionality; plotted points: memcached, key/value, RDBMS]

Scale out
[Diagram: write and read paths; three mongos / config server instances; shard1 (rep_a1, rep_b1, rep_c1), shard2 (rep_a2, rep_b2, rep_c2), shard3 (rep_a3, rep_b3, rep_c3)]

The Whys of Non-Relational Databases
Why did non-relational databases arise?
Problems with relational databases in the web world

Problem - Schema Evolution
• Applications are evolving all the time
• Applications need new fields
• Applications need new indexes
• Data is growing – sometimes very fast
• Users need to be able to alter their schemas without making their data unavailable
• The web world expects 24x7 service
• RDBMSs can have a hard time doing this

Problem – Write Rates
• Replication is a solution for high read loads
• Sooner or later, writing becomes a bottleneck
• Sharding – partitioning a logical database across multiple database instances
• Joins and aggregation become a problem
• Distributed transactions are too slow for the web
• Manual management of shards
• Choosing shard partitions
• Rebalancing shards

Vocabulary of the Non-Relational World
An introduction to terminology you’re going to be seeing a lot

Data Models
• A non-relational database’s data model determines the kinds of items it can contain and how they can be retrieved
• What can the system store, and what does it know about what it contains?
• The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition
• What kind of queries can you do?
• SQL is a manifestation of the kinds of queries that fall out of relational algebra

Non-Relational Data Models
• Key-value stores
• Document stores
• Column-oriented databases
• Graph databases

Key-Value Stores
• A mapping from a key to a value
• The store doesn’t know anything about the key or value
• The store doesn’t know anything about the insides of the value
• Operations
  • Set, get, or delete a key-value pair
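As a minimal sketch of the three operations above, assuming an in-memory `Map` stands in for the store (the class name `KeyValueStore` and the key `"user:42"` are hypothetical):

```javascript
// A key-value store only maps opaque keys to opaque values.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if something was removed
  }
}

const store = new KeyValueStore();
// The value is serialized by the client; the store sees only a string.
store.set("user:42", JSON.stringify({ first_name: "Fred", last_name: "Flintstone" }));

// Any interpretation of the value happens on the client side.
const user = JSON.parse(store.get("user:42"));
console.log(user.last_name);

store.delete("user:42");
console.log(store.get("user:42"));
```

Because the store never looks inside the value, a question such as "find all users named Flintstone" cannot be answered by the store itself. That is the gap document stores fill.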

Document Stores
• The store is a container for documents
• Documents are made up of named fields
• Fields may or may not have type definitions
  • e.g. XSDs for XML stores, vs. schema-less JSON stores
• Can create “secondary indexes”
  • These provide the ability to query on any document field(s)
• Operations:
  • Insert and delete documents
  • Update fields within documents

Column-Oriented Stores
• Like a relational store, but flipped around: all data for a column is kept together
• An index provides a means to get a column value for a record
• Operations:
  • Get, insert, delete records; updating fields
  • Streaming column data in and out of Hadoop

Graph Databases
• Stores vertex-to-vertex edges
• Operations:
  • Getting and setting edges
  • Sometimes possible to annotate vertices or edges
• Query languages support finding paths between vertices, subject to various constraints
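The edge and path operations above can be sketched with an adjacency map and a breadth-first search. This is an illustrative toy, not how any particular graph database is implemented; the `Graph` class and the vertex names are made up.

```javascript
// Directed graph stored as vertex -> set of neighbouring vertices.
class Graph {
  constructor() {
    this.edges = new Map();
  }
  addEdge(from, to) {
    if (!this.edges.has(from)) this.edges.set(from, new Set());
    this.edges.get(from).add(to);
  }
  // Breadth-first search: returns a shortest path (fewest edges) as an
  // array of vertices, or null when the goal is unreachable.
  findPath(start, goal) {
    const previous = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const vertex = queue.shift();
      if (vertex === goal) {
        const path = [];
        for (let cur = goal; cur !== null; cur = previous.get(cur)) path.unshift(cur);
        return path;
      }
      for (const next of this.edges.get(vertex) || []) {
        if (!previous.has(next)) {
          previous.set(next, vertex);
          queue.push(next);
        }
      }
    }
    return null;
  }
}

const graph = new Graph();
graph.addEdge("alice", "bob");
graph.addEdge("bob", "carol");
graph.addEdge("alice", "dave");

console.log(graph.findPath("alice", "carol")); // [ 'alice', 'bob', 'carol' ]
console.log(graph.findPath("carol", "alice")); // null: edges are directed
```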

Consistency Models
• Relational databases support transactions
• Can only see committed changes
• Commit/abort span multiple changes
• Read-only transaction flavors
• Read committed, repeatable read, etc.
• Classic assumption: “I’m querying the one-and-only database”
• Scaling reads and scaling writes introduce different problems

Replication - The 1st Breakdown of Consistency

Limitations of a Single Master
• Replication can provide arbitrary read scalability
• Subject to coping with read-consistency issues
• Sooner or later, writing becomes a bottleneck
• Physical limitations (seek time)
• Throughput of a single I/O subsystem

Sharding
• Partition the primary key space via hashing
• Set up a duplicate system for each shard
• The write-rate limitation now applies to each shard
• Joins or aggregation across shards are problematic
• Can the data be re-sharded on a live system?
• Can shards be re-balanced on a live system?
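A minimal sketch of hash partitioning, assuming a simple string hash and modulo placement (both chosen for illustration; real systems use stronger hashes and often consistent hashing). The names `hashKey` and `shardFor` and the `user:N` keys are hypothetical.

```javascript
// Deterministic 32-bit string hash (a djb2 variant), illustration only.
function hashKey(key) {
  let h = 5381;
  for (const ch of String(key)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0; // keep h an unsigned 32-bit int
  }
  return h;
}

// Map a primary key to one of shardCount shards.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

const keys = Array.from({ length: 1000 }, (_, i) => "user:" + i);

// The same key always lands on the same shard.
console.log(shardFor("user:7", 3) === shardFor("user:7", 3)); // true

// Re-sharding: count how many keys change shard when a fourth shard is
// added. With naive modulo placement this is usually a large fraction.
const moved = keys.filter((k) => shardFor(k, 3) !== shardFor(k, 4)).length;
console.log(moved + " of " + keys.length + " keys would move");
```

The second printout illustrates why the two closing questions on the slide matter: with this naive scheme, changing the shard count relocates many records, which is hard to do on a live system.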

Multi-Site Operation
• Failure of a single-master system’s master
• A new master can be chosen
• But what if there’s a network partition?
• Can the application continue in read-only mode?

Dynamo
• Now a generic term for multi-master systems
• Writes can occur to any node
• The same record can be updated on different nodes by different clients
• All writes are replicated everywhere

Dynamo – the 2nd breakdown of consistency
• Collisions can occur
• Who wins?
• A collision resolution strategy is required
• Vector clocks
  • http://en.wikipedia.org/wiki/Vector_clock
• Application access must be aware of this
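A minimal vector-clock sketch shows how a collision is detected. The clocks are assumed to be plain objects mapping a node name to a counter; the function names and the node names `n1` and `n2` are hypothetical.

```javascript
// Return a copy of the clock with this node's counter advanced by one.
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

// Compare two clocks: "before", "after", "equal", or "concurrent".
function compare(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] || 0;
    const y = b[node] || 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // neither version descends from the other
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// After a conflict is resolved, the merged clock takes the per-node maximum.
function merge(a, b) {
  const merged = { ...a };
  for (const [node, count] of Object.entries(b)) {
    merged[node] = Math.max(merged[node] || 0, count);
  }
  return merged;
}

const base = increment({}, "n1");     // { n1: 1 }
const onN1 = increment(base, "n1");   // { n1: 2 }
const onN2 = increment(base, "n2");   // { n1: 1, n2: 1 }

console.log(compare(base, onN1)); // "before": onN1 descends from base
console.log(compare(onN1, onN2)); // "concurrent": a collision to resolve
console.log(merge(onN1, onN2));   // { n1: 2, n2: 1 }
```

The clock only detects the collision. Deciding which value wins, or how to combine the two, remains the application's job, which is the slide's final point.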

The Commercial Landscape
                                    Data Model
Consistency Model      Key-Value    Document    Column-Oriented
Single Master          Membase      MongoDB
Multi-Master/Dynamo    Riak         CouchDB     Cassandra, HBase, Hypertable

Key Client Implementation Concerns
• Monotonic reads
  • Can my reads go back in time?
• Read-your-own-writes
  • If I issue a query immediately after an insert or update, will I see my changes?
• Uninterrupted writes
  • Am I always guaranteed the ability to write?
• Conflict Resolution
  • Do I need to have a conflict resolution strategy?
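The read-your-own-writes concern can be illustrated with a toy single-master setup in which one replica lags behind the primary. Everything here is a hypothetical sketch (`SingleMasterCluster`, the version counter, the explicit `replicate()` step); it models the idea and is not the behaviour of any specific database or driver.

```javascript
class Node {
  constructor() {
    this.data = new Map();
    this.version = 0; // number of writes applied on this node
  }
}

class SingleMasterCluster {
  constructor() {
    this.primary = new Node();
    this.replica = new Node();
    this.pending = []; // writes not yet applied to the replica
  }
  // All writes go to the primary; the returned version identifies the write.
  write(key, value) {
    this.primary.data.set(key, value);
    this.primary.version += 1;
    this.pending.push({ key, value, version: this.primary.version });
    return this.primary.version;
  }
  // Apply pending writes to the replica (asynchronous in a real system).
  replicate() {
    for (const entry of this.pending) {
      this.replica.data.set(entry.key, entry.value);
      this.replica.version = entry.version;
    }
    this.pending = [];
  }
  // Read from the replica unless it has not yet reached minVersion,
  // in which case fall back to the primary.
  read(key, minVersion = 0) {
    const node = this.replica.version >= minVersion ? this.replica : this.primary;
    return node.data.get(key);
  }
}

const cluster = new SingleMasterCluster();
const myWrite = cluster.write("title", "Cosmos");

console.log(cluster.read("title"));          // undefined: the replica is stale
console.log(cluster.read("title", myWrite)); // "Cosmos": routed to the primary
cluster.replicate();
console.log(cluster.read("title"));          // "Cosmos": the replica caught up
```

Passing the version of one's own last write into `read` is one way a client could get read-your-own-writes. Remembering the highest version it has ever read and passing that instead would likewise give monotonic reads.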

Using a Single-Master System
• What does the intermediate agent or system do for…
  • Monotonic reads?
  • Read-your-own-writes?
  • Uninterrupted writes?
  • Conflict Resolution?

Using a Multi-Master System
• What does the intermediate agent or system do for…
  • Monotonic reads?
  • Read-your-own-writes?
  • Uninterrupted writes?
  • Conflict Resolution?

MongoDB
Where MongoDB fits in the non-relational world
MongoDB’s architecture and features
Some real-world users

MongoDB is a Document Store
• MongoDB stores JSON objects as BSON
  • { LastName: 'Flintstone', FirstName: 'Fred', …}
• Secondary Indexes
  • db.collection.ensureIndex({LastName : 1, FirstName : 1});
• Simple QBE-like query syntax
  • db.collection.find({LastName : 'Flintstone'});
  • db.collection.find({LastName : { $gte : 'Flintstone'}});

MongoDB – Advanced Queries
• Geo-spatial queries
  • Create a geo index
  • Find points near a given point, sorted by radial distance
  • Can be planar or spherical
  • Find points within a certain radial distance, within a bounding box, or a polygon
• Built-in Map-Reduce
  • The caller provides map and reduce functions written in JavaScript
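To show the shape of caller-provided map and reduce functions, here is an in-memory emulation in plain JavaScript. It is a sketch, not the server's implementation: the local `mapReduce` helper is hypothetical, and `emit` is passed to the map function as an argument here, whereas in the MongoDB shell `emit` is a global and the document is bound to `this`. The sample data is the book collection from the earlier slides.

```javascript
// Minimal emulation: run map over every document, group emitted values
// by key, then reduce each group to a single value.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // the document is `this`
  const result = {};
  for (const [key, values] of groups) {
    // A key with a single emitted value is passed through unreduced.
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

const books = [
  { title: "The Demon-Haunted World: Science as a Candle in the Dark", author: [{ last_name: "Sagan" }] },
  { title: "Cosmos", author: [{ last_name: "Sagan" }] },
  { title: "Programming in Scala",
    author: [{ last_name: "Odersky" }, { last_name: "Spoon" }, { last_name: "Venners" }] },
];

// Count books per author last name.
const booksPerAuthor = mapReduce(
  books,
  function (emit) {
    this.author.forEach((a) => emit(a.last_name, 1));
  },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);

console.log(booksPerAuthor); // { Sagan: 2, Odersky: 1, Spoon: 1, Venners: 1 }
```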

MongoDB is a Single-Master System
• A database is served by members of a “replica set”
• The system elects a primary (master)
• Failure of the master is detected, and a new master is elected
• Application writes get an error if there is no quorum to elect a new master
• Reads continue to be fulfilled

MongoDB Replica Set

MongoDB Supports Sharding
• A collection can be sharded
• Each shard is served by its own replica set
• New shards (each a replica set) can be added at any time
• Shard key ranges are automatically balanced

MongoDB – Sharded Deployment

MongoDB Storage Management
• Data is kept in memory-mapped files
• Servers should have a lot of memory
• Files are allocated as needed
• Documents in a collection are kept on a list using a geographical addressing scheme
• Indexes (B*-trees) point to documents using geographical addresses

MongoDB Server Management
• Replica set members are aware of each other
• A majority of votes is required to elect a new primary
• Members can be assigned priorities to affect the election
  • e.g., an “invisible” replica can be created with zero priority for backup purposes
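The majority and priority rules above can be sketched as follows. This is a deliberate simplification: it assumes one vote per member and lets the reachable member with the highest priority win, ignoring other factors a real election takes into account, such as how up to date each member is. The function names and the member objects are hypothetical.

```javascript
// A strict majority of all votes must be reachable.
function hasMajority(reachableVotes, totalVotes) {
  return reachableVotes > totalVotes / 2;
}

// Returns the name of the new primary, or null if none can be elected.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  if (!hasMajority(reachable.length, members.length)) return null; // no quorum
  // Zero-priority members vote but can never become primary.
  const candidates = reachable.filter((m) => m.priority > 0);
  if (candidates.length === 0) return null;
  return candidates.reduce((best, m) => (m.priority > best.priority ? m : best)).name;
}

const replicaSet = [
  { name: "a", priority: 2, up: false },  // the failed primary
  { name: "b", priority: 1, up: true },
  { name: "backup", priority: 0, up: true },
];

console.log(electPrimary(replicaSet)); // "b": 2 of 3 votes, backup is not eligible

const partitioned = replicaSet.map((m) => ({ ...m, up: m.name === "backup" }));
console.log(electPrimary(partitioned)); // null: 1 of 3 votes is not a majority
```

The `null` case corresponds to the earlier slide: with no quorum, writes fail while reads can still be served.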

MongoDB Access
• Drivers are available in many languages
• 10gen supported
  • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
• Community supported
  • Clojure, ColdFusion, F#, Go, Groovy, Lua, R
  • http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools

MongoDB Availability
• Source
  • https://github.com/mongodb/mongo
• Server
  • License: AGPL
  • http://www.mongodb.org/downloads
• Drivers
  • License: Apache
  • http://www.mongodb.org/display/DOCS/Drivers

MongoDB – Hosted Services
• http://www.mongodb.org/display/DOCS/Hosting+Center
• MongoHQ, Mongo Machine, MongoLab
• RESTful access to collections

MongoDB Support
• Paid Support
  • http://www.10gen.com/client-portal
  • 10gen Hosted Monitoring
  • Consulting, training
• Free Support
  • http://groups.google.com/group/mongodb-user
  • http://stackoverflow.com/questions/tagged/mongodb

• 2,000+ Production Deployments and growing.
• NYTimes, MTV, Shutterfly, Foursquare, Craigslist, Disney, and more in Production.
• Real, full indexes including sparse, covered & geospatial.

MongoDB Users
• http://www.10gen.com/customers
• http://www.10gen.com/presentations
• craigslist: http://www.10gen.com/presentation/mongosf2011/craigslist
• bit.ly: http://blip.tv/mongodb/bit-ly-user-history-auto-sharded-3723147
• shutterfly: http://www.10gen.com/presentation/mongosv2010/shutterfly

Mini-demo/tutorial
• http://try.mongodb.org/

@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongoN
Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
We’re Hiring!
[email protected] (twitter: @rit)

More Decks by Brendan McAdams
