MongoDB Introduction - Mongo Hamburg 2011

Brendan McAdams

July 04, 2011

Transcript

  1. Stuffing an object graph into a relational model is like

    fitting a square peg into a round hole. Monday, July 4, 2011
  3. Sure, we can use an ORM. But who are we really fooling?

    ... and who/what are we going to wake up next to in the morning?
  4. This is the SQL Model

    mysql> select * from book;
    +----+----------------------------------------------------------+
    | id | title                                                    |
    +----+----------------------------------------------------------+
    |  1 | The Demon-Haunted World: Science as a Candle in the Dark |
    |  2 | Cosmos                                                   |
    |  3 | Programming in Scala                                     |
    +----+----------------------------------------------------------+
    3 rows in set (0.00 sec)

    mysql> select * from bookauthor;
    +---------+-----------+
    | book_id | author_id |
    +---------+-----------+
    |       1 |         1 |
    |       2 |         1 |
    |       3 |         2 |
    |       3 |         3 |
    |       3 |         4 |
    +---------+-----------+
    5 rows in set (0.00 sec)

    mysql> select * from author;
    +----+-----------+------------+-------------+-------------+---------------+
    | id | last_name | first_name | middle_name | nationality | year_of_birth |
    +----+-----------+------------+-------------+-------------+---------------+
    |  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
    |  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
    |  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
    |  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
    +----+-----------+------------+-------------+-------------+---------------+
    4 rows in set (0.00 sec)
  10. Joins are great and all ...

    • Potentially organizationally messy
    • Structure of a single object is NOT immediately clear to someone glancing at the shell data
    • We have to flatten our object out into three tables
    • 7 separate inserts just to add “Programming in Scala”
    • Once we turn the relational data back into objects ...
    • We still need to convert it to data for our frontend
    • I don’t know about you, but I have better things to do with my time.
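
To make the "7 separate inserts" figure concrete, here is a minimal sketch in plain JavaScript. The helper `flattenBook` and its id arguments are hypothetical, chosen only to mirror the three tables on the SQL slide; it builds the rows in memory and does not talk to a database.

```javascript
// Hypothetical helper: flatten one book object into rows for the three
// tables from the SQL slide (book, author, bookauthor).
function flattenBook(book, bookId, firstAuthorId) {
  const rows = [{ table: "book", values: { id: bookId, title: book.title } }];
  book.author.forEach((author, i) => {
    const authorId = firstAuthorId + i;
    // One row for the author itself ...
    rows.push({ table: "author", values: { id: authorId, ...author } });
    // ... and one row linking the author to the book.
    rows.push({ table: "bookauthor", values: { book_id: bookId, author_id: authorId } });
  });
  return rows;
}

const programmingInScala = {
  title: "Programming in Scala",
  author: [
    { first_name: "Martin", last_name: "Odersky", nationality: "DE", year_of_birth: 1958 },
    { first_name: "Lex", last_name: "Spoon" },
    { first_name: "Bill", last_name: "Venners" },
  ],
};

const rows = flattenBook(programmingInScala, 3, 2);
console.log(rows.length); // 7: 1 book row + 3 author rows + 3 link rows
```
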
  11. The Same Data in MongoDB

    > db.books.find().forEach(printjson)
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda3"),
        "title" : "The Demon-Haunted World: Science as a Candle in the Dark",
        "author" : [
            {
                "first_name" : "Carl",
                "last_name" : "Sagan",
                "middle_name" : "Edward",
                "year_of_birth" : 1934
            }
        ]
    }
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda4"),
        "title" : "Cosmos",
        "author" : [
            {
                "first_name" : "Carl",
                "last_name" : "Sagan",
                "middle_name" : "Edward",
                "year_of_birth" : 1934
            }
        ]
    }
  12. The Same Data in MongoDB (Part 2)

    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "title" : "Programming in Scala",
        "author" : [
            {
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ]
    }
  13. Access to the embedded objects is integral

    > db.books.find({"author.first_name": "Martin", "author.last_name": "Odersky"})
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "title" : "Programming in Scala",
        "author" : [
            {
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ]
    }
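
One caveat about dotted queries on arrays of embedded documents: in MongoDB, two separate dotted conditions may be satisfied by two different array elements, whereas `$elemMatch` requires a single element to satisfy all of them. The sketch below imitates both behaviors over in-memory data. The functions `matchesAnyElement` and `matchesSameElement` are hypothetical names for illustration, not part of any MongoDB API.

```javascript
const books = [
  {
    title: "Programming in Scala",
    author: [
      { first_name: "Martin", last_name: "Odersky" },
      { first_name: "Lex", last_name: "Spoon" },
      { first_name: "Bill", last_name: "Venners" },
    ],
  },
];

// Dotted-field style: each condition may be met by a different element.
function matchesAnyElement(doc, conditions) {
  return Object.entries(conditions).every(([field, value]) =>
    doc.author.some((a) => a[field] === value));
}

// $elemMatch style: one element must meet every condition.
function matchesSameElement(doc, conditions) {
  return doc.author.some((a) =>
    Object.entries(conditions).every(([field, value]) => a[field] === value));
}

// No single author is called "Martin Spoon".
const mixed = { first_name: "Martin", last_name: "Spoon" };
const anyHits = books.filter((b) => matchesAnyElement(b, mixed)).length;
const sameHits = books.filter((b) => matchesSameElement(b, mixed)).length;
console.log(anyHits, sameHits); // 1 0
```

The query on the slide returns the expected book either way, because Martin Odersky is a single array element.
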
  14. As is manipulation of the embedded data

    > db.books.update({"author.first_name": "Bill", "author.last_name": "Venners"},
    ... {$set: {"author.$.company": "Artima, Inc."}})
    > db.books.update({"author.first_name": "Martin", "author.last_name": "Odersky"},
    ... {$set: {"author.$.company": "Typesafe, Inc."}})
    > db.books.findOne({"title": /Scala$/})
    {
        "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
        "author" : [
            {
                "company" : "Typesafe, Inc.",
                "first_name" : "Martin",
                "last_name" : "Odersky",
                "nationality" : "DE",
                "year_of_birth" : 1958
            },
            {
                "first_name" : "Lex",
                "last_name" : "Spoon"
            },
            {
                "company" : "Artima, Inc.",
                "first_name" : "Bill",
                "last_name" : "Venners"
            }
        ],
        "title" : "Programming in Scala"
    }
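
The `$` in `"author.$.company"` is MongoDB's positional operator: it stands for the first array element matched by the query. A rough in-memory sketch of that behavior follows. The helper `setOnFirstMatchingAuthor` is a hypothetical name and a simplification, not how the server implements updates.

```javascript
const book = {
  title: "Programming in Scala",
  author: [
    { first_name: "Martin", last_name: "Odersky" },
    { first_name: "Lex", last_name: "Spoon" },
    { first_name: "Bill", last_name: "Venners" },
  ],
};

// Hypothetical helper: set `field` on the first author element for which
// `predicate` holds, similar in spirit to {$set: {"author.$.<field>": value}}.
function setOnFirstMatchingAuthor(doc, predicate, field, value) {
  const index = doc.author.findIndex(predicate);
  if (index === -1) return false; // nothing matched, document unchanged
  doc.author[index][field] = value;
  return true;
}

setOnFirstMatchingAuthor(
  book,
  (a) => a.first_name === "Bill" && a.last_name === "Venners",
  "company",
  "Artima, Inc."
);
console.log(book.author[2].company); // Artima, Inc.
```
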
  15. Scale out

    [Diagram: three mongos / config server instances in front of three shards, each a three-member replica set: shard1 (rep_a1, rep_b1, rep_c1), shard2 (rep_a2, rep_b2, rep_c2), shard3 (rep_a3, rep_b3, rep_c3). Labels: write, read.]
  16. The Whys of Non-Relational Databases

    Why did non-relational databases arise?
    Problems with relational databases in the web world
  17. Problem - Schema Evolution

    • Applications are evolving all the time
    • Applications need new fields
    • Applications need new indexes
    • Data is growing – sometimes very fast
    • Users need to be able to alter their schemas without making their data unavailable
    • The web world expects 24x7 service
    • RDBMSs can have a hard time doing this
  18. Problem – Write Rates

    • Replication is a solution for high read loads
    • Sooner or later, writing becomes a bottleneck
    • Sharding – partitioning a logical database across multiple database instances
    • Joins and aggregation become a problem
    • Distributed transactions are too slow for the web
    • Manual management of shards
    • Choosing shard partitions
    • Rebalancing shards
  19. Vocabulary of the Non-Relational World

    An introduction to terminology you’re going to be seeing a lot
  20. Data Models

    • A non-relational database’s data model determines the kinds of items it can contain and how they can be retrieved
    • What can the system store, and what does it know about what it contains?
    • The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition
    • What kind of queries can you do?
    • SQL is a manifestation of the kinds of queries that fall out of relational algebra
  21. Non-Relational Data Models

    • Key-value stores
    • Document stores
    • Column-oriented databases
    • Graph databases
  22. Key-Value Stores

    • A mapping from a key to a value
    • The store doesn’t know anything about the key or value
    • The store doesn’t know anything about the insides of the value
    • Operations
      • Set, get, or delete a key-value pair
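
A minimal in-memory sketch of that contract, assuming nothing beyond the three operations named on the slide. The class name `KeyValueStore` and the key `book:3` are made up for illustration.

```javascript
// The store maps opaque keys to opaque values and offers only
// set, get, and delete. It cannot look inside a value.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  delete(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();
// The value is serialized by the client; to the store it is just a string.
store.set("book:3", JSON.stringify({ title: "Programming in Scala" }));
const raw = store.get("book:3");
const title = JSON.parse(raw).title; // only the client can interpret it
store.delete("book:3");
```

Finding a book by title would require the client to fetch and decode every value, which is the gap document stores fill.
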
  23. Document Stores

    • The store is a container for documents
    • Documents are made up of named fields
    • Fields may or may not have type definitions
      • e.g. XSDs for XML stores, vs. schema-less JSON stores
    • Can create “secondary indexes”
      • These provide the ability to query on any document field(s)
    • Operations:
      • Insert and delete documents
      • Update fields within documents
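
The idea of a secondary index can be sketched in a few lines: a map from a field's value to the ids of the documents that carry it. This is a toy in-memory model with hypothetical names (`DocumentStore`, `createIndex`, `findBy`), not the API or storage design of any real document database.

```javascript
class DocumentStore {
  constructor() {
    this.docs = new Map();    // id -> document
    this.indexes = new Map(); // field name -> Map(value -> Set of ids)
    this.nextId = 1;
  }
  addToIndex(index, field, id, doc) {
    if (!(field in doc)) return; // schema-less: the field is optional
    if (!index.has(doc[field])) index.set(doc[field], new Set());
    index.get(doc[field]).add(id);
  }
  createIndex(field) {
    const index = new Map();
    for (const [id, doc] of this.docs) this.addToIndex(index, field, id, doc);
    this.indexes.set(field, index);
  }
  insert(doc) {
    const id = this.nextId++;
    this.docs.set(id, doc);
    for (const [field, index] of this.indexes) this.addToIndex(index, field, id, doc);
    return id;
  }
  findBy(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      // Indexed lookup: jump straight to the matching ids.
      return [...(index.get(value) || [])].map((id) => this.docs.get(id));
    }
    // No index on this field: scan every document.
    return [...this.docs.values()].filter((d) => d[field] === value);
  }
}

const authors = new DocumentStore();
authors.insert({ first_name: "Carl", last_name: "Sagan", year_of_birth: 1934 });
authors.insert({ first_name: "Lex", last_name: "Spoon" });
authors.createIndex("last_name");
authors.insert({ first_name: "Bill", last_name: "Venners" });
const found = authors.findBy("last_name", "Venners");
console.log(found.length); // 1
```
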
  24. Column-Oriented Stores

    • Like a relational store, but flipped around: all data for a column is kept together
    • An index provides a means to get a column value for a record
    • Operations:
      • Get, insert, delete records; updating fields
      • Streaming column data in and out of Hadoop
  25. Graph Databases

    • Stores vertex-to-vertex edges
    • Operations:
      • Getting and setting edges
      • Sometimes possible to annotate vertices or edges
    • Query languages support finding paths between vertices, subject to various constraints
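
As an illustration of "finding paths between vertices", the sketch below stores directed edges as pairs and runs a breadth-first search over them. The function `findPath` and the vertex names are hypothetical. Real graph query languages add constraints such as edge types and depth limits, which this sketch leaves out.

```javascript
// Breadth-first search over a list of directed [from, to] edges.
// Returns a path with the fewest edges as an array of vertices, or null.
function findPath(edges, start, goal) {
  const adjacency = new Map();
  for (const [from, to] of edges) {
    if (!adjacency.has(from)) adjacency.set(from, []);
    adjacency.get(from).push(to);
  }
  const previous = new Map([[start, null]]); // also serves as the visited set
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === goal) {
      const path = [];
      for (let n = goal; n !== null; n = previous.get(n)) path.unshift(n);
      return path;
    }
    for (const next of adjacency.get(node) || []) {
      if (!previous.has(next)) {
        previous.set(next, node);
        queue.push(next);
      }
    }
  }
  return null;
}

const edges = [["a", "b"], ["b", "c"], ["a", "d"], ["d", "c"], ["c", "e"]];
console.log(findPath(edges, "a", "e")); // [ 'a', 'b', 'c', 'e' ]
console.log(findPath(edges, "e", "a")); // null
```
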
  26. Consistency Models

    • Relational databases support transactions
    • Can only see committed changes
    • Commit/abort span multiple changes
    • Read-only transaction flavors
      • Read committed, repeatable read, etc
    • Classic assumption: “I’m querying the one-and-only database”
    • Scaling reads and writes introduce different problems
  27. Limitations of a Single Master

    • Replication can provide arbitrary read scalability
      • Subject to coping with read-consistency issues
    • Sooner or later, writing becomes a bottleneck
      • Physical limitations (seek time)
      • Throughput of a single I/O subsystem
  28. Sharding

    • Partition the primary key space via hashing
    • Set up a duplicate system for each shard
    • The write-rate limitation now applies to each shard
    • Joins or aggregation across shards are problematic
    • Can the data be re-sharded on a live system?
    • Can shards be re-balanced on a live system?
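
A small sketch of hash partitioning, and of why re-sharding a live system is a hard question. The hash function, the `user:N` keys, and the shard counts are illustrative assumptions. This is naive modulo hashing, not the scheme of any particular database.

```javascript
// Simple deterministic string hash (polynomial, kept in unsigned 32-bit range).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.codePointAt(0)) >>> 0;
  return h;
}

// Map a primary key to one of `shardCount` shards.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

const keys = Array.from({ length: 1000 }, (_, i) => "user:" + i);

// Going from 3 to 4 shards changes the modulus, so many keys land on a
// different shard and their data would have to be copied.
const moved = keys.filter((k) => shardFor(k, 3) !== shardFor(k, 4)).length;
console.log(moved + " of " + keys.length + " keys change shards");
```

With modulo hashing most keys move when the shard count changes, which is one reason systems turn to range-based chunks or consistent hashing instead.
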
  29. Multi-Site Operation

    • Failure of a single-master system’s master
    • A new master can be chosen
    • But what if there’s a network partition?
    • Can the application continue in read-only mode?
  30. Dynamo

    • Now a generic term for multi-master systems
    • Writes can occur to any node
    • The same record can be updated on different nodes by different clients
    • All writes are replicated everywhere
  31. Dynamo – the 2nd breakdown of consistency

    • Collisions can occur
    • Who wins?
    • A collision resolution strategy is required
      • Vector clocks
      • http://en.wikipedia.org/wiki/Vector_clock
    • Application access must be aware of this
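
To show what a vector clock buys you, the sketch below compares two clocks, each an object mapping a node name to that node's update counter. The function name `compareClocks` and the node names `n1` and `n2` are hypothetical. The comparison rule itself is the standard one: a clock is later only if it is at least as large in every entry.

```javascript
// Compare two vector clocks. Missing entries count as 0.
// Returns "before", "after", "equal", or "concurrent".
function compareClocks(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] || 0;
    const y = b[node] || 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // a collision: neither saw the other
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// Version written on n1, then updated again on n1: a clear successor.
console.log(compareClocks({ n1: 2 }, { n1: 1 })); // after

// Two clients updated the same record on different nodes.
console.log(compareClocks({ n1: 2, n2: 1 }, { n1: 1, n2: 2 })); // concurrent
```

The clock only detects the collision. Deciding who wins is still left to the resolution strategy, which is why the application has to be aware of it.
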
  32. The Commercial Landscape

                          Data Model
    Consistency Model   | Key-Value | Document | Column-Oriented
    --------------------+-----------+----------+------------------------------
    Single Master       | Membase   | MongoDB  |
    Multi-Master/Dynamo | Riak      | CouchDB  | Cassandra, HBase, Hypertable
  33. Key Client Implementation Concerns

    • Monotonic reads
      • Can my reads go back in time?
    • Read-your-own-writes
      • If I issue a query immediately after an insert or update, will I see my changes?
    • Uninterrupted writes
      • Am I always guaranteed the ability to write?
    • Conflict Resolution
      • Do I need to have a conflict resolution strategy?
  34. Using a Single-Master System

    • What does the intermediate agent or system do for…
      • Monotonic reads?
      • Read-your-own-writes?
      • Uninterrupted writes?
      • Conflict Resolution?
  35. Using a Multi-Master System

    • What does the intermediate agent or system do for…
      • Monotonic reads?
      • Read-your-own-writes?
      • Uninterrupted writes?
      • Conflict Resolution?
  36. MongoDB

    Where MongoDB fits in the non-relational world
    MongoDB’s architecture and features
    Some real-world users
  37. MongoDB is a Document Store

    • MongoDB stores JSON objects as BSON
      • { LastName: 'Flintstone', FirstName: 'Fred', …}
    • Secondary Indexes
      • db.collection.ensureIndex({LastName : 1, FirstName : 1});
    • Simple QBE-like query syntax
      • db.collection.find({LastName : 'Flintstone'});
      • db.collection.find({LastName : { $gte : 'Flintstone'}});
  38. MongoDB – Advanced Queries

    • Geo-spatial queries
      • Create a geo index
      • Find points near a given point, sorted by radial distance
      • Can be planar or spherical
      • Find points within a certain radial distance, within a bounding box, or a polygon
    • Built-in Map-Reduce
      • The caller provides map and reduce functions written in JavaScript
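
To give a feel for the map and reduce functions a caller provides, here is a self-contained imitation that counts books per author over the sample documents. The driver `runMapReduce` is a hypothetical in-memory stand-in. In MongoDB itself `emit` is a global inside the map function rather than an argument, and the server decides when and how often reduce is called.

```javascript
// Hypothetical in-memory driver: group emitted values by key, then reduce.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in MongoDB, the current document is available as `this` in map.
  for (const doc of docs) map.call(doc, emit);
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

// Emit (last_name, 1) once per author of each book.
function map(emit) {
  for (const a of this.author) emit(a.last_name, 1);
}

// Sum the counts emitted for one key.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

const bookDocs = [
  { title: "The Demon-Haunted World", author: [{ last_name: "Sagan" }] },
  { title: "Cosmos", author: [{ last_name: "Sagan" }] },
  { title: "Programming in Scala",
    author: [{ last_name: "Odersky" }, { last_name: "Spoon" }, { last_name: "Venners" }] },
];

const booksPerAuthor = runMapReduce(bookDocs, map, reduce);
console.log(booksPerAuthor); // { Sagan: 2, Odersky: 1, Spoon: 1, Venners: 1 }
```
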
  39. MongoDB is a Single-Master System

    • A database is served by members of a “replica set”
    • The system elects a primary (master)
    • Failure of the master is detected, and a new master is elected
    • Application writes get an error if there is no quorum to elect a new master
    • Reads continue to be fulfilled
  40. MongoDB Supports Sharding

    • A collection can be sharded
    • Each shard is served by its own replica set
    • New shards (each a replica set) can be added at any time
    • Shard key ranges are automatically balanced
  41. MongoDB Storage Management

    • Data is kept in memory-mapped files
    • Servers should have a lot of memory
    • Files are allocated as needed
    • Documents in a collection are kept on a list using a geographical addressing scheme
    • Indexes (B*-trees) point to documents using geographical addresses
  42. MongoDB Server Management

    • Replica set members are aware of each other
    • A majority of votes is required to elect a new primary
    • Members can be assigned priorities to affect the election
      • e.g., an “invisible” replica can be created with zero priority for backup purposes
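
The majority rule is easy to state in code, and it explains why writes fail when no quorum exists. The function name `canElectPrimary` and the set sizes below are illustrative assumptions. Priorities and the rest of the real election protocol are left out.

```javascript
// A group of members can elect a primary only if it holds a strict
// majority of all votes in the replica set.
function canElectPrimary(reachableVotes, totalVotes) {
  return reachableVotes > Math.floor(totalVotes / 2);
}

// Three-member set split 2 / 1 by a network partition:
console.log(canElectPrimary(2, 3)); // true  - this side keeps accepting writes
console.log(canElectPrimary(1, 3)); // false - this side can only serve reads

// Four-member set split 2 / 2: neither side has a majority.
console.log(canElectPrimary(2, 4)); // false
```

Because at most one side of any partition can hold a strict majority, two primaries cannot be elected at the same time.
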
  43. MongoDB Access

    • Drivers are available in many languages
    • 10gen supported
      • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
    • Community supported
      • Clojure, ColdFusion, F#, Go, Groovy, Lua, R
      • http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools
  44. MongoDB Availability

    • Source
      • https://github.com/mongodb/mongo
    • Server
      • License: AGPL
      • http://www.mongodb.org/downloads
    • Drivers
      • License: Apache
      • http://www.mongodb.org/display/DOCS/Drivers
  45. MongoDB – Hosted Services

    • http://www.mongodb.org/display/DOCS/Hosting+Center
    • MongoHQ, Mongo Machine, MongoLab
    • RESTful access to collections
  46. MongoDB Support

    • Paid Support
      • http://www.10gen.com/client-portal
      • 10gen Hosted Monitoring
      • Consulting, training
    • Free Support
      • http://groups.google.com/group/mongodb-user
      • http://stackoverflow.com/questions/tagged/mongodb
  48. • 2,000+ Production Deployments and growing.

    • NYTimes, MTV, Shutterfly, Foursquare, Craigslist, Disney, and more in Production.
    • Real, full indexes including sparse, covered & geospatial.
  49. MongoDB Users

    • http://www.10gen.com/customers
    • http://www.10gen.com/presentations
    • craigslist: http://www.10gen.com/presentation/mongosf2011/craigslist
    • bit.ly: http://blip.tv/mongodb/bit-ly-user-history-auto-sharded-3723147
    • shutterfly: http://www.10gen.com/presentation/mongosv2010/shutterfly
  50. @mongodb

    conferences, appearances, and meetups
    http://www.10gen.com/events
    http://bit.ly/mongoN
    Facebook | Twitter | LinkedIn
    http://linkd.in/joinmongo
    download at mongodb.org
    We’re Hiring !
    [email protected] (twitter: @rit)