Schema Design with MongoDB - Tony Hannan, Software Engineer, 10gen

mongodb
November 28, 2011

MongoDallas2011

Schema design is a critical step in making sure an application scales well. There are considerations for reads and writes, both with and without sharding. We'll go through a few use cases and examine how different schemas impact performance.

Transcript

  1. Document model not too different from Relational model

    | RDBMS             | MongoDB                     |
    |-------------------|-----------------------------|
    | Table             | Collection                  |
    | Flat record (row) | Full record (document)      |
    | Index             | Index                       |
    | Join              | Embed or client-side join   |
    | Transaction       | Single-document transaction |
  2. RDBMS schema design

    • ER → Relational → Physical Design & Indexes
        1. Every entity gets a table
        2. Every many-to-many relationship gets a table
        3. Denormalize to remove joins
        4. Cluster tables to speed up joins
        5. Index to speed up selection and joins
    • Steps 3 & 4 are optional but recommended
  3. MongoDB schema design

    • ER → Relational → Physical Design & Indexes
        1. Every entity gets a table
        2. Every many-to-many relationship gets a table
        3. Denormalize to remove joins
        4. Cluster tables to remove (rather than speed up) joins
        5. Index to speed up selection and client-side joins
    • Steps 3 & 4 are mandatory (rather than optional but recommended) because MongoDB has no multi-object transactions and no (server-side) joins
  4. Client-side join

    • Schema
        A {id, a, bid, c} → B {id, x, y}
    • Join query
        as = db.A.find( {a: "foo"} )
        bs = db.B.find( {id: {$in: distinct(map(bid, as))}} )
        abs = [{id, a, {x, y}, c} | {id, a, b, c} <- as, {bid, x, y} <- bs, b == bid]
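
The join query on this slide is written as pseudocode. A minimal runnable sketch of the same three steps follows. It assumes plain in-memory arrays as stand-ins for the `A` and `B` collections so that it runs without a server; the sample data and the name `clientSideJoin` are hypothetical, and the two `filter` calls mark where the two `find` queries would go.

```javascript
// In-memory stand-ins for collections A and B (hypothetical sample data).
const A = [
  { id: 1, a: "foo", bid: 10, c: "c1" },
  { id: 2, a: "bar", bid: 11, c: "c2" },
  { id: 3, a: "foo", bid: 11, c: "c3" },
];
const B = [
  { id: 10, x: "x10", y: "y10" },
  { id: 11, x: "x11", y: "y11" },
];

function clientSideJoin(aDocs, bDocs, aValue) {
  // Query 1: select the A documents of interest (stands in for db.A.find({a: ...})).
  const as = aDocs.filter((doc) => doc.a === aValue);

  // Collect the distinct foreign keys referenced by those documents.
  const bids = new Set(as.map((doc) => doc.bid));

  // Query 2: fetch only the referenced B documents (stands in for {id: {$in: [...]}}).
  const bs = bDocs.filter((doc) => bids.has(doc.id));

  // Key the B documents by id so each lookup during the merge is cheap.
  const bById = new Map(bs.map((doc) => [doc.id, doc]));

  // Merge: replace each foreign key with the referenced fields.
  // A documents whose bid has no match are dropped (inner-join behaviour).
  return as
    .filter((doc) => bById.has(doc.bid))
    .map(({ id, a, bid, c }) => {
      const { x, y } = bById.get(bid);
      return { id, a, b: { x, y }, c };
    });
}

const joined = clientSideJoin(A, B, "foo");
console.log(joined);
```

The point of the sketch is the cost: two round trips plus a merge in application code, which is what embedding or duplicating avoids.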
  5. Denormalization

    A {id, a, bid, c} → B {id, x, y}
    Embed:
        A {id, a, b: {x, y}, c}
    or Duplicate:
        A {id, a, b: {id, x}, c} → B {id, x, y}
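
To make the two denormalized shapes concrete, here is a small sketch that builds each one from a normalized pair. The sample documents and the helper names `embed` and `duplicate` are hypothetical and exist only for illustration.

```javascript
// Hypothetical normalized documents: sourceA references sourceB through bid.
const sourceA = { id: 1, a: "foo", bid: 10, c: "c1" };
const sourceB = { id: 10, x: "x10", y: "y10" };

// Embed: B's fields live inside A; no separate B document is needed.
function embed(aDoc, bDoc) {
  const { id, a, c } = aDoc;
  return { id, a, b: { x: bDoc.x, y: bDoc.y }, c };
}

// Duplicate: A keeps B's id plus a copy of x; B remains a separate document.
function duplicate(aDoc, bDoc) {
  const { id, a, c } = aDoc;
  return { id, a, b: { id: bDoc.id, x: bDoc.x }, c };
}

const embedded = embed(sourceA, sourceB);
const duplicated = duplicate(sourceA, sourceB);
console.log(embedded, duplicated);
```

With the duplicated shape, reading `x` needs no join, while reading `y` still requires a lookup in B by `duplicated.b.id`.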
  6. Clustered tables in RDBMS (not clustered indexes)

    • Schema
        A {id, a, c} <-->> B {id, aid, x, y}
    • Layout on disk
        A {id: 1, a, c}
        B {id, aid: 1, x, y}
        B {id, aid: 1, x, y}
        A {id: 2, a, c}
        B {id, aid: 2, x, y}
        ...
  7. Clustered tables = embedded array in MongoDB

    A {id, a, c} <-->> B {id, aid, x, y}
    A {id, a, bs: [ {x, y} ], c}
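
As an illustration of this mapping, the sketch below folds one-to-many B rows into an embedded `bs` array on their parent A document. The sample rows and the helper name `embedChildren` are hypothetical.

```javascript
// Hypothetical normalized rows: each B row points at its parent through aid.
const parents = [
  { id: 1, a: "a1", c: "c1" },
  { id: 2, a: "a2", c: "c2" },
];
const children = [
  { id: 10, aid: 1, x: 1, y: 2 },
  { id: 11, aid: 1, x: 3, y: 4 },
  { id: 12, aid: 2, x: 5, y: 6 },
];

function embedChildren(aDocs, bDocs) {
  // Group the child rows by parent id, keeping only the dependent fields.
  const byParent = new Map();
  for (const { aid, x, y } of bDocs) {
    if (!byParent.has(aid)) byParent.set(aid, []);
    byParent.get(aid).push({ x, y });
  }
  // Attach each group to its parent; parents without children get an empty array.
  return aDocs.map(({ id, a, c }) => ({ id, a, bs: byParent.get(id) || [], c }));
}

const clustered = embedChildren(parents, children);
console.log(JSON.stringify(clustered));
```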
  8. Art of schema design

    • What to …
        – Embed (both single document and array of documents)
        – Duplicate
        – Keep as client-side join
        – Index
  9. Embed dependent entities

    • Embed entities that always appear with their parent
    • Examples
        – Address
        – Comments
        – Items in a shopping cart
        Customer { address: {street, city, state, zip}, cart: [ {productId, quantity} ] … }
  10. Growing embedded arrays

    • Documents have to move when they grow beyond their current allocated space
        – There is padding so they don't move on every insert
    • Regularly growing/moving large objects slows down updates
    • The alternative is to not embed; however, dependent entities are then not colocated but interleaved with other unrelated dependents (resulting in slow retrieval)
    • A hybrid approach is to bucket dependents, so every N dependents reside together
        A {id, a, bs: [ {x, y} ], c}
        A {id, a, c} <-->> B {id, aid, bs: [ {x, y} ]}
        – Add new B bucket for every N inserts of {x, y}
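
The bucketing rule can be sketched as follows. This is an in-memory illustration only: the array `buckets` stands in for the B collection, and `BUCKET_SIZE`, `addDependent`, and `dependentsOf` are hypothetical names, with N set to 3 to keep the example small.

```javascript
const BUCKET_SIZE = 3; // N: hypothetical bucket capacity

// In-memory stand-in for the B collection; each entry is {id, aid, bs: [...]}.
const buckets = [];
let nextBucketId = 1;

function addDependent(aid, item) {
  // Find the newest bucket belonging to this parent (last match wins).
  let bucket = null;
  for (const candidate of buckets) {
    if (candidate.aid === aid) bucket = candidate;
  }
  // Start a new bucket when the parent has none or the newest one is full.
  if (bucket === null || bucket.bs.length >= BUCKET_SIZE) {
    bucket = { id: nextBucketId++, aid, bs: [] };
    buckets.push(bucket);
  }
  bucket.bs.push(item);
  return bucket.id;
}

function dependentsOf(aid) {
  // Reading all dependents touches one bucket per N items, not one document per item.
  return buckets.filter((b) => b.aid === aid).flatMap((b) => b.bs);
}

for (let i = 0; i < 7; i++) addDependent(1, { x: i, y: i * 2 });
addDependent(2, { x: 100, y: 200 });
console.log(buckets.map((b) => [b.id, b.aid, b.bs.length]));
```

Each bucket stops growing once it holds N items, so no single document keeps outgrowing its allocated space, while dependents of the same parent still sit together in groups of N.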
  11. Duplicate

    • Duplicate where the benefit of removing a client-side join outweighs the cost of maintaining duplicates
    • Won't be able to update both copies atomically, because only single-object transactions are available
  12. Indexing

    • Index where the speed benefit outweighs the space and update cost
    • You may index embedded fields, even inside arrays
    • Every element of the array gets indexed
    • Example
        Schema: Customer {id, address, cart: [ {productId, quantity} ]}
        Query: db.Customer.find( {"cart.productId": 1234} )
        Index: db.Customer.ensureIndex( {"cart.productId": 1} )
    • See next talk “Indexing and Query Optimization” for more
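
To illustrate what "every element of the array gets indexed" means, here is a conceptual model of an index on `cart.productId`. It is not how MongoDB stores indexes internally; it only shows one index entry per array element, using a plain `Map` and hypothetical sample customers and helper names.

```javascript
// Hypothetical sample customers with embedded cart arrays.
const customers = [
  { id: 1, cart: [{ productId: 1234, quantity: 2 }, { productId: 42, quantity: 1 }] },
  { id: 2, cart: [{ productId: 42, quantity: 5 }] },
  { id: 3, cart: [] },
];

// Conceptual model of an index on "cart.productId":
// one entry per array element, mapping productId -> ids of matching customers.
function buildCartProductIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const { productId } of doc.cart) {
      if (!index.has(productId)) index.set(productId, new Set());
      index.get(productId).add(doc.id);
    }
  }
  return index;
}

// Lookup goes through the index instead of scanning every cart.
function findByProduct(index, productId) {
  return [...(index.get(productId) || [])];
}

const cartIndex = buildCartProductIndex(customers);
console.log(findByProduct(cartIndex, 1234), findByProduct(cartIndex, 42));
```

The model also shows the cost side of the trade-off: a customer with a large cart contributes many index entries, and each cart update has to maintain them.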
  13. Dynamic schema (schemaless)

    • Easy schema evolution
    • Can add/remove fields on the fly
    • Mixed types in same collection, good for subtypes
    • E.g. Employee collection holding Hourly and Salary employees
        {name, address, department, salary, ...}
        {name, address, department, hourlyRate, ...}
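
Application code that reads a mixed collection has to tell the subtypes apart, typically by which fields are present. A minimal sketch, assuming hypothetical sample employees and a hypothetical `weeklyPay` helper (treating `salary` as yearly is also an assumption):

```javascript
// Hypothetical documents from one Employee collection with two subtypes.
const employees = [
  { name: "Ann", department: "Eng", salary: 104000 },
  { name: "Bob", department: "Ops", hourlyRate: 25 },
];

// Dispatch on the fields a document actually has.
function weeklyPay(employee, hoursWorked) {
  if ("salary" in employee) return employee.salary / 52; // assumes a yearly salary
  if ("hourlyRate" in employee) return employee.hourlyRate * hoursWorked;
  throw new Error("unknown employee subtype: " + employee.name);
}

const pay = employees.map((e) => weeklyPay(e, 40));
console.log(pay);
```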
  14. Single-object transactions

    • Embedding dependents likely makes basic transactions hit just one document (object)
    • If you still have transactions that span multiple documents then
        – Consider if you can live without the transaction semantics
        – Use compensating transactions over single documents
        – Implement application level transactions using single-object transactions as locking primitive
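
One way to picture a compensating transaction is a transfer between two documents where each step is an atomic single-document update and a failed second step is undone by reversing the first. The sketch below uses an in-memory `Map` as a stand-in for a collection; `accounts`, `adjustBalance`, and `transfer` are hypothetical names.

```javascript
// In-memory stand-in for an accounts collection (hypothetical data).
const accounts = new Map([
  ["alice", { _id: "alice", balance: 100 }],
  ["bob", { _id: "bob", balance: 20 }],
]);

// Stand-in for an atomic single-document update: applies the delta only if the
// document exists and the balance stays non-negative. Returns whether it applied.
function adjustBalance(id, delta) {
  const doc = accounts.get(id);
  if (!doc || doc.balance + delta < 0) return false;
  doc.balance += delta;
  return true;
}

function transfer(from, to, amount) {
  // Step 1: debit the source. If this fails there is nothing to undo.
  if (!adjustBalance(from, -amount)) return false;
  // Step 2: credit the target. If this fails, compensate by reversing step 1.
  if (!adjustBalance(to, amount)) {
    adjustBalance(from, amount);
    return false;
  }
  return true;
}

const ok = transfer("alice", "bob", 30); // both steps succeed
const failed = transfer("alice", "carol", 10); // credit fails, debit is reversed
console.log(ok, failed, accounts.get("alice").balance, accounts.get("bob").balance);
```

Unlike a real multi-document transaction, other readers could observe the state between the debit and its compensation, which is the trade-off behind the first option on the slide.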
  15. findAndModify

    • Combined query and update of a single object, atomically
    • Example: Priority queue
    • Schema: Queue {_id, priority, …}
    • Get and remove highest priority item on queue (atomically)
        x = db.Queue.findAndModify( {query: {}, sort: {priority: -1}, remove: true} )
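
The effect of that call can be modelled in memory as "find the highest-priority document, remove it, and return it in one step". The sketch below is only such a model: the array `queue`, the `task` field, and `popHighestPriority` are hypothetical, and single-threaded JavaScript makes the step trivially atomic here, whereas on the server the atomicity comes from `findAndModify` itself.

```javascript
// In-memory stand-in for the Queue collection (hypothetical data).
const queue = [
  { _id: 1, priority: 2, task: "resize images" },
  { _id: 2, priority: 9, task: "send alert" },
  { _id: 3, priority: 5, task: "rebuild cache" },
];

// Models {query: {}, sort: {priority: -1}, remove: true}:
// pick the document with the highest priority, remove it, and return it.
function popHighestPriority(docs) {
  if (docs.length === 0) return null; // nothing matched
  let best = 0;
  for (let i = 1; i < docs.length; i++) {
    if (docs[i].priority > docs[best].priority) best = i;
  }
  return docs.splice(best, 1)[0];
}

const first = popHighestPriority(queue);
const second = popHighestPriority(queue);
console.log(first.task, second.task, queue.length);
```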
  16. Compare & swap using single-object transaction

    • Update object as long as it hasn't changed (optimistic transaction)
        1. Get object
            x = db.X.findOne({_id: 1234})
        2. Edit object
        3. Save object as long as it hasn't been changed by someone else
            x.version++
            db.X.update({_id: 1234, version: x.version - 1}, x)
            r = db.getLastErrorObj()
            if (r.n == 0) throw tryAgain(x)
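
The slide leaves the retry itself to a `tryAgain` helper. A minimal sketch of the whole loop follows, assuming an in-memory `Map` in place of collection `X`; `findOne`, `updateIfVersion`, `casUpdate`, the `count` field, and the retry limit are hypothetical stand-ins, with `updateIfVersion` returning the number of documents updated, like `n` above.

```javascript
// In-memory stand-in for collection X (hypothetical data).
const store = new Map([[1234, { _id: 1234, version: 0, count: 0 }]]);

// Returns a copy, so edits stay local until they are saved.
function findOne(id) {
  const doc = store.get(id);
  return doc ? { ...doc } : null;
}

// Stand-in for update({_id, version: expected}, newDoc): replaces the stored
// document only if its version still matches. Returns the number updated (0 or 1).
function updateIfVersion(id, expectedVersion, newDoc) {
  const current = store.get(id);
  if (!current || current.version !== expectedVersion) return 0;
  store.set(id, { ...newDoc });
  return 1;
}

function casUpdate(id, edit, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const x = findOne(id); // 1. get object
    if (x === null) throw new Error("document not found");
    edit(x); // 2. edit object
    x.version++; // 3. save only if nobody else changed it
    if (updateIfVersion(id, x.version - 1, x) === 1) return x;
    // Someone else won the race: loop to re-read and re-apply the edit.
  }
  throw new Error("too many concurrent modifications");
}

// Usage: the first attempt is made to lose the race on purpose.
let interfered = false;
const result = casUpdate(1234, (doc) => {
  if (!interfered) {
    interfered = true;
    // Simulate another writer saving between our read and our write.
    store.set(1234, { _id: 1234, version: 1, count: 100 });
  }
  doc.count += 1;
});
console.log(result);
```

The edit function runs again on every retry against the freshly read document, so it should be safe to repeat.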