Schema Design Principles and Practice - Robert Stam, 10gen

MongoBoulder 2012: Schema Design, Robert Stam

Topics
Introduction
• Basic data modeling
• Evolving a schema
Common patterns
• Single table inheritance
• One-to-many
• Many-to-many
• Trees

Benefits of relational model
Before relational model
• Data and logic combined
After relational model
• Separation of concerns
• Data model independent of logic
• Logic freed from concerns of data design
MongoDB continues this separation

Normalization
Goals
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias toward a particular query
In MongoDB
• Similar goals apply
• But rules are different

Relational model makes normalized data look like

Document databases make normalized data look like

Terminology
Relational       MongoDB
Table            Collection
Row(s)           Documents
Index            Index
Join             Embedding and linking
Partition        Shard
Partition key    Shard key

Collections
• Cheap to create (max 24000)
• Collections don’t have a schema
• Individual documents have a schema
• Common for documents in a collection to share a schema
• Document schema can evolve
• Consider using multiple related collections tied together by a naming convention:
  • e.g. LogData-2011-02-08
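The naming convention above can be sketched in plain JavaScript. This is only an illustration: the function name logCollectionName and the choice of UTC dates are assumptions, and only the LogData prefix and the date format come from the slide.

```javascript
// Hypothetical helper: builds a per-day collection name such as "LogData-2011-02-08".
// The "LogData" prefix comes from the slide; the helper name and UTC choice are assumptions.
function logCollectionName(date, prefix = "LogData") {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${prefix}-${yyyy}-${mm}-${dd}`;
}

const collectionName = logCollectionName(new Date(Date.UTC(2011, 1, 8)));
console.log(collectionName);
```

In the shell, a name built this way could be passed to db.getCollection(...) to reach that day's collection.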

Document basics
• Zero or more elements
• Elements are name/value pairs
• Rich data types for values
• JSON (extended for new types)
• BSON

Data types
• Numeric (Int32, Int64, Double)
• String
• Boolean
• DateTime (ms precision, always in UTC)
• ObjectId
• Others (JavaScript, Regex, Binary, Null, ...)
• Array
• Nested document

Sample rich document
> db.orders.findOne()
{
  _id : 1,
  customer : {
    customer_id : 1234,
    name : "John Doe",
    address : {
      line1 : "123 Main St",
      city : "Duncannon",
      state : "PA",
      zip : "12345-6789"
    }
  },
  items : [
    { item_id : 111, ... }, // data for first item
    { item_id : 222, ... }, // data for next item
    ...
  ]
}

Rich document advantages
• Holistic representation
• Still easy to manipulate
• Pre-joined for fast retrieval

Document size
• Max 4MB in earlier MongoDB versions
• Max 16MB in current versions
• Performance considerations arise long before reaching the maximum size

Document design
• Design documents that map simply to your application data
> book = {
    _id : ObjectId("12345678901234567890abcd"),
    author : "Ernest Hemingway",
    title : "The Old Man and the Sea",
    tags : ["American Literature", "Sea", "Large Fish"]
  }
Notes:
• Every document must have a unique _id
• MongoDB will generate one automatically if your document does not have an _id

Indexes
• Speed up queries (dramatically)
• Aren’t free
• Go to Indexing talk for more details

Extending the schema
> comment = {
    author : "Robert",
    text : "Great book",
    date : new Date()
  }
> db.books.update(
    { title : "The Old Man and the Sea" },
    {
      $inc : { comments_count : 1 },
      $push : { comments : comment }
    }
  )
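To make the effect of the update concrete, here is a minimal in-memory sketch of what $inc and $push do to a single document. It is plain JavaScript with no database connection; applyUpdate is a hypothetical helper written for this illustration, not a MongoDB API, and it covers only these two operators.

```javascript
// In-memory sketch of { $inc, $push } applied to one document.
// applyUpdate is a hypothetical helper, not part of MongoDB.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // $inc creates the field when it is missing, so start from 0
    result[field] = (result[field] || 0) + amount;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    // $push creates the array when it is missing
    result[field] = [...(result[field] || []), value];
  }
  return result;
}

const book = { author: "Ernest Hemingway", title: "The Old Man and the Sea" };
const comment = { author: "Robert", text: "Great book", date: new Date() };
const updated = applyUpdate(book, {
  $inc: { comments_count: 1 },
  $push: { comments: comment },
});
console.log(updated.comments_count, updated.comments.length);
```

The point of the sketch is that neither comments_count nor comments has to exist beforehand: the first comment extends the schema of that one document.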

Extended schema
> db.books.find({ title : "The Old Man and the Sea" })
{
  _id : ObjectId("12345678901234567890abcd"),
  author : "Ernest Hemingway",
  title : "The Old Man and the Sea",
  tags : ["American Literature", "Sea", "Large Fish"],
  comments_count : 1,
  comments : [
    {
      author : "Robert",
      text : "Great book",
      date : ISODate("2012-01-31T22:29:14.000Z")
    }
  ]
}

Single table inheritance
Shapes table:
id  type    area  radius  side  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10                  5       2

Single table inheritance: MongoDB
> db.shapes.find()
{ _id : 1, type : "circle", area : 3.14, radius : 1 }
{ _id : 2, type : "square", area : 4, side : 2 }
{ _id : 3, type : "rect", area : 10, length : 5, width : 2 }
// find shapes where radius > 0
> db.shapes.find({ radius : { $gt : 0 } })
{ _id : 1, type : "circle", area : 3.14, radius : 1 }
// find shapes where area >= 4
> db.shapes.find({ area : { $gte : 4 } })
{ _id : 2, type : "square", area : 4, side : 2 }
{ _id : 3, type : "rect", area : 10, length : 5, width : 2 }
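The two queries on this slide can be mimicked in memory to show why documents of different shapes coexist comfortably in one collection. This is a plain JavaScript sketch, with an array standing in for the shapes collection and Array.prototype.filter standing in for find.

```javascript
// The array stands in for the shapes collection from the slide.
const shapes = [
  { _id: 1, type: "circle", area: 3.14, radius: 1 },
  { _id: 2, type: "square", area: 4, side: 2 },
  { _id: 3, type: "rect", area: 10, length: 5, width: 2 },
];

// Counterpart of { radius : { $gt : 0 } }: documents without a radius simply do not match.
const withRadius = shapes.filter((s) => typeof s.radius === "number" && s.radius > 0);

// Counterpart of { area : { $gte : 4 } }: area is a field every shape type shares.
const largeShapes = shapes.filter((s) => s.area >= 4);

console.log(withRadius.map((s) => s._id));
console.log(largeShapes.map((s) => s._id));
```

Unlike the relational table, no document carries empty columns for fields that belong to another shape type.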

One-to-many
Options
• Embedded Array
• Embedded Document
• Normalized

One-to-many: embedded array
> db.books.find()
{
  author : "Ernest Hemingway",
  title : "The Old Man and the Sea",
  comments : [
    { author : "Robert", text : "Great book" },
    { author : "Jim", text : "I didn't like it" }
  ]
}

One-to-many: embedded trees
> db.books.find()
{
  author : "Ernest Hemingway",
  title : "The Old Man and the Sea",
  comments : [
    {
      author : "Robert",
      text : "Great book",
      replies : [
        { author : "Jim", text : "I didn't like it" }
      ]
    }
  ]
}

One-to-many: normalized
> db.books.find()
{
  _id : 1,
  author : "Ernest Hemingway",
  title : "The Old Man and the Sea",
  comment_ids : [1, 2] // probably redundant
}
> db.comments.find()
{ _id : 1, book_id : 1, author : "Robert", text : "Great book" }
{ _id : 2, book_id : 1, author : "Jim", text : "I didn't like it" }
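With the normalized layout, the application assembles a book and its comments itself. The following plain JavaScript sketch shows that two-step read in memory; the arrays stand in for the books and comments collections, and bookWithComments is a hypothetical helper, not something from the talk.

```javascript
// Arrays stand in for the two collections on the slide.
const books = [
  { _id: 1, author: "Ernest Hemingway", title: "The Old Man and the Sea" },
];
const comments = [
  { _id: 1, book_id: 1, author: "Robert", text: "Great book" },
  { _id: 2, book_id: 1, author: "Jim", text: "I didn't like it" },
];

// Hypothetical helper: two lookups do the work a relational join would do.
function bookWithComments(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return null;
  // The book_id on each comment is enough to find them, which is why
  // a comment_ids array on the book is marked "probably redundant".
  return { ...book, comments: comments.filter((c) => c.book_id === bookId) };
}

const result = bookWithComments(1);
console.log(result.comments.map((c) => c.author));
```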

Many-to-many
Example:
• Product can be in many categories
• Category has many products

Many-to-many: products and categories
> db.products.find()
{ _id : 1, name : "Baseball bat", category_ids : [1, 2] }
> db.categories.find()
{ _id : 1, name : "Sports Equipment", product_ids : [1, ...] }
{ _id : 2, name : "Baseball", product_ids : [1, ...] }

Many-to-many: queries
// all products for a given category
> db.products.find({ category_ids : 1 })
// all categories for a given product
> db.categories.find({ product_ids : 1 })

Many-to-many: products and categories (normalized)
> db.products.find()
{ _id : 1, name : "Baseball bat", category_ids : [1, 2] }
> db.categories.find()
{ _id : 1, name : "Sports Equipment" }
{ _id : 2, name : "Baseball" }

Many-to-many: queries (normalized)
// all products for a given category
> db.products.find({ category_ids : 1 })
// all categories for a given product
> product = db.products.findOne({ _id : 1 })
> db.categories.find({ _id : { $in : product.category_ids } })
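The asymmetry between the two normalized queries (one lookup in one direction, two in the other) can be sketched in memory. This is plain JavaScript with arrays standing in for collections; the helper names and the second product row are hypothetical additions for illustration.

```javascript
const products = [
  { _id: 1, name: "Baseball bat", category_ids: [1, 2] },
  { _id: 2, name: "Tennis racket", category_ids: [1] }, // hypothetical extra row
];
const categories = [
  { _id: 1, name: "Sports Equipment" },
  { _id: 2, name: "Baseball" },
];

// One lookup: category_ids lives on the product documents.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two lookups: read the product first, then fetch its categories (the $in step).
function categoriesForProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(2).map((p) => p.name));
console.log(categoriesForProduct(1).map((c) => c.name));
```

The trade-off is that the relationship is stored only once, so adding a product to a category is a single write, at the cost of the extra read in one direction.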

Trees
Options:
• Full tree in document
• Parent links
• Child links
• Parent and child links
• Array of ancestors
• Ancestor paths

Trees: full tree in document
{
  comments : [
    {
      author : "Robert",
      text : "...",
      replies : [
        { author : "Jim", text : "...", replies : [] }
      ]
    }
  ]
}
Pros: single document, performance, intuitive
Cons: hard to search, hard to get partial results, document size limit could be reached

Trees: parent and child links
Parent links
• Each node is stored as a document
• Contains the id of the parent
Child links
• Each node is stored as a document
• Contains the ids of the children
In some cases you might do both
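A minimal in-memory sketch of the parent-links option, in plain JavaScript. The node ids reuse the six-node tree from the array-of-ancestors slide; the helper names childrenOf and ancestorsOf are hypothetical.

```javascript
// Parent links: each node stores only its parent's id (the root has none).
const nodes = [
  { _id: 1 },
  { _id: 2, parent: 1 },
  { _id: 3, parent: 2 },
  { _id: 4, parent: 2 },
  { _id: 5, parent: 1 },
  { _id: 6, parent: 5 },
];
const byId = new Map(nodes.map((n) => [n._id, n]));

// Children take a single lookup on the parent field.
function childrenOf(id) {
  return nodes.filter((n) => n.parent === id).map((n) => n._id);
}

// Ancestors take one lookup per level of the tree.
function ancestorsOf(id) {
  const result = [];
  let current = byId.get(id);
  while (current && current.parent !== undefined) {
    result.push(current.parent);
    current = byId.get(current.parent);
  }
  return result; // nearest ancestor first
}

console.log(childrenOf(2));
console.log(ancestorsOf(6));
```

The per-level walk in ancestorsOf is the cost that the array-of-ancestors and ancestor-path options try to avoid.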

Trees: array of ancestors
> db.nodes.find()
{ _id : 1 }
{ _id : 2, ancestors : [1], parent : 1 }
{ _id : 3, ancestors : [1, 2], parent : 2 }
{ _id : 4, ancestors : [1, 2], parent : 2 }
{ _id : 5, ancestors : [1], parent : 1 }
{ _id : 6, ancestors : [1, 5], parent : 5 }

Trees: array of ancestors (queries)
// find all children of 2
> db.nodes.find({ parent : 2 })
// find all descendants of 2
> db.nodes.find({ ancestors : 2 })
// find all ancestors of 6
> node = db.nodes.findOne({ _id : 6 })
> db.nodes.find({ _id : { $in : node.ancestors } })
// find all siblings of 3
> node = db.nodes.findOne({ _id : 3 })
> db.nodes.find({ parent : node.parent, _id : { $ne : 3 } })
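These queries stay simple because each node carries its full ancestor list, which has to be computed when the node is created. A possible way to derive it from the parent document is sketched below in plain JavaScript; makeChild is a hypothetical helper, not something shown in the talk.

```javascript
// Hypothetical helper: a child's ancestors are its parent's ancestors plus the parent itself.
function makeChild(id, parent) {
  return {
    _id: id,
    ancestors: [...(parent.ancestors || []), parent._id],
    parent: parent._id,
  };
}

const root = { _id: 1 };
const node2 = makeChild(2, root);
const node3 = makeChild(3, node2);
const node5 = makeChild(5, root);
const node6 = makeChild(6, node5);

console.log(node3.ancestors, node6.ancestors);
```

The convenience on reads is paid for on writes: moving a subtree means rewriting the ancestors array of every node in it.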

Trees: ancestor paths
• Store hierarchy as a path expression
• Separate each node by a delimiter (avoid "/" and ".")
• Use regular expressions to find parts of a tree
> db.nodes.find()
{ _id : 1, path : ",1," }
{ _id : 2, path : ",1,2," }
{ _id : 3, path : ",1,2,3," }
{ _id : 4, path : ",1,2,4," }
{ _id : 5, path : ",1,5," }
{ _id : 6, path : ",1,5,6," }
Variations:
• Don't store leading or trailing delimiter
• Don't store final id (it's the same as _id)

Trees: ancestor paths (queries)
// find all descendants of 2
> db.nodes.find({ path : /,2,/ })
// find all children of 2
> db.nodes.find({ path : /,2,[^,]+,$/ })
or
> db.nodes.find({ path : /,2,$/ }) // if _id is not on path
// find all ancestors of 6
// not so easy
// find all siblings of 3
// not so easy
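The regular expressions can be checked against the sample paths in memory, and the two cases marked "not so easy" can be handled on the application side by parsing the path string. The sketch below is plain JavaScript under those assumptions; ancestorIds and siblingIds are hypothetical helpers, and it assumes the variant where the path includes the node's own id.

```javascript
// Sample nodes from the slide: each path lists the ids from the root down to the node itself.
const nodes = [
  { _id: 1, path: ",1," },
  { _id: 2, path: ",1,2," },
  { _id: 3, path: ",1,2,3," },
  { _id: 4, path: ",1,2,4," },
  { _id: 5, path: ",1,5," },
  { _id: 6, path: ",1,5,6," },
];
const idsMatching = (regex) => nodes.filter((n) => regex.test(n.path)).map((n) => n._id);

// Because the path ends with the node's own id, /,2,/ matches node 2 as well as its descendants.
const subtreeOf2 = idsMatching(/,2,/);
const childrenOf2 = idsMatching(/,2,[^,]+,$/);

// Hypothetical helper: ancestors are every id on the path except the last one.
function ancestorIds(node) {
  return node.path.split(",").filter(Boolean).map(Number).slice(0, -1);
}

// Hypothetical helper: siblings sit exactly one level below the same parent prefix.
function siblingIds(node) {
  const parentPath = node.path.slice(0, node.path.length - `${node._id},`.length);
  return nodes
    .filter((n) => n._id !== node._id && n.path.startsWith(parentPath))
    .filter((n) => n.path.slice(parentPath.length).split(",").filter(Boolean).length === 1)
    .map((n) => n._id);
}

console.log(subtreeOf2, childrenOf2);
console.log(ancestorIds(nodes[5]), siblingIds(nodes[2]));
```

Once ancestorIds has produced the ids, the ancestor documents themselves could be fetched with an $in query on _id, as in the array-of-ancestors approach.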

Summary
• Schema design is different in MongoDB
• Basic principles stay the same
• Use rich documents
• There's more than one right way
• Focus on how your application uses the data
• Rapidly evolve the schema to meet your requirements

Thank you
Learn more
• www.mongodb.org
• www.10gen.com/events
• www.10gen.com/webinars

© Copyright 2010 10gen Inc.
