Topics
Introduction
• Working with documents
• Evolving a schema
• Queries and indexes
• Rich documents
Slide 4
Slide 4 text
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues
Slide 5
Slide 5 text
Ways to model data:
http://www.flickr.com/photos/42304632@N00/493639870/
Slide 6
Slide 6 text
Normalized
Slide 7
Slide 7 text
Denormalized
Slide 8
Slide 8 text
Terminology
RDBMS     MongoDB
Table     Collection
Row(s)    JSON Document
Index     Index
Join      Embedding & Linking
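To make the last row concrete, here is a minimal in-memory sketch of embedding versus linking. Plain JavaScript objects stand in for stored documents; the `comment_ids` field, the separate `comments` array, and the `resolveComments` helper are assumptions made for illustration, not part of the original deck or of any MongoDB API.

```javascript
// In-memory illustration only; plain objects stand in for stored documents.

// Embedding: the comments live inside the book document.
const embeddedBook = {
  _id: 1,
  text: "Destination Moon",
  comments: [{ author: "Kyle", text: "..." }],
};

// Linking: the book stores the ids of related documents that live in a
// separate (hypothetical) comments collection.
const linkedBook = { _id: 2, text: "Destination Moon", comment_ids: [10, 11] };
const comments = [
  { _id: 10, author: "Kyle", text: "..." },
  { _id: 11, author: "Fred", text: "..." },
];

// With links there is no server-side join: the application issues a second
// lookup, modeled here by filtering the array.
function resolveComments(book, allComments) {
  return allComments.filter((c) => book.comment_ids.includes(c._id));
}

const embeddedAuthors = embeddedBook.comments.map((c) => c.author);
const linkedAuthors = resolveComments(linkedBook, comments).map((c) => c.author);
console.log(embeddedAuthors, linkedAuthors);
```

Embedding answers the question with one read; linking costs a second lookup but lets the related documents be shared and queried on their own.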
Slide 9
Slide 9 text
Schema-design Criteria
How can we manipulate
this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
Slide 10
Slide 10 text
Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle
Slide 11
Slide 11 text
Considerations
• No Joins
• Document writes are atomic
Slide 12
Slide 12 text
Here’s an example with a book:
Slide 13
Slide 13 text
A simple start:
Map the documents to your application.
> book = {author: "Hergé",
          date: new Date(),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}
> db.books.save(book)
Slide 14
Slide 14 text
> db.books.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text: "Destination Moon",
  tags: ["comic", "adventure"] }
Notes:
• _id must be unique, but can be anything you’d like
• Default BSON ObjectId if one is not supplied
Find the document
Slide 15
Slide 15 text
Secondary index on “author”
> db.books.ensureIndex({author: 1})
> db.books.find({author: 'Hergé'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  author: "Hergé",
  ... }
Add an index, find via index
Multi-key indexes
// Build an index on the 'tags' array
> db.books.ensureIndex({tags: 1})
Slide 18
Slide 18 text
// find books with a specific tag
// (This will use an index!)
> db.books.find({tags: 'comic'})
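Why a query on an array field can use an index: a multi-key index holds one entry per array element. The sketch below is a simplified in-memory model of that idea, assuming a `Map` from value to document ids; the real index is a server-side structure, and `buildMultiKeyIndex` and the second book are hypothetical.

```javascript
// Simplified model of a multi-key index: one index entry per array element.
const taggedBooks = [
  { _id: 1, text: "Destination Moon", tags: ["comic", "adventure"] },
  { _id: 2, text: "Hypothetical Title", tags: ["comic"] },
];

function buildMultiKeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    // Documents without the field contribute no entries in this sketch.
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildMultiKeyIndex(taggedBooks, "tags");
const comicIds = tagIndex.get("comic");
const adventureIds = tagIndex.get("adventure");
console.log(comicIds, adventureIds);
```

Looking up a tag is then a single keyed access instead of a scan over every document's array.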
// create index on nested documents:
> db.books.ensureIndex({"comments.author": 1})
> db.books.find({"comments.author": "Kyle"})
The ‘dot’ operator
Slide 23
Slide 23 text
// create index on comment votes:
> db.books.ensureIndex({"comments.votes": 1})
// find all books with any comments with more than
// 50 votes
> db.books.find({"comments.votes": {$gt: 50}})
The ‘dot’ operator
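A rough in-memory sketch of what a dotted path such as "comments.votes" means when `comments` is an array: the path fans out over every embedded document, and the book matches if any reached value satisfies the condition. The helpers `valuesAt` and `findWhere` and the sample data are assumptions for illustration; the sketch covers only this case and is not MongoDB's actual matching code.

```javascript
const commentedBooks = [
  { _id: 1, comments: [{ author: "Kyle", votes: 60 }, { author: "Fred", votes: 3 }] },
  { _id: 2, comments: [{ author: "Fred", votes: 10 }] },
  { _id: 3 }, // no comments at all
];

// Collect every value reachable through a dotted path, fanning out over arrays.
function valuesAt(value, path) {
  if (path.length === 0) return [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAt(item, path));
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = path;
  return head in value ? valuesAt(value[head], rest) : [];
}

// A document matches when any value at the path satisfies the predicate.
function findWhere(docs, dottedPath, predicate) {
  const path = dottedPath.split(".");
  return docs.filter((doc) => valuesAt(doc, path).some(predicate));
}

const popularIds = findWhere(commentedBooks, "comments.votes", (v) => v > 50).map((d) => d._id);
const kyleIds = findWhere(commentedBooks, "comments.author", (v) => v === "Kyle").map((d) => d._id);
console.log(popularIds, kyleIds);
```

Note the "any" semantics: book 1 matches the votes condition because one of its comments has more than 50 votes, even though another has only 3.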
One to Many
- Embedded Array / Array Keys
  - $slice operator to return a subset of the array
  - some queries are hard,
    e.g. find the latest comments across all documents
Slide 36
Slide 36 text
- Embedded tree
  - Single document
  - Natural
  - Hard to query
Slide 37
Slide 37 text
- Normalized (2 collections)
  - most flexible
  - more queries
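The embedded-array trade-off can be sketched in memory. Taking the last few comments of one document is a plain array slice, roughly what a `$slice` projection with a negative count returns; "latest comments across all documents" needs every embedded array flattened and sorted first. The `at` timestamp field, the author "Ann", and both helper functions are hypothetical.

```javascript
const posts = [
  { _id: 1, comments: [{ author: "Kyle", at: 1 }, { author: "Fred", at: 4 }] },
  { _id: 2, comments: [{ author: "Ann", at: 3 }, { author: "Kyle", at: 5 }] },
];

// Within one document: the last n comments are a simple slice of the array.
function lastComments(doc, n) {
  return doc.comments.slice(-n);
}

// Across documents: flatten every embedded array, then sort newest first.
function latestAcrossAll(docs, n) {
  return docs
    .flatMap((doc) => doc.comments.map((c) => ({ ...c, post_id: doc._id })))
    .sort((a, b) => b.at - a.at)
    .slice(0, n);
}

const lastOfFirstPost = lastComments(posts[0], 1);
const latestTwo = latestAcrossAll(posts, 2);
console.log(lastOfFirstPost, latestTwo);
```

With a normalized comments collection the second question becomes a single sorted query, which is the flexibility the last bullet refers to.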
Slide 38
Slide 38 text
One to Many - patterns
- Embedded Array / Array Keys
- Embedded tree
- Normalized
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Destination Moon",
  category_ids: [ObjectId("4c4ca25433fb5941681b912f"),
                 ObjectId("4c4ca25433fb5941681b92af")] }
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "adventure" }
// All products for a given category
> db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
Alternative
Slide 48
Slide 48 text
// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
Alternative
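The two lookups for this many-to-many layout can be modeled in memory as follows. Short string ids replace the ObjectIds, and the second product, the "comic" category, and both helper functions are assumptions for illustration.

```javascript
const products = [
  { _id: "p1", name: "Destination Moon", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "Hypothetical Product", category_ids: ["c2"] },
];
const categories = [
  { _id: "c1", name: "adventure" },
  { _id: "c2", name: "comic" },
];

// All products for a given category: match any element of category_ids.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: fetch the product first, then do an
// $in-style lookup against the categories.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

const comicProducts = productsInCategory("c2").map((p) => p.name);
const p2Categories = categoriesOfProduct("p2").map((c) => c.name);
console.log(comicProducts, p2Categories);
```

Only one side stores the links, so one direction is a single query and the other takes two steps.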
Slide 49
Slide 49 text
Trees
Full Tree in Document
{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "Fred", text: "...", replies: [] }
      ] }
  ] }
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit
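One way to see the "hard to search" point: finding a comment at an arbitrary depth means loading the document and walking the tree in application code. A minimal sketch, assuming the document shape above; `findByAuthor` is a hypothetical helper.

```javascript
const post = {
  comments: [
    {
      author: "Kyle",
      text: "...",
      replies: [{ author: "Fred", text: "...", replies: [] }],
    },
  ],
};

// Depth-first walk collecting every comment by the given author,
// at any nesting level.
function findByAuthor(comments, author, found = []) {
  for (const comment of comments) {
    if (comment.author === author) found.push(comment);
    findByAuthor(comment.replies, author, found);
  }
  return found;
}

const fredComments = findByAuthor(post.comments, "Fred");
console.log(fredComments.length);
```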
Slide 50
Slide 50 text
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of its children
- Can support graphs (multiple parents / children)
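A small in-memory sketch of the parent-link layout, assuming a `Map` keyed by `_id` in place of a collection. Direct children are one filter on `parent`, while collecting all ancestors takes one lookup per level. The node ids and both helpers are hypothetical.

```javascript
// Each node is its own document and stores only the id of its parent.
const nodes = new Map([
  ["a", { _id: "a", parent: null }],
  ["b", { _id: "b", parent: "a" }],
  ["c", { _id: "c", parent: "b" }],
]);

// Direct children: a single equality match on the parent field.
function childrenOf(id) {
  return [...nodes.values()].filter((n) => n.parent === id).map((n) => n._id);
}

// All ancestors: follow the parent link upward, one lookup per level.
function ancestorsOf(id) {
  const result = [];
  let current = nodes.get(id);
  while (current && current.parent !== null) {
    result.push(current.parent);
    current = nodes.get(current.parent);
  }
  return result;
}

const childrenOfA = childrenOf("a");
const ancestorsOfC = ancestorsOf("c");
console.log(childrenOfA, ancestorsOfC);
```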
Slide 51
Slide 51 text
Array of Ancestors
- Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: ["a"], parent: "a" }
{ _id: "c", ancestors: ["a", "b"], parent: "b" }
{ _id: "d", ancestors: ["a", "b"], parent: "b" }
{ _id: "e", ancestors: ["a"], parent: "a" }
{ _id: "f", ancestors: ["a", "e"], parent: "e" }
Slide 52
Slide 52 text
// find all descendants of b:
> db.tree2.find({ancestors: 'b'})
// find all direct descendants of b:
> db.tree2.find({parent: 'b'})
Slide 53
Slide 53 text
// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: 'f'}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
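The same six documents can be exercised in memory to check the three queries, and to show how the `ancestors` array might be maintained on insert (the parent's ancestors plus the parent itself). This is a sketch with an array standing in for the collection; the helper names and the extra node "g" are assumptions.

```javascript
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

const idsOf = (docs) => docs.map((doc) => doc._id);

// Mirrors find({ancestors: id}): any element of the array may match.
function descendantsOf(id) {
  return idsOf(tree.filter((node) => (node.ancestors || []).includes(id)));
}

// Mirrors find({parent: id}).
function childrenOf(id) {
  return idsOf(tree.filter((node) => node.parent === id));
}

// Mirrors findOne({_id: id}).ancestors followed by find({_id: {$in: ...}}).
function ancestorsOf(id) {
  const node = tree.find((candidate) => candidate._id === id);
  const wanted = (node && node.ancestors) || [];
  return idsOf(tree.filter((candidate) => wanted.includes(candidate._id)));
}

// Hypothetical insert helper: a new node inherits its parent's ancestors
// and appends the parent itself.
function addChild(id, parentId) {
  const parent = tree.find((node) => node._id === parentId);
  if (!parent) throw new Error("unknown parent: " + parentId);
  const node = {
    _id: id,
    ancestors: [...(parent.ancestors || []), parentId],
    parent: parentId,
  };
  tree.push(node);
  return node;
}

const descendantsOfB = descendantsOf("b");
const childrenOfB = childrenOf("b");
const ancestorsOfF = ancestorsOf("f");
const g = addChild("g", "c");
console.log(descendantsOfB, childrenOfB, ancestorsOfF, g.ancestors);
```

The cost of the pattern shows up on writes: moving a subtree means rewriting the `ancestors` array of every node below it.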
Slide 54
Slide 54 text
Queue
Requirements
• See jobs waiting, jobs in progress
• Ensure that each job is started once and only once
{ inprogress: false,
  priority: 1,
  message: "Rich documents FTW!"
  ... }
Slide 55
Slide 55 text
// find highest priority job and mark as in-progress
> job = db.jobs.findAndModify({
    query: {inprogress: false},
    sort: {priority: -1},
    update: {$set: {inprogress: true, started: new Date()}}})
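The "once and only once" requirement depends on the select and the update happening as one step. The sketch below models that claim step in memory, assuming single-threaded JavaScript as a stand-in for the server's atomicity; `claimNextJob` and the sample jobs are hypothetical. Unlike `findAndModify`, which by default returns the document as it was before the update, this helper returns the job after marking it.

```javascript
const jobs = [
  { _id: 1, inprogress: false, priority: 1, message: "low" },
  { _id: 2, inprogress: false, priority: 5, message: "high" },
  { _id: 3, inprogress: true, priority: 9, message: "already running" },
];

// Select the highest-priority waiting job and mark it in one step.
function claimNextJob(queue, now = new Date()) {
  const waiting = queue
    .filter((job) => !job.inprogress) // query: {inprogress: false}
    .sort((a, b) => b.priority - a.priority); // sort: {priority: -1}
  if (waiting.length === 0) return null;
  const job = waiting[0];
  job.inprogress = true; // update: {$set: {inprogress: true, ...}}
  job.started = now;
  return job;
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);
console.log(first._id, second._id, third);
```

Because a claimed job no longer matches the query, a second worker calling the same function can never receive it.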
Slide 56
Slide 56 text
Summary
Schema design is different in MongoDB
Basic data design principles stay the same
Focus on how the application manipulates data
Rapidly evolve schema to meet your requirements
Enjoy your new freedom, use it wisely :-)