Slide 1

Slide 1 text

Entity Relationships in a Document Database MapReduce Views for SQL Users

Slide 2

Slide 2 text

Entity: An object de ned by its identity and a thread of continuity[1] 1. "Entity" Domain-driven Design Community .

Slide 3

Slide 3 text

Entity Relationship Model

Slide 4

Slide 4 text

Join vs. Collation

Slide 5

Slide 5 text

SELECT `publisher`.`id`, `publisher`.`name`, `book`.`title` FROM `publisher` FULL OUTER JOIN `book` ON `publisher`.`id` = `book`.`publisher_id` ORDER BY `publisher`.`id`, `book`.`title`; SQL Query Joining Publishers and Books

Slide 6

Slide 6 text

Joined Result Set publisher.id publisher.name book.title oreilly O'Reilly Media Building iPhone Apps with HTML, CSS, and JavaScript oreilly O'Reilly Media CouchDB: The Definitive Guide oreilly O'Reilly Media DocBook: The Definitive Guide oreilly O'Reilly Media RESTful Web Services

Slide 7

Slide 7 text

Joined Result Set publisher.id publisher.name book.title oreilly O'Reilly Media Building iPhone Apps with HTML, CSS, and JavaScript oreilly O'Reilly Media CouchDB: The Definitive Guide oreilly O'Reilly Media DocBook: The Definitive Guide oreilly O'Reilly Media RESTful Web Services Publisher (“left”)

Slide 8

Slide 8 text

Joined Result Set publisher.id publisher.name book.title oreilly O'Reilly Media Building iPhone Apps with HTML, CSS, and JavaScript oreilly O'Reilly Media CouchDB: The Definitive Guide oreilly O'Reilly Media DocBook: The Definitive Guide oreilly O'Reilly Media RESTful Web Services Publisher (“left”) Book “right”

Slide 9

Slide 9 text

Collated Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "oreilly" "CouchDB: The Definitive Guide" ["oreilly",1] "oreilly" "DocBook: The Definitive Guide" ["oreilly",1] "oreilly" "RESTful Web Services"

Slide 10

Slide 10 text

Collated Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "oreilly" "CouchDB: The Definitive Guide" ["oreilly",1] "oreilly" "DocBook: The Definitive Guide" ["oreilly",1] "oreilly" "RESTful Web Services" Publisher

Slide 11

Slide 11 text

Collated Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "oreilly" "CouchDB: The Definitive Guide" ["oreilly",1] "oreilly" "DocBook: The Definitive Guide" ["oreilly",1] "oreilly" "RESTful Web Services" Publisher Books

Slide 12

Slide 12 text

View Result Sets Made up of columns and rows Every row has the same three columns: • key • id • value Columns can contain a mixture of logical data types

Slide 13

Slide 13 text

One to Many Relationships

Slide 14

Slide 14 text

Embedded Entities: Nest related entities within a document

Slide 15

Slide 15 text

A single document represents the “one” entity Nested entities (JSON Array) represents the “many” entities Simplest way to create a one to many relationship Embedded Entities

Slide 16

Slide 16 text

Example: Publisher with Nested Books { "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media", "books":[ { "title":"CouchDB: The Definitive Guide" }, { "title":"RESTful Web Services" }, { "title":"DocBook: The Definitive Guide" }, { "title":"Building iPhone Apps with HTML, CSS, and JavaScript" } ] }

Slide 17

Slide 17 text

function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name); for (var i in doc.books) { emit([doc._id, 1], doc.books[i].title); } } } Map Function

Slide 18

Slide 18 text

Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "oreilly" "CouchDB: The Definitive Guide" ["oreilly",1] "oreilly" "DocBook: The Definitive Guide" ["oreilly",1] "oreilly" "RESTful Web Services"

Slide 19

Slide 19 text

Only works if there aren’t a large number of related entities: • Too many nested entities can result in very large documents • Slow to transfer between client and server • Unwieldy to modify • Time-consuming to index Limitations

Slide 20

Slide 20 text

Related Documents: Reference an entity by its identi er

Slide 21

Slide 21 text

A document representing the “one” entity Separate documents for each “many” entity Each “many” entity references its related “one” entity by the “one” entity’s document identi er Makes for smaller documents Reduces the probability of document update con icts Related Documents

Slide 22

Slide 22 text

Example: Publisher { "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media" }

Slide 23

Slide 23 text

Example: Related Book { "_id":"9780596155896", "collection":"book", "title":"CouchDB: The Definitive Guide", "publisher":"oreilly" }

Slide 24

Slide 24 text

Map Function function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book" == doc.collection) { emit([doc.publisher, 1], doc.title); } }

Slide 25

Slide 25 text

Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1] "9780596155896" "CouchDB: The Definitive Guide" ["oreilly",1] "9780596529260" "RESTful Web Services" ["oreilly",1] "9780596805791" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "9781565925809" "DocBook: The Definitive Guide"

Slide 26

Slide 26 text

When retrieving the entity on the “right” side of the relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query Only works for one to many relationships Limitations

Slide 27

Slide 27 text

Many to Many Relationships

Slide 28

Slide 28 text

List of Keys: Reference entities by their identi ers

Slide 29

Slide 29 text

A document representing each “many” entity on the “left” side of the relationship Separate documents for each “many” entity on the “right” side of the relationship Each “many” entity on the “right” side of the relationship maintains a list of document identi ers for its related “many” entities on the “left” side of the relationship List of Keys

Slide 30

Slide 30 text

Books and Related Authors

Slide 31

Slide 31 text

Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide" }

Slide 32

Slide 32 text

Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work" }

Slide 33

Slide 33 text

Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide" }

Slide 34

Slide 34 text

Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner", "books":[ "9781565925809" ] }

Slide 35

Slide 35 text

Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh", "books":[ "9780596805029", "9781565925809", "9781565920514" ] }

Slide 36

Slide 36 text

Map Function function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title); } if ("author" == doc.collection) { for (var i in doc.books) { emit([doc.books[i], 1], doc.name); } } }

Slide 37

Slide 37 text

Result Set key id value ["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide" ["9780596805029",1] "walsh" "Norman Walsh" ["9781565920514",0] "9781565920514" "Making TeX Work" ["9781565920514",1] "walsh" "Norman Walsh" ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide" ["9781565925809",1] "muellner" "Leonard Muellner" ["9781565925809",1] "walsh" "Norman Walsh"

Slide 38

Slide 38 text

Authors and Related Books

Slide 39

Slide 39 text

function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); for (var i in doc.books) { emit([doc._id, 1], {"_id":doc.books[i]}); } } } Map Function

Slide 40

Slide 40 text

Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1] "muellner" {"_id":"9781565925809"} ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "walsh" {"_id":"9780596805029"} ["walsh",1] "walsh" {"_id":"9781565920514"} ["walsh",1] "walsh" {"_id":"9781565925809"}

Slide 41

Slide 41 text

Including Docs include_docs=true key id value doc (truncated) ["muellner",0] "muellner" … {"name":"Leonard Muellner"} ["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"} ["walsh",0] "walsh" … {"name":"Norman Walsh"} ["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"} ["walsh",1] "walsh" … {"title":"Making TeX Work"} ["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}

Slide 42

Slide 42 text

Or, we can reverse the references…

Slide 43

Slide 43 text

Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner" }

Slide 44

Slide 44 text

Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh" }

Slide 45

Slide 45 text

Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide", "authors":[ "walsh" ] }

Slide 46

Slide 46 text

Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work", "authors":[ "walsh" ] }

Slide 47

Slide 47 text

Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide", "authors":[ "muellner", "walsh" ] }

Slide 48

Slide 48 text

Map Function function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book" == doc.collection) { for (var i in doc.authors) { emit([doc.authors[i], 1], doc.title); } } }

Slide 49

Slide 49 text

Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1] "9781565925809" "DocBook: The Definitive Guide" ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide" ["walsh",1] "9781565920514" "Making TeX Work" ["walsh",1] "9781565925809" "DocBook: The Definitive Guide"

Slide 50

Slide 50 text

Queries from the “right” side of the relationship cannot include any data from entities on the “left” side of the relationship (without the use of include_docs) A document representing an entity with lots of relationships could become quite large Limitations

Slide 51

Slide 51 text

Relationship Documents: Create a document to represent each individual relationship

Slide 52

Slide 52 text

A document representing each “many” entity on the “left” side of the relationship Separate documents for each “many” entity on the “right” side of the relationship Neither the “left” nor “right” side of the relationship contain any direct references to each other For each distinct relationship, a separate document includes the document identi ers for both the “left” and “right” sides of the relationship Relationship Documents

Slide 53

Slide 53 text

Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide" }

Slide 54

Slide 54 text

Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work" }

Slide 55

Slide 55 text

Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide" }

Slide 56

Slide 56 text

Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner" }

Slide 57

Slide 57 text

Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh" }

Slide 58

Slide 58 text

Example: Relationship Document { "_id":"44005f2c", "collection":"book-author", "book":"9780596805029", "author":"walsh" }

Slide 59

Slide 59 text

Example: Relationship Document { "_id":"44005f72", "collection":"book-author", "book":"9781565920514", "author":"walsh" }

Slide 60

Slide 60 text

Example: Relationship Document { "_id":"44006720", "collection":"book-author", "book":"9781565925809", "author":"muellner" }

Slide 61

Slide 61 text

Example: Relationship Document { "_id":"44006b0d", "collection":"book-author", "book":"9781565925809", "author":"walsh" }

Slide 62

Slide 62 text

Books and Related Authors

Slide 63

Slide 63 text

function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title); } if ("book-author" == doc.collection) { emit([doc.book, 1], {"_id":doc.author}); } } Map Function

Slide 64

Slide 64 text

Result Set key id value ["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide" ["9780596805029",1] "44005f2c" {"_id":"walsh"} ["9781565920514",0] "9781565920514" "Making TeX Work" ["9781565920514",1] "44005f72" {"_id":"walsh"} ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide" ["9781565925809",1] "44006720" {"_id":"muellner"} ["9781565925809",1] "44006b0d" {"_id":"walsh"}

Slide 65

Slide 65 text

Including Docs include_docs=true key id value doc (truncated) ["9780596805029",0] … … {"title":"DocBook 5: The Definitive Guide"} ["9780596805029",1] … … {"name":"Norman Walsh"} ["9781565920514",0] … … {"title":"Making TeX Work"} ["9781565920514",1] … … {"author","name":"Norman Walsh"} ["9781565925809",0] … … {"title":"DocBook: The Definitive Guide"} ["9781565925809",1] … … {"name":"Leonard Muellner"} ["9781565925809",1] … … {"name":"Norman Walsh"}

Slide 66

Slide 66 text

Authors and Related Books

Slide 67

Slide 67 text

function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book-author" == doc.collection) { emit([doc.author, 1], {"_id":doc.book}); } } Map Function

Slide 68

Slide 68 text

Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1] "44006720" {"_id":"9781565925809"} ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "44005f2c" {"_id":"9780596805029"} ["walsh",1] "44005f72" {"_id":"9781565920514"} ["walsh",1] "44006b0d" {"_id":"9781565925809"}

Slide 69

Slide 69 text

Including Docs include_docs=true key id value doc (truncated) ["muellner",0] … … {"name":"Leonard Muellner"} ["muellner",1] … … {"title":"DocBook: The Definitive Guide"} ["walsh",0] … … {"name":"Norman Walsh"} ["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"} ["walsh",1] … … {"title":"Making TeX Work"} ["walsh",1] … … {"title":"DocBook: The Definitive Guide"}

Slide 70

Slide 70 text

Queries can only contain data from the “left” or “right” side of the relationship (without the use of include_docs) Maintaining relationship documents may require more work Limitations

Slide 71

Slide 71 text

Final Thoughts

Slide 72

Slide 72 text

Document databases have no tables (and therefore no columns) Indexes (views) are queried directly, instead of being used to optimize more generalized queries Result set columns can contain a mix of logical data types No built-in concept of relationships between documents Related entities can be embedded in a document, referenced from a document, or both Document Databases Compared to Relational Databases

Slide 73

Slide 73 text

Caveats No referential integrity No atomic transactions across document boundaries Some patterns may involve denormalized (i.e. redundant) data Data inconsistencies are inevitable (i.e. eventual consistency) Consider the implications of replication—what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)

Slide 74

Slide 74 text

Additional Techniques Use the startkey and endkey parameters to retrieve one entity and its related entities: startkey=["9781565925809"]&endkey=["9781565925809",{}] De ne a reduce function and use grouping levels Use UUIDs rather than natural keys for better performance Use the bulk document API when writing Relationship Documents When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results

Slide 75

Slide 75 text

Cheat Sheet One to Many Many to Many <= N* Relations > N* Relations Embedded Entities Related Documents List of Keys Relationship Documents ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ * where N is a large number for your system

Slide 76

Slide 76 text

http://oreilly.com/catalog/9781449303129/ http://oreilly.com/catalog/9781449303433/

Slide 77

Slide 77 text

Thank You @BradleyHolt http://bradley-holt.com [email protected] Copyright © 2011-2012 Bradley Holt. All rights reserved.