Slide 1

Slide 1 text

Entity Relationships in a Document Database MapReduce Views for SQL Users

Slide 2

Slide 2 text

You’re using a relational database, but have been relying heavily on denormalization to optimize read performance You would like to give up consistency in exchange for a high level of concurrency Your data model is a “ t” for documents (e.g. a CMS) When to Choose a Document Database

Slide 3

Slide 3 text

When Not to Choose a Document Database Your data ts better in a relational model—SQL is a powerful and mature language for working with relational data sets Consistency is critical to your application You haven’t bothered exploring scalability options for your current database

Slide 4

Slide 4 text

Incremental Map/Reduce "How fucked is my NoSQL database?" howfuckedismydatabase.com. 2009. http://howfuckedismydatabase.com/nosql/ (24 October 2012).

Slide 5

Slide 5 text

Entity Relationship Model

Slide 6

Slide 6 text

Join vs. Collation

Slide 7

Slide 7 text

SELECT `publisher`.`id`, `publisher`.`name`, `book`.`title` FROM `publisher` FULL OUTER JOIN `book` ON `publisher`.`id` = `book`.`publisher_id` ORDER BY `publisher`.`id`, `book`.`title`; SQL Query Joining Publishers and Books

Slide 8

Slide 8 text

Joined Result Set publisher.id publisher.name book.title oreilly O'Reilly Media Building iPhone Apps with HTML, CSS, and JavaScript oreilly O'Reilly Media CouchDB: The Definitive Guide oreilly O'Reilly Media DocBook: The Definitive Guide oreilly O'Reilly Media RESTful Web Services Publisher (“left”) Book “right”

Slide 9

Slide 9 text

Collated Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "oreilly" "CouchDB: The Definitive Guide" ["oreilly",1] "oreilly" "DocBook: The Definitive Guide" ["oreilly",1] "oreilly" "RESTful Web Services" Publisher Books

Slide 10

Slide 10 text

View Result Sets Made up of columns and rows Every row has the same three columns: • key • id • value Columns can contain a mixture of logical data types

Slide 11

Slide 11 text

One to Many Relationships

Slide 12

Slide 12 text

Embedded Entities: Nest related entities within a document

Slide 13

Slide 13 text

A single document represents the “one” entity Nested entities (JSON Array) represents the “many” entities Simplest way to create a one to many relationship Embedded Entities

Slide 14

Slide 14 text

Example: Publisher with Nested Books { "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media", "books":[ { "title":"CouchDB: The Definitive Guide" }, { "title":"RESTful Web Services" }, { "title":"DocBook: The Definitive Guide" }, { "title":"Building iPhone Apps with HTML, CSS, and JavaScript" } ] }

Slide 15

Slide 15 text

function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name); for (var i in doc.books) { emit([doc._id, 1], doc.books[i].title); } } } Map Function

Slide 16

Slide 16 text

Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "oreilly" "CouchDB: The Definitive Guide" ["oreilly",1] "oreilly" "DocBook: The Definitive Guide" ["oreilly",1] "oreilly" "RESTful Web Services"

Slide 17

Slide 17 text

Only works if there aren’t a large number of related entities: • Too many nested entities can result in very large documents • Slow to transfer between client and server • Unwieldy to modify • Time-consuming to index Limitations

Slide 18

Slide 18 text

Related Documents: Reference an entity by its identi er

Slide 19

Slide 19 text

A document representing the “one” entity Separate documents for each “many” entity Each “many” entity references its related “one” entity by the “one” entity’s document identi er Makes for smaller documents Reduces the probability of document update con icts Related Documents

Slide 20

Slide 20 text

Example: Publisher { "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media" }

Slide 21

Slide 21 text

Example: Related Book { "_id":"9780596155896", "collection":"book", "title":"CouchDB: The Definitive Guide", "publisher":"oreilly" }

Slide 22

Slide 22 text

Map Function function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book" == doc.collection) { emit([doc.publisher, 1], doc.title); } }

Slide 23

Slide 23 text

Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1] "9780596155896" "CouchDB: The Definitive Guide" ["oreilly",1] "9780596529260" "RESTful Web Services" ["oreilly",1] "9780596805791" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "9781565925809" "DocBook: The Definitive Guide"

Slide 24

Slide 24 text

When retrieving the entity on the “right” side of the relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query Only works for one to many relationships Limitations

Slide 25

Slide 25 text

Many to Many Relationships

Slide 26

Slide 26 text

List of Keys: Reference entities by their identi ers

Slide 27

Slide 27 text

A document representing each “many” entity on the “left” side of the relationship Separate documents for each “many” entity on the “right” side of the relationship Each “many” entity on the “right” side of the relationship maintains a list of document identi ers for its related “many” entities on the “left” side of the relationship List of Keys

Slide 28

Slide 28 text

Books and Related Authors

Slide 29

Slide 29 text

Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide" }

Slide 30

Slide 30 text

Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work" }

Slide 31

Slide 31 text

Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide" }

Slide 32

Slide 32 text

Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner", "books":[ "9781565925809" ] }

Slide 33

Slide 33 text

Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh", "books":[ "9780596805029", "9781565925809", "9781565920514" ] }

Slide 34

Slide 34 text

Map Function function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title); } if ("author" == doc.collection) { for (var i in doc.books) { emit([doc.books[i], 1], doc.name); } } }

Slide 35

Slide 35 text

Result Set key id value ["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide" ["9780596805029",1] "walsh" "Norman Walsh" ["9781565920514",0] "9781565920514" "Making TeX Work" ["9781565920514",1] "walsh" "Norman Walsh" ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide" ["9781565925809",1] "muellner" "Leonard Muellner" ["9781565925809",1] "walsh" "Norman Walsh"

Slide 36

Slide 36 text

Authors and Related Books

Slide 37

Slide 37 text

function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); for (var i in doc.books) { emit([doc._id, 1], {"_id":doc.books[i]}); } } } Map Function

Slide 38

Slide 38 text

Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1] "muellner" {"_id":"9781565925809"} ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "walsh" {"_id":"9780596805029"} ["walsh",1] "walsh" {"_id":"9781565920514"} ["walsh",1] "walsh" {"_id":"9781565925809"}

Slide 39

Slide 39 text

Including Docs include_docs=true key id value doc (truncated) ["muellner",0] "muellner" … {"name":"Leonard Muellner"} ["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"} ["walsh",0] "walsh" … {"name":"Norman Walsh"} ["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"} ["walsh",1] "walsh" … {"title":"Making TeX Work"} ["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}

Slide 40

Slide 40 text

Or, we can reverse the references…

Slide 41

Slide 41 text

Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner" }

Slide 42

Slide 42 text

Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh" }

Slide 43

Slide 43 text

Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide", "authors":[ "walsh" ] }

Slide 44

Slide 44 text

Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work", "authors":[ "walsh" ] }

Slide 45

Slide 45 text

Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide", "authors":[ "muellner", "walsh" ] }

Slide 46

Slide 46 text

Map Function function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book" == doc.collection) { for (var i in doc.authors) { emit([doc.authors[i], 1], doc.title); } } }

Slide 47

Slide 47 text

Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1] "9781565925809" "DocBook: The Definitive Guide" ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide" ["walsh",1] "9781565920514" "Making TeX Work" ["walsh",1] "9781565925809" "DocBook: The Definitive Guide"

Slide 48

Slide 48 text

Queries from the “right” side of the relationship cannot include any data from entities on the “left” side of the relationship (without the use of include_docs) A document representing an entity with lots of relationships could become quite large Limitations

Slide 49

Slide 49 text

Relationship Documents: Create a document to represent each individual relationship

Slide 50

Slide 50 text

A document representing each “many” entity on the “left” side of the relationship Separate documents for each “many” entity on the “right” side of the relationship Neither the “left” nor “right” side of the relationship contain any direct references to each other For each distinct relationship, a separate document includes the document identi ers for both the “left” and “right” sides of the relationship Relationship Documents

Slide 51

Slide 51 text

Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide" }

Slide 52

Slide 52 text

Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work" }

Slide 53

Slide 53 text

Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide" }

Slide 54

Slide 54 text

Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner" }

Slide 55

Slide 55 text

Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh" }

Slide 56

Slide 56 text

Example: Relationship Document { "_id":"44005f2c", "collection":"book-author", "book":"9780596805029", "author":"walsh" }

Slide 57

Slide 57 text

Example: Relationship Document { "_id":"44005f72", "collection":"book-author", "book":"9781565920514", "author":"walsh" }

Slide 58

Slide 58 text

Example: Relationship Document { "_id":"44006720", "collection":"book-author", "book":"9781565925809", "author":"muellner" }

Slide 59

Slide 59 text

Example: Relationship Document { "_id":"44006b0d", "collection":"book-author", "book":"9781565925809", "author":"walsh" }

Slide 60

Slide 60 text

Books and Related Authors

Slide 61

Slide 61 text

function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title); } if ("book-author" == doc.collection) { emit([doc.book, 1], {"_id":doc.author}); } } Map Function

Slide 62

Slide 62 text

Result Set key id value ["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide" ["9780596805029",1] "44005f2c" {"_id":"walsh"} ["9781565920514",0] "9781565920514" "Making TeX Work" ["9781565920514",1] "44005f72" {"_id":"walsh"} ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide" ["9781565925809",1] "44006720" {"_id":"muellner"} ["9781565925809",1] "44006b0d" {"_id":"walsh"}

Slide 63

Slide 63 text

Including Docs include_docs=true key id value doc (truncated) ["9780596805029",0] … … {"title":"DocBook 5: The Definitive Guide"} ["9780596805029",1] … … {"name":"Norman Walsh"} ["9781565920514",0] … … {"title":"Making TeX Work"} ["9781565920514",1] … … {"author","name":"Norman Walsh"} ["9781565925809",0] … … {"title":"DocBook: The Definitive Guide"} ["9781565925809",1] … … {"name":"Leonard Muellner"} ["9781565925809",1] … … {"name":"Norman Walsh"}

Slide 64

Slide 64 text

Authors and Related Books

Slide 65

Slide 65 text

function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book-author" == doc.collection) { emit([doc.author, 1], {"_id":doc.book}); } } Map Function

Slide 66

Slide 66 text

Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1] "44006720" {"_id":"9781565925809"} ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "44005f2c" {"_id":"9780596805029"} ["walsh",1] "44005f72" {"_id":"9781565920514"} ["walsh",1] "44006b0d" {"_id":"9781565925809"}

Slide 67

Slide 67 text

Including Docs include_docs=true key id value doc (truncated) ["muellner",0] … … {"name":"Leonard Muellner"} ["muellner",1] … … {"title":"DocBook: The Definitive Guide"} ["walsh",0] … … {"name":"Norman Walsh"} ["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"} ["walsh",1] … … {"title":"Making TeX Work"} ["walsh",1] … … {"title":"DocBook: The Definitive Guide"}

Slide 68

Slide 68 text

Queries can only contain data from the “left” or “right” side of the relationship (without the use of include_docs) Maintaining relationship documents may require more work Limitations

Slide 69

Slide 69 text

Doctrine’s Object-Document Mapper (ODM)

Slide 70

Slide 70 text

Doctrine CouchDB[1] 1. http://docs.doctrine-project.org/projects/doctrine-couchdb/

Slide 71

Slide 71 text

Includes a CouchDB client library and ODM Maps documents using Doctrine’s persistence semantics Maps CouchDB views to PHP objects Document con ict resolution support Includes a write-behind feature for increased performance Features

Slide 72

Slide 72 text

De ning an Entity[1] /** @Document */ class BlogPost { /** @Id */ private $id; /** @Field(type="string") */ private $headline; /** @Field(type="string") */ private $text; /** @Field(type="datetime") */ private $publishDate; // getter/setter here } 1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture

Slide 73

Slide 73 text

Persisting an Entity[1] $blogPost = new BlogPost(); $blogPost->setHeadline("Hello World!"); $blogPost->setText("This is a blog post going to be saved into CouchDB"); $blogPost->setPublishDate(new \DateTime("now")); $dm->persist($blogPost); $dm->flush(); 1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture

Slide 74

Slide 74 text

Querying an Entity[1] // $dm is an instance of Doctrine\ODM\CouchDB \DocumentManager $blogPost = $dm->find("MyApp\Document\BlogPost", $theUUID); 1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#querying

Slide 75

Slide 75 text

Doctrine MongoDB ODM [1] 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/

Slide 76

Slide 76 text

Features Maps documents using Doctrine’s persistence semantics Map embedded documents Map referenced documents Uses batch inserts Performs atomic updates

Slide 77

Slide 77 text

De ning Entities[1] /** @MappedSuperclass */ abstract class BaseEmployee { /** @Id */ private $id; /** @EmbedOne(targetDocument="Address") */ private $address; // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 78

Slide 78 text

De ning Entities[1] /** @Document */ class Employee extends BaseEmployee { /** @ReferenceOne(targetDocument="Documents \Manager") */ private $manager; // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 79

Slide 79 text

De ning Entities[1] /** @Document */ class Manager extends BaseEmployee { /** @ReferenceMany (targetDocument="Documents\Project") */ private $projects = array(); // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 80

Slide 80 text

De ning Entities[1] /** @EmbeddedDocument */ class Address { /** @String */ private $address; /** @String */ private $city; /** @String */ private $state; /** @String */ private $zipcode; // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 81

Slide 81 text

De ning Entities[1] /** @Document */ class Project { /** @Id */ private $id; /** @String */ private $name; public function __construct($name) { $this->name = $name; } // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 82

Slide 82 text

Persisting Entities[1] $employee = new Employee(); $employee->setName('Employee'); $employee->setSalary(50000.00); $employee->setStarted(new \DateTime()); 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 83

Slide 83 text

Persisting Entities[1] $address = new Address(); $address->setAddress('555 Doctrine Rd.'); $address->setCity('Nashville'); $address->setState('TN'); $address->setZipcode('37209'); $employee->setAddress($address); 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 84

Slide 84 text

Persisting Entities[1] $project = new Project('New Project'); $manager = new Manager(); $manager->setName('Manager'); $manager->setSalary(100000.00); $manager->setStarted(new \DateTime()); $manager->addProject($project); 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 85

Slide 85 text

Persisting Entities[1] // $dm is an instance of Doctrine\ODM\MongoDB \DocumentManager $dm->persist($employee); $dm->persist($address); $dm->persist($project); $dm->persist($manager); $dm->flush(); 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Slide 86

Slide 86 text

Querying an Entity // $dm is an instance of Doctrine\ODM\MongoDB \DocumentManager $manager = $dm->find("Documents\Manager", $theID);

Slide 87

Slide 87 text

Final Thoughts

Slide 88

Slide 88 text

Document databases have no tables (and therefore no columns) Indexes (views) are queried directly, instead of being used to optimize more generalized queries Result set columns can contain a mix of logical data types No built-in concept of relationships between documents Related entities can be embedded in a document, referenced from a document, or both Document Databases Compared to Relational Databases

Slide 89

Slide 89 text

Caveats No referential integrity No atomic transactions across document boundaries Some patterns may involve denormalized (i.e. redundant) data Data inconsistencies are inevitable (i.e. eventual consistency) Consider the implications of replication—what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)

Slide 90

Slide 90 text

Additional Techniques Use the startkey and endkey parameters to retrieve one entity and its related entities: startkey=["9781565925809"]&endkey=["9781565925809",{}] De ne a reduce function and use grouping levels Use UUIDs rather than natural keys for better performance Use the bulk document API when writing Relationship Documents When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results

Slide 91

Slide 91 text

Cheat Sheet One to Many Many to Many <= N* Relations > N* Relations Embedded Entities Related Documents List of Keys Relationship Documents ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ * where N is a large number for your system

Slide 92

Slide 92 text

http://oreilly.com/catalog/9781449303129/ http://oreilly.com/catalog/9781449303433/

Slide 93

Slide 93 text

Thank You @BradleyHolt http://bradley-holt.com https://joind.in/7040 Copyright © 2011-2012 Bradley Holt. All rights reserved.