Entity Relationships in a Document Database at ZendCon 2012

8896271ee9d6f46d4b1783c9566cb645?s=47 Bradley Holt
October 25, 2012

Entity Relationships in a Document Database at ZendCon 2012

Unlike relational databases, document databases like CouchDB and MongoDB do not directly support entity relationships. This talk will explore patterns of modeling one-to-many and many-to-many entity relationships in a document database. These patterns include using an embedded JSON array, relating documents using identifiers, using a list of keys, and using relationship documents. This talk will explore how these entity relationship patterns equate to how entities are joined in a relational database. We'll take a look at the relevant differences between document databases and relational databases. For example, document databases do not have tables, each document can have its own schema, there is no built-in concept of relationships between documents, views/indexes are queried directly instead of being used to optimize more generalized queries, a column within a result set can contain a mix of logical data types, and there is typically no support for transactions across document boundaries.

8896271ee9d6f46d4b1783c9566cb645?s=128

Bradley Holt

October 25, 2012
Tweet

Transcript

  1. Entity Relationships in a Document Database MapReduce Views for SQL

    Users
  2. You’re using a relational database, but have been relying heavily

    on denormalization to optimize read performance You would like to give up consistency in exchange for a high level of concurrency Your data model is a “ t” for documents (e.g. a CMS) When to Choose a Document Database
  3. When Not to Choose a Document Database Your data ts

    better in a relational model—SQL is a powerful and mature language for working with relational data sets Consistency is critical to your application You haven’t bothered exploring scalability options for your current database
  4. Incremental Map/Reduce "How fucked is my NoSQL database?" howfuckedismydatabase.com. 2009.

    http://howfuckedismydatabase.com/nosql/ (24 October 2012).
  5. Entity Relationship Model

  6. Join vs. Collation

  7. SELECT `publisher`.`id`, `publisher`.`name`, `book`.`title` FROM `publisher` FULL OUTER JOIN `book`

    ON `publisher`.`id` = `book`.`publisher_id` ORDER BY `publisher`.`id`, `book`.`title`; SQL Query Joining Publishers and Books
  8. Joined Result Set publisher.id publisher.name book.title oreilly O'Reilly Media Building

    iPhone Apps with HTML, CSS, and JavaScript oreilly O'Reilly Media CouchDB: The Definitive Guide oreilly O'Reilly Media DocBook: The Definitive Guide oreilly O'Reilly Media RESTful Web Services Publisher (“left”) Book “right”
  9. Collated Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media"

    ["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "oreilly" "CouchDB: The Definitive Guide" ["oreilly",1] "oreilly" "DocBook: The Definitive Guide" ["oreilly",1] "oreilly" "RESTful Web Services" Publisher Books
  10. View Result Sets Made up of columns and rows Every

    row has the same three columns: • key • id • value Columns can contain a mixture of logical data types
  11. One to Many Relationships

  12. Embedded Entities: Nest related entities within a document

  13. A single document represents the “one” entity Nested entities (JSON

    Array) represents the “many” entities Simplest way to create a one to many relationship Embedded Entities
  14. Example: Publisher with Nested Books { "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media",

    "books":[ { "title":"CouchDB: The Definitive Guide" }, { "title":"RESTful Web Services" }, { "title":"DocBook: The Definitive Guide" }, { "title":"Building iPhone Apps with HTML, CSS, and JavaScript" } ] }
  15. function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name);

    for (var i in doc.books) { emit([doc._id, 1], doc.books[i].title); } } } Map Function
  16. Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1]

    "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "oreilly" "CouchDB: The Definitive Guide" ["oreilly",1] "oreilly" "DocBook: The Definitive Guide" ["oreilly",1] "oreilly" "RESTful Web Services"
  17. Only works if there aren’t a large number of related

    entities: • Too many nested entities can result in very large documents • Slow to transfer between client and server • Unwieldy to modify • Time-consuming to index Limitations
  18. Related Documents: Reference an entity by its identi er

  19. A document representing the “one” entity Separate documents for each

    “many” entity Each “many” entity references its related “one” entity by the “one” entity’s document identi er Makes for smaller documents Reduces the probability of document update con icts Related Documents
  20. Example: Publisher { "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media" }

  21. Example: Related Book { "_id":"9780596155896", "collection":"book", "title":"CouchDB: The Definitive Guide",

    "publisher":"oreilly" }
  22. Map Function function(doc) { if ("publisher" == doc.collection) { emit([doc._id,

    0], doc.name); } if ("book" == doc.collection) { emit([doc.publisher, 1], doc.title); } }
  23. Result Set key id value ["oreilly",0] "oreilly" "O'Reilly Media" ["oreilly",1]

    "9780596155896" "CouchDB: The Definitive Guide" ["oreilly",1] "9780596529260" "RESTful Web Services" ["oreilly",1] "9780596805791" "Building iPhone Apps with HTML, CSS, and JavaScript" ["oreilly",1] "9781565925809" "DocBook: The Definitive Guide"
  24. When retrieving the entity on the “right” side of the

    relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query Only works for one to many relationships Limitations
  25. Many to Many Relationships

  26. List of Keys: Reference entities by their identi ers

  27. A document representing each “many” entity on the “left” side

    of the relationship Separate documents for each “many” entity on the “right” side of the relationship Each “many” entity on the “right” side of the relationship maintains a list of document identi ers for its related “many” entities on the “left” side of the relationship List of Keys
  28. Books and Related Authors

  29. Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide"

    }
  30. Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work" }

  31. Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide" }

  32. Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner", "books":[ "9781565925809" ]

    }
  33. Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh", "books":[ "9780596805029", "9781565925809",

    "9781565920514" ] }
  34. Map Function function(doc) { if ("book" == doc.collection) { emit([doc._id,

    0], doc.title); } if ("author" == doc.collection) { for (var i in doc.books) { emit([doc.books[i], 1], doc.name); } } }
  35. Result Set key id value ["9780596805029",0] "9780596805029" "DocBook 5: The

    Definitive Guide" ["9780596805029",1] "walsh" "Norman Walsh" ["9781565920514",0] "9781565920514" "Making TeX Work" ["9781565920514",1] "walsh" "Norman Walsh" ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide" ["9781565925809",1] "muellner" "Leonard Muellner" ["9781565925809",1] "walsh" "Norman Walsh"
  36. Authors and Related Books

  37. function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name);

    for (var i in doc.books) { emit([doc._id, 1], {"_id":doc.books[i]}); } } } Map Function
  38. Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1]

    "muellner" {"_id":"9781565925809"} ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "walsh" {"_id":"9780596805029"} ["walsh",1] "walsh" {"_id":"9781565920514"} ["walsh",1] "walsh" {"_id":"9781565925809"}
  39. Including Docs include_docs=true key id value doc (truncated) ["muellner",0] "muellner"

    … {"name":"Leonard Muellner"} ["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"} ["walsh",0] "walsh" … {"name":"Norman Walsh"} ["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"} ["walsh",1] "walsh" … {"title":"Making TeX Work"} ["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}
  40. Or, we can reverse the references…

  41. Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner" }

  42. Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh" }

  43. Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide",

    "authors":[ "walsh" ] }
  44. Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work", "authors":[ "walsh"

    ] }
  45. Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide", "authors":[

    "muellner", "walsh" ] }
  46. Map Function function(doc) { if ("author" == doc.collection) { emit([doc._id,

    0], doc.name); } if ("book" == doc.collection) { for (var i in doc.authors) { emit([doc.authors[i], 1], doc.title); } } }
  47. Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1]

    "9781565925809" "DocBook: The Definitive Guide" ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide" ["walsh",1] "9781565920514" "Making TeX Work" ["walsh",1] "9781565925809" "DocBook: The Definitive Guide"
  48. Queries from the “right” side of the relationship cannot include

    any data from entities on the “left” side of the relationship (without the use of include_docs) A document representing an entity with lots of relationships could become quite large Limitations
  49. Relationship Documents: Create a document to represent each individual relationship

  50. A document representing each “many” entity on the “left” side

    of the relationship Separate documents for each “many” entity on the “right” side of the relationship Neither the “left” nor “right” side of the relationship contain any direct references to each other For each distinct relationship, a separate document includes the document identi ers for both the “left” and “right” sides of the relationship Relationship Documents
  51. Example: Book { "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide"

    }
  52. Example: Book { "_id":"9781565920514", "collection":"book", "title":"Making TeX Work" }

  53. Example: Book { "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide" }

  54. Example: Author { "_id":"muellner", "collection":"author", "name":"Leonard Muellner" }

  55. Example: Author { "_id":"walsh", "collection":"author", "name":"Norman Walsh" }

  56. Example: Relationship Document { "_id":"44005f2c", "collection":"book-author", "book":"9780596805029", "author":"walsh" }

  57. Example: Relationship Document { "_id":"44005f72", "collection":"book-author", "book":"9781565920514", "author":"walsh" }

  58. Example: Relationship Document { "_id":"44006720", "collection":"book-author", "book":"9781565925809", "author":"muellner" }

  59. Example: Relationship Document { "_id":"44006b0d", "collection":"book-author", "book":"9781565925809", "author":"walsh" }

  60. Books and Related Authors

  61. function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title);

    } if ("book-author" == doc.collection) { emit([doc.book, 1], {"_id":doc.author}); } } Map Function
  62. Result Set key id value ["9780596805029",0] "9780596805029" "DocBook 5: The

    Definitive Guide" ["9780596805029",1] "44005f2c" {"_id":"walsh"} ["9781565920514",0] "9781565920514" "Making TeX Work" ["9781565920514",1] "44005f72" {"_id":"walsh"} ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide" ["9781565925809",1] "44006720" {"_id":"muellner"} ["9781565925809",1] "44006b0d" {"_id":"walsh"}
  63. Including Docs include_docs=true key id value doc (truncated) ["9780596805029",0] …

    … {"title":"DocBook 5: The Definitive Guide"} ["9780596805029",1] … … {"name":"Norman Walsh"} ["9781565920514",0] … … {"title":"Making TeX Work"} ["9781565920514",1] … … {"author","name":"Norman Walsh"} ["9781565925809",0] … … {"title":"DocBook: The Definitive Guide"} ["9781565925809",1] … … {"name":"Leonard Muellner"} ["9781565925809",1] … … {"name":"Norman Walsh"}
  64. Authors and Related Books

  65. function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name);

    } if ("book-author" == doc.collection) { emit([doc.author, 1], {"_id":doc.book}); } } Map Function
  66. Result Set key id value ["muellner",0] "muellner" "Leonard Muellner" ["muellner",1]

    "44006720" {"_id":"9781565925809"} ["walsh",0] "walsh" "Norman Walsh" ["walsh",1] "44005f2c" {"_id":"9780596805029"} ["walsh",1] "44005f72" {"_id":"9781565920514"} ["walsh",1] "44006b0d" {"_id":"9781565925809"}
  67. Including Docs include_docs=true key id value doc (truncated) ["muellner",0] …

    … {"name":"Leonard Muellner"} ["muellner",1] … … {"title":"DocBook: The Definitive Guide"} ["walsh",0] … … {"name":"Norman Walsh"} ["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"} ["walsh",1] … … {"title":"Making TeX Work"} ["walsh",1] … … {"title":"DocBook: The Definitive Guide"}
  68. Queries can only contain data from the “left” or “right”

    side of the relationship (without the use of include_docs) Maintaining relationship documents may require more work Limitations
  69. Doctrine’s Object-Document Mapper (ODM)

  70. Doctrine CouchDB[1] 1. http://docs.doctrine-project.org/projects/doctrine-couchdb/

  71. Includes a CouchDB client library and ODM Maps documents using

    Doctrine’s persistence semantics Maps CouchDB views to PHP objects Document con ict resolution support Includes a write-behind feature for increased performance Features
  72. De ning an Entity[1] /** @Document */ class BlogPost {

    /** @Id */ private $id; /** @Field(type="string") */ private $headline; /** @Field(type="string") */ private $text; /** @Field(type="datetime") */ private $publishDate; // getter/setter here } 1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture
  73. Persisting an Entity[1] $blogPost = new BlogPost(); $blogPost->setHeadline("Hello World!"); $blogPost->setText("This

    is a blog post going to be saved into CouchDB"); $blogPost->setPublishDate(new \DateTime("now")); $dm->persist($blogPost); $dm->flush(); 1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture
  74. Querying an Entity[1] // $dm is an instance of Doctrine\ODM\CouchDB

    \DocumentManager $blogPost = $dm->find("MyApp\Document\BlogPost", $theUUID); 1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#querying
  75. Doctrine MongoDB ODM [1] 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/

  76. Features Maps documents using Doctrine’s persistence semantics Map embedded documents

    Map referenced documents Uses batch inserts Performs atomic updates
  77. De ning Entities[1] /** @MappedSuperclass */ abstract class BaseEmployee {

    /** @Id */ private $id; /** @EmbedOne(targetDocument="Address") */ private $address; // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  78. De ning Entities[1] /** @Document */ class Employee extends BaseEmployee

    { /** @ReferenceOne(targetDocument="Documents \Manager") */ private $manager; // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  79. De ning Entities[1] /** @Document */ class Manager extends BaseEmployee

    { /** @ReferenceMany (targetDocument="Documents\Project") */ private $projects = array(); // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  80. De ning Entities[1] /** @EmbeddedDocument */ class Address { /**

    @String */ private $address; /** @String */ private $city; /** @String */ private $state; /** @String */ private $zipcode; // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  81. De ning Entities[1] /** @Document */ class Project { /**

    @Id */ private $id; /** @String */ private $name; public function __construct($name) { $this->name = $name; } // ... } 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  82. Persisting Entities[1] $employee = new Employee(); $employee->setName('Employee'); $employee->setSalary(50000.00); $employee->setStarted(new \DateTime());

    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  83. Persisting Entities[1] $address = new Address(); $address->setAddress('555 Doctrine Rd.'); $address->setCity('Nashville');

    $address->setState('TN'); $address->setZipcode('37209'); $employee->setAddress($address); 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  84. Persisting Entities[1] $project = new Project('New Project'); $manager = new

    Manager(); $manager->setName('Manager'); $manager->setSalary(100000.00); $manager->setStarted(new \DateTime()); $manager->addProject($project); 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  85. Persisting Entities[1] // $dm is an instance of Doctrine\ODM\MongoDB \DocumentManager

    $dm->persist($employee); $dm->persist($address); $dm->persist($project); $dm->persist($manager); $dm->flush(); 1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
  86. Querying an Entity // $dm is an instance of Doctrine\ODM\MongoDB

    \DocumentManager $manager = $dm->find("Documents\Manager", $theID);
  87. Final Thoughts

  88. Document databases have no tables (and therefore no columns) Indexes

    (views) are queried directly, instead of being used to optimize more generalized queries Result set columns can contain a mix of logical data types No built-in concept of relationships between documents Related entities can be embedded in a document, referenced from a document, or both Document Databases Compared to Relational Databases
  89. Caveats No referential integrity No atomic transactions across document boundaries

    Some patterns may involve denormalized (i.e. redundant) data Data inconsistencies are inevitable (i.e. eventual consistency) Consider the implications of replication—what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)
  90. Additional Techniques Use the startkey and endkey parameters to retrieve

    one entity and its related entities: startkey=["9781565925809"]&endkey=["9781565925809",{}] De ne a reduce function and use grouping levels Use UUIDs rather than natural keys for better performance Use the bulk document API when writing Relationship Documents When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results
  91. Cheat Sheet One to Many Many to Many <= N*

    Relations > N* Relations Embedded Entities Related Documents List of Keys Relationship Documents ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ * where N is a large number for your system
  92. http://oreilly.com/catalog/9781449303129/ http://oreilly.com/catalog/9781449303433/

  93. Thank You @BradleyHolt http://bradley-holt.com https://joind.in/7040 Copyright © 2011-2012 Bradley Holt.

    All rights reserved.