Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Entity Relationships in a Document Database at ZendCon 2012

Bradley Holt
October 25, 2012

Entity Relationships in a Document Database at ZendCon 2012

Unlike relational databases, document databases like CouchDB and MongoDB do not directly support entity relationships. This talk will explore patterns of modeling one-to-many and many-to-many entity relationships in a document database. These patterns include using an embedded JSON array, relating documents using identifiers, using a list of keys, and using relationship documents. This talk will explore how these entity relationship patterns equate to how entities are joined in a relational database. We'll take a look at the relevant differences between document databases and relational databases. For example, document databases do not have tables, each document can have its own schema, there is no built-in concept of relationships between documents, views/indexes are queried directly instead of being used to optimize more generalized queries, a column within a result set can contain a mix of logical data types, and there is typically no support for transactions across document boundaries.

Bradley Holt

October 25, 2012
Tweet

More Decks by Bradley Holt

Other Decks in Programming

Transcript

  1. Entity Relationships in
    a Document Database
    MapReduce Views for SQL Users

    View Slide

  2. You’re using a relational database, but have been relying
    heavily on denormalization to optimize read performance
    You would like to give up consistency in exchange for a
    high level of concurrency
    Your data model is a “ t” for documents (e.g. a CMS)
    When to Choose a
    Document Database

    View Slide

  3. When Not to Choose a
    Document Database
    Your data ts better in a relational model—SQL is a powerful
    and mature language for working with relational data sets
    Consistency is critical to your application
    You haven’t bothered exploring scalability options for
    your current database

    View Slide

  4. Incremental Map/Reduce
    "How fucked is my NoSQL database?" howfuckedismydatabase.com. 2009. http://howfuckedismydatabase.com/nosql/ (24 October 2012).

    View Slide

  5. Entity
    Relationship
    Model

    View Slide

  6. Join vs. Collation

    View Slide

  7. SELECT
    `publisher`.`id`,
    `publisher`.`name`,
    `book`.`title`
    FROM `publisher`
    FULL OUTER JOIN `book`
    ON `publisher`.`id` = `book`.`publisher_id`
    ORDER BY
    `publisher`.`id`,
    `book`.`title`;
    SQL Query Joining
    Publishers and Books

    View Slide

  8. Joined Result Set
    publisher.id publisher.name book.title
    oreilly O'Reilly Media
    Building iPhone Apps with
    HTML, CSS, and JavaScript
    oreilly O'Reilly Media
    CouchDB: The Definitive
    Guide
    oreilly O'Reilly Media
    DocBook: The Definitive
    Guide
    oreilly O'Reilly Media RESTful Web Services
    Publisher (“left”) Book “right”

    View Slide

  9. Collated Result Set
    key id value
    ["oreilly",0] "oreilly" "O'Reilly Media"
    ["oreilly",1] "oreilly"
    "Building iPhone Apps with
    HTML, CSS, and JavaScript"
    ["oreilly",1] "oreilly"
    "CouchDB: The Definitive
    Guide"
    ["oreilly",1] "oreilly"
    "DocBook: The Definitive
    Guide"
    ["oreilly",1] "oreilly" "RESTful Web Services"
    Publisher
    Books

    View Slide

  10. View Result Sets
    Made up of columns and rows
    Every row has the same three columns:
    • key
    • id
    • value
    Columns can contain a mixture of logical data types

    View Slide

  11. One to Many Relationships

    View Slide

  12. Embedded Entities:
    Nest related entities within a document

    View Slide

  13. A single document represents the “one” entity
    Nested entities (JSON Array) represents the “many” entities
    Simplest way to create a one to many relationship
    Embedded Entities

    View Slide

  14. Example: Publisher
    with Nested Books
    {
    "_id":"oreilly",
    "collection":"publisher",
    "name":"O'Reilly Media",
    "books":[
    { "title":"CouchDB: The Definitive Guide" },
    { "title":"RESTful Web Services" },
    { "title":"DocBook: The Definitive Guide" },
    { "title":"Building iPhone Apps with HTML, CSS,
    and JavaScript" }
    ]
    }

    View Slide

  15. function(doc) {
    if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
    for (var i in doc.books) {
    emit([doc._id, 1], doc.books[i].title);
    }
    }
    }
    Map Function

    View Slide

  16. Result Set
    key id value
    ["oreilly",0] "oreilly" "O'Reilly Media"
    ["oreilly",1] "oreilly"
    "Building iPhone Apps with
    HTML, CSS, and JavaScript"
    ["oreilly",1] "oreilly"
    "CouchDB: The Definitive
    Guide"
    ["oreilly",1] "oreilly"
    "DocBook: The Definitive
    Guide"
    ["oreilly",1] "oreilly" "RESTful Web Services"

    View Slide

  17. Only works if there aren’t a large number of related entities:
    • Too many nested entities can result in very large documents
    • Slow to transfer between client and server
    • Unwieldy to modify
    • Time-consuming to index
    Limitations

    View Slide

  18. Related Documents:
    Reference an entity by its identi er

    View Slide

  19. A document representing the “one” entity
    Separate documents for each “many” entity
    Each “many” entity references its related
    “one” entity by the “one” entity’s document identi er
    Makes for smaller documents
    Reduces the probability of document update con icts
    Related Documents

    View Slide

  20. Example: Publisher
    {
    "_id":"oreilly",
    "collection":"publisher",
    "name":"O'Reilly Media"
    }

    View Slide

  21. Example: Related Book
    {
    "_id":"9780596155896",
    "collection":"book",
    "title":"CouchDB: The Definitive Guide",
    "publisher":"oreilly"
    }

    View Slide

  22. Map Function
    function(doc) {
    if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
    }
    if ("book" == doc.collection) {
    emit([doc.publisher, 1], doc.title);
    }
    }

    View Slide

  23. Result Set
    key id value
    ["oreilly",0] "oreilly" "O'Reilly Media"
    ["oreilly",1] "9780596155896"
    "CouchDB: The Definitive
    Guide"
    ["oreilly",1] "9780596529260" "RESTful Web Services"
    ["oreilly",1] "9780596805791"
    "Building iPhone Apps with
    HTML, CSS, and JavaScript"
    ["oreilly",1] "9781565925809"
    "DocBook: The Definitive
    Guide"

    View Slide

  24. When retrieving the entity on the “right” side of the relationship,
    one cannot include any data from the entity on the “left” side of
    the relationship without the use of an additional query
    Only works for one to many relationships
    Limitations

    View Slide

  25. Many to Many Relationships

    View Slide

  26. List of Keys:
    Reference entities by their identi ers

    View Slide

  27. A document representing each “many” entity on the “left” side
    of the relationship
    Separate documents for each “many” entity on the “right” side
    of the relationship
    Each “many” entity on the “right” side of the relationship
    maintains a list of document identi ers for its related “many”
    entities on the “left” side of the relationship
    List of Keys

    View Slide

  28. Books and Related Authors

    View Slide

  29. Example: Book
    {
    "_id":"9780596805029",
    "collection":"book",
    "title":"DocBook 5: The Definitive Guide"
    }

    View Slide

  30. Example: Book
    {
    "_id":"9781565920514",
    "collection":"book",
    "title":"Making TeX Work"
    }

    View Slide

  31. Example: Book
    {
    "_id":"9781565925809",
    "collection":"book",
    "title":"DocBook: The Definitive Guide"
    }

    View Slide

  32. Example: Author
    {
    "_id":"muellner",
    "collection":"author",
    "name":"Leonard Muellner",
    "books":[
    "9781565925809"
    ]
    }

    View Slide

  33. Example: Author
    {
    "_id":"walsh",
    "collection":"author",
    "name":"Norman Walsh",
    "books":[
    "9780596805029",
    "9781565925809",
    "9781565920514"
    ]
    }

    View Slide

  34. Map Function
    function(doc) {
    if ("book" == doc.collection) {
    emit([doc._id, 0], doc.title);
    }
    if ("author" == doc.collection) {
    for (var i in doc.books) {
    emit([doc.books[i], 1], doc.name);
    }
    }
    }

    View Slide

  35. Result Set
    key id value
    ["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"
    ["9780596805029",1] "walsh" "Norman Walsh"
    ["9781565920514",0] "9781565920514" "Making TeX Work"
    ["9781565920514",1] "walsh" "Norman Walsh"
    ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"
    ["9781565925809",1] "muellner" "Leonard Muellner"
    ["9781565925809",1] "walsh" "Norman Walsh"

    View Slide

  36. Authors and Related Books

    View Slide

  37. function(doc) {
    if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
    for (var i in doc.books) {
    emit([doc._id, 1], {"_id":doc.books[i]});
    }
    }
    }
    Map Function

    View Slide

  38. Result Set
    key id value
    ["muellner",0] "muellner" "Leonard Muellner"
    ["muellner",1] "muellner" {"_id":"9781565925809"}
    ["walsh",0] "walsh" "Norman Walsh"
    ["walsh",1] "walsh" {"_id":"9780596805029"}
    ["walsh",1] "walsh" {"_id":"9781565920514"}
    ["walsh",1] "walsh" {"_id":"9781565925809"}

    View Slide

  39. Including Docs
    include_docs=true
    key id value doc (truncated)
    ["muellner",0] "muellner" … {"name":"Leonard Muellner"}
    ["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"}
    ["walsh",0] "walsh" … {"name":"Norman Walsh"}
    ["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"}
    ["walsh",1] "walsh" … {"title":"Making TeX Work"}
    ["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}

    View Slide

  40. Or, we can reverse the references…

    View Slide

  41. Example: Author
    {
    "_id":"muellner",
    "collection":"author",
    "name":"Leonard Muellner"
    }

    View Slide

  42. Example: Author
    {
    "_id":"walsh",
    "collection":"author",
    "name":"Norman Walsh"
    }

    View Slide

  43. Example: Book
    {
    "_id":"9780596805029",
    "collection":"book",
    "title":"DocBook 5: The Definitive Guide",
    "authors":[
    "walsh"
    ]
    }

    View Slide

  44. Example: Book
    {
    "_id":"9781565920514",
    "collection":"book",
    "title":"Making TeX Work",
    "authors":[
    "walsh"
    ]
    }

    View Slide

  45. Example: Book
    {
    "_id":"9781565925809",
    "collection":"book",
    "title":"DocBook: The Definitive Guide",
    "authors":[
    "muellner",
    "walsh"
    ]
    }

    View Slide

  46. Map Function
    function(doc) {
    if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
    }
    if ("book" == doc.collection) {
    for (var i in doc.authors) {
    emit([doc.authors[i], 1], doc.title);
    }
    }
    }

    View Slide

  47. Result Set
    key id value
    ["muellner",0] "muellner" "Leonard Muellner"
    ["muellner",1] "9781565925809" "DocBook: The Definitive Guide"
    ["walsh",0] "walsh" "Norman Walsh"
    ["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide"
    ["walsh",1] "9781565920514" "Making TeX Work"
    ["walsh",1] "9781565925809" "DocBook: The Definitive Guide"

    View Slide

  48. Queries from the “right” side of the relationship cannot include
    any data from entities on the “left” side of the relationship
    (without the use of include_docs)
    A document representing an entity with lots of relationships
    could become quite large
    Limitations

    View Slide

  49. Relationship Documents:
    Create a document to represent each
    individual relationship

    View Slide

  50. A document representing each “many” entity on the “left” side
    of the relationship
    Separate documents for each “many” entity on the “right” side
    of the relationship
    Neither the “left” nor “right” side of the relationship contain any
    direct references to each other
    For each distinct relationship, a separate document includes the
    document identi ers for both the “left” and “right” sides of the
    relationship
    Relationship Documents

    View Slide

  51. Example: Book
    {
    "_id":"9780596805029",
    "collection":"book",
    "title":"DocBook 5: The Definitive Guide"
    }

    View Slide

  52. Example: Book
    {
    "_id":"9781565920514",
    "collection":"book",
    "title":"Making TeX Work"
    }

    View Slide

  53. Example: Book
    {
    "_id":"9781565925809",
    "collection":"book",
    "title":"DocBook: The Definitive Guide"
    }

    View Slide

  54. Example: Author
    {
    "_id":"muellner",
    "collection":"author",
    "name":"Leonard Muellner"
    }

    View Slide

  55. Example: Author
    {
    "_id":"walsh",
    "collection":"author",
    "name":"Norman Walsh"
    }

    View Slide

  56. Example:
    Relationship Document
    {
    "_id":"44005f2c",
    "collection":"book-author",
    "book":"9780596805029",
    "author":"walsh"
    }

    View Slide

  57. Example:
    Relationship Document
    {
    "_id":"44005f72",
    "collection":"book-author",
    "book":"9781565920514",
    "author":"walsh"
    }

    View Slide

  58. Example:
    Relationship Document
    {
    "_id":"44006720",
    "collection":"book-author",
    "book":"9781565925809",
    "author":"muellner"
    }

    View Slide

  59. Example:
    Relationship Document
    {
    "_id":"44006b0d",
    "collection":"book-author",
    "book":"9781565925809",
    "author":"walsh"
    }

    View Slide

  60. Books and Related Authors

    View Slide

  61. function(doc) {
    if ("book" == doc.collection) {
    emit([doc._id, 0], doc.title);
    }
    if ("book-author" == doc.collection) {
    emit([doc.book, 1], {"_id":doc.author});
    }
    }
    Map Function

    View Slide

  62. Result Set
    key id value
    ["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"
    ["9780596805029",1] "44005f2c" {"_id":"walsh"}
    ["9781565920514",0] "9781565920514" "Making TeX Work"
    ["9781565920514",1] "44005f72" {"_id":"walsh"}
    ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"
    ["9781565925809",1] "44006720" {"_id":"muellner"}
    ["9781565925809",1] "44006b0d" {"_id":"walsh"}

    View Slide

  63. Including Docs
    include_docs=true
    key id value doc (truncated)
    ["9780596805029",0] … … {"title":"DocBook 5: The Definitive Guide"}
    ["9780596805029",1] … … {"name":"Norman Walsh"}
    ["9781565920514",0] … … {"title":"Making TeX Work"}
    ["9781565920514",1] … … {"author","name":"Norman Walsh"}
    ["9781565925809",0] … … {"title":"DocBook: The Definitive Guide"}
    ["9781565925809",1] … … {"name":"Leonard Muellner"}
    ["9781565925809",1] … … {"name":"Norman Walsh"}

    View Slide

  64. Authors and Related Books

    View Slide

  65. function(doc) {
    if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
    }
    if ("book-author" == doc.collection) {
    emit([doc.author, 1], {"_id":doc.book});
    }
    }
    Map Function

    View Slide

  66. Result Set
    key id value
    ["muellner",0] "muellner" "Leonard Muellner"
    ["muellner",1] "44006720" {"_id":"9781565925809"}
    ["walsh",0] "walsh" "Norman Walsh"
    ["walsh",1] "44005f2c" {"_id":"9780596805029"}
    ["walsh",1] "44005f72" {"_id":"9781565920514"}
    ["walsh",1] "44006b0d" {"_id":"9781565925809"}

    View Slide

  67. Including Docs
    include_docs=true
    key id value doc (truncated)
    ["muellner",0] … … {"name":"Leonard Muellner"}
    ["muellner",1] … … {"title":"DocBook: The Definitive Guide"}
    ["walsh",0] … … {"name":"Norman Walsh"}
    ["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"}
    ["walsh",1] … … {"title":"Making TeX Work"}
    ["walsh",1] … … {"title":"DocBook: The Definitive Guide"}

    View Slide

  68. Queries can only contain data from the “left” or “right” side of the
    relationship (without the use of include_docs)
    Maintaining relationship documents may require more work
    Limitations

    View Slide

  69. Doctrine’s Object-Document
    Mapper (ODM)

    View Slide

  70. Doctrine CouchDB[1]
    1. http://docs.doctrine-project.org/projects/doctrine-couchdb/

    View Slide

  71. Includes a CouchDB client library and ODM
    Maps documents using Doctrine’s persistence semantics
    Maps CouchDB views to PHP objects
    Document con ict resolution support
    Includes a write-behind feature for increased performance
    Features

    View Slide

  72. De ning an Entity[1]
    /** @Document */
    class BlogPost
    {
    /** @Id */
    private $id;
    /** @Field(type="string") */
    private $headline;
    /** @Field(type="string") */
    private $text;
    /** @Field(type="datetime") */
    private $publishDate;
    // getter/setter here
    }
    1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture

    View Slide

  73. Persisting an Entity[1]
    $blogPost = new BlogPost();
    $blogPost->setHeadline("Hello World!");
    $blogPost->setText("This is a blog post going to
    be saved into CouchDB");
    $blogPost->setPublishDate(new \DateTime("now"));
    $dm->persist($blogPost);
    $dm->flush();
    1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture

    View Slide

  74. Querying an Entity[1]
    // $dm is an instance of Doctrine\ODM\CouchDB
    \DocumentManager
    $blogPost = $dm->find("MyApp\Document\BlogPost",
    $theUUID);
    1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#querying

    View Slide

  75. Doctrine MongoDB ODM [1]
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/

    View Slide

  76. Features
    Maps documents using Doctrine’s persistence semantics
    Map embedded documents
    Map referenced documents
    Uses batch inserts
    Performs atomic updates

    View Slide

  77. De ning Entities[1]
    /** @MappedSuperclass */
    abstract class BaseEmployee
    {
    /** @Id */
    private $id;
    /** @EmbedOne(targetDocument="Address") */
    private $address;
    // ...
    }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  78. De ning Entities[1]
    /** @Document */
    class Employee extends BaseEmployee
    {
    /** @ReferenceOne(targetDocument="Documents
    \Manager") */
    private $manager;
    // ...
    }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  79. De ning Entities[1]
    /** @Document */
    class Manager extends BaseEmployee
    {
    /** @ReferenceMany
    (targetDocument="Documents\Project") */
    private $projects = array();
    // ...
    }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  80. De ning Entities[1]
    /** @EmbeddedDocument */
    class Address
    {
    /** @String */
    private $address;
    /** @String */
    private $city;
    /** @String */
    private $state;
    /** @String */
    private $zipcode;
    // ...
    }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  81. De ning Entities[1]
    /** @Document */
    class Project
    {
    /** @Id */
    private $id;
    /** @String */
    private $name;
    public function __construct($name)
    {
    $this->name = $name;
    }
    // ...
    }
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  82. Persisting Entities[1]
    $employee = new Employee();
    $employee->setName('Employee');
    $employee->setSalary(50000.00);
    $employee->setStarted(new \DateTime());
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  83. Persisting Entities[1]
    $address = new Address();
    $address->setAddress('555 Doctrine Rd.');
    $address->setCity('Nashville');
    $address->setState('TN');
    $address->setZipcode('37209');
    $employee->setAddress($address);
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  84. Persisting Entities[1]
    $project = new Project('New Project');
    $manager = new Manager();
    $manager->setName('Manager');
    $manager->setSalary(100000.00);
    $manager->setStarted(new \DateTime());
    $manager->addProject($project);
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  85. Persisting Entities[1]
    // $dm is an instance of Doctrine\ODM\MongoDB
    \DocumentManager
    $dm->persist($employee);
    $dm->persist($address);
    $dm->persist($project);
    $dm->persist($manager);
    $dm->flush();
    1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

    View Slide

  86. Querying an Entity
    // $dm is an instance of Doctrine\ODM\MongoDB
    \DocumentManager
    $manager = $dm->find("Documents\Manager",
    $theID);

    View Slide

  87. Final Thoughts

    View Slide

  88. Document databases have no tables (and therefore no columns)
    Indexes (views) are queried directly, instead of being used to
    optimize more generalized queries
    Result set columns can contain a mix of logical data types
    No built-in concept of relationships between documents
    Related entities can be embedded in a document, referenced from
    a document, or both
    Document Databases Compared
    to Relational Databases

    View Slide

  89. Caveats
    No referential integrity
    No atomic transactions across document boundaries
    Some patterns may involve denormalized (i.e. redundant) data
    Data inconsistencies are inevitable (i.e. eventual consistency)
    Consider the implications of replication—what may seem
    consistent with one database may not be consistent across nodes
    (e.g. referencing entities that don’t yet exist on the node)

    View Slide

  90. Additional Techniques
    Use the startkey and endkey parameters to retrieve one entity and
    its related entities:
    startkey=["9781565925809"]&endkey=["9781565925809",{}]
    De ne a reduce function and use grouping levels
    Use UUIDs rather than natural keys for better performance
    Use the bulk document API when writing Relationship Documents
    When using the List of Keys or Relationship Documents patterns,
    denormalize data so that you can have data from the “right” and
    “left” side of the relationship within your query results

    View Slide

  91. Cheat Sheet
    One to Many
    Many to Many
    <= N* Relations
    > N* Relations
    Embedded
    Entities
    Related
    Documents
    List of Keys
    Relationship
    Documents
    ✓ ✓
    ✓ ✓
    ✓ ✓
    ✓ ✓
    * where N is a large number for your system

    View Slide

  92. http://oreilly.com/catalog/9781449303129/ http://oreilly.com/catalog/9781449303433/

    View Slide

  93. Thank You
    @BradleyHolt
    http://bradley-holt.com
    https://joind.in/7040
    Copyright © 2011-2012 Bradley Holt. All rights reserved.

    View Slide