Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Entity Relationships in a Document Database at CouchConf Boston

Entity Relationships in a Document Database at CouchConf Boston

Unlike relational databases, document databases like CouchDB and Couchbase do not directly support entity relationships. This talk will explore patterns of modeling one-to-many and many-to-many entity relationships in a document database. These patterns include using an embedded JSON array, relating documents using identifiers, using a list of keys, and using relationship documents.

Bradley Holt

May 15, 2012
Tweet

More Decks by Bradley Holt

Other Decks in Technology

Transcript

  1. Entity Relationships in
    a Document Database
    MapReduce Views for SQL Users

    View Slide

  2. Entity:
    An object de ned by its identity
    and a thread of continuity[1]
    1. "Entity" Domain-driven Design Community .

    View Slide

  3. Entity
    Relationship
    Model

    View Slide

  4. Join vs. Collation

    View Slide

  5. SELECT
    `publisher`.`id`,
    `publisher`.`name`,
    `book`.`title`
    FROM `publisher`
    FULL OUTER JOIN `book`
    ON `publisher`.`id` = `book`.`publisher_id`
    ORDER BY
    `publisher`.`id`,
    `book`.`title`;
    SQL Query Joining
    Publishers and Books

    View Slide

  6. Joined Result Set
    publisher.id publisher.name book.title
    oreilly O'Reilly Media
    Building iPhone Apps with
    HTML, CSS, and JavaScript
    oreilly O'Reilly Media
    CouchDB: The Definitive
    Guide
    oreilly O'Reilly Media
    DocBook: The Definitive
    Guide
    oreilly O'Reilly Media RESTful Web Services

    View Slide

  7. Joined Result Set
    publisher.id publisher.name book.title
    oreilly O'Reilly Media
    Building iPhone Apps with
    HTML, CSS, and JavaScript
    oreilly O'Reilly Media
    CouchDB: The Definitive
    Guide
    oreilly O'Reilly Media
    DocBook: The Definitive
    Guide
    oreilly O'Reilly Media RESTful Web Services
    Publisher (“left”)

    View Slide

  8. Joined Result Set
    publisher.id publisher.name book.title
    oreilly O'Reilly Media
    Building iPhone Apps with
    HTML, CSS, and JavaScript
    oreilly O'Reilly Media
    CouchDB: The Definitive
    Guide
    oreilly O'Reilly Media
    DocBook: The Definitive
    Guide
    oreilly O'Reilly Media RESTful Web Services
    Publisher (“left”) Book “right”

    View Slide

  9. Collated Result Set
    key id value
    ["oreilly",0] "oreilly" "O'Reilly Media"
    ["oreilly",1] "oreilly"
    "Building iPhone Apps with
    HTML, CSS, and JavaScript"
    ["oreilly",1] "oreilly"
    "CouchDB: The Definitive
    Guide"
    ["oreilly",1] "oreilly"
    "DocBook: The Definitive
    Guide"
    ["oreilly",1] "oreilly" "RESTful Web Services"

    View Slide

  10. Collated Result Set
    key id value
    ["oreilly",0] "oreilly" "O'Reilly Media"
    ["oreilly",1] "oreilly"
    "Building iPhone Apps with
    HTML, CSS, and JavaScript"
    ["oreilly",1] "oreilly"
    "CouchDB: The Definitive
    Guide"
    ["oreilly",1] "oreilly"
    "DocBook: The Definitive
    Guide"
    ["oreilly",1] "oreilly" "RESTful Web Services"
    Publisher

    View Slide

  11. Collated Result Set
    key id value
    ["oreilly",0] "oreilly" "O'Reilly Media"
    ["oreilly",1] "oreilly"
    "Building iPhone Apps with
    HTML, CSS, and JavaScript"
    ["oreilly",1] "oreilly"
    "CouchDB: The Definitive
    Guide"
    ["oreilly",1] "oreilly"
    "DocBook: The Definitive
    Guide"
    ["oreilly",1] "oreilly" "RESTful Web Services"
    Publisher
    Books

    View Slide

  12. View Result Sets
    Made up of columns and rows
    Every row has the same three columns:
    • key
    • id
    • value
    Columns can contain a mixture of logical data types

    View Slide

  13. One to Many Relationships

    View Slide

  14. Embedded Entities:
    Nest related entities within a document

    View Slide

  15. A single document represents the “one” entity
    Nested entities (JSON Array) represents the “many” entities
    Simplest way to create a one to many relationship
    Embedded Entities

    View Slide

  16. Example: Publisher
    with Nested Books
    {
    "_id":"oreilly",
    "collection":"publisher",
    "name":"O'Reilly Media",
    "books":[
    { "title":"CouchDB: The Definitive Guide" },
    { "title":"RESTful Web Services" },
    { "title":"DocBook: The Definitive Guide" },
    { "title":"Building iPhone Apps with HTML, CSS,
    and JavaScript" }
    ]
    }

    View Slide

  17. function(doc) {
    if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
    for (var i in doc.books) {
    emit([doc._id, 1], doc.books[i].title);
    }
    }
    }
    Map Function

    View Slide

  18. Result Set
    key id value
    ["oreilly",0] "oreilly" "O'Reilly Media"
    ["oreilly",1] "oreilly"
    "Building iPhone Apps with
    HTML, CSS, and JavaScript"
    ["oreilly",1] "oreilly"
    "CouchDB: The Definitive
    Guide"
    ["oreilly",1] "oreilly"
    "DocBook: The Definitive
    Guide"
    ["oreilly",1] "oreilly" "RESTful Web Services"

    View Slide

  19. Only works if there aren’t a large number of related entities:
    • Too many nested entities can result in very large documents
    • Slow to transfer between client and server
    • Unwieldy to modify
    • Time-consuming to index
    Limitations

    View Slide

  20. Related Documents:
    Reference an entity by its identi er

    View Slide

  21. A document representing the “one” entity
    Separate documents for each “many” entity
    Each “many” entity references its related
    “one” entity by the “one” entity’s document identi er
    Makes for smaller documents
    Reduces the probability of document update con icts
    Related Documents

    View Slide

  22. Example: Publisher
    {
    "_id":"oreilly",
    "collection":"publisher",
    "name":"O'Reilly Media"
    }

    View Slide

  23. Example: Related Book
    {
    "_id":"9780596155896",
    "collection":"book",
    "title":"CouchDB: The Definitive Guide",
    "publisher":"oreilly"
    }

    View Slide

  24. Map Function
    function(doc) {
    if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
    }
    if ("book" == doc.collection) {
    emit([doc.publisher, 1], doc.title);
    }
    }

    View Slide

  25. Result Set
    key id value
    ["oreilly",0] "oreilly" "O'Reilly Media"
    ["oreilly",1] "9780596155896"
    "CouchDB: The Definitive
    Guide"
    ["oreilly",1] "9780596529260" "RESTful Web Services"
    ["oreilly",1] "9780596805791"
    "Building iPhone Apps with
    HTML, CSS, and JavaScript"
    ["oreilly",1] "9781565925809"
    "DocBook: The Definitive
    Guide"

    View Slide

  26. When retrieving the entity on the “right” side of the relationship,
    one cannot include any data from the entity on the “left” side of
    the relationship without the use of an additional query
    Only works for one to many relationships
    Limitations

    View Slide

  27. Many to Many Relationships

    View Slide

  28. List of Keys:
    Reference entities by their identi ers

    View Slide

  29. A document representing each “many” entity on the “left” side
    of the relationship
    Separate documents for each “many” entity on the “right” side
    of the relationship
    Each “many” entity on the “right” side of the relationship
    maintains a list of document identi ers for its related “many”
    entities on the “left” side of the relationship
    List of Keys

    View Slide

  30. Books and Related Authors

    View Slide

  31. Example: Book
    {
    "_id":"9780596805029",
    "collection":"book",
    "title":"DocBook 5: The Definitive Guide"
    }

    View Slide

  32. Example: Book
    {
    "_id":"9781565920514",
    "collection":"book",
    "title":"Making TeX Work"
    }

    View Slide

  33. Example: Book
    {
    "_id":"9781565925809",
    "collection":"book",
    "title":"DocBook: The Definitive Guide"
    }

    View Slide

  34. Example: Author
    {
    "_id":"muellner",
    "collection":"author",
    "name":"Leonard Muellner",
    "books":[
    "9781565925809"
    ]
    }

    View Slide

  35. Example: Author
    {
    "_id":"walsh",
    "collection":"author",
    "name":"Norman Walsh",
    "books":[
    "9780596805029",
    "9781565925809",
    "9781565920514"
    ]
    }

    View Slide

  36. Map Function
    function(doc) {
    if ("book" == doc.collection) {
    emit([doc._id, 0], doc.title);
    }
    if ("author" == doc.collection) {
    for (var i in doc.books) {
    emit([doc.books[i], 1], doc.name);
    }
    }
    }

    View Slide

  37. Result Set
    key id value
    ["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"
    ["9780596805029",1] "walsh" "Norman Walsh"
    ["9781565920514",0] "9781565920514" "Making TeX Work"
    ["9781565920514",1] "walsh" "Norman Walsh"
    ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"
    ["9781565925809",1] "muellner" "Leonard Muellner"
    ["9781565925809",1] "walsh" "Norman Walsh"

    View Slide

  38. Authors and Related Books

    View Slide

  39. function(doc) {
    if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
    for (var i in doc.books) {
    emit([doc._id, 1], {"_id":doc.books[i]});
    }
    }
    }
    Map Function

    View Slide

  40. Result Set
    key id value
    ["muellner",0] "muellner" "Leonard Muellner"
    ["muellner",1] "muellner" {"_id":"9781565925809"}
    ["walsh",0] "walsh" "Norman Walsh"
    ["walsh",1] "walsh" {"_id":"9780596805029"}
    ["walsh",1] "walsh" {"_id":"9781565920514"}
    ["walsh",1] "walsh" {"_id":"9781565925809"}

    View Slide

  41. Including Docs
    include_docs=true
    key id value doc (truncated)
    ["muellner",0] "muellner" … {"name":"Leonard Muellner"}
    ["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"}
    ["walsh",0] "walsh" … {"name":"Norman Walsh"}
    ["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"}
    ["walsh",1] "walsh" … {"title":"Making TeX Work"}
    ["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}

    View Slide

  42. Or, we can reverse the references…

    View Slide

  43. Example: Author
    {
    "_id":"muellner",
    "collection":"author",
    "name":"Leonard Muellner"
    }

    View Slide

  44. Example: Author
    {
    "_id":"walsh",
    "collection":"author",
    "name":"Norman Walsh"
    }

    View Slide

  45. Example: Book
    {
    "_id":"9780596805029",
    "collection":"book",
    "title":"DocBook 5: The Definitive Guide",
    "authors":[
    "walsh"
    ]
    }

    View Slide

  46. Example: Book
    {
    "_id":"9781565920514",
    "collection":"book",
    "title":"Making TeX Work",
    "authors":[
    "walsh"
    ]
    }

    View Slide

  47. Example: Book
    {
    "_id":"9781565925809",
    "collection":"book",
    "title":"DocBook: The Definitive Guide",
    "authors":[
    "muellner",
    "walsh"
    ]
    }

    View Slide

  48. Map Function
    function(doc) {
    if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
    }
    if ("book" == doc.collection) {
    for (var i in doc.authors) {
    emit([doc.authors[i], 1], doc.title);
    }
    }
    }

    View Slide

  49. Result Set
    key id value
    ["muellner",0] "muellner" "Leonard Muellner"
    ["muellner",1] "9781565925809" "DocBook: The Definitive Guide"
    ["walsh",0] "walsh" "Norman Walsh"
    ["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide"
    ["walsh",1] "9781565920514" "Making TeX Work"
    ["walsh",1] "9781565925809" "DocBook: The Definitive Guide"

    View Slide

  50. Queries from the “right” side of the relationship cannot include
    any data from entities on the “left” side of the relationship
    (without the use of include_docs)
    A document representing an entity with lots of relationships
    could become quite large
    Limitations

    View Slide

  51. Relationship Documents:
    Create a document to represent each
    individual relationship

    View Slide

  52. A document representing each “many” entity on the “left” side
    of the relationship
    Separate documents for each “many” entity on the “right” side
    of the relationship
    Neither the “left” nor “right” side of the relationship contain any
    direct references to each other
    For each distinct relationship, a separate document includes the
    document identi ers for both the “left” and “right” sides of the
    relationship
    Relationship Documents

    View Slide

  53. Example: Book
    {
    "_id":"9780596805029",
    "collection":"book",
    "title":"DocBook 5: The Definitive Guide"
    }

    View Slide

  54. Example: Book
    {
    "_id":"9781565920514",
    "collection":"book",
    "title":"Making TeX Work"
    }

    View Slide

  55. Example: Book
    {
    "_id":"9781565925809",
    "collection":"book",
    "title":"DocBook: The Definitive Guide"
    }

    View Slide

  56. Example: Author
    {
    "_id":"muellner",
    "collection":"author",
    "name":"Leonard Muellner"
    }

    View Slide

  57. Example: Author
    {
    "_id":"walsh",
    "collection":"author",
    "name":"Norman Walsh"
    }

    View Slide

  58. Example:
    Relationship Document
    {
    "_id":"44005f2c",
    "collection":"book-author",
    "book":"9780596805029",
    "author":"walsh"
    }

    View Slide

  59. Example:
    Relationship Document
    {
    "_id":"44005f72",
    "collection":"book-author",
    "book":"9781565920514",
    "author":"walsh"
    }

    View Slide

  60. Example:
    Relationship Document
    {
    "_id":"44006720",
    "collection":"book-author",
    "book":"9781565925809",
    "author":"muellner"
    }

    View Slide

  61. Example:
    Relationship Document
    {
    "_id":"44006b0d",
    "collection":"book-author",
    "book":"9781565925809",
    "author":"walsh"
    }

    View Slide

  62. Books and Related Authors

    View Slide

  63. function(doc) {
    if ("book" == doc.collection) {
    emit([doc._id, 0], doc.title);
    }
    if ("book-author" == doc.collection) {
    emit([doc.book, 1], {"_id":doc.author});
    }
    }
    Map Function

    View Slide

  64. Result Set
    key id value
    ["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"
    ["9780596805029",1] "44005f2c" {"_id":"walsh"}
    ["9781565920514",0] "9781565920514" "Making TeX Work"
    ["9781565920514",1] "44005f72" {"_id":"walsh"}
    ["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"
    ["9781565925809",1] "44006720" {"_id":"muellner"}
    ["9781565925809",1] "44006b0d" {"_id":"walsh"}

    View Slide

  65. Including Docs
    include_docs=true
    key id value doc (truncated)
    ["9780596805029",0] … … {"title":"DocBook 5: The Definitive Guide"}
    ["9780596805029",1] … … {"name":"Norman Walsh"}
    ["9781565920514",0] … … {"title":"Making TeX Work"}
    ["9781565920514",1] … … {"author","name":"Norman Walsh"}
    ["9781565925809",0] … … {"title":"DocBook: The Definitive Guide"}
    ["9781565925809",1] … … {"name":"Leonard Muellner"}
    ["9781565925809",1] … … {"name":"Norman Walsh"}

    View Slide

  66. Authors and Related Books

    View Slide

  67. function(doc) {
    if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
    }
    if ("book-author" == doc.collection) {
    emit([doc.author, 1], {"_id":doc.book});
    }
    }
    Map Function

    View Slide

  68. Result Set
    key id value
    ["muellner",0] "muellner" "Leonard Muellner"
    ["muellner",1] "44006720" {"_id":"9781565925809"}
    ["walsh",0] "walsh" "Norman Walsh"
    ["walsh",1] "44005f2c" {"_id":"9780596805029"}
    ["walsh",1] "44005f72" {"_id":"9781565920514"}
    ["walsh",1] "44006b0d" {"_id":"9781565925809"}

    View Slide

  69. Including Docs
    include_docs=true
    key id value doc (truncated)
    ["muellner",0] … … {"name":"Leonard Muellner"}
    ["muellner",1] … … {"title":"DocBook: The Definitive Guide"}
    ["walsh",0] … … {"name":"Norman Walsh"}
    ["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"}
    ["walsh",1] … … {"title":"Making TeX Work"}
    ["walsh",1] … … {"title":"DocBook: The Definitive Guide"}

    View Slide

  70. Queries can only contain data from the “left” or “right” side of the
    relationship (without the use of include_docs)
    Maintaining relationship documents may require more work
    Limitations

    View Slide

  71. Final Thoughts

    View Slide

  72. Document databases have no tables (and therefore no columns)
    Indexes (views) are queried directly, instead of being used to
    optimize more generalized queries
    Result set columns can contain a mix of logical data types
    No built-in concept of relationships between documents
    Related entities can be embedded in a document, referenced from
    a document, or both
    Document Databases Compared
    to Relational Databases

    View Slide

  73. Caveats
    No referential integrity
    No atomic transactions across document boundaries
    Some patterns may involve denormalized (i.e. redundant) data
    Data inconsistencies are inevitable (i.e. eventual consistency)
    Consider the implications of replication—what may seem
    consistent with one database may not be consistent across nodes
    (e.g. referencing entities that don’t yet exist on the node)

    View Slide

  74. Additional Techniques
    Use the startkey and endkey parameters to retrieve one entity and
    its related entities:
    startkey=["9781565925809"]&endkey=["9781565925809",{}]
    De ne a reduce function and use grouping levels
    Use UUIDs rather than natural keys for better performance
    Use the bulk document API when writing Relationship Documents
    When using the List of Keys or Relationship Documents patterns,
    denormalize data so that you can have data from the “right” and
    “left” side of the relationship within your query results

    View Slide

  75. Cheat Sheet
    One to Many
    Many to Many
    <= N* Relations
    > N* Relations
    Embedded
    Entities
    Related
    Documents
    List of Keys
    Relationship
    Documents
    ✓ ✓
    ✓ ✓
    ✓ ✓
    ✓ ✓
    * where N is a large number for your system

    View Slide

  76. http://oreilly.com/catalog/9781449303129/ http://oreilly.com/catalog/9781449303433/

    View Slide

  77. Thank You
    @BradleyHolt
    http://bradley-holt.com
    [email protected]
    Copyright © 2011-2012 Bradley Holt. All rights reserved.

    View Slide