Data Modeling with MongoDB

Quick-start to data modeling with MongoDB. Find the actual talk here: https://www.youtube.com/watch?v=3GHZd0zv170

Yulia Genkina

April 07, 2021

Transcript

  1. Concerns Modeling for RDBMS

    Step 1: Define the Schema
    Step 2: Develop the application and queries
  5. Data Modeling with MongoDB

    Develop the Application
    Define the Data Model
    Improve the Data Model
    Improve the Application
  6. Designed for the usage pattern

    Data model evolution is easy
    Can evolve without any downtime
    Many design options
    Improve the Data Model
    Improve the Application
  7. There Is No Magic Formula, but There Is a Method

    Data model is defined at the application level
    Design is part of each phase of the application lifetime
    What affects the data model:
    • The data that your application needs
    • Application's read and write usage of the data
  8. Step-by-step Iteration

    Evaluate the application workload
    • Data size
    • A list of operations ranked by importance

    • Data size
    • Database queries and indexes
    • Current operations and assumptions

    ✓ Business domain expertise
    ✓ Current and predicted scenarios
    ✓ Production logs and stats
  9. Step-by-step Iteration

    Evaluate the application workload
    • Data size
    • A list of operations ranked by importance
    Map out entities and their relationships
    • CRD: Collection Relationship Diagram (Link or Embed?)

    • Data size
    • Database queries and indexes
    • Current operations and assumptions

    • Business domain expertise
    • Current and predicted scenarios
    • Production logs and stats
  10. What Can Be Linked?

    Relationships:
    • One-to-one
    • One-to-many
    • Many-to-many

    Example: Entities and relationships in a Blog
    • users: name, email
    • articles: title, date, text
    • tags: name, url
    • categories: name, url
    • comments: name, url
    [Diagram: the entities are connected by three 1-to-N and two N-to-N relationships]
  11. One-to-One Linked

    Book = { // either side can track
        "_id": 1,
        "title": "Harry Potter and the Methods of Rationality",
        "slug": "9781857150193-hpmor",
        "author": 1,
        // more fields follow…
    }

    Author = {
        "_id": 1,
        "firstName": "Eliezer",
        "lastName": "Yudkowsky",
        "book": 1,
        // more fields follow…
    }
  12. One-to-One Embedded

    Book = {
        "_id": 1,
        "title": "Harry Potter and the Methods of Rationality",
        "slug": "9781857150193-hpmor",
        "author": {
            "firstName": "Eliezer",
            "lastName": "Yudkowsky"
        },
        // more fields follow…
    }
  13. One-to-Many: Array in Parent

    Author = {
        "_id": 1,
        "firstName": "Eliezer",
        "lastName": "Yudkowsky",
        "books": [1, 5, 17],
        // more fields follow…
    }
  14. One-to-Many: Scalar in Child

    Book1 = {
        "_id": 1,
        "title": "Harry Potter and the Methods of Rationality",
        "slug": "9781857150193-hpmor",
        "author": 1,
        // more fields follow…
    }

    Book2 = {
        "_id": 5,
        "title": "How to Actually Change Your Mind",
        "slug": "1939311179490-how-to-change",
        "author": 1,
        // more fields follow…
    }
  15. Many-to-Many: Arrays on either side

    Book = { // either side can track
        "_id": 5,
        "title": "Harry Potter and the Methods of Rationality",
        "slug": "9781857150193-hpmor",
        "authors": [1, 3],
        // more fields follow…
    }

    Author = {
        "_id": 1,
        "firstName": "Eliezer",
        "lastName": "Yudkowsky",
        "books": [5, 7],
        // more fields follow…
    }
  16. Queries by articles or users: Embed & Link

    articles: title, date, text
    • tags[]: name, url
    • categories[]: name, url
    • comments[]: name, url
    users: name, email (1-to-N)

    Queries by articles: Embed All

    articles: title, date, text
    • tags[]: name, url
    • categories[]: name, url
    • comments[]: name, url
    • users: name, email
  17. To Link or Embed?

    • How often does the embedded information get accessed?
    • Is the data queried using the embedded information?
    • Does the embedded information change often?
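
The trade-off behind these questions can be sketched with plain JavaScript objects standing in for collections. This is a minimal illustration, not MongoDB API code: the arrays and the helper names (`authorNameLinked`, `authorNameEmbedded`, `renameAuthorEmbedded`) are hypothetical, and the documents reuse the Book/Author shapes from the earlier slides.

```javascript
// Hypothetical in-memory stand-in for an authors collection (not a MongoDB call).
const authors = [{ _id: 1, firstName: "Eliezer", lastName: "Yudkowsky" }];

// Linked: the book stores only the author's _id.
const linkedBook = { _id: 1, title: "Harry Potter and the Methods of Rationality", author: 1 };

// Embedded: each book carries its own copy of the author's fields.
const embeddedBooks = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality",
    author: { firstName: "Eliezer", lastName: "Yudkowsky" } },
  { _id: 5, title: "How to Actually Change Your Mind",
    author: { firstName: "Eliezer", lastName: "Yudkowsky" } },
];

// Reading a linked author needs a second lookup by _id.
function authorNameLinked(book, authorCollection) {
  const author = authorCollection.find((a) => a._id === book.author);
  return author ? `${author.firstName} ${author.lastName}` : null;
}

// Reading an embedded author needs no extra lookup.
function authorNameEmbedded(book) {
  return `${book.author.firstName} ${book.author.lastName}`;
}

// Changing embedded information means touching every document that holds a copy.
function renameAuthorEmbedded(books, oldLastName, newLastName) {
  let touched = 0;
  for (const book of books) {
    if (book.author.lastName === oldLastName) {
      book.author.lastName = newLastName;
      touched += 1;
    }
  }
  return touched;
}
```

Frequently read, rarely changed information favors the embedded shape (one read, no second lookup). Information that changes often favors the link, because an update touches one author document instead of every book that holds a copy.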
  18. Step-by-step Iteration

    Evaluate the application workload
    • Data size
    • A list of operations ranked by importance
    Map out entities and their relationships
    • CRD: Collection Relationship Diagram (Link or Embed?)
    Finalize the data model for each collection
    • Identify and apply relevant design patterns

    • Collections with documents fields and shapes for each
    • Data size
    • Database queries and indexes
    • Current operations, assumptions, and growth projections

    • Business domain expertise
    • Current and predicted scenarios
    • Production logs and stats
  19. The Bucket Pattern

    Tabular Approach: new document for each sensor reading
    Document Approach: new document per time unit per sensor
  20. The Bucket Pattern

    Really benefits from the document model
    Used to store small, related data items
    • Bank transactions: related by account and date
    • IoT readings: related by sensor and date
    Reduces index sizes by a large magnitude
    Increases speed of retrieval of related data
    Enables the Computed Pattern
  21. The Bucket Pattern Implementation

    sensor = 5, value = 22, time = new Date("2020-05-11")

    // Append the reading to a bucket for this sensor that holds fewer than
    // 200 readings; upsert creates a new bucket when none matches.
    db.iot.updateOne(
        { "sensor": sensor, "valcount": { "$lt": 200 } },
        {
            "$push": { "readings": { "v": value, "t": time } },
            "$inc": { "valcount": 1 }
        },
        { upsert: true }
    )

    // Resulting bucket document
    {
        "_id": ObjectId("abcd12340101"),
        "sensor": 5,
        "valcount": 3,
        "readings": [
            { "v": 11, "t": new Date("2020-05-09") },
            { "v": 81, "t": new Date("2020-05-10") },
            { "v": 22, "t": new Date("2020-05-11") }
        ]
    }
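
The behavior of that upsert can be traced without a database. The sketch below imitates it on a plain array, assuming one bucket per sensor with a cap of 200 readings as in the slide's filter. The `addReading` helper and the `buckets` array are hypothetical stand-ins, not MongoDB calls.

```javascript
const BUCKET_SIZE = 200; // same cap as the "$lt": 200 filter on the slide

// Hypothetical in-memory imitation of the updateOne(..., { upsert: true }) call.
function addReading(buckets, sensor, value, time) {
  // The filter: a bucket for this sensor that still has room.
  let bucket = buckets.find((b) => b.sensor === sensor && b.valcount < BUCKET_SIZE);
  if (!bucket) {
    // No match, so the upsert creates a fresh bucket for the sensor.
    bucket = { sensor, valcount: 0, readings: [] };
    buckets.push(bucket);
  }
  bucket.readings.push({ v: value, t: time }); // "$push"
  bucket.valcount += 1;                        // "$inc"
  return bucket;
}

// Usage: 201 readings for sensor 5 fill one bucket and start a second.
const buckets = [];
for (let i = 0; i < 201; i++) {
  addReading(buckets, 5, i, new Date("2020-05-11"));
}
console.log(buckets.length); // 2
```

Once a bucket reaches 200 readings it no longer matches the filter, so the next write starts a new document. That is what keeps each document bounded in size.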
  22. The Computed Pattern

    "Never recompute what you can precompute"
    Reads are often more common than writes
    When updating the database, update some summary records too
    Can be thought of as a caching pattern
    Compute on write is less work than compute on read
  23. Computed Pattern with the Bucket Pattern

    sensor = 5, value = 22, time = new Date("2020-05-11")

    // Same bucket upsert as before, but also keep a running total ("tot")
    // so the sum never has to be recomputed on read.
    db.iot.updateOne(
        { "sensor": sensor, "valcount": { "$lt": 200 } },
        {
            "$push": { "readings": { "v": value, "t": time } },
            "$inc": { "valcount": 1, "tot": value }
        },
        { upsert: true }
    )

    // Resulting bucket document
    {
        "_id": ObjectId("abcd12340101"),
        "sensor": 5,
        "valcount": 3,
        "tot": 114,
        "readings": [
            { "v": 11, "t": new Date("2020-05-09") },
            { "v": 81, "t": new Date("2020-05-10") },
            { "v": 22, "t": new Date("2020-05-11") }
        ]
    }
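
With `tot` and `valcount` kept up to date on every write, a reader can get the bucket average from two stored fields instead of scanning the `readings` array. A minimal sketch on the document from the slide follows. The helper names are hypothetical and the `_id` field is omitted for brevity.

```javascript
// The bucket document from the slide (without _id).
const bucket = {
  sensor: 5,
  valcount: 3,
  tot: 114,
  readings: [
    { v: 11, t: new Date("2020-05-09") },
    { v: 81, t: new Date("2020-05-10") },
    { v: 22, t: new Date("2020-05-11") },
  ],
};

// Compute on read using the precomputed fields: constant work per bucket.
function averageFromSummary(b) {
  return b.valcount > 0 ? b.tot / b.valcount : null;
}

// Compute on read without the summary: walks every reading.
function averageFromReadings(b) {
  if (b.readings.length === 0) return null;
  const sum = b.readings.reduce((acc, r) => acc + r.v, 0);
  return sum / b.readings.length;
}

console.log(averageFromSummary(bucket));  // 38
console.log(averageFromReadings(bucket)); // 38
```

Both paths return the same value here (11 + 81 + 22 = 114, and 114 / 3 = 38). The difference is that the summary path does not grow with the number of readings in the bucket.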
  24. Other Patterns and Where To Find Them

    MongoDB Blog, MongoDB Developer Portal and MongoDB University are all great resources to continue learning about data modeling and patterns.
    Design Patterns: Elements of Reusable Object-Oriented Software (a book!)
    Other talks at this conference:
    • Advanced Schema Design Patterns
    • A Complete Methodology to Data Modeling
    • Using JSON Schema to Save Lives
    • Attribute Pattern and the Wildcard Index: Is the Attribute Pattern Obsolete?
  25. Step 1

    Evaluate the application workload
    • Data size
    • A list of operations ranked by importance

    • Business domain expertise
    • Current and predicted scenarios
    • Production logs and stats

    • Data size
    • Database queries and indexes
    • Current operations, assumptions, and growth projections
  26. Evaluate the Application Workload

    • 1000 stores
    • 10 million items
    • 100 million user accounts
    • Analytics
    • 500 thousand new accounts per week
    • Logging in 20 times a year
    • Looking up 100 items per year
    • Creating 5 carts per year
    • Reviewing 2 items per year
    • 50 employees per store
    • 1 store lookup per customer per year
    • 100 reviews per item
    • 500 thousand updates per day
    • Placing 4 items in the cart
    • Buying an average of 2 items per cart
    • 10 data scientists each running 10 queries a day
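
Figures like these become easier to rank once they are converted to average operations per second. The sketch below assumes that the per-year figures (logins, item lookups, carts, reviews) are per user account, which the slide does not state explicitly. The variable and function names are illustrative only.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;
const SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY;

const userAccounts = 100e6; // 100 million user accounts (from the slide)

// Assumption: each figure is "times per user account per year".
const perUserPerYear = {
  logins: 20,
  itemLookups: 100,
  cartsCreated: 5,
  reviews: 2,
};

// Average rate over a year; real traffic will have peaks above this.
function averageOpsPerSecond(count, timesPerYear) {
  return (count * timesPerYear) / SECONDS_PER_YEAR;
}

const ratesPerSecond = Object.fromEntries(
  Object.entries(perUserPerYear).map(([name, n]) => [
    name,
    averageOpsPerSecond(userAccounts, n),
  ])
);

// 500 thousand updates per day, expressed per second.
const updatesPerSecond = 500e3 / SECONDS_PER_DAY;

console.log(ratesPerSecond, updatesPerSecond);
```

Under that assumption item lookups dominate, at roughly 317 per second on average, followed by logins at roughly 63 per second. This is consistent with the deck treating "user views a specific item" as one of the most important queries.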
  27. Workload Evaluation Summary

    Most important queries
    • r2: user views a specific item (has to be under 1 ms)
    • w3: user adds item to cart (write concern: majority)
    Required indexes
    • {"category": 1, "item_name": 1}
    • {"category": 1, "item_name": 1, "price": 1}
    • {"username": 1}
    and more…
    Assumptions and Projections
    • Data will be stored for a maximum of 5 years
    • Number of items sold and number of users will double each year
    List of Entities:
    • carts
    • categories
    • items
    • reviews
    • staff
    • stores
    • users
    • views
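
When reviewing a list of required indexes, it helps to know that a MongoDB compound index can also serve queries on a leading prefix of its fields. The first index above is such a prefix of the second. The sketch below is a hypothetical helper for spotting that in a list of index specifications. It is plain JavaScript, not a MongoDB API, and whether the shorter index should actually be dropped is a judgment the deck does not make.

```javascript
// Returns true when every field of `shorter` appears, in the same order and
// direction, at the start of `longer`. Relies on JavaScript preserving the
// insertion order of string keys.
function isIndexPrefix(shorter, longer) {
  const a = Object.entries(shorter);
  const b = Object.entries(longer);
  if (a.length > b.length) return false;
  return a.every(([field, dir], i) => b[i][0] === field && b[i][1] === dir);
}

// Index specifications copied from the slide.
const byCategoryName = { category: 1, item_name: 1 };
const byCategoryNamePrice = { category: 1, item_name: 1, price: 1 };
const byUsername = { username: 1 };

console.log(isIndexPrefix(byCategoryName, byCategoryNamePrice)); // true
console.log(isIndexPrefix(byUsername, byCategoryNamePrice));     // false
```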
  28. Step-by-step Iteration

    Evaluate the application workload
    • Data size
    • A list of operations ranked by importance
    Map out entities and their relationships
    • CRD: Collection Relationship Diagram (Link or Embed?)

    • Business domain expertise
    • Current and predicted scenarios
    • Production logs and stats

    • Collections with documents fields and shapes for each
    • Data size
    • Database queries and indexes
    • Current operations, assumptions, and growth projections
  29. Entity Relationship Diagram

    [Diagram: users, carts, views, items, reviews, staff and stores, connected by 1-to-N and N-to-N relationships]
  30. Collections Relationship Diagram (Better)

    [Diagram: items, categories, stores, reviews, users, views, carts and staff, connected by 1-to-N and N-to-N relationships]
    Accommodate for assumptions. Embed & Link!
  31. Step-by-step Iteration

    Evaluate the application workload
    • Data size
    • A list of operations ranked by importance
    Map out entities and their relationships
    • CRD: Collection Relationship Diagram (Link or Embed?)
    Finalize the data model for each collection
    • Identify and apply relevant schema patterns

    • Business domain expertise
    • Current and predicted scenarios
    • Production logs and stats

    • Collections with documents fields and shapes for each
    • Data size
    • Database queries and indexes
    • Current operations, assumptions, and growth projections
  32. Apply all the Patterns!

    Patterns Used:
    • Schema Versioning
    • Subset
    • Computed
    • Bucket
    • Extended Reference
  33. Your Data Model Will Evolve

    Just like your application
    Small team
    Medium team
    Large team
    Very big team
  34. Tailor the Data Model

    To your unique setup
    Small team
    Medium team
    Large team
    Very big team
    • Shared hosted DB
    • Small team
    • Large Sharded Cluster
    • Replica Set
  35. Flexible Data Modeling Approach

    For a simpler data model, focus on:
    • Evaluate the application workload: the most frequent operation
    • Map out the entities and their relationships: embedding data
    • Finalize schema for each collection: use few patterns

    For a bit of both:
    • Evaluate the application workload: data size, the most frequent operations
    • Map out the entities and their relationships: embedding and linking data
    • Finalize schema for each collection: use as many patterns as necessary

    For the most performant data model, focus on:
    • Evaluate the application workload: data size, the most frequent operations, the most important operations
    • Map out the entities and their relationships: embedding and linking data
    • Finalize schema for each collection: use as many patterns as necessary
  36. Visit our product "booths" for new features, like the new Schema Advisor in Atlas!

    mongodb.com/live/product
    #MDBlive