Data Modeling with MongoDB

Yulia Genkina, Curriculum Engineer @ MongoDB

Agenda
• Key Considerations
• Linking vs. Embedding
• Design Patterns
• Use Case Example
• Conclusion

Let's Compare: RDBMS approach to data modeling vs. MongoDB

Modeling for RDBMS
Step 1: Define the Schema
Step 2: Develop the application and queries
Concerns: ? ?

Data Modeling with MongoDB
• Develop the Application
• Define the Data Model
• Improve the Data Model
• Improve the Application

• Designed for the usage pattern
• Data model evolution is easy
• Can evolve without any downtime
• Many design options
Cycle: Improve the Data Model, Improve the Application

Key Considerations for Data Modeling with MongoDB

There Is No Magic Formula, but There Is a Method
• Data model is defined at the application level
• Design is part of each phase of the application lifetime
• What affects the data model:
  o The data that your application needs
  o Application's read and write usage of the data

Data Modeling Methodology: To Achieve a Near Magic Almost Formula

Step-by-step Iteration
1. Evaluate the application workload
• Data size
• A list of operations ranked by importance

• Data size
• Database queries and indexes
• Current operations and assumptions

✓ Business domain expertise
✓ Current and predicted scenarios
✓ Production logs and stats

Step-by-step Iteration
1. Evaluate the application workload
• Data size
• A list of operations ranked by importance
2. Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)

• Data size
• Database queries and indexes
• Current operations and assumptions

• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Link vs. Embed: Which is the Right Decision and What Does it Mean?

What Can Be Linked?
Relationships:
• One-to-one
• One-to-many
• Many-to-many
Example: Entities and relationships in a Blog
• users: name, email
• articles: title, date, text
• tags: name, url
• categories: name, url
• comments: name, url
[Diagram: the entities are connected by 1-to-N and N-to-N relationships]

One-to-One Linked
// either side can track
Book = {
  "_id": 1,
  "title": "Harry Potter and the Methods of Rationality",
  "slug": "9781857150193-hpmor",
  "author": 1,
  // more fields follow…
}
Author = {
  "_id": 1,
  "firstName": "Eliezer",
  "lastName": "Yudkowsky",
  "book": 1,
  // more fields follow…
}

One-to-One Embedded
Book = {
  "_id": 1,
  "title": "Harry Potter and the Methods of Rationality",
  "slug": "9781857150193-hpmor",
  "author": {
    "firstName": "Eliezer",
    "lastName": "Yudkowsky"
  },
  // more fields follow…
}

One-to-Many: Array in Parent
Author = {
  "_id": 1,
  "firstName": "Eliezer",
  "lastName": "Yudkowsky",
  "books": [1, 5, 17],
  // more fields follow…
}

One-to-Many: Scalar in Child
Book1 = {
  "_id": 1,
  "title": "Harry Potter and the Methods of Rationality",
  "slug": "9781857150193-hpmor",
  "author": 1,
  // more fields follow…
}
Book2 = {
  "_id": 5,
  "title": "How to Actually Change Your Mind",
  "slug": "1939311179490-how-to-change",
  "author": 1,
  // more fields follow…
}
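The two one-to-many layouts differ in which side has to be read to find an author's books. A minimal sketch in plain JavaScript, using in-memory objects rather than a database; the function names and the `books` collection name in the comment are assumptions made for illustration:

```javascript
// Hypothetical in-memory data mirroring the documents above (not a database).
const author = { _id: 1, firstName: "Eliezer", lastName: "Yudkowsky", books: [1, 5, 17] };
const books = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality", author: 1 },
  { _id: 5, title: "How to Actually Change Your Mind", author: 1 },
];

// Array in parent: the author document already lists its book ids.
function bookIdsFromParent(authorDoc) {
  return authorDoc.books;
}

// Scalar in child: scan the books for a matching author id,
// roughly what a query such as db.books.find({ author: 1 }) would do.
function bookIdsFromChildren(bookDocs, authorId) {
  return bookDocs.filter((b) => b.author === authorId).map((b) => b._id);
}
```

With the array in the parent, one read of the author is enough to get the ids. With the scalar in the child, the books have to be queried by author, which in a real deployment would usually call for an index on that field.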

Many-to-Many: Arrays on either side
// either side can track
Book = {
  "_id": 5,
  "title": "Harry Potter and the Methods of Rationality",
  "slug": "9781857150193-hpmor",
  "authors": [1, 3],
  // more fields follow…
}
Author = {
  "_id": 1,
  "firstName": "Eliezer",
  "lastName": "Yudkowsky",
  "books": [5, 7],
  // more fields follow…
}

Embed All vs. Embed & Link
Embed All (queries by articles)
• articles: title, date, text
  o tags[]: name, url
  o categories[]: name, url
  o comments[]: name, url
  o users: name, email
Embed & Link (queries by articles or users)
• articles: title, date, text
  o tags[]: name, url
  o categories[]: name, url
  o comments[]: name, url
• users: name, email (linked to articles, 1-to-N)

To Link or Embed?
• How often does the embedded information get accessed?
• Is the data queried using the embedded information?
• Does the embedded information change often?
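These questions can be made concrete with a small in-memory sketch. It is plain JavaScript with no database involved; the `authors` map and the helper names are hypothetical and only illustrate the extra lookup that a link implies:

```javascript
// Hypothetical in-memory stand-in for an authors collection (not a MongoDB API).
const authors = new Map([
  [1, { _id: 1, firstName: "Eliezer", lastName: "Yudkowsky" }],
]);

// Linked: the book stores only the author's _id.
const linkedBook = {
  _id: 1,
  title: "Harry Potter and the Methods of Rationality",
  author: 1,
};

// Embedded: the book carries its own copy of the author fields.
const embeddedBook = {
  _id: 1,
  title: "Harry Potter and the Methods of Rationality",
  author: { firstName: "Eliezer", lastName: "Yudkowsky" },
};

// A linked author needs a second lookup; an embedded one is already there.
function resolveAuthor(book) {
  return typeof book.author === "object" ? book.author : authors.get(book.author);
}

function authorLastName(book) {
  return resolveAuthor(book).lastName;
}
```

Both calls, `authorLastName(linkedBook)` and `authorLastName(embeddedBook)`, return the same name. The embedded form answers in one read but keeps a copy that must be updated wherever it appears if the author changes, which is why the change frequency of the embedded information matters.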

Step-by-step Iteration
1. Evaluate the application workload
• Data size
• A list of operations ranked by importance
2. Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)
3. Finalize the data model for each collection
• Identify and apply relevant design patterns

• Collections, with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections

• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Design Patterns: Brief introduction

The Schema Versioning Pattern
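The slide gives only the pattern's name. As commonly described, the Schema Versioning Pattern stores a version marker in each document so that application code can read old and new document shapes side by side. A minimal sketch, assuming a `schema_version` field and a hypothetical change from a single `author` to an `authors` array (neither is taken from the talk):

```javascript
// Returns the author ids of a book regardless of which document shape it uses.
// Assumption: documents without a schema_version field are treated as version 1.
function authorIds(book) {
  switch (book.schema_version ?? 1) {
    case 1:
      // Version 1 (hypothetical): a single author id in "author".
      return [book.author];
    case 2:
      // Version 2 (hypothetical): an array of author ids in "authors".
      return book.authors;
    default:
      throw new Error(`Unknown schema_version: ${book.schema_version}`);
  }
}

const oldBook = { _id: 1, author: 1 };
const newBook = { _id: 5, schema_version: 2, authors: [1, 3] };
```

Because both shapes stay readable, documents can be migrated gradually, or only when they are next written, rather than all at once.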

The Bucket Pattern
• Tabular Approach: new document for each sensor reading
• Document Approach: new document per time unit per sensor

The Bucket Pattern
• Really benefits from the document model
• Used to store small, related data items
  o Bank Transactions: related by account and date
  o IoT Readings: related by sensor and date
• Reduces index sizes by a large magnitude
• Increases speed of retrieval of related data
• Enables the Computed Pattern

The Bucket Pattern Implementation
sensor = 5, value = 22, time = new Date("2020-05-11")
db.iot.updateOne(
  // match a bucket for this sensor that still has room
  { "sensor": sensor, "valcount": { "$lt": 200 } },
  {
    "$push": { "readings": { "v": value, "t": time } },
    "$inc": { "valcount": 1 }
  },
  // create a new bucket when none matches
  { upsert: true }
)
Resulting document:
{
  "_id": ObjectId("abcd12340101"),
  "sensor": 5,
  "valcount": 3,
  "readings": [
    { "v": 11, "t": ISODate("2020-05-09") },
    { "v": 81, "t": ISODate("2020-05-10") },
    { "v": 22, "t": ISODate("2020-05-11") }
  ]
}
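The effect of the filter plus `upsert: true` can be followed without a database. The sketch below is an in-memory imitation in plain JavaScript, not the MongoDB driver; `addReading` and the `buckets` array are hypothetical names introduced for illustration:

```javascript
const BUCKET_LIMIT = 200; // same threshold as the "$lt": 200 filter above

// Imitates the updateOne call against an array that stands in for the collection.
function addReading(buckets, reading) {
  // Filter step: a bucket for this sensor that still has room.
  let bucket = buckets.find(
    (b) => b.sensor === reading.sensor && b.valcount < BUCKET_LIMIT
  );
  if (!bucket) {
    // Upsert step: no bucket matched, so a new one is started.
    bucket = { sensor: reading.sensor, valcount: 0, readings: [] };
    buckets.push(bucket);
  }
  bucket.readings.push({ v: reading.value, t: reading.time }); // like $push
  bucket.valcount += 1; // like $inc
  return bucket;
}

// Usage: 201 readings for one sensor fill one bucket and start a second.
const buckets = [];
for (let i = 0; i < 201; i++) {
  addReading(buckets, { sensor: 5, value: i, time: new Date("2020-05-11") });
}
```

Once a bucket reaches 200 readings it no longer matches the filter, so the next write creates a fresh bucket. That is what keeps each document bounded in size.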

The Computed Pattern [figure label: CPU work]

The Computed Pattern
"Never recompute what you can precompute"
• Reads are often more common than writes
• When updating the database, update some summary records too
• Can be thought of as a caching pattern
• Compute on write is less work than compute on read

Computed Pattern with the Bucket Pattern
sensor = 5, value = 22, time = new Date("2020-05-11")
db.iot.updateOne(
  { "sensor": sensor, "valcount": { "$lt": 200 } },
  {
    "$push": { "readings": { "v": value, "t": time } },
    // keep a running count and a running total alongside the readings
    "$inc": { "valcount": 1, "tot": value }
  },
  { upsert: true }
)
Resulting document:
{
  "_id": ObjectId("abcd12340101"),
  "sensor": 5,
  "valcount": 3,
  "tot": 114,
  "readings": [
    { "v": 11, "t": ISODate("2020-05-09") },
    { "v": 81, "t": ISODate("2020-05-10") },
    { "v": 22, "t": ISODate("2020-05-11") }
  ]
}
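With `tot` and `valcount` maintained on every write, an average no longer requires walking the readings. A minimal sketch in plain JavaScript over a document shaped like the one above; the two function names are hypothetical:

```javascript
// Bucket document shaped like the example above (timestamps omitted for brevity).
const bucket = {
  sensor: 5,
  valcount: 3,
  tot: 114,
  readings: [{ v: 11 }, { v: 81 }, { v: 22 }],
};

// Compute on read: touches every reading each time the average is needed.
function averageOnRead(b) {
  return b.readings.reduce((sum, r) => sum + r.v, 0) / b.readings.length;
}

// Computed Pattern: reuses the totals that were updated at write time.
function averageFromTotals(b) {
  return b.tot / b.valcount;
}
```

Both functions give 38 for this bucket (114 / 3), but the second does constant work no matter how many readings the bucket holds.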

Other Patterns and Where To Find Them (Learning)
MongoDB Blog, MongoDB Developer Portal and MongoDB University are all great resources to continue learning about data modeling and patterns.
Design Patterns: Elements of Reusable Object-Oriented Software (a book!)
Other talks at this conference:
• Advanced Schema Design Patterns
• A Complete Methodology to Data Modeling
• Using JSON Schema to Save Lives
• Attribute Pattern and the Wildcard Index: Is the Attribute Pattern Obsolete?

A Use Case Example: Design an Online Shopping App (MongoMart)

Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance

• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections

Evaluate the Application Workload
1000 stores
• 50 employees per store
• 1 store lookup per customer per year
10 Million items
• 100 reviews per item
• 500 thousand updates per day
100 Million user accounts
• 500 thousand new accounts per week
• Logging in 20 times a year
• Looking up 100 items per year
• Creating 5 carts per year
• Reviewing 2 items per year
Carts
• Placing 4 items in the cart
• Buying an average of 2 items per cart
Analytics
• 10 data scientists each running 10 queries a day
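The per-user figures above translate into rough operation rates. The sketch below is a back-of-the-envelope calculation in plain JavaScript. It assumes activity is spread evenly over the year, so it gives averages rather than peaks, and it reads "placing 4 items in the cart" as 4 items per created cart; both are assumptions, not statements from the talk.

```javascript
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000
const USERS = 100e6; // 100 Million user accounts

// Average operations per second for an event that each user performs
// eventsPerUserPerYear times a year.
function perSecond(eventsPerUserPerYear) {
  return (USERS * eventsPerUserPerYear) / SECONDS_PER_YEAR;
}

const rates = {
  logins: perSecond(20),
  itemLookups: perSecond(100),
  cartsCreated: perSecond(5),
  itemsAddedToCart: perSecond(5 * 4), // assumed: 4 items for each of 5 carts
  reviews: perSecond(2),
};

for (const [name, rate] of Object.entries(rates)) {
  console.log(`${name}: ${rate.toFixed(1)} per second`);
}
```

Item lookups dominate at roughly 317 per second on average, about five times the login or add-to-cart rate, which fits with viewing an item being ranked among the most important queries.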

Workload Evaluation Summary
Most important queries
• r2: user views a specific item (has to be under 1 ms)
• w3: user adds item to cart (write concern: majority)
Required indexes
• {"category": 1, "item_name": 1}
• {"category": 1, "item_name": 1, "price": 1}
• {"username": 1}
• and more
Assumptions and Projections
• Data will be stored for a maximum of 5 years
• Number of items sold and number of users will double each year
List of Entities:
• carts
• categories
• items
• reviews
• staff
• stores
• users
• views
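The doubling projection compounds quickly. A minimal sketch in plain JavaScript, assuming the 100 Million user accounts from the workload as the starting point and a five-year horizon chosen to match the retention period; the horizon and the `projectDoubling` helper are assumptions made for illustration:

```javascript
// Values for year 0 through `years`, doubling once per year.
function projectDoubling(initial, years) {
  return Array.from({ length: years + 1 }, (_, year) => initial * 2 ** year);
}

const userProjection = projectDoubling(100e6, 5);

userProjection.forEach((users, year) => {
  console.log(`year ${year}: ${users / 1e6} million users`);
});
```

Under these assumptions the user count grows 32-fold, from 100 million to 3.2 billion after five years, so a model that is comfortable at today's size still needs to be checked against the projected one.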

Step-by-step Iteration
1. Evaluate the application workload
• Data size
• A list of operations ranked by importance
2. Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)

• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

• Collections, with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections

Entity Relationship Diagram
[Diagram: carts, views, items, reviews, users, staff and stores, connected by 1-to-N and N-to-N relationships]

Collections Relationship Diagram (Simple): Embed Everything!
[Diagram: items, categories, stores, reviews, users, views, carts and staff, with 1-to-N and N-to-N relationships]

Collections Relationship Diagram (Better): Embed & Link!
Accommodate for assumptions.
[Diagram: items, categories, stores, reviews, users, views, carts and staff, with 1-to-N and N-to-N relationships]

Step-by-step Iteration
1. Evaluate the application workload
• Data size
• A list of operations ranked by importance
2. Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)
3. Finalize the data model for each collection
• Identify and apply relevant schema patterns

• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

• Collections, with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections

Apply all the Patterns!
Patterns Used:
• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference

Conclusion: And additional considerations

Your Data Model Will Evolve
Just like your application
Small team, Medium team, Large team, Very big team

Tailor the Data Model
To your unique setup
Small team, Medium team, Large team, Very big team
• Shared hosted DB
• Small team
• Large Sharded Cluster
• Replica Set

Flexible Data Modeling Approach
For a simpler data model, focus on:
• Evaluate the application workload: the most frequent operation
• Map out the entities and their relationships: embedding data
• Finalize schema for each collection: use few patterns
For a bit of both:
• Evaluate the application workload: data size; the most frequent operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary
For the most performant data model, focus on:
• Evaluate the application workload: data size; the most frequent operations; the most important operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary

Visit our product "booths" for new features, like the new
Schema Advisor in Atlas! mongodb.com/live/product #MDBlive

Special Thanks to: John Page, Daniel Coupal, Eoin Brazil for
excellent content support #MDBlive
