DOs and DON’Ts of MongoDB

Slide 1

DOs AND DON'Ts
Jeremy Mikola (jmikola)

Slide 2

TOPICS
Schema Design
Write Operations
Reading and Querying
Replication and Sharding
Deployment and Ops
Object Document Mappers

Slide 3

SCHEMA DESIGN

Slide 4

Making do without joins
References, embedded objects, or both?
Don’t be afraid to denormalize.

Slide 5

Data Locality
{ _id: "jmikola", name: "Jeremy Mikola", friends: [ "bjori", "derickr" ] }
vs.
{
  _id: "jmikola",
  name: "Jeremy Mikola",
  friends: [
    { id: "bjori", name: "Hannes Magnusson" },
    { id: "derickr", name: "Derick Rethans" }
  ]
}
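To make the locality trade-off concrete, here is a plain JavaScript sketch (no driver involved). Rendering a friends list from the embedded form needs only the user document, while the reference form needs a second lookup for the friend IDs. The `usersById` map is a hypothetical stand-in for that second query (for example `{ _id: { $in: friendIds } }`), and the helper names are illustrative.

```javascript
// Reference form: friend IDs only; names must be fetched separately.
const referenced = { _id: "jmikola", name: "Jeremy Mikola", friends: ["bjori", "derickr"] };

// Hypothetical stand-in for the users collection (in practice, a second
// query such as db.users.find({ _id: { $in: referenced.friends } })).
const usersById = new Map([
  ["bjori", { _id: "bjori", name: "Hannes Magnusson" }],
  ["derickr", { _id: "derickr", name: "Derick Rethans" }],
]);

// Denormalize: copy each friend's name into the user document so that
// a single read returns everything needed to render the friends list.
function embedFriendNames(user, lookup) {
  return {
    ...user,
    friends: user.friends.map((id) => ({ id, name: lookup.get(id).name })),
  };
}

// Reading names from the embedded form needs no further lookups.
function friendNames(embeddedUser) {
  return embeddedUser.friends.map((f) => f.name);
}

const embedded = embedFriendNames(referenced, usersById);
console.log(friendNames(embedded)); // [ 'Hannes Magnusson', 'Derick Rethans' ]
```

The embedded form has a price: if a friend changes their name, every document that embeds it must be updated too. That is the usual cost of denormalizing.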

Slide 6

Store computed values for querying
Counts or array lengths can be indexed and sorted.
They are easily updated with $inc and $set.
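As an example, a hypothetical `friendCount` field can be kept in step with the `friends` array by combining operators in one update. MongoDB applies that update atomically to the single document. The sketch only builds the filter and update documents (plain objects you could pass to `updateOne()`). `friendCount` is an assumed field name and is not part of the original schema.

```javascript
// Build an update that appends a friend and bumps the stored count together.
// The filter's $ne guard skips the update if the friend is already present,
// so repeated requests cannot make the count drift from the array length.
function addFriendUpdate(userId, friendId) {
  return {
    filter: { _id: userId, friends: { $ne: friendId } },
    update: { $push: { friends: friendId }, $inc: { friendCount: 1 } },
  };
}

// An index such as { friendCount: -1 } could then serve "users sorted by
// number of friends" without computing array lengths at query time.
const { filter, update } = addFriendUpdate("jmikola", "pgodel");
console.log(JSON.stringify(update));
```

If you pass these to `db.users.updateOne(filter, update)`, both fields change together or neither changes.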

Slide 7

Don’t create field paths willy-nilly
> db.messages.findOne({}, { isReadByParticipant: 1 })
{
  _id: ObjectId("4fce28482516ed983884b158"),
  isReadByParticipant: {
    "4fce05e42516ed9838756f17": false,
    "4fce05e42516ed9838756f18": true,
    "4fce05e42516ed9838756f19": true,
    "4fce05e42516ed9838756f1a": false,
    "4fce05e42516ed9838756f1b": false
  }
}
How can we index this?

Slide 8

Multi-key indexing to the rescue
> db.messages.findOne({}, { unreadForParticipants: 1 })
{
  _id: ObjectId("4fce28482516ed983884b158"),
  unreadForParticipants: [
    "4fce05e42516ed9838756f17",
    "4fce05e42516ed9838756f1a",
    "4fce05e42516ed9838756f1b"
  ]
}
A tidbit from FOSMessageBundle
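Migrating from the first shape to the second is mechanical: keep the keys whose value is `false`. A minimal plain-JavaScript sketch follows, with a hypothetical helper name. Once the data is stored as an array, a single multi-key index on `unreadForParticipants` serves lookups for any participant, because MongoDB indexes each array element separately.

```javascript
// Convert { participantId: isRead } into an array of unread participant IDs.
function toUnreadList(isReadByParticipant) {
  return Object.entries(isReadByParticipant)
    .filter(([, isRead]) => !isRead)
    .map(([participantId]) => participantId);
}

const isReadByParticipant = {
  "4fce05e42516ed9838756f17": false,
  "4fce05e42516ed9838756f18": true,
  "4fce05e42516ed9838756f19": true,
  "4fce05e42516ed9838756f1a": false,
  "4fce05e42516ed9838756f1b": false,
};

const unreadForParticipants = toUnreadList(isReadByParticipant);

// Index and query shapes for the new field (plain objects, e.g. for
// createIndex() and find()); equality on an array field matches any element.
const indexSpec = { unreadForParticipants: 1 };
const unreadFor = (participantId) => ({ unreadForParticipants: participantId });

console.log(unreadForParticipants);
```

Marking a message as read for one participant then becomes `{ $pull: { unreadForParticipants: participantId } }`.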

Slide 9

Multi-key indexing for EAV
{ _id: "product-1", size: "large", color: "blue" }
vs.
{
  _id: "product-1",
  attributes: [
    { k: "size", v: "large" },
    { k: "color", v: "blue" }
  ]
}
or
{ _id: "product-1", attributes: [ "size=large", "color=blue" ] }
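Below is a sketch that converts the flat shape into both entity-attribute-value forms, along with the query and index shapes that make them worthwhile. The helper names are hypothetical. The `{ k, v }` form needs `$elemMatch` so that `k` and `v` are matched within the same array element, and a compound multi-key index on `attributes.k` and `attributes.v` supports that query. The `"k=v"` string form gives this up in exchange for a plain equality match on a single-field multi-key index.

```javascript
// Turn every non-_id field into an attribute entry.
function toKeyValueAttributes(doc) {
  const { _id, ...rest } = doc;
  return { _id, attributes: Object.entries(rest).map(([k, v]) => ({ k, v })) };
}

function toStringAttributes(doc) {
  const { _id, ...rest } = doc;
  return { _id, attributes: Object.entries(rest).map(([k, v]) => `${k}=${v}`) };
}

// Query shapes: $elemMatch keeps k and v bound to the same array element.
const kvQuery = (k, v) => ({ attributes: { $elemMatch: { k, v } } });
const stringQuery = (k, v) => ({ attributes: `${k}=${v}` });

// Index shapes for each form.
const kvIndex = { "attributes.k": 1, "attributes.v": 1 };
const stringIndex = { attributes: 1 };

const product = { _id: "product-1", size: "large", color: "blue" };
console.log(JSON.stringify(toKeyValueAttributes(product)));
console.log(JSON.stringify(toStringAttributes(product)));
```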

Slide 10

Field paths and tree structures
> db.trees.findOne()
{
  _id: ObjectId("5966500e01c7635140447bba"),
  name: "Conifer",
  subCategories: [
    {
      name: "Pine",
      subCategories: [
        { name: "Larch" },
        { name: "Spruce" },
        { name: "Douglas Fir" }
      ]
    },
    { name: "Cypress", subCategories: [ … ] }
  ]
}
One document contains the entire family. How can we query this?

Slide 11

A better tree schema
> db.trees.find()
{ _id: "Conifer" }
{ _id: "Pine", parent: "Conifer" }
{ _id: "Larch", parent: "Pine" }
How can we query for an entire branch?
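With parent references, one answer is the `$graphLookup` aggregation stage (MongoDB 3.4+), which follows `parent` links recursively. The first function below only builds that pipeline. The second is a plain-JavaScript model of the same traversal over an in-memory array, included to show what the stage computes. Both are sketches that assume the data forms a tree with no cycles, and the "Spruce" node is a hypothetical addition to the slide's three documents.

```javascript
// Pipeline returning a node plus all of its descendants: each round finds
// documents whose `parent` equals the `_id` of a node found in the last round.
function branchPipeline(rootId) {
  return [
    { $match: { _id: rootId } },
    {
      $graphLookup: {
        from: "trees",
        startWith: "$_id",
        connectFromField: "_id",
        connectToField: "parent",
        as: "descendants",
      },
    },
  ];
}

// In-memory model of the traversal: breadth-first over parent links.
function descendantsOf(rootId, nodes) {
  const found = [];
  let frontier = [rootId];
  while (frontier.length > 0) {
    const children = nodes.filter((n) => frontier.includes(n.parent));
    found.push(...children.map((n) => n._id));
    frontier = children.map((n) => n._id);
  }
  return found;
}

const trees = [
  { _id: "Conifer" },
  { _id: "Pine", parent: "Conifer" },
  { _id: "Larch", parent: "Pine" },
  { _id: "Spruce", parent: "Pine" }, // hypothetical extra node
];

console.log(descendantsOf("Conifer", trees)); // [ 'Pine', 'Larch', 'Spruce' ]
```

Unlike the in-memory model, `$graphLookup` does not guarantee the order of the `descendants` array. An index on `parent` helps each recursive step.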

Slide 12

But wait, there’s more!
> db.trees.find()
{ _id: "Conifer", path: ["Conifer"] }
{ _id: "Pine", path: ["Pine", "Conifer"] }
{ _id: "Larch", path: ["Larch", "Pine", "Conifer"] }
A multi-key index on path allows branch querying.
Relationship querying is possible with single-field index(es).
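Here is a sketch of deriving those `path` arrays from parent references, together with the branch query that the multi-key index answers. It is plain JavaScript with no driver, and the helper names are hypothetical. `matchesBranch` models how an equality query on an array field matches any element, which is what `find({ path: "Pine" })` does on the server.

```javascript
// Walk parent links upward to build [self, parent, grandparent, ...].
function buildPath(id, parentOf) {
  const path = [];
  for (let cur = id; cur !== undefined; cur = parentOf.get(cur)) {
    path.push(cur);
  }
  return path;
}

const parentOf = new Map([
  ["Conifer", undefined],
  ["Pine", "Conifer"],
  ["Larch", "Pine"],
]);

const docs = [...parentOf.keys()].map((id) => ({ _id: id, path: buildPath(id, parentOf) }));

// Server-side, { path: "Pine" } matches any document whose path array
// contains "Pine": the node itself and everything beneath it.
const branchQuery = (id) => ({ path: id });
const matchesBranch = (doc, id) => doc.path.includes(id);

console.log(docs.filter((d) => matchesBranch(d, "Pine")).map((d) => d._id));
// [ 'Pine', 'Larch' ]
```

The other direction needs no query: a node's ancestors are simply the rest of its own `path` array.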

Slide 13

Don’t abuse schema flexibility
Create schemas that support your query patterns.
Then create indexes for those queries.

Slide 14

Make the most of your indexes
Kill 2+ birds with one stone: compound and multi-key indexes.
Mind your read/write ratio.
Ensure query selectivity.
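One way a compound index kills several birds with one stone is that it can serve queries on any prefix of its keys. The function below is a deliberately simplified model of that rule. It handles equality-only queries and ignores sort order, range predicates, and index intersection. It is written in plain JavaScript against a hypothetical index on `{ status: 1, createdAt: -1, userId: 1 }`.

```javascript
// Simplified model: how many leading fields of a compound index can an
// equality query use to bound its index scan? 0 means no usable prefix.
function usablePrefixLength(indexSpec, queryFields) {
  const wanted = new Set(queryFields);
  let n = 0;
  for (const field of Object.keys(indexSpec)) {
    if (!wanted.has(field)) break; // a gap ends the usable prefix
    n += 1;
  }
  return n;
}

// Hypothetical compound index.
const index = { status: 1, createdAt: -1, userId: 1 };

console.log(usablePrefixLength(index, ["status"]));              // 1
console.log(usablePrefixLength(index, ["status", "createdAt"])); // 2
console.log(usablePrefixLength(index, ["userId"]));              // 0
```

Under this model, the single index can replace separate indexes on `{ status: 1 }` and `{ status: 1, createdAt: -1 }`. Each index you drop is one less to maintain on every write, which matters most when the read/write ratio is low.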

Slide 15

Don’t shoot in the dark
Explain your cursors.
Profile slow queries.
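In the shell, you explain a cursor with `db.messages.find(query).explain("executionStats")` and profile slow operations with `db.setProfilingLevel(1, { slowms: 100 })`. A common first check on explain output is whether the winning plan contains a `COLLSCAN` stage. The helper below walks the classic `queryPlanner.winningPlan` tree (`inputStage` / `inputStages`). It is only a sketch: newer server versions can nest the plan differently (for example under `winningPlan.queryPlan`), so treat the document shape as an assumption.

```javascript
// Collect every stage name in an explain() plan tree.
function stageNames(plan) {
  if (!plan) return [];
  const root = plan.queryPlan ?? plan; // some newer servers nest the tree here
  const children = [root.inputStage, ...(root.inputStages ?? [])].filter(Boolean);
  return [root.stage, ...children.flatMap(stageNames)];
}

// True if the winning plan scans the whole collection.
function usesCollectionScan(explainOutput) {
  return stageNames(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}

// Hand-written sample shapes (not real server output).
const indexed = {
  queryPlanner: { winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } } },
};
const unindexed = { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } };

console.log(usesCollectionScan(indexed));   // false
console.log(usesCollectionScan(unindexed)); // true
```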

Slide 16

Further Reading
MongoDB for Analytics – John Nunemaker
Importing OpenStreetMap Data – Derick Rethans
Use Cases – MongoDB manual

Slide 17

GETTING YOUR DATA INTO MONGODB
And keeping it there…

Slide 18

Write Concern
0: No write acknowledgement
1: Write acknowledgement from the primary (the default when this talk was given; MongoDB 5.0 changed the implicit default to "majority")
<n>: Write acknowledgement from n nodes
"majority": Write acknowledgement from the majority of voting nodes (includes journaling)
<tag set>: Write acknowledgement from a node with the given tag set
Additional wtimeout and journal (j) options.

Slide 19

save() is an anti-pattern
document = db.users.findOne({ _id: "jmikola" });
document["friends"].push("pgodel");
db.users.save(document);
Query, modify, and overwrite.

Slide 20

Understanding save()’s syntactic sugar
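Roughly, the sugar works like this. In the legacy shell, `save()` inserts a document that has no `_id`. If the document has an `_id`, `save()` performs an upsert that replaces the whole document with that `_id`. The sketch below expresses this as the explicit CRUD call it stands for. It returns a description rather than touching a database, and the `{ op, ... }` shape is invented for illustration.

```javascript
// Describe which explicit operation a legacy save(doc) call amounts to.
function explainSave(doc) {
  if (doc._id === undefined) {
    return { op: "insertOne", document: doc };
  }
  // Whole-document replacement: any field another client changed since
  // this copy was read is silently overwritten.
  return {
    op: "replaceOne",
    filter: { _id: doc._id },
    replacement: doc,
    options: { upsert: true },
  };
}

console.log(explainSave({ name: "New user" }).op);                   // insertOne
console.log(explainSave({ _id: "jmikola", friends: ["pgodel"] }).op); // replaceOne
```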

Slide 21

Overloaded vs. Explicit Methods
Our CRUD specification drops save() and defines new operations for each legacy method mode:
insert(): insertOne(), insertMany()
update(): updateOne(), updateMany(), replaceOne()
remove(): deleteOne(), deleteMany()

Slide 22

Use atomic operators when possible
document = db.users.findOne({ _id: "jmikola" });
document["friends"].push("pgodel");
db.users.replaceOne({ _id: "jmikola" }, document);
or
document = db.users.findOne({ _id: "jmikola" });
document["friends"].push("pgodel");
db.users.updateOne(
  { _id: "jmikola" },
  { $set: { friends: document["friends"] }}
);
vs.
db.users.updateOne(
  { _id: "jmikola" },
  { $push: { friends: "pgodel" }}
);
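The `$push` version wins because, with read-modify-write, two clients that read the same document can each overwrite the other's change. The toy model below replays that interleaving against an in-memory "collection" (a `Map` standing in for the server) and compares it with applying `$push` on the server side. It is a simulation for illustration, not driver code, and the second friend (`derickr`) is a hypothetical concurrent change.

```javascript
// JSON round-trips give each client its own private copy of the document.
const copy = (doc) => JSON.parse(JSON.stringify(doc));

function readModifyWrite() {
  const users = new Map([["jmikola", { _id: "jmikola", friends: ["bjori"] }]]);
  // Both clients read before either writes.
  const a = copy(users.get("jmikola"));
  const b = copy(users.get("jmikola"));
  a.friends.push("pgodel");
  b.friends.push("derickr");
  users.set(a._id, a); // client A replaces the document
  users.set(b._id, b); // client B replaces it again, discarding A's friend
  return users.get("jmikola").friends;
}

function atomicPush() {
  const users = new Map([["jmikola", { _id: "jmikola", friends: ["bjori"] }]]);
  // Each $push is applied to the current server-side document.
  const push = (id, friend) => users.get(id).friends.push(friend);
  push("jmikola", "pgodel");
  push("jmikola", "derickr");
  return users.get("jmikola").friends;
}

console.log(readModifyWrite()); // [ 'bjori', 'derickr' ]
console.log(atomicPush());      // [ 'bjori', 'pgodel', 'derickr' ]
```

The `$set` variant has the same problem, because it also writes back a client-side copy of the whole array.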

Slide 23

Atomicity in MongoDB
No transactions for multi-document writes (multi-document transactions arrived later, in MongoDB 4.0).
Emulate transactions with two-phase commits.
Single-document updates are atomic.
Can we query and update atomically?

Slide 24

The findAndModify command
Atomically selects and modifies a document in one of three modes:
findOneAndDelete()
findOneAndReplace()
findOneAndUpdate()

Slide 25

Implementing a simple job queue
// Insert a request to borrow a library book
db.loans.insertOne({
  _id: { borrower: "bjori", book: ObjectId("…") },
  approved: false,
  pending: false,
  priority: 1
});

// Mark the highest-priority request as pending
request = db.loans.findOneAndUpdate(
  { pending: false },
  { $set: { pending: true }},
  { returnNewDocument: true, sort: { priority: -1 } }
);
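Here is what `findOneAndUpdate()` does for the queue, modeled in plain JavaScript: pick the highest-priority document with `pending: false`, set `pending: true`, and return the updated document. On the server, that select-and-modify is a single atomic operation, so two workers cannot claim the same request. This single-threaded model shows only the semantics, not the concurrency. An index such as `{ pending: 1, priority: -1 }` (an assumption, not from the slide) would let the server find the next request without scanning. The numeric book IDs are hypothetical placeholders.

```javascript
// In-memory model of findOneAndUpdate(
//   { pending: false }, { $set: { pending: true } },
//   { sort: { priority: -1 }, returnNewDocument: true })
function claimNext(loans) {
  const candidates = loans
    .filter((loan) => loan.pending === false)
    .sort((x, y) => y.priority - x.priority); // highest priority first
  if (candidates.length === 0) return null;   // findOneAndUpdate also returns null
  candidates[0].pending = true;               // mutate the stored document
  return { ...candidates[0] };                // the post-update document
}

// Hypothetical sample requests with placeholder book IDs.
const loans = [
  { _id: { borrower: "bjori", book: 1 }, approved: false, pending: false, priority: 1 },
  { _id: { borrower: "derickr", book: 2 }, approved: false, pending: false, priority: 5 },
];

console.log(claimNext(loans)._id.borrower); // derickr
console.log(claimNext(loans)._id.borrower); // bjori
console.log(claimNext(loans));              // null
```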

Slide 26

Further Reading
How To Write Resilient MongoDB Applications
bit.ly/resilient-applications

Slide 27

READING AND QUERYING

Slide 28

If you remember only two things…
Index your queries.
Know your working set.

Slide 29

Read Preference
"primary": Select the primary (default)
"primaryPreferred": Select the primary if available; fall back to a secondary
"secondary": Select a secondary
"secondaryPreferred": Select a secondary if available; fall back to the primary
"nearest": Select among the nodes with the lowest network latency
Tag sets may be used for more fine-grained selection.

Slide 30

Read Concern
"local": Return the node’s most recent data, which may be rolled back (default)
"majority": Return the node’s most recent data acknowledged by a majority of the replica set
"linearizable": Return the primary’s most recent data written with a "majority" write concern and acknowledged prior to the start of the query (i.e. data that cannot be rolled back if journaled)

Slide 31

JavaScript Evaluation
eval and $where
Would you use eval() in JavaScript?

Slide 32

MapReduce
We’ll make an allowance for JavaScript here.
But try the aggregation framework first.

Slide 33

Aggregation Framework

Slide 34

Aggregation Framework
{
  _id: "His Majesty's Dragon",
  subjects: ["Fantasy", "Historical"],
  published: ISODate("2006-03-28T00:00:00.000Z")
}
▼
db.books.aggregate([
  { $sort: { published: 1 }},
  { $unwind: "$subjects" },
  { $group: {
    _id: "$subjects",
    total: { $sum: 1 },
    firstPublished: { $first: { $year: "$published" }}
  }}
]);
▼
{ _id: "Fantasy", total: 6, firstPublished: 2002 },
{ _id: "Historical", total: 7, firstPublished: 1974 },
{ _id: "World Literature", total: 2, firstPublished: 1995 }
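To see what each stage contributes, here is the same pipeline re-expressed over a plain JavaScript array. It models the semantics only, not how the server executes the pipeline, and the two "Sample Book" documents are made up. The model also shows why the sort must be on `published`: `$first` takes whichever document reaches each group first, so only sorting on the publication date makes `firstPublished` the earliest year.

```javascript
const books = [
  { _id: "His Majesty's Dragon", subjects: ["Fantasy", "Historical"], published: new Date("2006-03-28T00:00:00Z") },
  { _id: "Sample Book A", subjects: ["Historical"], published: new Date("1990-06-01T00:00:00Z") }, // hypothetical
  { _id: "Sample Book B", subjects: ["Fantasy"], published: new Date("2001-09-15T00:00:00Z") },    // hypothetical
];

function subjectStats(docs) {
  // $sort: { published: 1 }
  const sorted = [...docs].sort((a, b) => a.published - b.published);
  // $unwind: "$subjects" (one copy of the book per subject)
  const unwound = sorted.flatMap((b) => b.subjects.map((s) => ({ ...b, subjects: s })));
  // $group by subject with $sum: 1 and $first: { $year: "$published" }
  const groups = new Map();
  for (const doc of unwound) {
    const g = groups.get(doc.subjects);
    if (g) {
      g.total += 1;
    } else {
      groups.set(doc.subjects, {
        _id: doc.subjects,
        total: 1,
        firstPublished: doc.published.getUTCFullYear(), // $year uses UTC by default
      });
    }
  }
  return [...groups.values()];
}

console.log(subjectStats(books));
```

On the server, the order of `$group` output documents is not guaranteed. Add a final `$sort` if order matters.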

Slide 35

"But MongoDB doesn’t do joins!"
orders collection:
{ _id: 1, item: "abc", price: 12, quantity: 2 }
{ _id: 2, item: "jkl", price: 20, quantity: 1 }
{ _id: 3 }
inventory collection:
{ _id: 1, sku: "abc", description: "product 1", instock: 120 }
{ _id: 2, sku: "def", description: "product 2", instock: 80 }
{ _id: 3, sku: "ijk", description: "product 3", instock: 60 }
{ _id: 4, sku: "jkl", description: "product 4", instock: 70 }
{ _id: 5, sku: null, description: "Incomplete" }
{ _id: 6 }

Slide 36

"But MongoDB doesn’t do joins!"
db.orders.aggregate([
  { $lookup: {
    from: "inventory",
    localField: "item",
    foreignField: "sku",
    as: "inventory_docs"
  }}
]);
▼
{
  _id: 1, item: "abc", price: 12, quantity: 2,
  inventory_docs: [
    { _id: 1, sku: "abc", description: "product 1", instock: 120 }
  ]
}
Usable with unsharded collections in the same database (MongoDB 5.1 later allowed a sharded "from" collection).

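`$lookup` performs a left outer equality join. The plain-JavaScript model below mirrors it over the orders and inventory data and exposes one subtlety: a missing `localField` or `foreignField` is compared as `null`. As a result, the order with no `item` matches the inventory documents whose `sku` is `null` or missing. This is a sketch of the semantics only. The `valueOf` helper is hypothetical, and it ignores array-valued fields, which the real stage also handles.

```javascript
const orders = [
  { _id: 1, item: "abc", price: 12, quantity: 2 },
  { _id: 2, item: "jkl", price: 20, quantity: 1 },
  { _id: 3 },
];
const inventory = [
  { _id: 1, sku: "abc", description: "product 1", instock: 120 },
  { _id: 2, sku: "def", description: "product 2", instock: 80 },
  { _id: 3, sku: "ijk", description: "product 3", instock: 60 },
  { _id: 4, sku: "jkl", description: "product 4", instock: 70 },
  { _id: 5, sku: null, description: "Incomplete" },
  { _id: 6 },
];

// Missing fields compare as null, as $lookup does for equality matching.
const valueOf = (doc, field) => (field in doc ? doc[field] : null);

// Left outer join: every local document is kept, with an array of matches.
function lookup(local, foreign, localField, foreignField, as) {
  return local.map((doc) => ({
    ...doc,
    [as]: foreign.filter((f) => valueOf(f, foreignField) === valueOf(doc, localField)),
  }));
}

const joined = lookup(orders, inventory, "item", "sku", "inventory_docs");
console.log(joined.map((o) => o.inventory_docs.map((i) => i._id)));
// [ [ 1 ], [ 4 ], [ 5, 6 ] ]
```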
Slide 37

Limiting Execution Time
maxTimeMS query and command option
Do not rely on client-side socket timeouts: the server keeps executing the operation after the client gives up.

Slide 38

REPLICATION AND SHARDING

Slide 39

Replication vs. Sharding
Replication is the tool for data safety, high availability, and disaster recovery.
Sharding is the tool for scaling a system.

Slide 40

Replication

Slide 41

What does replication do for us?

Slide 42

Replication provides failover recovery

Slide 43

Making the most of replication
Always have an odd number of voting members.
Nodes can be tagged (e.g. by purpose or location).
Take advantage of member priority, hidden members, and delayed members.
Use tags to define custom write concerns.
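An odd number matters because electing a primary needs a strict majority of voting members. Adding a member to an odd-sized set raises the majority without letting the set survive any more failures. The arithmetic, in plain JavaScript:

```javascript
// Votes needed to elect a primary, and how many voting members can be lost
// while a majority can still be formed.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5, 6]) {
  console.log(`${n} voters: majority ${majority(n)}, tolerates ${faultTolerance(n)} down`);
}
// 3 voters: majority 2, tolerates 1 down
// 4 voters: majority 3, tolerates 1 down
// 5 voters: majority 3, tolerates 2 down
// 6 voters: majority 4, tolerates 2 down
```

A fourth voter is one more member to run and keep in sync, but it buys no extra fault tolerance over three.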

Slide 44

Sharding
Not shown: a very tedious deployment process

Slide 45

What does sharding do for us?

Slide 46

Sharding provides horizontal scalability

Slide 47

Sharding
Each shard is a single mongod or replica set that stores a portion of the total data set.
The shard key specifies one or more fields and determines the distribution of documents among the cluster's shards.
MongoDB attempts to keep chunks evenly distributed among the shards.

Slide 48

Select a good shard key
This is the most important decision.
Once a collection is sharded, the shard key and its values are immutable! (Later releases relax this: MongoDB 4.2 allows updating shard key values, and 5.0 added resharding.)

Slide 49

Ranged Sharding

Slide 50

Shard Key Distribution

Slide 51

Hashed Sharding

Slide 52

Right-balanced Access
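Right-balanced access is what a monotonically increasing shard key (a timestamp, or an `ObjectId`, whose leading bytes are a timestamp) produces under ranged sharding. Every new key sorts above all existing keys, so every insert lands in the last chunk. The toy model below counts inserts per chunk for ranged versus hashed placement. The chunk boundaries are hypothetical, and the hash is a simple stand-in (murmur3's 32-bit finalizer), not the function MongoDB's hashed sharding actually uses.

```javascript
// murmur3's 32-bit finalizer: a stand-in hash, NOT MongoDB's hashed-index function.
function mix32(x) {
  x = Math.imul(x ^ (x >>> 16), 0x85ebca6b);
  x = Math.imul(x ^ (x >>> 13), 0xc2b2ae35);
  return (x ^ (x >>> 16)) >>> 0;
}

// Hypothetical ranged chunks: [-inf, 1000), [1000, 2000), [2000, +inf).
const rangedUpperBounds = [1000, 2000, Infinity];
const rangedChunk = (key) => rangedUpperBounds.findIndex((upper) => key < upper);

// Hashed chunks: three equal slices of the 32-bit hash space.
const hashedChunk = (key) => Math.floor((mix32(key) / 2 ** 32) * 3);

function countInserts(chunkOf, keys) {
  const counts = [0, 0, 0];
  for (const key of keys) counts[chunkOf(key)] += 1;
  return counts;
}

// 1,000 inserts with an ever-increasing key (think timestamps).
const newKeys = Array.from({ length: 1000 }, (_, i) => 5000 + i);

console.log("ranged:", countInserts(rangedChunk, newKeys)); // ranged: [ 0, 0, 1000 ]
console.log("hashed:", countInserts(hashedChunk, newKeys));  // spread across all three
```

Hashing spreads the writes, but it turns range queries on the key into scatter-gather queries across shards.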

Slide 53

Random Access

Slide 54

Segmented Access

Slide 55

Zone Sharding
Zones are shard key ranges that can be associated with one or more shards.
Isolate a subset of data to a specific set of shards
Enforce geographic distribution of data
Route data based on hardware/performance

Slide 56

Zone Sharding

Slide 57

DEPLOYMENT AND OPS

Slide 58

Security Checklist
Enable and enforce authentication
Configure role-based access control
Configure TLS/SSL for connections
Limit network exposure
Encrypt and protect database files
Run MongoDB as a dedicated user
Harden server and network configuration

Slide 59

Operations Checklist
Adjust replica set oplog size
Enable journaling for writes
Driver connection pooling (if applicable)
Filesystem choice (XFS and NTFS preferred)
Schedule and test backup processes
Monitor database metrics and hardware
Tweak operating system configuration

Slide 60

Monitoring, Backup, Automation
mongodb.com/cloud/cloud-manager

Slide 61

MongoDB as a Service
atlas.mongodb.com

Slide 62

OBJECT DOCUMENT MAPPERS

Slide 63

ODMs are a great tool
Employ a real document model
Framework and library integration
Accelerate application development
Abstract the database layer
Watch out for that last one. Grok your DB and driver before abstracting them.

Slide 64

The same principles apply
"Essentially the ORM can handle about 80-90% of the mapping problems, but that last chunk always needs careful work by somebody who really understands how a relational database works."
– Martin Fowler, in OrmHate

Slide 65

Be an informed user
Active Record vs. Data Mapper
Are changes written with atomic modifiers?
How are replication and sharding integrated?
How are references handled?
How are embedded documents managed?
Are commands beyond basic CRUD supported?
Is the driver API available if needed?
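The "atomic modifiers" question can be illustrated with a change-tracking (Data Mapper style) ODM. Such an ODM can diff a document's original and current state and send only `$set`/`$unset` for the fields that changed, instead of replacing the whole document. The sketch below is a hypothetical, top-level-only diff in plain JavaScript. Nested values are compared by their JSON form and written whole, and it does not reproduce any particular ODM's implementation.

```javascript
// Compute a minimal update for top-level fields that changed.
function diffToUpdate(original, current) {
  const $set = {};
  const $unset = {};
  for (const [field, value] of Object.entries(current)) {
    if (JSON.stringify(value) !== JSON.stringify(original[field])) {
      $set[field] = value;
    }
  }
  for (const field of Object.keys(original)) {
    if (!(field in current)) $unset[field] = "";
  }
  const update = {};
  if (Object.keys($set).length > 0) update.$set = $set;
  if (Object.keys($unset).length > 0) update.$unset = $unset;
  return update;
}

const before = { _id: "jmikola", name: "Jeremy Mikola", nickname: "jm", friends: ["bjori"] };
const after = { _id: "jmikola", name: "Jeremy Mikola", friends: ["bjori", "pgodel"] };

console.log(JSON.stringify(diffToUpdate(before, after)));
// {"$set":{"friends":["bjori","pgodel"]},"$unset":{"nickname":""}}
```

Even here the changed array is written whole with `$set`. Whether an ODM emits `$push` for appended elements instead is exactly the kind of detail worth checking, since it decides whether concurrent writers can lose updates.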

Slide 66

Thanks!
Questions?

Slide 67

Photo Credits
http://dilbert.com/strips/comic/1996-02-28