
DOs and DON’Ts of MongoDB

Presented November 19, 2017 at Database Camp NYC: http://www.db.camp/2017

Presented July 14, 2017 at OpenWest: https://joind.in/talk/c8846

Presented January 14, 2016 at Ski PHP: https://joind.in/talk/view/16644

Presented May 9, 2014 at OpenWest: https://joind.in/talk/view/11191

Presented March 15, 2013 at Midwest PHP: https://joind.in/talk/view/10542

Presented October 10, 2013 at ZendCon: http://joind.in/talk/view/9101

Presented September 18, 2013 at Web & PHP Conference: https://joind.in/talk/view/8870

Presented February 9, 2013 at Sunshine PHP: https://joind.in/talk/view/8021

Reveal.js presentation published at: http://jmikola.github.com/slides/mongodb_dos_and_donts/

Jeremy Mikola

July 14, 2017

Transcript

  1. TOPICS

    Schema Design, Write Operations, Reading and Querying, Replication and Sharding, Deployment and Ops, Object Document Mappers
  2. Making do without joins

    References, embedded objects, or both? Don’t be afraid to denormalize.
  3. Data Locality

        { _id: "jmikola", name: "Jeremy Mikola", friends: [ "bjori", "derickr" ] }

    vs.

        { _id: "jmikola", name: "Jeremy Mikola",
          friends: [ { id: "bjori", name: "Hannes Magnusson" },
                     { id: "derickr", name: "Derick Rethans" } ] }

    Embedding the friends’ names lets a single read render the friend list, at the cost of updating each copy when a name changes.
  4. Store computed values for querying

    Counts or array lengths stored as fields can be indexed and sorted (a $size query cannot use an index). Keep them current with $inc and $set.
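
A minimal sketch of storing a computed value, in plain JavaScript so it runs without a server. The `friendCount` field, the `addFriendUpdate` and `applyUpdate` helpers, and the sample users are hypothetical; only the shape of the update document (`$push` and `$inc` together in one update) is real MongoDB syntax. Because a single-document update is atomic, the array and its stored count stay in step as long as every write goes through an update like this one.

```javascript
// Hypothetical in-memory sketch: keep a stored friendCount in step with the
// friends array so the count itself can be indexed and sorted.

// Builds a real MongoDB update document; on the server, both operators are
// applied atomically because they target a single document.
function addFriendUpdate(friendId) {
  return { $push: { friends: friendId }, $inc: { friendCount: 1 } };
}

// Toy applier for top-level $push and $inc only (an assumption for the sketch;
// the server performs this work in a real deployment).
function applyUpdate(doc, update) {
  const out = { ...doc };
  for (const [field, value] of Object.entries(update.$push || {})) {
    out[field] = [...(out[field] || []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    out[field] = (out[field] || 0) + amount;
  }
  return out;
}

const users = [
  { _id: "jmikola", friends: ["bjori", "derickr"], friendCount: 2 },
  { _id: "bjori", friends: ["jmikola"], friendCount: 1 },
];
users[1] = applyUpdate(users[1], addFriendUpdate("derickr"));
users[1] = applyUpdate(users[1], addFriendUpdate("pgodel"));

// Sorting on the stored field is what an index such as { friendCount: -1 } would serve.
const byPopularity = [...users].sort((a, b) => b.friendCount - a.friendCount);
console.log(byPopularity.map((u) => `${u._id}: ${u.friendCount}`).join(", "));
```
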
  5. Don’t create field paths willy-nilly

        > db.messages.findOne({}, { isReadByParticipant: 1 })
        {
          _id: ObjectId("4fce28482516ed983884b158"),
          isReadByParticipant: {
            "4fce05e42516ed9838756f17": false,
            "4fce05e42516ed9838756f18": true,
            "4fce05e42516ed9838756f19": true,
            "4fce05e42516ed9838756f1a": false,
            "4fce05e42516ed9838756f1b": false
          }
        }

    How can we index this?
  6. Multi-key indexing to the rescue

        > db.messages.findOne({}, { unreadForParticipants: 1 })
        {
          _id: ObjectId("4fce28482516ed983884b158"),
          unreadForParticipants: [
            "4fce05e42516ed9838756f17",
            "4fce05e42516ed9838756f1a",
            "4fce05e42516ed9838756f1b"
          ]
        }

    A tidbit from FOSMessageBundle
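
A rough sketch of the schema change between the two previous slides, assuming documents shaped like those shown. `toUnreadArray` and `hasUnread` are hypothetical helpers, and `_id` is a plain string here instead of an ObjectId. The array form needs only one multi-key index (e.g. `{ unreadForParticipants: 1 }`), whereas the map form creates a separate field path per participant.

```javascript
// Sketch: turn the per-participant boolean map into the array of unread
// participant IDs shown on the slide. Field names come from the slides.
function toUnreadArray(isReadByParticipant) {
  return Object.entries(isReadByParticipant)
    .filter(([, isRead]) => !isRead) // keep only participants who have not read it
    .map(([participantId]) => participantId);
}

const isReadByParticipant = {
  "4fce05e42516ed9838756f17": false,
  "4fce05e42516ed9838756f18": true,
  "4fce05e42516ed9838756f19": true,
  "4fce05e42516ed9838756f1a": false,
  "4fce05e42516ed9838756f1b": false,
};

const unreadForParticipants = toUnreadArray(isReadByParticipant);
console.log(unreadForParticipants);

// A query like { unreadForParticipants: "<id>" } matches when the array contains
// the value; Array.prototype.includes mimics that equality-in-array match here.
const hasUnread = (doc, participantId) => doc.unreadForParticipants.includes(participantId);
const message = { _id: "4fce28482516ed983884b158", unreadForParticipants };
console.log(hasUnread(message, "4fce05e42516ed9838756f1a"), hasUnread(message, "4fce05e42516ed9838756f18"));
```

With this shape, marking a message read for a participant would be a `$pull` of that ID, and marking it unread an `$addToSet`.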
  7. Multi-key indexing for EAV

        { _id: "product-1", size: "large", color: "blue" }

    vs.

        { _id: "product-1", attributes: [ { k: "size", v: "large" }, { k: "color", v: "blue" } ] }

    or

        { _id: "product-1", attributes: [ "size=large", "color=blue" ] }
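
A small sketch deriving both EAV shapes from the flat document, with hypothetical helper names. For the `{ k, v }` form, a compound multi-key index such as `{ "attributes.k": 1, "attributes.v": 1 }` can serve queries that use `$elemMatch`, which is needed so that `k` and `v` match within the same array element; for the `"key=value"` form, one multi-key index on `attributes` serves plain equality queries. The matching lines below only mimic those queries in memory.

```javascript
// Sketch: derive both EAV shapes from the flat product document on the slide.
function toAttributePairs(doc) {
  return Object.entries(doc)
    .filter(([key]) => key !== "_id") // _id is not an attribute
    .map(([k, v]) => ({ k, v }));
}

function toAttributeStrings(doc) {
  return toAttributePairs(doc).map(({ k, v }) => `${k}=${v}`);
}

const product = { _id: "product-1", size: "large", color: "blue" };
const pairsDoc = { _id: product._id, attributes: toAttributePairs(product) };
const stringsDoc = { _id: product._id, attributes: toAttributeStrings(product) };

// In-memory analogues of the two query styles:
//   { attributes: { $elemMatch: { k: "color", v: "blue" } } }  (pairs)
//   { attributes: "color=blue" }                               (strings)
const pairsMatch = pairsDoc.attributes.some((a) => a.k === "color" && a.v === "blue");
const stringsMatch = stringsDoc.attributes.includes("color=blue");
console.log(JSON.stringify(pairsDoc), JSON.stringify(stringsDoc), pairsMatch, stringsMatch);
```
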
  8. Field paths and tree structures

        > db.trees.findOne()
        {
          _id: ObjectId("5966500e01c7635140447bba"),
          name: "Conifer",
          subCategories: [
            { name: "Pine", subCategories: [ { name: "Larch" }, { name: "Spruce" }, { name: "Douglas Fir" } ] },
            { name: "Cypress", subCategories: [ … ] }
          ]
        }

    One document contains the entire family. How can we query this?
  9. A better tree schema

        > db.trees.find()
        { _id: "Conifer" }
        { _id: "Pine", parent: "Conifer" }
        { _id: "Larch", parent: "Pine" }

    How can we query for an entire branch?
  10. But wait, there’s more!

        > db.trees.find()
        { _id: "Conifer", path: ["Conifer"] }
        { _id: "Pine", path: ["Pine", "Conifer"] }
        { _id: "Larch", path: ["Larch", "Pine", "Conifer"] }

    A multi-key index on path allows branch querying. Relationship querying is possible with single-field index(es).
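
One way to maintain the `path` arrays, sketched in memory under the assumption that each document also keeps the `parent` reference from the previous schema and that the tree has no cycles. `pathOf` is a hypothetical helper; Spruce and Cypress come from the earlier tree slide. The filters stand in for `find({ path: "Pine" })` and `find({ parent: "Pine" })`.

```javascript
// Sketch: derive each node's path (itself first, then ancestors up to the root)
// from parent references, then answer branch and relationship questions.
// Assumes the tree has no cycles.
const nodes = [
  { _id: "Conifer" },
  { _id: "Pine", parent: "Conifer" },
  { _id: "Larch", parent: "Pine" },
  { _id: "Spruce", parent: "Pine" },
  { _id: "Cypress", parent: "Conifer" },
];
const byId = new Map(nodes.map((node) => [node._id, node]));

function pathOf(id) {
  const path = [];
  // Walk parent links until a node has no parent (the root).
  for (let node = byId.get(id); node !== undefined; node = byId.get(node.parent)) {
    path.push(node._id);
  }
  return path;
}

const trees = nodes.map((node) => ({ ...node, path: pathOf(node._id) }));

// Like db.trees.find({ path: "Pine" }) served by a multi-key index on path.
const pineBranch = trees.filter((t) => t.path.includes("Pine")).map((t) => t._id);
// Like db.trees.find({ parent: "Pine" }) served by a single-field index on parent.
const pineChildren = trees.filter((t) => t.parent === "Pine").map((t) => t._id);
// Ancestors of a node are simply the rest of its path.
const larchAncestors = pathOf("Larch").slice(1);

console.log(pineBranch, pineChildren, larchAncestors);
```
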
  11. Don’t abuse schema flexibility

    Create schemas that support your query patterns. Then create indexes for those queries.
  12. Make the most of your indexes

    Kill 2+ birds with one stone: compound and multi-key indexes. Mind your read/write ratio, since every index adds work to each write. Ensure query selectivity.
  13. Don’t shoot in the dark

    Explain your cursors. Profile slow queries.
  14. Further Reading

    - MongoDB for Analytics – John Nunemaker
    - Importing OpenStreetMap Data – Derick Rethans
    - Use Cases – MongoDB manual
  15. GETTING YOUR DATA INTO MONGODB

    And keeping it there…
  16. Write Concern

    - 0: No write acknowledgement
    - 1: Write acknowledgement from the primary (default)
    - <integer>: Write acknowledgement from that many nodes
    - "majority": Write acknowledgement from the majority of voting nodes (includes journaling)
    - <string>: Write acknowledgement from nodes matching the named tag set (a custom write concern)

    Additional wtimeout and journal (j) options.
  17. save() is an anti-pattern

        document = db.users.findOne({ _id: "jmikola" });
        document["friends"].push("pgodel");
        db.users.save(document);

    Query, modify, and overwrite.
  18. Overloaded vs. Explicit Methods

    Our CRUD specification drops save() and defines new operations for each legacy method mode:

    - insert(): insertOne(), insertMany()
    - update(): updateOne(), updateMany(), replaceOne()
    - remove(): deleteOne(), deleteMany()
  19. Use atomic operators when possible

        document = db.users.findOne({ _id: "jmikola" });
        document["friends"].push("pgodel");
        db.users.replaceOne({ _id: "jmikola" }, document);

    or

        document = db.users.findOne({ _id: "jmikola" });
        document["friends"].push("pgodel");
        db.users.updateOne(
          { _id: "jmikola" },
          { $set: { friends: document["friends"] }}
        );

    vs.

        db.users.updateOne(
          { _id: "jmikola" },
          { $push: { friends: "pgodel" }}
        );
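
To make the race concrete, this sketch replays one interleaving of two clients against a hypothetical in-memory collection. `findOne`, `replaceOne`, and `pushAtomic` here are stand-ins, not driver calls: the first two model a read-modify-overwrite, the last models the server applying `$push` to the stored document.

```javascript
// Hypothetical in-memory "collection" contrasting read-modify-write with an
// atomic $push. The interleaving below is replayed deterministically to show
// one ordering that loses an update when two clients race.
const collection = new Map();
const reset = () => collection.set("jmikola", { _id: "jmikola", friends: ["bjori"] });

// Like findOne(): the client receives its own copy of the stored document.
const findOne = (id) => JSON.parse(JSON.stringify(collection.get(id)));
// Like replaceOne(): the stored document is overwritten wholesale.
const replaceOne = (id, doc) => collection.set(id, doc);
// Like updateOne(..., { $push: ... }): the change is applied to the stored document itself.
const pushAtomic = (id, field, value) => collection.get(id)[field].push(value);

reset();
const copyA = findOne("jmikola"); // client A reads
const copyB = findOne("jmikola"); // client B reads the same version
copyA.friends.push("pgodel");
copyB.friends.push("derickr");
replaceOne("jmikola", copyA);
replaceOne("jmikola", copyB); // B's stale copy overwrites A's change
const afterReplace = [...collection.get("jmikola").friends];

reset();
pushAtomic("jmikola", "friends", "pgodel");
pushAtomic("jmikola", "friends", "derickr");
const afterPush = [...collection.get("jmikola").friends];

console.log("replace:", afterReplace, "push:", afterPush);
```

The `$set` variant on the slide has the same problem, since it writes back the whole client-side array.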
  20. Atomicity in MongoDB

    No transactions for multi-document writes. Emulate transactions with two-phase commits. Single-document updates are atomic. Can we query and update atomically?
  21. The findAndModify command

    Atomically selects and modifies a document in one of three modes: findOneAndDelete(), findOneAndReplace(), findOneAndUpdate()
  22. Implementing a simple job queue

        // Insert a request to borrow a library book
        db.loans.insertOne({
          _id: { borrower: "bjori", book: ObjectId("…") },
          approved: false,
          pending: false,
          priority: 1
        });

        // Mark the highest priority request as pending
        request = db.loans.findOneAndUpdate(
          { pending: false },
          { $set: { pending: true }},
          { returnNewDocument: true, sort: { priority: -1 } }
        );
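
An in-memory sketch of the claim step, assuming loans shaped like the slide's document. `claimNextLoan` is a hypothetical stand-in for the `findOneAndUpdate()` call, and the `book` values are placeholder strings rather than ObjectIds. JavaScript's single-threaded execution plays the role of the server's atomic select-and-modify here.

```javascript
// In-memory stand-in for:
//   findOneAndUpdate({ pending: false }, { $set: { pending: true } },
//                    { sort: { priority: -1 }, returnNewDocument: true })
function claimNextLoan(loans) {
  const candidates = loans.filter((loan) => loan.pending === false);
  if (candidates.length === 0) return null; // nothing left to claim
  // Highest priority first, mirroring sort: { priority: -1 }.
  const next = candidates.reduce((best, loan) => (loan.priority > best.priority ? loan : best));
  next.pending = true; // mirrors $set: { pending: true }
  return next; // the updated document, as with returnNewDocument: true
}

// Placeholder book IDs stand in for the slide's ObjectId values.
const loans = [
  { _id: { borrower: "bjori", book: "book-1" }, approved: false, pending: false, priority: 1 },
  { _id: { borrower: "derickr", book: "book-2" }, approved: false, pending: false, priority: 3 },
  { _id: { borrower: "jmikola", book: "book-3" }, approved: false, pending: false, priority: 2 },
];

const first = claimNextLoan(loans);
const second = claimNextLoan(loans);
const third = claimNextLoan(loans);
const none = claimNextLoan(loans);
console.log(first._id.borrower, second._id.borrower, third._id.borrower, none);
```

Because selecting and flagging happen in one atomic step on the server, two workers issuing the same command should never both receive the same request.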
  23. If you remember only two things…

    Index your queries. Know your working set.
  24. Read Preference

    - "primary": Select the primary (default)
    - "primaryPreferred": Select the primary if available; fall back to a secondary
    - "secondary": Select a secondary
    - "secondaryPreferred": Select a secondary if available; fall back to the primary
    - "nearest": Select the node with the least network latency

    Tag sets may be used for more fine-grained selection.
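
A simplified sketch of how the five modes choose a member, assuming hypothetical hosts and measured latencies. Real drivers follow the Server Selection specification, which also applies tag sets, `maxStalenessSeconds`, and a latency window (`localThresholdMS`) and then picks randomly among eligible servers; this sketch just takes the lowest-latency eligible member.

```javascript
// Simplified read preference selection. Not modeled: tag sets, staleness,
// latency windows, and random choice among equally eligible members.
function selectServer(mode, members) {
  const primary = members.find((m) => m.state === "PRIMARY") || null;
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  // Lowest-latency member of a list, or null if the list is empty.
  const fastest = (list) =>
    list.length === 0 ? null : list.reduce((a, b) => (b.latencyMS < a.latencyMS ? b : a));

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || fastest(secondaries);
    case "secondary":
      return fastest(secondaries);
    case "secondaryPreferred":
      return fastest(secondaries) || primary;
    case "nearest":
      return fastest(primary ? [primary, ...secondaries] : secondaries);
    default:
      throw new Error(`unknown read preference mode: ${mode}`);
  }
}

// Hypothetical replica set members.
const members = [
  { host: "db1.example.net:27017", state: "PRIMARY", latencyMS: 40 },
  { host: "db2.example.net:27017", state: "SECONDARY", latencyMS: 5 },
  { host: "db3.example.net:27017", state: "SECONDARY", latencyMS: 15 },
];
const primaryDown = members.filter((m) => m.state !== "PRIMARY");

for (const mode of ["primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"]) {
  const healthy = selectServer(mode, members);
  const degraded = selectServer(mode, primaryDown);
  console.log(mode, healthy && healthy.host, degraded && degraded.host);
}
```
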
  25. Read Concern

    - "local": Return the node’s most recent data, which may be rolled back (default)
    - "majority": Return the node’s most recent data acknowledged by a majority of the replica set
    - "linearizable": Return the primary’s most recent data written with a "majority" write concern and acknowledged prior to the start of the query (i.e. data cannot be rolled back if journaled)
  26. Aggregation Framework

        { _id: "His Majesty's Dragon", subjects: ["Fantasy", "Historical"], published: ISODate("2006-03-28T00:00:00.000Z") }

    ▼

        db.books.aggregate([
          { $sort: { published: 1 }},
          { $unwind: "$subjects" },
          { $group: {
            _id: "$subjects",
            total: { $sum: 1 },
            firstPublished: { $first: { $year: "$published" }}
          }}
        ]);

    ▼

        { _id: "Fantasy", total: 6, firstPublished: 2002 }
        { _id: "Historical", total: 7, firstPublished: 1974 }
        { _id: "World Literature", total: 2, firstPublished: 1995 }
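
To show what each stage does, here is a plain-JavaScript emulation of the pipeline on a few sample books; apart from the first, the titles and dates are hypothetical, so the totals differ from the slide's output. It also shows why the `$sort` must be on `published`: `$first` returns the earliest year only if documents reach `$group` in publication order. `$year` extracts the UTC year, hence `getUTCFullYear()`.

```javascript
// Sample input; only the first book comes from the slide, the others are hypothetical.
const books = [
  { _id: "His Majesty's Dragon", subjects: ["Fantasy", "Historical"], published: new Date("2006-03-28T00:00:00.000Z") },
  { _id: "Hypothetical Book A", subjects: ["Fantasy"], published: new Date("2002-05-01T00:00:00.000Z") },
  { _id: "Hypothetical Book B", subjects: ["Historical", "World Literature"], published: new Date("1974-09-15T00:00:00.000Z") },
];

// { $sort: { published: 1 } }
const sorted = [...books].sort((a, b) => a.published - b.published);

// { $unwind: "$subjects" }: one document per array element, field replaced by that element.
const unwound = sorted.flatMap((book) =>
  book.subjects.map((subject) => ({ ...book, subjects: subject }))
);

// { $group: { _id: "$subjects", total: { $sum: 1 }, firstPublished: { $first: { $year: "$published" } } } }
const groups = new Map();
for (const doc of unwound) {
  const group = groups.get(doc.subjects);
  if (group === undefined) {
    // $first keeps the value from the first document seen, i.e. the earliest after the sort.
    groups.set(doc.subjects, { _id: doc.subjects, total: 1, firstPublished: doc.published.getUTCFullYear() });
  } else {
    group.total += 1;
  }
}
const results = [...groups.values()];
console.log(results);
```

The server does not guarantee the order of `$group` output; a final `$sort` stage would be needed if order matters.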
  27. "But MongoDB doesn’t do joins!"

    orders collection:

        { _id: 1, item: "abc", price: 12, quantity: 2 }
        { _id: 2, item: "jkl", price: 20, quantity: 1 }
        { _id: 3 }

    inventory collection:

        { _id: 1, sku: "abc", description: "product 1", instock: 120 }
        { _id: 2, sku: "def", description: "product 2", instock: 80 }
        { _id: 3, sku: "ijk", description: "product 3", instock: 60 }
        { _id: 4, sku: "jkl", description: "product 4", instock: 70 }
        { _id: 5, sku: null, description: "Incomplete" }
        { _id: 6 }
  28. "But MongoDB doesn’t do joins!"

        db.orders.aggregate([
          { $lookup: {
            from: "inventory",
            localField: "item",
            foreignField: "sku",
            as: "inventory_docs"
          }}
        ]);

    ▼

        { _id: 1, item: "abc", price: 12, quantity: 2,
          inventory_docs: [ { _id: 1, sku: "abc", description: "product 1", instock: 120 } ] }

    The "from" collection must be unsharded and in the same database.
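
A minimal in-memory emulation of the `$lookup` stage above, using the slide's orders and inventory. `lookup` is a hypothetical helper that only handles scalar equality. It treats a missing field like `null`, which reproduces the server's behavior for order 3: it joins with both inventory documents whose `sku` is null or absent.

```javascript
// Sketch of $lookup's equality match as a left outer join over arrays.
// Missing and null values compare equal here, mirroring how $lookup treats a
// missing localField or foreignField; array-valued fields are not handled.
function lookup(docs, foreignDocs, { localField, foreignField, as }) {
  const fieldValue = (doc, field) => (doc[field] === undefined ? null : doc[field]);
  return docs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => fieldValue(f, foreignField) === fieldValue(doc, localField)),
  }));
}

const orders = [
  { _id: 1, item: "abc", price: 12, quantity: 2 },
  { _id: 2, item: "jkl", price: 20, quantity: 1 },
  { _id: 3 },
];
const inventory = [
  { _id: 1, sku: "abc", description: "product 1", instock: 120 },
  { _id: 2, sku: "def", description: "product 2", instock: 80 },
  { _id: 3, sku: "ijk", description: "product 3", instock: 60 },
  { _id: 4, sku: "jkl", description: "product 4", instock: 70 },
  { _id: 5, sku: null, description: "Incomplete" },
  { _id: 6 },
];

const joined = lookup(orders, inventory, { localField: "item", foreignField: "sku", as: "inventory_docs" });
for (const order of joined) {
  console.log(order._id, order.inventory_docs.map((d) => d._id));
}
```
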
  29. Limiting Execution Time

    Use the maxTimeMS query and command option. Do not rely on client-side socket timeouts: the server may keep running an operation after the client gives up on it.
  30. Replication vs. Sharding

    Replication is the tool for data safety, high availability, and disaster recovery. Sharding is the tool for scaling a system.
  31. Making the most of replication

    - Always have an odd number of voting members.
    - Nodes can be tagged (e.g. purpose, location).
    - Take advantage of priority, hidden, and delay.
    - Use tags to define custom write concerns.
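
The arithmetic behind the odd-number advice, as a small sketch: a primary can only be elected, and a `"majority"` write acknowledged, while a majority of voting members is reachable. `majority` and `faultTolerance` are hypothetical helper names.

```javascript
// Votes needed for a majority among n voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be unavailable while a majority remains reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5, 6, 7]) {
  console.log(`${n} voting members: majority ${majority(n)}, fault tolerance ${faultTolerance(n)}`);
}
```

Going from 3 to 4 voting members raises the majority from 2 to 3 without tolerating any additional failure, which is why an even count buys nothing.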
  32. Sharding

    Each shard is a single mongod or replica set that stores a portion of the total data set. The shard key specifies one or more fields and determines the distribution of documents among the cluster's shards. MongoDB attempts to keep chunks evenly distributed among the shards.
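
A sketch of range-based chunk routing under made-up chunk boundaries, a hypothetical `username` shard key, and hypothetical shard names; `null` stands in for the MinKey/MaxKey bounds. Real routing is done by `mongos` using the cluster's chunk metadata.

```javascript
// Each chunk owns [min, max) of the shard key space; null marks an unbounded
// end (standing in for MinKey/MaxKey). Boundaries and shard names are made up.
const chunks = [
  { min: null, max: "g", shard: "shard0" },
  { min: "g", max: "p", shard: "shard1" },
  { min: "p", max: null, shard: "shard2" },
];

// Find the single chunk whose range contains the key and return its shard.
function routeToShard(key) {
  const chunk = chunks.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
  return chunk.shard;
}

// Count how many sample documents land on each shard.
const usernames = ["bjori", "derickr", "jmikola", "pgodel"];
const perShard = {};
for (const username of usernames) {
  const shard = routeToShard(username);
  perShard[shard] = (perShard[shard] || 0) + 1;
}
console.log(perShard);
```

A query that includes the shard key can be sent to the one shard owning that range; a query without it has to be broadcast to every shard.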
  33. Select a good shard key

    This is the most important decision. Once a collection is sharded, the shard key and its values are immutable!
  34. Zone Sharding

    Zones are shard key ranges that can be associated with one or more shards. Use them to:

    - Isolate a subset of data to a specific set of shards
    - Enforce geographic distribution of data
    - Route data based on hardware/performance
  35. Security Checklist

    - Enable and enforce authentication
    - Configure role-based access control
    - Configure TLS/SSL for connections
    - Limit network exposure
    - Encrypt and protect database files
    - Run MongoDB as a dedicated user
    - Harden server and network configuration
  36. Operations Checklist

    - Adjust replica set oplog size
    - Enable journaling for writes
    - Driver connection pooling (if applicable)
    - Filesystem choice (XFS and NTFS preferred)
    - Schedule and test backup processes
    - Monitor database metrics and hardware
    - Tweak operating system configuration
  37. ODMs are a great tool

    - Employ a real document model
    - Framework and library integration
    - Accelerate application development
    - Abstract the database layer

    Watch out for that last one. Grok your DB and driver before abstracting it.
  38. The same principles apply

    "Essentially the ORM can handle about 80-90% of the mapping problems, but that last chunk always needs careful work by somebody who really understands how a relational database works."
    – Martin Fowler in OrmHate
  39. Be an informed user

    Active Record vs. Data Mapper

    - Are changes written with atomic modifiers?
    - How are replication and sharding integrated?
    - How are references handled?
    - How are embedded documents managed?
    - Are commands beyond basic CRUD supported?
    - Is the driver API available if needed?