DOs and DON’Ts of MongoDB

Presented November 19, 2017 at Database Camp NYC: http://www.db.camp/2017

Presented July 14, 2017 at OpenWest: https://joind.in/talk/c8846

Presented January 14, 2016 at Ski PHP: https://joind.in/talk/view/16644

Presented May 9, 2014 at OpenWest: https://joind.in/talk/view/11191

Presented March 15, 2013 at Midwest PHP: https://joind.in/talk/view/10542

Presented October 10, 2013 at ZendCon: http://joind.in/talk/view/9101

Presented September 18, 2013 at Web & PHP Conference: https://joind.in/talk/view/8870

Presented February 9, 2013 at Sunshine PHP: https://joind.in/talk/view/8021

Reveal.js presentation published at: http://jmikola.github.com/slides/mongodb_dos_and_donts/

Jeremy Mikola

July 14, 2017

Transcript

  1. DOS AND DON'TS

    Jeremy Mikola
    jmikola
  2. TOPICS

    Schema Design
    Write Operations
    Reading and Querying
    Replication and Sharding
    Deployment and Ops
    Object Document Mappers
  3. SCHEMA DESIGN

  4. Making do without joins

    References, embedded objects, both?
    Don’t be afraid to denormalize.
  5. Data Locality

    {
      _id: "jmikola",
      name: "Jeremy Mikola",
      friends: [ "bjori", "derickr" ]
    }

    vs.

    {
      _id: "jmikola",
      name: "Jeremy Mikola",
      friends: [
        { id: "bjori", name: "Hannes Magnusson" },
        { id: "derickr", name: "Derick Rethans" }
      ]
    }
  6. Store computed values for querying

    Counts or array lengths can be indexed and sorted.
    Easily updated with $inc and $set.
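One way to keep a stored count in step with its array, as a minimal sketch: build a single update document that appends the element and increments the counter together. The `friendCount` field and the `addFriendUpdate` helper are hypothetical names that do not appear in the slides; the resulting object would be passed as the update argument of a call such as `updateOne()`.

```javascript
// Hypothetical helper: builds one update document that appends a friend
// and bumps a stored counter in the same single-document write.
// `friendCount` is an assumed field name used only for illustration.
function addFriendUpdate(friendId) {
  return {
    $push: { friends: friendId },
    $inc: { friendCount: 1 },
  };
}

const friendUpdate = addFriendUpdate("pgodel");
console.log(JSON.stringify(friendUpdate));
```

Because the counter is an ordinary top-level field, it can be indexed and used in a sort, which the array length itself cannot.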
  7. Don’t create field paths willy-nilly

    > db.messages.findOne({}, { isReadByParticipant: 1 })
    {
      _id: ObjectId("4fce28482516ed983884b158"),
      isReadByParticipant: {
        "4fce05e42516ed9838756f17": false,
        "4fce05e42516ed9838756f18": true,
        "4fce05e42516ed9838756f19": true,
        "4fce05e42516ed9838756f1a": false,
        "4fce05e42516ed9838756f1b": false
      }
    }

    How can we index this?
  8. Multi-key indexing to the rescue

    > db.messages.findOne({}, { unreadForParticipants: 1 })
    {
      _id: ObjectId("4fce28482516ed983884b158"),
      unreadForParticipants: [
        "4fce05e42516ed9838756f17",
        "4fce05e42516ed9838756f1a",
        "4fce05e42516ed9838756f1b"
      ]
    }

    A tidbit from FOSMessageBundle
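The slides do not show how one shape is turned into the other, so the following is only a sketch of that conversion in plain JavaScript. The `toUnreadForParticipants` helper is a hypothetical name; the ids are the ones from the two slides above.

```javascript
// Converts the per-participant map into an array holding only the ids
// of participants who have not read the message.
function toUnreadForParticipants(isReadByParticipant) {
  return Object.entries(isReadByParticipant)
    .filter(([, isRead]) => !isRead)
    .map(([participantId]) => participantId);
}

const isReadByParticipant = {
  "4fce05e42516ed9838756f17": false,
  "4fce05e42516ed9838756f18": true,
  "4fce05e42516ed9838756f19": true,
  "4fce05e42516ed9838756f1a": false,
  "4fce05e42516ed9838756f1b": false,
};

const unreadForParticipants = toUnreadForParticipants(isReadByParticipant);
console.log(unreadForParticipants);
```

With the array form, one index on `unreadForParticipants` covers every participant, whereas the map form would need a separate index per participant id.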
  9. Multi-key indexing for EAV

    { _id: "product-1", size: "large", color: "blue" }

    vs.

    {
      _id: "product-1",
      attributes: [
        { k: "size", v: "large" },
        { k: "color", v: "blue" }
      ]
    }

    or

    { _id: "product-1", attributes: [ "size=large", "color=blue" ] }
  10. Field paths and tree structures

    > db.trees.findOne()
    {
      _id: ObjectId("5966500e01c7635140447bba"),
      name: "Conifer",
      subCategories: [
        {
          name: "Pine",
          subCategories: [
            { name: "Larch" },
            { name: "Spruce" },
            { name: "Douglas Fir" }
          ]
        },
        { name: "Cypress", subCategories: [ … ] }
      ]
    }

    One document contains the entire family. How can we query this?
  11. A better tree schema

    > db.trees.find()
    { _id: "Conifer" }
    { _id: "Pine", parent: "Conifer" }
    { _id: "Larch", parent: "Pine" }

    How can we query for an entire branch?
  12. But wait, there’s more!

    > db.trees.find()
    { _id: "Conifer", path: ["Conifer"] }
    { _id: "Pine", path: ["Pine", "Conifer"] }
    { _id: "Larch", path: ["Larch", "Pine", "Conifer"] }

    Multi-key indexing on path allows branch querying.
    Relationship querying possible with single-field index(es).
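As a rough illustration of how the `path` arrays relate to the parent references, the sketch below derives them in plain JavaScript and then checks branch membership in memory. The `withPaths` helper is hypothetical, and it assumes the parent references contain no cycles.

```javascript
// Builds the `path` array (the node itself first, then its ancestors up
// to the root) for every node in a parent-reference tree.
// Assumes the parent references contain no cycles.
function withPaths(nodes) {
  const parentOf = new Map(nodes.map((n) => [n._id, n.parent]));
  return nodes.map((n) => {
    const path = [];
    for (let id = n._id; id !== undefined; id = parentOf.get(id)) {
      path.push(id);
    }
    return { _id: n._id, path };
  });
}

const trees = withPaths([
  { _id: "Conifer" },
  { _id: "Pine", parent: "Conifer" },
  { _id: "Larch", parent: "Pine" },
]);

// In-memory equivalent of the filter { path: "Pine" }: the Pine branch.
const pineBranch = trees.filter((t) => t.path.includes("Pine")).map((t) => t._id);
console.log(pineBranch);
```

Against a real collection, the same branch lookup would be the filter `{ path: "Pine" }`, which a multi-key index on `path` can serve.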
  13. Don’t abuse schema flexibility

    Create schemas that support your query patterns.
    Then create indexes for those queries.
  14. Make the most of your indexes

    Kill 2+ birds with one stone
    Compound and multi-key indexes
    Mind your read/write ratio
    Ensure query selectivity
  15. Don’t shoot in the dark

    Explain your cursors.
    Profile slow queries.
  16. Further Reading

    MongoDB for Analytics – John Nunemaker
    Importing OpenStreetMap Data – Derick Rethans
    Use Cases – MongoDB manual
  17. GETTING YOUR DATA INTO MONGODB

    And keeping it there…
  18. Write Concern

    0: No write acknowledgement
    1: Write acknowledgement from the primary (default)
    <integer>: Write acknowledgement from n nodes
    "majority": Write acknowledgement from the majority of voting nodes (includes journaling)
    <string>: Write acknowledgement to a node with the given tag set

    Additional wtimeout and journal options.
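For orientation, these levels are usually written as a small write concern document with `w`, `j`, and `wtimeout` fields. The sketch below only constructs such documents; the variable names and the 5000 ms timeout are illustrative assumptions, not values from the talk.

```javascript
// Write concern documents using the w / j / wtimeout fields.
// Variable names and the 5000 ms timeout are illustrative assumptions.
const unacknowledged = { w: 0 };
const primaryOnly = { w: 1 };
const majorityJournaled = { w: "majority", j: true, wtimeout: 5000 };

// Options object in the shape accepted by shell write helpers,
// e.g. db.users.insertOne(doc, insertOptions).
const insertOptions = { writeConcern: majorityJournaled };
console.log(JSON.stringify(insertOptions));
```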
  19. save() is an anti-pattern

    document = db.users.findOne({ _id: "jmikola" });
    document["friends"].push("pgodel");
    db.users.save(document);

    Query, modify, and overwrite.
  20. Understanding save()’s syntactic sugar

  21. Overloaded vs. Explicit Methods

    Our CRUD specification drops save() and defines new operations for each legacy method mode:

    insert(): insertOne(), insertMany()
    update(): updateOne(), updateMany(), replaceOne()
    remove(): deleteOne(), deleteMany()
  22. Use atomic operators when possible

    document = db.users.findOne({ _id: "jmikola" });
    document["friends"].push("pgodel");
    db.users.replaceOne({ _id: "jmikola" }, document);

    or

    document = db.users.findOne({ _id: "jmikola" });
    document["friends"].push("pgodel");
    db.users.updateOne(
      { _id: "jmikola" },
      { $set: { friends: document["friends"] }}
    );

    vs.

    db.users.updateOne(
      { _id: "jmikola" },
      { $push: { friends: "pgodel" }}
    );
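To see why the read-modify-overwrite variants are risky, here is a small in-memory model of two clients racing on the same document. It is not driver code: `read`, `replace`, and `push` are hypothetical stand-ins for `findOne()`, `replaceOne()`, and a server-side `$push`, and "another-friend" is an invented value.

```javascript
// In-memory stand-in for one stored document (not a real collection).
let stored = { _id: "jmikola", friends: ["bjori", "derickr"] };

// Hand each client a private copy, as findOne() would.
const read = () => JSON.parse(JSON.stringify(stored));
// Overwrite the whole document, as replaceOne() would.
const replace = (doc) => { stored = doc; };
// Append in place, modelling the effect of a server-side $push.
const push = (friend) => { stored.friends.push(friend); };

// Two clients read the same version, modify it, then both overwrite.
const clientA = read();
const clientB = read();
clientA.friends.push("pgodel");
clientB.friends.push("another-friend");
replace(clientA);
replace(clientB);
const afterReplace = stored.friends.slice();

// Reset, then let both clients send only the change they intend.
stored = { _id: "jmikola", friends: ["bjori", "derickr"] };
push("pgodel");
push("another-friend");
const afterPush = stored.friends.slice();

console.log(afterReplace); // "pgodel" was lost to the second overwrite
console.log(afterPush);    // both additions survive
```

The second overwrite silently discards the first client's change, while the operator form applies each change to whatever the document currently holds.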
  23. Atomicity in MongoDB

    No transactions for multi-document writes.
    Emulate transactions with two-phase commits.
    Single document updates are atomic.
    Can we query and update atomically?
  24. The findAndModify command

    Atomically selects and modifies a document in one of three modes:

    findOneAndDelete()
    findOneAndReplace()
    findOneAndUpdate()
  25. Implementing a simple job queue

    // Insert a request to borrow a library book
    db.loans.insertOne({
      _id: { borrower: "bjori", book: ObjectId("…") },
      approved: false,
      pending: false,
      priority: 1
    });

    // Mark the highest priority request as pending
    request = db.loans.findOneAndUpdate(
      { pending: false },
      { $set: { pending: true }},
      { returnNewDocument: true, sort: { priority: -1 } }
    );
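The queue semantics can be modelled in memory to show what each call hands back. This is a single-threaded sketch only: `claimNext` is a hypothetical function, the sample requests are invented, and the atomicity that the real command provides on the server is not reproduced here.

```javascript
// In-memory stand-in for the loans collection (hypothetical sample data).
const loans = [
  { _id: { borrower: "bjori", book: "book-1" }, approved: false, pending: false, priority: 1 },
  { _id: { borrower: "derickr", book: "book-2" }, approved: false, pending: false, priority: 3 },
  { _id: { borrower: "jmikola", book: "book-3" }, approved: false, pending: false, priority: 2 },
];

// Models the selection logic of the findOneAndUpdate() call: take the
// non-pending request with the highest priority, mark it pending, and
// return the updated document (or null when nothing is left).
function claimNext(queue) {
  const candidates = queue
    .filter((loan) => loan.pending === false)
    .sort((x, y) => y.priority - x.priority);
  if (candidates.length === 0) return null;
  candidates[0].pending = true;
  return candidates[0];
}

const firstClaim = claimNext(loans);
const secondClaim = claimNext(loans);
console.log(firstClaim._id.borrower, secondClaim._id.borrower);
```

Each call returns a different request because the previous one is already marked pending, which is the property a worker pool relies on.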
  26. Further Reading

    How To Write Resilient MongoDB Applications
    bit.ly/resilient-applications
  27. READING AND QUERYING

  28. If you remember only two things…

    Index your queries.
    Know your working set.
  29. Read Preference

    "primary": Select the primary (default)
    "primaryPreferred": Select the primary if available; fall back to a secondary
    "secondary": Select a secondary
    "secondaryPreferred": Select a secondary if available; fall back to the primary
    "nearest": Select the node with least network latency

    Tag sets may be used for more fine-grained selection.
  30. Read Concern

    "local": Return the node’s most recent data, which may be rolled back (default)
    "majority": Return the node’s most recent data acknowledged by a majority of the replica set
    "linearizable": Return the primary’s most recent data written with a "majority" write concern and acknowledged prior to the start of the query (i.e. data cannot be rolled back if journaled)
  31. JavaScript Evaluation

    eval and $where.
    Would you use eval() in JavaScript?
  32. MapReduce

    We’ll make an allowance for JavaScript here.
    But try the aggregation framework first.
  33. Aggregation Framework

  34. Aggregation Framework

    {
      _id: "His Majesty's Dragon",
      subjects: ["Fantasy", "Historical"],
      published: ISODate("2006-03-28T00:00:00.000Z")
    }

    ▼

    db.books.aggregate([
      { $sort: { published: 1 }},
      { $unwind: "$subjects" },
      { $group: {
        _id: "$subjects",
        total: { $sum: 1 },
        firstPublished: { $first: { $year: "$published" }}
      }}
    ]);

    ▼

    { _id: "Fantasy", total: 6, "firstPublished": 2002 },
    { _id: "Historical", total: 7, "firstPublished": 1974 },
    { _id: "World Literature", total: 2, "firstPublished": 1995 }
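To make the three stages concrete, the sketch below walks through sort, unwind, and group on a small in-memory array. Only the first book comes from the slide; "Book B" and "Book C" are invented sample data, and the sort is on `published`, an assumption made so that the first document seen in each group carries the earliest year.

```javascript
// Sample data: the slide's document plus two hypothetical books.
const books = [
  { _id: "His Majesty's Dragon", subjects: ["Fantasy", "Historical"], published: new Date("2006-03-28T00:00:00.000Z") },
  { _id: "Book B", subjects: ["Fantasy"], published: new Date("2002-05-01T00:00:00.000Z") },
  { _id: "Book C", subjects: ["Historical"], published: new Date("1974-09-15T00:00:00.000Z") },
];

// Like $sort: order by publication date, oldest first.
const sorted = [...books].sort((x, y) => x.published - y.published);

// Like $unwind: emit one document per element of `subjects`.
const unwound = sorted.flatMap((book) =>
  book.subjects.map((subject) => ({ ...book, subjects: subject }))
);

// Like $group: count per subject; the first document seen in each group
// supplies firstPublished, as $first would.
const groups = new Map();
for (const doc of unwound) {
  const group = groups.get(doc.subjects);
  if (group) {
    group.total += 1;
  } else {
    groups.set(doc.subjects, {
      _id: doc.subjects,
      total: 1,
      firstPublished: doc.published.getUTCFullYear(),
    });
  }
}

const subjectTotals = [...groups.values()];
console.log(subjectTotals);
```

The value of `firstPublished` depends entirely on the order established before grouping, which is why the sort stage comes first.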
  35. "But MongoDB doesn’t do joins!"

    orders collection:

    { _id: 1, item: "abc", price: 12, quantity: 2 }
    { _id: 2, item: "jkl", price: 20, quantity: 1 }
    { _id: 3 }

    inventory collection:

    { _id: 1, sku: "abc", description: "product 1", instock: 120 }
    { _id: 2, sku: "def", description: "product 2", instock: 80 }
    { _id: 3, sku: "ijk", description: "product 3", instock: 60 }
    { _id: 4, sku: "jkl", description: "product 4", instock: 70 }
    { _id: 5, sku: null, description: "Incomplete" }
    { _id: 6 }
  36. "But MongoDB doesn’t do joins!"

    db.orders.aggregate([
      { $lookup: {
        from: "inventory",
        localField: "item",
        foreignField: "sku",
        as: "inventory_docs"
      }}
    ]);

    ▼

    {
      _id: 1, item: "abc", price: 12, quantity: 2,
      inventory_docs : [
        { _id: 1, sku: "abc", description: "product 1", instock: 120 }
      ]
    }

    Usable with unsharded collections in the same database
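The slide shows only the first joined document. As a sketch of what happens to the other two orders, the plain JavaScript below emulates the equality match for scalar fields, treating a missing field like null. The `lookup` and `keyOf` helpers are hypothetical and do not cover array-valued fields.

```javascript
const orders = [
  { _id: 1, item: "abc", price: 12, quantity: 2 },
  { _id: 2, item: "jkl", price: 20, quantity: 1 },
  { _id: 3 },
];

const inventory = [
  { _id: 1, sku: "abc", description: "product 1", instock: 120 },
  { _id: 2, sku: "def", description: "product 2", instock: 80 },
  { _id: 3, sku: "ijk", description: "product 3", instock: 60 },
  { _id: 4, sku: "jkl", description: "product 4", instock: 70 },
  { _id: 5, sku: null, description: "Incomplete" },
  { _id: 6 },
];

// Treat a missing field like null for matching purposes.
const keyOf = (value) => (value === undefined ? null : value);

// Left outer join: every local document is kept, and `as` receives the
// array of foreign documents whose key equals the local key.
function lookup(local, foreign, localField, foreignField, as) {
  return local.map((doc) => ({
    ...doc,
    [as]: foreign.filter((f) => keyOf(f[foreignField]) === keyOf(doc[localField])),
  }));
}

const joined = lookup(orders, inventory, "item", "sku", "inventory_docs");
console.log(JSON.stringify(joined, null, 2));
```

Note that the order without an `item` field is not dropped: it pairs with the inventory documents whose `sku` is null or absent.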
  37. Limiting Execution Time

    maxTimeMS query and command option
    Do not rely on client-side socket timeouts
  38. REPLICATION AND SHARDING

  39. Replication vs. Sharding

    Replication is the tool for data safety, high availability, and disaster recovery.
    Sharding is the tool for scaling a system.
  40. Replication

  41. What does replication do for us?

  42. Replication provides failover recovery

  43. Making the most of replication

    Always have an odd number of voting members.
    Nodes can be tagged (e.g. purpose, location).
    Take advantage of priority, hidden, and delay.
    Use tags to define custom write concerns.
  44. Sharding

    Not shown: a very tedious deployment process
  45. What does sharding do for us?

  46. Sharding provides horizontal scalability

  47. Sharding

    Each shard is a single mongod or replica set that stores a portion of the total data set.
    The shard key specifies one or more fields and determines the distribution of documents among the cluster's shards.
    MongoDB attempts to keep chunks evenly distributed among the shards.
  48. Select a good shard key

    This is the most important decision.
    Once a collection is sharded, the shard key and its values are immutable!
  49. Ranged Sharding

  50. Shard Key Distribution

  51. Hashed Sharding

  52. Right-balanced Access

  53. Random Access

  54. Segmented Access

  55. Zone Sharding

    Zones are shard key ranges that can be associated with one or more shards.

    Isolate subset of data to specific set of shards
    Enforce geographic distribution of data
    Route data based on hardware/performance
  56. Zone Sharding

  57. DEPLOYMENT AND OPS

  58. Security Checklist

    Enable and enforce authentication
    Configure role-based access control
    Configure TLS/SSL for connections
    Limit network exposure
    Encrypt and protect database files
    Run MongoDB as a dedicated user
    Hardened server and network configuration
  59. Operations Checklist

    Adjust replica set oplog size
    Enable journaling for writes
    Driver connection pooling (if applicable)
    Filesystem choice (XFS and NTFS preferred)
    Schedule and test backup processes
    Monitor database metrics and hardware
    Tweak operating system configuration
  60. Monitoring, Backup, Automation

    mongodb.com/cloud/cloud-manager
  61. MongoDB as a Service

    atlas.mongodb.com
  62. OBJECT DOCUMENT MAPPERS

  63. ODMs are a great tool

    Employ a real document model
    Framework and library integration
    Accelerate application development
    Abstract the database layer

    Watch out for that last one. Grok your DB and driver before abstracting it.
  64. The same principles apply

    Essentially the ORM can handle about 80-90% of the mapping problems, but that last chunk always needs careful work by somebody who really understands how a relational database works.
    – Martin Fowler in OrmHate
  65. Be an informed user

    Active Record vs. Data Mapper
    Are changes written with atomic modifiers?
    How are replication and sharding integrated?
    How are references handled?
    How are embedded documents managed?
    Are commands beyond basic CRUD supported?
    Is the driver API available if needed?
  66. Thanks! Questions?

  67. Photo Credits

    http://dilbert.com/strips/comic/1996-02-28