DOs and DON’Ts of MongoDB

Presented November 19, 2017 at Database Camp NYC: http://www.db.camp/2017

Presented July 14, 2017 at OpenWest: https://joind.in/talk/c8846

Presented January 14, 2016 at Ski PHP: https://joind.in/talk/view/16644

Presented May 9, 2014 at OpenWest: https://joind.in/talk/view/11191

Presented March 15, 2013 at Midwest PHP: https://joind.in/talk/view/10542

Presented October 10, 2013 at ZendCon: http://joind.in/talk/view/9101

Presented September 18, 2013 at Web & PHP Conference: https://joind.in/talk/view/8870

Presented February 9, 2013 at Sunshine PHP: https://joind.in/talk/view/8021

Reveal.js presentation published at: http://jmikola.github.com/slides/mongodb_dos_and_donts/

Jeremy Mikola

July 14, 2017

Transcript

  1. DOS AND DON'TS
    Jeremy Mikola
    jmikola

  2. TOPICS
    Schema Design
    Write Operations
    Reading and Querying
    Replication and Sharding
    Deployment and Ops
    Object Document Mappers

  3. SCHEMA DESIGN

  4. Making do without joins
    References, embedded objects, both?
    Don’t be afraid to denormalize.

  5. Data Locality
    {
      _id: "jmikola",
      name: "Jeremy Mikola",
      friends: [ "bjori", "derickr" ]
    }
    vs.
    {
      _id: "jmikola",
      name: "Jeremy Mikola",
      friends: [
        { id: "bjori", name: "Hannes Magnusson" },
        { id: "derickr", name: "Derick Rethans" }
      ]
    }
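The embedded form performs the lookup once, at write time, so a single read returns everything the page needs. Below is a minimal plain JavaScript sketch of that denormalization step. It assumes an in-memory `users` Map in place of a real collection. The Map and the `embedFriends()` helper are hypothetical and are not driver APIs.

```javascript
// Hypothetical in-memory stand-in for the users collection, keyed by _id.
const users = new Map([
  ["jmikola", { _id: "jmikola", name: "Jeremy Mikola", friends: ["bjori", "derickr"] }],
  ["bjori", { _id: "bjori", name: "Hannes Magnusson", friends: [] }],
  ["derickr", { _id: "derickr", name: "Derick Rethans", friends: [] }],
]);

// Replace an array of friend ids with embedded { id, name } objects.
// The names are copied now, so later reads need no second query.
function embedFriends(user, lookup) {
  return {
    ...user,
    friends: user.friends.map((id) => ({ id, name: lookup.get(id).name })),
  };
}

const denormalized = embedFriends(users.get("jmikola"), users);
console.log(JSON.stringify(denormalized.friends));
// [{"id":"bjori","name":"Hannes Magnusson"},{"id":"derickr","name":"Derick Rethans"}]
```

The trade-off is that a copied name goes stale when the friend renames themselves. Every document that embeds the old name must then be updated as well.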

  6. Store computed values for querying
    Counts or array lengths can be indexed and sorted.
    Easily updated with $inc and $set.
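One way to keep a stored count in step with its array is to change both in the same single-document update. The sketch below assumes a hypothetical `friendCount` field. `applyUpdate()` is a toy stand-in for the server and understands only `$push` and `$inc`.

```javascript
// Build one update document that appends to the array AND bumps the
// stored count, so both change in the same single-document write.
function addFriendUpdate(friendId) {
  return { $push: { friends: friendId }, $inc: { friendCount: 1 } };
}

// Toy stand-in for the server applying $push and $inc (illustration only).
function applyUpdate(doc, update) {
  const out = { ...doc };
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    out[field] = [...(out[field] ?? []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    out[field] = (out[field] ?? 0) + amount;
  }
  return out;
}

// In the mongo shell the same idea would look like:
//   db.users.updateOne({ _id: "jmikola" }, addFriendUpdate("pgodel"));
//   db.users.createIndex({ friendCount: -1 }); // indexable, sortable count
let user = { _id: "jmikola", friends: ["bjori", "derickr"], friendCount: 2 };
user = applyUpdate(user, addFriendUpdate("pgodel"));
console.log(user.friendCount); // 3
```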

  7. Don’t create field paths willy-nilly
    > db.messages.findOne({}, { isReadByParticipant: 1 })
    {
      _id: ObjectId("4fce28482516ed983884b158"),
      isReadByParticipant: {
        "4fce05e42516ed9838756f17": false,
        "4fce05e42516ed9838756f18": true,
        "4fce05e42516ed9838756f19": true,
        "4fce05e42516ed9838756f1a": false,
        "4fce05e42516ed9838756f1b": false
      }
    }
    How can we index this?

  8. Multi-key indexing to the rescue
    > db.messages.findOne({}, { unreadForParticipants: 1 })
    {
      _id: ObjectId("4fce28482516ed983884b158"),
      unreadForParticipants: [
        "4fce05e42516ed9838756f17",
        "4fce05e42516ed9838756f1a",
        "4fce05e42516ed9838756f1b"
      ]
    }
    A tidbit from FOSMessageBundle
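The move from the first schema to the second can be sketched in plain JavaScript. `toUnreadArray()` is a hypothetical helper. The shell commands in the comments show one way the resulting array field might be indexed and queried.

```javascript
// Turn the "one field name per participant" map into an array holding the
// ids of participants who have NOT read the message. A single multi-key
// index then covers every participant, e.g. in the mongo shell:
//   db.messages.createIndex({ unreadForParticipants: 1 });
//   db.messages.find({ unreadForParticipants: "4fce05e42516ed9838756f17" });
function toUnreadArray(isReadByParticipant) {
  return Object.entries(isReadByParticipant)
    .filter(([, isRead]) => !isRead)
    .map(([participantId]) => participantId);
}

const isReadByParticipant = {
  "4fce05e42516ed9838756f17": false,
  "4fce05e42516ed9838756f18": true,
  "4fce05e42516ed9838756f19": true,
  "4fce05e42516ed9838756f1a": false,
  "4fce05e42516ed9838756f1b": false,
};

console.log(toUnreadArray(isReadByParticipant));
```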

  9. Multi-key indexing for EAV
    {
      _id: "product-1",
      size: "large",
      color: "blue"
    }
    vs.
    {
      _id: "product-1",
      attributes: [
        { k: "size", v: "large" },
        { k: "color", v: "blue" }
      ]
    }
    or
    {
      _id: "product-1",
      attributes: [
        "size=large",
        "color=blue"
      ]
    }
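Both EAV layouts can be produced from the flat document. The sketch below uses hypothetical helper names. The indexes and queries in the comments are one plausible way to use each layout, not the only one.

```javascript
const product = { _id: "product-1", size: "large", color: "blue" };

// Key/value layout. One compound multi-key index covers every attribute:
//   db.products.createIndex({ "attributes.k": 1, "attributes.v": 1 });
// $elemMatch makes k and v match within the SAME array element:
//   db.products.find({ attributes: { $elemMatch: { k: "color", v: "blue" } } });
function toKeyValueAttributes(doc) {
  const { _id, ...rest } = doc;
  return { _id, attributes: Object.entries(rest).map(([k, v]) => ({ k, v })) };
}

// String layout. A plain multi-key index on the array is enough:
//   db.products.createIndex({ attributes: 1 });
//   db.products.find({ attributes: "color=blue" });
function toStringAttributes(doc) {
  const { _id, ...rest } = doc;
  return { _id, attributes: Object.entries(rest).map(([k, v]) => `${k}=${v}`) };
}

console.log(JSON.stringify(toKeyValueAttributes(product)));
// {"_id":"product-1","attributes":[{"k":"size","v":"large"},{"k":"color","v":"blue"}]}
console.log(JSON.stringify(toStringAttributes(product)));
// {"_id":"product-1","attributes":["size=large","color=blue"]}
```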

  10. Field paths and tree structures
    > db.trees.findOne()
    {
      _id: ObjectId("5966500e01c7635140447bba"),
      name: "Conifer",
      subCategories: [
        {
          name: "Pine",
          subCategories: [
            { name: "Larch" },
            { name: "Spruce" },
            { name: "Douglas Fir" }
          ]
        },
        {
          name: "Cypress",
          subCategories: [ … ]
        }
      ]
    }
    One document contains the entire family.
    How can we query this?

  11. A better tree schema
    > db.trees.find()
    {
    _id: "Conifer"
    }
    {
    _id: "Pine",
    parent: "Conifer"
    }
    {
    _id: "Larch",
    parent: "Pine"
    }
    How can we query for an entire branch?

  12. But wait, there’s more!
    > db.trees.find()
    {
    _id: "Conifer",
    path: ["Conifer"]
    }
    {
    _id: "Pine",
    path: ["Pine", "Conifer"]
    }
    {
    _id: "Larch",
    path: ["Larch", "Pine", "Conifer"]
    }
    Multi-key indexing on path allows branch querying
    Relationship querying possible with single-field index(es)

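Below is a sketch of deriving the path arrays from parent pointers and then answering a branch query. Plain JavaScript arrays stand in for the `trees` collection. The extra nodes (Spruce, Cypress) and both helper names are hypothetical.

```javascript
// Hypothetical parent-pointer documents, as on the previous slide.
const nodes = [
  { _id: "Conifer" },
  { _id: "Pine", parent: "Conifer" },
  { _id: "Larch", parent: "Pine" },
  { _id: "Spruce", parent: "Pine" },
  { _id: "Cypress", parent: "Conifer" },
];

// Build each node's path: itself first, then its ancestors up to the root,
// matching the layout shown above.
function withPaths(docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  return docs.map((d) => {
    const path = [];
    for (let cur = d; cur; cur = byId.get(cur.parent)) path.push(cur._id);
    return { ...d, path };
  });
}

// Branch query: every document whose path contains the given node.
// With a multi-key index on "path", the mongo shell equivalent would be:
//   db.trees.createIndex({ path: 1 });
//   db.trees.find({ path: "Pine" });
function branch(docs, id) {
  return docs.filter((d) => d.path.includes(id)).map((d) => d._id);
}

const trees = withPaths(nodes);
console.log(branch(trees, "Pine")); // [ 'Pine', 'Larch', 'Spruce' ]
```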
  13. Don’t abuse schema flexibility
    Create schemas that support your query patterns.
    Then create indexes for those queries.

  14. Make the most of your indexes
    Kill 2+ birds with one stone
    Compound and multi-key indexes
    Mind your read/write ratio
    Ensure query selectivity

  15. Don’t shoot in the dark
    Explain your cursors.
    Profile slow queries.

  16. Further Reading
    MongoDB for Analytics – John Nunemaker
    Importing OpenStreetMap Data – Derick Rethans
    Use Cases – MongoDB manual

  17. GETTING YOUR DATA INTO MONGODB
    And keeping it there…

  18. Write Concern
    0             No write acknowledgement
    1             Write acknowledgement from the primary (default)
    n             Write acknowledgement from n nodes
    "majority"    Write acknowledgement from the majority of voting
                  nodes (includes journaling)
    <tag set>     Write acknowledgement to a node with the given tag set
    Additional wtimeout and journal options.
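The numeric and `"majority"` modes come down to counting acknowledgements. The sketch below is simplified and assumes that every voting member also bears data. It leaves out tag-set write concerns, `wtimeout`, and journaling. Both function names are hypothetical.

```javascript
// Acknowledgements a "majority" write concern waits for, assuming every
// voting member also bears data (a simplification).
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Is write concern `w` satisfied by `acks` acknowledgements?
// Handles numeric w and "majority" only; tag sets are not modelled.
function isSatisfied(w, acks, votingMembers) {
  if (w === "majority") return acks >= majorityOf(votingMembers);
  return acks >= w; // w: 0 is satisfied without any acknowledgement
}

// For comparison, requesting a write concern in the mongo shell:
//   db.users.insertOne(
//     { _id: "pgodel" },
//     { writeConcern: { w: "majority", wtimeout: 5000 } }
//   );
console.log(isSatisfied("majority", 2, 3)); // true
console.log(isSatisfied("majority", 2, 4)); // false
console.log(isSatisfied(1, 0, 3)); // false
```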

  19. save() is an anti-pattern
    document = db.users.findOne({ _id: "jmikola" });
    document["friends"].push("pgodel");
    db.users.save(document);
    Query, modify, and overwrite.

  20. Understanding save()’s syntactic sugar

  21. Overloaded vs. Explicit Methods
    Our CRUD specification drops save() and defines
    new operations for each legacy method mode:
    insert()  →  insertOne(), insertMany()
    update()  →  updateOne(), updateMany(), replaceOne()
    remove()  →  deleteOne(), deleteMany()

  22. Use atomic operators when possible
    document = db.users.findOne({ _id: "jmikola" });
    document["friends"].push("pgodel");
    db.users.replaceOne({ _id: "jmikola" }, document);
    or
    document = db.users.findOne({ _id: "jmikola" });
    document["friends"].push("pgodel");
    db.users.updateOne(
      { _id: "jmikola" },
      { $set: { friends: document["friends"] }}
    );
    vs.
    db.users.updateOne(
      { _id: "jmikola" },
      { $push: { friends: "pgodel" }}
    );
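The atomic form wins because of a race. With read-modify-write, two clients that read the same document can overwrite each other. The `$set` variant above has the same flaw, since it writes back an array read earlier. Below is a toy, single-process simulation. `store`, `readCopy()`, `writeBack()`, and `pushFriend()` are hypothetical stand-ins, not driver calls.

```javascript
// Toy single-document store (illustration only, not a driver API).
const store = { jmikola: { _id: "jmikola", friends: ["bjori"] } };

// Read-modify-write: each client works on its own copy of the document.
function readCopy(id) {
  return JSON.parse(JSON.stringify(store[id]));
}
function writeBack(id, doc) {
  store[id] = doc; // like replaceOne(): the whole document is overwritten
}

const clientA = readCopy("jmikola");
const clientB = readCopy("jmikola");
clientA.friends.push("pgodel");
clientB.friends.push("derickr");
writeBack("jmikola", clientA);
writeBack("jmikola", clientB); // clobbers clientA's write
console.log(store.jmikola.friends); // [ 'bjori', 'derickr' ]  ("pgodel" was lost)

// $push-style: the change is applied to whatever is stored right now.
function pushFriend(id, friend) {
  store[id].friends.push(friend);
}
pushFriend("jmikola", "pgodel");
console.log(store.jmikola.friends); // [ 'bjori', 'derickr', 'pgodel' ]
```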

  23. Atomicity in MongoDB
    No transactions for multi-document writes.
    Emulate transactions with two-phase commits.
    Single document updates are atomic.
    Can we query and update atomically?

  24. The findAndModify command
    Atomically selects and modifies a
    document in one of three modes:
    findOneAndDelete()
    findOneAndReplace()
    findOneAndUpdate()

  25. Implementing a simple job queue
    // Insert a request to borrow a library book
    db.loans.insertOne({
      _id: { borrower: "bjori", book: ObjectId("…") },
      approved: false,
      pending: false,
      priority: 1
    });
    // Mark the highest priority request as pending
    request = db.loans.findOneAndUpdate(
      { pending: false },
      { $set: { pending: true }},
      {
        returnNewDocument: true,
        sort: { priority: -1 }
      }
    );
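Below, the selection logic of that `findOneAndUpdate()` call is emulated over an in-memory array. This is only a sketch. The simplified string `_id` values and `claimNextRequest()` are hypothetical. The real atomicity comes from the server, not from this code.

```javascript
// In-memory stand-in for db.loans (_id simplified to a string).
const loans = [
  { _id: "bjori/book-1", approved: false, pending: false, priority: 1 },
  { _id: "derickr/book-2", approved: false, pending: false, priority: 5 },
  { _id: "jmikola/book-3", approved: false, pending: true, priority: 9 },
];

// Emulates:
//   db.loans.findOneAndUpdate(
//     { pending: false },
//     { $set: { pending: true }},
//     { returnNewDocument: true, sort: { priority: -1 } }
//   );
// On the server, selecting and modifying happen atomically, so two workers
// issuing this concurrently cannot claim the same request.
function claimNextRequest(collection) {
  const candidates = collection
    .filter((doc) => doc.pending === false)
    .sort((x, y) => y.priority - x.priority); // sort: { priority: -1 }
  if (candidates.length === 0) return null; // no match: the shell also returns null
  candidates[0].pending = true; // $set: { pending: true }
  return { ...candidates[0] }; // returnNewDocument: true
}

console.log(claimNextRequest(loans)._id); // derickr/book-2
console.log(claimNextRequest(loans)._id); // bjori/book-1
console.log(claimNextRequest(loans)); // null
```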

  26. Further Reading
    How To Write Resilient MongoDB Applications
    bit.ly/resilient-applications

  27. READING AND QUERYING

  28. If you remember only two things…
    Index your queries.
    Know your working set.

  29. Read Preference
    "primary"             Select the primary (default)
    "primaryPreferred"    Select the primary if available;
                          fall back to a secondary
    "secondary"           Select a secondary
    "secondaryPreferred"  Select a secondary if available;
                          fall back to the primary
    "nearest"             Select among the nodes with the
                          least network latency
    Tag sets may be used for more fine-grained selection.

  30. Read Concern
    "local"         Return the node’s most recent data,
                    which may be rolled back (default)
    "majority"      Return the node’s most recent data
                    acknowledged by a majority of replica
                    set members
    "linearizable"  Return the primary’s most recent data
                    written with a "majority" write
                    concern and acknowledged prior to the
                    start of the query (i.e. data cannot be
                    rolled back if journaled)

  31. JavaScript Evaluation
    eval and $where.
    Would you use eval() in JavaScript?

  32. MapReduce
    We’ll make an allowance for JavaScript here.
    But try the aggregation framework first.

  33. Aggregation Framework

  34. Aggregation Framework
    {
      _id: "His Majesty's Dragon",
      subjects: ["Fantasy", "Historical"],
      published: ISODate("2006-03-28T00:00:00.000Z")
    }

    db.books.aggregate([
      { $sort: { published: 1 }},
      { $unwind: "$subjects" },
      { $group: {
        _id: "$subjects",
        total: { $sum: 1 },
        firstPublished: { $first: { $year: "$published" }}
      }}
    ]);

    { _id: "Fantasy", total: 6, "firstPublished": 2002 },
    { _id: "Historical", total: 7, "firstPublished": 1974 },
    { _id: "World Literature", total: 2, "firstPublished": 1995 }
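For readers more at home in plain JavaScript, the same pipeline can be spelled out stage by stage. The sample books are hypothetical and are not the data behind the output above. `subjectStats()` is an illustrative helper.

```javascript
// Hypothetical sample data (not the collection behind the output above).
const books = [
  { _id: "Book A", subjects: ["Fantasy"], published: new Date("2006-03-28T00:00:00Z") },
  { _id: "Book B", subjects: ["Fantasy", "Historical"], published: new Date("1999-01-15T00:00:00Z") },
  { _id: "Book C", subjects: ["Historical"], published: new Date("1974-06-01T00:00:00Z") },
];

function subjectStats(docs) {
  // { $sort: { published: 1 } }: oldest first, so $first sees the earliest book
  const sorted = [...docs].sort((a, b) => a.published - b.published);
  const groups = new Map();
  for (const doc of sorted) {
    // { $unwind: "$subjects" }: one pass per array element
    for (const subject of doc.subjects) {
      const group = groups.get(subject);
      if (group) {
        group.total += 1; // total: { $sum: 1 }
      } else {
        // firstPublished: { $first: { $year: "$published" } }
        groups.set(subject, { _id: subject, total: 1, firstPublished: doc.published.getUTCFullYear() });
      }
    }
  }
  // MongoDB does not guarantee $group output order; here it is first-seen order.
  return [...groups.values()];
}

console.log(subjectStats(books));
```

The sketch also shows why the `$sort` stage has to sort on `published`. `$first` simply keeps whichever document reaches the group first.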

  35. "But MongoDB doesn’t do joins!"
    orders collection:
    { _id: 1, item: "abc", price: 12, quantity: 2 }
    { _id: 2, item: "jkl", price: 20, quantity: 1 }
    { _id: 3 }
    inventory collection:
    { _id: 1, sku: "abc", description: "product 1", instock: 120 }
    { _id: 2, sku: "def", description: "product 2", instock: 80 }
    { _id: 3, sku: "ijk", description: "product 3", instock: 60 }
    { _id: 4, sku: "jkl", description: "product 4", instock: 70 }
    { _id: 5, sku: null, description: "Incomplete" }
    { _id: 6 }

  36. "But MongoDB doesn’t do joins!"
    db.orders.aggregate([
      { $lookup: {
        from: "inventory",
        localField: "item",
        foreignField: "sku",
        as: "inventory_docs"
      }}
    ]);

    {
      _id: 1,
      item: "abc",
      price: 12,
      quantity: 2,
      inventory_docs: [
        { _id: 1, sku: "abc", description: "product 1", instock: 120 }
      ]
    }
    Usable with unsharded collections in the same database
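The equality-match behaviour of `$lookup` can be sketched over plain arrays, using the orders and inventory collections from the previous slide. Note how the order with no `item` field matches the inventory documents whose `sku` is null or missing. `lookup()` is a hypothetical helper. It ignores array-valued fields, which `$lookup` matches element by element.

```javascript
// The orders and inventory collections from the previous slide.
const orders = [
  { _id: 1, item: "abc", price: 12, quantity: 2 },
  { _id: 2, item: "jkl", price: 20, quantity: 1 },
  { _id: 3 },
];
const inventory = [
  { _id: 1, sku: "abc", description: "product 1", instock: 120 },
  { _id: 2, sku: "def", description: "product 2", instock: 80 },
  { _id: 3, sku: "ijk", description: "product 3", instock: 60 },
  { _id: 4, sku: "jkl", description: "product 4", instock: 70 },
  { _id: 5, sku: null, description: "Incomplete" },
  { _id: 6 },
];

// Equality match as $lookup performs it: a missing field compares as null,
// so it matches foreign documents whose field is null or missing.
// (Array-valued fields, which $lookup matches per element, are not handled.)
function lookup(local, foreign, localField, foreignField, as) {
  return local.map((doc) => {
    const key = doc[localField] ?? null;
    const matches = foreign.filter((f) => (f[foreignField] ?? null) === key);
    return { ...doc, [as]: matches };
  });
}

const joined = lookup(orders, inventory, "item", "sku", "inventory_docs");
console.log(joined.map((o) => o.inventory_docs.map((d) => d._id)));
// [ [ 1 ], [ 4 ], [ 5, 6 ] ]
```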

  37. Limiting Execution Time
    maxTimeMS query and command option
    Do not rely on client-side socket timeouts

  38. REPLICATION AND SHARDING

  39. Replication vs. Sharding
    Replication is the tool for data safety,
    high availability, and disaster recovery.
    Sharding is the tool for scaling a system.

  40. Replication

  41. What does replication do for us?

  42. Replication provides failover recovery

  43. Making the most of replication
    Always have an odd number of voting members.
    Nodes can be tagged (e.g. purpose, location).
    Take advantage of priority, hidden, and delay.
    Use tags to define custom write concerns.

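The odd-number rule follows from the majority needed to elect a primary. Below is a short sketch with hypothetical helper names.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Voting members that can be lost while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majorityOf(votingMembers);
}

for (const n of [3, 4, 5, 6, 7]) {
  console.log(`${n} voting members: majority ${majorityOf(n)}, tolerates ${faultTolerance(n)} down`);
}
// 3 -> tolerates 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3:
// an even count adds a vote without adding fault tolerance.
```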
  44. Sharding
    Not shown: a very tedious deployment process

  45. What does sharding do for us?

  46. Sharding provides horizontal scalability

  47. Sharding
    Each shard is a single mongod or replica set
    that stores a portion of the total data set.
    The shard key specifies one or more fields and determines
    the distribution of documents among the cluster's shards.
    MongoDB attempts to keep chunks
    evenly distributed among the shards.
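The next slides compare ranged and hashed sharding. The Node.js sketch below shows how that choice plays out for monotonically increasing keys. The chunk boundaries, the three-shard cluster, and the use of MD5 over the decimal string are all assumptions made for illustration. MongoDB's hashed index uses its own hash function.

```javascript
const crypto = require("crypto");

const SHARD_COUNT = 3;

// Ranged sharding over the raw key, with hypothetical chunk boundaries:
// [MinKey, 1000) -> shard 0, [1000, 2000) -> shard 1, [2000, MaxKey) -> shard 2.
function rangedShard(key) {
  if (key < 1000) return 0;
  if (key < 2000) return 1;
  return 2;
}

// Hashed sharding: the ranges cover hash values instead of raw keys.
// MD5 of the decimal string stands in for MongoDB's own hash function.
function hashedShard(key) {
  const digest = crypto.createHash("md5").update(String(key)).digest();
  const h = digest.readUInt32BE(0); // first 32 bits: 0 .. 2^32 - 1
  return Math.floor((h / 2 ** 32) * SHARD_COUNT);
}

// Count how many keys each shard receives.
function distribution(shardFor, keys) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const key of keys) counts[shardFor(key)] += 1;
  return counts;
}

// 100 new, monotonically increasing keys (think counters or timestamps).
const newKeys = Array.from({ length: 100 }, (_, i) => 3000 + i);
console.log("ranged:", distribution(rangedShard, newKeys)); // ranged: [ 0, 0, 100 ]
console.log("hashed:", distribution(hashedShard, newKeys));
```

Under ranged sharding, every new key lands in the last range. This is the right-balanced access pattern. The hashed counts should come out spread across the shards, at the cost of range queries on the shard key no longer being targeted to a single shard.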

  48. Select a good shard key
    This is the most important decision.
    Once a collection is sharded,
    the shard key and its values are immutable!

  49. Ranged Sharding

  50. Shard Key Distribution

  51. Hashed Sharding

  52. Right-balanced Access

  53. Random Access

  54. Segmented Access

  55. Zone Sharding
    Zones are shard key ranges that can be
    associated with one or more shards.
    Isolate a subset of data to a specific set of shards
    Enforce geographic distribution of data
    Route data based on hardware/performance
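Below is a sketch of how zone ranges restrict where documents may live. The shard names, the zones, and the compound shard key `{ region: 1, userId: 1 }` are hypothetical. The comments name the shell helpers that configure zones on a real cluster: `sh.addShardToZone()` and `sh.updateZoneKeyRange()`.

```javascript
// Hypothetical shards and zone membership. On a real cluster this would be
// set up with sh.addShardToZone(shardName, zoneName).
const allShards = ["shard-eu-1", "shard-us-1", "shard-us-2"];
const shardsByZone = {
  EU: ["shard-eu-1"],
  US: ["shard-us-1", "shard-us-2"],
};

// Zone ranges over the hypothetical shard key { region: 1, userId: 1 }.
// As with sh.updateZoneKeyRange(), min is inclusive and max is exclusive;
// -Infinity/Infinity stand in for MinKey/MaxKey.
const zoneRanges = [
  { zone: "EU", min: ["eu", -Infinity], max: ["eu", Infinity] },
  { zone: "US", min: ["us", -Infinity], max: ["us", Infinity] },
];

// Compare two shard key values field by field, like a compound index.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Shards that may hold a document with this shard key value. Keys outside
// every zone range are unrestricted and may live on any shard.
function allowedShards(key) {
  const range = zoneRanges.find(
    (r) => compareKeys(r.min, key) <= 0 && compareKeys(key, r.max) < 0
  );
  return range ? shardsByZone[range.zone] : allShards;
}

console.log(allowedShards(["eu", 42])); // [ 'shard-eu-1' ]
console.log(allowedShards(["us", 7])); // [ 'shard-us-1', 'shard-us-2' ]
console.log(allowedShards(["apac", 1])); // [ 'shard-eu-1', 'shard-us-1', 'shard-us-2' ]
```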

  56. Zone Sharding

  57. DEPLOYMENT AND OPS

  58. Security Checklist
    Enable and enforce authentication
    Configure role-based access control
    Configure TLS/SSL for connections
    Limit network exposure
    Encrypt and protect database files
    Run MongoDB as a dedicated user
    Harden server and network configuration

  59. Operations Checklist
    Adjust replica set oplog size
    Enable journaling for writes
    Driver connection pooling (if applicable)
    Filesystem choice (XFS on Linux, NTFS on Windows)
    Schedule and test backup processes
    Monitor database metrics and hardware
    Tweak operating system configuration

  60. Monitoring, Backup, Automation
    mongodb.com/cloud/cloud-manager

  61. MongoDB as a Service
    atlas.mongodb.com

  62. OBJECT DOCUMENT MAPPERS

  63. ODMs are a great tool
    Employ a real document model
    Framework and library integration
    Accelerate application development
    Abstract the database layer
    Watch out for that last one.
    Grok your DB and driver before abstracting it.

  64. The same principles apply
    Essentially the ORM can handle about 80-90% of the mapping
    problems, but that last chunk always needs careful work by
    somebody who really understands how a relational database works.
    – Martin Fowler in OrmHate

  65. Be an informed user
    Active Record vs. Data Mapper
    Are changes written with atomic modifiers?
    How are replication and sharding integrated?
    How are references handled?
    How are embedded documents managed?
    Are commands beyond basic CRUD supported?
    Is the driver API available if needed?

  66. Thanks!
    Questions?

  67. Photo Credits
    http://dilbert.com/strips/comic/1996-02-28
