Bulletproof MongoDB - Speaker Deck

Slide 1

Slide 1 text

Bulletproof MongoDB Jeremy Mikola @jmikola

Slide 2

Slide 2 text

A Little About Myself

Slide 3

Slide 3 text

Shots Fired

Slide 4

Slide 4 text

Some Topics to Cover
● Deployment models
● Driver internals
● Driver configuration
● Application concepts
● Retrying operations
● Transactions

Slide 5

Slide 5 text

Deployment Models: High availability and/or horizontal scaling

Slide 6

Slide 6 text

Standalone mongod

Slide 7

Slide 7 text

Replication

Slide 8

Slide 8 text

Replication: Heartbeats

Slide 9

Slide 9 text

Replication: Failover

Slide 10

Slide 10 text

Sharding

Slide 11

Slide 11 text

Sharding

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Sharding: Routing

Slide 14

Slide 14 text

Driver Internals: From connection strings to monitoring

Slide 15

Slide 15 text

Starting with a connection string
mongodb://user:[email protected]:27017/?replicaSet=rs0
● Connection String spec instructs how to parse this to yield: host identifiers, authentication credentials, connection options
● A mongodb+srv:// scheme indicates Initial DNS Seedlist Discovery, which may yield additional host identifiers
● Atlas uses this to provide shorter, more resilient connection strings

Slide 16

Slide 16 text

The first handshake
● Drivers issue an isMaster command on all newly established connections
● This uses OP_QUERY instead of OP_MSG for backwards compatibility
● Drivers can also provide client metadata
● The isMaster response reports the server’s min and max wire versions
● Used for protocol negotiation, feature discovery, detecting imposters
● No authentication or compression at this step

Slide 17

Slide 17 text

Authentication and compression
● After the handshake, drivers know what auth and compression protocols (if any) are supported by the server
● Drivers also advertise what compression they support in client metadata
● Auth spec defines command conversations for various auth mechanisms
● Compression spec defines OP_COMPRESSED as an envelope for other opcodes
● Compression is never used for certain commands (e.g. isMaster, auth)

Slide 18

Slide 18 text

Server discovery and monitoring
● SDAM defines structures for topology and server descriptions, a strategy for periodic monitoring, and a state machine for updating descriptions
● Drivers can infer initial topology type and servers from the connection string
● Unknown types address ambiguity (e.g. seed list without replicaSet option)
● isMaster response affirms a server’s type and may also update the topology

Slide 19

Slide 19 text

Different application deployments
● Single-threaded applications have many app servers, each with a pool of workers, each responsible for serving one request at a time
● Multi-threaded and async applications have a limited number of app servers responsible for serving incoming requests concurrently
[Diagram labels: App Server, Cluster]

Slide 20

Slide 20 text

Different approaches to monitoring
● Multi-threaded and asynchronous drivers monitor the topology in a background “thread” and maintain a separate connection pool for application usage
● Monitoring thread does not share sockets with the connection pool (rationale)
● Single-threaded drivers share sockets for monitoring and application usage and perform monitoring during server selection (i.e. procuring a socket)
● Separate sockets would be redundant and/or costly
● Forgo connection pools for persistent sockets

Slide 21

Slide 21 text

Robustness improvements to monitoring
● Use connectTimeoutMS in lieu of socketTimeoutMS
● Retry isMaster once to quickly recover dropped sockets (rationale)
● Drivers internally invoke monitoring as needed (e.g. after “not master” error)
Optimizations for single-threaded drivers
● Ignore inaccessible servers for cooldownMS (five seconds)
● Monitoring can be parallelized with async IO

Slide 22

Slide 22 text

Server selection
● Relies on SDAM for an up-to-date view of the topology and its servers
● Server Selection uses a loop to filter the topology to a server description
● Straightforward algorithm for multi-threaded and async drivers, but single-threaded drivers must invoke SDAM during the loop
● Random selection within a latency window if multiple servers are eligible
● A server description can be exchanged for a socket
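The "random selection within a latency window" step can be sketched in a few lines of plain PHP. This is a simulation, not the driver's code: the function name, the host names, and the round-trip times are made up for illustration. It assumes the window is measured from the fastest eligible server and uses the 15ms localThresholdMS default.

```php
<?php
/**
 * Sketch: pick a random server among those whose round-trip time (RTT) is
 * within $localThresholdMS of the fastest eligible server.
 *
 * @param array<string, float> $rttByServer eligible server address => RTT in ms
 */
function selectWithinLatencyWindow(array $rttByServer, float $localThresholdMS = 15.0): ?string
{
    if ($rttByServer === []) {
        return null; // nothing eligible; a real driver would rescan or time out
    }

    $fastest = min($rttByServer);

    // Keep only the servers that fall inside the latency window.
    $window = array_filter(
        $rttByServer,
        static function (float $rtt) use ($fastest, $localThresholdMS): bool {
            return $rtt <= $fastest + $localThresholdMS;
        }
    );

    // array_rand() returns a random key, i.e. a server address.
    return array_rand($window);
}

// Hypothetical topology: the third server is too slow to be considered.
$servers = [
    'a.example.com:27017' => 5.0,
    'b.example.com:27017' => 12.0,
    'c.example.com:27017' => 40.0,
];

echo selectWithinLatencyWindow($servers), PHP_EOL;
```

With these numbers the window spans 5ms to 20ms, so only the first two servers can be returned.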

Slide 23

Slide 23 text

How this fits in with PHP

$client = new MongoDB\Client;
$collection = $client->test->foo;
$collection->drop();
$collection->insertOne(['hello' => 'world']);
$cursor = $collection->find();
foreach ($cursor as $document) {
    var_dump($document);
}

Output:

object(MongoDB\Model\BSONDocument)#4 (1) {
  ["storage":"ArrayObject":private]=>
  array(2) {
    ["_id"]=>
    object(MongoDB\BSON\ObjectId)#18 (1) {
      ["oid"]=>
      string(24) "5c07e4822dbd7b79db17f192"
    }
    ["hello"]=>
    string(5) "world"
  }
}

Slide 24

Slide 24 text

URI is parsed during Client construction

class Client
{
    public function __construct($uri = 'mongodb://127.0.0.1/', array $uriOptions = [], array $driverOptions = [])
    {
        $this->manager = new Manager($uri, $uriOptions, $driverOptions);
    }
}

Slide 25

Slide 25 text

Server selection initializes SDAM

class Collection
{
    public function drop(array $options = [])
    {
        $server = $this->manager->selectServer(new ReadPreference('primary'));
        $operation = new DropCollection($this->databaseName, $this->collectionName, $options);

        return $operation->execute($server);
    }
}

Slide 26

Slide 26 text

Driver Configuration: You’ve got a few options

Slide 27

Slide 27 text

Configuring socket timeouts
● connectTimeoutMS is the timeout for initial socket connections and internal monitoring activity (defaults to 10 seconds)
● Consider tuning closer to the expected max latency of the database servers
● socketTimeoutMS pertains to application operations (defaults to 300 seconds)
● Comparable to PHP’s own default_socket_timeout. Be mindful of PHP’s max_execution_time.

Slide 28

Slide 28 text

Configuring monitoring
● heartbeatFrequencyMS is the monitoring interval (defaults to 60 seconds for single-threaded drivers; minimum is 500ms)
● socketCheckIntervalMS determines if a socket is considered inactive and must be re-checked before use (defaults to 5 seconds)
● Specifically for single-threaded drivers. Like retrying isMaster, this helps insulate applications from network errors.

Slide 29

Slide 29 text

Configuring server selection
● localThresholdMS defines the latency window for selecting an eligible server (defaults to 15ms)
● serverSelectionTimeoutMS is the maximum amount of time to spend in the server selection loop (defaults to 30 seconds)
● serverSelectionTryOnce allows the application to “fail fast”
● Specifically for single-threaded drivers, where this defaults to true
● Disabling try-once behavior can improve resiliency at the expense of time

Slide 30

Slide 30 text

The argument for “fail fast” behavior

Slide 31

Slide 31 text

Application Concepts

Slide 32

Slide 32 text

Write Concern
● w controls acknowledgement behavior
● majority scales with the size of the replica set
● majority and journaling collectively guarantee durability and avoid data loss due to rollbacks

Slide 33

Slide 33 text

Read Concern
● local and available are most permissive
● majority guarantees that the data has been acknowledged by a majority
● linearizable provides additional guarantees over majority to avoid returning stale data. Introduced in MongoDB 3.4 to satisfy the Jepsen test framework. Peter Bailis provides an accessible definition of linearizability
● snapshot may be used with majority-committed transactions to guarantee that reads within that transaction use a snapshot of majority-committed data

Slide 34

Slide 34 text

We haven’t got all day
● Operations can limit their execution time with maxTimeMS
● Server will track processing time and abort at the next interrupt point
● Socket timeouts can be expensive for both the client and server
● Write concerns can also use wtimeout to limit waiting time for replication
● Distinguish write concern errors from write errors

Slide 35

Slide 35 text

Causal Consistency
● A causal relationship exists when an operation logically depends on a preceding operation
● Causal consistency comes with several guarantees
● Read your own writes, monotonic reads/writes, and writes follow reads
● Satisfied by majority read and write concerns (when durability required)
● Applications can obtain causal consistency by using explicit sessions (examples)

Slide 36

Slide 36 text

Logical Sessions
● Sessions maintain cluster-wide state about the user and their operations
● In earlier versions of MongoDB, state was tied to connection objects
● Sessions live throughout a cluster and are not tied to connection objects
● Sessions can be created and used as an explicit option for database operations
● Group operations by passing the same session (e.g. causal consistency)
● By default, drivers will use an implicit session for single operations

Slide 37

Slide 37 text

Abort, Retry, Fail?

Slide 38

Slide 38 text

Don’t do this

function retry(Closure $retry, $numRetries = 1)
{
    if ($numRetries < 1) {
        return $retry();
    }

    for ($i = 0; $i <= $numRetries; $i++) {
        try {
            return $retry();
        } catch (MongoDB\Driver\Exception $e) {
            if ($i === $numRetries) {
                throw $e;
            }
        }
    }
}

Slide 39

Slide 39 text

What’s the problem with retrying?
● Operations can change the state of the system
● Reads or writes may continue to run on the server after the client moves on.
● Write operations may not be idempotent and safe to execute multiple times.
● At best, retrying may waste time or consume resources
● At worst, retrying may inadvertently alter the data itself

Slide 40

Slide 40 text

Know your errors
● Any retry strategy should consider the kind of failure
● Transient network error, persistent outage, command error
● A retry attempt may be necessary to differentiate transience from persistence
● If a command response reports failure, retrying probably isn’t going to help

Slide 41

Slide 41 text

Retryable errors
● Any network error (e.g. socket timeout, dropped connection)
● Server response clearly indicating a transient error (e.g. “not master”)
● Most commonly caused by a replica set failover or step down
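One way to picture this classification is a small decision function. The sketch below is illustrative only: the function name and its two inputs are hypothetical (a real application would derive them from the exception the driver throws), and the list of server error codes is a deliberately small, non-exhaustive subset of the "not master" family.

```php
<?php
// Illustrative, non-exhaustive subset of server error codes that indicate a
// replica set state change (failover or step down).
const NOT_PRIMARY_CODES = [
    10107, // NotMaster
    13435, // NotMasterNoSlaveOk
    189,   // PrimarySteppedDown
];

/**
 * Hypothetical helper: decide whether a failure looks retryable.
 *
 * @param bool     $isNetworkError  true for socket timeouts, dropped connections, etc.
 * @param int|null $serverErrorCode code from a command response, if there was one
 */
function classifyFailure(bool $isNetworkError, ?int $serverErrorCode): string
{
    if ($isNetworkError) {
        return 'retryable: network error';
    }

    if ($serverErrorCode !== null && in_array($serverErrorCode, NOT_PRIMARY_CODES, true)) {
        return 'retryable: transient server error';
    }

    // Any other command error (e.g. 11000, duplicate key) will likely fail again.
    return 'not retryable';
}

echo classifyFailure(true, null), PHP_EOL;
echo classifyFailure(false, 10107), PHP_EOL;
echo classifyFailure(false, 11000), PHP_EOL;
```

The point of the split is that only the first two categories say anything about the state of the deployment; the third says something about the operation itself.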

Slide 42

Slide 42 text

Retrying reads
● Queries that return a single document are always safe to retry
● Short-running queries that return a single batch of documents (i.e. will not leave behind a cursor) may be safe to retry
● Drivers will aim to retry most read commands in MongoDB 4.2
● Requires server functionality to detect dropped sockets and abort operations
● getMore cannot be retried, since cursor iteration is forward only

Slide 43

Slide 43 text

Retrying writes
Given that:
● Sessions are cluster-wide and exist beyond the scope of a connection
● Each write can be uniquely identified by a session and statement ID
● Drivers can rely on SDAM and server selection to re-select the primary
Drivers can safely retry single-document writes (or bulks thereof) by resending the original command to the primary and trusting the server to Do the Right Thing™
● If the write already executed, return the result we missed
● If the write never executed, do it now and return its result

Slide 44

Slide 44 text

Retrying wants a server selection loop
● Drivers invoke server selection for each retry attempt
● PHP’s default try-once behavior is unlikely to find a new primary after a failover, since replica set elections can take a few seconds (electionTimeoutMillis)
● Reducing election times for planned maintenance (SERVER-35624)
● Combining retryWrites=true with serverSelectionTryOnce=false can fully insulate an application’s writes from replica set failovers (https://git.io/fNbW0)
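The difference between try-once and a real selection loop is easy to see in a simulation. The following is a sketch in plain PHP, not the driver's implementation: selectPrimary(), the $scan callable standing in for SDAM, the 500ms rescan interval, and the host name are all assumptions, and time is simulated so the example runs instantly.

```php
<?php
/**
 * Simulated server selection loop (not the driver's actual implementation).
 *
 * $scan is a hypothetical stand-in for SDAM: given the elapsed time in
 * milliseconds, it returns the primary's address, or null if none is known.
 */
function selectPrimary(callable $scan, bool $tryOnce, int $timeoutMS, int $rescanIntervalMS = 500): ?string
{
    $elapsedMS = 0;

    while (true) {
        $primary = $scan($elapsedMS);
        if ($primary !== null) {
            return $primary;
        }

        if ($tryOnce) {
            return null; // fail fast after a single scan
        }

        $elapsedMS += $rescanIntervalMS; // pretend we waited before rescanning
        if ($elapsedMS >= $timeoutMS) {
            return null; // the selection timeout is exhausted
        }
    }
}

// Hypothetical failover: a new primary is elected two seconds in.
$scan = function (int $elapsedMS): ?string {
    return $elapsedMS >= 2000 ? 'db2.example.com:27017' : null;
};

var_dump(selectPrimary($scan, true, 15000));  // null: try-once gives up immediately
var_dump(selectPrimary($scan, false, 15000)); // the new primary, found after rescanning
```

A retry that goes through the try-once path lands in the first case during an election, which is why the retry attempt alone is not enough.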

Slide 45

Slide 45 text

Taking advantage of retryable writes
● Add retryWrites=true to your connection string, disable try-once behavior (serverSelectionTryOnce=false), and tune serverSelectionTimeoutMS closer to expected election time (e.g. 15 seconds)
● Atlas already advises this, which helps with its automated maintenance
● Use the driver as you would normally
● Multi-document writes (e.g. updateMany) may still fail; you’re no worse off
● Single-document writes may still fail after one retry attempt
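As a concrete illustration, the three options can be assembled into a connection string like this. The host name and replica set name are placeholders, and the 15 second timeout is just the example value from the slide; the commented-out line assumes the mongodb/mongodb library is installed.

```php
<?php
// Placeholder host and replica set name; substitute your own deployment.
$options = [
    'replicaSet' => 'rs0',
    'retryWrites' => 'true',
    'serverSelectionTryOnce' => 'false',
    'serverSelectionTimeoutMS' => 15000,
];

// Booleans are written as strings so they serialize as "true"/"false"
// rather than "1"/"0".
$uri = 'mongodb://db.example.com:27017/?' . http_build_query($options, '', '&');

echo $uri, PHP_EOL;

// With the mongodb/mongodb library installed, the URI is used as usual:
// $client = new MongoDB\Client($uri);
```

The same options can also be passed as the second ($uriOptions) argument of the Client constructor instead of being embedded in the string.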

Slide 46

Slide 46 text

60% of the time… It works every time.

Slide 47

Slide 47 text

Transactions: ACID compliance only took ten years…

Slide 48

Slide 48 text

Getting to this point
● MongoDB 3.0 introduced the WiredTiger storage engine
● MongoDB 3.2 made WiredTiger the default, introduced read concerns, and made significant improvements to the replication protocol
● MongoDB 3.6 introduced logical sessions, which was the underlying framework for causal consistency and retryable writes
● MongoDB 4.0 introduced multi-document transactions for replica sets by leveraging the logical session API and WiredTiger storage engine
● MongoDB 4.2 will add transaction support for sharded clusters

Slide 49

Slide 49 text

Transactions at a glance
● All operations within a transaction must route to the same member (i.e. primary)
● Read and write concerns are specified once, when starting a transaction
● While many operations are supported, there are some restrictions (e.g. DDL)
● Databases and collections must exist prior to starting the transaction
● Cursors created outside a transaction cannot be used within, and vice versa

Slide 50

Slide 50 text

Transactions in PHP

$client = new MongoDB\Client;
$session = $client->startSession();
$session->startTransaction();
$client->test->foo->insertOne(['x' => 1], ['session' => $session]);
$client->test->bar->insertOne(['y' => 2], ['session' => $session]);
$session->commitTransaction();

Slide 51

Slide 51 text

Retrying transactions
● Drivers automatically retry commit and abort commands once
● Applications can retry commits additional times if desired
● Other read and write operations are not retried. Transactions and retryable writes (i.e. retryWrites=true) are mutually exclusive.
● Entire transactions may be retried if an operation fails with a transient error
● Use a majority write concern when retrying transactions for durability

Slide 52

Slide 52 text

Knowing when to retry transactions
● Any RuntimeException thrown by the driver or library may be associated with one or more error labels, which can be checked using the hasErrorLabel() method
● TransientTransactionError implies the entire transaction can be retried
● UnknownTransactionCommitResult implies a commit can be retried
● Applications can, and should, handle both cases (example)
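Handling both labels usually means two nested retry loops: an inner one around the commit and an outer one around the whole transaction. The sketch below shows that shape without depending on the driver. The function names, the callables, and the attempt limit are hypothetical; the only thing assumed about the exception is that it exposes a hasErrorLabel() method, as the driver's RuntimeException does.

```php
<?php
/** Returns true if the exception carries the given error label. */
function hasLabel(Throwable $e, string $label): bool
{
    return method_exists($e, 'hasErrorLabel') && $e->hasErrorLabel($label);
}

/** Retry only the commit while its outcome is unknown. */
function commitWithRetry(callable $commit, int $maxAttempts = 3): void
{
    for ($attempt = 1; ; $attempt++) {
        try {
            $commit();
            return;
        } catch (Throwable $e) {
            if ($attempt < $maxAttempts && hasLabel($e, 'UnknownTransactionCommitResult')) {
                continue; // commit result unknown: safe to send the commit again
            }
            throw $e;
        }
    }
}

/**
 * Retry the whole transaction on transient errors.
 *
 * $body is assumed to start the transaction, run its operations, and abort
 * the transaction itself before rethrowing if one of them fails.
 */
function runTransactionWithRetry(callable $body, callable $commit, int $maxAttempts = 3): void
{
    for ($attempt = 1; ; $attempt++) {
        try {
            $body();
            commitWithRetry($commit, $maxAttempts);
            return;
        } catch (Throwable $e) {
            if ($attempt < $maxAttempts && hasLabel($e, 'TransientTransactionError')) {
                continue; // nothing was committed: start over from the top
            }
            throw $e;
        }
    }
}
```

With the real driver, $body would call $session->startTransaction() followed by the session-bound operations, and $commit would call $session->commitTransaction(). Bounding the attempts is a choice made here to keep the sketch from looping forever.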

Slide 53

Slide 53 text

(take a breath)

Slide 54

Slide 54 text

Resources and Further Reading
● MongoDB PHP driver documentation and specifications
  https://php.net/mongodb
  https://docs.mongodb.com/php-library/
  https://github.com/mongodb/specifications
● MongoDB Manual (CRUD concepts, retryable writes, transactions)
  https://docs.mongodb.com/manual/core/crud/
  https://docs.mongodb.com/manual/core/retryable-writes/
  https://docs.mongodb.com/manual/core/transactions/
● How to Write Resilient MongoDB Applications, by A. Jesse Jiryu Davis
  https://emptysqua.re/blog/how-to-write-resilient-mongodb-applications/
● It’s 10pm: Do You Know Where Your Writes Are?, by Jeremy Mikola
  https://speakerdeck.com/jmikola/its-10pm-do-you-know-where-your-writes-are

Slide 55

Slide 55 text

Thanks! Jeremy Mikola @jmikola

Slide 56

Slide 56 text

Image Credits
● https://docs.mongodb.com/
● https://twitter.com/dcousineau/status/613127314545737728
● https://imgur.com/gallery/B58uJxA
● http://www.kollected.com/Mars-Rover-Curiosity
● https://scryfall.com/card/ugl/2/the-cheese-stands-alone
● https://www.reddit.com/r/thinkpad/comments/8lzftd/exploded_x220_wallpaper_with_a_slightly_different/
● https://skitterphoto.com/photos/232/mixer-knobs
● https://www.gsb.stanford.edu/insights/end-traffic-jams-it-might-not-be-dream
● https://pixabay.com/en/blueprint-ruler-architecture-964630/
● https://obrazki.elektroda.pl/5336891800_1520705708.jpg
● https://www.youtube.com/watch?v=IKiSPUc2Jck
● http://markinternational.info/coding-hd-wallpaper/222275975.html