Bulletproof MongoDB

Bulletproof MongoDB Jeremy Mikola @jmikola

A Little About Myself

Shots Fired

Some Topics to Cover
• Deployment models
• Driver internals
• Driver configuration
• Application concepts
• Retrying operations
• Transactions

Deployment Models
High availability and/or horizontal scaling

Standalone mongod

Replication

Replication: Heartbeats

Replication: Failover

Sharding

Sharding: Routing

Driver Internals
From connection strings to monitoring

Starting with a connection string
mongodb://user:[email protected]:27017/?replicaSet=rs0
• The Connection String spec instructs drivers how to parse this to yield host identifiers, authentication credentials, and connection options
• A mongodb+srv:// scheme indicates Initial DNS Seedlist Discovery, which may yield additional host identifiers
• Atlas uses this to provide shorter, more resilient connection strings
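The following is a minimal sketch of the split the Connection String spec describes. It assumes percent-encoded credentials and skips the DNS lookup that mongodb+srv:// requires. The function name parseConnectionString() and the example hosts are hypothetical. Real drivers validate far more, for example duplicate options and case-insensitive option names.

```php
<?php

// Hypothetical helper: splits a mongodb:// URI into host identifiers,
// credentials, and options. Validation is deliberately minimal.
function parseConnectionString(string $uri): array
{
    $parts = explode('://', $uri, 2);

    if (count($parts) !== 2 || !in_array($parts[0], ['mongodb', 'mongodb+srv'], true)) {
        throw new InvalidArgumentException('Expected a mongodb:// or mongodb+srv:// URI');
    }

    [$scheme, $rest] = $parts;

    // Everything after "?" is URI options.
    [$rest, $query] = array_pad(explode('?', $rest, 2), 2, '');

    // Everything after the first "/" is the (optional) auth database.
    [$rest, $database] = array_pad(explode('/', $rest, 2), 2, '');

    // Credentials precede the last "@" and must be percent-encoded.
    $username = $password = null;
    $at = strrpos($rest, '@');

    if ($at !== false) {
        $userInfo = explode(':', substr($rest, 0, $at), 2);
        $username = rawurldecode($userInfo[0]);
        $password = isset($userInfo[1]) ? rawurldecode($userInfo[1]) : null;
        $rest = substr($rest, $at + 1);
    }

    // What remains is a comma-separated list of host identifiers.
    $hosts = explode(',', $rest);

    parse_str($query, $options);

    return compact('scheme', 'username', 'password', 'hosts', 'database', 'options');
}
```

For example, parsing mongodb://user:pass@db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0 yields two host identifiers and a replicaSet option of rs0. Those two values are what SDAM later uses to pick an initial topology type.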

The first handshake
• Drivers issue an isMaster command on all newly established connections
• This uses OP_QUERY instead of OP_MSG for backwards compatibility
• Drivers can also provide client metadata
• The isMaster response reports the server’s min and max wire versions, which are used for protocol negotiation, feature discovery, and detecting imposters
• No authentication or compression happens at this step
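Here is a minimal sketch of the two halves of the handshake: the command a driver might send, and the compatibility check on the reply. The shape of the client document (driver, os, platform) follows the MongoDB Handshake spec. The range-overlap rule mirrors the SDAM compatibility check. buildHandshake(), isWireVersionCompatible(), and the driver name used below are hypothetical.

```php
<?php

// Hypothetical sketch: the isMaster command sent on a new connection.
function buildHandshake(string $driverName, string $driverVersion): array
{
    return [
        'isMaster' => 1,
        // Optional client metadata, which the server can log
        'client' => [
            'driver' => ['name' => $driverName, 'version' => $driverVersion],
            'os' => ['type' => PHP_OS_FAMILY],
            'platform' => 'PHP ' . PHP_VERSION,
        ],
    ];
}

// Driver and server are compatible only if their wire version ranges overlap.
function isWireVersionCompatible(array $isMasterReply, int $driverMin, int $driverMax): bool
{
    $serverMin = $isMasterReply['minWireVersion'] ?? 0;
    $serverMax = $isMasterReply['maxWireVersion'] ?? 0;

    return $serverMax >= $driverMin && $serverMin <= $driverMax;
}
```

A server whose maximum wire version is below the driver's minimum fails the check and is reported as incompatible. No command is sent to it.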

Authentication and compression
• After the handshake, drivers know which auth and compression protocols (if any) the server supports
• Drivers also advertise the compressors they support in the handshake
• The Auth spec defines command conversations for various auth mechanisms
• The Compression spec defines OP_COMPRESSED as an envelope for other opcodes
• Compression is never used for certain commands (e.g. isMaster, auth)

Server discovery and monitoring
• SDAM defines structures for topology and server descriptions, a strategy for periodic monitoring, and a state machine for updating descriptions
• Drivers can infer the initial topology type and servers from the connection string
• Unknown types address ambiguity (e.g. a seed list without a replicaSet option)
• An isMaster response affirms a server’s type and may also update the topology
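Two of the SDAM rules can be sketched as plain functions. The first infers the initial topology type from the connection string. The second infers a server's type from its isMaster response. Both are simplified: for example, they ignore mongodb+srv:// and stale-primary handling. The function names are hypothetical. The type names are those used in the SDAM spec.

```php
<?php

// Hypothetical sketch: initial topology type from the parsed connection string.
function initialTopologyType(array $seeds, ?string $replicaSet): string
{
    if ($replicaSet !== null) {
        return 'ReplicaSetNoPrimary';
    }

    // One seed and no replicaSet option means a direct connection;
    // several seeds without a set name are ambiguous until monitored.
    return count($seeds) === 1 ? 'Single' : 'Unknown';
}

// Hypothetical sketch: server type from an isMaster response document.
function serverTypeFromIsMaster(array $reply): string
{
    if (empty($reply['ok'])) {
        return 'Unknown';
    }
    if (($reply['msg'] ?? null) === 'isdbgrid') {
        return 'Mongos';
    }
    if (!empty($reply['isreplicaset'])) {
        return 'RSGhost';
    }
    if (isset($reply['setName'])) {
        if (!empty($reply['ismaster'])) {
            return 'RSPrimary';
        }
        if (!empty($reply['hidden'])) {
            return 'RSOther';
        }
        if (!empty($reply['secondary'])) {
            return 'RSSecondary';
        }
        if (!empty($reply['arbiterOnly'])) {
            return 'RSArbiter';
        }
        return 'RSOther';
    }

    return 'Standalone';
}
```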

Different application deployments
• Single-threaded applications have many app servers, each with a pool of workers, each responsible for serving one request at a time
• Multi-threaded and async applications have a limited number of app servers responsible for serving incoming requests concurrently
(Diagram: app servers connecting to a cluster under each deployment model)

Different approaches to monitoring
• Multi-threaded and asynchronous drivers monitor the topology in a background “thread” and maintain a separate connection pool for application usage
• The monitoring thread does not share sockets with the connection pool (rationale)
• Single-threaded drivers share sockets for monitoring and application usage, and perform monitoring during server selection (i.e. when procuring a socket)
• Separate sockets would be redundant and/or costly
• They forgo connection pools in favor of persistent sockets

Robustness improvements to monitoring
• Use connectTimeoutMS in lieu of socketTimeoutMS for monitoring
• Retry isMaster once to quickly recover dropped sockets (rationale)
• Drivers internally invoke monitoring as needed (e.g. after a “not master” error)
Optimizations for single-threaded drivers
• Ignore inaccessible servers for cooldownMS (five seconds)
• Monitoring can be parallelized with async IO
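The cooldown rule for single-threaded drivers can be sketched as a small tracker. A server that failed its last check is skipped until cooldownMS has elapsed. The class and method names are hypothetical. Time is passed in explicitly, in milliseconds, so the logic is easy to test.

```php
<?php

// Hypothetical sketch: skip servers that recently failed a check until
// cooldownMS (five seconds by default) has elapsed.
final class CooldownTracker
{
    private $cooldownMS;

    /** @var array<string, int> host identifier => time of last failure (ms) */
    private $failedAt = [];

    public function __construct(int $cooldownMS = 5000)
    {
        $this->cooldownMS = $cooldownMS;
    }

    public function markFailed(string $host, int $nowMS): void
    {
        $this->failedAt[$host] = $nowMS;
    }

    public function markSucceeded(string $host): void
    {
        unset($this->failedAt[$host]);
    }

    // True if this scan should skip the server instead of waiting on it.
    public function inCooldown(string $host, int $nowMS): bool
    {
        return isset($this->failedAt[$host])
            && $nowMS - $this->failedAt[$host] < $this->cooldownMS;
    }
}
```

Skipping the server avoids stalling every request on connectTimeoutMS for a host that just proved unreachable.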

Server selection
• Relies on SDAM for an up-to-date view of the topology and its servers
• The Server Selection spec uses a loop to filter the topology down to a server description
• The algorithm is straightforward for multi-threaded and async drivers, but single-threaded drivers must invoke SDAM during the loop
• Random selection within a latency window if multiple servers are eligible
• A server description can be exchanged for a socket
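The latency-window step can be sketched on its own. It takes servers that have already passed filtering (e.g. by read preference), keeps those within localThresholdMS of the fastest, and picks one at random. The function name and the host => RTT array layout are hypothetical.

```php
<?php

// Hypothetical sketch: $servers maps host identifier => average round-trip
// time in ms, already filtered down to eligible servers.
function selectWithinLatencyWindow(array $servers, int $localThresholdMS = 15): ?string
{
    if ($servers === []) {
        return null; // nothing eligible: the caller rescans or gives up
    }

    $fastest = min($servers);

    // Keep every server whose RTT falls inside the latency window.
    $window = array_keys(array_filter(
        $servers,
        function ($rtt) use ($fastest, $localThresholdMS) {
            return $rtt <= $fastest + $localThresholdMS;
        }
    ));

    // Random choice spreads load among similarly fast servers.
    return $window[random_int(0, count($window) - 1)];
}
```

With RTTs of 5, 12, and 40 ms and the default 15 ms threshold, the first two servers are candidates and the third is never chosen.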

How this fits in with PHP

<?php

require_once 'vendor/autoload.php';

$client = new MongoDB\Client;
$collection = $client->test->foo;

$collection->drop();
$collection->insertOne(['hello' => 'world']);

$cursor = $collection->find();

foreach ($cursor as $document) {
    var_dump($document);
}

object(MongoDB\Model\BSONDocument)#4 (1) {
  ["storage":"ArrayObject":private]=>
  array(2) {
    ["_id"]=>
    object(MongoDB\BSON\ObjectId)#18 (1) {
      ["oid"]=>
      string(24) "5c07e4822dbd7b79db17f192"
    }
    ["hello"]=>
    string(5) "world"
  }
}

URI is parsed during Client construction

class Client
{
    public function __construct($uri = 'mongodb://127.0.0.1/', array $uriOptions = [], array $driverOptions = [])
    {
        $this->manager = new Manager($uri, $uriOptions, $driverOptions);
    }
}

Server selection initializes SDAM

class Collection
{
    public function drop(array $options = [])
    {
        $server = $this->manager->selectServer(new ReadPreference('primary'));
        $operation = new DropCollection($this->databaseName, $this->collectionName, $options);

        return $operation->execute($server);
    }
}

Driver Configuration
You’ve got a few options

Configuring socket timeouts
• connectTimeoutMS is the timeout for initial socket connections and internal monitoring activity (defaults to 10 seconds)
• Consider tuning it closer to the expected maximum latency of the database servers
• socketTimeoutMS pertains to application operations (defaults to 300 seconds)
• Comparable to PHP’s own default_socket_timeout
• Be mindful of PHP’s max_execution_time

Configuring monitoring
• heartbeatFrequencyMS is the monitoring interval (defaults to 60 seconds for single-threaded drivers; the minimum is 500ms)
• socketCheckIntervalMS determines when a socket is considered inactive and must be re-checked before use (defaults to 5 seconds)
• Specific to single-threaded drivers
• Like retrying isMaster, this helps insulate applications from network errors

Configuring server selection
• localThresholdMS defines the latency window for selecting an eligible server (defaults to 15ms)
• serverSelectionTimeoutMS is the maximum amount of time to spend in the server selection loop (defaults to 30 seconds)
• serverSelectionTryOnce allows the application to “fail fast”
• Specific to single-threaded drivers, where it defaults to true
• Disabling try-once behavior can improve resiliency at the expense of time
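The contrast between try-once and looping until serverSelectionTimeoutMS can be sketched as a simplified loop. The real spec has more cases, such as forcing an immediate rescan when the topology is stale. selectServerLoop() and its parameters are hypothetical: $attempt rescans the topology and returns a server or null, $clock returns milliseconds, and $sleep waits. The 500ms pause matches the minimum heartbeat interval above.

```php
<?php

// Hypothetical sketch of the server selection loop for a single-threaded driver.
function selectServerLoop(
    callable $attempt,
    bool $tryOnce,
    int $serverSelectionTimeoutMS,
    callable $clock,
    callable $sleep,
    int $minHeartbeatMS = 500
): string {
    $deadline = $clock() + $serverSelectionTimeoutMS;

    while (true) {
        $server = $attempt(); // rescan the topology and filter it

        if ($server !== null) {
            return $server;
        }

        // Try-once fails fast; otherwise give up only after the timeout.
        if ($tryOnce || $clock() >= $deadline) {
            throw new RuntimeException('No suitable server found');
        }

        // Pause before rescanning so unreachable servers are not hammered.
        $sleep($minHeartbeatMS);
    }
}
```

With tryOnce=true, a primary that appears a second later is never found. With tryOnce=false, the loop keeps rescanning until a server appears or the timeout expires.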

The argument for “fail fast” behavior

Application Concepts

Write Concern
• w controls acknowledgement behavior
• majority scales with the size of the replica set
• majority and journaling collectively guarantee durability and avoid data loss due to rollbacks
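Below is a minimal sketch of how majority scales with set size, counting voting members as a simplification. It also shows the write concern document for majority plus journaling. majorityCount() is a hypothetical helper. The PHP driver equivalent, shown only in a comment, is the WriteConcern class.

```php
<?php

// Hypothetical helper: members that must acknowledge a w: "majority" write,
// assuming every member votes.
function majorityCount(int $votingMembers): int
{
    return intdiv($votingMembers, 2) + 1;
}

// Majority acknowledgement plus journaling, as a command-level document.
// With the PHP driver this corresponds to:
// new MongoDB\Driver\WriteConcern(MongoDB\Driver\WriteConcern::MAJORITY, 0, true)
$writeConcern = ['w' => 'majority', 'j' => true];
```

A three-member set needs two acknowledgements and a five-member set needs three. A four-member set also needs three, which is one reason odd-sized sets are preferred.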

Read Concern
• local and available are the most permissive
• majority guarantees that the data has been acknowledged by a majority of replica set members
• linearizable provides additional guarantees over majority to avoid returning stale data
• Introduced in MongoDB 3.4 to satisfy the Jepsen test framework
• Peter Bailis provides an accessible definition of linearizability
• snapshot may be used with majority-committed transactions to guarantee that reads within that transaction use a snapshot of majority-committed data

We haven’t got all day
• Operations can limit their execution time with maxTimeMS
• The server will track processing time and abort at the next interrupt point
• Socket timeouts can be expensive for both the client and server
• Write concerns can also use wtimeout to limit the time spent waiting for replication
• Distinguish write concern errors from write errors
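The difference between the two kinds of failure shows up in the write command's reply document. In that reply, writeErrors means the write itself failed. writeConcernError means the write was applied on the primary but its replication could not be confirmed in time, for example because wtimeout expired. classifyWriteReply() is a hypothetical helper. The options in the comments are the library's maxTimeMS option and the driver's WriteConcern constructor.

```php
<?php

// Time limits are set per operation or per write concern, e.g.:
//   $collection->find($filter, ['maxTimeMS' => 1000]);
//   new MongoDB\Driver\WriteConcern('majority', 5000); // wtimeout of 5 seconds

// Hypothetical helper: classify a raw write command reply.
function classifyWriteReply(array $reply): string
{
    if (empty($reply['ok'])) {
        return 'command error'; // the command as a whole failed
    }
    if (!empty($reply['writeErrors'])) {
        return 'write error'; // e.g. a duplicate key; the write did not apply
    }
    if (isset($reply['writeConcernError'])) {
        return 'write concern error'; // applied, but replication unconfirmed
    }

    return 'success';
}
```

Retrying after a write concern error risks applying the write twice, while a write error such as a duplicate key will simply fail again.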

Causal Consistency
• A causal relationship exists when an operation logically depends on a preceding operation
• Causal consistency comes with several guarantees: read your own writes, monotonic reads/writes, and writes follow reads
• Satisfied by majority read and write concerns (when durability is required)
• Applications can obtain causal consistency by using explicit sessions (examples)
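Under the hood, a causally consistent session tracks the latest operationTime the server has reported. It then asks later reads to wait for that point through readConcern's afterClusterTime. The sketch below is simplified and hypothetical: the class is a stand-in, and plain integers replace BSON timestamps. With the real driver, $client->startSession(['causalConsistency' => true]) does this bookkeeping automatically.

```php
<?php

// Hypothetical sketch of causal consistency bookkeeping inside a session.
final class CausalSession
{
    private $operationTime = null;

    // Called with the operationTime from each server reply in this session.
    public function advanceOperationTime(int $operationTime): void
    {
        // Never move backwards, so reads stay monotonic.
        if ($this->operationTime === null || $operationTime > $this->operationTime) {
            $this->operationTime = $operationTime;
        }
    }

    // Read concern for the next read, so it observes our earlier writes.
    public function readConcernForNextRead(): array
    {
        $readConcern = ['level' => 'majority'];

        if ($this->operationTime !== null) {
            $readConcern['afterClusterTime'] = $this->operationTime;
        }

        return $readConcern;
    }
}
```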

Logical Sessions
• Sessions maintain cluster-wide state about the user and their operations
• In earlier versions of MongoDB, state was tied to connection objects
• Sessions live throughout a cluster and are not tied to connection objects
• Sessions can be created and passed as an explicit option to database operations
• Group operations by passing the same session (e.g. for causal consistency)
• By default, drivers use an implicit session for single operations

Abort, Retry, Fail?

Don’t do this

function retry(Closure $retry, $numRetries = 1)
{
    if ($numRetries < 1) {
        return $retry();
    }

    for ($i = 0; $i <= $numRetries; $i++) {
        try {
            return $retry();
        } catch (MongoDB\Driver\Exception\Exception $e) {
            if ($i === $numRetries) {
                throw $e;
            }
        }
    }
}

What’s the problem with retrying?
• Operations can change the state of the system
• Reads or writes may continue to run on the server after the client moves on
• Write operations may not be idempotent and safe to execute multiple times
• At best, retrying may waste time or consume resources
• At worst, retrying may inadvertently alter the data itself

Know your errors
• Any retry strategy should consider the kind of failure: a transient network error, a persistent outage, or a command error
• A retry attempt may be necessary to differentiate transience from persistence
• If a command response reports failure, retrying probably isn’t going to help

Retryable errors
• Any network error (e.g. socket timeout, dropped connection)
• A server response clearly indicating a transient error (e.g. “not master”)
• Most commonly caused by a replica set failover or step down
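The two categories can be sketched as a classifier. Every network error counts as retryable, as does a server error whose code indicates a transient cluster state. The codes below are the ones listed as retryable in the Retryable Writes spec. isRetryableError() and its two-argument signature are hypothetical.

```php
<?php

// Server error codes treated as transient ("not master", step down, shutdown).
const RETRYABLE_CODES = [
    6,     // HostUnreachable
    7,     // HostNotFound
    89,    // NetworkTimeout
    91,    // ShutdownInProgress
    189,   // PrimarySteppedDown
    9001,  // SocketException
    10107, // NotMaster
    11600, // InterruptedAtShutdown
    11602, // InterruptedDueToReplStateChange
    13435, // NotMasterNoSlaveOk
    13436, // NotMasterOrSecondary
];

// Hypothetical helper: is this failure worth a single retry?
function isRetryableError(bool $isNetworkError, ?int $serverErrorCode): bool
{
    if ($isNetworkError) {
        return true; // e.g. socket timeout, dropped connection
    }

    return $serverErrorCode !== null && in_array($serverErrorCode, RETRYABLE_CODES, true);
}
```

A duplicate key error (code 11000) is not on the list. Retrying it would fail the same way.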

Retrying reads
• Queries that return a single document are always safe to retry
• Short-running queries that return a single batch of documents (i.e. that will not leave behind a cursor) may be safe to retry
• Drivers will aim to retry most read commands in MongoDB 4.2
• Requires server functionality to detect dropped sockets and abort operations
• getMore cannot be retried, since cursor iteration is forward-only
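A retry-once policy for reads can be sketched as follows. The read is attempted, and if it fails with a retryable error, a server is selected again (possibly a new primary) and the read runs exactly once more. RetryableReadError and readWithOneRetry() are hypothetical stand-ins. The sketch assumes the read is safe to re-run, such as a single-batch query, and should never be used for getMore.

```php
<?php

// Hypothetical stand-in for a network error or "not master" response.
final class RetryableReadError extends RuntimeException
{
}

// Retry a read once, re-running server selection before the second attempt.
// Any other exception, or a second failure, propagates to the caller.
function readWithOneRetry(callable $selectServer, callable $runRead)
{
    try {
        return $runRead($selectServer());
    } catch (RetryableReadError $e) {
        return $runRead($selectServer());
    }
}
```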

Retrying writes
Given that:
• Sessions are cluster-wide and exist beyond the scope of a connection
• Each write can be uniquely identified by a session and statement ID
• Drivers can rely on SDAM and server selection to re-select the primary
Drivers can safely retry single-document writes (or bulks thereof) by resending the original command to the primary and trusting the server to Do the Right Thing™:
• If the write already executed, return the result we missed
• If the write never executed, do it now and return its result
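The server-side half of that contract can be simulated in a few lines. A fake primary remembers the result of each (session, transaction number) pair, so a resent command returns the saved result instead of applying the write again. This is a toy model: FakePrimary and retryableInsert() are hypothetical. The string lsid stands in for the real session document, and txnNumber is the per-write identifier the driver attaches.

```php
<?php

// Hypothetical fake primary that de-duplicates writes by (lsid, txnNumber).
final class FakePrimary
{
    public $documents = [];
    private $results = [];

    public function execute(array $command): array
    {
        $key = $command['lsid'] . ':' . $command['txnNumber'];

        if (!isset($this->results[$key])) {
            // The write never executed: do it now and remember the result.
            foreach ($command['documents'] as $document) {
                $this->documents[] = $document;
            }
            $this->results[$key] = ['ok' => 1, 'n' => count($command['documents'])];
        }

        // The write already executed: return the result the client missed.
        return $this->results[$key];
    }
}

// The client resends the identical command (same lsid and txnNumber) once.
function retryableInsert(callable $send, string $lsid, int $txnNumber, array $documents): array
{
    $command = ['insert' => 'foo', 'documents' => $documents, 'lsid' => $lsid, 'txnNumber' => $txnNumber];

    try {
        return $send($command);
    } catch (RuntimeException $e) {
        return $send($command); // e.g. the connection dropped before the reply
    }
}
```

If the connection drops after the primary applies the insert, the retry still returns n=1 and the document is stored exactly once.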

Retrying wants a server selection loop
• Drivers invoke server selection for each retry attempt
• PHP’s default try-once behavior is unlikely to find a new primary after a failover, since replica set elections can take a few seconds (electionTimeoutMillis)
• Reducing election times for planned maintenance (SERVER-35624)
• Combining retryWrites=true with serverSelectionTryOnce=false can fully insulate an application’s writes from replica set failovers (https://git.io/fNbW0)

Taking advantage of retryable writes
• Add retryWrites=true to your connection string, disable try-once behavior (serverSelectionTryOnce=false), and tune serverSelectionTimeoutMS closer to the expected election time (e.g. 15 seconds)
• Atlas already advises this, which helps with its automated maintenance
• Use the driver as you normally would
• Multi-document writes (e.g. updateMany) may still fail; you’re no worse off
• Single-document writes may still fail after one retry attempt
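Those three settings can be appended to an existing connection string. withUriOptions() is a hypothetical helper, and the example assumes the URI already has a "/" after the host list. The comment shows the alternative of passing the same options through the library's $uriOptions constructor argument.

```php
<?php

// Hypothetical helper: append URI options to a connection string.
function withUriOptions(string $uri, array $options): string
{
    $separator = strpos($uri, '?') === false ? '?' : '&';

    return $uri . $separator . http_build_query($options);
}

$uri = withUriOptions('mongodb://db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0', [
    'retryWrites' => 'true',
    'serverSelectionTryOnce' => 'false',
    'serverSelectionTimeoutMS' => 15000,
]);

// Alternatively, pass the options as an array:
// $client = new MongoDB\Client($baseUri, [
//     'retryWrites' => true,
//     'serverSelectionTryOnce' => false,
//     'serverSelectionTimeoutMS' => 15000,
// ]);
```

The boolean options are written as the strings 'true' and 'false' because http_build_query() would otherwise render them as 1 and an empty value.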

60% of the time… It works every time.

Transactions ACID compliance only took ten years…

Getting to this point
• MongoDB 3.0 introduced the WiredTiger storage engine
• MongoDB 3.2 made WiredTiger the default, introduced read concerns, and made significant improvements to the replication protocol
• MongoDB 3.6 introduced logical sessions, which provided the underlying framework for causal consistency and retryable writes
• MongoDB 4.0 introduced multi-document transactions for replica sets by leveraging the logical session API and the WiredTiger storage engine
• MongoDB 4.2 will add transaction support for sharded clusters

Transactions at a glance
• All operations within a transaction must route to the same member (i.e. the primary)
• Read and write concerns are specified once, when starting a transaction
• While many operations are supported, there are some restrictions (e.g. DDL)
• Databases and collections must exist prior to starting the transaction
• Cursors created outside a transaction cannot be used within it, and vice versa

Transactions in PHP

<?php

require_once 'vendor/autoload.php';

$client = new MongoDB\Client;

$session = $client->startSession();
$session->startTransaction();

$client->test->foo->insertOne(['x' => 1], ['session' => $session]);
$client->test->bar->insertOne(['y' => 2], ['session' => $session]);

$session->commitTransaction();

Retrying transactions
• Drivers automatically retry commit and abort commands once
• Applications can retry commits additional times if desired
• Other read and write operations are not retried
• Transactions and retryable writes (i.e. retryWrites=true) are mutually exclusive
• Entire transactions may be retried if an operation fails with a transient error
• Use a majority write concern when retrying transactions for durability

Knowing when to retry transactions
• Any RuntimeException thrown by the driver or library may be associated with one or more error labels, which can be checked using the hasErrorLabel() method
• TransientTransactionError implies the entire transaction can be retried
• UnknownTransactionCommitResult implies a commit can be retried
• Applications can, and should, handle both cases (example)
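Both cases can be handled with two nested loops, shaped like the retry pattern in the MongoDB Manual's transactions page. The outer loop reruns the whole transaction on TransientTransactionError. The inner loop retries only the commit on UnknownTransactionCommitResult. TransactionError is a hypothetical stand-in for the driver's RuntimeException and its hasErrorLabel() method, and $runTransaction and $commit are placeholders. The attempt caps are an added assumption to avoid retrying forever.

```php
<?php

// Hypothetical stand-in for a driver exception that carries error labels.
final class TransactionError extends RuntimeException
{
    private $labels;

    public function __construct(string $message, array $labels = [])
    {
        parent::__construct($message);
        $this->labels = $labels;
    }

    public function hasErrorLabel(string $label): bool
    {
        return in_array($label, $this->labels, true);
    }
}

// Retry only the commit when its outcome is unknown.
function commitWithRetry(callable $commit, int $maxAttempts = 3): void
{
    for ($attempt = 1; ; $attempt++) {
        try {
            $commit();
            return;
        } catch (TransactionError $e) {
            if ($attempt >= $maxAttempts || !$e->hasErrorLabel('UnknownTransactionCommitResult')) {
                throw $e;
            }
        }
    }
}

// Rerun the entire transaction (body plus commit) on transient errors.
function runTransactionWithRetry(callable $runTransaction, callable $commit, int $maxAttempts = 3): void
{
    for ($attempt = 1; ; $attempt++) {
        try {
            $runTransaction();
            commitWithRetry($commit);
            return;
        } catch (TransactionError $e) {
            if ($attempt >= $maxAttempts || !$e->hasErrorLabel('TransientTransactionError')) {
                throw $e;
            }
        }
    }
}
```

In real code, $runTransaction would call $session->startTransaction() and pass the session to each operation, and $commit would call $session->commitTransaction().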

(take a breath)

Resources and Further Reading
MongoDB PHP driver documentation and specifications
• https://php.net/mongodb
• https://docs.mongodb.com/php-library/
• https://github.com/mongodb/specifications
MongoDB Manual (CRUD concepts, retryable writes, transactions)
• https://docs.mongodb.com/manual/core/crud/
• https://docs.mongodb.com/manual/core/retryable-writes/
• https://docs.mongodb.com/manual/core/transactions/
How to Write Resilient MongoDB Applications (A. Jesse Jiryu Davis)
• https://emptysqua.re/blog/how-to-write-resilient-mongodb-applications/
It’s 10pm: Do You Know Where Your Writes Are? (Jeremy Mikola)
• https://speakerdeck.com/jmikola/its-10pm-do-you-know-where-your-writes-are

Thanks! Jeremy Mikola @jmikola

Image Credits
• https://docs.mongodb.com/
• https://twitter.com/dcousineau/status/613127314545737728
• https://imgur.com/gallery/B58uJxA
• http://www.kollected.com/Mars-Rover-Curiosity
• https://scryfall.com/card/ugl/2/the-cheese-stands-alone
• https://www.reddit.com/r/thinkpad/comments/8lzftd/exploded_x220_wallpaper_with_a_slightly_different/
• https://skitterphoto.com/photos/232/mixer-knobs
• https://www.gsb.stanford.edu/insights/end-traffic-jams-it-might-not-be-dream
• https://pixabay.com/en/blueprint-ruler-architecture-964630/
• https://obrazki.elektroda.pl/5336891800_1520705708.jpg
• https://www.youtube.com/watch?v=IKiSPUc2Jck
• http://markinternational.info/coding-hd-wallpaper/222275975.html
