• How many have been woken up from sleep to do a fail-over?
• How many have experienced issues due to network latency?
• Different uses for data
  – Normal processing
  – Simple analytics
written to, and read from
• Each member can have one or more tags
  – tags: {dc: "ny"}
  – tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"}
• Replica set defines rules for write concerns
• Rules can change without changing app code
– primary (only) - Default
– primaryPreferred
– secondary
– secondaryPreferred
– nearest
When more than one node is eligible, the closest node is used for reads (all modes but primary)
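
The mode list above can be sketched as a small selection function. This is a simplified model in plain JavaScript, not the driver's actual algorithm: the member names, the `pingMs` field, and the `pickNode` helper are all hypothetical, and real drivers typically choose among several members within a latency window rather than always taking the single lowest ping.

```javascript
// Simplified model of read preference modes (illustration only, not driver code).
// Each member is { name, state: "PRIMARY" | "SECONDARY", pingMs }.
function pickNode(members, mode) {
  const primary = members.filter((m) => m.state === "PRIMARY");
  const secondaries = members.filter((m) => m.state === "SECONDARY");

  // Candidate pools, in order of preference, for each mode.
  const pools = {
    primary: [primary],
    primaryPreferred: [primary, secondaries],
    secondary: [secondaries],
    secondaryPreferred: [secondaries, primary],
    nearest: [members],
  }[mode];
  if (!pools) throw new Error("unknown read preference mode: " + mode);

  for (const pool of pools) {
    if (pool.length > 0) {
      // When several nodes qualify, take the one with the lowest ping time.
      return pool.reduce((a, b) => (b.pingMs < a.pingMs ? b : a)).name;
    }
  }
  return null; // no eligible member for this mode
}

// Hypothetical three-member replica set.
const members = [
  { name: "ny1", state: "PRIMARY", pingMs: 40 },
  { name: "ny2", state: "SECONDARY", pingMs: 35 },
  { name: "sf1", state: "SECONDARY", pingMs: 5 },
];

console.log(pickNode(members, "primary"));            // ny1
console.log(pickNode(members, "secondaryPreferred")); // sf1
console.log(pickNode(members, "nearest"));            // sf1
```

With the primary removed from `members`, `primaryPreferred` falls back to the closest secondary, while `primary` returns `null`.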
of failure:
  – Power
  – Network
  – Data center
  – Two node failure
• Automatic recovery of single node crash

[Diagram: Replica Set – 1 Data Center (Member 1, Member 2, Member 3)]
Can’t do multi data center durable write safely since only 1 node in distant DC

[Diagram: Replica Set – 2 Data Centers. Datacenter 1: Member 1, Member 2. Datacenter 2: Member 3.]
loss
• Can do w= { dc : 2 } to guarantee write in 2 data centers (with tags)

[Diagram: Replica Set – 3 Data Centers. Datacenter 1: Member 1, Member 2. Datacenter 2: Member 3, Member 4. Datacenter 3: Member 5.]
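
What a rule such as `{ dc : 2 }` asks for can be sketched as a check over the members that have acknowledged a write. This is plain JavaScript for illustration, not server code: the `satisfiesRule` helper and the `sf` and `chi` tag values are hypothetical (only `dc: "ny"` appears in the slides). In MongoDB itself, to my understanding, such a rule is registered as a named mode under `settings.getLastErrorModes` in the replica set configuration, and the write then refers to that mode by name.

```javascript
// Illustration only: does a set of acknowledging members satisfy a tag rule?
// A rule { dc: 2 } means the write must be acknowledged by members that
// together cover at least 2 distinct values of the "dc" tag.
function satisfiesRule(rule, ackedMembers) {
  return Object.entries(rule).every(([tagName, required]) => {
    const distinctValues = new Set(
      ackedMembers
        .map((m) => (m.tags || {})[tagName])
        .filter((value) => value !== undefined)
    );
    return distinctValues.size >= required;
  });
}

// Hypothetical members spread over three data centers.
const ny1 = { name: "ny1", tags: { dc: "ny" } };
const ny2 = { name: "ny2", tags: { dc: "ny" } };
const sf1 = { name: "sf1", tags: { dc: "sf" } };
const chi1 = { name: "chi1", tags: { dc: "chi" } };

console.log(satisfiesRule({ dc: 2 }, [ny1, ny2]));       // false: one data center only
console.log(satisfiesRule({ dc: 2 }, [ny1, sf1]));       // true: two data centers
console.log(satisfiesRule({ dc: 3 }, [ny1, sf1, chi1])); // true: three data centers
```

Two acknowledgements from the same data center do not satisfy `{ dc: 2 }`, which is why the rule counts distinct tag values rather than members.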
cluster chunk ranges and locations
  – Can have only 1 or 3 (production must have 3)
  – Two phase commit (not a replica set)

[Diagram: Config Servers]
router / balancer
  – No local data (persists to config database)
  – Can have 1 or many

[Diagram: App Servers and Mongos instances]
- Only one Config server (No Fault Tolerance)
- Shard not in a replica set (Low Availability)
- Only one Mongos and shard (No Performance Improvement)
- Useful for development or demonstrating configuration mechanics

[Diagram: Config Server, Mongos, Mongod]
• "mongos --configdb <hostname>:27019"
• For 3 config servers: "mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>"
• This is always how to start a new mongos, even if the cluster is already running
mongod with the default shard port (27018)
• Shard is not yet connected to the rest of the cluster
• Shard may have already been running in production

[Diagram: Config Server, Mongos, Mongod shard]
replica set: sh.addShard("<rsname>/<seedlist>")
• In 2.2 and later can use sh.addShard("<host>:<port>")

[Diagram: Config Server, Mongos, Mongod shard]
• Shard a collection with the given key
  – sh.shardCollection("<dbname>.people",{"country":1})
• Use a compound shard key to prevent duplicates
  – sh.shardCollection("<dbname>.cars",{"year":1, "uniqueid":1})
control the distribution of your data
• Tag a range of shard keys
  – sh.addTagRange(<collection>,<min>,<max>,<tag>)
• Tag a shard
  – sh.addShardTag(<shard>,<tag>)
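
How the two kinds of tags combine can be sketched as a lookup: find the tag range containing a shard key value, then find the shards carrying that tag. This is an illustrative model in plain JavaScript, not balancer code. The shard names, tags, and zip-code ranges are hypothetical, the shard key is assumed to be a single string field, and ranges are treated as including `min` and excluding `max`.

```javascript
// Illustration only: which shards may hold a given shard key value?
function shardsForKey(key, tagRanges, shardTags) {
  // A range covers [min, max): min is included, max is excluded.
  const range = tagRanges.find((r) => key >= r.min && key < r.max);
  const allShards = Object.keys(shardTags);
  if (!range) return allShards; // no tag range covers the key: any shard may hold it
  return allShards.filter((shard) => shardTags[shard].includes(range.tag));
}

// Hypothetical setup, as if created with sh.addShardTag and sh.addTagRange.
const shardTags = { shard0000: ["NYC"], shard0001: ["SFO"] };
const tagRanges = [
  { min: "10001", max: "10282", tag: "NYC" },
  { min: "94102", max: "94136", tag: "SFO" },
];

console.log(shardsForKey("10100", tagRanges, shardTags)); // [ 'shard0000' ]
console.log(shardsForKey("94110", tagRanges, shardTags)); // [ 'shard0001' ]
console.log(shardsForKey("50000", tagRanges, shardTags)); // [ 'shard0000', 'shard0001' ]
```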
the maximum size
• There is no split point if all documents have the same shard key
• Chunk split is a logical operation (no data is moved)
• If a split creates too large a discrepancy in chunk count across the cluster, a balancing round starts

[Diagram: chunk minKey..maxKey split into minKey..13 and 14..maxKey]
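
The split rules above can be sketched with a toy chunk model. This is an illustration in plain JavaScript, not the server's split logic: the `splitChunk` helper is hypothetical, a document count (`maxDocs`) stands in for the maximum chunk size, `-Infinity` and `Infinity` stand in for minKey and maxKey, and the median key is used as the split point purely as an assumption.

```javascript
// Illustration only: split a chunk that has grown past a size limit.
// A chunk is { min, max, keys } and covers shard key values in [min, max).
function splitChunk(chunk, maxDocs) {
  if (chunk.keys.length <= maxDocs) return [chunk]; // still within the limit

  const sorted = [...chunk.keys].sort((a, b) => a - b);
  // Candidate split point: the median key. It must be greater than the
  // smallest key, otherwise the lower chunk would be empty.
  let splitAt = sorted[Math.floor(sorted.length / 2)];
  if (splitAt === sorted[0]) {
    splitAt = sorted.find((k) => k > sorted[0]);
  }
  // Every document has the same shard key: there is no split point.
  if (splitAt === undefined) return [chunk];

  // The split is logical: only the range boundaries change.
  return [
    { min: chunk.min, max: splitAt, keys: sorted.filter((k) => k < splitAt) },
    { min: splitAt, max: chunk.max, keys: sorted.filter((k) => k >= splitAt) },
  ];
}

const big = { min: -Infinity, max: Infinity, keys: [1, 5, 9, 13, 14, 20] };
const parts = splitChunk(big, 4);
console.log(parts.length, parts[0].max, parts[1].min); // 2 13 13

const sameKey = { min: -Infinity, max: Infinity, keys: [7, 7, 7, 7, 7] };
console.log(splitChunk(sameKey, 2).length); // 1
```

The second call shows why a low-cardinality shard key is a problem: the chunk exceeds the limit but cannot be divided.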
difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts

[Diagram: Config Server, Mongos instances, Shard 1, Shard 2]
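
The threshold test described above can be sketched as follows. This is an illustrative model in plain JavaScript, not the balancer itself: the `balancingRound` helper, the shard names, and the threshold value of 8 are hypothetical, and the comparison follows the slide's wording ("above the migration threshold").

```javascript
// Illustration only: decide whether a balancing round should start.
// chunkCounts maps shard name -> number of chunks on that shard.
function balancingRound(chunkCounts, migrationThreshold) {
  const shards = Object.keys(chunkCounts);
  // Most dense and least dense shards by chunk count.
  const source = shards.reduce((a, b) => (chunkCounts[b] > chunkCounts[a] ? b : a));
  const destination = shards.reduce((a, b) => (chunkCounts[b] < chunkCounts[a] ? b : a));
  const difference = chunkCounts[source] - chunkCounts[destination];

  if (difference <= migrationThreshold) return null; // balanced enough
  return { source, destination, difference };
}

// Hypothetical cluster state and threshold.
console.log(balancingRound({ shard1: 30, shard2: 12 }, 8));
// { source: 'shard1', destination: 'shard2', difference: 18 }
console.log(balancingRound({ shard1: 20, shard2: 15 }, 8)); // null
```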
out a "balancer lock"
• To see the status of these locks:
  - use config
  - db.locks.find({ _id: "balancer" })

[Diagram: Config Server, Mongos instances, Shard 1, Shard 2]
to source shard
• The source shard then notifies destination shard
• The destination claims the chunk shard-key range
• Destination shard starts pulling documents from source shard

[Diagram: Config Server, Mongos, Shard 1, Shard 2]
for open cursors to either close or time out
  - NoTimeout cursors may prevent the release of the lock
• Mongos releases the balancer lock after old chunks are deleted

[Diagram: Config Server, Mongos instances, Shard 1, Shard 2]
• Shard key is immutable
• Shard key values are immutable
• Shard key requires index on fields contained in key
• Uniqueness of `_id` field is only guaranteed within individual shard
• Shard key limited to 512 bytes in size