
Replication and Replica Sets - Kyle Banker, 10gen

mongodb
November 04, 2011

MongoChicago 2011

MongoDB supports replication for failover and redundancy. In this session we will introduce the basic concepts around replica sets, which provide automated failover and recovery of nodes. We'll show you how to set up, configure, and initiate a replica set, and methods for using replication to scale reads. We'll also discuss proper architecture for durability.


Transcript

  1. What is Replication for?
     • High availability: if a node fails, another node can step in
     • Extra copies of data for recovery
     • Offline operations: backups and exports are cheap
     • Scaling reads: read-heavy applications can read from replicas

  2. What Does Replication Look Like?
     • Replica set: a set of mongod servers
     • Minimum of 3; can use "arbiters"
     • Consensus election of a "primary"
     • All writes go to the primary
     • "Secondaries" replicate from the primary

  3. Configuring a Replica Set
     • Start mongod processes with --replSet
     • Then:
        > rs.initiate();
        > rs.add("localhost:30000");
        > rs.add("localhost:30001");

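     A fuller sketch of the same steps, assuming three mongod processes on one host (the set name "myset", ports, and dbpaths are illustrative):

        mongod --replSet myset --port 30000 --dbpath /data/rs0
        mongod --replSet myset --port 30001 --dbpath /data/rs1
        mongod --replSet myset --port 30002 --dbpath /data/rs2

     Then, from a mongo shell connected to localhost:30000:

        > rs.initiate()
        > rs.add("localhost:30001")
        > rs.add("localhost:30002")
        > rs.status()   // confirm one PRIMARY and two SECONDARY members
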
  4. How Does it Work?
     • Change operations are written to the oplog
     • The oplog is a capped collection
       – Must have enough space to allow new secondaries to catch up after copying from a primary
       – Must have enough space to cope with any applicable slaveDelay
     • Secondaries query the primary's oplog and apply what they find

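     A quick way to size and inspect the oplog; the --oplogSize value (in megabytes) and dbpath are illustrative:

        mongod --replSet myset --oplogSize 1024 --dbpath /data/rs0

        > db.printReplicationInfo()        // oplog size and the time window it covers
        > db.printSlaveReplicationInfo()   // how far behind each secondary is
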
  5. Primary Election
     (diagram: a three-member set, one primary and two secondaries)
     • As long as a partition can see a majority (>50%) of the cluster, it will elect a primary.

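     A quick sketch for checking which member the set has elected, from any connected mongo shell:

        > rs.status().members.forEach(function(m) { print(m.name + "  " + m.stateStr); })
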
  6. Simple Failure
     (diagram: primary and secondary up, one node failed)
     • 66% of the cluster is visible, so a primary is elected.

  7. Simple Failure
     (diagram: one secondary up, two nodes failed)
     • Only 33% of the cluster is visible: read-only mode.

  8. Network Partition
     (diagram: the majority side of the partition, primary plus secondaries)
     • 66% of the cluster is visible, so a primary is elected.

  9. Network Partition
     (diagram: the minority side of the partition)
     • Only 33% of the cluster is visible: read-only mode.

  10. Even Cluster Size
     (diagram: an even-sized set split in half)
     • Only 50% of the cluster is visible: not a majority, so read-only mode.

  11. Even Cluster Size
     (diagram: the other half of the same split)
     • Again only 50% of the cluster is visible: read-only mode on both sides.

  12. Avoid Single Points of Failure
     (diagram: primary and two secondaries behind one top-of-rack switch; the switch fails, or the rack falls over)

  13. Priorities
     (diagram: primary and a secondary in San Francisco with priority 1; a secondary in Dallas with priority 0)
     • The priority-0 member is a disaster-recovery data center copy and will never become primary automatically.

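     A sketch of setting a member's priority from the mongo shell, assuming the Dallas member is members[2] in the configuration:

        > cfg = rs.conf()
        > cfg.members[2].priority = 0
        > rs.reconfig(cfg)
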
  14. 2 Replicas + Arbiter??
     (diagram, three panels: primary + arbiter + secondary; the secondary is lost; a replacement secondary full-syncs from the primary)
     • Uh oh: the full sync is going to use a lot of resources on the primary, so you may have downtime or degraded performance.

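     For reference, an arbiter is added with its own shell helper; this is a sketch (host and port are illustrative), and the next slide shows why three data-bearing members are usually preferable:

        > rs.addArb("localhost:30002")
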
  15. With 3 Replicas
     (diagram, three panels: primary + two secondaries; one secondary is lost; the replacement full-syncs from the surviving secondary)
     • The full sync can happen from a secondary, which will not impact traffic on the primary.

  16. Replica Set Topology
     • Avoid single points of failure
       – Separate racks
       – Separate data centers
     • Avoid long recovery downtime
       – Use journaling
       – Use 3+ replicas
     • Keep your actives close
       – Use priority to control where failovers happen

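     Journaling is on by default on 64-bit 2.0 builds; it can also be requested explicitly at startup (a sketch, set name and path illustrative):

        mongod --replSet myset --journal --dbpath /data/rs0
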
  17. For Applications
     • getLastError( { w : … } )
       – The application waits until changes are written to the specified number of servers
       – Defaults can be set in the replica set's configuration
     • "Safe mode" for critical writes: setWriteConcern()
       – Another way to force writes to a number of servers
     • Drivers support "read preference" for sending queries to a secondary
       – Careful: secondary reads are not guaranteed to be consistent with the primary

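     A sketch of these options from the mongo shell (the collection name and numbers are illustrative; drivers expose the same knobs through their own write-concern and read-preference APIs):

        // wait until the write has reached 2 members, or time out after 5 seconds
        > db.things.insert({x: 1})
        > db.runCommand({getLastError: 1, w: 2, wtimeout: 5000})

        // set a default write concern in the replica set configuration
        > cfg = rs.conf()
        > cfg.settings = {getLastErrorDefaults: {w: 2, wtimeout: 5000}}
        > rs.reconfig(cfg)

        // allow reads from a secondary when connected to one in the shell
        > rs.slaveOk()
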
  18. Replication and Sharding
     • Each shard needs its own replica set
     • Drivers use a mongos process to route queries to the appropriate shard(s)
     • Configuration servers maintain the shard key range metadata

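     A sketch of adding an entire replica set as a shard from a mongos shell (the set name and hosts are illustrative):

        mongos> sh.addShard("shard0/host1:27017,host2:27017,host3:27017")
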
  19. Data Center Awareness
     • Tag nodes in the replica set configuration
       – Apply hierarchical labels to replica set members
     • Define getLastError modes
       – Require the number of nodes a write must go to
       – Require the locations of nodes a write must go to
       – Combinations of both
     • Available in 2.0.0+

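     A sketch of tagging members and defining a getLastError mode (the tag values, member indexes, and the mode name multiDC are illustrative):

        > cfg = rs.conf()
        > cfg.members[0].tags = {dc: "ny"}
        > cfg.members[1].tags = {dc: "ny"}
        > cfg.members[2].tags = {dc: "sf"}
        > cfg.settings = {getLastErrorModes: {multiDC: {dc: 2}}}
        > rs.reconfig(cfg)

        // require the write to be acknowledged in two distinct data centers
        > db.runCommand({getLastError: 1, w: "multiDC"})
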