Brown Bag Intro to MongoDB

Mark Hillick
November 23, 2012

Brown Bag session given to college students

Transcript

  1. Summary/Agenda
     • Who & what
     • Example Deployments
     • EC2 Notes
     • EC2 Best Practices
     • Further Tuning
     Friday 23 November 12
  2. Example Deployments
     • Replica Sets
     • Shards
     • Some notes on EC2 deployments
  3. Replica Set Configurations
     • Minimum: Primary, Arbiter, Secondary
     • Typical: Primary, Secondary, Secondary
     • Primary, Secondary, Secondary, Secondary, Secondary
  4. Some RS Notes
     • Asynchronous replication (single primary)
     • Automatic failover
     • App-level definition of “write replication”
     • Secondary nodes can replicate with a slaveDelay
     • Secondary nodes can be hidden
     • Maximum of 12 nodes, with 7 voting
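
The replica set notes above can be sketched as a configuration document plus a small sanity check. This is only an illustration: the hostnames and the set name `rs0` are made up, and `check_limits` is a hypothetical helper, not part of any MongoDB driver. The field names (`host`, `priority`, `hidden`, `slaveDelay`, `votes`) follow the replica set configuration document of that era.

```python
# Hypothetical replica set configuration; hostnames are placeholders.
rs_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
        # Hidden, delayed secondary: priority 0 so it can never become primary.
        {"_id": 2, "host": "db3.example.net:27017",
         "priority": 0, "hidden": True, "slaveDelay": 3600},
    ],
}

MAX_MEMBERS = 12  # limits quoted on the slide
MAX_VOTING = 7


def check_limits(config):
    """Return a list of problems found; an empty list means the limits hold."""
    members = config["members"]
    # A member votes unless its "votes" field is explicitly set to 0.
    voting = [m for m in members if m.get("votes", 1) > 0]
    problems = []
    if len(members) > MAX_MEMBERS:
        problems.append("too many members: %d" % len(members))
    if len(voting) > MAX_VOTING:
        problems.append("too many voting members: %d" % len(voting))
    for m in members:
        # Hidden and delayed members are expected to have priority 0.
        if (m.get("hidden") or m.get("slaveDelay")) and m.get("priority", 1) != 0:
            problems.append("member %d should have priority 0" % m["_id"])
    return problems


print(check_limits(rs_config))
```

A document like `rs_config` is what would be handed to the shell's replica set initiation or reconfiguration helpers; the check simply encodes the 12-member and 7-voter limits before doing so.
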
  5. Sharding
     • Shard ×4: Primary, Secondary, Secondary
     • mongos ×4
     • config DB ×3
  6. Sharding Notes
     • Each “shard” usually a Replica Set (same options)
     • Meta Data for shard stored in ConfigDB
     • Copy of meta data stored in-memory by mongos
     • Config DB cluster is *not* a replica set
     • Data split into chunks, using range based shard key
     • Chunks may be migrated between shards
     • New chunks created by “splitting” old chunks
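
A toy model may help make the chunk notes above concrete. The sketch below is not how mongos or the config DB store metadata; `ChunkMap` and its methods are hypothetical names, and shard keys are assumed to be plain numbers. It only illustrates range-based lookup, splitting a chunk in two, and migrating a chunk to another shard.

```python
import bisect


class ChunkMap:
    """Toy chunk table: chunk i covers [lower_bounds[i], lower_bounds[i + 1])."""

    def __init__(self, first_shard):
        # Start with a single chunk that covers the whole key space.
        self.lower_bounds = [float("-inf")]
        self.owners = [first_shard]

    def find(self, key):
        """Return (chunk index, shard name) for the chunk containing key."""
        i = bisect.bisect_right(self.lower_bounds, key) - 1
        return i, self.owners[i]

    def split(self, key):
        """Split the chunk containing key into two chunks at key."""
        i, shard = self.find(key)
        if self.lower_bounds[i] == key:
            return  # key is already a chunk boundary
        self.lower_bounds.insert(i + 1, key)
        self.owners.insert(i + 1, shard)  # the new chunk starts on the same shard

    def migrate(self, key, to_shard):
        """Move the chunk containing key to another shard."""
        i, _ = self.find(key)
        self.owners[i] = to_shard


chunks = ChunkMap("shard0")
chunks.split(100)                 # (-inf, 100) and [100, +inf), both on shard0
chunks.migrate(100, "shard1")     # move the upper chunk
print(chunks.find(42))            # (0, 'shard0')
print(chunks.find(100))           # (1, 'shard1')
```

Keeping the lower bounds sorted is what makes routing a binary search, which is roughly the job the in-memory copy of the metadata does for each mongos.
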
  7. Shard Server in EC2 (1)
     Category/Impact Low Medium High
     • Disk Speed x
     • Disk Capacity x
     • RAM x
     • CPU x
  8. Shard Server in EC2 (2)
     • MongoDB designed for OS defaults on 64-bit instance
     • Use standard virtual memory page size
     • Raise “nofiles” ulimit
     • Use RAID10 & a modern filesystem (ext4, xfs, etc.)
     • Use “noatime” mount option
  9. Shard Server in EC2 (3)
     • kernel >= 2.6.23/2.6.25 respectively
     • Readahead: how much more to read than what you asked for
     • If too high => possible performance impact
     • Set to 0 on EBS devices
     • Set to desired value on RAID device
  10. Config Server in EC2 (1)
      Category/Impact Low Medium High
      • Disk Speed x
      • Disk Capacity x
      • RAM x
      • CPU x
  11. Config Server in EC2 (2)
      • Use RAID10
      • Use 64-bit instance
      • Can run on shard servers
  12. Arbiter in EC2 (1)
      Category/Impact Low Medium High
      • Disk Speed x
      • Disk Capacity x
      • RAM x
      • CPU x
  13. Arbiter in EC2 (2)
      • Can use micro instance
      • Elections may be slower
      • Can use instance store
      • Still want backups :)
  14. Instance Types and Capabilities
      Instance Type    API Name     Available RAM (GB)  Network (Gbps)  Cores  EC2 Units
      Standard         m1.small     1.71                0.25            1      1
      Standard         m1.medium    3.75                0.25            1      2
      Standard         m1.large     7.5                 0.5             2      2
      Standard         m1.xlarge    15                  1.0             4      8
      Hi-Mem           m2.xlarge    17.1                0.25            2      6.5
      Hi-Mem           m2.2xlarge   34.2                0.5             4      13
      Hi-Mem           m2.4xlarge   68.4                1.0             8      26
      Hi-CPU           c1.medium    1.7                 0.25            2      5
      Hi-CPU           c1.xlarge    7                   1.0             8      20
      Cluster Compute  cc1.4xlarge  23                  10*             8      33.5
      Cluster Compute  cc1.8xlarge  60                  10*             16     88
      Micro            t1.micro     0.613               0.1             1**    2**
      * Although Cluster Compute nodes have 10 Gbps dedicated, there is a 2 Gbps rate limit between the instances and EBS, limiting IO to 2 Gbps
      ** Micro instances are really just for testing - even their stated EC2 units are burst only
  15. Instances Guidelines (1)
      • Use 64-bit only, 32-bit is not recommended
      • Primary/Secondary should be equal*
      • High CPU is not necessary
      • High Memory for large mongod instances
      • Network capacity is also IO capacity
      • EBS
  16. Instances Guidelines (2)
      • Note the trade-offs - memory/network
      • m1.large to m2.xlarge = 2x Mem, 0.5x Network
      • Do not use micro except for testing
      • m1.medium is usually sufficient for config DB
      • m1.small can be used for Arbiters
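
The memory/network trade-off can be checked with a few lines of arithmetic. The figures below are copied from the instance table on slide 14; `trade_off` is a hypothetical helper written only for this illustration.

```python
# (RAM in GB, network in Gbps), taken from the instance table on slide 14.
INSTANCES = {
    "m1.large": (7.5, 0.5),
    "m2.xlarge": (17.1, 0.25),
}


def trade_off(src, dst):
    """Return (RAM multiplier, network multiplier) when moving from src to dst."""
    src_ram, src_net = INSTANCES[src]
    dst_ram, dst_net = INSTANCES[dst]
    return dst_ram / src_ram, dst_net / src_net


ram_x, net_x = trade_off("m1.large", "m2.xlarge")
print("RAM x%.2f, network x%.2f" % (ram_x, net_x))  # RAM x2.28, network x0.50
```

So the slide's "2x Mem" is a rounded figure: the table gives a little over twice the memory for exactly half the network capacity.
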
  17. System Configuration
      • Use 64-bit, Linux preferred
      • Set file descriptor limits (20,000 or above)
      • Turn off atime on filesystem (pre-2.6.30 especially)
      • Use ext4/XFS as the filesystem (not ext3)
      • RAID 10 is recommended everywhere
        • mitigates slow EBS volumes (fail the bad volume)
      • Do not use large VM pages
      • Do configure swap to prevent OOM Killer
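
As a rough way to verify the file descriptor guideline above, the limit can be read with Python's standard `resource` module (Unix only). Note the assumption: this reports the limit of the process running the script, which matches mongod only if both are started from the same environment. `nofile_ok` is a hypothetical helper name.

```python
import resource

RECOMMENDED_NOFILE = 20000  # figure from the slide


def nofile_ok(minimum=RECOMMENDED_NOFILE):
    """Return True if this process's soft open-files limit is at least minimum."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the minimum.
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= minimum


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft=%s hard=%s ok=%s" % (soft, hard, nofile_ok()))
```

If the check fails, the limit has to be raised where mongod is launched (for example in the init script or the system limits configuration), not from inside this script.
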
  18. Backups
      • EBS Snapshots - RAID complicates things
      • If possible, single EBS volume, hidden slave can be used to simplify
      • Single EBS volume, with journaling means:
        • No fsync & lock required
      • Similar applies to LVM snapshots
      • http://www.mongodb.org/pages/viewpage.action?pageId=19562846
  19. Tweaking for Performance
      • Place journal on separate EBS volume(s) - leave readahead as-is
      • On data volume, lower readahead to a reasonable level (mongod must be restarted)
      • Each EBS volume is ~100 IOPS
      • Use MMS and munin-node to track IO over time
      • Also track Flush average
      • Fragmentation can cause operations to be expensive
      • Trade-offs for using compact and repair
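
Combining the ~100 IOPS per EBS volume figure with the RAID 10 recommendation gives a back-of-the-envelope capacity estimate. The model below is an assumption, not something stated in the slides: it treats RAID 10 as a stripe of mirrored pairs where reads can be served by any volume and every write lands on both halves of a mirror, and it ignores network limits and EBS variability. `raid10_iops` is a hypothetical helper.

```python
EBS_IOPS_PER_VOLUME = 100  # approximate figure from the slide


def raid10_iops(volumes, per_volume=EBS_IOPS_PER_VOLUME):
    """Return an idealized (read IOPS, write IOPS) estimate for a RAID 10 array."""
    if volumes < 4 or volumes % 2:
        raise ValueError("this sketch assumes an even number of volumes, at least 4")
    read_iops = volumes * per_volume           # reads can hit either side of a mirror
    write_iops = (volumes // 2) * per_volume   # each write goes to both sides
    return read_iops, write_iops


print(raid10_iops(8))  # (800, 400)
```

Numbers like these are only a starting point; the IO tracking mentioned above (MMS, munin-node) is what shows whether a real array gets anywhere near them.
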