MongoDB on AWS

cj_harris5

March 29, 2012

Transcript

  1. Terminology

    RDBMS            MongoDB
    Table            Collection
    Row(s)           JSON Document
    Index            Index
    Join             Embedding & Linking
    Partition        Shard
    Partition Key    Shard Key

    Thursday, 29 March 12
  2. Here is a “simple” SQL Model

    mysql> select * from book;
    +----+----------------------------------------------------------+
    | id | title                                                    |
    +----+----------------------------------------------------------+
    |  1 | The Demon-Haunted World: Science as a Candle in the Dark |
    |  2 | Cosmos                                                   |
    |  3 | Programming in Scala                                     |
    +----+----------------------------------------------------------+
    3 rows in set (0.00 sec)

    mysql> select * from bookauthor;
    +---------+-----------+
    | book_id | author_id |
    +---------+-----------+
    |       1 |         1 |
    |       2 |         1 |
    |       3 |         2 |
    |       3 |         3 |
    |       3 |         4 |
    +---------+-----------+
    5 rows in set (0.00 sec)

    mysql> select * from author;
    +----+-----------+------------+-------------+-------------+---------------+
    | id | last_name | first_name | middle_name | nationality | year_of_birth |
    +----+-----------+------------+-------------+-------------+---------------+
    |  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
    |  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
    |  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
    |  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
    +----+-----------+------------+-------------+-------------+---------------+
    4 rows in set (0.00 sec)
  3. The Same Data in MongoDB

    {
      "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
      "title" : "Programming in Scala",
      "author" : [
        {
          "first_name" : "Martin",
          "last_name" : "Odersky",
          "nationality" : "DE",
          "year_of_birth" : 1958
        },
        {
          "first_name" : "Lex",
          "last_name" : "Spoon"
        },
        {
          "first_name" : "Bill",
          "last_name" : "Venners"
        }
      ]
    }
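The move from the three SQL tables to one embedded document can be sketched in plain JavaScript. This is only an illustration of the embedding idea, not a migration tool: the helper names `toEmbeddedAuthor` and `embedBook` are hypothetical, and the rows are copied from the SQL slide.

```javascript
// Rows copied from the SQL tables shown earlier in the deck.
const books = [
  { id: 1, title: "The Demon-Haunted World: Science as a Candle in the Dark" },
  { id: 2, title: "Cosmos" },
  { id: 3, title: "Programming in Scala" },
];
const bookAuthors = [
  { book_id: 1, author_id: 1 },
  { book_id: 2, author_id: 1 },
  { book_id: 3, author_id: 2 },
  { book_id: 3, author_id: 3 },
  { book_id: 3, author_id: 4 },
];
const authors = [
  { id: 1, last_name: "Sagan", first_name: "Carl", middle_name: "Edward", nationality: null, year_of_birth: 1934 },
  { id: 2, last_name: "Odersky", first_name: "Martin", middle_name: null, nationality: "DE", year_of_birth: 1958 },
  { id: 3, last_name: "Spoon", first_name: "Lex", middle_name: null, nationality: null, year_of_birth: null },
  { id: 4, last_name: "Venners", first_name: "Bill", middle_name: null, nationality: null, year_of_birth: null },
];

// Hypothetical helper: drop the relational id and every NULL column,
// since a document simply omits fields it has no value for.
function toEmbeddedAuthor(author) {
  const embedded = {};
  for (const [key, value] of Object.entries(author)) {
    if (key !== "id" && value !== null) embedded[key] = value;
  }
  return embedded;
}

// Hypothetical helper: resolve the join table once and embed the authors.
function embedBook(book) {
  const authorIds = bookAuthors
    .filter((link) => link.book_id === book.id)
    .map((link) => link.author_id);
  return {
    title: book.title,
    author: authorIds.map((id) => toEmbeddedAuthor(authors.find((a) => a.id === id))),
  };
}

const scalaBook = embedBook(books[2]);
console.log(JSON.stringify(scalaBook, null, 2));
```

The join is paid once at write time; reading the book afterwards needs no second lookup.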
  4. Cursors

    > var c = db.test.find({x: 20}).skip(20).limit(10)
    > c.next()
    > c.next()
    ...

    $gt, $lt, $gte, $lte, $ne, $all, $in, $nin, $or, $not, $mod, $size, $exists, $type, $elemMatch

    query                 ->  first N results + cursor id
    getMore w/ cursor id  ->  next N results + cursor id or 0
    ...
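The query / getMore exchange can be imitated with a small in-memory stand-in. This is a sketch under stated assumptions: `FakeServer` and `fetchAll` are hypothetical names, and nothing here is the real MongoDB wire protocol. It only shows the shape of the loop, where a cursor id of 0 means the cursor is exhausted.

```javascript
// Hypothetical in-memory "server" that hands out results in batches.
class FakeServer {
  constructor(docs) {
    this.docs = docs;
    this.cursors = new Map(); // cursor id -> iteration state
    this.nextId = 1;
  }

  // Initial query: register a cursor and return the first batch.
  query(predicate, batchSize) {
    const id = this.nextId++;
    this.cursors.set(id, { matches: this.docs.filter(predicate), pos: 0, batchSize });
    return this.getMore(id);
  }

  // Follow-up request: next batch plus the cursor id, or 0 when nothing is left.
  getMore(cursorId) {
    const cur = this.cursors.get(cursorId);
    if (!cur) throw new Error("unknown cursor id " + cursorId);
    const batch = cur.matches.slice(cur.pos, cur.pos + cur.batchSize);
    cur.pos += batch.length;
    const exhausted = cur.pos >= cur.matches.length;
    if (exhausted) this.cursors.delete(cursorId);
    return { batch, cursorId: exhausted ? 0 : cursorId };
  }
}

// Client side: keep calling getMore until the server answers with cursor id 0.
function fetchAll(server, predicate, batchSize) {
  const results = [];
  let reply = server.query(predicate, batchSize);
  results.push(...reply.batch);
  while (reply.cursorId !== 0) {
    reply = server.getMore(reply.cursorId);
    results.push(...reply.batch);
  }
  return results;
}

const docs = Array.from({ length: 25 }, (_, i) => ({ n: i, x: i % 2 === 0 ? 20 : 5 }));
const server = new FakeServer(docs);
const found = fetchAll(server, (d) => d.x === 20, 4);
console.log(found.length);
```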
  5. Creating Indexes

    An index on _id is automatic. For more use ensureIndex:

        db.blogs.ensureIndex({author: 1})

    1 = ascending
    -1 = descending
  6. Compound Indexes

        db.blogs.save({
          author: "James",
          ts: new Date()
          ...
        });

        db.blogs.ensureIndex({author: 1, ts: -1})
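The key pattern `{author: 1, ts: -1}` describes an ordering: author ascending, then newest timestamp first within each author. A plain JavaScript comparator can show that ordering. The sample posts and the name `compareByIndex` are made up for illustration; a real index is a B-tree maintained by the server, not an array sort.

```javascript
// Comparator mirroring the key pattern {author: 1, ts: -1}.
function compareByIndex(a, b) {
  if (a.author < b.author) return -1; // author ascending (1)
  if (a.author > b.author) return 1;
  return b.ts - a.ts; // ts descending (-1); Dates coerce to milliseconds
}

// Hypothetical sample data.
const posts = [
  { author: "James", ts: new Date("2012-03-01") },
  { author: "Anna", ts: new Date("2012-03-10") },
  { author: "James", ts: new Date("2012-03-29") },
  { author: "Anna", ts: new Date("2012-03-20") },
];

const ordered = posts.slice().sort(compareByIndex);
for (const p of ordered) {
  console.log(p.author, p.ts.toISOString().slice(0, 10));
}
```

A query for one author's most recent posts can read a contiguous run of such an ordering from its start, which is why the field order and direction in a compound index matter.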
  7. Indexing Embedded Documents

        db.blogs.save({
          title: "My First blog",
          stats : { views: 0, followers: 0 }
        });

        db.blogs.ensureIndex({"stats.followers": -1})

        db.blogs.find({"stats.followers": {$gt: 500}})
  8. Four things to think about

    1. Machine Sizing: Disk and Memory
    2. Load Testing and Monitoring
    3. Backup and restore
    4. Ops Play Book
  9. [Diagram: Virtual Address Space 1 containing Collection 1 and Index 1]

    This is your virtual memory size (mapped)
  10. [Diagram: Virtual Address Space 1, Physical RAM, Collection 1, Index 1]

    This is your resident memory size
  11. [Diagram: Virtual Address Space 1, Virtual Address Space 2, Physical RAM, Disk, Collection 1, Index 1]
  12. [Diagram: Virtual Address Space 1, Physical RAM, Disk, Collection 1, Index 1]

    Physical RAM = 100 ns
    Disk = 10,000,000 ns
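The gap between 100 ns and 10,000,000 ns is easier to appreciate as an average. The sketch below is back-of-the-envelope arithmetic using only those two figures; the function name `averageAccessNs` and the hit ratios are illustrative assumptions, not measurements.

```javascript
const RAM_ACCESS_NS = 100;         // figure from the slide
const DISK_ACCESS_NS = 10000000;   // figure from the slide

// Expected cost of one access when a fraction `ramHitRatio` of accesses
// is served from RAM and the rest has to go to disk.
function averageAccessNs(ramHitRatio) {
  if (ramHitRatio < 0 || ramHitRatio > 1) {
    throw new RangeError("ramHitRatio must be between 0 and 1");
  }
  return ramHitRatio * RAM_ACCESS_NS + (1 - ramHitRatio) * DISK_ACCESS_NS;
}

for (const ratio of [1, 0.99, 0.9]) {
  console.log(ratio, Math.round(averageAccessNs(ratio)), "ns");
}
```

Even a 1% miss rate leaves the average dominated by disk, which is the argument for sizing RAM to hold the working set.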
  13. Sizing RAM and Disk

    • Working set
    • Document Size
    • Memory versus disk
    • Data lifecycle patterns
      • Long tail
      • pure random
      • bulk removes
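A first rough check on memory versus disk is to add up data and index sizes from a collection's `stats()` output and compare the total with available RAM. The sketch below uses the `test.wombats` figures from this deck's stats slide; `estimateFootprint`, `fitsInRam`, and the RAM sizes are hypothetical, and the result is an upper bound, since the real working set is only the part of data and indexes that is touched frequently.

```javascript
// Figures taken from the db.wombats.stats() output in this deck.
const wombatStats = {
  count: 1338330,
  size: 46915928,          // size of data, bytes
  storageSize: 86092032,   // size on disk (and in memory), bytes
  totalIndexSize: 99860480 // size of all indexes, bytes
};

// Hypothetical helper: two upper bounds on what may need to be in RAM.
function estimateFootprint(stats) {
  return {
    dataPlusIndexes: stats.size + stats.totalIndexSize,
    storagePlusIndexes: stats.storageSize + stats.totalIndexSize,
  };
}

// Hypothetical helper: does a footprint fit in a given amount of RAM?
function fitsInRam(bytes, ramBytes) {
  return bytes <= ramBytes;
}

const footprint = estimateFootprint(wombatStats);
const MIB = 1024 * 1024;
console.log((footprint.storagePlusIndexes / MIB).toFixed(1) + " MiB incl. indexes");
console.log("fits in 128 MiB:", fitsInRam(footprint.storagePlusIndexes, 128 * MIB));
console.log("fits in 1 GiB:", fitsInRam(footprint.storagePlusIndexes, 1024 * MIB));
```

Note that the indexes here are larger than the data itself, so leaving them out of the estimate would understate the requirement by more than half.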
  14. Figuring out working set

    > db.wombats.stats()
    {
      "ns" : "test.wombats",
      "count" : 1338330,
      "size" : 46915928,
      "avgObjSize" : 35.05557523181876,
      "storageSize" : 86092032,
      "numExtents" : 12,
      "nindexes" : 2,
      "lastExtentSize" : 20872960,
      "paddingFactor" : 1,
      "flags" : 0,
      "totalIndexSize" : 99860480,
      "indexSizes" : {
        "_id_" : 55877632,
        "name_1" : 43982848
      },

    size: Size of data
    storageSize: Size on disk (and in memory!)
    totalIndexSize: Size of all indexes
    avgObjSize: Average document size
    indexSizes: Size of each index
  15. Disk configurations

    Single Disk: ~200 seeks / second
    RAID 0: ~200 seeks / second per disk (three disks shown)
  16. Disk configurations

    Single Disk: ~200 seeks / second
    RAID 0: ~200 seeks / second per disk (three disks shown)
    RAID 10: ~400 seeks / second per mirrored pair (three pairs shown)
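The per-disk figures on these slides can be combined into a rough aggregate. The sketch below is idealized arithmetic for random reads only, assuming every disk delivers the slide's ~200 seeks per second and that load spreads evenly; the function names are hypothetical, and real EBS volumes will vary.

```javascript
const SEEKS_PER_DISK = 200; // approximate per-disk figure from the slides

// RAID 0: data is striped, so each disk contributes its own seeks.
function raid0ReadSeeks(disks) {
  return disks * SEEKS_PER_DISK;
}

// RAID 10: a stripe over mirrored pairs; either disk of a pair can serve
// a read, which matches the ~400 seeks / second per pair on the slide.
function raid10ReadSeeks(mirroredPairs) {
  return mirroredPairs * 2 * SEEKS_PER_DISK;
}

console.log("single disk:", raid0ReadSeeks(1));
console.log("RAID 0, 3 disks:", raid0ReadSeeks(3));
console.log("RAID 10, 3 pairs:", raid10ReadSeeks(3));
```

The RAID 10 layout uses twice as many disks as RAID 0 for the same capacity, trading cost for redundancy and read throughput.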
  17. Basic Tips

    • Favor instances with more memory over instances with more CPU cores
    • Use 64-bit instances
    • Use XFS or EXT4 file system
    • Use EBS in RAID. Use RAID 0 or 10 for the data volume, RAID 1 for configdb
  18. Basic Installation Steps

    1. Create your EC2 instance
    2. Attach EBS storage
    3. Make an ext4 file system
        $ sudo mkfs -t ext4 /dev/[connection to volume]
    4. Make a data directory
        $ sudo mkdir -p /data/db
    5. Mount the volume
        $ sudo mount /dev/sdf /data/db
    6. Install MongoDB
        $ curl http://[mongodb download site] > m.tgz
        $ tar xzf m.tgz
    7. Start MongoDB (the server binary is mongod, in the bin directory of the extracted archive)
        $ ./mongod
  19. Types of outage

    • Planned
      • Hardware upgrade
      • O/S or file-system tuning
      • Relocation of data to new file-system / storage
      • Software upgrade
    • Unplanned
      • Hardware failure
      • Data center failure
      • Region outage
      • Human error
      • Application corruption
  20. How MongoDB Replication works

    [Diagram: Member 1, Member 2, Member 3]

    • Set is made up of 2 or more nodes
  21. How MongoDB Replication works

    [Diagram: Member 1, Member 2 (PRIMARY), Member 3]

    • Election establishes the PRIMARY
    • Data replication from PRIMARY to SECONDARY
  22. How MongoDB Replication works

    [Diagram: Member 1, Member 2 (DOWN), Member 3; Members 1 and 3 negotiate new master]

    • PRIMARY may fail
    • Automatic election of new PRIMARY if majority exists
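The "if majority exists" condition is simple arithmetic and worth spelling out. The sketch below assumes every member has one vote and ignores priorities and arbiters; `canElectPrimary` is a hypothetical name, not a MongoDB API.

```javascript
// A new PRIMARY can be elected only if the members that can still see
// each other form a strict majority of the whole set.
function canElectPrimary(totalMembers, reachableMembers) {
  const majority = Math.floor(totalMembers / 2) + 1;
  return reachableMembers >= majority;
}

console.log("3 members, 1 down:", canElectPrimary(3, 2));
console.log("3 members, 2 down:", canElectPrimary(3, 1));
console.log("2 members, 1 down:", canElectPrimary(2, 1));
```

The last case shows why a two-member set cannot fail over: the survivor is exactly half of the set, never more than half.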
  23. How MongoDB Replication works

    [Diagram: Member 1, Member 2 (DOWN), Member 3 (PRIMARY)]

    • New PRIMARY elected
    • Replica set re-established
  24. How MongoDB Replication works

    [Diagram: Member 1, Member 2 (RECOVERING), Member 3 (PRIMARY)]

    • Automatic recovery
  25. How MongoDB Replication works

    [Diagram: Member 1, Member 2, Member 3 (PRIMARY)]

    • Replica set re-established
  26. Replica Set 0

    • Two nodes?
    • A network failure can cause the nodes to split, which will result in the whole system going read only
  27. Replica Set 1

    • Single datacenter
    • Single switch & power
    • Points of failure:
      • Power
      • Network
      • Datacenter
      • Two node failure
    • Automatic recovery of single node crash
  28. Replica Set 3

    [Diagram: one member in each of AZ:1, AZ:2, AZ:3]

    • Single datacenter
    • Multiple power/network zones
    • Points of failure:
      • Datacenter
      • Two node failure
    • Automatic recovery of single node crash
  29. Replica Set 4

    • Multi datacenter
    • DR node for safety
    • Can’t do a multi data center durable write safely, since there is only 1 node in the distant DC
  30. Replica Set 5

    • Three data centers
    • Can survive full data center loss
    • Can do w = { dc : 2 } to guarantee write in 2 data centers
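The rule behind `{ dc : 2 }` is that a write counts as safe once it has been acknowledged by members carrying at least two distinct values of the `dc` tag. The check below is a sketch of that rule only: `satisfiesTagMode`, the host names, and the tag values are hypothetical, and in a real deployment the server evaluates this from the replica set configuration.

```javascript
// Hypothetical replica set members, one per data center.
const members = [
  { host: "node-a", tags: { dc: "east" } },
  { host: "node-b", tags: { dc: "west" } },
  { host: "node-c", tags: { dc: "central" } },
];

// True when, for every tag in `mode`, the acknowledging members cover
// at least the required number of DISTINCT values of that tag.
function satisfiesTagMode(ackedMembers, mode) {
  return Object.entries(mode).every(([tag, required]) => {
    const distinct = new Set(
      ackedMembers.map((m) => m.tags[tag]).filter((v) => v !== undefined)
    );
    return distinct.size >= required;
  });
}

const mode = { dc: 2 };
console.log("primary only:", satisfiesTagMode([members[0]], mode));
console.log("two data centers:", satisfiesTagMode([members[0], members[1]], mode));
```

Counting distinct tag values rather than members is what separates this from a plain numeric write concern: two acknowledgements from the same data center would not satisfy it.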
  31. Sharding Across AZs

    [Diagram: shards spanning AZ:1, AZ:2, AZ:3]

    • Each Shard is made up of a Replica Set
    • Each Replica Set is distributed across availability zones for HA and data protection
  32. Balancing

    [Diagram: mongos, balancer, three config servers, four shards]

    Shard 1: chunks 1-12
    Shard 2: chunks 13-24
    Shard 3: chunks 25-36
    Shard 4: chunks 37-48

    Chunks!
  33. Balancing

    [Diagram: mongos, balancer, three config servers, four shards]

    Shard 1: chunks 1-12
    Shard 2: chunks 21-24
    Shard 3: chunks 33-36
    Shard 4: chunks 45-48

    Imbalance
  34. Balancing

    [Diagram: mongos, balancer, three config servers, four shards]

    Balancer: Move chunk 1 to Shard 2

    Shard 1: chunks 1-12
    Shard 2: chunks 21-24
    Shard 3: chunks 33-36
    Shard 4: chunks 45-48
  35. Balancing

    [Diagram: mongos, balancer, three config servers, four shards]

    Shard 1: chunks 2-12
    Shard 2: chunks 21-24
    Shard 3: chunks 33-36
    Shard 4: chunks 45-48
    Chunk 1: shown apart from Shard 1
  36. Balancing

    [Diagram: mongos, balancer, three config servers, four shards]

    Shard 1: chunks 2-12
    Shard 2: chunks 21-24
    Shard 3: chunks 33-36
    Shard 4: chunks 45-48
    Chunk 1: shown apart from Shard 1
  37. Balancing

    [Diagram: mongos, balancer, three config servers, four shards]

    Balancer: Chunk 1 now lives on Shard 2

    Shard 1: chunks 2-12
    Shard 2: chunks 1, 21-24
    Shard 3: chunks 33-36
    Shard 4: chunks 45-48
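The balancing sequence can be imitated with a toy loop that repeatedly moves one chunk from the fullest shard to the emptiest one. This is a simplification under stated assumptions: `balance` and the threshold of 2 are hypothetical, and the real balancer uses its own thresholds, migrates chunks through the config servers, and copies data before handing over ownership.

```javascript
// Chunk layout from the "Imbalance" slide.
const shards = {
  "Shard 1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
  "Shard 2": [21, 22, 23, 24],
  "Shard 3": [33, 34, 35, 36],
  "Shard 4": [45, 46, 47, 48],
};

// Move one chunk at a time from the fullest to the emptiest shard until
// their chunk counts differ by less than `threshold`. Mutates `shards`.
function balance(shards, threshold) {
  if (threshold < 2) throw new RangeError("threshold must be at least 2 to terminate");
  const moves = [];
  for (;;) {
    const names = Object.keys(shards);
    let from = names[0];
    let to = names[0];
    for (const name of names) {
      if (shards[name].length > shards[from].length) from = name;
      if (shards[name].length < shards[to].length) to = name;
    }
    if (shards[from].length - shards[to].length < threshold) break;
    const chunk = shards[from].shift(); // lowest-numbered chunk moves first
    shards[to].push(chunk);
    moves.push({ chunk, from, to });
  }
  return moves;
}

const moves = balance(shards, 2);
console.log(moves[0]);
console.log(Object.fromEntries(Object.entries(shards).map(([k, v]) => [k, v.length])));
```

With this layout the first migration is chunk 1 from Shard 1 to Shard 2, the same step the slides walk through.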
  38. Replica Set 3 backup

    1. Lock the “Backup” node:
        db.fsyncLock()
    2. Check that it is locked:
        db.currentOp()
    3. Take an EBS snapshot or run mongodump:
        ec2-create-snapshot -d mybackup vol-nn
    4. Unlock:
        db.fsyncUnlock()
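One risk in the lock / snapshot / unlock sequence is leaving the node locked if the snapshot step fails. A small wrapper with `try`/`finally` guards against that. This is a sketch: `backupWithLock` and the `takeSnapshot` callback are hypothetical, and the `fakeDb` object below only stands in for the shell's `db` so the example runs without a server. In the mongo shell the real `db` would be passed instead.

```javascript
// Lock the node, run the snapshot step, and always unlock afterwards,
// even when the snapshot step throws.
function backupWithLock(db, takeSnapshot) {
  db.fsyncLock();
  try {
    return takeSnapshot();
  } finally {
    db.fsyncUnlock();
  }
}

// Stand-in for the shell's `db` object; it only records the calls made.
const calls = [];
const fakeDb = {
  fsyncLock: () => calls.push("lock"),
  fsyncUnlock: () => calls.push("unlock"),
};

const result = backupWithLock(fakeDb, () => {
  calls.push("snapshot"); // e.g. trigger the EBS snapshot here
  return "snapshot requested";
});
console.log(calls.join(" -> "), "|", result);
```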
  39. Monitoring Tools

    • mongostat
    • MMS! - http://mms.10gen.com
    • munin, cacti, nagios - http://www.mongodb.org/display/DOCS/Monitoring+and+Diagnostics