2011-MongoDC-Storage.pdf

mongodb

July 12, 2011
Transcript

  1. Directory Layout

        -rw------- 1 erh admin  64M Jun 26 00:15 test.0
        -rw------- 1 erh admin 128M Jun 21 00:20 test.1
        -rw------- 1 erh admin 256M Jun 26 00:15 test.2
        -rw------- 1 erh admin 512M Jun 21 00:20 test.3
        -rw------- 1 erh admin 1.0G Jun 26 00:15 test.4
        -rw------- 1 erh admin 2.0G Jun 25 23:08 test.5
        -rw------- 1 erh admin  16M Jun 26 00:15 test.ns

    • Separate files per database
    • Aggressive preallocation
    • Always spare file
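The listing above suggests that each data file is twice the size of the previous one, starting at 64 MB and stopping at 2 GB. A minimal sketch of that pattern in plain JavaScript follows. The function names are made up for illustration, and both the starting size and the cap are read off the listing rather than taken from MongoDB's source.

```javascript
// Sizes read off the directory listing: test.0 = 64M, test.1 = 128M, ... test.5 = 2.0G.
const FIRST_FILE_MB = 64;
const MAX_FILE_MB = 2048; // assumption: growth stops at the largest size shown (2 GB)

// Hypothetical helper: size in MB of data file <db>.<fileNumber>.
function dataFileSizeMB(fileNumber) {
  if (!Number.isInteger(fileNumber) || fileNumber < 0) {
    throw new RangeError("fileNumber must be a non-negative integer");
  }
  return Math.min(FIRST_FILE_MB * 2 ** fileNumber, MAX_FILE_MB);
}

// Hypothetical helper: total MB preallocated for the first fileCount data files
// (the .ns file is not included).
function totalPreallocatedMB(fileCount) {
  let total = 0;
  for (let n = 0; n < fileCount; n++) {
    total += dataFileSizeMB(n);
  }
  return total;
}
```

For the six data files in the listing, `totalPreallocatedMB(6)` adds up 64 + 128 + 256 + 512 + 1024 + 2048.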
  2. Internal File Format

    • Files broken into extents
    • A collection has 1 to many extents
    • Grow exponentially up to 2 GB (max file size as well)
    • Indexes have different extents than data
  3. Sample Extents

        > db.foo.validate( { full : true } ).extents.forEach(
              function(z){ print( z.loc + "\t\t" + z.size ); } )
        0:3000          20480
        0:12000         81920
        0:26000         327680
        0:76000         1310720
        0:1da000        5242880
        0:76a000        6291456
        0:d6a000        7553024
        0:16de000       9064448
        0:1f83000       10878976
        0:29e3000       13058048
        1:2000          15671296
        1:ef4000        18808832
        1:29e4000       22573056
        1:3f6b000       27090944
        1:5941000       32509952
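To see how the extents in this output grow, the sketch below parses each location and computes the size ratio between consecutive extents. It is plain JavaScript, not shell output. Reading `loc` as "file number : hexadecimal byte offset" is an interpretation of the printed values, and the helper names are hypothetical.

```javascript
// Extent locations and sizes copied from the validate() output above.
const sampleExtents = [
  ["0:3000", 20480], ["0:12000", 81920], ["0:26000", 327680],
  ["0:76000", 1310720], ["0:1da000", 5242880], ["0:76a000", 6291456],
  ["0:d6a000", 7553024], ["0:16de000", 9064448], ["0:1f83000", 10878976],
  ["0:29e3000", 13058048], ["1:2000", 15671296], ["1:ef4000", 18808832],
  ["1:29e4000", 22573056], ["1:3f6b000", 27090944], ["1:5941000", 32509952],
];

// Assumption: loc is "<file number>:<offset in hex>".
function parseExtentLoc(loc) {
  const [file, offset] = loc.split(":");
  return { file: parseInt(file, 10), offset: parseInt(offset, 16) };
}

// Ratio of each extent's size to the size of the extent before it.
function extentGrowthRatios(extents) {
  const sizes = extents.map(([, size]) => size);
  return sizes.slice(1).map((size, i) => size / sizes[i]);
}

const extentRatios = extentGrowthRatios(sampleExtents);
```

In this sample the early extents quadruple in size, and from about 5 MB onward each extent is roughly 1.2 times the previous one. That is an observation about this one collection, not a statement of the allocator's rules.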
  4. Index Extents

        > db.system.namespaces.find()
        { "name" : "test2.foo" }
        { "name" : "test2.system.indexes" }
        { "name" : "test2.foo.$_id_" }
        > db["foo.$_id_"].validate( { full : true } ).extents.forEach(
              function(z){ print( z.loc + "\t\t" + z.size ); } )
        0:9000          36864
        0:1b6000        147456
        0:6da000        589824
        0:149e000       2359296
        1:20e4000       9437184
  5. Memory Mapped

    • All data files memory mapped
    • Virtual size = total data size + overhead
    • Journaled virtual size = ( total data size * 2 ) + overhead
    • fsync every 60 seconds (--syncdelay)
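The two virtual-size formulas above can be written as a small helper. This is only the slide's arithmetic in plain JavaScript. The function name is hypothetical, and "overhead" is left as an input because the slide does not quantify it.

```javascript
// Hypothetical helper implementing the two formulas from the slide:
//   virtual size           = total data size + overhead
//   journaled virtual size = (total data size * 2) + overhead
function estimateVirtualSizeBytes(totalDataBytes, overheadBytes, journaled) {
  const mappedBytes = journaled ? totalDataBytes * 2 : totalDataBytes;
  return mappedBytes + overheadBytes;
}
```

For example, with 1000 bytes of data files and 50 bytes of overhead the helper returns 1050 without journaling and 2050 with it.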
  6. Journalling

    • Write ahead log
    • Operations written to journal before memory mapped regions
    • Once journal written, data safe unless hardware problem
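The write-ahead idea can be illustrated with a toy in-memory model: every write is appended to a journal before it touches the data, so replaying the journal recovers writes that never reached the data. This is a sketch of the general technique only. The class, its fields, and the `crashBeforeApply` switch are invented for illustration and say nothing about MongoDB's actual journal format.

```javascript
// Toy write-ahead log. An array stands in for the on-disk journal and a Map
// stands in for the memory mapped data files.
class ToyJournaledStore {
  constructor(journal = []) {
    this.journal = journal;
    this.data = new Map();
  }

  // Log the operation first, then apply it to the data.
  // crashBeforeApply simulates a crash between the two steps.
  write(key, value, crashBeforeApply = false) {
    this.journal.push({ key, value });
    if (crashBeforeApply) {
      return;
    }
    this.data.set(key, value);
  }

  // Rebuild the data by replaying every journal entry in order.
  static recover(journal) {
    const store = new ToyJournaledStore(journal.slice());
    for (const entry of store.journal) {
      store.data.set(entry.key, entry.value);
    }
    return store;
  }
}
```

A write that "crashes" after the journal append is missing from `data`, but `ToyJournaledStore.recover(store.journal)` brings it back, which is the property the slide describes.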
  7. When is Data Written

    • Journal flushed every 100 ms or 100 MB written
    • j=true flag to force a journal flush
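The flush rule on this slide amounts to a three-way "or": enough time has passed, enough data has been written, or the caller asked for a flush. A minimal sketch of that predicate in plain JavaScript follows. The function and constant names are hypothetical, and treating 100 MB as 100 × 2^20 bytes is an assumption.

```javascript
const JOURNAL_FLUSH_INTERVAL_MS = 100;           // "every 100 ms"
const JOURNAL_FLUSH_BYTES = 100 * 1024 * 1024;   // "100 MB written" (assumed to mean MiB)

// Hypothetical predicate: should the journal be flushed now?
// forceFlush models a write issued with j=true.
function shouldFlushJournal(msSinceLastFlush, unflushedBytes, forceFlush = false) {
  return (
    forceFlush ||
    msSinceLastFlush >= JOURNAL_FLUSH_INTERVAL_MS ||
    unflushedBytes >= JOURNAL_FLUSH_BYTES
  );
}
```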
  8. Journal Admin

    • /journal sub directory in <dbpath> (/data/db)
    • 3 1 GB files that get rotated
    • Can symlink to a different spindle
  9. Performance

    • On 99.9% read systems, no impact
    • Write performance 5-30% slowdown on same drive
    • Using separate drive as low as 3%
  10. When to use

    • Single node - required for any data integrity
    • Replica Set - at least 1 node
    • All nodes for large data sets removes need for large resyncs
  11. Fragmentation

    • Files can get fragmented over time if documents change size
    • Need to improve free list
  12. update and moves

    • Updates can make documents bigger
    • Moves are more expensive than other operations
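Why a growing document forces a move can be shown with a toy record model: a record owns a fixed number of allocated bytes, an update that still fits is done in place, and one that does not fit needs a new, larger allocation. The record shape and the function name below are hypothetical and are not MongoDB's on-disk layout.

```javascript
// Toy model: a record is { allocatedBytes, docBytes }.
// Returns the record after the update plus a flag saying whether it had to move.
function applyToyUpdate(record, newDocBytes) {
  if (newDocBytes <= record.allocatedBytes) {
    // The new document fits in the existing allocation: update in place.
    return { allocatedBytes: record.allocatedBytes, docBytes: newDocBytes, moved: false };
  }
  // The document outgrew its slot: model a move to a new allocation.
  // The old slot would go back to a free list, which is where fragmentation comes from.
  return { allocatedBytes: newDocBytes, docBytes: newDocBytes, moved: true };
}
```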
  13. Download MongoDB

    http://www.mongodb.org and let us know what you think
    @eliothorowitz @mongodb
    10gen is hiring! http://www.10gen.com/jobs