Storage and Journaling - Eliot Horowitz, 10gen

mongodb
October 05, 2011

MongoBoston 2011

With the release of 1.8, MongoDB supports write-ahead journaling of operations to facilitate fast crash recovery and durability in the storage engine.... In this session, we'll give an overview of durability with MongoDB, demo journaling, and discuss journaling internals.

Transcript

  1. Directory Layout

    -rw-------  1 erh  admin   64M Jun 26 00:15 test.0
    -rw-------  1 erh  admin  128M Jun 21 00:20 test.1
    -rw-------  1 erh  admin  256M Jun 26 00:15 test.2
    -rw-------  1 erh  admin  512M Jun 21 00:20 test.3
    -rw-------  1 erh  admin  1.0G Jun 26 00:15 test.4
    -rw-------  1 erh  admin  2.0G Jun 25 23:08 test.5
    -rw-------  1 erh  admin   16M Jun 26 00:15 test.ns

    • Separate files per database
    • Aggressive preallocation
    • Always a spare file

    Wednesday, October 5, 2011
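The listing suggests a simple sizing rule. Each new data file is twice the size of the previous one, starting at 64 MB, until a cap of roughly 2 GB is reached. One file beyond the last used one is always kept ready. The JavaScript sketch below models that rule under a few assumptions:

- The exact cap is inferred from the listing, not taken from MongoDB internals.
- Treating test.5 as the spare is also an inference.
- The `dataFileSize` and `filesOnDisk` names are hypothetical.
- The separate test.ns namespace file is not modeled.

```javascript
// Hypothetical model of the preallocation pattern in the listing above:
// each data file doubles in size from 64 MB, capped at about 2 GB (assumed).
const MB = 1024 * 1024;
const FIRST_FILE_BYTES = 64 * MB;   // size of test.0 in the listing
const MAX_FILE_BYTES = 2048 * MB;   // "2.0G" in the listing; exact cap assumed

function dataFileSize(fileNumber) {
  // Math.pow avoids the 32-bit overflow that `<<` would hit for large files.
  return Math.min(FIRST_FILE_BYTES * Math.pow(2, fileNumber), MAX_FILE_BYTES);
}

// "Always a spare file": while file n holds data, file n + 1 already exists.
function filesOnDisk(highestFileInUse) {
  const files = [];
  for (let n = 0; n <= highestFileInUse + 1; n++) {
    files.push({ name: `test.${n}`, bytes: dataFileSize(n) });
  }
  return files;
}

// Assuming test.4 is the last file in use, this reproduces test.0 .. test.5.
for (const f of filesOnDisk(4)) {
  console.log(f.name, f.bytes / MB, "MB");
}
```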
  2. Internal File Format

    • Files are broken into extents
    • A collection has one or more extents
    • Extents grow exponentially, up to 2 GB (also the maximum file size)
    • Indexes use different extents than data
  3. Sample Extents

    > db.foo.validate( { full : true } ).extents.forEach(
          function(z){ print( z.loc + "\t\t" + z.size ); } )
    0:3000        20480
    0:12000       81920
    0:26000       327680
    0:76000       1310720
    0:1da000      5242880
    0:76a000      6291456
    0:d6a000      7553024
    0:16de000     9064448
    0:1f83000     10878976
    0:29e3000     13058048
    1:2000        15671296
    1:ef4000      18808832
    1:29e4000     22573056
    1:3f6b000     27090944
    1:5941000     32509952
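Reading the sizes off this output makes the growth pattern concrete. This small JavaScript sketch divides each extent size by the one before it; the `growthRatios` helper is hypothetical.

- The first four steps are exactly 4x.
- After the 5242880-byte extent, each new extent is about 1.2x its predecessor.

Why the factor changes at that point is not stated on the slide.

```javascript
// Data extent sizes (bytes) copied from the validate() output above, in order.
const dataExtentSizes = [
  20480, 81920, 327680, 1310720, 5242880, 6291456, 7553024, 9064448,
  10878976, 13058048, 15671296, 18808832, 22573056, 27090944, 32509952,
];

// Ratio between each extent and the one allocated just before it.
function growthRatios(sizes) {
  return sizes.slice(1).map((size, i) => size / sizes[i]);
}

console.log(growthRatios(dataExtentSizes).map(r => r.toFixed(3)).join(" "));
```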
  4. Index Extents

    > db.system.namespaces.find()
    { "name" : "test2.foo" }
    { "name" : "test2.system.indexes" }
    { "name" : "test2.foo.$_id_" }
    > db["foo.$_id_"].validate( { full : true } ).extents.forEach(
          function(z){ print( z.loc + "\t\t" + z.size ); } )
    0:9000        36864
    0:1b6000      147456
    0:6da000      589824
    0:149e000     2359296
    1:20e4000     9437184
  5. Memory Mapped

    • All data files are memory mapped
    • Virtual size = total data size + overhead
    • Journaled virtual size = (total data size * 2) + overhead
    • fsync every 60 seconds (--syncdelay)
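As a worked example of these rules of thumb, the JavaScript sketch below estimates virtual size for a hypothetical 40 GB of data files with an assumed 1 GB of overhead. The estimate is 41 GB without journaling and 81 GB with it. As we understand the 1.8/2.0 design, the doubling reflects journaling's second mapping of each data file: a private, copy-on-write view alongside the shared one. The function name and the overhead figure are illustrative, not measured.

```javascript
// Rough virtual-size estimate from the slide's rules of thumb.
// `overheadBytes` is an assumed, caller-supplied figure.
const GB = 1024 ** 3;

function estimatedVirtualSize(totalDataBytes, overheadBytes, journaled) {
  // With journaling, the data files are mapped twice, so count them twice.
  const mappedBytes = journaled ? totalDataBytes * 2 : totalDataBytes;
  return mappedBytes + overheadBytes;
}

const data = 40 * GB;     // hypothetical size of the data files
const overhead = 1 * GB;  // hypothetical overhead
console.log(estimatedVirtualSize(data, overhead, false) / GB); // 41
console.log(estimatedVirtualSize(data, overhead, true) / GB);  // 81
```

A large virtual size on a journaled node is therefore expected. By itself, it does not mean the process is using that much physical RAM.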
  6. Planned Changes

    • Split data and indexes into different files
    • Indexes could be symlinked to a different drive (SSD)
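The two validate() listings above show why splitting data and indexes is attractive: today they share the same data files. The JavaScript sketch below merges the data and index extents of test2.foo, sorts them by position, and looks for unused space between neighbors.

It assumes `loc` has the form `<file number>:<hex byte offset>`, which is consistent with the numbers. Every extent starts exactly where the previous one ends, with one exception: a single 4 KB gap at 0:8000. That gap is presumably an extent of some other namespace that these listings do not include. The helper names are hypothetical.

```javascript
// Extents copied from the two validate() outputs: [loc, size in bytes].
const dataExtents = [
  ["0:3000", 20480], ["0:12000", 81920], ["0:26000", 327680],
  ["0:76000", 1310720], ["0:1da000", 5242880], ["0:76a000", 6291456],
  ["0:d6a000", 7553024], ["0:16de000", 9064448], ["0:1f83000", 10878976],
  ["0:29e3000", 13058048], ["1:2000", 15671296], ["1:ef4000", 18808832],
  ["1:29e4000", 22573056], ["1:3f6b000", 27090944], ["1:5941000", 32509952],
];
const indexExtents = [
  ["0:9000", 36864], ["0:1b6000", 147456], ["0:6da000", 589824],
  ["0:149e000", 2359296], ["1:20e4000", 9437184],
];

// Assumed loc format: "<file number>:<byte offset in hex>".
function parseLoc(loc) {
  const [file, hexOffset] = loc.split(":");
  return { file: Number(file), offset: parseInt(hexOffset, 16) };
}

// Merge the lists, sort by (file, offset), and report unused space between
// consecutive extents in the same file.
function findGaps(...lists) {
  const extents = lists.flat()
    .map(([loc, size]) => ({ ...parseLoc(loc), size }))
    .sort((a, b) => a.file - b.file || a.offset - b.offset);
  const gaps = [];
  for (let i = 1; i < extents.length; i++) {
    const prev = extents[i - 1];
    const cur = extents[i];
    const prevEnd = prev.offset + prev.size;
    if (cur.file === prev.file && cur.offset > prevEnd) {
      gaps.push({ file: cur.file, offset: prevEnd, size: cur.offset - prevEnd });
    }
  }
  return gaps;
}

console.log(findGaps(dataExtents, indexExtents)); // only the gap at 0:8000
console.log(findGaps(dataExtents).length);        // holes where index extents sit
```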
  7. Journaling

    • Write-ahead log
    • Operations are written to the journal before the memory-mapped regions
    • Once the journal is written, data is safe unless there is a hardware problem
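The ordering on this slide is the classic write-ahead-log rule: record the operation durably first, then modify the data files. A crash between the two steps can then be repaired by replaying the log. The JavaScript toy below illustrates that rule, with in-memory objects standing in for the journal and the data files. The `ToyStore` class and its logical "set" entries are assumptions for illustration and say nothing about MongoDB's actual journal format.

```javascript
// Toy write-ahead log: a generic illustration of the ordering on the slide,
// not MongoDB's journal format. "Disk" is plain in-memory state.
class ToyStore {
  constructor() {
    this.journal = [];  // durable, append-only log of operations
    this.data = {};     // stands in for the memory-mapped data files
  }

  set(key, value) {
    this.journal.push({ op: "set", key, value }); // 1. journal first
    this.data[key] = value;                       // 2. then the data files
  }

  // Crash recovery: replay every journaled operation onto whatever state the
  // data files were left in. Replaying a "set" twice is harmless.
  static recover(journal, dataAfterCrash) {
    const store = new ToyStore();
    store.journal = journal.slice();
    store.data = { ...dataAfterCrash };
    for (const entry of store.journal) {
      if (entry.op === "set") store.data[entry.key] = entry.value;
    }
    return store;
  }
}

// Simulate a crash between step 1 and step 2 of the last write.
const s = new ToyStore();
s.set("a", 1);
s.journal.push({ op: "set", key: "b", value: 2 }); // journaled, never applied
const recovered = ToyStore.recover(s.journal, s.data);
console.log(recovered.data); // { a: 1, b: 2 }
```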
  8. When Is Data Written

    • The journal is flushed every 100 ms or after 100 MB are written
    • The j=true flag forces a journal flush
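This flush rule is a form of group commit. Writes accumulate and are flushed together once either a time limit or a size limit is hit, and a client asking for j=true triggers an immediate flush. The JavaScript sketch below models only that decision logic. `JournalFlusher`, its methods, and the injectable clock are hypothetical, and no file is actually written.

```javascript
// Flush policy from the slide: 100 ms or 100 MB, whichever comes first;
// j=true forces an immediate flush. Names here are hypothetical.
const FLUSH_INTERVAL_MS = 100;
const FLUSH_BYTES = 100 * 1024 * 1024;

class JournalFlusher {
  constructor(now = Date.now) {
    this.now = now;            // injectable clock, for testing
    this.pendingBytes = 0;
    this.lastFlush = now();
    this.flushCount = 0;
  }

  write(bytes, { j = false } = {}) {
    this.pendingBytes += bytes;
    if (j || this.pendingBytes >= FLUSH_BYTES) this.flush();
  }

  // Meant to be called periodically (e.g. from a timer) to enforce the time limit.
  tick() {
    if (this.pendingBytes > 0 && this.now() - this.lastFlush >= FLUSH_INTERVAL_MS) {
      this.flush();
    }
  }

  flush() {
    // A real implementation would write and sync the journal file here.
    this.pendingBytes = 0;
    this.lastFlush = this.now();
    this.flushCount++;
  }
}

// Drive the policy with a fake clock.
let t = 0;
const f = new JournalFlusher(() => t);
f.write(1024);                 // buffered
t = 50; f.tick();              // too early: still buffered
t = 100; f.tick();             // 100 ms elapsed: flush 1
f.write(200 * 1024 * 1024);    // over 100 MB: flush 2
f.write(10, { j: true });      // j=true: flush 3
console.log(f.flushCount);     // 3
```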
  9. Journal Admin

    • journal/ subdirectory in <dbpath> (default /data/db)
    • Three 1 GB files that get rotated
    • Can be symlinked to a different spindle
  10. Performance

    • On systems that are 99.9% reads: no impact
    • Writes: 5-30% slowdown with the journal on the same drive
    • With the journal on a separate drive: as little as 3%
  11. When to Use

    • Single node: required for any data integrity
    • Replica set: on at least one node
    • On all nodes for large data sets: removes the need for large resyncs
  12. Changes in 2.0

    • Writes to the journal happen outside the lock
    • The journal is compressed, so more fits in 3 GB and it is faster to write
    • On by default on 64-bit systems
  13. Fragmentation

    • Files can become fragmented over time if documents change size
    • The free list needs improvement
    • 2.0 reduced scanning to reasonable amounts
    • 2.2 will change the allocation strategy
    • The free list needs to be rewritten to allow online compaction
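Here is a toy free list in JavaScript showing how documents that change size leave holes behind. Freed records are grouped into size buckets, and allocation looks for a free record at least as large as the request. The search gives up after a fixed number of checks, which is one way of keeping scanning bounded. The bucket sizes, the `MAX_CHECKS` limit, and the `FreeList` class are assumptions for illustration, not MongoDB's allocator.

```javascript
// Toy bucketed free list. Bucket sizes and the scan limit are assumptions.
const BUCKET_SIZES = [32, 64, 128, 256, 512, 1024, 2048, 4096, Infinity];
const MAX_CHECKS = 30; // hypothetical cap on free records examined per request

class FreeList {
  constructor() {
    this.buckets = BUCKET_SIZES.map(() => []);
  }

  bucketFor(size) {
    return BUCKET_SIZES.findIndex(limit => size <= limit);
  }

  free(record) {
    this.buckets[this.bucketFor(record.size)].push(record);
  }

  // Returns a free record of at least `size` bytes, or null, meaning
  // "allocate fresh space instead".
  allocate(size) {
    let checks = 0;
    for (let b = this.bucketFor(size); b < this.buckets.length; b++) {
      const bucket = this.buckets[b];
      for (let i = 0; i < bucket.length; i++) {
        if (++checks > MAX_CHECKS) return null; // bounded scan
        if (bucket[i].size >= size) return bucket.splice(i, 1)[0];
      }
    }
    return null;
  }
}

const fl = new FreeList();
fl.free({ offset: 0, size: 100 });
fl.free({ offset: 500, size: 300 });
console.log(fl.allocate(200)); // reuses the 300-byte hole (this toy does not split it)
console.log(fl.allocate(200)); // null: the 100-byte hole is too small to reuse
```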
  14. Compacting

    • 1.8 and earlier: repairDatabase
    • 2.0+: compact command
    • Only needs 2 GB of extra space
    • Can be N times faster, where N = number of indexes
  15. Updates and Moves

    • Updates can make documents bigger
    • Moves are more expensive than other operations
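A rough JavaScript model of why moves cost more: a record has a fixed allocated size. An update that still fits is rewritten in place. One that grows past the allocated size must be written to a new location, and every index entry pointing at the old location has to be updated. The `ToyCollection` class and its counters are hypothetical simplifications. For instance, they ignore index changes caused by the updated values themselves.

```javascript
// Toy model: in-place update vs. move. Names and bookkeeping are hypothetical.
class ToyCollection {
  constructor(indexCount) {
    this.indexCount = indexCount;
    this.records = new Map(); // id -> { allocated, used }
    this.moves = 0;
    this.indexWrites = 0;
  }

  insert(id, size, paddingFactor = 1.0) {
    this.records.set(id, { allocated: Math.ceil(size * paddingFactor), used: size });
    this.indexWrites += this.indexCount; // one entry per index
  }

  update(id, newSize) {
    const rec = this.records.get(id);
    if (newSize <= rec.allocated) {
      rec.used = newSize; // fits: rewrite in place, index entries untouched
      return "in-place";
    }
    // Does not fit: new location, so every index entry must be repointed.
    this.records.set(id, { allocated: newSize, used: newSize });
    this.moves++;
    this.indexWrites += this.indexCount;
    return "moved";
  }
}

const c = new ToyCollection(3);
c.insert("doc1", 100, 1.5);          // 150 bytes allocated
console.log(c.update("doc1", 140));  // in-place
console.log(c.update("doc1", 200));  // moved
console.log(c.indexWrites);          // 6
```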
  16. Padding

    • Adaptive padding factor between 1.0 and 2.0
    • Manual control coming in 2.2
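Adaptive padding can be sketched as a per-collection factor that new records are allocated with, clamped to the 1.0 to 2.0 range from the slide. Moves nudge the factor up so future documents get more room, and in-place updates let it drift back down. The step sizes below (`GROW_STEP`, `SHRINK_STEP`) and the function names are assumptions chosen for illustration, not MongoDB's actual constants.

```javascript
// Adaptive padding sketch. Range is from the slide; step sizes are assumed.
const MIN_PADDING = 1.0;
const MAX_PADDING = 2.0;
const GROW_STEP = 0.6;    // hypothetical increase after a move
const SHRINK_STEP = 0.01; // hypothetical decrease after an in-place update

function adjustPadding(factor, moved) {
  const next = moved ? factor + GROW_STEP : factor - SHRINK_STEP;
  return Math.min(MAX_PADDING, Math.max(MIN_PADDING, next)); // clamp to range
}

// New records get size * paddingFactor bytes.
function allocationSize(docSize, factor) {
  return Math.ceil(docSize * factor);
}

let padding = 1.0;
padding = adjustPadding(padding, true);    // a move
padding = adjustPadding(padding, true);    // another move
console.log(padding);                      // 2 (clamped at the maximum)
console.log(allocationSize(100, padding)); // 200
```

Clamping is one design choice. An implementation could instead skip any adjustment that would leave the range.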
  17. Download MongoDB

    http://www.mongodb.org and let us know what you think
    @eliothorowitz  @mongodb
    10gen is hiring! http://www.10gen.com/jobs