Evaluation of Data Reliability on Linux File Systems

Yoshitake Kobayashi

December 18, 2009

Transcript

  1. Evaluation of Data Reliability on Linux File Systems

    Dec. 18, 2009
    Yoshitake Kobayashi
    Advanced Software Technology Group, Corporate Software Engineering Center
    TOSHIBA CORPORATION
    Copyright 2009, Toshiba Corporation.
  2. Motivation

    We want
    • NO data corruption
    • data consistency
    • GOOD performance

    We do NOT want
    • frequent data corruption
    • data inconsistency
    • BAD performance

    Enough evaluation? NO!
    Ext3, Ext4, XFS, JFS, ReiserFS, Btrfs, Nilfs2, ……
  3. Reliable file system requirement

    For data consistency
    • journaling
    • SYNC vs. ASYNC
      - SYNC is better

    Focus
    • available file systems on Linux
    • data writing
    • data consistency

    Metrics
    • logged progress = file size
    • estimated file contents = actual file contents
  4. Evaluation: Overview

    [Diagram: on the Target Host, writer processes (N procs) write to the target files with the write() system call; on the Log Host, a Logger collects their progress logs.]

    Writer process
    • writes to text files
    • sends progress log to logger
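
The overview above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool used in the evaluation: `ProgressLogger`, `writer`, and the `"<path> <bytes>"` message format are hypothetical, and the logger is called in-process here, whereas the deck runs it on a separate log host over a transport it does not describe.

```python
import os


class ProgressLogger:
    """Stand-in for the logger on the log host (hypothetical name)."""

    def __init__(self):
        self.progress = {}  # file name -> bytes reported so far

    def receive(self, message):
        # Assumed message format: b"<path> <bytes written so far>"
        name, size = message.decode().rsplit(" ", 1)
        self.progress[name] = int(size)


def writer(path, records, send):
    """Write records to a new text file, reporting progress after each write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    try:
        for record in records:
            written += os.write(fd, record)
            # Report only after write() has returned, so the log never
            # runs ahead of what the writer has handed to the kernel.
            send(f"{path} {written}".encode())
    finally:
        os.close(fd)
    return written
```

Passing `send` as a callable keeps the sketch independent of the transport; in a two-host setup it could be, for example, a function that forwards the message over a socket.
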
  6. Target Host

    Writer process
    • writes to text files
    • sends progress log to logger

    How to crash
    • modified reboot system call
      - forced to reboot
      - 10 seconds to reboot

    Test cases
    1. create: open with O_CREAT
    2. append: open with O_APPEND
    3. overwrite: open with O_RDWR
    4. write->close: open with O_APPEND and call close() after each write()
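
The four test cases differ only in how the target file is opened, which a short Python sketch can make concrete. Everything beyond the flag named on the slide is an assumption: `FLAGS`, `SYNC`, and `run_case` are hypothetical names, `O_WRONLY` is added because the slide gives no access mode for create and append, and `O_SYNC` is added because the deck focuses on SYNC writes. The append, overwrite, and write->close cases assume the file already exists.

```python
import os

# O_SYNC is an assumption (the deck evaluates SYNC writes); falls back to 0
# on platforms that do not define it.
SYNC = getattr(os, "O_SYNC", 0)

# Hypothetical mapping from the slide's test-case names to open(2) flags.
FLAGS = {
    "create": os.O_WRONLY | os.O_CREAT,
    "append": os.O_WRONLY | os.O_APPEND,
    "overwrite": os.O_RDWR,
    "write->close": os.O_WRONLY | os.O_APPEND,
}


def run_case(path, case, records):
    """Write all records using the given test case; return bytes written."""
    flags = FLAGS[case] | SYNC
    total = 0
    if case == "write->close":
        # Reopen and close the file around every single write().
        for record in records:
            fd = os.open(path, flags)
            try:
                total += os.write(fd, record)
            finally:
                os.close(fd)
        return total
    # All other cases keep one descriptor open for the whole run.
    fd = os.open(path, flags, 0o644)
    try:
        for record in records:
            total += os.write(fd, record)
    finally:
        os.close(fd)
    return total
```

For example, `run_case(path, "create", [b"AAAAA\n"])` followed by `run_case(path, "append", [b"BBBBB\n"])` leaves a 12-byte file, while the overwrite case starts writing at offset 0 of the existing file.
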
  7. Verification

    [Diagram: the Checker compares each target file with the LOG file, which gives the estimated file size. The example files hold records AAAAA, BBBBB, CCCCC, DDDDD, EEEEE, FFFFF and are marked OK or NG; the NG cases are labeled "size mismatch" and "data mismatch".]

    Verify the following metrics
    • file size
    • data contents
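
The two checks above can be sketched as a single Python function. The exact rules of the original checker are not recoverable from the slide, so this is an assumption-laden illustration: `verify` is a hypothetical name, a file shorter than the logged progress is treated as a size mismatch, a file whose bytes differ from the expected sequence is treated as a data mismatch, and a file that is ahead of the log (a write completed but its progress message was lost in the crash) is accepted.

```python
def verify(actual, logged_size, expected):
    """Classify one target file after the forced reboot.

    actual      -- bytes read back from the target file
    logged_size -- last progress value the logger received for this file
    expected    -- the byte sequence the writer intended to write, in order
    """
    # Size check: everything the log claims was written must be present.
    if len(actual) < logged_size:
        return "SIZE mismatch"
    # Content check: what is on disk must be a prefix of the intended data.
    if actual != expected[:len(actual)]:
        return "DATA mismatch"
    return "OK"
```

With `expected = b"AAAAA\nBBBBB\nCCCCC\n"` and a logged size of 12, the file `b"AAAAA\nBBBBB\n"` is classified OK, `b"AAAAA\n"` is a size mismatch, and `b"AAAAA\nAAAAA\n"` is a data mismatch.
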
  8. Environment

    Hardware
    • Host1
      - CPU: Celeron 2.2GHz, Mem 1GB
      - HDD: IDE 80GB (2MB cache)
    • Host2
      - CPU: Pentium4 2.8GHz, Mem 2GB
      - HDD: SATA 500GB (16MB cache)
  9. Environment

    Software
    • Kernel version
      - 2.6.18 (Host1 only)
      - 2.6.31.5
    • File system
      - ext3 (data=ordered or data=journal)
      - xfs (osyncisosync)
      - jfs
      - ext4 (data=ordered used on Host1, data=journal used on Host2)
    • I/O scheduler
      - kernel 2.6.18 tested with noop scheduler only
      - kernel 2.6.31.5 tested with all I/O schedulers
        - noop, cfq, deadline, anticipatory
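
On Linux kernels of this era, the I/O schedulers available for a block device are listed in `/sys/block/<dev>/queue/scheduler`, with the active one in square brackets, and the scheduler can be switched by writing a name to that file as root. The deck does not say how the schedulers were selected, so the following is only a sketch of reading that setting; `parse_scheduler` is a hypothetical helper and the sample line is illustrative.

```python
def parse_scheduler(text):
    """Return (active, available) from the contents of queue/scheduler."""
    active = None
    available = []
    for name in text.split():
        # The active scheduler is the one wrapped in square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


# Illustrative contents of /sys/block/<dev>/queue/scheduler
active, available = parse_scheduler("noop anticipatory deadline [cfq]\n")
print(active)     # cfq
print(available)  # ['noop', 'anticipatory', 'deadline', 'cfq']
```
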
  10. Summary: kernel-2.6.18 (IDE 80GB, 2MB cache)

    Number of samples: 1800
    Rate = F / (W * T)
    • Total number of mismatch: F
    • Number of writer procs: W
    • Number of trials: T

    | File System | SIZE mismatch Count | SIZE mismatch Rate[%] | DATA mismatch Count | DATA mismatch Rate[%] |
    |---|---|---|---|---|
    | EXT3-ORDERED | 4 | 0.22 | 0 | 0.00 |
    | EXT3-JOURNAL | 0 | 0.00 | 0 | 0.00 |
    | JFS | 9 | 0.50 | 1 | 0.06 |
    | XFS | 0 | 0.00 | 827 | 45.94 |

    [Bar chart: 2.6.18 (IDE 80GB, 2MB cache). Mismatch rate [%] (axis 0.00 to 2.00) for EXT3-ORDERED, EXT3-JOURNAL, JFS and XFS; series: SIZE mismatch Rate[%], DATA mismatch Rate[%].]
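
As a worked check of the rate formula, the reported percentages can be reproduced from the counts. The deck gives only the number of samples (1800), which matches the product W * T in the formula, so this sketch takes that product as a single argument; `mismatch_rate` is a hypothetical helper name.

```python
def mismatch_rate(mismatches, samples):
    """Rate[%] = 100 * F / (W * T), with samples standing for W * T."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return 100.0 * mismatches / samples


# Counts taken from the kernel-2.6.18 summary (1800 samples)
print(round(mismatch_rate(827, 1800), 2))  # XFS DATA mismatch: 45.94
print(round(mismatch_rate(9, 1800), 2))    # JFS SIZE mismatch: 0.5
print(round(mismatch_rate(4, 1800), 2))    # EXT3-ORDERED SIZE mismatch: 0.22
```
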
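  11. Focused on test case: kernel-2.6.18 (IDE 80GB)

    #samples: 450

    | File System | Test case | Size mismatch [%] | Data mismatch [%] |
    |---|---|---|---|
    | ext3(ordered) | create | 0 | 0 |
    | ext3(ordered) | append | 0 | 0 |
    | ext3(ordered) | overwrite | 0.89 | 0 |
    | ext3(ordered) | write->close | 0 | 0 |
    | ext3(journal) | create | 0 | 0 |
    | ext3(journal) | append | 0 | 0 |
    | ext3(journal) | overwrite | 0 | 0 |
    | ext3(journal) | write->close | 0 | 0 |
    | JFS | create | 2.00 | 0 |
    | JFS | append | 0 | 0 |
    | JFS | overwrite | 0 | 0.22 |
    | JFS | write->close | 0 | 0 |
    | XFS | create | 0 | 69.33 |
    | XFS | append | 0 | 58.22 |
    | XFS | overwrite | 0 | 0 |
    | XFS | write->close | 0 | 56.22 |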
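  12. Focused on write size: kernel-2.6.18 (IDE 80GB)

    #samples: 600

    | File System | Write size | Size mismatch [%] | Data mismatch [%] |
    |---|---|---|---|
    | ext3(ordered) | 256 | 0 | 0 |
    | ext3(ordered) | 4096 | 0 | 0 |
    | ext3(ordered) | 8192 | 0.67 | 0 |
    | ext3(journal) | 256 | 0 | 0 |
    | ext3(journal) | 4096 | 0 | 0 |
    | ext3(journal) | 8192 | 0 | 0 |
    | JFS | 128 | 0 | 0 |
    | JFS | 4096 | 0 | 0.17 |
    | JFS | 8192 | 1.5 | 0 |
    | XFS | 128 | 0 | 25.50 |
    | XFS | 4096 | 0 | 58.83 |
    | XFS | 8192 | 0 | 53.5 |

    The bigger write size, the more size mismatch ??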
  13. Summary: kernel-2.6.31.5 (IDE80GB, 2MB cache)

    Number of samples: 16000

    | File System | SIZE mismatch Count | SIZE mismatch Rate[%] | DATA mismatch Count | DATA mismatch Rate[%] |
    |---|---|---|---|---|
    | EXT3-ORDERED | 171 | 1.07 | 0 | 0 |
    | EXT3-JOURNAL | 25 | 0.16 | 0 | 0 |
    | EXT4-ORDERED | 17 | 0.11 | 0 | 0 |
    | JFS | 2 | 0.01 | 3104 | 19.40 |
    | XFS | 3 | 0.02 | 0 | 0 |

    [Bar chart: 2.6.31 (IDE80GB, 2MB cache). Mismatch rate [%] (axis 0.00 to 2.00) for EXT3-ORDERED, EXT3-JOURNAL, EXT4-ORDERED, JFS and XFS; series: SIZE mismatch Rate[%], DATA mismatch Rate[%].]
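  14. Focused on test case: kernel-2.6.31.5 (IDE 80GB)

    #samples: 4000

    | File System | Test case | Size mismatch [%] | Data mismatch [%] |
    |---|---|---|---|
    | ext3(ordered) | create | 1.20 | 0 |
    | ext3(ordered) | append | 0.70 | 0 |
    | ext3(ordered) | overwrite | 1.13 | 0 |
    | ext3(ordered) | write->close | 1.25 | 0 |
    | ext3(journal) | create | 0.45 | 0 |
    | ext3(journal) | append | 0 | 0 |
    | ext3(journal) | overwrite | 0 | 0 |
    | ext3(journal) | write->close | 0.18 | 0 |
    | ext4(ordered) | create | 0 | 0 |
    | ext4(ordered) | append | 0 | 0 |
    | ext4(ordered) | overwrite | 0.43 | 0 |
    | ext4(ordered) | write->close | 0 | 0 |
    | XFS | create | 0 | 0 |
    | XFS | append | 0 | 0 |
    | XFS | overwrite | 0.08 | 0 |
    | XFS | write->close | 0 | 0 |
    | JFS | create | 0 | 26.08 |
    | JFS | append | 0 | 25.58 |
    | JFS | overwrite | 0.05 | 0 |
    | JFS | write->close | 0 | 25.95 |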
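  15. Focused on I/O sched: kernel-2.6.31.5 (IDE 80GB)

    #samples: 4000

    | File System | I/O scheduler | Size mismatch [%] | Data mismatch [%] |
    |---|---|---|---|
    | ext3(ordered) | noop | 0.45 | 0 |
    | ext3(ordered) | deadline | 0.33 | 0 |
    | ext3(ordered) | cfq | 2.00 | 0 |
    | ext3(ordered) | anticipatory | 1.50 | 0 |
    | ext3(journal) | noop | 0 | 0 |
    | ext3(journal) | deadline | 0 | 0 |
    | ext3(journal) | cfq | 0.40 | 0 |
    | ext3(journal) | anticipatory | 0.23 | 0 |
    | ext4(ordered) | noop | 0 | 0 |
    | ext4(ordered) | deadline | 0 | 0 |
    | ext4(ordered) | cfq | 0 | 0 |
    | ext4(ordered) | anticipatory | 0.43 | 0 |
    | XFS | noop | 0.03 | 0 |
    | XFS | deadline | 0 | 0 |
    | XFS | cfq | 0.03 | 0 |
    | XFS | anticipatory | 0.03 | 0 |
    | JFS | noop | 0.05 | 0 |
    | JFS | deadline | 0 | 0.98 |
    | JFS | cfq | 0 | 52.78 |
    | JFS | anticipatory | 0 | 23.85 |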
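  17. Focused on write size: kernel-2.6.31.5 (IDE 80GB)

    #samples: 3200

    | File System | Write size | Size mismatch [%] | Data mismatch [%] |
    |---|---|---|---|
    | ext3(ordered) | 128 | 0 | 0 |
    | ext3(ordered) | 256 | 0 | 0 |
    | ext3(ordered) | 4096 | 0 | 0 |
    | ext3(ordered) | 8192 | 3.13 | 0 |
    | ext3(ordered) | 16384 | 2.22 | 0 |
    | ext3(journal) | 128 | 0 | 0 |
    | ext3(journal) | 256 | 0 | 0 |
    | ext3(journal) | 4096 | 0 | 0 |
    | ext3(journal) | 8192 | 0.16 | 0 |
    | ext3(journal) | 16384 | 0.63 | 0 |
    | ext4(ordered) | 128 | 0 | 0 |
    | ext4(ordered) | 256 | 0 | 0 |
    | ext4(ordered) | 4096 | 0 | 0 |
    | ext4(ordered) | 8192 | 0.25 | 0 |
    | ext4(ordered) | 16384 | 0.28 | 0 |
    | XFS | 128 | 0 | 0 |
    | XFS | 256 | 0 | 0 |
    | XFS | 4096 | 0 | 0 |
    | XFS | 8192 | 0 | 0 |
    | XFS | 16384 | 0.09 | 0 |
    | JFS | 128 | 0 | 20.06 |
    | JFS | 256 | 0 | 22.94 |
    | JFS | 4096 | 0.06 | 18.22 |
    | JFS | 8192 | 0 | 17.63 |
    | JFS | 16384 | 0 | 18.16 |

    The bigger write size, the more size mismatch ?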
  18. Summary: kernel-2.6.31 (SATA500GB, 16MB cache)

    Number of samples: 16000

    | File System | SIZE mismatch Count | SIZE mismatch Rate[%] | DATA mismatch Count | DATA mismatch Rate[%] |
    |---|---|---|---|---|
    | EXT3-ORDERED | 104 | 0.650 | 0 | 0.000 |
    | EXT3-JOURNAL | 1 | 0.006 | 0 | 0.000 |
    | EXT4-JOURNAL | 0 | 0.000 | 0 | 0.000 |
    | JFS | 28 | 0.175 | 2129 | 13.306 |
    | XFS | 3 | 0.019 | 0 | 0.000 |

    [Bar chart: 2.6.31 (SATA 500GB, 16MB cache). Mismatch rate [%] (axis 0.00 to 2.00) for EXT3-ORDERED, EXT3-JOURNAL, EXT4-JOURNAL, JFS and XFS; series: SIZE mismatch Rate[%], DATA mismatch Rate[%].]
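  19. Focused on test case: kernel-2.6.31.5 (SATA 500GB)

    #samples: 4000

    | File System | Test case | Size mismatch [%] | Data mismatch [%] |
    |---|---|---|---|
    | ext3(ordered) | create | 0.85 | 0 |
    | ext3(ordered) | append | 0.10 | 0 |
    | ext3(ordered) | overwrite | 0.23 | 0 |
    | ext3(ordered) | write->close | 1.43 | 0 |
    | ext3(journal) | create | 0 | 0 |
    | ext3(journal) | append | 0 | 0 |
    | ext3(journal) | overwrite | 0 | 0 |
    | ext3(journal) | write->close | 0.03 | 0 |
    | ext4(journal) | create | 0 | 0 |
    | ext4(journal) | append | 0 | 0 |
    | ext4(journal) | overwrite | 0 | 0 |
    | ext4(journal) | write->close | 0 | 0 |
    | XFS | create | 0 | 0 |
    | XFS | append | 0 | 0 |
    | XFS | overwrite | 0.08 | 0 |
    | XFS | write->close | 0 | 0 |
    | JFS | create | 0.23 | 17.9 |
    | JFS | append | 0.33 | 22.23 |
    | JFS | overwrite | 0.15 | 0 |
    | JFS | write->close | 0 | 13.10 |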
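  20. Focused on I/O sched: kernel-2.6.31.5 (SATA 500GB)

    #samples: 4000

    | File System | I/O scheduler | Size mismatch [%] | Data mismatch [%] |
    |---|---|---|---|
    | ext3(ordered) | noop | 0.63 | 0 |
    | ext3(ordered) | deadline | 0.90 | 0 |
    | ext3(ordered) | cfq | 0.88 | 0 |
    | ext3(ordered) | anticipatory | 0.20 | 0 |
    | ext3(journal) | noop | 0 | 0 |
    | ext3(journal) | deadline | 0 | 0 |
    | ext3(journal) | cfq | 0 | 0 |
    | ext3(journal) | anticipatory | 0.03 | 0 |
    | ext4(journal) | noop | 0 | 0 |
    | ext4(journal) | deadline | 0 | 0 |
    | ext4(journal) | cfq | 0 | 0 |
    | ext4(journal) | anticipatory | 0 | 0 |
    | XFS | noop | 0.03 | 0 |
    | XFS | deadline | 0.03 | 0 |
    | XFS | cfq | 0.03 | 0 |
    | XFS | anticipatory | 0 | 0 |
    | JFS | noop | 0.40 | 0.03 |
    | JFS | deadline | 0.28 | 0.38 |
    | JFS | cfq | 0 | 25.63 |
    | JFS | anticipatory | 0.03 | 27.20 |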
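  21. Focused on write size: kernel-2.6.31.5 (SATA 500GB)

    #samples: 3200

    | File System | Write size | Size mismatch [%] | Data mismatch [%] |
    |---|---|---|---|
    | ext3(ordered) | 128 | 0 | 0 |
    | ext3(ordered) | 256 | 0 | 0 |
    | ext3(ordered) | 4096 | 0 | 0 |
    | ext3(ordered) | 8192 | 1.69 | 0 |
    | ext3(ordered) | 16384 | 1.56 | 0 |
    | ext3(journal) | 128 | 0 | 0 |
    | ext3(journal) | 256 | 0 | 0 |
    | ext3(journal) | 4096 | 0 | 0 |
    | ext3(journal) | 8192 | 0 | 0 |
    | ext3(journal) | 16384 | 0.03 | 0 |
    | ext4(journal) | 128 | 0 | 0 |
    | ext4(journal) | 256 | 0 | 0 |
    | ext4(journal) | 4096 | 0 | 0 |
    | ext4(journal) | 8192 | 0 | 0 |
    | ext4(journal) | 16384 | 0 | 0 |
    | XFS | 128 | 0 | 0 |
    | XFS | 256 | 0 | 0 |
    | XFS | 4096 | 0 | 0 |
    | XFS | 8192 | 0 | 0 |
    | XFS | 16384 | 0.09 | 0 |
    | JFS | 128 | 0.66 | 13.44 |
    | JFS | 256 | 0 | 15.03 |
    | JFS | 4096 | 0 | 18.48 |
    | JFS | 8192 | 0 | 9.38 |
    | JFS | 16384 | 0.22 | 10.25 |

    The bigger write size, the more size mismatch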
  22. Try to evaluate other file systems…

    Evaluation failed
    • nilfs2
      - caused file system full
      - nilfs_cleanerd not fast enough
    • btrfs
      - caused kernel crash
      - recovery failure
  23. Conclusion

    Evaluation results show:
    • XFS and JFS data/size mismatch rates depend on kernel version
    • SYNC write mode is not safe enough in most cases
    • BEST result on EXT4 with journal mode
      - effects of write barriers?
    • GOOD results on XFS (only 2.6.31.5) and Ext3-journal
      - NOTE: Ext3 performance is much better than XFS (random write)

    Future work
    • evaluate other file systems