Slide 1

Slide 1 text

Evaluation of Data Reliability on Linux File Systems
Dec. 18, 2009
Yoshitake Kobayashi
Advanced Software Technology Group
Corporate Software Engineering Center
TOSHIBA CORPORATION
Copyright 2009, Toshiba Corporation.

Slide 2

Slide 2 text

Outline
• Motivation
• Evaluation
• Conclusion

Slide 3

Slide 3 text

Motivation
We want
• NO data corruption
• data consistency
• GOOD performance
We do NOT want
• frequent data corruption
• data inconsistency
• BAD performance
Enough evaluation? NO!
File systems: Ext3, Ext4, XFS, JFS, ReiserFS, Btrfs, Nilfs2, ...

Slide 4

Slide 4 text

Reliable file system requirement
For data consistency
• journaling
• SYNC vs. ASYNC
  - SYNC is better
Focus
• available file systems on Linux
• data writing
• data consistency
Metrics
• logged progress = file size
• estimated file contents = actual file contents
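As a rough illustration of the SYNC vs. ASYNC distinction, the sketch below opens a file with or without synchronous writes. The slides do not say how synchronous mode was enabled in the evaluation; the use of the O_SYNC open flag and the helper name open_for_write are assumptions made for this example only.

```python
import os

def open_for_write(path, sync):
    """Open path for appending and return a file descriptor.

    With sync=True the O_SYNC flag is set, so each write() returns only
    after the kernel reports the data as written to the device.
    Assumption: "SYNC" on the slide corresponds to O_SYNC.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if sync:
        flags |= os.O_SYNC
    return os.open(path, flags, 0o644)
```

Without O_SYNC, write() may return while the data is still only in the page cache, which is the window a crash test is designed to expose.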

Slide 5

Slide 5 text

Evaluation: Overview
[Figure: on the Target Host, N writer processes write to the target files with the write() system call; the Logger runs on the Log Host]
Writer process
• writes to text files
• sends progress log to logger
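The writer's two duties (write to the file, then report progress) can be sketched as follows. This is not the tool used in the evaluation: the function name write_and_report, the socket transport, and the "sequence number, byte count" message format are all hypothetical.

```python
import os
import socket

def write_and_report(fd, log_sock, record, seq):
    """Write one record to fd, then send a progress line to the logger.

    The "<seq> <bytes>\n" message format is an assumption for this sketch.
    Progress is reported only after write() has returned.
    """
    n = os.write(fd, record)
    log_sock.sendall(f"{seq} {n}\n".encode())
    return n
```

Because the log is kept on a separate host, it survives the forced reboot of the target host and can later be compared with what actually reached the disk.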

Slide 6

Slide 6 text

Target Host
Writer process
• writes to text files
• sends progress log to logger
How to crash
• modified reboot system call
  - forced to reboot
  - 10 seconds to reboot

Slide 7

Slide 7 text

Target Host
Writer process
• writes to text files
• sends progress log to logger
How to crash
• modified reboot system call
  - forced to reboot
  - 10 seconds to reboot
Test cases
1. create: open with O_CREAT
2. append: open with O_APPEND
3. overwrite: open with O_RDWR
4. write->close: open with O_APPEND and call close() on each write()
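The four test cases can be sketched as a mapping from test case name to open flags. This is a minimal sketch, not the evaluation's actual writer: the names TEST_CASE_FLAGS and run_writes are hypothetical, and combining each flag with O_WRONLY is an assumption. The POSIX spelling of the create flag is O_CREAT.

```python
import os

# Hypothetical mapping from the four test cases to open() flags.
TEST_CASE_FLAGS = {
    "create": os.O_WRONLY | os.O_CREAT,
    "append": os.O_WRONLY | os.O_APPEND,
    "overwrite": os.O_RDWR,
    "write->close": os.O_WRONLY | os.O_APPEND,
}

def run_writes(path, test_case, records):
    """Write each record to path using the flags for test_case.

    Returns the total number of bytes written. For every case except
    "create" the file must already exist.
    """
    flags = TEST_CASE_FLAGS[test_case]
    written = 0
    if test_case == "write->close":
        # Reopen and close the file around every single write().
        for rec in records:
            fd = os.open(path, flags)
            try:
                written += os.write(fd, rec)
            finally:
                os.close(fd)
        return written
    fd = os.open(path, flags, 0o644)
    try:
        for rec in records:
            written += os.write(fd, rec)
    finally:
        os.close(fd)
    return written
```

Note that "overwrite" starts writing at offset 0 of an existing file, so the file size does not grow unless the new data is longer than the old contents.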

Slide 8

Slide 8 text

Verification
[Figure: a Checker compares each Target file with the LOG file, record by record (AAAAA, BBBBB, CCCCC, ...). Examples in the figure are marked OK, NG (size mismatch), and NG (data mismatch). The log gives the estimated file size.]
Verify the following metrics
• file size
• data contents
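A checker of this kind can be sketched as below. The function name check_file and its return values are hypothetical. The sketch assumes strict equality between the logged progress and the file ("logged progress = file size"); the slides do not state how the original checker treats a file that is longer than the log predicts.

```python
def check_file(actual, records, logged_count):
    """Compare a target file's bytes with what the progress log predicts.

    actual:       bytes read back from the target file after the reboot
    records:      the byte strings the writer intended to write, in order
    logged_count: how many records the logger saw reported as written
    Returns "OK", "SIZE_MISMATCH", or "DATA_MISMATCH".
    """
    expected = b"".join(records[:logged_count])
    if len(actual) != len(expected):
        return "SIZE_MISMATCH"   # file size differs from the estimate
    if actual != expected:
        return "DATA_MISMATCH"   # same size, different contents
    return "OK"
```

The size check runs first, so a file is reported as a data mismatch only when its size already agrees with the estimate.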

Slide 9

Slide 9 text

Environment
Hardware
• Host1
  - CPU: Celeron 2.2GHz, Mem 1GB
  - HDD: IDE 80GB (2MB cache)
• Host2
  - CPU: Pentium4 2.8GHz, Mem 2GB
  - HDD: SATA 500GB (16MB cache)

Slide 10

Slide 10 text

Environment
Software
• Kernel version
  - 2.6.18 (Host1 only)
  - 2.6.31.5
• File system
  - ext3 (data=ordered or data=journal)
  - xfs (osyncisosync)
  - jfs
  - ext4 (data=ordered used on Host1, data=journal used on Host2)
• I/O scheduler
  - kernel 2.6.18 tested with noop scheduler only
  - kernel 2.6.31.5 tested with all I/O schedulers
    - noop, cfq, deadline, anticipatory

Slide 11

Slide 11 text

Summary: kernel-2.6.18 (IDE 80GB, 2MB cache)
Number of samples: 1800
Rate = F / (W * T)
• Total number of mismatch: F
• Number of writer procs: W
• Number of trials: T

File System     SIZE mismatch        DATA mismatch
                Count   Rate[%]      Count   Rate[%]
EXT3-ORDERED    4       0.22         0       0.00
EXT3-JOURNAL    0       0.00         0       0.00
JFS             9       0.50         1       0.06
XFS             0       0.00         827     45.94

[Chart: 2.6.18 (IDE 80GB, 2MB cache). Mismatch rate [%], axis 0.00 to 2.00, for EXT3-ORDERED, EXT3-JOURNAL, JFS, XFS; series: SIZE mismatch Rate[%], DATA mismatch Rate[%]]
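The rate definition on this slide can be checked against the table with a few lines of Python. The sketch assumes that the reported number of samples (1800) is the product W * T, since W and T are not given separately, and that the rate is expressed as a percentage; the function name mismatch_rate is hypothetical.

```python
def mismatch_rate(mismatches, samples):
    """Rate[%] = F / (W * T) * 100, with samples standing in for W * T."""
    return 100.0 * mismatches / samples

# Counts taken from the kernel-2.6.18 summary table (1800 samples).
print(f"{mismatch_rate(827, 1800):.2f}")  # XFS data mismatch  -> 45.94
print(f"{mismatch_rate(9, 1800):.2f}")    # JFS size mismatch  -> 0.50
print(f"{mismatch_rate(4, 1800):.2f}")    # EXT3-ORDERED size  -> 0.22
```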

Slide 12

Slide 12 text

Focused on test case: kernel-2.6.18 (IDE 80GB)
#samples: 450

File System     Test case      Size mismatch [%]   Data mismatch [%]
ext3(ordered)   create         0                   0
ext3(ordered)   append         0                   0
ext3(ordered)   overwrite      0.89                0
ext3(ordered)   write->close   0                   0
ext3(journal)   create         0                   0
ext3(journal)   append         0                   0
ext3(journal)   overwrite      0                   0
ext3(journal)   write->close   0                   0
JFS             create         2.00                0
JFS             append         0                   0
JFS             overwrite      0                   0.22
JFS             write->close   0                   0
XFS             create         0                   69.33
XFS             append         0                   58.22
XFS             overwrite      0                   0
XFS             write->close   0                   56.22

Slide 13

Slide 13 text

Focused on write size: kernel-2.6.18 (IDE 80GB)
#samples: 600

File System     Write size   Size mismatch [%]   Data mismatch [%]
ext3(ordered)   256          0                   0
ext3(ordered)   4096         0                   0
ext3(ordered)   8192         0.67                0
ext3(journal)   256          0                   0
ext3(journal)   4096         0                   0
ext3(journal)   8192         0                   0
JFS             128          0                   0
JFS             4096         0                   0.17
JFS             8192         1.5                 0
XFS             128          0                   25.50
XFS             4096         0                   58.83
XFS             8192         0                   53.5

The bigger the write size, the more size mismatches??

Slide 14

Slide 14 text

Summary: kernel-2.6.31.5 (IDE 80GB, 2MB cache)
Number of samples: 16000

File System     SIZE mismatch        DATA mismatch
                Count   Rate[%]      Count   Rate[%]
EXT3-ORDERED    171     1.07         0       0
EXT3-JOURNAL    25      0.16         0       0
EXT4-ORDERED    17      0.11         0       0
JFS             2       0.01         3104    19.40
XFS             3       0.02         0       0

[Chart: 2.6.31 (IDE 80GB, 2MB cache). Mismatch rate [%], axis 0.00 to 2.00, for EXT3-ORDERED, EXT3-JOURNAL, EXT4-ORDERED, JFS, XFS; series: SIZE mismatch Rate[%], DATA mismatch Rate[%]]

Slide 15

Slide 15 text

Focused on test case: kernel-2.6.31.5 (IDE 80GB)
#samples: 4000

File System     Test case      Size mismatch [%]   Data mismatch [%]
ext3(ordered)   create         1.20                0
ext3(ordered)   append         0.70                0
ext3(ordered)   overwrite      1.13                0
ext3(ordered)   write->close   1.25                0
ext3(journal)   create         0.45                0
ext3(journal)   append         0                   0
ext3(journal)   overwrite      0                   0
ext3(journal)   write->close   0.18                0
ext4(ordered)   create         0                   0
ext4(ordered)   append         0                   0
ext4(ordered)   overwrite      0.43                0
ext4(ordered)   write->close   0                   0
XFS             create         0                   0
XFS             append         0                   0
XFS             overwrite      0.08                0
XFS             write->close   0                   0
JFS             create         0                   26.08
JFS             append         0                   25.58
JFS             overwrite      0.05                0
JFS             write->close   0                   25.95

Slide 16

Slide 16 text

Focused on I/O scheduler: kernel-2.6.31.5 (IDE 80GB)
#samples: 4000

File System     I/O scheduler   Size mismatch [%]   Data mismatch [%]
ext3(ordered)   noop            0.45                0
ext3(ordered)   deadline        0.33                0
ext3(ordered)   cfq             2.00                0
ext3(ordered)   anticipatory    1.50                0
ext3(journal)   noop            0                   0
ext3(journal)   deadline        0                   0
ext3(journal)   cfq             0.40                0
ext3(journal)   anticipatory    0.23                0
ext4(ordered)   noop            0                   0
ext4(ordered)   deadline        0                   0
ext4(ordered)   cfq             0                   0
ext4(ordered)   anticipatory    0.43                0
XFS             noop            0.03                0
XFS             deadline        0                   0
XFS             cfq             0.03                0
XFS             anticipatory    0.03                0
JFS             noop            0.05                0
JFS             deadline        0                   0.98
JFS             cfq             0                   52.78
JFS             anticipatory    0                   23.85

Slide 17

Slide 17 text

Focused on write size: kernel-2.6.31.5 (IDE 80GB)
#samples: 3200

File System     Write size   Size mismatch [%]   Data mismatch [%]
ext3(ordered)   128          0                   0
ext3(ordered)   256          0                   0
ext3(ordered)   4096         0                   0
ext3(ordered)   8192         3.13                0
ext3(ordered)   16384        2.22                0
ext3(journal)   128          0                   0
ext3(journal)   256          0                   0
ext3(journal)   4096         0                   0
ext3(journal)   8192         0.16                0
ext3(journal)   16384        0.63                0
ext4(ordered)   128          0                   0
ext4(ordered)   256          0                   0
ext4(ordered)   4096         0                   0
ext4(ordered)   8192         0.25                0
ext4(ordered)   16384        0.28                0
XFS             128          0                   0
XFS             256          0                   0
XFS             4096         0                   0
XFS             8192         0                   0
XFS             16384        0.09                0
JFS             128          0                   20.06
JFS             256          0                   22.94
JFS             4096         0.06                18.22
JFS             8192         0                   17.63
JFS             16384        0                   18.16

Slide 18

Slide 18 text

Focused on write size: kernel-2.6.31.5 (IDE 80GB)
#samples: 3200
(Same table as Slide 17.)
The bigger the write size, the more size mismatches?

Slide 19

Slide 19 text

Summary: kernel-2.6.31 (SATA 500GB, 16MB cache)
Number of samples: 16000

File System     SIZE mismatch        DATA mismatch
                Count   Rate[%]      Count   Rate[%]
EXT3-ORDERED    104     0.650        0       0.000
EXT3-JOURNAL    1       0.006        0       0.000
EXT4-JOURNAL    0       0.000        0       0.000
JFS             28      0.175        2129    13.306
XFS             3       0.019        0       0.000

[Chart: 2.6.31 (SATA 500GB, 16MB cache). Mismatch rate [%], axis 0.00 to 2.00, for EXT3-ORDERED, EXT3-JOURNAL, EXT4-JOURNAL, JFS, XFS; series: SIZE mismatch Rate[%], DATA mismatch Rate[%]]

Slide 20

Slide 20 text

Focused on test case: kernel-2.6.31.5 (SATA 500GB)
#samples: 4000

File System     Test case      Size mismatch [%]   Data mismatch [%]
ext3(ordered)   create         0.85                0
ext3(ordered)   append         0.10                0
ext3(ordered)   overwrite      0.23                0
ext3(ordered)   write->close   1.43                0
ext3(journal)   create         0                   0
ext3(journal)   append         0                   0
ext3(journal)   overwrite      0                   0
ext3(journal)   write->close   0.03                0
ext4(journal)   create         0                   0
ext4(journal)   append         0                   0
ext4(journal)   overwrite      0                   0
ext4(journal)   write->close   0                   0
XFS             create         0                   0
XFS             append         0                   0
XFS             overwrite      0.08                0
XFS             write->close   0                   0
JFS             create         0.23                17.9
JFS             append         0.33                22.23
JFS             overwrite      0.15                0
JFS             write->close   0                   13.10

Slide 21

Slide 21 text

Focused on I/O scheduler: kernel-2.6.31.5 (SATA 500GB)
#samples: 4000

File System     I/O scheduler   Size mismatch [%]   Data mismatch [%]
ext3(ordered)   noop            0.63                0
ext3(ordered)   deadline        0.90                0
ext3(ordered)   cfq             0.88                0
ext3(ordered)   anticipatory    0.20                0
ext3(journal)   noop            0                   0
ext3(journal)   deadline        0                   0
ext3(journal)   cfq             0                   0
ext3(journal)   anticipatory    0.03                0
ext4(journal)   noop            0                   0
ext4(journal)   deadline        0                   0
ext4(journal)   cfq             0                   0
ext4(journal)   anticipatory    0                   0
XFS             noop            0.03                0
XFS             deadline        0.03                0
XFS             cfq             0.03                0
XFS             anticipatory    0                   0
JFS             noop            0.40                0.03
JFS             deadline        0.28                0.38
JFS             cfq             0                   25.63
JFS             anticipatory    0.03                27.20

Slide 22

Slide 22 text

Focused on write size: kernel-2.6.31.5 (SATA 500GB)
#samples: 3200

File System     Write size   Size mismatch [%]   Data mismatch [%]
ext3(ordered)   128          0                   0
ext3(ordered)   256          0                   0
ext3(ordered)   4096         0                   0
ext3(ordered)   8192         1.69                0
ext3(ordered)   16384        1.56                0
ext3(journal)   128          0                   0
ext3(journal)   256          0                   0
ext3(journal)   4096         0                   0
ext3(journal)   8192         0                   0
ext3(journal)   16384        0.03                0
ext4(journal)   128          0                   0
ext4(journal)   256          0                   0
ext4(journal)   4096         0                   0
ext4(journal)   8192         0                   0
ext4(journal)   16384        0                   0
XFS             128          0                   0
XFS             256          0                   0
XFS             4096         0                   0
XFS             8192         0                   0
XFS             16384        0.09                0
JFS             128          0.66                13.44
JFS             256          0                   15.03
JFS             4096         0                   18.48
JFS             8192         0                   9.38
JFS             16384        0.22                10.25

The bigger the write size, the more size mismatches

Slide 23

Slide 23 text

Try to evaluate other file systems...
Evaluation failed
• nilfs2
  - caused file system full
  - nilfs_cleanerd not fast enough
• btrfs
  - caused kernel crash
  - recovery failure

Slide 24

Slide 24 text

Conclusion
Evaluation result shows:
• XFS and JFS data/size mismatch rates depend on kernel version
• SYNC write mode is not safe enough in most cases
• BEST result on EXT4 with journal mode
  - effects of write barriers?
• GOOD results on XFS (only 2.6.31.5) and Ext3-journal
  - NOTE: Ext3 performance is much better than XFS (random write)
Future work
• evaluate other file systems
