the raw devices, to get the best performance.
• A read() corruption from a raw device is retried on another device (if the RAID level allows it), and the block is fixed on that raw device.

Raw Devices
• Perform the read/write requests received.
• Perform validation of the data block.

Each Raw Device has a set of “Priority Queues”
• Each queue has its own “elevator algorithm”
(Deadline, SSTF, FIFO, …)
• Each queue can be throttled to provide more fairness.
• The queues: Journal Writes, Recovery Reads, Async Reads, Async Writes.

struct r5l_vtable_dev_ioq {
  r5l_errno_t (*open)  (r5l_dev_ioq_t *self, va_list args);
  void        (*close) (r5l_dev_ioq_t *self);
  r5l_errno_t (*add)   (r5l_dev_ioq_t *self, r5l_dev_io_t *io);
  r5l_errno_t (*fetch) (r5l_dev_ioq_t *self, r5l_dev_io_batch_t *batch);
};

I/O Request flow:
Object → Request → r5l_dev_add_io() → r5l_dev_ioq_add() → Dev I/O Executor → r5l_dev_ioq_fetch() → read()/write() → Verify Block (re-request on failure) → Notify the Object

Root Device
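A minimal sketch of how a backend might implement the `add`/`fetch` half of `r5l_vtable_dev_ioq`, using the simplest elevator (FIFO). The concrete shapes of `r5l_errno_t`, `r5l_dev_io_t`, `r5l_dev_io_batch_t`, and the queue struct are assumptions for illustration; only the vtable itself comes from the slide.

```c
#include <stdarg.h>

/* Simplified stand-ins for the real types (assumed shapes). */
typedef int r5l_errno_t;                 /* 0 = OK */
typedef struct { int type; } r5l_dev_io_t;

#define R5L_BATCH_MAX 8
typedef struct {
    int count;
    r5l_dev_io_t *ios[R5L_BATCH_MAX];
} r5l_dev_io_batch_t;

#define R5L_QUEUE_MAX 64
typedef struct r5l_dev_ioq {
    const struct r5l_vtable_dev_ioq *vtable;
    r5l_dev_io_t *pending[R5L_QUEUE_MAX];
    int head, tail;                      /* ring-buffer indices */
} r5l_dev_ioq_t;

/* The vtable from the slide. */
struct r5l_vtable_dev_ioq {
    r5l_errno_t (*open)  (r5l_dev_ioq_t *self, va_list args);
    void        (*close) (r5l_dev_ioq_t *self);
    r5l_errno_t (*add)   (r5l_dev_ioq_t *self, r5l_dev_io_t *io);
    r5l_errno_t (*fetch) (r5l_dev_ioq_t *self, r5l_dev_io_batch_t *batch);
};

/* FIFO elevator: add appends to the ring, fetch drains up to one batch
 * in arrival order. A Deadline or SSTF elevator would differ only in
 * how fetch picks the next ios. */
static r5l_errno_t fifo_add(r5l_dev_ioq_t *q, r5l_dev_io_t *io)
{
    if ((q->tail + 1) % R5L_QUEUE_MAX == q->head)
        return -1;                       /* queue full */
    q->pending[q->tail] = io;
    q->tail = (q->tail + 1) % R5L_QUEUE_MAX;
    return 0;
}

static r5l_errno_t fifo_fetch(r5l_dev_ioq_t *q, r5l_dev_io_batch_t *batch)
{
    batch->count = 0;
    while (q->head != q->tail && batch->count < R5L_BATCH_MAX) {
        batch->ios[batch->count++] = q->pending[q->head];
        q->head = (q->head + 1) % R5L_QUEUE_MAX;
    }
    return 0;
}
```

Keeping the elevator behind a vtable is what lets each priority queue pick a different algorithm (Deadline, SSTF, FIFO, …) without the executor caring.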
(It’s a Group Device)
• Group Devices (Raid0, Raid1, …)
• Raw Devices (BlkDev, File, Socket, …)

struct r5l_dev_io_batch {
  uint64_t phy_offset;
  uint64_t phy_length;
  r5l_dev_io_t *ios[N];
};

struct r5l_dev_io {
  uint8_t type;
  uint8_t advice;
  uint8_t priority;
  r5l_dev_ptr_t ptr;
};

struct r5l_dev_ptr {
  uint64_t seqid;
  uint32_t flags;
  uint32_t section;
  uint32_t offset;
  uint32_t size;
  uint64_t dev_id;
  uint64_t cksum[4];
};
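The `cksum[4]` field in `r5l_dev_ptr` is what makes the “Verify Block” step possible: after a raw-device read, the executor can recompute the checksum and compare. A sketch under assumptions — the slide does not name the checksum algorithm, so a placeholder digest (FNV-1a folded into four words) stands in for it, and `r5l_block_verify` is a hypothetical helper name.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Struct copied from the slide. */
typedef struct r5l_dev_ptr {
    uint64_t seqid;
    uint32_t flags;
    uint32_t section;
    uint32_t offset;
    uint32_t size;
    uint64_t dev_id;
    uint64_t cksum[4];
} r5l_dev_ptr_t;

/* Placeholder digest: FNV-1a folded into four 64-bit words, each with a
 * different seed. The real checksum algorithm is not specified here. */
static void toy_digest(const void *data, size_t len, uint64_t out[4])
{
    const uint8_t *p = data;
    for (int w = 0; w < 4; w++) {
        uint64_t h = 1469598103934665603ULL + (uint64_t)w;
        for (size_t i = 0; i < len; i++) {
            h ^= p[i];
            h *= 1099511628211ULL;
        }
        out[w] = h;
    }
}

/* "Verify Block": recompute the checksum of the data just read and
 * compare with the pointer's stored cksum. A mismatch is the corruption
 * case that triggers a re-request on another device of the group. */
static int r5l_block_verify(const r5l_dev_ptr_t *ptr,
                            const void *data, size_t len)
{
    uint64_t got[4];
    toy_digest(data, len, got);
    return memcmp(got, ptr->cksum, sizeof got) == 0;
}
```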
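The `phy_offset`/`phy_length` pair in `r5l_dev_io_batch` suggests that a fetch can hand the executor one physically contiguous extent. A possible coalescing sketch, with heavily simplified stand-in types (the real `r5l_dev_io` carries a `r5l_dev_ptr`, not raw physical fields) and a hypothetical `batch_coalesce` helper:

```c
#include <stdint.h>

/* Simplified stand-ins: each io targets a physical extent, and a batch
 * covers one contiguous physical range (assumed field meanings). */
#define N 8
typedef struct { uint64_t phy_offset, phy_length; } r5l_dev_io_t;
typedef struct {
    uint64_t phy_offset;   /* start of the merged extent */
    uint64_t phy_length;   /* total contiguous length    */
    int      count;
    r5l_dev_io_t *ios[N];
} r5l_dev_io_batch_t;

/* Greedily fold physically contiguous ios into one batch so the executor
 * can issue a single large read()/write() instead of many small ones.
 * Returns the number of ios consumed. */
static int batch_coalesce(r5l_dev_io_t *ios, int n, r5l_dev_io_batch_t *b)
{
    if (n <= 0) return 0;
    b->phy_offset = ios[0].phy_offset;
    b->phy_length = ios[0].phy_length;
    b->count = 1;
    b->ios[0] = &ios[0];
    while (b->count < n && b->count < N &&
           ios[b->count].phy_offset == b->phy_offset + b->phy_length) {
        b->phy_length += ios[b->count].phy_length;
        b->ios[b->count] = &ios[b->count];
        b->count++;
    }
    return b->count;
}
```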