Google Summer of Code: Improving RAIDframe parity handling

June 21, 2009 posted by Jed Davis


To tolerate disk failures, a NetBSD system can use the software RAID driver raid(4). Currently, if that system is shut down uncleanly (e.g., it loses power or crashes), then when it comes back up it has to check the entire RAID set's redundancy information. This check can take many hours, during which it imposes a substantial load on the system. That is a distinct disadvantage to using NetBSD in server applications, and the inclusion of a journaling filesystem in NetBSD 5 makes it all the more prominent.

The goal of my Summer of Code project is to shorten that check from hours to minutes.

More Detail

The problem is due to a fundamental limitation of RAID: if the system is abruptly halted in the middle of writing to several independent disks, there is no way to tell afterwards which of those write operations were actually performed short of reading the actual data involved. This is especially problematic for RAID-5 and similar, where a mismatch between data and parity will result in garbling part of that data when (not if) it has to be reconstructed after a disk failure.
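To make the failure mode concrete, here is a toy illustration in Python of a single RAID-5 stripe with two data blocks and one XOR parity block. It is only a sketch: the block contents and the xor_blocks helper are invented for the example and have nothing to do with the RAIDframe code itself.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Bytewise XOR of equal-length blocks (the RAID-5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a three-disk set: two data blocks plus their parity.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# A crash interrupts an update of d0: the new data reaches its disk,
# but the matching parity write never happens.
d0 = b"CCCC"

# Later the disk holding d1 fails, and d1 is rebuilt from what is left.
rebuilt_d1 = xor_blocks(d0, parity)

print(rebuilt_d1 == d1)  # False: the stale parity garbled a block nobody wrote to
```

Note that the damaged block is d1, which was not even being written when the crash happened. Rewriting the parity after the crash, before any disk fails, is what prevents this.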

The approach taken in the RAIDframe codebase used by NetBSD is simple: while a RAID device is configured and in use, the entire device is marked as needing a parity rewrite, even though only a small part of it, or perhaps none at all, has write operations in progress at any given time.

My fix is to record a better approximation of which parts of the RAID are being written to at any given time. The catch is that the RAID driver now has to update that record as writes are done, rather than just once at boot and once at shutdown. If the approximation is too fine-grained, those updates will noticeably reduce performance all of the time, not just after unplanned reboots.
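The general idea can be sketched in a few lines of Python. This is a simplified model and not the prototype's actual algorithm: the ParityMap class, its method names, and the fixed-size regions are all assumptions made for illustration, and a region is cleared as soon as its last write finishes, which a real driver would probably delay to avoid rewriting the record constantly.

```python
class ParityMap:
    """Hypothetical sketch: one dirty flag per fixed-size region of the RAID set."""

    def __init__(self, total_sectors, region_sectors):
        self.region_sectors = region_sectors
        self.num_regions = -(-total_sectors // region_sectors)  # ceiling division
        self.in_flight = [0] * self.num_regions  # writes in progress per region
        self.dirty = [False] * self.num_regions  # stands in for the on-disk record
        self.record_updates = 0                  # times that record had to be rewritten

    def _regions(self, start, count):
        first = start // self.region_sectors
        last = (start + count - 1) // self.region_sectors
        return range(first, last + 1)

    def write_begin(self, start, count):
        # The region must be recorded as dirty before the data write is issued.
        for r in self._regions(start, count):
            if not self.dirty[r]:
                self.dirty[r] = True
                self.record_updates += 1
            self.in_flight[r] += 1

    def write_end(self, start, count):
        # Simplification: clear the flag as soon as the region is idle.
        for r in self._regions(start, count):
            self.in_flight[r] -= 1
            if self.in_flight[r] == 0:
                self.dirty[r] = False
                self.record_updates += 1

    def regions_to_check(self):
        """Regions whose parity must be rewritten after an unclean shutdown."""
        return [r for r, is_dirty in enumerate(self.dirty) if is_dirty]


pm = ParityMap(total_sectors=1_000_000, region_sectors=10_000)
pm.write_begin(25_000, 8)     # a write is in progress when the system crashes
print(pm.regions_to_check())  # [2]: one region out of 100 needs checking
```

The record_updates counter shows the cost side of the tradeoff: with smaller regions, the same workload touches more distinct regions and forces more updates to the record.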


At the time of this writing, I have a prototype implementation of just such a parity map; it's not fit for general consumption at this point, but the basics are working. Next up is more testing and performance evaluation, to refine the parity map algorithm. Eventually there will need to be an interface for the system administrator to configure this feature, and the code will need to correctly handle a RAID set being moved between kernels with and without parity map support, but these should be relatively straightforward and are scheduled for later in the summer.

Some additional minor improvements to the RAID system are also planned, time permitting, pertaining in particular to the handling of spare disks and reconstruction after a disk failure. In any case, by the end of the summer, a notable weakness of NetBSD will have been corrected.


