Google Summer of Code 2013 report: Defragmentation for FFS


October 11, 2013 posted by Thomas Klausner

The following report is by Manuel Wiesinger:

First of all, I would like to thank the NetBSD Foundation for enabling me to successfully complete this Google Summer of Code. It has been a very valuable experience for me.

My project is a defragmentation tool for FFS. I want to point out at the beginning that it is not ready for use yet.

What has been done:

Fragment analysis + reordering. When a file is smaller than or equal to the file system's fragment size, it is stored as a fragment. One can think of a fragment as a piece of a block. It can happen that there are many small files that each occupy a fragment. When the file system changes over time, many blocks can end up containing fewer fragments than they can hold. The optimization my tool does is to pack all these fragments into fewer blocks. This way the system may get a little more free space.
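The packing idea can be illustrated with a small sketch. This is not the code from the project; it is a first-fit-decreasing heuristic under the assumption that a block holds 8 fragments and that each file tail needs a given number of fragments inside a single block. The name `pack_fragments` and the parameter `frags_per_block` are made up for this example.

```python
def pack_fragments(tail_sizes, frags_per_block=8):
    """Pack file tails (sizes in fragments) into as few blocks as a
    first-fit-decreasing heuristic manages. Returns a list of blocks,
    each a list of the tail sizes stored in it."""
    blocks = []
    for size in sorted(tail_sizes, reverse=True):
        if not 0 < size < frags_per_block:
            raise ValueError("a tail must be smaller than a full block")
        for block in blocks:
            # Reuse the first block that still has enough free fragments.
            if sum(block) + size <= frags_per_block:
                block.append(size)
                break
        else:
            # No existing block has room, so open a new one.
            blocks.append([size])
    return blocks


if __name__ == "__main__":
    # Seven small tails that might each sit in a separate block.
    print(pack_fragments([1, 1, 1, 1, 2, 3, 7]))
```

In this toy case seven partially filled blocks shrink to two, which is where the "little more free space" would come from. A real tool also has to update the inodes that point at the moved fragments.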

Directory optimization. When a directory entry gets deleted, the space for that entry and its name is appended to the previous entry. This can be imagined like a linked list. My tool reads that list and writes all entries sequentially.
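A rough model of this, for illustration only: each entry carries a record length that may include slack inherited from deleted neighbours, and compaction rewrites every entry with the smallest record length that fits its name. The 8-byte header and the 4-byte rounding are assumptions of this sketch, and the rule that entries must not cross directory block boundaries is ignored. `Entry`, `min_reclen`, `slack` and `compact` are hypothetical names.

```python
from dataclasses import dataclass

HEADER_BYTES = 8  # assumed size of the fixed part of an entry


@dataclass
class Entry:
    ino: int      # inode number
    reclen: int   # bytes this record occupies, including any slack
    name: str


def min_reclen(name):
    """Smallest record length for a name: header plus the
    NUL-terminated name, rounded up to a multiple of 4."""
    return HEADER_BYTES + (len(name.encode()) + 1 + 3) // 4 * 4


def slack(entries):
    """Bytes held by entries beyond what their names need."""
    return sum(e.reclen - min_reclen(e.name) for e in entries)


def compact(entries):
    """Return the entries in order, each with its minimal length."""
    return [Entry(e.ino, min_reclen(e.name), e.name) for e in entries]


if __name__ == "__main__":
    directory = [
        Entry(2, 12, "."),
        Entry(2, 12, ".."),
        Entry(10, 40, "a.txt"),  # absorbed the space of a deleted entry
        Entry(11, 16, "b.txt"),
    ]
    print(slack(directory), sum(e.reclen for e in compact(directory)))
```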

Non-contiguous files analysis + reordering strategy. This is what most other operating systems call defragmentation - a reordering of blocks, so that blocks belonging to the same file or directory can be read sequentially.
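The analysis step can be pictured as counting contiguous runs in a file's block list. The following is a minimal sketch, not the tool's actual analysis; it assumes the physical block numbers are already available in logical order, and the function name `extents` is invented for this example.

```python
def extents(block_numbers):
    """Group physical block numbers (given in logical file order)
    into contiguous runs, returned as (first_block, length) tuples."""
    runs = []
    for block in block_numbers:
        if runs and block == runs[-1][0] + runs[-1][1]:
            # The block directly follows the current run: extend it.
            runs[-1][1] += 1
        else:
            # A jump on disk starts a new run.
            runs.append([block, 1])
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    print(extents([100, 101, 102, 200, 201, 50]))
```

A file with a single run is fully contiguous; the more runs, the more seeks a sequential read needs. Because FFS may leave gaps on purpose, a raw run count like this one can overstate fragmentation.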

What did not work as expected

Testing: I thought it would be most productive and stable to work with unit tests, in strictly test-driven development. Playing around with rump/atf was not really effective, though. Instead, I started each new implementation step by generating a file system in a state where it can be optimized. So I wrote the scripts and checked whether they did what I intended (of course, they did not always).

I'm a bit disappointed by the amount of code. But as I said before, the hardest part is to figure out how things work. Relative to its size, the code does quite a lot; I expected to need more lines of code to get where I am now.

Before applying for this project, I took a close look at UFS. But it was not close enough. There were many surprises. For example, I had no idea that there are gaps in files on purpose, to exploit the rotation of hard disks.

Time management: everything took longer than I expected, mostly because it was really hard to figure out how things work. The lack of documentation is a huge problem too.

Things I learned

A huge lesson in software engineering: things always turn out differently than expected if you do not have a lot of experience.

I feel more confident reading and patching kernel code. My previous experience was not as in-depth (e.g., I worked with Pintos). The (mental) barrier of kernel/system programming is gone. For example, I now see a chance to take a look at ACPI and see if I can write a patch to get suspend working on my notebook.

I got more contact with the NetBSD community, and got a nice overview of how things work. The BSD community here is very mixed. There are not many NetBSD people.

CVS is better than most of my friends say.

I learned about pkgsrc, UVM, and other smaller things about NetBSD too, but those are not worth mentioning in detail.

How I intend to continue:

After a sanity break from the whole project, there are several possibilities.

In the next few days I will ask a supervisor at my university whether I can continue the project as a project thesis (I still need to do one). It may even include online defragmentation, based on snapshots. That's my preferred option.

I definitely want to finish this project, since I spent so much time and effort. It would be a shame otherwise.

What there is to do technically

Once the defragmentation works given enough space to move a file, I want to find a way to defragment a file even when there is too little space. This can be achieved by simply moving blocks piece by piece and using the file's own space as 'free space'.
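To make the piece-by-piece idea concrete, here is a toy model, not the planned implementation. The disk is a Python list whose cells are either `None` (free) or a tuple `(owner, logical_index)`. The function `defragment_in_place` and this representation are assumptions of the sketch, and crash safety is ignored entirely.

```python
def defragment_in_place(disk, file_id, start):
    """Move the blocks of file_id to disk[start], disk[start+1], ...
    in logical order, one block at a time. Needs at least one free
    cell whenever a target cell is occupied. Returns the move count."""
    n = sum(1 for cell in disk if cell is not None and cell[0] == file_id)
    if start < 0 or start + n > len(disk):
        raise ValueError("target range does not fit on the disk")
    moves = 0
    for i in range(n):
        target = start + i
        wanted = (file_id, i)
        if disk[target] == wanted:
            continue  # this block is already in place
        src = disk.index(wanted)
        if disk[target] is not None:
            # Evict the current occupant into a free cell first.
            free = disk.index(None)  # ValueError if the disk is full
            disk[free] = disk[target]
            disk[target] = None
            moves += 1
        # Move the wanted block; its old cell becomes the new free
        # space, so the file's own space serves as scratch room.
        disk[target] = wanted
        disk[src] = None
        moves += 1
    return moves


if __name__ == "__main__":
    disk = [("A", 1), ("B", 0), None, ("A", 0), ("B", 1), ("A", 2)]
    print(defragment_in_place(disk, "A", 0), disk)
```

The number of free cells never changes, so a single free block is enough in this model, at the price of extra moves.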

Online defragmentation. I already skimmed how snapshots work. It should be possible.

Improve the tests.

It should be easy to get this compiling on older releases. Currently it compiles only on -current.

In the long run, it is probably worth porting the tests to atf/rump.

Conclusion

I will continue to work, and definitely continue to use and play with NetBSD! :)

It's a stupid thing that a defrag tool is worth little (nothing?) on SSDs. But since NetBSD is designed to run on (almost) any hardware, this does not bug me a lot.

Thank you, NetBSD! Although it was a lot of hard work, it was a lot of fun!

Manuel Wiesinger

Comments:

a seamless on-the-fly defragger!?

Posted by heraux on November 01, 2013 at 08:47 PM UTC #
