Perfectionism and pthreads

I’ve been putting the “finishing touches” on my ext2/3 fsck readahead patches for about, oh, 4 weeks now. I finished the proof of concept around 3 or 4 months ago and started over on a nice clean version that has a chance of being merged. That version is up, working, and finishing in about 50% of the time of vanilla fsck on my test file system (running on a big fancy RAID – there’s almost no improvement on a single disk). Certainly, I needed to go back through and get rid of the debugging stuff, search for “XXX – check for failure”, and do a final code review. But that only takes a week, maybe. What I’ve been doing since then is attempting to make fsck no less stable with readahead than without. It’s far more important for fsck to work than for it to work quickly. My ultimate goal is that if the readahead thread has a random segfault, it will clean up after itself perfectly and allow the main fsck thread to continue and finish as though nothing had happened.

This turns out to be overkill (ha ha) and incredibly annoying to do with pthreads. In the worst case, the readahead thread will need to deallocate memory and possibly (but not always) unlock a mutex. After several hours of reading man pages, the solutions for these problems appear to be:

  • Every time you lock a mutex, pthread_cleanup_push/pop a handler that will unlock the mutex.
  • Use the incredibly clumsy and painful pthread key interface to create a faux data-associated cleanup function that frees allocated resources on thread exit.
  • Catch SIGSEGV and, um, a bunch of other signals and, um, figure out if it came from the readahead thread dying and um…
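For the curious, the first bullet looks roughly like the sketch below. This is not the code from my patches: `queue_lock`, `readahead_worker`, and `cancel_demo` are names made up for illustration, and the handlers only run when the thread is cancelled or calls `pthread_exit()`, not when it takes a SIGSEGV.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical lock shared by the readahead and main threads. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int buffers_freed = 0;

/* Cleanup handler: release the mutex if the thread dies while holding it. */
static void unlock_handler(void *arg)
{
    int rc = pthread_mutex_unlock((pthread_mutex_t *)arg);
    assert(rc == 0);
    (void)rc;
}

/* Cleanup handler: free memory owned by the thread. */
static void free_handler(void *arg)
{
    free(arg);
    buffers_freed++;
}

static void *readahead_worker(void *unused)
{
    (void)unused;
    char *buf = malloc(4096);
    if (buf == NULL)
        return NULL;
    pthread_cleanup_push(free_handler, buf);
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        pthread_cleanup_push(unlock_handler, &queue_lock);
        /* Explicit cancellation point while the lock is held. */
        pthread_testcancel();
        pthread_cleanup_pop(1);   /* pop and run: unlocks the mutex */
    }
    /* Never reached, but push and pop must pair up lexically. */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Returns 0 if cancelling the worker left the lock free and the buffer freed. */
int cancel_demo(void)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, readahead_worker, NULL) != 0)
        return -1;
    pthread_cancel(tid);
    pthread_join(tid, &result);
    if (result != PTHREAD_CANCELED)
        return -1;
    if (pthread_mutex_trylock(&queue_lock) != 0)
        return -1;                /* lock was leaked */
    pthread_mutex_unlock(&queue_lock);
    return buffers_freed == 1 ? 0 : -1;
}
```

Note that every lock site needs its own push/pop pair in the same lexical scope, which is a large part of why this gets annoying fast.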


So I’ve decided that if readahead breaks, then you should rerun fsck without readahead enabled. The readahead thread won’t run at all unless it is explicitly turned on, so it’s not like this will be unfathomable to the user. I’m still making the readahead/main thread interaction as robust as possible, using an explicit test/exit system in exactly the same way as the main fsck thread. The readahead thread shuts down if it detects a bug and the main thread can kill the readahead thread if it no longer wants readahead to keep going.
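A minimal sketch of what I mean by an explicit test/exit system, assuming a shared control structure guarded by a mutex. The names (`struct ra_ctl`, `ra_should_stop`, `ra_request_stop`, `ra_demo`) and the simulated bug are hypothetical and not taken from the actual patches.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical control block shared by the main and readahead threads. */
struct ra_ctl {
    pthread_mutex_t lock;
    bool stop;          /* set by either thread to end readahead */
    int  error;         /* nonzero if readahead detected a bug */
    long blocks_done;
    long bug_at;        /* block at which to simulate a bug, -1 for none */
};

static bool ra_should_stop(struct ra_ctl *c)
{
    pthread_mutex_lock(&c->lock);
    bool s = c->stop;
    pthread_mutex_unlock(&c->lock);
    return s;
}

static void ra_request_stop(struct ra_ctl *c, int error)
{
    pthread_mutex_lock(&c->lock);
    c->stop = true;
    if (error)
        c->error = error;   /* never overwrite a recorded error with 0 */
    pthread_mutex_unlock(&c->lock);
}

static void *ra_thread(void *arg)
{
    struct ra_ctl *c = arg;

    /* Test the stop flag before every unit of work. */
    for (long blk = 0; !ra_should_stop(c); blk++) {
        if (blk == c->bug_at) {         /* simulated internal inconsistency */
            ra_request_stop(c, 1);
            break;
        }
        pthread_mutex_lock(&c->lock);
        c->blocks_done++;               /* stand-in for issuing a read */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Runs one readahead thread and returns its recorded error code. */
int ra_demo(long bug_at)
{
    struct ra_ctl c = { .stop = false, .error = 0,
                        .blocks_done = 0, .bug_at = bug_at };
    pthread_t tid;

    pthread_mutex_init(&c.lock, NULL);
    if (pthread_create(&tid, NULL, ra_thread, &c) != 0) {
        pthread_mutex_destroy(&c.lock);
        return -1;
    }
    if (bug_at < 0)
        ra_request_stop(&c, 0);   /* main thread no longer wants readahead */
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&c.lock);
    return c.error;
}
```

Here `ra_demo(5)` exercises the "readahead detects a bug and shuts itself down" path, and `ra_demo(-1)` exercises the "main thread tells readahead to stop" path. Neither needs cancellation or signal handling, since the thread always exits at a point of its own choosing.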

As for the rest of the possible targets for perfectionism, I’ve decided to just add them to the TODO list and get the damn code out there for review (as soon as I do some performance testing to make sure my perfectionist editing didn’t destroy performance). Die, perfectionism, die!

And of course, I’m expecting a barrage of suggestions for how to robust-ify readahead.

4 thoughts on “Perfectionism and pthreads”

  1. do you expect your patches to shorten fsck times for raid1 mirrors? i ask because i use ext3 on a three-way raid1 mirror in an attempt to stay a step ahead of disk failures but fsck time is still painful (although only about as painful as on a same-sized filesystem with no mirroring, i think.)

  2. also, is using fork() with either sysv ipc or a shared mmap’ed region an option? that way the kernel would handle most of your cleanup and you’d just need some timeouts in the main fsck code to handle a hung or failed readahead, right?

  3. It’s certainly possible it will speed up. With a mirror, you can theoretically service twice as many requests per second, so issuing more than one at a time ought to help. It won’t divide the time by two, but might reduce it on the order of 5% (not much, I’m afraid).

    There’s some work going on to lay out file system metadata so that it is clustered together in various ways. The interesting thing about this layout is that it is in many ways a return to the original System V file system (s5fs), which put all the inodes at the beginning of the disk and everything else after it. s5fs was pretty slow, and was replaced by FFS, with its modern arrangement of metadata interspersed with data throughout the disk in block groups. An argument can be made that modern caching will mitigate the seeks between data and metadata, and that moving the indirect blocks and directory blocks together with the inodes will improve performance. Caching obviously helps, but it only takes a few read cache misses to end up seeking back and forth nearly as much as before.

    In general, I don’t think optimizing file system layout for fsck at the cost of online performance is a good approach. It does seem possible to improve fsck performance without a major sacrifice of online performance.

  4. Ew, IPC. I’m not sure; there are a lot of interesting race conditions that seem best solved with pthread mutexes and condition variables. But I’ll keep it in mind.

