Worried about the recent firestorm over fsync() and ext3/4? Wondering if you should rewrite your applications to fsync() every other line of code? Afraid that you’ll boot into a new kernel one day and suddenly start losing all your data?
Don’t Panic. Don’t start rewriting your applications. And don’t worry about waking up one day and finding that your file system has silently switched to data=loseallmystuff mode. I will post something in more detail in the next few days, but for now, here are the top data points:
- The majority of Linux file system developers believe that applications that Just Work on ext3 today should also Just Work on your Linux file system in the future. Specifically, rename() implies that the file’s data will hit disk before the rename() does. (In other words, you won’t have to add an fsync() before the rename() to guarantee this behavior, even though POSIX technically requires it; we figure it’s implied by the rename().)
- Please don’t go add a load of fsync()s to your applications “just to be safe.” On 99.99% of ext3 systems in use, fsync() won’t return until all outstanding writes to the ENTIRE file system have hit disk. This causes enormous, unacceptable latencies if anyone else is using the file system, and in most cases isn’t what you actually want. (See data=guarded mode, below.)
- Don’t worry about the new default journaling mode for ext3 planned for 2.6.30 (data=writeback, which is much faster than the old default, data=ordered, but has enormous security and data integrity problems). No distro would ship this as the default. The only way it could happen at Red Hat is over the dead bodies of the security team, who, let me tell you, keep an eagle eye on file system data leaks like this.
- Chris Mason is taking time off btrfs to work on a new journaling mode for ext3, data=guarded, which will get around the current performance issues of data=ordered while preserving many of the old consistency guarantees. Please test it – the more testing it gets, the sooner data=writeback will stop being the default. Latest patches are here, here, and here.
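For readers who have not met the rename() pattern mentioned in the first point, here is a minimal Python sketch of it. The helper name replace_file(), its sync flag, and the ".tmp" suffix are all hypothetical, chosen only for illustration. With sync left off, the code relies on the ext3-style ordering described above; with sync on, it adds the fsync() that a strict reading of POSIX calls for, at the latency cost described in the second point.

```python
import os


def replace_file(path, data, sync=False):
    """Replace the contents of `path` using write-to-temp-then-rename.

    Hypothetical helper for illustration only.
    """
    tmp_path = path + ".tmp"  # assumed naming scheme, not a standard

    # Write the new contents to a temporary file in the same directory,
    # so the final rename stays within one file system.
    with open(tmp_path, "wb") as f:
        f.write(data)
        if sync:
            # Optional: push Python's buffer to the kernel, then ask the
            # kernel to flush the file's data to disk before renaming.
            f.flush()
            os.fsync(f.fileno())

    # os.replace() calls rename() on POSIX systems: readers see either
    # the old file or the new one, never a mix of the two.
    os.replace(tmp_path, path)
```

A caller would use it as replace_file("settings.conf", b"key=value\n"). Whether the new data is guaranteed to be on disk after a crash with sync=False depends on the file system and journaling mode, which is exactly the question under discussion here.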
If you are an application developer trying to figure out whether to rewrite all your file I/O code, please sit back and wait for things to settle down for a few months. My prediction is that by the time 2.6.31 is released (and possibly earlier), Linux file systems will actually be more reliable and better performing than in 2.6.29, without application developers or distros having to lift a finger.
Edited to remove “rename() implies fsync()”. Keep the comments coming! But note that I generally don’t approve anonymous comments unless they are polite and informative.