Relatime Recap

Linus just made relatime the default atime behavior for file systems (thanks to mjg59’s hard work), although I suspect it won’t remain the default or in this particular form for long. People are still asking the same questions about atime as they did 2 or 3 years ago, so I thought I’d do a quick recap.

  • Q: Why do I care about atime anyway?
  • A: atime (“access time”) is a POSIX-defined file property that records the last time a file’s data was read. This turns every read of a file into a small write. Reading a lot of files (as in, say, a recursive grep) will cause a sudden storm of random write I/Os even if all the actual data to read is in cache, which uses up disk I/O bandwidth, spins up the disk if it was idle, and generally wastes power, wears out the disk, and speeds the heat death of the universe. Thus, atime is considered to be one of the great misfeatures of UNIX.
  • Q: So let’s just not update atime at all. I hear there’s this “noatime” mount option that’s been around for years.
  • A: “noatime” is great and some distros turned it on by default for a while. I use it and I tell anyone who asks me to turn on noatime. Then one of my friends complained that mutt no longer knew when it had new mail when she turned on noatime. (In some configurations, mutt depends on atime being updated on mbox format mailboxes.) noatime breaks a small but important set of applications, and so most distros couldn’t turn it on by default without generating a flood of complaints. (NB: “nodiratime” only turns off atime on directories but not files (noatime does both); this was a different compromise and broke another set of applications for somewhat less benefit.)
  • Q: Okay, how about we just update atime in memory and never write it out to disk?
  • A: When the cached inode gets kicked out of memory and then reread from disk, the atime as viewed from the application point of view will go backwards in time. This is a little too non-POSIX for most people. It’s worth noting, though, that XFS had a bug which resulted in exactly this behavior for months (years?) and no one ever noticed.
  • Q: Fine, let’s update atime in memory and then write it out to disk only when the inode gets evicted from memory.
  • A: One of the XFS guys, Dave Chinner, told me that they had a serious problem with this kind of strategy in other use cases. If you have a lot of memory in relation to the number of disk iops available, you could easily build up enough inode updates to keep the disk busy for tens of minutes straight. See, the writes are random and scattered around the disk, so the disk becomes IO bound on a per-operation level (on the order of a hundred IOs per second). Everything is fine until you get enough memory pressure to start kicking inodes out of cache. You want to be writing these inode updates out to disk on an ongoing basis rather than letting them pile up indefinitely, which gets us back to regular atime behavior.
  • Q: Damn. So what do we do now?
  • A: The idea I came up with a couple of years ago is to only update the atime if the previous atime was older than the file’s mtime or ctime – that is, the file’s data or metadata has been changed since the last time the atime was updated. Now you know if the file has been accessed relative to the last time it was changed. This solved my friend’s mutt new mail notification problem and made it possible for distros to turn on “relatime” (relative atime) by default.
  • Q: Any problems with that?
  • A: Other than the fact that “relatime” looks like “realtime” misspelled, certain applications (e.g., tmpwatch) want to know if a file has been accessed at all in the last <timeperiod>. This made it impossible to turn relatime on by default in some distros, which was the whole point of the feature. So a variety of people (Ingo Molnar, Matthew Garrett, and Andrew Morton that I recall) rewrote the patch over the years to update the atime if it was older than some period of time (configurable variously via /proc, kernel config value, mount option, tapping Morse code into the speaker, etc.). Two years of bikeshedding ensued. Some distros included the patches to configure the time period but they weren’t in mainline yet.
  • Q: So how come it’s the default now?
  • A: The Linux community is currently convulsed by file system performance and correctness issues. relatime came up in a discussion about ext3 IO latency and this time it piqued Linus’ interest enough to merge it in non-configurable form. But Linus says he’ll consider adding a configuration knob:
    On Thu, 26 Mar 2009, Andrew Morton wrote:
    > I (and others) pointed out that it would be better to implement this as
    > a mount option. That suggestion was met with varying sillinesses and
    > that is where things stand.
    I'd suggest first just doing the 24 hour thing, and then, IF user space
    actually ever gets its act together, and people care, and they _ask_ for a
    mount option, that's when it's worth doing.
  • Q: I think what Linus merged is a horrible idea! What can I do?
  • A: On an individual level, you can specify the “strictatime” mount option. If you have wider concerns, download the latest kernel git snapshot, run it, see what breaks, and post to LKML (preferably with patches – check the archives, there is probably already one that does what you want). I’m pretty sure this patch will be modified before the next release (or possibly reverted), so right now is a golden opportunity to affect it.
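
For readers who want the rule spelled out, here is a rough sketch in Python of the decision described above: update atime if the file has changed since the atime was last recorded, or if the atime is older than some period (24 hours, per the quoted thread). This is an illustration, not the kernel code. The names `Timestamps`, `relatime_needs_update`, and `max_age` are made up for this example, and treating a tie (mtime equal to atime) as "changed" is an assumption on my part.

```python
from dataclasses import dataclass

# The "24 hour thing" from the quoted thread, in seconds.
DAY_SECONDS = 24 * 60 * 60


@dataclass
class Timestamps:
    """Hypothetical stand-in for the three timestamps kept on an inode."""
    atime: float  # last access
    mtime: float  # last data modification
    ctime: float  # last metadata change


def relatime_needs_update(ts: Timestamps, now: float,
                          max_age: float = DAY_SECONDS) -> bool:
    """Return True if a read at time `now` should write a new atime."""
    # Original relatime rule: the data or metadata changed since the
    # atime was last recorded. Ties count as "changed" (an assumption).
    if ts.mtime >= ts.atime or ts.ctime >= ts.atime:
        return True
    # Later amendment: the recorded atime is older than max_age, so
    # tools that ask "was this touched recently?" still get an answer.
    if now - ts.atime >= max_age:
        return True
    # Otherwise skip the write; the read stays a pure read.
    return False


# New mail delivered at t=100 into a mailbox last read at t=50.
mailbox = Timestamps(atime=50.0, mtime=100.0, ctime=100.0)
first_read = relatime_needs_update(mailbox, now=110.0)

# The first read recorded atime=110; a second read ten seconds later
# finds nothing changed and the atime still fresh.
mailbox.atime = 110.0
second_read = relatime_needs_update(mailbox, now=120.0)

# A read more than a day after the recorded atime triggers an update.
stale_read = relatime_needs_update(mailbox, now=110.0 + DAY_SECONDS)
```

In this sketch only the first and last reads cost a write, which is the whole trade: the mutt case still works, a tmpwatch-style check is at most a day out of date, and the recursive grep in between writes nothing.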

22 Responses to Relatime Recap

  1. Jeff says:

    From the application programmer’s perspective, what is needed is a way to differentiate between atime reads and file reads. For example, suppose an application wants to read a file’s atime; it should access that data in a read-only way so as not to alter the atime.

    So why should we be limited to atime, ctime and mtime? How about adding a new field, ratime (‘readonly atime’)?

