In December, a team of hackers took the time to implement a full exploit of SSL certificates signed with a broken hash function (MD5). The paper is entitled MD5 Considered Harmful Today (a charming reference to the classic MD5 Considered Harmful Someday).
This piece of news has been in my “to-blog” queue for the better part of a month but I’ve been too depressed to write about it. For some years, I’ve tried – and failed – to convince systems programmers that they need to plan for the obsolescence of any cryptographic hash function they have incorporated into their software. Why?
Cryptographic hash functions don’t actually produce universally unique signatures/addresses/identifiers from their inputs; they simply do so with very high probability for the few years between their definition and the inevitable break of that hash function. When I wrote my first paper on the subject back in 2003, MD5 was already considered broken by cryptographers, but since none of them had bothered to find two colliding inputs, most programmers believed MD5 was still safe to use. When MD5 was completely broken the next year, I thought that the debate was over, but programmer practice remained unchanged, in part because most people had already migrated to SHA-1. Then SHA-0, a close cousin of SHA-1, was completely broken. Again, nothing changed. Then SHA-1 was significantly weakened, to the point that Bruce Schneier described it as:
SHA-1 has been broken. Not a reduced-round version. Not a simplified version. The real thing.
Still nothing. Of course, programmers are taking exactly the same attitude towards the weakening of SHA-1 as they did towards MD5 – it ain’t broken until you show me the collision, so I’m gonna keep using it – but this time we have a significant body of software using SHA-1. And now we know that even supposedly security-centered organizations continue to use known-broken cryptographic hashes right up until someone demonstrates a collision on their exact system and creates a media storm around it.
I still recommend that designers of software in which the cryptographic “address space” is shared by untrusted users plan for upgrading their cryptographic hash functions to the current state of the art every few years. I just have no expectation that this will happen.
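To make "plan for upgrading" a little more concrete, here is a minimal sketch of one way it might look, assuming each stored identifier records the name of the algorithm that produced it. The `make_id`, `verify_id`, and `needs_upgrade` helpers, the `algo:hexdigest` format, and the choice of algorithms are all hypothetical illustrations, not taken from any particular system.

```python
import hashlib

# Hypothetical policy: the algorithm used for new identifiers, and the
# set of algorithms still accepted when checking old ones.
CURRENT_ALGO = "sha256"
ACCEPTED_ALGOS = {"sha256", "sha1"}


def make_id(data: bytes, algo: str = CURRENT_ALGO) -> str:
    """Return an identifier of the form '<algo>:<hexdigest>'."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


def verify_id(data: bytes, ident: str) -> bool:
    """Check data against an identifier, rejecting retired algorithms."""
    algo, _, digest = ident.partition(":")
    if algo not in ACCEPTED_ALGOS:
        return False
    return hashlib.new(algo, data).hexdigest() == digest


def needs_upgrade(ident: str) -> bool:
    """True if the identifier was not made with the current algorithm."""
    return ident.partition(":")[0] != CURRENT_ALGO


if __name__ == "__main__":
    old = make_id(b"some block of data", "sha1")
    print(old, "needs upgrade:", needs_upgrade(old))
    print(make_id(b"some block of data"))
```

Because the algorithm name travels with the digest, retiring a hash becomes a matter of changing `CURRENT_ALGO`, re-hashing whatever `needs_upgrade` flags, and eventually dropping the old name from `ACCEPTED_ALGOS`, rather than a redesign of the storage format.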
I have some pretty color graphics showing the lifetime of cryptographic hash functions in an earlier post.
Finally, in a vain attempt to forestall the inevitable flame wars, I will point out that my objections do not apply to systems in which the hash address space is shared only with trusted users. In other words, hash-based source control is for the most part fine sticking with SHA-1 and could indeed use a cheaper hash like MD5 without any practical trouble. My hatred of git is based entirely on the user interface.
I should have read my Google News Alerts before posting this. Apparently a new document archive system from U Washington was just announced that does take into account migration to new hash functions. From the New York Times article by John Markoff:
The University of Washington researchers now use a modern hash algorithm called SHA-2, but they have designed the system so that it can be easily replaced with a more advanced algorithm.
But I can’t find any primary sources and the article has a technical error elsewhere: “At the heart of the system is an algorithm that is used to compute a 128-character number known as a cryptographic hash from the digital information in a particular document.” Probably a garbled reference to 128-bit MD5 checksums. Nonetheless, encouraging news.
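For anyone who wants to check the digest sizes being discussed, here is a small sketch using Python's standard `hashlib` module. The `digest_lengths` helper is my own hypothetical name; it reports a hash's output size in bits and in hexadecimal characters.

```python
import hashlib


def digest_lengths(name: str) -> tuple[int, int]:
    """Return (bits, hex characters) for the named hash's output."""
    size = hashlib.new(name).digest_size  # output size in bytes
    return size * 8, size * 2


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256", "sha512"):
        bits, hex_chars = digest_lengths(name)
        print(f"{name}: {bits} bits, {hex_chars} hex characters")
```

MD5 comes out at 128 bits (32 hex characters), while the SHA-2 family members shown here produce 256 and 512 bits.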