At long last, I’ve written and published the “compare-by-hash for programmers” article everyone’s always been asking for. You can read it chopped into 17 pieces and partially obscured by floating ads here:
http://www.linuxworld.com/news/2007/111207-hash.html
(My editor says: Please please complain about this! No one believes me when I say this is bad!) Or you can read it one piece with full size tables, etc. here:
I’m always looking for new article suggestions, especially for the Kernel Hacker’s Bookshelf (search down the page for the entry). Writing is fun!
The part of the article that Edward Tufte would be most proud of are the two tables about hash function life cycles:
Stages in the life cycle of cryptographic hash functions |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Stage | Expert reaction | Programmer reaction | Non-expert (“slashdotter”) reaction |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Initial proposal | Skepticism, don’t recommend use in practice |
Wait to hear from the experts before adding to OpenSSL | SHA-what? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Peer reviewal | Moderate effort to find holes and garner an easy publication |
Used by a particularly adventurous developers for specific purposes |
Name-drop the hash at cocktail parties to impress other geeks |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
General acceptance | Top-level researchers begin serious work on finding a weakness (and international fame) |
Even Microsoft is using the hash function now | Flame anyone who suggests the function may be broken in our lifetime | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Minor weakness discovered | Massive downloads of turgid pre-prints from arXiv, calls for new hash functions |
Start reviewing other hash functions for replacement | Long semi-mathematical posts comparing the complexity of the attack to the number of protons in the universe |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Serious weakness discovered | Tension-filled CRYPTO rump sessions! A full break is considered inevitable |
Migrate to new hash functions immediately, where necessary |
Point out that no actual collisions have been found |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
First collision found | Uncork the champagne! Interest in the details of the construction, but no surprise |
Gather around a co-worker’s computer, comparing the colliding inputs and running the hash function on them |
Explain why a simple collision attack is still useless, it’s really the second pre-image attack that counts |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Meaningful collisions generated on home computer |
How adorable! I’m busy trying to break this new hash function, though |
Send each other colliding X.509 certificates as pranks |
Tell people at parties that you always knew it would be broken |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Collisions generated by hand | Memorize as fun party trick for next faculty mixer |
Boggle | Try to remember how to do long division by hand |
Life cycles of popular cryptographic hashes (the “Breakout” chart) |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Function | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Snefru | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
MD4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
MD5 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
MD2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RIPEMD | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
HAVAL-128 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SHA-0 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SHA-1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RIPEMD-128 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RIPEMD-160 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SHA-2 family | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The Hash Function Lounge has an excellent list of references for the dates. |
nice
I like the article and I especially like the stages descriptive text.
Fred
So what happened in 2004? Did the price of alcohol fall thus placing swathes of “experts” in the Balmer peak for extended periods?
2004 was a pretty astounding year; I have some vague conspiracy theories about that not worth the electrical potential I use to think them. The real reason is that some brilliant PRC researchers found techniques that exploited common design elements of several of the top hash functions – and published them all in the same paper. Another suspicion I have is that most other researchers were not interested in bothering to fully break hash functions other than SHA-*, since little glory would accrue to them for proving what “everyone” (i.e., cryptographers) already knew. For example, I suspect Joux could have broken them too, but he was busy finding a collision in SHA-0 (also reported in 2004).
that’s a great visualization. i’m shocked that git was designed using sha-1 and has apparently still not been changed.
So the case where only trusted users can put data in the system is usually pretty annoying. When the hash isn’t broken, they tell you that the system works because no one can generate collisions. When the hash is broken, they tell you that it doesn’t matter if you can generate collisions because the users won’t do that deliberately. To which my reply is, “Then why don’t you use a cheaper hash?” The only one that gets this right is rsync – the hashes are pretty wimpy, but exactly as good as they need to be (and much faster to calculate).
So it’s fine that git uses SHA-1 and will continue to use SHA-1. Git uses SHA-1 as a cheap way of creating UUIDs for objects, not for any security reason. But there’s absolutely no point in using the latest greatest cryptographically secure hash if you don’t think there’s a reason to upgrade it when it’s broken. Git would also work just fine if it was using MD5, or even MD4 – and it would go lots faster.
I guess I worry a bit about accidental collisions. I know they should be vanishingly unlikely if the objects I’m storing in git are all of similar types, but it seems like that might break down once I start putting a lot of audio and photo captures in there — for instance, if I used git to keep a complete revision history of my home directory. It’s not a case of deliberately causing collision, it’s more that I worry I might accidentally find out my backup is missing some day because that crucial .wav file hashed the same as the .png from a friend’s amateur astronomy CCD exercise. I realize the probability of a random collision is very small, but who knows when one widely-used file format and another widely-used file-format will happen to fall afoul of whatever hash function was popular when the backup system was designed…
Oh, and exactly these sorts of problems did arise when using CRC32, of course. Otherwise I’d be completely blissful in my ignorance, rather than needlessly fearful ;)
You’re preaching to the choir. Compare-by-hash is somewhat defensible when it’s used for a specific purpose by educated persons who trust each other. When it becomes invisible systems software, those qualifications go away.
One way I think about it that we see certain sequences of bits far more often than expected by probability alone (and others far less often).
Some of the sequences of bits that will appear more often than by chance within a few years are sequences that have colliding SHA-1 hashes. These sequences will be replicated and stored and generally spread through computer systems because they are so interesting and unique. And then there’s someone quite reasonably using git to maintain their home directory who downloads the pair of sequences with the same SHA-1 hash. Yeah, probably git hashes in a lot of other information as well as the data, but it’s just not that hard to construct something that collides in git once you have the original vanilla collisions.
I now think I will not use git, monotone or mercurial to back up my home directory. Thanks :)
ouch.
you probably saw this already, but and friends broke ssl (also on
) using weaknesses in md5. as noted there, combined with dns hijacking, this is basically the theoretical end of web security as we know it. if only the browser hackers had heeded your article when you published, we might be in a better position to work around this problem today.
Re: ouch.
I hear, according to the experts, no one really cares about certificates anyway. :-/
md5 decode
Hello, i’m using – online md5 database