First major public exploitation of a compare-by-hash based system

In December, a team of hackers took the time to implement a full exploit of SSL certificates signed with a broken hash function (MD5). The paper is entitled MD5 Considered Harmful Today (a charming reference to the classic MD5 Considered Harmful Someday.

This piece of news has been in my “to-blog” queue for the better part of a month but I’ve been too depressed to write about it. For some years, I’ve tried – and failed – to convince systems programmers that they need to plan for the obsolescence of any cryptographic hash function they have incorporated into their software. Why?

Cryptographic hash functions don’t actually produce universally unique signatures/addresses/identifiers from their inputs, they simply do so with very high probability for the few years between their definition and the inevitable break of that hash function. When I wrote my first paper on the subject back in 2003, MD5 was already considered broken by cryptographers, but since none of them had bothered to find two colliding inputs, most programmers believed MD5 was still safe to use. When MD5 was completely broken the next year, I thought that the debate was over, but programmer practice remained unchanged, in part because most people had already migrated to SHA-1. Then SHA-0, a close cousin to SHA-1, was completely broken. Again, nothing changed. Then SHA-1 was significantly weakened, to the point that Bruce Schneier described it as:

SHA-1 has been broken. Not a reduced-round version. Not a simplified version. The real thing.

Still nothing. Of course, programmers are taking the exact attitude towards the weakening of SHA-1 as they did towards MD5 – it ain’t broken until you show me the collision, so I’m gonna keep using it – but this time we have a significant body of software using SHA-1. And now we know that even supposedly security-centered organizations continue to use known broken cryptographic hashes right up until someone demonstrates a collision on their exact system and creates a media storm around it.

I still recommend that designers of software in which the cryptographic “address space” is shared by untrusted users should plan for upgrading their cryptographic hash functions to the current state of the art every few years. I just have no expectation that this will happen.

I have some pretty color graphics showing the lifetime of cryptographic hash functions in an earlier post.

Finally, in a vain attempt to forestall the inevitable flame wars, I will point out that my objections do not apply to systems in which the hash address space is shared only with trusted users. In other words, hash-based source control is for the most part fine sticking with SHA-1 and could indeed use a cheaper hash like MD5 without any practical trouble. My hatred of git is based entirely on the user interface.


I should have read my Google News Alerts before posting this. Apparently a new document archive system from U Washington was just announced that does take into account migration to new hash functions. From the New York Times article by John Markoff:

The University of Washington researchers now use a modern hash algorithm called SHA-2, but they have designed the system so that it can be easily replaced with a more advanced algorithm.

But I can’t find any primary sources and the article has a technical error elsewhere: “At the heart of the system is an algorithm that is used to compute a 128-character number known as a cryptographic hash from the digital information in a particular document.” Probably a garbled reference to 128-bit MD5 checksums. Nonetheless, encouraging news.

FAST ’09

Yo! Forget not! The next FAST (USENIX File systems And Storage Technologies) conference is coming up February 24 – 27:

The best part: It’s in downtown San Francisco! Compare and contrast with the location of the last two FAST conferences in “downtown” San Jose.

I have several personal reasons for attending. First, it’s only a bus ride away, so my travel budget comes to about $12. Second, I had the honor to be on the program committee for FAST this year, an event which is unlikely to repeat itself. A third and directly related reason is that I’m very interested in the papers this year, in particular “Generating Realistic Impressions for File-System Benchmarking” from Nitin Agrawal, et alia at UW Madison. I can’t wait for this tool to be open sourced – it will be an enormous boost to file systems developers and testers.

HOWTO debug silent data corruption

Back when I worked at Sun, I used to listen starry-eyed at the knees of senior engineers while they told their tales of debugging silent data corruption. They were really good stories – hardware with obscure manufacturing defects that didn’t show up until really optimized code ran on the chip, rogue SCSI drivers overwriting blocks in memory, wild pointers of every sort. “Wow,” I thought, “That sounds really hard! I want to do that!” Ever since then, I’ve jumped at any opportunity to debug silent data corruption and I’ve loved it every time. Now I have my own set of awesome silent data corruption stories, though sadly some of the best of them are still under NDA.

Lately though, I’ve realized two things: (1) Most people think silent data corruption is almost impossible to debug, (2) I’ve debugged silent data corruption often enough that I actually have a fairly specific system for doing it. So here’s HOWTO Debug Silent Data Corruption:

  1. Gather data. Somehow you noticed something was going wrong; turn on all your logging and crash dumps and anything else you need. Getting as many cases as possible is key. Plan ahead to gather all the data you possibly can when the data corruption is detected since you don’t know when it will happen again. Laziness is a serious error here. One of the hardest silent data corruption bugs I ever solved was made that much harder because a co-worker threw away the log output from his debugging program because he thought it was irrelevant. The program exited just before it triggered a crash dump, so I couldn’t just crawl through its memory in the debugger, but I was sure that some of the pages from the output of the program had not been reused and were floating around in memory somewhere. I ended up writing a script to pull out the entire contents of memory from some weird compressed format the dump program used and grepping through it all for my guess as to what the program would output if the corruption was of a certain type. (This data was the key to finding the root cause of the corruption.)
  2. Get the data out of the computer and into your brain via the highest bandwidth channel available: your eyeballs. Most people instinctively get the first step and gather vast reams of data. Then they dink around with manually paging through the data or running a debugger by hand on the crash dump and give up after a few days (or call me, which I prefer). You need to somehow get the relevant data sifted out of the giant pile of junk and then present it all in close physical and temporal proximity on the screen in a manner optimized for data transmission through your optic nerves. Just keep fiddling with the presentation of the data until you start to see something meaningful. A typical simple visualization hack: I recently wrote a script to take only the first part of a disk and print out an ASCII character for each 512 byte block representing the content of the block (all zeroes, all ones, or mixed), and then fiddled with the output of the scripts and resized the window back and forth. Then I took the output for 10 different corrupted disks – and one uncorrupted disk – and ran a while loop popping each one up with less. Hitting ‘q’ quickly to cycle through the output for different disks let me visually detect common patterns in the corruption; the uncorrupted disk allowed me to distinguish signal from noise.
  3. Look for patterns. The exact pattern of data corruption is the key to tracking down the data corruption. How big is the corruption? How is it aligned? What does it contain? 8 bytes aligned at an offset of N * 280 bytes plus 48 and containing a number consistent with a kernel address means a wild kernel pointer; search memory for an address close to the location of the corruption and find out what data structure it’s living in – it should be a pointer to a 280 byte structure with a pointer in it at offset 48 bytes. 128KB of all zeroes at 128KB alignment screams “Corruption somewhere in the block I/O stack! Almost certainly a crappy SATA driver!” Single bit errors are very occasionally kernel bugs (set bit operations on wild pointers) but nearly always indicate some hideous incurable hardware problem, like bad memory or broken trace lines.
  4. Develop a reproducible test case. If you’re lucky, someone already did this and that’s how you gathered the data in the first case. However, more often than note, the data has been trickling in from the field at some incredibly low rate such that you can run your test system for a month and only have a 1% chance of triggering the error. Now that you have some clues about what the problem could be, you should be able to find a reproducible test case with only a little bit of blind flailing. Make sure you do the math to figure out how long you’ll have to run to get a statistically significant result. Math is your friend!
  5. Run the reproducible test case over and over and over and gather more and different data until you have some hypotheses to test. Talk to your friends about the problem. Walk up and down the hallways eating your vending machine dinner until your brain stops consciously thinking about the problem and comes up with some brilliant insight. Take a shower, start driving home, attempt to go to sleep – whatever your bug idea strategy is, keep using it. Don’t forget to keep looking at your data in different ways and to go back for more data every time you run out of data to analyze. I’m always tempted to reason out the problem through sheer force of mind, but it always turns out that getting just that one more piece of data simplifies the problem by another order of magnitude.
  6. Congratulations! You now have a hypothesis or two about what’s causing the problem! So test it. Your first 5 guesses will probably be wrong. This is why we have a reproducible test case. My favorite is when someone finds some random bug report for some piece of software or hardware that happens to be installed on the system and declares that this must be the cause of the corruption. Folks, every system has a vast number of reported bugs and most of them are not causing your silent data corruption. This part can be pretty frustrating because often your test results will not make any sense at all because your mental model is wrong. Be sure to test the null case, too – run your test with both the new disk driver and the old one and see if it actually distinguishes between the two.
  7. Finally, get ready to be the messenger the customer shoots. Sometimes the fix for silent data corruption is easy: just apply a patch, recompile the driver, and voila! All better now! But more often the silent data corruption is just the tip of a giant hulking iceberg of very expensive hardware badness. If you manage to prove that every single disk in the entire 1024-node cluster has to be replaced, somebody is going to be out a lot of money and they are not going to be happy to see you. Be sure to have all your statistics and political machinations ready at hand. Remember, in the end, you found the cause of the corruption and nothing they do can change that. You found the cause of *silent data corruption*! Woohoo!! Yeah!

And that is how you debug silent data corruption.

Sex(ual harassment) in the City

No one ever believes me that I get harassed on the street on a regular basis. It’s pretty much a guarantee that whenever I walk more than about 5 blocks in a city, I’m going to get at least one cat call, “solicitation,” or threat. When I tell people about this, they look shocked and say, “Wow, I can’t believe that. Really?” Very rarely, someone will believe me, but only because the same thing happens to her. One of my sisters can somehow be detected a mile a way by flashers. She’s had men exposing themselves to her since she was 5 years old.

For a while now, I’ve considered keeping a log of exactly when, where, and how I’m harassed. Today, a guy harassed me for trying to avoid being harassed, and now I’m mad enough to do it. First, we’ll need a little Street Harassment 101:

1. Shouting at women on the street about how attractive they are is not, actually, a compliment, and women do not and should not feel gratified by it. Think about it: when you actually want to make a woman feel attractive and good about herself, do you walk out on the street, wait for the next pretty woman to walk by, and shout at her? No, because that’s rude and threatening. Strange men shouting at women about, basically, how much they’d like to fuck her is generally intended to scare and frighten women.

2. Men generally harass only women who are alone or are with other women, and usually when no one else is around to hear if a woman starts to yell or scream. This is part of why men almost never see harassment.

3. Street harassment has serious effects for women even if it doesn’t lead to violence directly. Besides the knowledge that the men you have to walk by would hurt you even more if they thought they could get away with it, it makes you go to great lengths to avoid that area. When I go to the BART station, I walk the exact same way every time because I’ve been harassed only three times on that path (there is no way to get to BART without being harassed). I now take a taxi to BART and back if it’s a time of day when few people are on the street. “Fine,” you say, “That’s just being prudent.” Oh yeah? Well, when do I get the transportation subsidy to make up for the fact that I have to take a taxi for a 4 block trip and men don’t? I estimate I’ll spend at least $1000 a year more on taxis than a comparable man.

4. Non-verbal harassment is also a problem. Stare at her. Leer. Walk as close to her as you can without actually touching her. My current favorite: Get about 5 other guys, walk down the sidewalk together at night, and whenever you see a woman by herself, spread out and walk past her on both sides, as close as possible. “Oh,” you say, “That’s just because that’s the easiest way for a group of people to pass a person walking alone.” I decided to test the “convenience” theory one night. I was walking down Mission on a very wide sidewalk – about as wide as two cars – when I saw the canonical gang of six guys dressed in black. I decided not to let them walk on both sides of me, so I moved a foot to the right, towards the street. They moved a foot their left. I moved another foot to the right. They matched me. Finally, I was walking literally an inch from the curb, and one of the guys stepped down off the sidewalk into the gutter (filled with water) between two cars to walk around me, just so at least one guy was on both sides of me. Boy, that was convenient.

5. There is absolutely nothing you can do about street harassment. What, are you going to call 911 on a guy who just waggled his tongue at you obscenely? He’ll be gone by the time they get there anyway. The only thing to do when you are being harassed is to act like it’s not happening. Any reaction at all just makes it worse. So you spend a lot of time bottling up your anger and feeling helpless, and occasionally you screw up and actually say something – and it’s usually loud and profane. The last time I lost control and actually said something was last August in Portland. Two teenage guys and a girl (bizarrely) were walking down 18th St near PGE Park, where I’ve been harassed many times. The guys weren’t wearing shirts, which is a pretty good indicator in summer in Portland that you’re about to get harassed. Sure enough, one of the guys said, “Hey pretty, wanna party with us?” just as he was a few feet away from me. I lost it and snapped, “Fuck you!” – which, for reference, is the worst possible thing to say. He laughed and said, “Promise?” The girl giggled. The worst part was that I saw him again that afternoon in Pioneer Square.

Okay, now that we’ve gotten that out of the way, what happened today to get me to start logging this stuff? While walking on 24th street, a fairly nice part of town, a middle-aged guy tried to stop me to ask for directions. For context, when a strange man tries to start a conversation with me on the street, it’s inevitably so that shortly thereafter he can harass me. I kept walking. The guy complained loudly to another pedestrian behind me for a while. Then he caught up to me and had the balls to harass me for trying to avoid being harassed:

Him: “The pedestrians sure are polite here.” Sarcastically.
Me: “Do you know how many times a day I get harassed?”
Him: Pause. “Sure is a friendly city.” A little less sarcastic.
Me: “Guess.”
Him: Pause. “I was just trying to make conversation.”
Me: “So leave me the fuck alone.”

So, yeah, I’m starting the log, and I’m keeping it in a Google map. The dates before yesterday are a little hazy, but I’m writing down the exact time and place when they happen in the future. If you want to log your own harassment incidents, let me know and I’ll give you edit permissions.,-122.4281&spn=0.007227,0.014591&z=16&om=1&msid=109475623440902585657.000441d553174a98a114a

The code monkey’s guide to cryptographic hashes for content-based addressing

At long last, I’ve written and published the “compare-by-hash for programmers” article everyone’s always been asking for. You can read it chopped into 17 pieces and partially obscured by floating ads here:

(My editor says: Please please complain about this! No one believes me when I say this is bad!) Or you can read it one piece with full size tables, etc. here:

I’m always looking for new article suggestions, especially for the Kernel Hacker’s Bookshelf (search down the page for the entry). Writing is fun!

The part of the article that Edward Tufte would be most proud of are the two tables about hash function life cycles:

Stages in the life cycle of cryptographic
hash functions
Stage Expert reaction Programmer reaction Non-expert
(“slashdotter”) reaction
Initial proposal Skepticism, don’t recommend use in
Wait to hear from the experts before adding to OpenSSL SHA-what?
Peer reviewal Moderate effort to find holes and
garner an easy publication
Used by a particularly adventurous
developers for specific purposes
Name-drop the hash at cocktail
parties to impress other geeks
General acceptance Top-level researchers begin
serious work on finding a weakness (and international fame)
Even Microsoft is using the hash function now Flame anyone who suggests the function may be broken in our lifetime
Minor weakness discovered Massive downloads of
turgid pre-prints from arXiv, calls for new hash functions
Start reviewing other hash functions for replacement Long
semi-mathematical posts comparing the complexity of the attack to the
number of protons in the universe
Serious weakness discovered Tension-filled CRYPTO rump sessions! A full break is considered
Migrate to new hash functions immediately, where
Point out that no actual collisions have been
First collision found Uncork the
champagne! Interest in the details of the construction, but no
Gather around a co-worker’s computer, comparing the
colliding inputs and running the hash function on them
Explain why a simple collision attack is still useless, it’s
really the second pre-image attack that counts
Meaningful collisions generated on home
How adorable! I’m busy trying to break this new
hash function, though
Send each other colliding X.509
certificates as pranks
Tell people at parties that you always
knew it would be broken
Collisions generated by hand Memorize as fun party
trick for next faculty mixer
Boggle Try to remember
how to do long division by hand
Life cycles of popular cryptographic hashes (the
“Breakout” chart)
Function 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
SHA-2 family                                    
Key Unbroken Weakened Broken
Hash Function Lounge
has an excellent list of references for the