Stop using file hashes in place of digital signatures (please!)

You may have seen something like this before. You go to download your favourite program, SuperApp3000, and the download page provides hashes (usually MD5, SHA1, etc.) for each of the available files. Sometimes the page even stresses that you should verify that the file you downloaded matches the provided hash, or that you should never trust anything you download without first confirming that the hashes match. This is a prime example of people confusing file hashes with digital signatures, and it needs to stop.

Notepad++ is a prime example

What is a file hash

Simply put, a file hash is a unique* value built from the contents of the file itself. For example, if my file contained the ASCII text “hello there” (without quotes), the resulting SHA1 hash of the file would be 55e82e1eb131597ce6ef77ff775b2c2e5f4d6b45. If I changed the file by even a single character, for example changing hello to hallo, the result is drastically different: 34e864a52fcc62ddab7210e1dbfe7edbdc2ae0d8.
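If you want to try this yourself, here is a minimal sketch using Python’s standard hashlib module. The helper name sha1_hex is my own invention for illustration, and the exact values you see depend on the exact bytes hashed (a trailing newline in the file, for instance, changes the result completely).

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of the given ASCII text as 40 hex characters."""
    return hashlib.sha1(text.encode("ascii")).hexdigest()


# Two inputs that differ by a single character.
original = sha1_hex("hello there")
changed = sha1_hex("hallo there")

print(original)
print(changed)
print("same hash?", original == changed)
```

The two printed digests share no obvious resemblance even though the inputs differ by one letter, which is exactly the property that makes hashes useful for spotting changes.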

So what’s the problem here?

File hashes are a wonderful way to confirm that there hasn’t been any data corruption once the download has finished, but they are not a way to confirm that the download you have received is the one you think you’re getting. It may seem counter-intuitive when stated like that, but perhaps an example will make it clearer.

Let’s say I have a competing program to SuperApp3000, the cleverly named BestApp3001. Unfortunately SuperApp3000 is getting all of the attention and so I hatch a scheme to discredit the developers and make it appear as though their program is unsafe. Being the nefarious hacker that I am** I have secretly uploaded a virus to the SuperApp3000 download site. Now when users go to download the program that they think is SuperApp3000 what they’re actually getting is something terrible muhahahaha***.

But wait! Unlikely as it is, a user could simply compare the hash of the downloaded virus to the real SuperApp3000 file hash and see that they don’t match! That’s no problem though. I’ll simply generate new file hashes, this time for my uploaded virus, and replace the ones on their website. Now when the user downloads the virus and compares the hashes (to the hashes that I modified) everything looks correct!
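The scheme above can be played out in a few lines of Python. This is only a sketch: the site dictionary, the byte strings and the helper names are all made up for illustration, and I have used SHA-256 rather than MD5 or SHA1 so that the weakness shown is clearly not the fault of the hashing algorithm.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as hex."""
    return hashlib.sha256(data).hexdigest()


def hashes_match(download: bytes, published_hash: str) -> bool:
    """The check a careful user performs: hash the download, compare to the page."""
    return sha256_hex(download) == published_hash


# Hypothetical stand-ins for the real installer and the attacker's file.
real_installer = b"SuperApp3000 installer bytes"
virus = b"something terrible"

# The download site as the developers set it up: a file and its hash.
site = {"file": real_installer, "hash": sha256_hex(real_installer)}

# Step 1: the attacker swaps only the file. The hash check catches this.
site["file"] = virus
caught = not hashes_match(site["file"], site["hash"])

# Step 2: the attacker also regenerates the hash. Now the check passes.
site["hash"] = sha256_hex(virus)
fooled = hashes_match(site["file"], site["hash"])

print("swap detected before the hash was replaced:", caught)
print("check passes after the hash was replaced:", fooled)
```

The user’s check is only as trustworthy as the place the expected hash came from, and here that place is under the attacker’s control.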

Notepad++ provides hashes (to “check if you’re paranoid”) but those hashes aren’t cryptographically signed and could easily be forged.

What is the correct way of doing this then?

The correct way to ensure that the user is getting valid file hashes is to deliver them as digital signatures. A digital signature starts from the same kind of file hash, but as a last step the hash is signed with a private key that only the developers control, and anyone can check the result against the developers’ matching public key. In the above example, even if I were still able to replace their program with mine, I could not forge the digital signature because I don’t have access to their signing key****.
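To make the “hash plus a key” idea concrete, here is a toy sketch of textbook RSA signing in plain Python. Please treat it as an illustration only: the key is built from two well-known Mersenne primes and is far too small to be secure, there is no padding, and real tools such as gpg do considerably more than this. All names here (sign, verify, digest_as_int and the constants) are my own assumptions, not anyone’s actual API.

```python
import hashlib

# Toy key pair built from two known Mersenne primes.
# ASSUMPTION for illustration only: far too small for real-world use.
P = 2**89 - 1
Q = 2**127 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce it into the range the toy key can handle."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Developers only: combine the file hash with the private exponent D."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Anyone: check the signature using only the public values N and E."""
    return pow(signature, E, N) == digest_as_int(data)


real_installer = b"SuperApp3000 installer bytes"
virus = b"something terrible"

signature = sign(real_installer)

print("real file verifies:", verify(real_installer, signature))
print("virus with the old signature verifies:", verify(virus, signature))
```

The attacker can recompute a hash for the virus as easily as before, but turning that hash into a signature that verifies against the developers’ public values requires D, which never leaves the developers’ hands.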

There are many different ways to digitally sign something but I personally find the easiest method is to simply use gpg. You can even combine it with the regular file hashes you would normally provide and produce something that looks similar to this. Notice how the [MD5/SHA1/etc.] file hashes are still there but there is an extra digital signature bit surrounding it.
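As a rough sketch of that workflow, the snippet below wraps the two gpg commands involved. It assumes gpg is installed, that the developer already has a signing key, and that the user has already imported the developer’s public key. The file names SHA256SUMS and SHA256SUMS.asc, and the Python function names, are hypothetical.

```python
import subprocess
from typing import List


def clearsign_command(hash_file: str) -> List[str]:
    """Command a developer runs; gpg writes the signed copy to <hash_file>.asc."""
    return ["gpg", "--clearsign", hash_file]


def verify_command(signed_file: str) -> List[str]:
    """Command a user runs to check the signature on the signed hash list."""
    return ["gpg", "--verify", signed_file]


def sign_hash_list(hash_file: str) -> None:
    """Developer side: sign the plain-text list of file hashes."""
    subprocess.run(clearsign_command(hash_file), check=True)


def hash_list_is_authentic(signed_file: str) -> bool:
    """User side: True only if gpg exits successfully for the signature."""
    return subprocess.run(verify_command(signed_file)).returncode == 0


# Hypothetical usage (not run here, since it needs gpg and real keys):
#   sign_hash_list("SHA256SUMS")                # developer, produces SHA256SUMS.asc
#   hash_list_is_authentic("SHA256SUMS.asc")    # user, before trusting the hashes
```

Only once the signature on the hash list checks out does it make sense to compare the hashes inside it with the file you downloaded.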

So what are hashes good for then?

Standard file hashes can still be a decent way of quickly verifying that no random bits were flipped on their way to you as part of the download. These days the chances of that happening should be pretty slim though, as TCP and reliable internet connections remove most of the potential for download corruption.


File hashes are good for checking for file corruption, but they do not tell you if the file has been altered maliciously. So please, please, please, if you notice that your favourite download site is making the above mistake, let them know and ask them to correct it. 🙂

* = Yes I know there can be collisions but for the sake of simplicity we can assume a good enough hashing algorithm (*cough* not MD5 and probably not SHA1 *cough*) is protected against malicious collisions if nothing else.

** = For the record I am not a nefarious hacker.

*** = Terrible, evil laugh.

**** = For the purpose of this post I am not going to get into the trust model associated with the developers’ key (i.e. WoT, CA, whatever).