Hello,
Thanks for checking out the WGBH Media Library and Archives’ blog for our first Technical Tuesday. We’ll be sharing some of the techniques we use in our daily digital preservation and access processes. First up, creating MD5 checksums for files.

What’s an MD5?
An MD5 checksum is a 128-bit hash value, usually written as 32 hexadecimal digits, that can be calculated from a digital file to verify its integrity. It looks like this: 9aee1a70c2055b5eaba6dcb73ffe42cc

At WGBH we generate and compare MD5 values every time we copy a file from one storage medium to another. If the MD5 values of the source file and the copy are not identical, the file changed somewhere during the transfer.

We generate and store MD5 checksums for every file we preserve. When we run processes to check the integrity of our digital files, it’s important we have a base value to compare to.

Systems and software we use:
– Computer with Mac OS X 10.5 or higher
– “Terminal” application included with OS X

Generating an MD5 for a file is simple.
Open the Terminal application.
Type

$ md5 /folder/path/to/your/file/example.txt

Press “return”.
That should print a line that looks similar to this:

MD5 (/folder/path/to/your/file/example.txt) = 9aee1a70c2055b5eaba6dcb73ffe42cc

That is the MD5 checksum for that example text file.

If you want to save that MD5 value to a separate report file, you can do this:

$ md5 /folder/path/to/your/file/example.txt >> /path/to/your/report/file/md5_report.csv

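Press “return” and you’ll find a file called “md5_report.csv” in the folder /path/to/your/report/file/. Because >> appends, the file is created if it does not exist yet; otherwise the new line is added to the end. The file contains the same line of output shown above: the file path and its MD5 value. Note that despite the .csv extension, this line is plain md5 output and is not comma-separated.

In the WGBH Media Library and Archives, we generate an MD5 report file for an entire directory of files on a hard drive using these commands (hidden files and folders are skipped):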

$ cd directory
$ find "$(pwd -P)" -not -path '*/\.*' -type f -exec md5 '{}' \; >> /path/to/your/destination/folder/md5_report.csv

Once we have that, we can compare those MD5 values to another list to verify files have been copied successfully.
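
One way to do that comparison is sketched below. It is an illustration under a few assumptions, not our production tooling: the function names normalize_report and compare_reports are hypothetical, each report is assumed to contain lines in the "MD5 (path) = value" form shown above, and you pass in the top-level directory of each drive so the differing path prefixes can be removed before comparing.

```shell
#!/bin/sh
# Turn lines like "MD5 (/root/sub/file) = <hash>" into "<hash>  sub/file",
# dropping the given root prefix so reports from two drives line up.
# $1 = report file, $2 = root directory the report was generated from.
normalize_report() {
    report=$1
    root=${2%/}
    sed -n 's/^MD5 (\(.*\)) = \([0-9a-f]\{32\}\)$/\2  \1/p' "$report" |
        ROOT="$root/" awk '
            BEGIN { root = ENVIRON["ROOT"] }
            {
                hash = substr($0, 1, 32)
                path = substr($0, 35)      # skip 32 hex digits + 2 spaces
                if (index(path, root) == 1)
                    path = substr(path, length(root) + 1)
                print hash "  " path
            }' | sort
}

# Compare two reports; print the differences, or a success message.
# $1/$2 = source report and its root, $3/$4 = copy report and its root.
compare_reports() {
    a=$(mktemp)
    b=$(mktemp)
    normalize_report "$1" "$2" > "$a"
    normalize_report "$3" "$4" > "$b"
    if diff "$a" "$b"; then
        echo "All files match"
        status=0
    else
        status=1
    fi
    rm -f "$a" "$b"
    return $status
}
```

Sorting both lists first means the order in which find visited the files does not matter. Any line printed by diff points at a file that is missing from one side or whose MD5 value differs.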

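It’s important to note that there are other checksum algorithms besides MD5 that are more collision-resistant, such as SHA-256. With those algorithms it is far less likely that two different files end up with the same checksum.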

To calculate the SHA-256 value:

$ shasum -a 256 /folder/path/to/your/file/example.txt

The command prints the value followed by the file path. The value should look something like this:
53971fee91ae8530f32dad213d76aac0cc5cf9cb9771e6268b7568e791de0327

We don’t use SHA-256 yet at WGBH because the preservation software and systems are not yet making use of it.

Check back here every Tuesday for more tips!
