Tuesday, September 18, 2012

MultiPar (the spiritual successor to QuickPar)

We archive a lot of data onto CD/DVD, which has never been a reliable medium even if you use the high quality Taiyo Yuden media.  Because a CD/DVD can become unreadable over the course of years or decades, you have to take one of two approaches:

1) Burn a second (or third) copy of every CD/DVD that you create.  The primary downside is that you double the number of discs that you have to keep track of.  If you store the discs in two separate geographical locations, this is not necessarily a bad thing.  Back when media was far more expensive, this also drove up your costs a lot.  You still need to create some sort of checksum / verification data at the file level (such as MD5 or SHA1 hashes) so that you can validate your archives down the road.
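As a rough illustration of the file-level verification data mentioned above, here is a minimal Python sketch that builds a hash manifest for a directory before burning and checks it again later.  The function names (file_digest, build_manifest, verify_manifest) are made up for this example, and SHA1 is used only because it is one of the hashes named above; this is not how any particular archiving tool does it.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash one file in chunks so large files do not need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, algorithm="sha1"):
    """Map each file's path (relative to root) to its hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest, algorithm="sha1"):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p, algorithm) != expected:
            bad.append(rel)
    return bad
```

You would save the manifest alongside the files before burning, then run verify_manifest against the mounted disc every so often.  Note that a mismatch only tells you that a file is bad, not which part of it.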

2) Add some sort of parity / error recovery data to the disc contents.  While CD/DVD media both include Reed-Solomon error correction at the sector level, you can't always get information about how clean the disc is and whether or not it is failing.  In many cases, the first sign of trouble occurs after the point where the built-in error correction is no longer able to do its job.  So you use a program like WinRAR, QuickPar, the par1 or par2 command-line programs, or something else to create additional error correction data and add it to the data being written to the media.

An important concept when dealing with long term archival is the "recovery window".  In most cases, when media starts to fail, it is a progressive condition where only a few sectors will have issues at the start.  As time goes on, more and more sectors will fail verification and less and less data will be recoverable.  The exception to this is if the TOC (table of contents) track goes bad, which will then require the use of special hardware in order to read any data off of the media.

In the case of the above approaches:

1) Multiple copies -- The recovery window is from the point that you find out that one of the copies has failed until you make a copy of one of the remaining copies that is still valid.  Depending on where the physical media is located, this might be a quick process, or it might require a few days to transport media between locations.  The problem comes when multiple copies are experiencing data loss, because you then have to hope that the same files/sectors are not corrupt on every copy.

Note that the multiple copies approach is only recoverable at the "file" level in most archive situations.  Most verification codes are calculated at the file level, which means a file is either completely good or completely bad.  Unless the file contains internal consistency checks, you cannot combine two damaged files to create a new undamaged copy.
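To make the file-level versus block-level distinction concrete, here is a small Python sketch of what becomes possible if you had stored a hash per block instead of one hash per file: two damaged copies can then be merged as long as no block is bad in both.  The 2048-byte block size and the function names are assumptions for illustration only.

```python
import hashlib

BLOCK_SIZE = 2048  # assumption: one data sector's worth of bytes per block


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash every fixed-size block of the original data."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def merge_copies(copies, hashes, block_size=BLOCK_SIZE):
    """Rebuild the data block by block from whichever copy still matches.

    Returns (data, missing), where missing lists the block indexes that
    are bad in every copy. data is None if anything is missing.
    """
    rebuilt = []
    missing = []
    for index, expected in enumerate(hashes):
        start = index * block_size
        for copy in copies:
            block = copy[start:start + block_size]
            if hashlib.sha1(block).hexdigest() == expected:
                rebuilt.append(block)
                break
        else:
            # No copy had a good version of this block.
            missing.append(index)
    if missing:
        return None, missing
    return b"".join(rebuilt), missing
```

With only a single hash for the whole file, both damaged copies would simply fail verification and there would be no way to tell which parts of each are still good.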

2) Error correction data -- Again, the recovery window starts at the point in time where you first discover an issue.  But because the error correction data lives on the disc next to the raw data, you are able to immediately determine whether the media has failed to the point where data is actually lost.  Some of the tools used to create verification data (QuickPar in particular) can even recover data from discs where the file system has been corrupted: they dig through the block level data and pull out the good blocks.
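For anyone who has not seen how recovery data can rebuild something that is gone, here is a deliberately simplified Python sketch.  It uses a single XOR parity block, which can rebuild exactly one lost block out of a set of equal-sized blocks.  This is a stand-in for the idea only: the PAR2 format uses Reed-Solomon coding, which can recover multiple lost blocks, and the function names here are invented for the example.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must be the same length")
    result = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def recover_missing(blocks, parity):
    """Fill in the single missing block (marked as None) using the parity.

    XOR-ing the parity with every surviving block leaves exactly the
    bytes of the block that was lost.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one block")
    survivors = [b for b in blocks if b is not None]
    repaired = list(blocks)
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired
```

Here the parity block costs one extra block of space no matter how many data blocks it protects, but it is useless once two blocks are gone.  Tools built on Reed-Solomon let you pick how many recovery blocks to create, which is where the percentage of space you dedicate to recovery data comes from.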

Note that the two approaches are not mutually exclusive.  For the truly paranoid, creating two copies of the media along with dedicating 5-20% of the media's capacity to error correction will give you lots of options when dealing with "bit rot".
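To get a feel for what 5-20% costs you, here is a tiny Python sketch that splits a disc's capacity between payload and recovery data.  The 4,700,000,000 byte figure in the usage note is the nominal single-layer DVD size and is an assumption; real usable space is a little different once the file system takes its share.

```python
def split_capacity(capacity_bytes, recovery_percent):
    """Return (payload_bytes, recovery_bytes) for a whole-number percentage."""
    if not 0 <= recovery_percent <= 100:
        raise ValueError("recovery_percent must be between 0 and 100")
    recovery = capacity_bytes * recovery_percent // 100
    return capacity_bytes - recovery, recovery
```

For a nominal 4.7 GB DVD, split_capacity(4_700_000_000, 10) leaves 4,230,000,000 bytes for your files and sets aside 470,000,000 bytes for recovery data.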

So, back to the original point of the posting...

We used to use QuickPar to create our recovery data.  It was written for Windows XP and had a nice GUI which made it quick to bundle up a bunch of files and create recovery data for those files.  Speed was fairly good, but it never did multi-threading, nor did it ever support subdirectories.  It has also not been updated since the 2003-2004 timeframe, so it is a bit of a "dead" project.

The successor to QuickPar, for those of us wanting a Windows program with a GUI, seems to be MultiPar.  I stumbled across it via Stuart's Blog posting about MultiPar.  Even though the download page is written in Japanese, MultiPar does have an English GUI option.  Just look for the green button in the middle of the page which says "Download now" and look at the filename (such as "MultiPar121_setup.exe").