Wednesday, November 03, 2010

bash script to verify ZIP or RAR archives

This Bash script checks every ZIP file in the current directory and all subdirectories. Valid files are left alone. Files with problems (encrypted, password-protected, CRC errors, not a ZIP file) are renamed with an extra extension that names the problem (.encrypted, .password, .broken, .wrongformat).

#!/bin/bash

# IFS= and -r keep leading/trailing whitespace and backslashes in file names intact
find . -type f -name '*.zip' -print0 | while IFS= read -r -d '' file
do
    # skip loop iteration if file no longer exists
    if [[ ! -f "$file" ]] ; then continue; fi
    #echo "$file"
    
    unzip -t -qq "$file"
    RETVAL=$?

    # exit codes are specific to each archive tool; the ones below are unzip's
    case $RETVAL in
        # 0=success, archive file is okay
        0) 
            echo "OK: $file"
            ;;
        
        # codes that indicate a broken archive
        3) 
            echo "BROKEN($RETVAL): $file"
            mv "$file" "$file.broken"
            ;;
            
        # codes that indicate file is not a supported archive
        9) 
            echo "WRONGFORMAT($RETVAL): $file"
            mv "$file" "$file.wrongformat"
            ;;
            
        # user pressed ctrl-C or break or killed the process    
        80) 
            echo "USER ABORT: $file"
            ;;
            
        # 81 = unsupported compression method or unsupported decryption
        81) 
            echo "ENCRYPTED($RETVAL): $file"
            mv "$file" "$file.encrypted"
            ;;
            
        # 82 = no files could be tested because of a bad decryption password
        82) 
            echo "PASSWORD($RETVAL): $file"
            mv "$file" "$file.password"
            ;;
            
        # other error codes
        *) 
            echo "ERROR($RETVAL): $file"
            ;;
    esac
done
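
One way to try the script without renaming anything is to separate the decision from the mv. The sketch below is an illustration, not part of the original script: suffix_for_unzip_status is a hypothetical helper name, and it only encodes the same code-to-suffix mapping used in the case statement above.

```bash
#!/bin/bash

# Hypothetical helper: map an unzip exit status to the suffix the script
# above appends. Prints an empty line when the file should be left alone.
suffix_for_unzip_status() {
    case "$1" in
        3)  printf '%s\n' "broken" ;;
        9)  printf '%s\n' "wrongformat" ;;
        81) printf '%s\n' "encrypted" ;;
        82) printf '%s\n' "password" ;;
        *)  printf '\n' ;;
    esac
}

# Dry-run usage inside the loop (left as comments so nothing runs here):
#   unzip -t -qq "$file"
#   suffix=$(suffix_for_unzip_status $?)
#   if [ -n "$suffix" ]; then echo "would rename: $file -> $file.$suffix"; fi
```

Replacing the echo in the usage comment with mv "$file" "$file.$suffix" would give roughly the behaviour of the original script.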

The same script, adapted to handle RAR files. The exit codes are defined as an enum in "errhnd.hpp" in the UnRAR source. An approximate list is:

SUCCESS=0, WARNING=1, FATAL_ERROR=2, CRC_ERROR=3, LOCK_ERROR=4, WRITE_ERROR=5, OPEN_ERROR=6, USER_ERROR=7, MEMORY_ERROR=8, CREATE_ERROR=9, NO_FILES_ERROR=10, USER_BREAK=255

Code 3 is commonly returned if files within the RAR archive have errors. Code 10 may indicate that the RAR archive is not actually a RAR file (it might be a misnamed ZIP or 7Z archive).

#!/bin/bash

# IFS= and -r keep leading/trailing whitespace and backslashes in file names intact
find . -type f -name '*.rar' -print0 | while IFS= read -r -d '' file
do
    # skip loop iteration if file no longer exists
    if [[ ! -f "$file" ]] ; then continue; fi
    #echo "$file"
    
    unrar t -idq "$file"
    RETVAL=$?
    
    case $RETVAL in
        # 0=success, archive file is okay
        0) 
            echo "OK: $file"
            ;;
        
        # codes that indicate a broken archive
        3) 
            echo "BROKEN($RETVAL): $file"
            mv "$file" "$file.broken"
            ;;
            
        # probably indicates wrong format (ZIP or 7Z file as RAR)
        10) 
            echo "WRONGFORMAT($RETVAL): $file"
            mv "$file" "$file.wrongformat"
            ;;
            
        # user pressed ctrl-C or break or killed the process    
        255) 
            echo "USER ABORT: $file"
            ;;
            
        # other errors
        *) 
            echo "ERROR($RETVAL): $file"
            ;;
    esac
done
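
The catch-all branch only prints a number, which is hard to interpret later. As a rough sketch, the approximate enum list given before the script can be turned into a lookup. The function name rar_status_name is made up for this example, and the names are the ones from that approximate list, so they may not match every unrar version.

```bash
#!/bin/bash

# Hypothetical helper: translate an unrar exit status into the name from
# the approximate list quoted from errhnd.hpp.
rar_status_name() {
    case "$1" in
        0)   echo "SUCCESS" ;;
        1)   echo "WARNING" ;;
        2)   echo "FATAL_ERROR" ;;
        3)   echo "CRC_ERROR" ;;
        4)   echo "LOCK_ERROR" ;;
        5)   echo "WRITE_ERROR" ;;
        6)   echo "OPEN_ERROR" ;;
        7)   echo "USER_ERROR" ;;
        8)   echo "MEMORY_ERROR" ;;
        9)   echo "CREATE_ERROR" ;;
        10)  echo "NO_FILES_ERROR" ;;
        255) echo "USER_BREAK" ;;
        *)   echo "UNKNOWN" ;;   # anything not in the list above
    esac
}

# Possible use in the catch-all branch (comment only, nothing runs here):
#   echo "ERROR($RETVAL, $(rar_status_name "$RETVAL")): $file"
```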

Both scripts work fine under Cygwin, but to get "unrar" for Cygwin you will have to download the UnRAR portable source from RARLAB and build it.

As a final note, here is the script to undo the name change, for cases where the script doesn't work as expected. It searches for any files ending in '.broken' and strips that suffix from the file name. With minor edits it can be used for other mass-renaming jobs. The dot in the pattern must be escaped, otherwise it matches any character. If the pattern itself contains slashes, it may be easier to use ':' as the sed delimiter instead of '/' (i.e. sed 's:\.broken$::').

#!/bin/bash

# IFS= and -r keep leading/trailing whitespace and backslashes in file names intact
find . -type f -name '*.broken' -print0 | while IFS= read -r -d '' file
do
    # skip loop if file no longer exists
    if [[ ! -f "$file" ]] ; then continue; fi
    #echo "$file"
    
    # escape the dot so only a literal ".broken" suffix is removed
    NEWF=$(echo "$file" | sed 's/\.broken$//')
    mv "$file" "$NEWF"
done
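
The suffix can also be removed without sed, using the shell's own parameter expansion. This is an alternative sketch rather than what the script above does, and strip_suffix is a hypothetical name chosen for the example. Quoting the suffix inside the expansion makes it a literal string, so no escaping is needed.

```bash
#!/bin/bash

# Hypothetical helper: remove a literal suffix from a file name.
#   $1 = file name, $2 = suffix to remove (e.g. ".broken")
# The name is printed unchanged if it does not end with the suffix.
strip_suffix() {
    result=${1%"$2"}
    printf '%s\n' "$result"
}

# Possible use inside the loop (comment only, nothing runs here):
#   NEWF=$(strip_suffix "$file" .broken)
#   mv "$file" "$NEWF"
```

The same function handles the other extensions by passing .wrongformat, .encrypted or .password as the second argument.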