Shielding your files with Reed-Solomon codes
(Based on an idea I had a decade ago)
Shield my files? Why?You know why!
Have you never lost a file because of storage media failure? That is, have a hard drive or USB stick lose a bunch of sectors (bad sectors) that simply happened to be the ones hosting parts (or all) of your file?
I have. More than once... :-)
The way storage media quality has been going in the last years, it is bound to happen to you, too. When it does, believe me, you'll start to think seriously about ways to protect your data. And you'll realize that there's quite a lot of technology to choose from...
There's no such thing as "enough protection" for your data - the more you have, the better the chances that your data will survive disasters.
What follows is a simple description of a way I use to additionally "shield" my important files, so that even if some sectors hosting them are lost, I still end up salvaging everything.
AlgorithmThe idea behind this process is error correcting codes, like for example the ubiquitous Reed-Solomon. With Reed-Solomon, parity bytes are used to protect a block of data from a specified maximum number of errors per block. In the tools described below, a block of 223 bytes is shielded with 32 bytes of parity. The original 223 bytes are then morphed into 255 "shielded" ones, and can be recovered even if 16 bytes from inside the "shielded" block turn to noise...
Storage media are of course block devices, that work or fail on 512-byte sector boundaries (for hard disks and floppies, at least - in CDs and DVDs the sector size is 2048 bytes). This is why the shielded stream must be interleaved every N bytes (that is, the encoded bytes must be placed in the shielded file at offsets 1,N,2N,...,2,2+N,etc): In this way, 512 shielded blocks pass through each sector (for 512 byte sectors), and if a sector becomes defective, only one byte is lost in each of the shielded 255-byte blocks that pass through this sector. The algorithm can handle 16 of those errors, so data will only be lost if sector i, sector i+N, sector i+2N, ... up to sector i+15N are lost! Taking into account the fact that sector errors are local events (in terms of storage space), chances are quite high that the file will be completely recovered, even if a large number of sectors (in this implementation: up to 127 consecutive ones) are lost.
I implemented this scheme back in 2000 for my diskettes (remember them?). Recently, I discovered that Debian comes with a similar utility called rsbep, which after a few modifications is perfect for providing adequate shielding to your files.
DownloadHere is the source code for my customization of rsbep, a utility that implements the kind of Reed-Solomon-based "shielding" that we talked about. The package includes 32-bit x86 assembly that makes it an order of magnitude faster than plain C ; if however you are not on a 32bit x86 platform, it will fallback to a portable C version instead (a lot slower, unfortunately). rsbep is part of dvbackup, so some Debian users might already have it installed; my version however addresses some issues toward the goal we are seeking here, which is error-resiliency for files against the common, bursty types of media errors. More information on what was changed is below.
The package is easily installed under Linux, Mac OS/X, Windows(cygwin) and Free/Net/OpenBSD, with the usual
./configure make make install
ResultsHere is a self-healing session in action:
home:/var/tmp/recovery$ ls -la total 4108 drwxr-xr-x 2 ttsiod ttsiod 4096 2008-07-30 22:21 . drwxrwxrwt 5 root root 4096 2008-07-30 22:21 .. -rw-r--r-- 1 ttsiod ttsiod 4194304 2008-07-30 22:21 data home:/var/tmp/recovery$ freeze.sh data > data.shielded home:/var/tmp/recovery$ ls -la total 9204 drwxr-xr-x 2 ttsiod ttsiod 4096 2008-07-30 22:21 . drwxrwxrwt 5 root root 4096 2008-07-30 22:21 .. -rw-r--r-- 1 ttsiod ttsiod 4194304 2008-07-30 22:21 data -rw-r--r-- 1 ttsiod ttsiod 5202000 2008-07-30 22:21 data.shielded home:/var/tmp/recovery$ melt.sh data.shielded > data2 home:/var/tmp/recovery$ md5sum data data2 9440c7d2ff545de1ff340e7a81a53efb data 9440c7d2ff545de1ff340e7a81a53efb data2 home:/var/tmp/recovery$ echo Will now create artificial corruption home:/var/tmp/recovery$ echo of 127 times 512 which is 65024 bytes home:/var/tmp/recovery$ dd if=/dev/zero of=data.shielded bs=512 \ count=127 conv=notrunc 127+0 records in 127+0 records out 65024 bytes (65 kB) copied, 0,00026734 seconds, 243 MB/s home:/var/tmp/recovery$ melt.sh data.shielded > data3 rsbep: number of corrected failures : 64764 rsbep: number of uncorrectable blocks : 0 home:/var/tmp/recovery$ md5sum data data2 data3 9440c7d2ff545de1ff340e7a81a53efb data 9440c7d2ff545de1ff340e7a81a53efb data2 9440c7d2ff545de1ff340e7a81a53efb data3For those of you that don't speak UNIX, what you see above is a simple exercise in destruction: we "shield" a file with the freeze.sh script, which is part of my package; we then melt.sh the frozen file, and verify (through md5sum) that the new generated file is exactly the same as the original one. We then proceed to deliberately destroy 64KB of the shielded file (that's a lot of consecutive sectors!), using dd to overwrite 127 sectors with zeros. We invoke melt.sh again, and we see that the new generated file (data3) has the same MD5 sum as the original one - it was recovered perfectly.
Reed-Solomon FS (a FUSE-based filesystem)Based on these tools, I did a quick implementation of a Reed-Solomon protected filesystem, using Python/FUSE bindings:
bash$ poorZFS.py -f /reed-solomoned-data /strongThis command will mount a FUSE-based filesystem in /strong (using the /reed-solomoned-data directory to store the actual files and their "shielded" versions). Any file you create in /strong, will in fact exist under /reed-solomoned-data and will also be shielded there (via freeze.sh). When opening for reading any file in /strong, data corruption is detected (via melt.sh) and in case of corruption the file will be corrected using the Reed-Solomon "shielded" version of the file (which is stored alongside the original, and named as originalFilename.frozen.RS). The .frozen.RS versions of the files are not visible in the /strong directory, and are automatically created (in /reed-solomoned-data) when a file (opened for writing or appending) is closed.
I coded this mini-fs using Python-FUSE in a couple of hours on a boring Sunday afternoon, so don't trust your bank account data with it... It's just a proof of concept (not to mention dog-slow - due to the necessary data interleaving). Still, if your machine is only equipped with one drive, this will in fact transparently shield you against bad sectors, faulty power supplies, messy IDE cabling, etc.
Note: I coded this filesystem adding 20 or so lines of Python (spawning my freeze/melt scripts) into the Python/FUSE basic example. Anyone who has ever coded a filesystem driver for Windows knows why this justifies a heart attack - FUSE (and Python/FUSE) rock!
Changeset from original rsbepIn case you are wondering why I had to modify rsbep here's where my version differs from the original...
ConclusionThese tools works fine for me, and I always use them when I backup data or move them around (e.g. from work to home). As an example, when I move my Git repository around, I always...
cd /path/to/mygit/ git gc cd .. git clone --bare mygit mygit.bare tar jcpf mygit.tar.bz mygit.bare freeze.sh mygit.tar.bz2 > /mnt/usbStick/mygit.tar.bz2.shieldedIf you so wish, feel free to add a GUI layer over them... (I am a console kind of guy - I never code GUIs unless I really have to :-)
And yes, they have already saved my data a couple of times.