Опасная зона

Wednesday, March 25, 2015

Ideas for Safe and Secure Long Term Data Storage

Over time I have gathered quite a substantial amount of data that I want to keep in a safe place, such as raw data from work, personal data, collections of rare music, etc.

Image is totally unrelated to this post and was just added for decoration purposes.

The question arises of how to store these data safely and recoverably, and how to prevent data rot (some call this bit rot, apparently a confusion of terms), which may flip single bits. A single bit flip in a .tar.gz file might be catastrophic, so some preventive strategy is needed.

In a nutshell, this is what I came up with:
  • Use multiple non-identical HDDs for backup.
  • Use BTRFS file system for primary storage, and backup. On one secondary HDD use a filesystem other than BTRFS.
  • Use LZIP for compressing your tarballs.
  • Use par2 for extra recovery possibilities of your tarballs.
  • Use rsync for copying/backup files exclusively, and always check the output by a dry run (option -n) first.
Here is the reasoning. Regarding the physical storage media: my backup media will spend most of their life in a drawer, with only a few uses per year.
  • Media such as CDs/DVDs are not reliable for more than a few years of storage, and are highly impractical to handle.
  • USB memory sticks may retain data for 5 or 10 years, but I simply would not trust my data to this medium, since I have no idea what quality the memory chips are.
  • HDDs are convenient to use, but may fail catastrophically without warning after a few years. I have never experienced this myself, but I have often had HDDs which started making weird sounds, developing bad blocks etc., sometimes after only 2 years of use.
  • SSDs have no real track record yet, and some even suggest that their life is shorter than that of HDDs. I do not really trust them.
  • Various tape drives may have a good lifetime, but I think these are clumsy to handle, and the more clumsy gadgets get, the less likely I am to use them. Also, again, I have no idea which vendor to choose, who has a proven track record, etc.
  • RAID1 (or higher) systems may give a false sense of security. Data are stored in one place, and an accidental rm -rf is still fatal.
The only viable solution I could come up with is based on the expectation that one or two points of failure will occur simultaneously, after which I still want to be able to recover my data.

Another utterly unrelated image to cheer this post up.

I decided to go with three external USB HDDs for backup/storage. Since I expect an average HDD lifetime of no more than five years, these will mirror each other. In particular:
  • One disk acts as primary storage and is accessed frequently. The two secondary HDDs are used to back up the primary disk.
  • One of the secondary HDDs is from a different vendor than the other two HDDs.
  • The two HDDs from the same vendor were not bought at the same time, in order to avoid batch errors from this particular vendor.
  • The three HDDs are not all kept in the same place, so even if a meteor strikes, a copy of the data should be safe. (OK, depending on the size of the meteor, of course.)
The next consideration is the file system. Most Linux PCs use ext4 by default, but I decided to store all my precious data on BTRFS, since this file system checksums data and metadata (switched on by default) and can therefore detect data rot. Note that it can only repair a damaged block automatically when a second, intact copy exists on the file system.
  • On my desktop PC, the partition which holds precious data is formatted BTRFS.
  • Two of the backup HDDs were formatted with the BTRFS file system: the primary HDD, and one of the two secondary HDDs.
  • One of the secondary backup HDDs is formatted ext4, just in case some grave bug in BTRFS should turn up and render my partitions unreadable.
  • All disks are encrypted using LUKS. This is probably the weakest point: if LUKS breaks for some reason (or I forget the passphrase), I will be in trouble.
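
BTRFS verifies checksums when data are read, and a disk lying in a drawer is not read at all. One way to look for rot on such a disk is an occasional scrub, which reads everything and checks it against the stored checksums. The helper below is a minimal sketch under assumptions: the function name and the mount point /mnt/backup are hypothetical, and it must run as root on an unlocked, mounted disk.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): check a mounted BTRFS disk.

# scrub_backup MOUNTPOINT: read all data and metadata and verify the checksums.
scrub_backup() {
    mnt="$1"
    # -B keeps the scrub in the foreground and prints a summary when it is done.
    btrfs scrub start -B "$mnt" || return 1
    # Show the per-device error counters (read, write, corruption, ...).
    btrfs device stats "$mnt"
}

# Example (assumed mount point):
#   scrub_backup /mnt/backup
```

On a single disk without duplicated data, a scrub can report a damaged file but usually cannot repair it; the repair then has to come from one of the other disks.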
Finally, I have added some software level redundancy:
  • Instead of compressing large archives to .tar.gz, I compressed them to .tar.lz using lzip. The gzip and xz tools can only detect errors, whereas the lzip format is designed with recovery in mind: its companion tool lziprecover can repair a small amount of damage, such as a single corrupted byte. GNU tar supports lzip directly, provided the lzip program is installed:
tar cvfa FOOBAR.tar.lz FOOBAR/
should do the trick.
  • I also keep particularly important directories in uncompressed form.
  • On top of this, I use par2 (see https://github.com/BlackIkeEagle/par2cmdline - it is also in the Debian/Ubuntu repository) with the tar.lz files, which adds 5% redundancy using Reed-Solomon error recovery. Apart from data rot, this may also catch (and recover from) errors which occurred while copying data from one disk to another. My desktop PC has no error-correcting (ECC) RAM, so in principle one bad cell can wreak havoc on my data.
  • When copying data, I mostly use rsync. Rsync always checks if the file was properly reconstructed on the receiving side. Good thing!
  • I always make a dry run first with rsync -van, and check what files will be overwritten or possibly deleted.
  • If I have copied data with tools other than rsync, I can still test whether source and copy are actually the same by running rsync with the --checksum option in combination with the -n option. This takes a very long time and is not feasible for routine use. I also use --checksum -n when I suspect problems for other reasons, which is very rare.
  • Just in case everyone forgets how LZIP, TAR and PAR2 work, I added the uncompressed tarballs to the disk.
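
The archive steps above could be wrapped roughly as follows. This is a sketch, not the exact commands used for this setup: the function names protect_archive and check_archive are hypothetical, and it assumes GNU tar, lzip and par2cmdline are installed.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative, not from the post.

# protect_archive DIR: pack DIR into DIR.tar.lz and add 5% par2 recovery data.
protect_archive() {
    dir="${1%/}"                 # strip a trailing slash, if any
    archive="$dir.tar.lz"
    # -a lets GNU tar choose lzip from the .lz suffix.
    tar -c -a -f "$archive" "$dir" || return 1
    # Test the integrity of the fresh archive before relying on it.
    lzip -t "$archive" || return 1
    # -r5 requests 5% redundancy; the recovery files land next to the archive.
    par2 create -r5 "$archive.par2" "$archive"
}

# check_archive ARCHIVE: verify against the par2 set, try a repair on damage.
check_archive() {
    par2 verify "$1.par2" || par2 repair "$1.par2"
}

# Example:
#   protect_archive FOOBAR/
#   check_archive FOOBAR.tar.lz
```

The par2 files must travel together with the archive when it is copied to the other disks, otherwise there is nothing to repair from.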
The three backup disks are arranged in a hierarchy: I make regular backups from my Desktop/Laptop to the primary backup disk, which is formatted with the BTRFS filesystem. A few times per year, I then propagate the data from this primary disk to the two secondary HDDs (one BTRFS, one ext4).
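
A propagation step with a mandatory dry run might look like the sketch below. The function name propagate and the example paths are hypothetical, and the use of --delete (so that removals on the primary disk are mirrored) is an assumption; drop it if deletions should never propagate.

```shell
#!/bin/sh
# Hypothetical helper; name and example paths are illustrative only.

# propagate SRC/ DST/: show what rsync would do, then ask before doing it.
propagate() {
    src="$1"
    dst="$2"
    # Dry run (-n): list files that would be copied, overwritten or deleted.
    rsync -van --delete "$src" "$dst" || return 1
    printf 'Apply these changes? [y/N] '
    read -r answer
    # Anything other than a plain "y" aborts without touching the target.
    [ "$answer" = "y" ] || return 1
    rsync -va --delete "$src" "$dst"
}

# Example (assumed mount points):
#   propagate /mnt/primary/ /mnt/secondary/
```

Reading the dry-run listing before answering is the point where an accidental deletion on the primary disk can still be caught.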

Safe and secure storage of your data may require proactive work from your side.
To date I do not see any viable set-and-forget solutions. (Picture entirely unrelated to this post.)

Advantages of this method are obvious:
  • All 4 HDDs (the one in the Desktop/Laptop and the three backup HDDs) must fail catastrophically at the same time before I suffer data loss.
  • Rsync should prevent propagation of errors due to bad RAM during copying.
  • Even if single bits are flipped, both par2 and lzip (via lziprecover) should be able to recover the data.
  • Together with the rsync dry runs, the delayed propagation to the secondary disks and the physically separate storage of the HDDs give me multiple points where I can detect fumble-fingered actions, like accidentally erasing important data and propagating this to the other disks.
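
The occasional --checksum -n comparison between two disks can serve as one more detection point. A minimal sketch, assuming a hypothetical function name trees_identical and that both trees were originally synchronised with rsync -a (so timestamps and permissions are expected to match):

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative, not from the post.

# trees_identical SRC/ DST/: succeed only if rsync would change nothing.
trees_identical() {
    # -a archive mode, -c compare file contents by checksum, -n dry run,
    # -i itemize every pending change; --delete also reports extra files in DST.
    diff_list=$(rsync -acni --delete "$1" "$2") || return 2
    if [ -n "$diff_list" ]; then
        printf '%s\n' "$diff_list"
        return 1
    fi
}

# Example (assumed mount points):
#   trees_identical /mnt/primary/ /mnt/secondary/ && echo "disks agree"
```

Because every file is read in full on both sides, this is slow on large disks and better suited to a rare check than to routine use.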

Hope this may inspire someone. Suggestions are more than welcome. :)