HACKER Q&A
📣 warrenm

What filesystem(s) are best suited for long-term use?


Given the need for digital preservation in anticipation of current-and-future digital archaeology, beyond mere file formats (and/or descriptions of how to convert them), what file systems are best suited to long-term digital data storage and retrieval?


  👤 lizknope Accepted Answer ✓
Do you want to put a bunch of data on a hard drive, tape, DVD, whatever, and leave it for 10 to 100 years and come back hoping it works?

I think this is a horrible strategy. Data needs to be verified periodically and migrated to new formats.

I have files that started on floppy disk in 1991. Over the years they migrated to hard drives, QIC tape, PD phase change optical discs, CD-RW, DVD+RW, and back to hard drives. Along the way the file systems have included FAT16, ext2, ISO9660, UDF, ext3, and ext4.

I strongly suggest creating a list of checksums. A simple for loop that runs md5sum, sha256sum, or similar on every file and stores the result alongside the filename is enough. You can then run it again later, compare, and see whether all of the data is still intact.
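For anyone who would rather script this than loop over sha256sum in a shell, here is a minimal Python sketch of the same idea. The function names (`sha256_of`, `build_manifest`, `compare_manifests`) are made up for illustration, and it assumes you keep the old manifest somewhere safe between runs.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(old: dict, new: dict) -> dict:
    """Report files whose digest changed, plus files that vanished or appeared."""
    return {
        "changed": sorted(p for p in old if p in new and old[p] != new[p]),
        "missing": sorted(p for p in old if p not in new),
        "added": sorted(p for p in new if p not in old),
    }
```

Build a manifest once, save it (as JSON, for example), then rebuild it on the next check and pass both to `compare_manifests`. A file in "changed" that you know you never edited is a candidate for corruption.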

Some filesystems, such as btrfs and ZFS, verify checksums every time you read the data.

Personally, I use ext4 and run cshatag on all files every 6 months. It stores a timestamp and a sha256 checksum as ext4 extended attribute metadata. If the file contents have changed but the timestamp has not, it reports corruption.

https://github.com/rfjakob/cshatag
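The rule described above (contents changed, timestamp unchanged, so report corruption) can be sketched in a few lines of Python. This is not cshatag's code: the `Record` type, the `classify` function, and the status strings are hypothetical, and the stored record is a plain object here rather than an extended attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    mtime_ns: int  # modification time observed when the checksum was taken
    sha256: str    # hex digest of the file contents at that time


def classify(stored: Optional[Record], current: Record) -> str:
    """Compare the stored record with a freshly computed one."""
    if stored is None:
        return "new"        # never checked before: store the current record
    if stored.sha256 == current.sha256:
        return "ok"         # contents are unchanged
    if stored.mtime_ns != current.mtime_ns:
        return "outdated"   # contents and timestamp both changed: a normal edit
    return "corrupt"        # contents changed behind an unchanged timestamp
```

The point of keeping the timestamp is to tell a deliberate edit apart from silent damage: a legitimate write updates the modification time, bit rot does not.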

You can also create parity data with Parchive. It writes "sidecar" files that have the same file name as the main file plus a suffix. If there are errors in the file, you can use Parchive with the parity data to reconstruct it. You can also adjust how much parity data to create (more parity takes more space).

https://en.wikipedia.org/wiki/Parchive
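To show why parity data lets you rebuild a damaged file, here is a toy Python sketch using a single XOR parity block. This is a deliberately simpler technique than what Parchive does (it uses Reed-Solomon codes and can repair more than one damaged block); the function names are made up, and the sketch assumes you already know which block is bad.

```python
from typing import List, Optional


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    return [
        data[i:i + block_size].ljust(block_size, b"\0")
        for i in range(0, len(data), block_size)
    ]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def make_parity(blocks: List[bytes]) -> bytes:
    """The parity block is simply the XOR of all data blocks."""
    return xor_blocks(blocks)


def recover_block(blocks: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single block marked None from the survivors and the parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    return xor_blocks(survivors + [parity])
```

One parity block costs one block of extra space and survives the loss of one data block. Tolerating more damage takes more parity, which is the space trade-off mentioned above.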


👤 fatfreddie
What are you trying to achieve? A file system, or any system for that matter, is only good for as long as there are drivers to decode the format in which it is written. My concern would be to store whatever data you have in the simplest format you can think of and to document it thoroughly. I don't think any file system is trying to solve that problem. The things that spring to mind are cpio, tar, zip, torrent, zsync, etc.

👤 PaulHoule
Do you expect to keep the data on an active array (so that you can scrub it periodically, as ZFS would), or are you keeping it on some write-once medium (optical disks)?

👤 thesuperbigfrog
Please define "long-term".

5 years?

10 years?

50 years?

How many years do you mean by "long-term"?


👤 aborsy
ZFS!