HACKER Q&A
📣 packetlost

How do you choose a checksum algorithm for serialized datastructures?


I'm building a library that serializes blocks of data to disk in increments of roughly 4 MB. How many bits should I allocate for a per-block checksum (e.g. CRC32, CRC64, MD5) so that corruption, torn writes, and similar failures can be detected?


  👤 accrual Accepted Answer ✓
I would use a member of the SHA-2 family such as SHA-256 or SHA-512. As far as I know, MD5 and SHA-1 are considered broken from a security standpoint.
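
For what it's worth, here is a rough sketch of per-block hashing with SHA-256 using Python's standard `hashlib`. The record layout (32-byte digest followed by the payload) and the names `seal_block` and `open_block` are made up for illustration and are not from any particular library, so treat it as a starting point only.

```python
import hashlib

# SHA-256 produces a 32-byte digest.
DIGEST_SIZE = hashlib.sha256().digest_size


def seal_block(payload: bytes) -> bytes:
    """Return the payload prefixed with its SHA-256 digest (hypothetical layout)."""
    return hashlib.sha256(payload).digest() + payload


def open_block(record: bytes) -> bytes:
    """Return the payload if the stored digest matches, else raise ValueError."""
    if len(record) < DIGEST_SIZE:
        raise ValueError("record is shorter than a digest; possibly a torn write")
    stored, payload = record[:DIGEST_SIZE], record[DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != stored:
        raise ValueError("checksum mismatch; block is corrupt or truncated")
    return payload
```

You would write `seal_block(data)` to disk and call `open_block(...)` on whatever you read back. A flipped bit or a record cut short by a torn write changes the computed digest, so the read fails loudly instead of returning bad data.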

I think more information is needed. If you're storing files, PAR2 can help. If you're hashing passwords, look into bcrypt and scrypt.

Securing the database against bitrot, etc. would be another question entirely.


👤 codetrotter
You could also delegate this task to ZFS, which checksums the blocks it stores and verifies them on read.

https://en.m.wikipedia.org/wiki/ZFS

ZFS is a file system that is available on FreeBSD, Linux, and more!