HACKER Q&A
📣 j1elo

ZFS without ECC RAM, how bad is it really?


There are lots of common misconceptions around ZFS, some of which I've been able to clear up while learning to build a Raspberry Pi-like NAS box (a ThinkCentre M910 Tiny). For example, a popular myth is that ZFS eats GBs of RAM to work properly, but it turns out that's only true if deduplication is enabled.

But what about not using ECC RAM (which my box doesn't support)?

The most prominent writeup on this topic is a TrueNAS forums post called "ECC vs non-ECC RAM and ZFS" [1]. The author claims that bad non-ECC RAM will cause an explosion of corrupted files in ZFS systems, much, much worse than what could happen on non-ZFS ones.

The post is 8 years old, but the problem still seems relevant today, as just 20 days ago I saw an HN user commenting about how they suffered a similar issue [2].

On the other hand, I've seen comments pointing out that ZFS without ECC RAM is not as bad as has been implied [3]. One commenter links to a couple of places on HN ([4], [5]) that debunk the logic behind the corruption scenario given in [1].

I guess a fair question would be: what do you want from your NAS? What I want is a relatively unattended NAS that I can use for weeks or months as my computer's backup, without it silently corrupting my backup files. If a bad non-ECC RAM stick can cause whole file system corruption during a ZFS scrub, I might prefer running Ext4 with something like hashdeep [6], so I can detect the issue, replace the bad RAM, and upload fresh backups to the box.
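
To make the Ext4-plus-checksums idea concrete, here is a minimal Python sketch of that kind of check: record a hash for every file once, then re-hash later and report what changed or disappeared. It only illustrates the general approach and is not how hashdeep itself is implemented; the function names (file_sha256, build_manifest, verify_manifest) and the choice of SHA-256 are my own assumptions.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify_manifest(root, manifest):
    """Re-hash the files listed in the manifest.

    Returns two sorted lists of relative paths: (changed, missing).
    Files added after the manifest was built are not reported.
    """
    changed, missing = [], []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            missing.append(rel)
        elif file_sha256(full) != expected:
            changed.append(rel)
    return sorted(changed), sorted(missing)
```

In practice the manifest would presumably be saved somewhere off the NAS (for example as JSON on the machine that produced the backups) and verify_manifest run on a schedule. Note that a mismatch only says that the stored bytes and the recorded hash disagree; if the RAM is bad, the hashing run itself can be affected too, so a failure is a cue to test the memory rather than proof of which copy is wrong.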

Comments and anecdotes are appreciated!

[1]: https://www.truenas.com/community/threads/ecc-vs-non-ecc-ram-and-zfs.15449/ (Under "What happens when non-ECC RAM goes bad in a ZFS system?")

[2]: https://news.ycombinator.com/item?id=29492980

[3]: https://news.ycombinator.com/item?id=27553356

[4]: https://news.ycombinator.com/item?id=8293025

[5]: https://news.ycombinator.com/item?id=14207520

[6]: http://md5deep.sourceforge.net/


  👤 j1elo Accepted Answer ✓
Links from the question: (I didn't realize they wouldn't render as clickable links in the question text)

[1]: https://www.truenas.com/community/threads/ecc-vs-non-ecc-ram... (Under "What happens when non-ECC RAM goes bad in a ZFS system?")

[2]: https://news.ycombinator.com/item?id=29492980

[3]: https://news.ycombinator.com/item?id=27553356

[4]: https://news.ycombinator.com/item?id=8293025

[5]: https://news.ycombinator.com/item?id=14207520

[6]: http://md5deep.sourceforge.net/


👤 wmf
The example from [1] is misleading because it assumes that multiple ECC errors will hit the same block, which is unlikely in practice. It also assumes that the alternative to ZFS is hardware RAID, which might have applied to servers ten years ago but is not the case now.