HACKER Q&A
📣 ghoward

Why don't file systems and OS's provide file system transactions?


This is inspired by [1].

Quite frankly, I don't know why filesystems don't provide these things.

I have read that Windows has a transactional API, but they've actually deprecated it! [2] They say it's because few programs use it.

I mean, sure, that might be true, but I bet it's really important for those programs that do use it.

Bonus question: why does Windows hide its equivalent of `openat()` in the NT API? [3] Rust code seems to claim it's fundamental to the NT kernel [4], so why is it not exposed?

[1]: https://news.ycombinator.com/item?id=32190032

[2]: https://docs.microsoft.com/en-us/windows/win32/fileio/deprec...

[3]: https://docs.microsoft.com/en-us/windows/win32/api/winternl/...

[4]: https://github.com/rust-lang/rust/blob/1c63ec48b8cbf553d291a...


  👤 chrsig Accepted Answer ✓
i don't think it's enough to just say "provide transactions" -- that's way too general.

what sort of semantics would you want out of the transaction? atomicity? isolation? durability? how should concurrency behave? should there be a mvcc implementation?

linux provides atomic writes up to 4k. moves on the same fs are also atomic.

fsync ensures writes are durable and written to disk (allegedly[0])
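A minimal sketch of how those two primitives (fsync plus a same-filesystem rename) are commonly combined to get an all-or-nothing file replacement. The helper name `atomic_write` is made up for illustration, and the directory fsync at the end assumes a POSIX system:

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so readers see the old or the new contents, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push file contents to disk
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        # Best-effort cleanup of the temp file if anything above failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This only covers one file at a time, and the new file gets the default `mkstemp` permissions rather than those of the file it replaces. It gives atomicity and (fsync permitting) durability, but no isolation between concurrent writers.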

file advisory locks can be used to ensure mutual exclusion. another option is memory-mapping a file as shared memory and allocating a mutex in it (libapr provides a few options for interprocess mutual exclusion)
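For the advisory-lock route, a rough sketch using `flock` through Python's `fcntl` module (Unix only). The `exclusive_lock` helper and the lock-file path are hypothetical, and the lock only works if every participating process agrees to take it:

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path: str, blocking: bool = True):
    """Hold an exclusive advisory lock on `lock_path` for the duration of the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # In non-blocking mode this raises BlockingIOError if the lock is held.
        fcntl.flock(fd, flags)
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Usage would look like `with exclusive_lock("/path/to/app.lock"): ...` around the critical section. A process that never takes the lock is not stopped from touching the files, which is exactly the "advisory" part.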

...but in reality, if you need transactional semantics, you're really just better off using a database. because the database developers will have a much better idea of the nuances that applications need from transactional semantics than the kernel devs will.

and if you want your program that requires transactional semantics to be portable, major database vendors have already dealt with inconsistencies across multiple major operating systems. because of that the database gives one system to handle transactions, versus pushing the portability concerns onto each individual application.

[0] https://news.ycombinator.com/item?id=19119991


👤 dataflow
> I have read that Windows has a transactional API, but they've actually deprecated it! [2]

IMHO you can probably ignore their deprecation and keep using it. The set of things MS deprecates and the set of things they actually remove from the OS are quite different. IIRC their own components depend on FS transactions and I don't see them rewriting their own components anytime soon. However, note that even without deprecation, transactions can fail for a variety of reasons (not just conflicts), so you'll need fallbacks anyway.

> why does Windows hide its equivalent of `openat()` in the NT API?

I don't know for certain but I've always imagined it's because (a) Win32 programmers (or for that matter, most programmers) are used to the path-based API, and (b) it would be much slower to perform manual traversal level-by-level, and (c) I think in practice there aren't that many common scenarios where the race condition can realistically turn into a security vulnerability.


👤 codeflo
As a general principle, I think low-level APIs shouldn’t provide abstractions that are both expensive and have no clear “best” design: in those cases, you want your applications to have the ability to make different trade-offs, rather than being locked in to a design that might be suboptimal for your use case. I think sibling comments explain nicely how that’s the case for transactions; I just wanted to point out why that matters in the big picture. Similar arguments can be made for other higher-level features that someone might wish their OS provided, like (tracing) GC.

👤 i_have_to_speak
> Quite frankly, I don't know why filesystems don't provide these things.

They do. F2FS [1] does. There were attempts to add them to xfs/ext4 too, but they petered out, probably because of lack of interest.

[1] https://www.kernel.org/doc/html/latest/filesystems/f2fs.html


👤 vivegi
That is because almost all applications use filesystem calls (directly or indirectly), but only some apps may need a transactional API.

Consider a process P1 that uses a fictional transactional API in an OS, accesses the path `/a/b/c`, and creates some files under directory `c`.

Consider another process P2 executing a `mv /a/b /x`.

P1 uses the transaction API, but P2 does not. So, under the covers, all system calls would have to go through the new transactional machinery to ensure global correctness. That is asking a lot of the kernel and would possibly make a lot of legacy programs slower.

The other question to answer is: do we want ACID properties or eventual consistency to be guaranteed by the transaction? What to do when some processes want ACID guarantees and some processes are okay with eventual consistency? How does the kernel handle these processes running concurrently and contending for the same resources under two different transaction semantics?

These are some of the reasons why the transaction management is better handled in userspace.


👤 rvdginste
I have wondered about that too. But when I think about a file system transaction, I immediately think about a file system transaction enlisting in an ambient transaction, together with a database transaction. This would make code much more simple/clean for cases where you must create a file and store metadata on the file inside the database. And on the other hand, some database systems do provide features for storing large file-like blob objects, which give you these transactional features.

So I think it depends on the context and what you want to use it for. I don't see the transactional features of a file system as useful for actual users who interact directly with files on their file system. It seems more useful in the context of applications that maintain files and where you likely do not want the user to directly interact with those files.


👤 marcell
Move in Unix is atomic, which handles a lot of the common use cases for a transactional file system.

👤 zamalek
It's trivial to implement with CAS (e.g. Git is a transactional file system). That's a lot of code/time/money/attack surface to spend on the kernel when it is so easy to do in userspace.
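To make that concrete, here is a toy sketch, reading CAS as content-addressable storage (as in Git). It is not how Git is actually implemented; `TinyStore`, its on-disk layout, and the manifest format are all invented for illustration. Blobs are immutable and named by their hash, and a "commit" is a single atomic rename of a `HEAD` file:

```python
import hashlib
import os
import tempfile


class TinyStore:
    """Toy content-addressed store: immutable objects plus one atomically swapped HEAD."""

    def __init__(self, root: str) -> None:
        self.objects = os.path.join(root, "objects")
        self.head = os.path.join(root, "HEAD")
        os.makedirs(self.objects, exist_ok=True)

    def _write_atomic(self, path: str, data: bytes) -> None:
        # Write to a temp file in the same directory, then rename into place.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    def put(self, data: bytes) -> str:
        """Store a blob under its SHA-256 digest; existing blobs are never modified."""
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.objects, digest)
        if not os.path.exists(path):
            self._write_atomic(path, data)
        return digest

    def get(self, digest: str) -> bytes:
        with open(os.path.join(self.objects, digest), "rb") as f:
            return f.read()

    def commit(self, files: dict[str, bytes]) -> str:
        """Publish a whole set of files at once by swapping HEAD to a new manifest."""
        manifest = "\n".join(
            f"{self.put(data)} {name}" for name, data in sorted(files.items())
        ).encode()
        digest = self.put(manifest)
        self._write_atomic(self.head, digest.encode())  # the commit point
        return digest

    def snapshot(self) -> dict[str, bytes]:
        """Read the currently committed state."""
        with open(self.head, "rb") as f:
            manifest = self.get(f.read().decode()).decode()
        result = {}
        for line in manifest.splitlines():
            digest, name = line.split(" ", 1)
            result[name] = self.get(digest)
        return result
```

Readers see either the previous manifest or the new one, never a half-written set of files. What this sketch leaves out is conflict detection: two concurrent `commit` calls simply race, and the last rename wins.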

👤 bhawks
Transactions are for databases; databases are for sets of one or more processes that are coupled to, permissioned for, and willing to cooperate with each other.

Transactional operations bring the chance of deadlocks. Deadlocks have performance and denial-of-service implications. Deadlocks are far easier to detect than to prevent or avoid. Detected deadlocks are resolved by killing one of the requests, which must be handled by the cooperating and highly coupled processes.
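As a rough illustration of the "detect" option, one textbook approach is to keep a wait-for graph (who is blocked on whom) and look for a cycle. The function name and the graph representation below are assumptions made for the sketch, not any particular database's or kernel's implementation:

```python
from typing import Optional


def find_deadlock(waits_for: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one cycle in the wait-for graph as a list of transaction ids, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: dict[str, int] = {}
    stack: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(waits_for.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so we closed a cycle.
                return stack[stack.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(waits_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Whoever runs the detector then has to pick a victim from the returned cycle and abort it, and the aborted party has to be written to cope with that. Arbitrary programs sharing a filesystem generally are not.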

The filesystem is an abstraction of convenience and very loose rules. Instead of all the structure and rigor a database brings, a program just gives a string to the OS and gets bytes back. The cost of this ease of use is that you must keep your program's demands and expectations low.


👤 wmf
I read that the Tandem NonStop OS from the 1980s had a built-in transaction manager (because the OS was designed to run databases) and they built their filesystem on top of the transaction manager which gave them filesystem transactions for "free".

👤 TheAceOfHearts
I think macOS at some point supported file system transactions as well. If you look through the AppleScript docs there used to be references to file system transactions, but it appeared incomplete and undocumented.

Would love to hear more details if anyone is knowledgeable of this arcane history.


👤 jbverschoor
I guess the number of comments explains it.

👤 nobozo
Mike Stonebraker and others are working on an OS that is based on a database.

Take a look at https://vldb.org/pvldb/vol15/p21-skiadopoulos.pdf


👤 jasfi
Concepts like transactions are often about trade-offs between performance and features (which introduce complexity). It's likely that the OS architects realized that if devs wanted transactions they'd use a database.

👤 dmpk2k
ZFS does, although to fully exploit it you'll need to make DMU calls.