HACKER Q&A
📣 brundolf

Why is the stack so constrained?


You learn fairly early on that the call stack is small and everything on it must have its size known ahead of time by the compiler. Anything large and/or dynamically-sized goes on the heap. This is just the way it is.

But I was thinking about it: it's all just memory; why can't the stack expand to fit the data (assuming that data doesn't need to be returned outside of the current stack frame)? And perhaps even more strange, why can't more memory be requested for dynamically-sized values as needed? Why does everything on the stack need to have a statically-known size?

And then I found this C function that seems to do just this (but is very rarely used): https://linux.die.net/man/3/alloca
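To make that concrete, here is a minimal sketch of alloca in use (the helper name and sizes are made up for illustration, not from the man page). The buffer is carved out of the current stack frame and vanishes automatically when the function returns, which is also why the result has to be copied out before returning:

```c
#include <alloca.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats into a scratch buffer allocated on the
   stack with alloca(). The alloca'd memory lives only until this
   function returns, so the result is copied into caller storage. */
void format_greeting(const char *name, char *out, size_t outlen) {
    size_t n = strlen(name) + 8;   /* room for "Hello, " plus NUL */
    char *buf = alloca(n);         /* grows the current stack frame */
    snprintf(buf, n, "Hello, %s", name);
    strncpy(out, buf, outlen - 1);
    out[outlen - 1] = '\0';
}
```

Note that alloca has no failure reporting: if the allocation overflows the stack, the behavior is undefined, which is part of why it is so rarely used.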

The best explanation I could gather from reading different discussions on the internet is:

1) You don't put large amounts of data on the stack because the stack is small

2) The stack is small because memory address space in 32-bit systems was really limited so you didn't want to eat it up reserving stack space for each thread

3) Stack frame sizes are known at compile time because... that's the way C did it? Maybe it allows for some compiler optimizations? Or something?

Am I missing something, or is this a vestigial idea? Wouldn't cache usage be way better if large/dynamic data structures could go on the stack? Has anyone explored this?


  👤 qlk1123 Accepted Answer ✓
While your question is pretty general, your perspective is quite limited to the C language. Other languages, such as Go, support dynamically sized stack allocation. (In fact, in Go the user doesn't even need to care whether the stack or the heap is used.)

Take Linux, a giant C codebase, as an example. Allocating a large chunk of memory on the stack is just not useful, even given an unlimited amount of memory. You note the most critical reason yourself in the post: "assuming that data doesn't need to be returned outside of the current stack frame". However, large data structures are almost always for others to use. Just consider sys_clone (where the kernel eventually sets up a body for the new thread) or sys_mmap (where the kernel manipulates existing virtual memory area structures from elsewhere). Allocating them on the stack would be pointless.


👤 maltalex
What you may be missing is that the stack is a lot faster than the heap.

First, stack allocations and de-allocations are extremely cheap. Allocating a known amount of stack memory requires nothing more than moving a pointer. The same goes for de-allocation.
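A toy sketch of why that's cheap (this is a hand-rolled bump allocator imitating what the compiler does with the real stack pointer register, not actual stack manipulation):

```c
#include <stddef.h>

/* Toy model of a stack: allocation is one pointer bump, deallocation
   is one subtraction. A compiler does the same thing by adjusting the
   stack pointer register in the function prologue/epilogue. */
static unsigned char arena[1 << 16];
static size_t top = 0;

void *push(size_t n) { void *p = &arena[top]; top += n; return p; }
void  pop(size_t n)  { top -= n; }
```

Compare that with malloc, which has to search free lists, split or coalesce blocks, and worry about thread safety.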

Second, accessing stack memory is typically a LOT faster than heap memory since it's contiguous and can be loaded in big chunks into the CPU's cache.


👤 bjourne
Large stack allocations are discouraged on modern CPUs because call stacks are placed in relatively small stack segments. If these overflow you get stack overflows. You can use the ulimit -s command (limit in csh) to see the default size of the stack segment on your machine. On mine it is eight megabytes which is almost nothing. :)
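You can also query the same limit from inside a program with POSIX getrlimit (the helper name below is made up; getrlimit and RLIMIT_STACK are the real API):

```c
#include <sys/resource.h>

/* Returns the soft stack-size limit in bytes (the number `ulimit -s`
   reports in kilobytes), or -1 on error. Often 8 MiB on Linux. */
long long stack_limit_bytes(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    return (long long)rl.rlim_cur;
}
```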

In "normal" C code, the layout of all call frames (stack frames) is fixed. The location of all local variables relative to the top of the stack is known at compile-time. However, in functions that use alloca, or Variable Length Arrays (VLAs), the layout of the call frame is potentially different in each function call. Therefore, the compiler has to insert code to keep track of where all local variables are and it also makes some compiler optimizations harder.

Look at the machine code generated for the following function using godbolt.org. It is way more complicated than if a and b's sizes had been compile-time constants:

    void fun(int x, int y) {
        int a[x], b[y];
        b[5] = 10;
    }
Dynamically sized call frames are a PITA to implement correctly and efficiently, which I suspect is why alloca and VLAs are poorly supported in many compilers.

There's some good discussion here about VLAs and alloca: https://stackoverflow.com/questions/1018853/why-is-the-use-o... Although be aware that some of the answers are wrong so read the comments too.

Even with good support for VLAs/alloca, large stack allocations violate the assumption that the stack segment is large enough. You may be a tidy coder, so you check whether your mallocs return NULL, but I bet you have never ever checked whether you have enough stack space for a function call. :) Most of the time this works fine because relatively little data is placed on the stack (lots of recursive tree-traversal code, however, breaks badly).

To make large stack allocations safe, you would have to keep track of how much stack space you have left and grow the stack segment if you are nearing exhaustion. For example, if you only have 100 bytes left but want to create a 200 byte local variable, you would allocate a new stack segment, perhaps twice as large as the old one, copy the old stack to the new segment and continue from there. Unfortunately, this would be a major performance drag since you would need to check that you have enough stack space for every dynamic stack allocation and for every call frame.
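The grow-and-copy scheme above can be sketched like this (all the names are hypothetical, and a real runtime like Go's would also have to fix up any pointers into the old stack after the copy, which this sketch ignores):

```c
#include <stdlib.h>

/* Toy growable stack: base..limit is the segment, sp is the top. */
struct gstack { unsigned char *base, *sp, *limit; };

/* Grow the segment (at least doubling) and copy the old contents.
   realloc does the copy; real runtimes must also rewrite pointers
   that pointed into the old segment. */
static void grow(struct gstack *s, size_t need) {
    size_t used = (size_t)(s->sp - s->base);
    size_t cap  = (size_t)(s->limit - s->base);
    size_t newcap = cap * 2;
    while (newcap - used < need)
        newcap *= 2;
    unsigned char *nb = realloc(s->base, newcap);
    s->base = nb;
    s->sp = nb + used;
    s->limit = nb + newcap;
}

unsigned char *gstack_alloc(struct gstack *s, size_t n) {
    if ((size_t)(s->limit - s->sp) < n)  /* the check every allocation pays */
        grow(s, n);
    unsigned char *p = s->sp;
    s->sp += n;
    return p;
}
```

That if-check on every allocation (and, in a segmented-stack design, on every function call) is exactly the performance drag described above.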

Some virtual machines use guard pages to check for stack overflows. But the amount of overflow they can detect is limited to the size of the guard page. For example, a 64 kB guard page is too small to catch an overflow caused by a 100 kB stack allocation, which can skip straight past the guard into valid memory. So you don't get around the need for explicit checks.