I'd even broaden the question and ask: how did Windows 95 stay so small?
Is it possible to recreate this level of efficiency on modern systems? I'm curious because I'm interested in creating simulation video games. Dwarf Fortress and RimWorld eventually both suffer from the same problem: CPU death.
If I create a window with C++ and SFML, 60 MB of RAM is used (not impressive at all). If I put 3,000,000 tiles on the screen (using Vertex Arrays), 1 GB of RAM is used (admittedly, that is impressive) and I can pan all the tiles around smoothly.
What other tricks are available?
We thought about things in terms of how many instructions per pixel per frame we could afford to spend. Before the 90s it was hard to even update all pixels on a 320x200x8bit (i.e. mode 13h) display at 30 fps. So you had to do stuff like only redraw the part of the screen that moved. That led to games like Donkey Kong where there was a static world and only a few elements updated.
In the 90s we got to the point where you had a Pentium processor at 66 MHz (woo!). At that point, 66 MHz / 320 (width) / 200 (height) / 30 (fps) gave you 34 clocks per pixel. 34 clocks was way more than needed for 2D bitblt (e.g. memcpy'ing each line of a sprite), so we could move beyond 2D Mario-like games to 3D ones.
With 34 clocks, you could write a texture mapper (in assembly) that was around 10-15 clocks per pixel (if memory serves) and have a few cycles left over for everything else. You also had to keep overdraw low (meaning, each part of the screen was only drawn once or maybe two times). With those techniques, you could make a game where the graphics were 3D and redrawn from scratch every frame.
The other big challenge was that floating point was slow back then (and certain processors did or didn't have floating-point coprocessors, etc.) so we used a lot of fixed point math and approximations. The hard part was dividing, which is required for perspective calculations in a 3D game, but was super slow and not amenable to fixed-point techniques. A single divide per pixel would blow your entire clock budget! "Perspective correct" texture mappers were not common in the 90s, and games like Descent that relied on them used lots of approximations to make it fast enough.
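To make the fixed-point idea concrete, here's a minimal sketch (not code from any particular game, all names made up): a 16.16 fixed-point multiply, plus the classic trick of doing the true perspective divide only once every SPAN pixels and interpolating linearly in between.

```cpp
#include <algorithm>
#include <cstdint>

// 16.16 fixed point: high 16 bits are the integer part, low 16 the fraction.
using fixed = std::int32_t;

inline fixed to_fixed(float f) { return static_cast<fixed>(f * 65536.0f); }

// Fixed-point multiply: widen to 64 bits so the product doesn't overflow,
// then shift back down. Adds and subtracts are plain integer ops.
inline fixed fix_mul(fixed a, fixed b) {
    return static_cast<fixed>((static_cast<std::int64_t>(a) * b) >> 16);
}

// Hypothetical scanline loop: instead of one perspective divide per pixel,
// divide only at SPAN-pixel boundaries and linearly interpolate u between them.
constexpr int SPAN = 16;

void draw_scanline(std::uint8_t* dest, int width,
                   float u_over_z, float du_over_z,
                   float inv_z,    float dinv_z,
                   const std::uint8_t* texture)
{
    float u_prev = u_over_z / inv_z;            // one divide at the left edge
    int x = 0;
    while (x < width) {
        int n = std::min(SPAN, width - x);
        u_over_z += n * du_over_z;
        inv_z    += n * dinv_z;
        float u_next = u_over_z / inv_z;        // one divide per SPAN pixels
        fixed u  = to_fixed(u_prev);
        fixed du = to_fixed((u_next - u_prev) / n);
        for (int i = 0; i < n; ++i) {
            dest[x + i] = texture[(u >> 16) & 255];  // 1D texture, for brevity
            u += du;
        }
        u_prev = u_next;
        x += n;
    }
}
```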
How is it so small? No external dependencies (it uses stock Haiku packages), it uses the standard C++ system API, and it was written by a developer who learned their trade on constrained systems from the '80s. Look at the old Amiga stuff from that era.
Lower-res samples, if any.
Lower framerate expectations - try playing Descent 2 on an emulated computer with similar specs to the lowest specs suggested on the box. Even one in the middle of the spec range probably didn't get a constant 60fps.
More hand-tuned assembly (RCT was famously 99% assembly, according to Wikipedia; this was starting to be unusual, but people who'd been in the industry a while probably did at least one game in 100% assembly, and would have been pretty comfortable with hand-optimizing stuff as needed).
Simpler worlds, with simpler AI. Victory conditions designed to be reached before the number of entities in the game overwhelmed the CPU completely.
Simpler models. Most 3d games you play now probably have more polygons in your character's weapon than Descent would have in the entire level and all the active entities; they certainly have more polys in the overall character.
I mean, really, three million tiles? That's insane by the standards of that day. It's a bit more than 1700x1700 tiles; a quick search tells me the maximum map size in RollerCoaster Tycoon 3 was 256x256, and it's a fairly safe assumption that RCT1 was, at best, the same size, if not smaller. I can't find anything similar for SimCity 2000, but I would be surprised if it was much bigger.
Some games nowadays are built using Electron, which means they include a full web browser which will then run the game logic in JavaScript. That alone can cause +1000% CPU usage.
Unity (e.g. RimWorld) wastes quite a lot of CPU cycles on things that you'll probably never use or need, but since it's a one-size-fits-all solution, they need to include everything.
For Unreal Engine, advanced studios will actually configure compile-time flags to remove features that they don't need, and it's C++ and in general well designed, so that one can become quite efficient if you use it correctly.
And then there are script marketplaces. They will save you a lot of time getting your game ready for release quickly, but the scripts are usually coded by motivated amateurs and are super inefficient. But if CPUs are generally fast enough and the budgets for game developers are generally low, many people will trade lower performance at release for a faster time to market.
=> Modern games are slow because it's cheaper and more convenient that way.
But there still are tech demos where people push efficiency to the limit. pouet.net comes to mind. And of course the UE5 demo, which runs on an AMD GPU and an 8-core AMD CPU.
Now with CSS frameworks, JS frameworks, ad frameworks and cross-linking, pages take forever to load with even less actual content.
Old-school games would have been forced to redesign the game so that this wasn't a problem they faced. For example, none of the games you list try to simulate a deep 3D space with complex pathfinding. Warcraft II I know quite well, and the maps are small.
One of the reasons systems used to be highly efficient was evolutionary - it was impossible to make resource-heavy games, so people kept iterating until they found good concepts that didn't require polynomial scaling algorithms with moderate-large N inputs.
Naughty Dog Co-founder Andy Gavin discusses various hacks that were used on the Playstation to get Crash Bandicoot to run smoothly. The fuller version is also worth watching.
Most people seem to think programmers of yore were smarter, and generally that's probably true, on average. I mean, there weren't boot camps back then.
That aside though, the scene has changed. Write your super efficient, perfect game, and nobody will play it. Look at Ludum Dare and all the JS13k-style competitions. Those smart people still exist today.
But the landscape has changed so much, that consumers want the pretty graphics, heavy resources, social aspects. They don't care if it's 1kb or 1tb.
In short, people of yesterday were resource constrained so had to write smart hacks. People today have many more resources available, and use those. Both are using every bit of what they have available.
For enemy AI, OK: we made progress. But for human-vs-human gameplay, we basically had everything in the nineties.
Warcraft II over Kali (which simulated a LAN over the Internet, since Warcraft II didn't have battle.net) was basically the gameplay of Warcraft III Reforged. Half-Life and then its Counter-Strike mod were basically the FPS of today.
Diablo II (development for Windows 95 started in 1997 and the game came out in 2000, so not quite the nineties, but nearly) was basically perfect from a collaborative gameplay point of view and was the perfect "hack 'n' slash" game.
It was so perfect that a few weeks ago Blizzard released the exact same game, with identical gameplay, even down to most bugs (some were fixed, like the eth armor upgrade bug). It's the same game, but with modern-looking graphics.
Games were so good that more than two decades later you can re-release the exact same gameplay with updated graphics.
That's how perfect things already were.
So very hard disagree that "games had simpler gameplay".
Many game genres back then had already reached perfection from a gameplay point of view.
Since then we added shiny.
As part of a Kickstarter campaign, Morphcat Games made this video explaining how they eked out a really incredible game with only 40 KB (a lot like Super Mario Bros 2). I definitely recommend checking it out, as they go over interesting compression methods and more general thought processes.
People bought bleeding-edge games as conspicuous consumption. Everyone else played them years later. People like me would stay late and play Descent, Quake, Warcraft, etc. (RCT was a trailing-edge game) on our work computers, because we certainly didn't have those kinds of specs at home until much later.
Reading about the tricks that were used to make the most out of the limited resources is fascinating to me as well! We should not take our current hardware for granted!
I'm sure RimWorld can be made more efficient; but it actually runs fairly well on my cheap laptop. There is, essentially, no real need. And any time spent on making the game run faster is taken away from all other development tasks (fixing bugs, adding features, etc.)
https://www.lexaloffle.com/pico-8.php
Also JS13k
I know that's not the same but the concepts are similar. The core of most of those games is only a few k, everything else is graphics and/or interfacing with modern OSes.
Older systems had much lower resolutions, and they also used languages that were arguably harder to use. In the 80s it was BASIC (no functions, only global variables and "GOSUB"). Variable names were generally 1 or 2 letters, making them harder to read. Or they used assembly.
In the 90s (or late 80s), C started doing more, but still, most games ran at 320x240, sprites were small, many games used tiles, and much of the hardware only supported tiles and sprites (NES, Sega Master System, SNES, Genesis). It wasn't really until the 3DO/PS1 that consoles had non-tiled graphics. The PC and Amiga always had bitmapped graphics, but on the Atari 800 and C64 the majority of games used the hardware tile modes.
Funny you should mention Rollercoaster Tycoon because it was actually written in Assembly for performance reasons.
The public was accustomed to software being nimble and running fast. If you had given a typical Electron app to the '90s public, they would have been displeased.
Hardware limits and developers being more thoughtful and less lazy are pretty much the answer.
In the 90s most software was C/C++ running directly on the hardware. Now we have layers upon layers of abstractions: VMs, Kubernetes, Docker, JITted languages, OOP codebases written to Uncle Bob's preachings and GoF patterns. And a JITted language is the best-case scenario; a lot of software is written in interpreted languages.
If anyone cares to make a trivial benchmark, it would be telling: write a simple C program and time it. Write the same in Java or .NET, but use a class for everything and introduce some abstractions like builders, factories, etc. Run the program in a container in a virtualized Kubernetes cluster. Time it.
Dwarf Fortress and RimWorld are both interesting topics on their own though, and while I'm dreadfully underfamiliar with their codebases, I do love the games to death. I'd guess that if you profiled them, the heaviest slowdown would be accessing/manipulating memory with hundreds of values across thousands of instances. Both games are extremely heavy-handed in their approach to simulation, so it wouldn't surprise me if that constituted the bulk of their usage. Dwarf Fortress itself is an interesting case study, because its world generation can take hours. As more years pass, it takes longer to generate, which is probably a testament to how many interconnected pieces it's weaving.
Most games these days weigh so much due to high-res assets. 4K textures weigh a lot; that's why a game with a single map, like COD: Warzone, takes 200 GB -.-
As we move further into realistic graphics, I think this trend will reverse, since this can already be simulated with ML in real time. So you'll get away with a custom ML compressor and low-poly assets that are extrapolated at run time.
Proceed from the other direction: you have those resources, what creatively can you do with them?
I think the three big differences are:
- nothing was "responsive", everything targeted a (pretty small) fixed resolution. Fonts were bitmap, ASCII, and fixed-width. No need to run a constraint solver to do layout.
- object orientation was rare. Data would be kept in big arrays. No virtual dispatch, minimal pointer-chasing (this is bad for pipelines), no reflection, no JITs, no GC pauses. (See the sketch after this list.)
- immediate-mode GUIs rendering on a single thread. Draw everything in order, once. If something's off screen, ignore it as much as possible.
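To illustrate the second point, a hedged sketch (not from any particular 90s codebase, names invented) of the "big arrays, no virtual dispatch" style versus the object-oriented alternative:

```cpp
#include <cstddef>
#include <vector>

// The OOP version: one heap object per entity, each update() call goes
// through a vtable, and the data it touches is scattered across the heap.
struct EntityObject {
    virtual void update(float dt) = 0;
    virtual ~EntityObject() = default;
};

// The "big arrays" version: parallel arrays of plain data and one tight loop
// that walks memory linearly -- no virtual dispatch, no pointer chasing.
struct Entities {
    std::vector<float> x, y, vx, vy;

    void update(float dt) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
};
```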
You can see all these in play in the Windows 3.0 -> 95 era. Classic "Windows Forms" apps render in response to a WM_PAINT message, in the (immediate) thread of the message loop, directly into the framebuffer. This could be seen by moving a window on top of a crashed app or part thereof - moving it away would leave a white space. Classic Windows apps are also not responsive - all the positions for controls are fixed pixel locations by the designer, so they look stupidly tiny on high resolution monitors. ( https://docs.microsoft.com/en-us/previous-versions/windows/d... )
Microsoft tried to supersede this with WPF/XAML and the compositing window manager Aero, but it hasn't been entirely successful, as you can see by the number of obvious Forms dialogs you can find from Control Panel.
> Dwarf Fortress and Rimworld eventually both suffer from the same problem: CPU death.
Simulation games tend to expand to fill the space available. It's very easy to have runaway O(n^2) complexity if you're not careful, which gets much worse at high numbers of objects. The trick there is to constrain the range of interactions, both which interactions are possible and over what range, so you can keep them in a quadtree/octree/chunk system and reduce the N input to each tick of the simulation.
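As a rough sketch of the chunk idea (hypothetical names, not from any specific game): bucket entities into fixed-size chunks each tick, and only test an entity against its own and neighbouring chunks, so the all-pairs O(n^2) check becomes roughly linear for spread-out worlds.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Entity { float x, y; /* ... */ };

constexpr float CHUNK = 32.0f;   // must be >= the maximum interaction radius

// Pack a chunk's (cx, cy) coordinates into one hashable key.
inline std::uint64_t chunk_key(float x, float y) {
    auto cx = static_cast<std::int32_t>(std::floor(x / CHUNK));
    auto cy = static_cast<std::int32_t>(std::floor(y / CHUNK));
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32)
         |  static_cast<std::uint32_t>(cy);
}

void tick(std::vector<Entity>& entities) {
    // 1. Bucket entity indices by chunk.
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < entities.size(); ++i)
        grid[chunk_key(entities[i].x, entities[i].y)].push_back(i);

    // 2. Each entity only considers its own chunk and the 8 neighbours,
    //    instead of every other entity in the world.
    for (std::size_t i = 0; i < entities.size(); ++i)
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(chunk_key(entities[i].x + dx * CHUNK,
                                              entities[i].y + dy * CHUNK));
                if (it == grid.end()) continue;
                for (std::size_t j : it->second)
                    if (j != i) { /* interact(entities[i], entities[j]); */ }
            }
}
```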
Further reading: not just Masters of Doom, but Racing The Beam.
it is far easier, and faster, to use a library that is bad than to write what you need efficiently.
"CPU is cheap." "RAM is cheap." two of the many cancerous sayings that developers use to excuse their poor efforts.
we don't do this now because we simply do not care enough as an industry. we want the software written immediately and for as little money as possible.
At that time the PC did not have hardware-accelerated scrolling and sprites like consoles and the Amiga did, so many PC ports of console games had higher hardware requirements and lower fps.
Windows 95 did not have a reputation of a small OS. It was seen as a complete waste of RAM for MS-DOS games. Microsoft had to invest a lot of effort to turn that around (with DirectX).
It's a matter of perspective. You can already wonder how Portal 2 is so small compared to multi-GB games. In 30 years when GTA 6 comes out with 7TB day-one patch, we're going to marvel how GTA 5 could fit in mere 30GB.
The short of it is that game devs used to write code for the hardware as well. Most code now is written for a layer in between, and there are so many layers that code is unreliable and slow.
Every abstraction layer is designed to be multi-purpose, meaning not optimized for a single purpose. As these layers accumulate, opportunities for optimization are lost.
A 90s game like RollerCoaster Tycoon was coded in assembly language on bare metal, with limited OS services.
A modern game might have game logic coded in Lua, within a game engine framework coded in C++, which calls a graphics API like OpenGL, which all runs in an OS that does a lot more.
These modern layers are still clever and efficient and optimized using tricks as far as they can be, and the increased resource requirements come mostly from the demand for higher resolutions, higher polygon counts, and more graphical special effects.
That being said, better art is also very expensive. Consider how expensive it is to fill a 4K screen versus a VGA screen with 256 colors.
Small bitmaps, with indexed palettes and run-length-encoding to further shrink them.
Caches were tiny, memory was slower, hard disks massively slower, and if you had to hit a CD to load data, forget about it. So packing data was important, and organizing data for cache locality, and writing algorithms to take advantage of instruction pipelining and avoid branches.
Fabian Sanglard has a great blog that goes into the nitty gritty of a lot of these techniques as used in different games.
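To make the indexed-palette-plus-RLE idea above concrete, a minimal hypothetical decoder (not from any particular game): each sprite stores runs of identical palette indices as (count, index) pairs, expanded into 32-bit pixels at load time.

```cpp
#include <cstdint>
#include <vector>

// Expand (count, palette index) pairs into 32-bit pixels using a 256-entry
// palette. The compressed form is a fraction of the size of raw RGBA, which
// mattered when RAM and disk were measured in single-digit megabytes.
std::vector<std::uint32_t> decode_rle(const std::vector<std::uint8_t>& rle,
                                      const std::uint32_t palette[256])
{
    std::vector<std::uint32_t> pixels;
    for (std::size_t i = 0; i + 1 < rle.size(); i += 2) {
        std::uint8_t count = rle[i];
        std::uint8_t index = rle[i + 1];
        for (std::uint8_t n = 0; n < count; ++n)
            pixels.push_back(palette[index]);
    }
    return pixels;
}
```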
It's less about old games being optimised so much as modern non-game software being mind-blowingly wasteful.
Modern, well optimised AAA stuff like Doom 2016 or The Last of Us 2 is as much a work of genius design (if not more) as Rollercoaster Tycoon or Warcraft II. If anything, Vulkan is bringing us closer to the metal than ever before.
It's just a general shift in consumer expectation over time. There is no longer any pressure for regular apps to be performant, so they aren't.
For instance checking if a mouse button is pressed on the Amiga is one assembly instruction to test a bit in a memory-mapped register. Compare that to all the code that's executed today between the physical mouse button and some event handler in the game code.
On 8-bit home computers like the C64, the video hardware was basically a realtime image decompressor which decoded a single byte into an 8x8 pixel graphics block, without CPU intervention. Scrolling involved just setting a hardware register to tell the video hardware where the framebuffer starts, instead of 'physically' moving bytes around. For audio output the CPU just had to write some register values to control how the sound chip creates audio samples.
On the PC the video and sound hardware is much more generic, much less integrated with the CPU, and there's a driver layer to hide hardware differences and mediate hardware access between different processes and on top of that are even more operating system and game engine layers, and that's how we got to where we are today :)
You could do the same thing today, but you'd spend at least 4-5x more time developing your software.
For example, as a kid I used to fill the screen using color look-up tables: you had access to the frame buffer, and by cycling the entries in the table you could make the entire image change, giving an impression of movement. The work was done by the electronics of the framebuffer.
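A rough sketch of that palette-cycling trick, assuming a 256-entry palette you can rewrite each frame (names are made up): rotate a range of palette entries and every pixel that references them appears to animate, without touching the framebuffer at all.

```cpp
#include <algorithm>
#include <cstdint>

// 256-entry palette; the framebuffer's pixels are just indices into it.
std::uint32_t palette[256];

// Rotating entries [first, last) by one each frame animates every pixel that
// uses those entries (waterfalls, fire, lava...) for the cost of moving a
// handful of palette entries -- the framebuffer itself is never rewritten.
void cycle_palette(int first, int last) {
    std::rotate(palette + first, palette + first + 1, palette + last);
}
```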
In the past, people who had access to real terminals did the same: change the text information on screen and the hardware would render it beautifully (even prettier than today's terminals). But those terminals were super expensive and proprietary (tens of thousands of dollars). Today shells are standardized, free, and open source (and usually use much uglier fonts), with fonts from all over the world (those old terminals had very limited character sets).
Game consoles did the same thing. The hardware drew tiles that had been drawn beforehand: you told the console which tiles to draw and the hardware did most of the work.
>Is it possible to recreate this level of efficiency on modern systems?
Of course. You just need to pay the price: development is slow and painful. Low-level programming is prone to very hard-to-find bugs.
Programming things like FPGAs or GPUs is hell. You work super hard and get little reward.
And I guess at a certain world size good performance would need less detailed simulation in other parts of the game world.
Some current game developers have lost that art, because most of the games they develop are built on top of some massive framework.
One classic example of the level of understanding earlier game developers had of their platforms is the very well-known inverse square root function from Quake: https://en.wikipedia.org/wiki/Fast_inverse_square_root
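For reference, the version documented on that page (with the original comments replaced by shorter ones):

```cpp
float Q_rsqrt(float number)
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = *(long*)&y;                       // reinterpret the float's bits as an integer
    i  = 0x5f3759df - (i >> 1);            // the famous magic-constant initial guess
    y  = *(float*)&i;
    y  = y * (threehalfs - (x2 * y * y));  // one Newton-Raphson iteration
    return y;
}
```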
Another big problem they had to solve was how to make the Crash character fit into the system with too many "models".
- 8- or 16-bit graphics at 320x200 and 30 fps is orders of magnitude less data to manage than modern HDR 2K at 120 fps.
- Software developers were not abstracted as far away from the hardware as they are today.
- Most games ran on DOS, which was basically “bare metal” vs. Windows/multitasking/background processes.
- And you HAD to make efficient use of the compute and ram when dealing with limited resources, which honed the skills of serious game devs.
3D accelerators, to a large extent, changed what we (game developers) needed to focus on. If I'm doing a software rasterization of a bunch of triangles, then the speed at which I sit in that tight loop rendering each pixel is of utmost importance. Getting a texture mapping routine down to 9 clock cycles matters. The moment I worked on a game that used 3d accelerators, that all changed. I submit a list of triangles to the 3D accelerator, and it draws them. It rarely matters how much time I spend getting everything ready for the 3D accelerator, since it is rendering while I am processing, and my processing for all intents and purposes never takes longer than the rendering.
Once we had that architectural paradigm shift, the clock cycles it takes for me to prepare the drawing list are entirely irrelevant. The entire burden of performance is in getting the 3d accelerator to draw faster. This means that optimization is in reducing the work it has to do, either through fancy shading tricks to simplify what it takes to make something look right, or reducing the amount of geometry I send to the 3d accelerator to draw.
The neat thing is that we've gone full circle in a way. When it used to be important to hand optimize the inner loop of a rasterisation function, now we have pixel, vertex, and fragment shaders. The pixel and fragment shaders are getting run millions of times, and we used to have to code shaders in shader assembly. Now they're generally written in higher level shading language, but nonetheless, optimizing shader code often becomes quite important. It feels like the same work as assembly optimization to me.
I don't have any answer. But I am interested in the transition that took place mid-90s from 2D to 3D. The latter SNES era pseudo-2.5D sprite animations are masterpieces of digital art. And then: discontinuity. Low-poly first gen 3D game design arrives. And no one knows where to put the camera.
One thing I have noticed is that dev cycles are roughly the same. It takes roughly the same time to create a game for the N64 or the PS5: 18 months. But one is 50 megs, the other 50 gigs. They took the time and care back then to get things right. Although many releases were still unplayable disasters.
"What were the worst video game excesses of the 90s?" It's another question to learn from!
Coming from another angle, the developers at that time considered those resources to be an amazing amount of plenty. Compared to, say, an 8086 with 128k or 256k of RAM, no HDD and maybe 512k or 1.44MB of floppy storage, those specs were huge.
If you're looking for other kinds of minimal examples, I recommend glancing through the demoscene (https://en.wikipedia.org/wiki/Demoscene) stuff. To some degree, the "thing" there was amazing graphical output for minimal size.
IIRC assembly language expert Michael Abrash makes an appearance.
I 100% recommend it!
> CPU: Z80 @ 3.25 MHz
> Memory: 1 KB (64 KB max., 56 KB usable)
Now that boggles my mind.
Constraints can fuel creativity as well; the first Metal Gear, for the MSX, was supposed to be a Commando competitor, but because of the constraints of the hardware, it turned into a stealth game [0] (there is a better story from an interview with Hideo but I cannot find it now).
Just copy and paste the solution but conform it to our networking or state storage system.
The bottlenecks for most apps I work on (not games) are somewhere else, like the network.
So there's a lot of crap code that makes a lot of money.
Unfortunately the abstraction comes at a cost - in CPU cycles.
The gist of programming console games of that period is, I suspect, somewhere between an Arduino and a Raspberry Pi.
Casey Muratori explains it well: https://youtube.com/watch?v=pgoetgxecw8
It can still be done today of course, but it is less of a necessity. Teams choose to spend that time working on other aspects of the game instead.
"js13kGames is a JavaScript coding competition for HTML5 Game Developers running yearly since 2012. The fun part of the compo is the file size limit set to 13 kilobytes."
The results are really impressive!
There weren't a zillion layers of OS underneath.
esp the 3d dir, tons of assembly.
I'm guessing most of those tiles are off-screen, and you're creating a mesh for an entire game level, not just the visible area?
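If that's the case, one trick is to only build geometry for the tiles the camera can actually see, and rebuild that (comparatively tiny) vertex array when the view scrolls. A hedged sketch using SFML 2's sf::VertexArray; names like TILE and the tileColor lookup are made up for illustration:

```cpp
#include <SFML/Graphics.hpp>
#include <algorithm>

constexpr float TILE = 32.f;   // tile size in pixels (assumption for the sketch)

// Build a vertex array covering only the tiles visible in 'view'. For a
// 1920x1080 view with 32px tiles that's roughly 60x34 tiles -- a couple of
// thousand quads instead of 3,000,000, no matter how big the map is.
sf::VertexArray buildVisible(const sf::View& view,
                             int mapWidth, int mapHeight,
                             sf::Color (*tileColor)(int x, int y))   // hypothetical lookup
{
    sf::Vector2f c = view.getCenter(), s = view.getSize();
    int x0 = std::max(0, int((c.x - s.x / 2) / TILE));
    int y0 = std::max(0, int((c.y - s.y / 2) / TILE));
    int x1 = std::min(mapWidth,  int((c.x + s.x / 2) / TILE) + 1);
    int y1 = std::min(mapHeight, int((c.y + s.y / 2) / TILE) + 1);

    sf::VertexArray va(sf::Quads);
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x) {
            sf::Color col = tileColor(x, y);
            va.append(sf::Vertex(sf::Vector2f( x      * TILE,  y      * TILE), col));
            va.append(sf::Vertex(sf::Vector2f((x + 1) * TILE,  y      * TILE), col));
            va.append(sf::Vertex(sf::Vector2f((x + 1) * TILE, (y + 1) * TILE), col));
            va.append(sf::Vertex(sf::Vector2f( x      * TILE, (y + 1) * TILE), col));
        }
    return va;   // window.draw(va) each frame; rebuild only when the view moves
}
```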
The thing that changed with the shift to 16 and 32-bit platforms was the opening of doing things more generally, with bigger simulations or more elaborate approaches to real-time rendering. Games on the computers available circa 1990, like Carrier Command, Midwinter, Powermonger, and Elite II: Frontier, were examples of where things could be taken by combining simple 3D rasterizers with some more in-depth simulation.
But in each case there was an element of knowing that you could fall back on the old tricks: Instead of actually simulating the thing, make more elements global, rely on some scripting and lookup tables, let the AI be dumb but cheat, and call a separate piece of rendering to do your wall/floor/ceiling and bake that limit into the design instead of generalizing it. Simcity pulled off one of the greatest sleights of hand by making the map data describe a cellular automata, and therefore behave in complex ways without allocating anything to agent-based AI.
So what was happening by the time you reach the mid-90s was a crossing of the threshold into an era where you could attempt to generalize more of these things and not tank your framerates or memory budget. This is the era where both real-time strategy and texture-mapped 3D arose. There was still tons of compromise in fidelity - most things were still 256-color with assets using a subset of that palette. There were also plenty of gameplay compromises in terms of level size or complexity.
Can you be that efficient now? Yes and no. You can write something literally the same, but you give up lots of features in the process. It will not be "efficient Dwarf Fortress", but "braindead Dwarf Fortress". And you can write it for a modern environment, but the 64-bit memory model alone inflates your runtime sizes (both the executable binary and allocated memory). You can render 3 million tiles more cheaply, but you have to give up on actually tracking all of them and do some kind of approximation instead. And so on.
Sure, you can recreate this level of efficiency on modern systems. But you have to throw most of the modern systems away.
httpdito is 2K and comes close enough to complying with the HTTP spec that you can serve web apps to any browser from it: http://canonical.org/~kragen/sw/dev3/server.s. Each child process uses two unshared 4K memory pages, one stack and one global variables, and there are three other shared memory pages. On my laptop, it can handle more load than all the HTTP servers on the entire WWW had when I started using it in 01992. It's only 710 lines of assembly language. I feel confident that no HTTP/2 implementation can be smaller than 30 times this size.
BubbleOS's Yeso is an experiment to see how far you can get wasting CPU to simplify doing graphics by not trying to incrementally update the screen or use the GPU. It turns out you can get pretty far. I have an image viewer (121 lines of C), a terminal emulator (328 lines of C), a Tetris game (263 lines of C), a real-time SDF raymarcher (51 lines of Lua), a death clock (864 lines of C, mostly actuarial tables), and a Mandelbrot browser (75 lines of Lua or 21 lines of Python), among other things. Most of these run in X-windows, on the Linux frame buffer, or in BubbleOS's own windowing protocol, Wercam. I haven't gotten around to the Win32 GDI and Android SurfaceFlinger ports yet.
On Linux, if you strip BubbleOS's terminal emulator executable, admu-shell https://gitlab.com/kragen/bubbleos/blob/master/yeso/admu-she..., it's only 35 kilobytes, though its glyph atlas is a JPEG which is another 81K. About a quarter of that is the overhead of linking with glibc. If you statically link it, it ends up being 1.8 megabytes because the resulting executable contains libjpeg, libpng, zlib, and a good chunk of glibc, including lots of stuff about locales which is never useful, just subtle bugs waiting to happen. There's a huge chunk of code that's just pixel slinging routines from these various libraries, optimized for every possible CPU.
Linked with shared libraries instead, an admu-shell process on this laptop has a virtual memory size (VSZ in ps u) of 11.5 megabytes, four megabytes of which are the pixmap it shares with the X server, containing the pixels it's showing on the screen. Several megabytes of the rest are memory maps for libc, libm (!), libX11, libjpeg, and libpng, which are in some sense not real because they're mostly shared with this browser process and most of the other processes on the system. There's a relatively unexplained 1.1-megabyte heap segment which might be the font glyph atlas (which is a quarter of a megapixel). If not I assume I can blame it on libX11.
The prototype "windowing system" in https://gitlab.com/kragen/bubbleos/blob/master/yeso/wercaμ.c only does alpha-blending of an internally generated sprite on an internally generated background so far. But it does it at 230 frames per second (in a 512x828 X window, though) without even using SSE. The prototype client/server protocol in wercamini.c and yeso-wercam.c is 650 lines of C, about 7K of executable code.
Speaking of SSE, nowadays you have not only MMX, but also SSE, AVX, and the GPU to sling your pixels around. This potentially gives you a big leg up on the stuff people were doing back then.
In the 01990s programs usually used ASCII and supported a small number of image file formats, and the screen might be 1280x1024 with a 256-color palette; but a lot of games used 640x480 or even 320x240. Nowadays you likely have a Unicode font with more characters than the BMP, a 32-bit-deep screen containing several megapixels, and more libraries than you can shake a stick at; ImageMagick supports 200 image file formats. And you probably have XML libraries, HTML libraries, CSS libraries, etc., before you even get to the 3-D stuff. The OS has lots of complexity to deal with things like audio stream mixing (PulseAudio), USB (systemd), and ACPI, which is all terribly botched, one broken kludge on top of another.
The underlying problems are not really that complicated, but organizationally the people solving them are all working at cross-purposes, creating extra complexity that doesn't need to be there, and then hiding it like Easter eggs for people at other companies to discover through experimentation. Vertical integration is the only escape, and RISC-V is probably the key. Until then, we have to suck it up.
Most of this doesn't really affect you, except as a startup cost of however many hundreds of megs of wasted RAM. Once you have a window on the screen, you've disabled popup notifications, and you're successfully talking to the input devices, you don't really need to worry about whether Wi-Fi roaming changes the IP address the fileserver sees and invalidates your file locks. You can live in a world of your own choosing (the "bubble" in "BubbleOS"), and it can be as complex or as simple as you figure out how to make it. Except for the part which deals with talking to the GPU, I guess. Hopefully OpenCL 3.0 and Vulkan Compute, especially with RADV and WGSL, will have cleaned that up. And maybe if the underlying OS steals too much CPU from you for too long, it could tank your framerate.
To avoid CPU death, use anytime algorithms; when you can't use anytime algorithms, strictly limit your problem size to something that your algorithms can handle in a reasonable amount of time. I think GPGPU is still dramatically underexploited for game simulation and AI.
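One hedged way to read "anytime" in a game-loop context (a sketch under my own assumptions, not a claim about how any particular engine does it): give the expensive simulation a per-frame time budget, do as much as fits, and pick up where you left off next frame, so a heavy world degrades simulation latency instead of framerate.

```cpp
#include <chrono>
#include <deque>
#include <functional>

using Clock = std::chrono::steady_clock;

// Work items carried over between frames (path requests, AI re-plans, ...).
std::deque<std::function<void()>> pending;

// Run queued simulation work until the per-frame budget is spent, then stop.
// Whatever is left waits for the next frame, so rendering never starves.
void run_budgeted(std::chrono::microseconds budget) {
    auto deadline = Clock::now() + budget;
    while (!pending.empty() && Clock::now() < deadline) {
        pending.front()();      // each item should itself be small
        pending.pop_front();
    }
}
```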
Unreal Engine 5's "Nanite" system is a really interesting approach to maintaining good worst-case performance for arbitrarily complex worlds, although it doesn't scale to the kind of aggregate geometry riddled with holes that iq's SDF hacking excels at. That kind of angle seems really promising, but it's not the way games were efficient 30 years ago.
Most "modern systems" are built on top of Blink, V8, the JVM, Linux, MS-Windoze, AMD64, NVIDIA drivers, and whatnot, so they're saddled with this huge complexity under the covers before the horse is out of the barn door. These systems can give you really good average-case throughput, but they're not very good at guaranteeing worst-case anything, and because they are very complex, the particular cases in which they fall over are hard to understand and predict. Why does my Slack client need a gibibyte of RAM? Nobody in the world knows, and nobody can find out.
The answer is: Everything.
If you are cynical, you could say that all of the resources are just being wasted on bad software that is bloated and coded lazily by less skilled people who only care about money. If you are optimistic, you might point to advancements in the C++ language itself (which require more resources), as well as improvements to compilers, like ridiculously powerful optimizers, static analysis, and new language features, along with the modularization of the compiler.
I think the truth is basically a bit of both, though probably mostly the latter. Software naturally expanded to require more because the hardware offered more. When computers ran 486s and Pentiums, spending 10x more resources on compile times was probably not a popular proposition. But computers becoming faster made compilers faster, too, so compilers could make more use of those resources. At the same time, they could clean up their codebases to remove hacks that reduce memory or CPU utilization at the cost of making code difficult to read, debug, understand, and modularize code to allow for compilers that are easier to expand, can provide language servers, support multiple language frontends, and support compiling to multiple architectures transparently.
What does this have to do with games? Well, a DOS game would more or less write directly to VGA registers and memory, but doing so is a sort-of fraught-with-peril situation. It’s hard to get 100% right, and it exposes applications to hardware implementation details. Plenty of games behaved strangely on some VGA cards or soundblaster clones, for example. Using an abstraction layer like DirectX can be a dramatic improvement as it can provide a higher level interface that shields the application from having to worry about hardware details, and the interface can do its best to try to prevent misuse, and drivers for it can be tested with extensive test suites. If a VGA card behaves differently from what a game expects and it doesn’t work, you’re SOL unless the game is updated. If a Direct3D game doesn’t work properly, even if it is ultimately due to a hardware quirk, it can usually be fixed in software because of the abstraction.
SFML is even further up. It is a library that abstracts the abstractions that abstract the hardware, to allow your application to run across more platforms. There could be three or four layers of abstraction to bare metal. They all do important things, so we want them. But they also introduce places where efficiency suffers in order to present a more uniform interface. Also, a modern app running on a modern OS incurs costs from other similar advances: language runtimes, OS security improvements, etc. come at non-zero CPU and memory costs that we were willing to eat because improving hardware more than offset them.
Programmers today have different skills and programs today have different challenges. It’s only natural that the kinds of things that made Sim City 2000 run well are not being applied to make modern games on modern computers.
In addition, the source code for both RollerCoaster Tycoon and Transport Tycoon Deluxe has been decompiled by volunteers and released as OpenRCT2 and OpenTTD respectively. So we can actually get a glimpse of how the games worked originally.
Disclaimer: I am not an expert in either of these games, and the following examples may be wrong. In any case, you can just take these as hypothetical made-up examples.
As far as I remember, both games have a very fast heuristic for wayfinding.
In RCT the agents ("peeps") just wander most of the time (at the very least 50%) and just pick a path at random at each intersection. This is obviously very cheap even for many agents. Peeps also have the possibility to seek out an actual goal (like a specific ride in your park), but even then peeps do not employ global pathfinding to reach that target, but again they just check at each intersection which path would lead them closest to their target and move there.
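In the spirit of the disclaimer above, a made-up sketch of that "no global pathfinding" idea: at each intersection, score only the immediately adjacent path tiles by straight-line distance to the goal and take the best one. It's constant work per decision, with no search at all, which is also exactly why it can loop forever on pathological layouts.

```cpp
#include <vector>

struct Tile { int x, y; };

// Hypothetical reconstruction, not OpenRCT2 code: pick whichever adjacent
// path tile is closest (as the crow flies) to the target. No global search,
// no stored route. Assumes 'exits' has at least one entry.
Tile choose_exit(const std::vector<Tile>& exits, Tile target) {
    Tile best = exits.front();
    float bestDist = 1e30f;
    for (const Tile& e : exits) {
        float dx = float(e.x - target.x), dy = float(e.y - target.y);
        float d = dx * dx + dy * dy;          // squared distance is enough to compare
        if (d < bestDist) { bestDist = d; best = e; }
    }
    return best;
}
```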
This works well in most cases. But it is well-known to RCT players that certain patterns of paths can lead to peeps getting stuck in cycles when the heuristic fails. In addition, at least in RCT1, for that reason double paths were completely broken, as every path segment is an intersection and it becomes evident that peeps wander completely aimlessly.
The thing is: players usually see this as a challenge rather than an annoyance. The game design even incorporates it, for example by giving you a negative "award" for a "confusing park layout", or scenarios which are built around the gimmick that the pre-existing path layout is actively horrible and should be reworked asap. The problems with double paths actually make the game better, as double paths would otherwise be a naive and overly cheap solution for overcrowding (another mechanic, which penalizes you for having too many peeps on a stretch of path at the same time).
Another example of technical limitations turning into important gameplay elements can be seen e.g. in Starcraft. Only being able to select 12 units at the same time, and having wonky pathfinding, especially in chokepoints, is completely outdated. Still, having to deal with these limitations is actually considered part of required skill in high-level play.
In addition, some of these limitations are actually realistic, as people in a real amusement park will not have a perfect map of the park in their heads and will just wander around like the peeps do. In game you also have the possibility to sell park maps, which are consulted by the peeps when they look for something. Theoretically, even if the game had implemented a perfect pathfinding algorithm, you could still cut down on how often you need to run it by tuning how often the peeps consult that map. Peeps also have a mechanic where they have a "favorite ride" which they seem to visit all the time. When they stay in the close vicinity of that ride, even targeted pathfinding gets very easy.
Transport Tycoon actually had pretty much the same pathfinding algorithm as RCT. One of the first things that OpenTTD did was reworking the pathfinding. As you are building a railway network in that game, splitting the network into an actual graph, keeping that structure in memory, and running A* or something on it, is actually not that inefficient.
It seems that it would have been possible even on slower PCs, but remember that Chris Sawyer wrote the game in assembly, and having a simple pathfinding algorithm actually really helps managing the complexity. After decompiling to a higher-level language like C++, resolving these issues became much easier.
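For contrast with the local heuristic, a compact sketch of the graph-based approach described above (hypothetical types, not OpenTTD's actual implementation): once the rail network is kept as an explicit graph of junctions, a standard Dijkstra or A* search over it is cheap, because the graph has far fewer nodes than the map has tiles.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int to; float cost; };
using Graph = std::vector<std::vector<Edge>>;   // adjacency list over junction nodes

// Plain Dijkstra over the junction graph; returns the cheapest cost to 'goal',
// or +inf if unreachable. (A* would just add a distance-to-goal heuristic
// to the priority key.)
float shortest_path(const Graph& g, int start, int goal) {
    const float INF = std::numeric_limits<float>::infinity();
    std::vector<float> dist(g.size(), INF);
    using Item = std::pair<float, int>;                       // (cost so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;
    dist[start] = 0.f;
    open.push({0.f, start});
    while (!open.empty()) {
        auto [d, n] = open.top(); open.pop();
        if (n == goal) return d;
        if (d > dist[n]) continue;                            // stale queue entry
        for (const Edge& e : g[n])
            if (d + e.cost < dist[e.to]) {
                dist[e.to] = d + e.cost;
                open.push({dist[e.to], e.to});
            }
    }
    return INF;
}
```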
I also remember playing The Settlers 2 a lot as a kid. This game had a weird mechanic where you had to place flags on the map and build paths between those flags. Each path segment would then be manned by one of your infinite supply of basic settlers. Those settlers would never leave their segment, instead just carrying items from one flag to the other and handing them off to the next settler. I have never found an explanation for this design decision, but I am pretty sure the reason is that you never have to derive a graph from the road network: the player is pretty much building the graph of the road network themselves.
Windows 95 took all that away for the better.