Edit: to clarify what I mean by fragility: it's how complex software, when changed, tends to break in unexpected ways, i.e., fixing one bug introduces more.
I found Out of the Tar Pit[1] somewhat useful. I thought the back half of the paper was disappointing (sorry, functional programming is not the cure for all problems, and state is inherent and must be dealt with), but the definitions of "essential complexity" and "accidental complexity" from that paper are invaluable. Too often I see people/devs/PMs going "simpler is better" where "simpler" would not address the essential complexity: i.e., their simpler == broken, for the use case at hand.
But once you have that distinction, when you see a fragile system you can start looking at it through a more productive lens: "okay, what of this must I keep, and what complexity can I dispense with?"
Sussman wrote an essay in 2007 called "Building Robust Systems" https://groups.csail.mit.edu/mac/users/gjs/essays/robust-sys... It's not a study, admittedly, but it's an example of the term under which you might find what you're seeking.
I think the relevant academic research areas would be software resiliency, software reliability and error recovery, static and dynamic analysis, fuzzing, as well as conceptual frameworks like LANGSEC[1].
[1]: https://langsec.org/
https://scholar.google.com/scholar?q=software+fragility
A lot of the results are about seismic simulations, but some are about software defects:
Fragility of evolving software - https://dial.uclouvain.be/downloader/downloader.php?pid=bore...
Software is not fragile - https://hal.archives-ouvertes.fr/hal-01291120/document
Overcoming Software Fragility with Interacting Feedback Loops and Reversible Phase Transitions - https://www.scienceopen.com/hosted-document?doi=10.14236/ewi...
Agile or Fragile? - The Depleting Effects of Agile Methodologies for Software Developers - https://core.ac.uk/download/pdf/301378665.pdf
My general process: once you've found an article, like "Backstabber's knife collection: A review of open source software supply chain attacks" from my link, read through it with a text editor open. Make notes: a summary of what they did, what they found, strengths and weaknesses, and what I like to call "the rabbit hole". If you see a reference to something you're curious about, find the citation for that piece of information and read that article next. Repeat until you've exhausted the findings, then move on to the next interesting reference.
[1] https://scholar.google.com/scholar?as_ylo=2019&q=software+su...
API versioning, API deprecation
Code bloat: https://en.wikipedia.org/wiki/Code_bloat#
"Category:Software maintenance" costs: https://en.wikipedia.org/wiki/Category:Software_maintenance
Pleasing a different audience with fewer, simpler features
Lack of acceptance tests to detect regressions
Regression testing https://en.wikipedia.org/wiki/Regression_testing :
> Regression testing (rarely, non-regression testing [1]) is re-running functional and non-functional tests to ensure that previously developed and tested software still performs as expected after a change. [2] If not, that would be called a regression.
Fragile -> Software brittleness https://en.wikipedia.org/wiki/Software_brittleness
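To make the regression-testing idea above concrete, here's a minimal sketch: pin down today's observed behavior as assertions so that a later change which breaks it fails loudly. The `slugify` function and its expected outputs are made up for illustration, not from any real codebase.

```python
# Regression testing in miniature: capture current, known-good behavior
# as assertions, so a future "improvement" can't silently change it.

def slugify(title: str) -> str:
    """Hypothetical example function: lowercase and hyphenate a title."""
    return "-".join(title.lower().split())

def test_slugify_regressions():
    # Pinned expectations: the "previously developed and tested" behavior.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  spaced   out  ") == "spaced-out"
    # If a refactor changes either result, that's a regression.

test_slugify_regressions()
```

In practice you'd run these under a test runner (pytest, unittest, etc.) on every change, which is what catches the regression before it ships.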
Perhaps because software is now the interface between different systems, and we are desperately trying to abstract away the underlying system, yet the details eventually leak and cause other issues? Perhaps because complexity compounds like multiplication: physical systems are limited by 3D space, but software systems can become entangled without bound? Just some naive thoughts.
Think of a Jenga tower: the more blocks you have, the more fragile it is. And that is despite Jenga being nicely layered (each block has a limited number of direct dependencies).
There are two main ways to decrease fragility:
1) lay out your blocks more carefully.
2) decrease the total number of blocks.
What's interesting is that most software development practices focus on 1. How do you make a complex system less fragile? Use a big framework, do unit tests, use static typing, have protected branches, write documentation.
While the biggest payoffs are always in the reduction of blocks. KISS.
Software is hard because it is the last 10% ... Hardware was the first 90%, but anyone with project experience knows how long the last 10% lasts.
Even if there is redundancy in hardware to catch a bad bit, software contains a lot of interconnected logic with no mitigation for an unexpected, incorrect value.
There are chains of dependencies such that correct behavior is a giant conjunction of propositions: if this works, and this is correct, and this configuration is right, and, and... then we get stable behavior with good results. Conjunctions are fragile; one incorrect proposition and the whole conjunction is false.
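The conjunction point above can be sketched numerically. Assuming (hypothetically) that each of N components is independently correct with probability p, the whole chain works with probability p^N, which collapses surprisingly fast even for very reliable parts:

```python
# A system that only works when every link in a dependency chain is
# correct behaves like a conjunction: one false proposition fails it all.
# The numbers here are illustrative assumptions, not measurements.

def conjunction_reliability(p: float, n: int) -> float:
    """Probability that all n independent components are correct."""
    return p ** n

# Even 99%-reliable parts compose into a fragile whole:
for n in (1, 10, 100):
    print(n, round(conjunction_reliability(0.99, n), 3))
# 1   0.99
# 10  0.904
# 100 0.366
```

A chain of 100 parts, each right 99% of the time, works barely a third of the time — which is the quantitative face of "one incorrect proposition and the conjunction is false."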
As someone who writes a lot of complex/evolving data analysis software that needs to work correctly, I find some of the considerations listed in the above to be immensely helpful.
The backbone of the internet.
It’s still not working, and it’s been two days.
Software is cobbled together with duct tape and cow chips.
After 10 years in the field, my take is:
We’ve lied to ourselves so much that we believe our own lies