This makes me suspect that the real reasons mainframes continue to exist are industry inertia, vendor lock-in, or legacy code rather than any performance/cost advantage.
Not that legacy will necessarily do better, but keeping your fate in your own hands does feel better. And these are, at this point, pretty well-proven systems: if they were going to fail, they most likely would have failed already.
I would rather be without Netflix and Google than without banking and food... but to each their own.
While some of it is inertia (mostly because rewriting truly large applications is hard and expensive), there is also the point that most of those industries cannot easily tolerate "eventually consistent" data.
Not all transactions are created equal; the hardest usually come with a set of requirements called ACID (atomicity, consistency, isolation, durability).
ACID in the classic RDBMS is not a random choice, but driven by the real requirements of its users (the database users, i.e. applications in the business sense, not people). The ACID properties are REALLY hard to achieve at scale in a distributed system with high throughput. Think of the transaction rate of the Bitcoin network (500k/day across many, many "servers") vs. Visa (500M+/day, i.e. roughly 6,000 transactions per second on average) - the latter is basically driven by two (!) large mainframes (about 50 km apart), the last I heard of any technical details.
None of the companies you mention need strict ACID, as nobody will complain if different users see slightly different truths - hence scaling writes is fairly easy.
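To make the distinction concrete, here's a minimal sketch of the atomicity requirement, using Python's built-in sqlite3 as a stand-in for a real banking database (the account names and the amount are made up for illustration):

    # A transfer must debit and credit atomically, or not at all.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()

    with conn:  # opens a transaction: commit on success, rollback on any error
        conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 'alice'")
        # If the process died right here, atomicity guarantees no reader ever
        # sees the 40 gone from alice without it having arrived at bob.
        conn.execute("UPDATE accounts SET balance = balance + 40 WHERE id = 'bob'")

    print(conn.execute("SELECT id, balance FROM accounts").fetchall())

An eventually consistent store drops exactly this guarantee: readers may briefly see the debit without the credit, which is fine for a watch list or a search index and unacceptable for a ledger.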
You spend a lot of money, and the best case is you end up with the same services as before.
(IBM should send me money for this tagline).
Seriously though, there are businesses where sending the data to a third party is absolutely impossible. You can spend a lot of money building your own private cloud... or, you can buy a mainframe, which will be cheaper, and if your industry is mainframe-friendly, there will be a lot of support and accumulated experience.
It feels like the non-mainframe world decided not to put effort into hardware reliability and tries to fix it at the architecture and software level instead, which is kind of sad.
> Why hasn't cloud killed the mainframe
A mainframe is typically a very powerful machine/cluster. You may be able to get something comparable in the cloud (although I doubt you can get one with 40 TB of RAM, for example), but why pay a premium to rent instead of buying?
> Why do newer companies not use mainframes
Some do, but for your examples, it's because they have been engineered to scale horizontally - i.e. using lots of less powerful machines.
> Are mainframes used because of lock-in
In some cases, sure. But there are many cases where big powerful machines are needed.
You're also often dealing with very sensitive information. Zenbleed-style attacks are an unacceptable risk, so you'll need hardware separate from the rest of the cloud anyway. Maybe you can find a data center that's reliable and secure enough to hold all of that sensitive data, but there's a good chance you'll be wiring up your basement with fiber optics if you're dealing with finance.
The cloud, as in "other people's computers", is horrifically expensive. If you're going to set up mass throughput systems, you'd better start your own data center. This is expensive as well, but it's not impossible.
If you want scalability to reduce power consumption and deal with burst workloads, you'll have to separate out your processing systems from your storage systems. Your average data center probably runs a lot of iSCSI or similar remote disk tech, possibly based on some kind of software wrapper at the cost of latency and performance but with the benefit of quickly swapping drives and expanding capacity.
Then you'll have to architect your software, of course; if you're doing batch work, you'll probably want to distribute programs in batches and run similar programs on similar chips, making optimal use of data locality and CPU cache. Maybe do the whole Hadoop thing depending on your workload.
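For illustration, here's a toy sketch of the map/reduce pattern behind "the whole Hadoop thing", using only the Python standard library. Real Hadoop/Spark adds the part that matters at scale: shipping each map task to the node that already holds its block of data, instead of shipping data to code.

    # Toy word count in the map/reduce style (illustrative only).
    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_count(chunk):
        # Map phase: count words within one locally held chunk.
        return Counter(word for line in chunk for word in line.split())

    if __name__ == "__main__":
        # Stand-in for blocks of a large file spread across machines.
        chunks = [
            ["the quick brown fox", "jumps over the lazy dog"],
            ["the dog barks", "the fox runs"],
        ]
        with Pool() as pool:
            partials = pool.map(map_count, chunks)     # map runs per chunk
        totals = reduce(lambda a, b: a + b, partials)  # reduce merges counts
        print(totals.most_common(3))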
You'll also want to figure out maintenance. You can hire a team of your own techs dealing with replacements or upgrades, but in many cases hiring external talent for a limited amount of time per month is probably cheaper and saves you the effort of keeping your workers trained.
When a machine fails and the SKU you've selected has gone out of production you'll need to figure out a replacement. This doesn't have to be a problem, but servers are fickle things and you'll probably want to use something that works with the other server vendors' tools, so you're stuck buying hardware from a limited number of suppliers in a limited number of configurations.
For management, you'll want to pretend all the computers are part of the same system. Whether you pick Kubernetes or OpenStack, you want central control to save you the headache of a million dashboards.
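As a sketch of what that central control looks like in practice - assuming the official kubernetes Python client, an existing cluster with a kubeconfig, and a made-up job name and container image - you submit a batch job to the scheduler instead of picking machines by hand:

    from kubernetes import client, config

    config.load_kube_config()  # reads your cluster credentials

    job = client.V1Job(
        metadata=client.V1ObjectMeta(name="nightly-settlement"),  # hypothetical
        spec=client.V1JobSpec(
            completions=100,  # run the batch task 100 times...
            parallelism=10,   # ...10 pods at a time
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="worker",
                        image="example.com/settlement-worker:latest",  # hypothetical
                    )],
                )
            ),
        ),
    )

    # The scheduler, not you, decides which of your machines run the work.
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)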
When you're done, you've built yourself a mainframe, except you've spent your own money on R&D. Have you saved money compared to buying overpriced mainframe hardware? Tough to say; mainframe hardware is often better at its job than normal servers. You'll save money on developers, as you no longer need to keep the old COBOL around, but by the time the decade-long project to rewrite the backend in another language is finally done, you've probably paid more than you've saved.
Mainframes are just computers good at batch jobs. They're not better or worse than regular computers, they're just different.
They are not at all trash, or a product of corruption or nepotism, as many tend to think. That stuff works all the time, and has since ~1960, while we serfs spend our lives fixing bugs resulting from never-ending updates in the hodgepodge of JavaScript libraries our "efficient", "cheap", "FOSS-based" products are made of.
The reason we all don't use mainframes is that the marginal or downright dodgy business cases of our products simply won't pay for them, so we are stuck in this race to the bottom.
I used the phrase "quiver of mainframes" because the compute tasks are best served by a cluster of architectures - dedicated RAM in the tens of terabytes, dedicated fast local storage in the petabytes, large clusters of compute nodes optimised for pipelined throughput, others optimised for hyper-cubed deep-mesh computations, other architectures again purely for graphical representations, etc.
Google and Amazon don't use "the cloud" in the normal sense of a third-party public offering. They own infrastructure that is also used by other customers as a cloud, so in one sense it is a private cloud, but it is closer to on-prem than anything else.
Many other large businesses run on-prem instead of in "the cloud" because the cost savings of an op-ex cloud system start to diminish when you already have your own infrastructure/networks/specialist staff, as these large businesses do. Again, this is often a mix of traditional on-prem infrastructure and private clouds that offer some sharing of resources to applications that do not need an entire physical server.
Netflix, I think, is a mixture of on-prem and public cloud, but I'm not sure.
So I'm not sure if you are asking "why mainframes instead of distributed e.g. microservices systems" or "why on-prem instead of using the cloud".
"""
Many factors from cost to regulatory requirements to cryptography and other security requirements play a role in making a decision to run a core banking platform in the cloud. Whilst the cloud is good for managing certain services (which we indeed use), it becomes a challenge to manage a bank’s core banking platform.
Once infrastructure is in a cloud, it is outsourced to that cloud provider. That means that the business is bound by those agreements covering aspects such as scalability, usage, capacity-on-demand and disaster recovery. These costs could grow exponentially as volumes grow. By using our own infrastructure for our core banking platform, we have full control over these factors.
Our chosen mainframe solution provides the ability for us to grow exponentially, while controlling all factors (CPU, Memory, Disk, Network, DR, Remote capability, etc) allowing us to manage the environment efficiently and effectively.
"""
When you have reliability/continuity as a top business requirement and ACID transactions (e.g. billing) must be processed at scale, then mainframes shine.
The argument "never change a running system" is not a wrong one, but it does not on its own explain the existence of mainframes; there are non-legacy scenarios where using a mainframe is the most reasonable choice. Finally, "cloud" as a term denoting outsourced compute/storage capacity can also apply to mainframes; see e.g. IBM's pricing brief at https://www.ibm.com/downloads/cas/YM94KV6N - some own their mainframe, some rent it, like a private or public (internal or external) cloud.
No business in their right mind would want to be on out-of-date, proprietary technology with hard-to-source skills, but the cost and effort of migration are enormous. There are scare stories of SAP migrations costing billions of dollars, and I assume a mainframe migration could be multiples of that.
What would be interesting is if they let actual enlightened techies size up and run these projects, rather than giving them to Accenture and the like. Maybe it wouldn't be insurmountable then?
Lack of impetus
There are plenty of neobanks out there that have built scalable, secure infrastructures using modern development practices in the cloud. But despite growing rapidly, none of them has the scale to challenge the big banks. Similarly, in areas like the airline industry, the big airlines are all old, and back-end technology is rarely the deciding factor in whether one airline is more efficient than another. There simply isn't as strong an impetus to change as you'd expect.
Risk
There is very little incentive for any given executive at one of these firms to take the risk involved in staking their reputation on a big technology migration. Equally, there's nothing quite like a failed transformation project to destroy the careers of those associated with it. If you think mainframe is antediluvian, take a look at the ERP software the same companies are running: layer upon layer of legacy, with custom code built to manage myriad edge cases that nobody understands anymore. Why take the risk, when you can instead build a new system that integrates with the mainframe - for example, a modern database that gives you a modern transactional API while micro-batching updates back to the mainframe (sketched below)? The incentives all point toward creating additional cruft.
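A toy sketch of that micro-batching pattern, in Python (the send_to_mainframe function and the five-second flush interval are made up for illustration):

    import queue
    import threading
    import time

    pending = queue.Queue()  # transactions accepted by the modern front end

    def accept_transaction(txn):
        # Called by the new system's API: acknowledge now, settle later.
        pending.put(txn)

    def send_to_mainframe(batch):
        # Hypothetical call into the system of record.
        print(f"flushing {len(batch)} txns to the mainframe")

    def flush_loop(interval_s=5.0):
        while True:
            time.sleep(interval_s)
            batch = []
            while not pending.empty():
                batch.append(pending.get())
            if batch:
                send_to_mainframe(batch)

    threading.Thread(target=flush_loop, daemon=True).start()

    # Demo: accept a couple of transactions and let one flush happen.
    accept_transaction({"id": 1, "amount": 40})
    accept_transaction({"id": 2, "amount": -15})
    time.sleep(6)

The mainframe stays the system of record; the new system just smooths the traffic - which is exactly how the additional cruft accumulates.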
Outsourcing
Most if not all of the big banks, airlines, etc. have outsourced considerable parts of their operations over the years. In doing so, institutional knowledge was shifted out of the business and into those outsourcers. The outsourcers in turn have little incentive to drive transformation of the mainframe, given that a move to cloud sees the revenue they derive from infrastructure management go to near zero. The outsourcers don't even have to act in bad faith for this to be a major problem. McKinsey and the rest thrive on complexity, and by advising clients to outsource, they layered organizational and contractual complexity on top of the technology complexity, making the problem of transformation increasingly irreducible.
After risk, outsourcing is probably the most important factor, since it is extremely difficult to create outsourced structures that maintain and develop an organic link between those responsible for business processes and those responsible for technology. The result is an ever-growing pile of sclerotic processes, dysfunctional governance bodies and uni-functional teams (often themselves outsourced to different parties for competitive purposes) that purport to control but really just create complexity.
Outsourcing has served to worsen the organizational complexity that most mainframe users already suffered from. The result is a situation in which any programme of work to get off the mainframe becomes fearsomely complex. I've worked in places which would hold regular meetings of large parts of the company to try to coordinate major business process change in a single area. I've seen companies nearly break themselves trying to bring a single outsourced business function back in house. The question is why, when they're so incredibly inefficient and inflexible, they aren't competed away. That's a different question, for which I have my own opinions, but this comment is too long already.
Knowledge
The loss of COBOL and other mainframe technology knowledge is real. I remember working at a bank in the EU around 2010 where I sat with a bunch of elderly gentlemen (walking sticks were a theme) who had been contracted back into the bank to develop integration between an ancient mainframe application and something modern the bank was building.
But that stereotype aside (there are surprising numbers of younger mainframe experts in India thanks to outsourcing), the problem is real, particularly when it comes to migration of software from mainframe to cloud using modern development practices. Any migration away from mainframe software requires understanding the whole technology stack and more importantly, how that stack interacts with the equally complex stack of business processes.
AI code interpretation and generation might take a COBOL program and translate it into modern code, or even help re-architect it using modern principles. But without that understanding of the business processes, as well as the upstream and downstream dependencies in their many forms, anything other than piecemeal change looks terrifying to owners.
IBM
The fact is that mainframe is an effective technology stack. But more importantly, IBM has become extremely good at keeping it up to date.
They're also good at making sure they control the path away from mainframe. The best, simplest and lowest-risk approaches to getting off legacy code on mainframe are either developed by or bought by IBM. By enabling Linux on mainframe and providing straightforward migration paths from legacy code to that platform, IBM (and its many partners) ensures that modernization of mainframe for the most part means staying on mainframe. This has gone through multiple phases and taken many forms over the years, but really, IBM has done a stupendous job of ensuring that the future of mainframe is always mainframe.
The advent of AI code interpretation and generation is another example of this. IBM has already announced their own AI tooling to help customers make the migration to mainframe Linux faster and smoother: https://newsroom.ibm.com/2023-08-22-IBM-Unveils-watsonx-Gene....
The challenge for any AI startup or professional services company wanting to help customers move away from mainframe is that the company best placed to sell those tools is... IBM.
Might the situation change?
AI code interpretation and generation is getting better all the time. LLM context sizes are growing rapidly. The possibility of fine-tuning a code-generation model on a business's own source code is there. It's even possible that businesses who no longer have source code can use AI to analyze and decompose binaries. The days when AI can analyze a whole software infrastructure, re-architect it and re-write it are coming. But even with those tools, the organizational layering, process cruft and generalized loss of institutional knowledge are going to make elimination of the mainframe a long-term project.
This is not to say that it won't happen. But technology change can only ever happen successfully at the fastest rate an organization is able to change along with it. The organizations which still use mainframe tend to be the biggest, most complex and sclerotic organizations on the planet. IBM is going to be enjoying the benefits of what it built decades ago for decades to come.
When you say "performance/cost reasons" do you mean "just the reasons that I am familiar with and am competent to judge myself", or do you mean "all of the plethora of reasons I am clueless about, and that only an experienced actuary, accountant, or project manager from the industry has the background to judge"? I bet you're just thinking of things from your own (likely very limited) perspective. Anybody who has run even a very small business by themselves quickly finds out that a lot of the costs are hidden, subtle, and not noticed by outsiders. Can you even imagine what sorts of surprise costs are involved in running a bank?! You may think buying a cloud database from Microsoft makes total sense, and has an obvious ROI. You may also have never run a business larger than mowing your neighbors' lawns.
(I don't know you. Maybe you are the CTO of an international telecom firm. Maybe I am rude to assume your background. That doesn't change my answer, though.)