HACKER Q&A
📣 ak_111

Why hasn't the cloud killed the mainframe?


I was surprised to see how few mainframes the Big Tech companies that handle huge transaction volumes (Netflix, Meta, Google, ...) use relative to legacy industries (banking, retail, insurance, ...).

This makes me suspect that the real reasons mainframes continue to exist are industry inertia, vendor lock-in, or even legacy code rather than any performance/cost advantage.


  👤 taubek Accepted Answer ✓
I would say "vendor lock-in or even legacy code". It seems to me that the systems that run on mainframes have business continuity and stability as their main goals. If it works, there is no sense in migrating. I think that for systems that use mainframes money is not the issue, and on top of that the migration costs would be huge.

👤 Ekaros
Also, if you are in banking or insurance and so on, do you really want to put your trust in someone else? Cloud outages are not uncommon, but what if there is a catastrophic loss of data? Or their security systems break and everything, or even just the last day, gets wiped?

Not that legacy will do better, but keeping your fate in your own hands does feel better. And these are, at this point, pretty proven systems; if they were going to fail, they most likely would have done so already.


👤 svennek
First of all, I find it kinda funny that you call banking, retail and insurance "legacy industries".

I would rather be without Netflix and Google than without banking and food... but to each their own.

While some of it is inertia (mostly because rewriting truly large applications is hard and expensive), there is also the point that most of those industries cannot easily handle "eventually consistent" data.

Not all transactions are created equal; the hardest usually come with a set of requirements called ACID.

ACID in the classic RDBMS is not a random choice, but driven by real requirements of its users (the database users in the business sense, i.e. applications, not the users as people). The ACID properties are REALLY hard to deliver at scale in a distributed system with high throughput. Think of the rate of transactions in the bitcoin system (500k/day, roughly 6 per second, with many, many "servers") vs. Visa (500M+/day, nearly 6,000 per second) - the latter is basically driven by two (!) large mainframes roughly 50 km apart, the last I heard of any technical details.

None of the companies you mention need strict ACID, as nobody will complain if different users see slightly different truths - hence scaling writes is fairly easy.
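
To make the ACID point concrete, here is a toy sketch (my own illustration, with SQLite standing in for the real thing - nothing like what Visa or a bank actually runs) of what atomicity buys you on a transfer: either both balances change or neither does.

    import sqlite3

    # Toy ledger: two accounts, money moves between them atomically.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    db.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    db.commit()

    def transfer(src, dst, amount):
        try:
            with db:  # one transaction: commits on success, rolls back on any error
                db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
                (balance,) = db.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
                if balance < 0:  # consistency rule: no overdrafts
                    raise ValueError("insufficient funds")
                db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        except ValueError:
            pass  # rollback already happened; neither balance changed

    transfer("alice", "bob", 60)  # succeeds -> alice 40, bob 60
    transfer("alice", "bob", 60)  # fails    -> still 40 / 60, never 40 / 120 or -20 / 120
    print(db.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())

A single machine makes this trivial; doing the same thing across hundreds of nodes, at thousands of transactions per second, without ever showing anyone a half-finished transfer, is where the hard part lies.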


👤 tuatoru
The risk analysis for conversion projects comes out with a negative result every time.

You spend a lot of money, and the best case is you end up with the same services as before.


👤 ch_123
The mainframe ecosystem (both hardware and software) is sufficiently different from modern commodity hardware that ports of large applications off the mainframe are highly risky or impossible. For organizations which adopted the mainframe way back when, the cost of staying on the mainframe is often lower than the cost of migrating to something else. In many cases, a migration is likely to mean rebuilding a decades-old codebase from scratch.

👤 loup-vaillant
Or perhaps some industries would rather have servers on premises than trust an external provider? Could be part of the inertia (mainframe being the old way of doing things), but if there is any safety, confidentiality, or trust issue, relying on servers you don’t own may be frowned upon.

👤 atemerev
A mainframe is your own on-premises AWS.

(IBM should send me money for this tagline).

Seriously though, there are businesses where sending the data to a third party is absolutely impossible. You can spend a lot of money building your own private cloud... or, you can buy a mainframe, which will be cheaper, and if your industry is mainframe-friendly, there will be a lot of support and accumulated experience.


👤 deafpolygon
because cloud is the mainframe

👤 hardware2win
Mainframes are really impressive

It feels like the non-mainframe world decided not to put effort into hardware reliability and tries to fix it at the architecture and software level instead, which is kinda sad.


👤 INTPenis
I don't even work in banking or insurance, but we still have regulations and certifications that require us to have on-prem level of security.

👤 supermatt
Your statement and your question are different things. It's also worth pointing out that a mainframe can mean different things. Here I will use the definition of a large central machine (or cluster) designed for HPC.

> Why hasn't the cloud killed the mainframe?

A mainframe is typically a very powerful machine/cluster. You may be able to get those in the cloud (although I doubt you can get one with 40TB of RAM, for example), but why pay a premium to rent instead of buy?

> Why do newer companies not use mainframes?

Some do, but for your examples it's because they have been engineered to scale horizontally - i.e. using lots of less powerful machines.

> Are mainframes used because of lock-in?

In some cases, sure. But there are many cases where big powerful machines are needed.


👤 jeroenhd
Mainframes are more efficient at some tasks. Most tasks aren't particularly suited for mainframes, but the ones that are become a lot more expensive if you try to put stuff into the cloud.

You're also often dealing with very sensitive information. Zenbleed style attacks are an unacceptable risk, so you'll need separate hardware from the rest of the cloud anyway. Maybe you can find a data center that's reliable and secure enough to put all of that sensitive data, but there's a good chance you'll be wiring up your basement with fiber optics if you're dealing with finance.

The cloud, as in "other people's computers", is horrifically expensive. If you're going to set up mass throughput systems, you'd better start your own data center. This is expensive as well, but it's not impossible.

If you want scalability to reduce power consumption and deal with burst workloads, you'll have to separate out your processing systems from your storage systems. Your average data center probably runs a lot of iSCSI or similar remote disk tech, possibly based on some kind of software wrapper at the cost of latency and performance but with the benefit of quickly swapping drives and expanding capacity.

Then you'll have to architect your software, of course; if you're doing batch work, you'll probably want to distribute programs in batches and run similar programs on similar chips, making optimal use of data locality and CPU cache. Maybe do the whole Hadoop thing depending on your workload.

You'll also want to figure out maintenance. You can hire a team of your own techs dealing with replacements or upgrades, but in many cases hiring external talent for a limited amount of time per month is probably cheaper and saves you the effort of keeping your workers trained.

When a machine fails and the SKU you've selected has gone out of production you'll need to figure out a replacement. This doesn't have to be a problem, but servers are fickle things and you'll probably want to use something that works with the other server vendors' tools, so you're stuck buying hardware from a limited number of suppliers in a limited number of configurations.

For management, you'll want to pretend all the computers are part of the same system. Whether you pick Kubernetes or OpenStack, you want central control to save you the headache of a million dashboards.

When you're done, you've built yourself a mainframe, except you've spent your own money on R&D. Have you saved money compared to buying overpriced mainframe hardware? Tough to say; mainframe hardware is often better at its job than normal servers. You'll save money on developers, as you no longer need to keep the old COBOL around, but by the time the decade-long project to rewrite the backend in another language is finally done, you've probably paid more than you've saved.

Mainframes are just computers good at batch jobs. They're not better or worse than regular computers, they're just different.


👤 anovikov
It is simply about a wholly different level of reliability that mainframes provide.

They are not at all trash, or a product of corruption or nepotism, as many tend to think. That stuff works all the time, and has since ~1960, while we serfs spend our lives fixing bugs resulting from never-ending updates in the hodgepodge of javascript libraries our "efficient", "cheap", "FOSS-based" products are made of.

The rest of us don't use mainframes because the marginal or downright dodgy business cases of our products simply won't pay for them, so we are stuck in this race to the bottom.


👤 defrost
Projects like the Square Kilometre Array (SKA) are best served by having their own dedicated quiver of mainframes coupled with dedicated instrument-to-supercomputing-centre fibre bundles.

I used the phrase "quiver of mainframes" because the compute tasks are best served by a cluster of architectures - dedicated RAM in the tens of terabytes, dedicated fast local storage in the petabytes, large clusters of compute nodes optimised for pipelined throughput, others optimised for hyper-cubed deep-mesh computations, yet other architectures purely for graphical representations, etc.

See: https://pawsey.org.au/supercomputing/


👤 lbriner
I'm not sure the question is quite right.

Google and Amazon don't use "the cloud" in the normal sense of a third-party public offering. They own the infrastructure, which is also used by other customers as a cloud, so in one sense it is a private cloud, but it is closer to on-prem than anything else.

Many other large businesses run on-prem instead of "the cloud" because the cost savings of an op-ex cloud system start to diminish when you already have your own infrastructure/networks/specialist staff, as these large businesses do. Again, this is often a mix of traditional on-prem infrastructure and private clouds that offer some sharing of resources to applications that do not need an entire physical server.

Netflix, I think, is a mixture of on-prem and public cloud, but I'm not sure.

So I'm not sure if you are asking "why mainframes instead of distributed e.g. microservices systems" or "why on-prem instead of using the cloud".


👤 lz400
I think part of the answer is that it's because of the software, not the hardware. The hard lock-in in the banking/insurance industry is to things like COBOL/DB2. They happen to run on mainframes, but that's not where most of the migration cost would be. Cloud providers afaik don't provide support for this type of software stack, and migrating huge transaction-processing backends to other stacks is considered prohibitive from what I've heard.

👤 candyman
The maintenance budget for the central administrative systems at IBM in 1985 was $4B (about $9.6B in current dollars). The idea of moving something like this to the cloud is hard to comprehend. Even if it could be done, it would take years and cost maybe $100B. And what would be the ROI versus building new applications or businesses? These are business-critical systems built to run in a specific environment; it is very hard to justify moving them to another platform.

👤 debok
One of the new banks in my country (founded in 2018) decided to use a mainframe for all their core functionality. They actually have a statement about it on their website:

"""

Many factors from cost to regulatory requirements to cryptography and other security requirements play a role in making a decision to run a core banking platform in the cloud. Whilst the cloud is good for managing certain services (which we indeed use), it becomes a challenge to manage a bank’s core banking platform.

Once infrastructure is in a cloud, it is outsourced to that cloud provider. That means that the business is bound by those agreements covering aspects such as scalability, usage, capacity-on-demand and disaster recovery. These costs could grow exponentially as volumes grow. By using our own infrastructure for our core banking platform, we have full control over these factors.

Our chosen mainframe solution provides the ability for us to grow exponentially, while controlling all factors (CPU, Memory, Disk, Network, DR, Remote capability, etc) allowing us to manage the environment efficiently and effectively.

"""


👤 jll29
The cloud is great for elasticity, so in scenarios where compute or storage demands can surge extremely and unexpectedly/at short notice, that is core "cloud territory".

When you have reliability/continuity as a top business requirement and ACID transactions (e.g. billing) must be processed at scale, then mainframes shine.

The argument "never change a running system" is not a wrong one, but it does not on its own explain the existence of mainframes; there are non-legacy scenarios where using a mainframe is the most reasonable choice. Finally, "cloud" as a term denoting outsourced compute/storage capacity can also apply to mainframes, see e.g. IBM's pricing brief at https://www.ibm.com/downloads/cas/YM94KV6N - some own their mainframe, some rent it, like a private or public (internal or external) cloud.


👤 benjaminwootton
The main reason is almost certainly legacy code and integrations which are hard, risky and most of all expensive to change.

No business in their right mind would want to be on out-of-date, proprietary technology with hard-to-source skills, but the cost and effort of migration is enormous. There are scare stories of SAP migrations costing billions, and I assume a mainframe migration could be multiples of this.

What would be interesting is if they let actual enlightened techies size up and run these projects, rather than giving to Accenture and the like. Maybe it wouldn’t be insurmountable then?


👤 tannhaeuser
That question can't be answered without defining "the" cloud (a marketing term if anything) and "the" mainframe (there can be multiple). It's conceivable that IBM rents out z/OS machine capacity as a "cloud service".

👤 dusted
I can't think of any valid arguments for moving anything important away from a well understood and proven-reliable system.

👤 doingtheiroming
Mainframe will be with us for a long time yet. The reasons are many and complex.

Lack of impetus

There are plenty of neobanks out there that have built scalable, secure infrastructures using modern development practices in the cloud. But despite growing rapidly, none of them has the scale to challenge the big banks. Similarly, in areas like the airline industry, the big airlines are all old, and back-end technology is rarely the deciding factor in whether one airline is more efficient than another. There simply isn't as strong an impetus to change as you'd expect.

Risk

There is very little incentive for any given executive at one of these firms to take the risk involved in staking their reputation on a big technology migration. Equally, there's nothing quite like a failed transformation project to destroy the careers of those associated with it. If you think the mainframe is antediluvian, take a look at the ERP software the same companies are running. Layer upon layer of legacy, with custom code built to manage myriad edge cases that nobody understands anymore. Why take the risk when you can build a new system that integrates with the mainframe using (for example) a modern database that gives you a modern transactional API while micro-batching updates back to the mainframe? The incentives all favour creating additional cruft.

Outsourcing

Most if not all of the big banks, airlines, etc. have outsourced considerable parts of their operations over the years. In doing so, they shifted institutional knowledge out of the business and into those outsourcers. The outsourcers in turn have little incentive to drive transformation of the mainframe, given that a move to the cloud sees the revenue they derive from infrastructure management go to near zero. The outsourcers don't even have to act in bad faith for this to be a major problem. McKinsey and the rest thrive on complexity, and by advising clients to outsource, they layered organizational and contractual complexity on top of the technology complexity, making the problem of transformation increasingly irreducible.

After risk, outsourcing is probably the most important factor, since it is extremely difficult to create outsourced structures which maintain and develop an organic link between those responsible for business processes and those responsible for technology. The result is an ever-growing pile of sclerotic processes, dysfunctional governance bodies and uni-functional teams (often themselves outsourced to different parties for competitive purposes) that purport to control but really just create complexity.

Outsourcing has served to worsen the organizational complexity that most mainframe users already suffered from. The result is a situation in which any programme of work to get off the mainframe becomes fearsomely complex. I've worked in places which would hold regular meetings of large parts of the company to try to coordinate major business process change in a single area. I've seen companies nearly break themselves trying to bring a single outsourced business function back in house. The question is why, when they're so incredibly inefficient and inflexible, they aren't competed away. That's a different question, for which I have my own opinions, but this comment is too long already.

Knowledge

The loss of COBOL and other mainframe technology knowledge is real. I remember working at a bank in the EU around 2010 where I sat with a bunch of elderly gentlemen (walking sticks were a theme) who had been contracted back into the bank to develop integration between an ancient mainframe application and something modern the bank was building.

But that stereotype aside (there are surprising numbers of younger mainframe experts in India thanks to outsourcing), the problem is real, particularly when it comes to migration of software from mainframe to cloud using modern development practices. Any migration away from mainframe software requires understanding the whole technology stack and more importantly, how that stack interacts with the equally complex stack of business processes.

AI code interpretation and generation might take a COBOL program and translate it into modern code, or even help re-architect it using modern principles. But without that understanding of the business processes as well as the up and downstream dependencies in their many forms, anything other than piecemeal change looks terrifying to owners.

IBM

The fact is that the mainframe is an effective technology stack. But more importantly, IBM has become extremely good at keeping it up to date.

They're also good at making sure they control the path away from the mainframe. The best, simplest and lowest-risk approaches to getting off legacy mainframe code are either developed by or bought by IBM. By enabling Linux on the mainframe and providing straightforward migration paths from legacy code to that platform, IBM (and its many partners) ensures that modernization of the mainframe for the most part means staying on the mainframe. This has gone through multiple phases and taken lots of forms over the years, but really, IBM has done a stupendous job of ensuring that the future of the mainframe is always the mainframe.

The advent of AI code interpretation and generation is another example of this. IBM has already announced their own AI tooling to help customers make the migration to mainframe Linux faster and smoother: https://newsroom.ibm.com/2023-08-22-IBM-Unveils-watsonx-Gene....

The challenge for any AI startup or professional services company wanting to help customers move away from mainframe is that the company best placed to sell those tools is... IBM.

Might the situation change?

AI code interpretation and generation is getting better all the time. LLM context sizes are growing rapidly. The possibility of fine-tuning a code-generation model on a business's own source code is there. It's even possible that businesses who no longer have source code can use AI to analyze and decompose binaries. The days when AI can analyze a whole software infrastructure, re-architect it and re-write it are coming. But even with those tools, the organizational layering, process cruft and generalized loss of institutional knowledge are going to make elimination of the mainframe a long-term project.

This is not to say that it won't happen. But technology change can only ever happen successfully at the fastest rate an organization is able to change along with it. The organizations which still use mainframe tend to be the biggest, most complex and sclerotic organizations on the planet. IBM is going to be enjoying the benefits of what it built decades ago for decades to come.


👤 nunez
Decades of business logic reside in the mainframe, and the risk of moving that onto commodity hardware runs into the billions (maybe trillions?).

👤 JohnDeHope
Different industries have different amounts of willingness to endure foolishness. For at least the first 10 years, if not the first 20, any new technology is mostly foolishness. Some technology never even makes it out of that timeframe, and just disappears before then. It takes a long time to figure out which parts of a new technology are actually useful, and which parts are just foolishness. A small SaaS startup has a relatively high tolerance for foolishness. A giant bank with billions of dollars under management has close to zero tolerance.

When you say "performance/cost reasons" do you mean "just the reasons that I am familiar with and am competent to judge myself", or do you mean "all of the plethora of reasons I am clueless about, and that only an experienced actuary, accountant, or project manager from the industry has the background to judge"? I bet you're just thinking of things from your own (likely very limited) perspective. Anybody who has run even a very small business by themselves quickly finds out that a lot of the costs are hidden, subtle, and not noticed by outsiders. Can you even imagine what sorts of surprise costs are involved in running a bank?! You may think buying a cloud database from Microsoft makes total sense, and has an obvious ROI. You may also have never run a business larger than mowing your neighbors' lawns.

(I don't know you. Maybe you are the CTO of an international telecom firm. Maybe I am rude to assume your background. That doesn't change my answer, though.)