HACKER Q&A
📣 c0mptonFP

Why are distributed systems so polarizing?


There is an odd gatekeeping duality on tech forums:

1. You're not a proper engineer if you can't write highly available software that scales to infinity. Real engineers write production-grade, robust, fault-tolerant, scalable, highly available, observable, mission-critical systems.

2. No one actually needs large distributed systems, you're not Google, stop trying to build them. One server plus a backup is enough. Everything else is overkill, complexity, resume-driven engineering. I can handle 50k RPS with one beefy bare-metal machine, written in Rust. Unless you have 10 million customers, which 99.9% of companies don't.

I'm not sure how to feel about this.


  👤 jyounker Accepted Answer ✓
Those seem like extreme positions. Reality is more like this:

1) Real engineers write systems that accomplish the organization's goals.

2) Most people don't need to write large distributed systems, but they will end up writing small distributed systems.

3) Small distributed systems can be surprisingly complicated.


👤 pclmulqdq
The tension comes from our addiction to FAANG-style software engineering.

You can't be a Google engineer without thinking about scale-out. However, Google et al. practice a different kind of engineering than most other companies: they impose tons of requirements that make engineering hard, but they also have tons of tools and libraries that make scale easy. In those environments, it makes sense to make even the most trivial systems horizontally scalable.

Conversely, if you do not have Google's or Amazon's distributed system components, and you don't have access to expertise with those tools, scale-out is likely the hardest problem to solve. GCP, AWS and others know this, so they charge you a lot to solve the scale-out problem for you (using their internal tools).

This is the source of the tension. FAANG-style engineering is what pads your resume (and establishes you as a "smart engineer" in the eyes of people who want to work at FAANGs), and simpler systems get things done until you absolutely need to scale out.


👤 didgetmaster
It is my opinion that too many programmers jump to scale-out systems too soon, before making sure that every node is as optimized as it could be.

Data will always outgrow the hardware's ability to process it in a timely manner. Parallel systems are critical for handling database tables with billions of rows, file systems with hundreds of millions of files, and NoSQL stores with an ever-increasing number of KV pairs or documents. So a distributed system becomes necessary at some point.

The problem comes when the threshold for scaling out is set too low. A process gets too slow, and instead of optimizing the code, teams immediately try breaking it up and distributing it. So instead of a dozen servers handling a big problem, the algorithms are inefficient enough that it takes 100 servers or more to solve the same problem in a reasonable amount of time.
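
To make that concrete, here's a toy Python sketch (purely illustrative, not from any real system): the same question answered two ways, one of which "needs" a cluster long before the other.

    from collections import Counter

    def dup_count_quadratic(items):
        # O(n^2): fine in a demo, and the reason a billion rows
        # suddenly "needs" 100 servers
        return sum(1 for i, a in enumerate(items) if a in items[:i])

    def dup_count_linear(items):
        # O(n): same answer, and often the difference between
        # a cluster-sized problem and a one-box problem
        return sum(c - 1 for c in Counter(items).values())

    data = ["a", "b", "a", "c", "a"]
    assert dup_count_quadratic(data) == dup_count_linear(data) == 2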

I am working on a distributed data system, https://didgets.com/, that handles all kinds of data. I have focused on making sure that every node can process large amounts of data efficiently. Everything is so much faster when you don't have to coordinate between too many pieces and more of the data sits close to the CPU processing it.


👤 kjeetgill
I think you should stop reading tech forums as if they're supposed to converge on some sort of coherent consensus and form a singular voice. It's just thousands of opinions partially dictated by who was bored when something was posted.

Read and form your own coherent opinions, but stop asking whole forums to.

Edit: I also want to jump in again and pooh-pooh this redditized style of askhn.


👤 stuckinhell
IMO, it's polarizing because the decisions are political.

Distributed systems are harder to reason about, so people have different feelings about the maintenance burden. Things get complicated around career goals and personal goals like resume-driven development.

The odd gatekeeping duality comes from giant corporations dominating the public discussion while smaller firms and developers push back.

The truth is more nuanced, though. Most places don't need or have high-quality, highly available software, and can STILL make millions of dollars a year. Banks and critical infrastructure SHOULD have very scalable software in key places. I know for a fact that many investment banks have tons of good scalable software, and tons of absolute shit software, depending on where you look.


👤 wayne-li2
You’ve presented two extremes here, with no indication of timescale either.

For example, starter projects in general should lean towards option 2, for obvious reasons. But as they grow, they naturally become distributed. You’re right that very few companies need hundreds of servers and infinite scale, but many companies need “2” scale, right?

Also, Rust is a fairly new language. The majority of companies out there are on slower languages. Are you asking them to fire all the Python people and bring in Rust developers to rewrite everything?

The issue is so complex and nuanced that discussing it without context and detail seems pointless. To be honest, if anyone holds any of the above opinions in real life, I’m probably just going to smile, nod, and move on.


👤 h2odragon
"forest or trees". Obviously theres differences of opinion resulting from different needs.

"gate keeping" is probably more "argument as sport" indulgence than actual passion; I expect few of the discussions you're summarizing have enough detail for anyone to say they're advising on a specific solution.

Put another way: though I'm an advocate of "use what you've got and keep it under your thumb," if the situation actually called for it, I wouldn't consider it a bad thing to implement a solution using cloud or CDN services either. I will say I'd try to limit the need for them, so that the system could run alone, but the first goal is that the system runs at the level it needs to solve the problem, not that it satisfies abstract design notions.

> I'm not sure how to feel about this.

You don't have to "feel" about it. People disagree. When you face a problem that you are trying to solve, their passionate writing may aid you in finding your solution, but that's worth the same gratitude you give to the rest of the world that offers you knowledge.

I suggest you take a diversity of opinion and vigorous debate on a subject as a reason for joy: vicious rants can be fun to read and the only certainty in dogma is that it will be boring.


👤 lliamander
One of the mistakes I see in discussions about architecture is assuming that because something is scalable, scalability must be the only (or even primary) reason for choosing that approach.

I use DynamoDB, not because it is scalable, but because it's a simpler developer experience that covers my rather simple query use cases.
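
As a rough sketch of what I mean (boto3; the table and attribute names are made up for illustration), my access pattern is basically just this:

    import boto3

    # Hypothetical table keyed on a single "user_id" partition key;
    # names here are made up for illustration.
    table = boto3.resource("dynamodb").Table("users")

    # My entire "query layer": put an item, get it back by key.
    table.put_item(Item={"user_id": "42", "name": "Ada", "plan": "free"})
    item = table.get_item(Key={"user_id": "42"}).get("Item")
    print(item)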

Many teams adopt a multi-service architecture (not always "micro" services) not because of scalability, but because it allows multiple development teams to work and release separately (most of the time).

Don't let arguments of scalability and performance distract you from other considerations.


👤 mritun
There are a bunch of reasons, but here’s the one topping my list:

1. Heroes!

One man’s hero is another man’s jerk. A single box becomes a single point of failure. Once the hero gets pulled out of bed one too many times because the one box is down due to a “simple” change that the “stupid other guy who’s not a real coder” pushed, the hero decides he has a duty to protect the tin box with his blood.

It gets messy real fast. Soon you get a tin box that nobody except our “Hero” can touch. The hero has all the passwords and decides where the backups are kept (you don’t want the lesser people messing them up, do you?), the hero has the encryption keys, and the hero is the only person who knows what to do when the tin box goes up in smoke.

Any org worth its salt does not want that hero or their one magic tin box!

TL;DR: Organizations are willing to pay for the slow distributed system that costs 10x to run in order to avoid the really shitty people problems that often come with a magic all-powerful single machine.

(edit: tldr)


👤 Kranar
People in general notice and give far more attention to extreme opinions than to nuanced ones. The vast majority of comments on distributed systems are nothing like what you mention, but they don't receive the same level of attention.

👤 Veuxdo
I think because it disempowers programmers and hands that power to "architects", SREs, system administrators, etc. Decisions that could have been made and enforced through code must instead be coordinated across multiple systems.

👤 ozim
It's not even about this specific topic; it's like this across most of tech.

It's ORM vs. writing SQL queries by hand, tabs vs. spaces, etc.

How to feel about this?

All of these are worthless, water-treading arguments. You can simply ignore that stuff and not waste your time forming a feeling about it; any opinion on a forum or in a blog post is just an opinion. Unless you really know the background of the person stating it and are absolutely sure they're an expert in the matter, just ignore it.

People love to extrapolate from their own experience and think they have all the answers; they don't understand how big the world really is, or how many different companies and use cases are out there.


👤 jmfldn
Microservices - which are related to what we're talking about - have many benefits when done right. Teams can deploy independently, and interactions between teams become a matter of API contracts, or data relationships if we're talking async systems / message passing. Sure, there's a whole lot of other complexity (technical and organisational) that can arise here, but, scale aside, it is often about solving the problem of many people working on a big system. A single monolith is hard to work on if your business requires continuous-deployment-style workflows, for example.

All that said, I'm a big fan of single applications where you don't need more. E.g., single teams should often strive to build single apps with well-defined boundaries and interfaces rather than putting those modules into their own services. You have to ask yourself why you're doing it. There are many reasons to do so, but don't cargo-cult your team's domain into twenty small services just for the sake of it.

As with everything in software engineering, whether you should do something is answered by "it depends". What the OP is alluding to doesn't seem to reflect the nuanced reality out there, or what many really think.


👤 tetha
I think both of these miss the fundamental questions: what do the business and the product teams need, and what is the business willing to pay for it?

We have a whole bunch of simple systems with very lax SLAs around. Those just run as a single container in the orchestration, because upon failure they restart with a minute of downtime or five, and that's available enough. For these systems it wouldn't make sense to even think about HA Postgres clusters, for example. It would be entirely fine to have two Postgres instances with replication and alerting, so an admin can trigger a failover as needed.
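
As a rough sketch of the kind of alerting I mean (Python with psycopg2 against a hypothetical primary; pg_stat_replication is standard PostgreSQL, and the alert() hook is a made-up stand-in for real paging):

    import psycopg2

    def alert(msg):
        print("PAGE:", msg)  # stub for a real paging hook

    # DSN is illustrative; replay_lag needs PostgreSQL 10+.
    conn = psycopg2.connect("host=pg-primary.internal dbname=postgres")
    with conn.cursor() as cur:
        # One row per attached standby.
        cur.execute("SELECT client_addr, state, replay_lag"
                    " FROM pg_stat_replication")
        standbys = cur.fetchall()

    if not standbys:
        alert("no standby attached to the primary")
    for addr, state, lag in standbys:
        # psycopg2 returns replay_lag as a datetime.timedelta
        if state != "streaming" or (lag and lag.total_seconds() > 60):
            alert(f"standby {addr}: state={state}, lag={lag}")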

However, we also have systems with rather strict and ambitious SLAs, and we're being paid well for the availability of those systems. At that point the business decisions start to accrue: the company considers self-hosting a selling point, we have ambitious SLAs, and a lot of products have settled on Postgres.

At that point, it makes sense for the company to have 1-2 engineers focusing on an HA Postgres setup and a couple more engineers who can handle it during an on-call incident. It took us about two years and a lot of head-scratching to get where we are (which would be an unacceptable investment of time and money for a startup), but now these rock-solid database clusters are turning into an actual asset for sales and product development.

This has been my learning over the last 2-4 years: you kinda have to do the smallest and simplest thing that makes the business requirement work. Often a simple single-node solution, or a two-node setup with some defined manual emergency handling, works surprisingly well, and you don't need a fully scalable auto-failover setup. Other times, big requirements require the big hammer, but that one doesn't come cheap.


👤 nevinera
You're calling it 'gatekeeping', but in my experience the latter position is mostly an expression of _frustration_. Dealing with the fallout from dozens of different decisions by engineers and directors to build "for scale" without the proper understanding of the costs involved in doing so makes many of us react with exasperation to suggestions in that vein.

We rarely have to convince someone that they _do_ need scalable highly available software - mouthing the idea from across the room is generally sufficient to make a director sign off on such a plan. It's convincing people that the costs those approaches inflict (which can be very difficult to explain, even to other engineers) are _not yet warranted_ that tends to be hard.


👤 samsquire
Would be better if people didn't put down other developers to feel better about themselves.

👤 yen223
I think two things are true:

1. There's a hard limit to the rate of computation you can do within a single instance. Anything beyond that will require distribution, and you as an engineer have to be able to handle that.

2. That limit is very high nowadays, and distributed systems have very high overheads. Most systems that are distributed don't actually need to be, and are paying the overhead costs unnecessarily.


👤 prewett
Having come from a time when loading a small JPEG required a progress bar, it seems like one beefy server machine ought to be able to handle thousands of requests per second. This seems like it should fit most companies' needs well past a hundred million in revenue:

1000 req/sec * 6 hours of daytime in one market * 3600 sec/hour * 365 days ≈ 7.9 billion requests, so $0.10 revenue/req would be roughly $790 million a year. Seems like even node.js could do 1000 req/sec unless the database bottleneck is large, let alone something like Go.
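
The same back-of-envelope in Python, with the assumptions spelled out (the $0.10/request figure is, again, made up):

    REQ_PER_SEC = 1_000
    HOURS_PER_DAY = 6        # daytime traffic in one market
    DAYS_PER_YEAR = 365
    REVENUE_PER_REQ = 0.10   # made-up illustrative figure

    requests = REQ_PER_SEC * HOURS_PER_DAY * 3600 * DAYS_PER_YEAR
    print(f"{requests:,} req/year")                    # 7,884,000,000
    print(f"${requests * REVENUE_PER_REQ:,.0f}/year")  # $788,400,000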

Of course, I might be wrong about this; these are just unverified estimates.

But another reason is that I, personally, don't want to manage computers; I want to do interesting things with code. As soon as you get a second computer, infrastructure starts becoming non-trivial. I am not interested in infrastructure, so I'd like to be sure I've maxed out one computer first.

Also, my set of problems that I'm interested in does not really require distributed systems.


👤 notacoward
This glosses over availability a bit too quickly. You can't have high availability without at least dipping your toe into distributed systems. (BTW, don't tell me about "single box" high-availability or fault-tolerant systems. I was there when they were created. They're distributed systems wrapped in tin, one with and one without extra circuitry to add complexity and cost.) A lot of people need high availability, including data availability, even if they don't need high scale.

There's a lot of jumping toward higher-than-necessary degrees of scalability, and lots of gatekeeping, but it's still true that for a lot of jobs "one beefy bare-metal machine" thinking just won't let you meet requirements, with or without a backup that has to be promoted manually.


👤 mikkergp
1. Reasonable disagreements on the definition of “simplicity”.

2. Neither option is perfect; some people have been burned by bad examples of each, and there are cases where people made the wrong decision. Three blind men and the elephant.

3. Normal human emotions in response to change.


👤 cgdub
No one needs to know about distributed systems until your company wants to send automated emails to customers and someone thinks all requests need retry logic.
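
A minimal sketch of why that combination bites, in Python (send_via_smtp and the in-memory set are made-up stand-ins for a real transport and a durable store): blind retries resend the email unless the send is made idempotent.

    import uuid

    sent = set()  # stand-in for a durable store (DB table, Redis, ...)

    def send_via_smtp(to, body):
        print(f"delivering {body!r} to {to}")  # stub for a real transport

    def send_email(to, body, idempotency_key):
        # Safe to retry: a repeated attempt with the same key is a no-op.
        if idempotency_key in sent:
            return
        send_via_smtp(to, body)
        sent.add(idempotency_key)

    # The caller mints the key once, before the first attempt, and
    # reuses it on every retry. (A crash between the send and the
    # set.add() can still double-send; a durable store narrows that.)
    key = str(uuid.uuid4())
    for attempt in range(3):
        try:
            send_email("user@example.com", "Your receipt", key)
            break
        except TimeoutError:
            pass  # retrying is now safe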

👤 syntheweave
Asking what kind of tech a problem actually needs probes at the "philosophy of computing".

For many everyday tasks with personal records or small business, paper records continue to work fine provided the scale isn't too large and only a few hands are in the pot. Slightly larger, and you jump over to the spreadsheet.

Once you involve custom development, you've moved from commodity solutions into the realm of architects making a bespoke design, and as in architecture, there's a strong desire to be a star and work on a monumental structure, not a shed.

But... there's a gap between the spreadsheet and Bigtable, where you start adding requirements for more "9's" of reliability, deep access-control policies, frontend dashboards, and the like.

These things aren't the information problem; they're the control-of-information problem. They don't follow the literal grain of the technology, but exist in an imagined universe where more and more power is consolidated into the hands of the system's owners.

That is the actual statement of purpose you have to make to justify a big tech kind of solution.

There are distributed systems that are not of that sort, the Internet itself among them. They exist, they have some value, they evolve and gain some complexity, but they don't naturally turn into platform monopolies.

And so the tension of "wanting to build distributed but being unable to justify it" is specific to the economic thrust of SV-style business and of companies trying to ape that model. They're charged with leveraging tech to grow faster and control more, so they have to invent it. If your business isn't that, you don't need it; but you can't conquer the world without it, so if you don't do it, you aren't playing for the real stakes. And that drives a certain kind of conflict in engineering orgs between the pure problem solvers and the power-hungry.

The only way out of thinking like that, really, is to let go and find balance. The people who are seriously happy with distributed-systems work will do it with no paycheck. For most other people, the spreadsheet, or at most a SQL database, is where it's at. For all the rest, it's the business-card scene in American Psycho; the technical demonstration is simply kayfabe for one's personal advancement.


👤 onekorg
I think it's a mix of several things at play:

1. Fast-growing organizations struggle to keep up with the communication overhead of rapidly onboarding new engineers. Most common open-source frameworks lack good interfaces for developing isolated components within the same project. In the short term it's easier to spin up a new project than to define and enforce interface and dependency boundaries.

2. Cloud providers and consultants are incentivized to propagate the myth that distributed systems are the best solution to every problem.

3. Engineers looking to grow are incentivized to adopt popular new tools. In particular, the less equity you have in the company, the greater your financial incentive to become an expert in a tool that's in high demand and land a higher-paying job elsewhere.

4. In my experience, very few engineers learn the fundamentals of computers and systems. Instead they follow "gurus" who tell them what the current "best practices" are. It's easier to feel you're doing a good job by making all your code comply with some style guide, or by building systems with an architecture discussed on some cloud provider's blog.

5. A VP of engineering I worked with told me in private that one reason we were adding a lot of distributed-systems components was so that we could sell ourselves to VCs as a tech company, rather than a tech-enabled business, in the next funding round. I doubt that VCs care about this, but it's telling that a VP of eng thinks it matters.

6. If you start breaking your monolith up into a distributed system, you won't feel the pain until you have several systems struggling to coordinate and keep data consistent. For the first few months or even years you'll see only the upsides of quicker iteration. That can be enough time for all the engineers who added the distributed systems to get promoted and leave for another job.

For companies growing quickly, or for large companies, I don't see how you can mitigate the communication overhead without adding distributed systems. They let different teams ignore each other for the most part and respond to the market quicker. It's often easier for teams to rebuild systems than to coordinate with a different team that has different incentives.

But for all other companies, I think people are adding distributed systems prematurely, because lots of individuals in the decision-making chain are incentivized to add them. Unless you have an experienced CTO who can enforce a sane policy, it's inevitable that someone will add a distributed system without understanding the nuances that come with it.


👤 amacneil
Moderate viewpoints don't get upvotes.

👤 PaulHoule
... there was that time I had a 3-machine Hadoop cluster at home that was highly effective for the graph processing I was doing.

👤 obviouslynotme
Number 2 is the factually correct position. Number 1 is the professionally correct position. Resume-driven development is the name of the game, and companies reward it.