HACKER Q&A
📣 margorczynski

How would you build a budget CPU compute cluster in 2023?


So lately I've been dabbling in a lot of stuff that requires a lot of CPU compute, 0 GPU and can be basically linearly scaled across threads and nodes in a cluster.

Now using only my own box is proving to be a bottleneck so I've been thinking of either using AWS spot instances or building my own mini-cluster (2-3 machines + switch) at home. Does it make sense to go cloud (even spot) when I would aim at high utilization?

As for the potential node spec:

- Ryzen 4500/5500 (seems best perf/$)

- Some mATX AM4 mobo with integrated GPU

- 2x8GB RAM

- mATX case, the smaller the better. ITX seems pricier.

All the box does is basically run the k8s pod(s).

WDYT?


  👤 phamilton4 Accepted Answer ✓
Honestly, look on eBay for used Lenovo ThinkCentre M/P "Tiny" or Dell Optiplex "Tiny" computers. Most of them can be bought by the lot as businesses get rid of their old ones.

I use 4 Lenovo M910x's as a Kubernetes cluster and home lab, all connected with a Netgear switch. The whole setup costs about the same as a single new quality workstation. Each has: i7 8700 (6c/12t), 32GB memory, 1TB SSDs, <1L case, and they're practically silent. Parts are easy to find, and they even use Lenovo laptop chargers. If one dies, I can easily purchase a replacement and swap it in within a few days.

You can even go cheaper if you don't need the absolute fastest CPUs. Some of these older tiny computers can be purchased for around 100 bucks if you look for them. It has worked like a charm for me. Not sure how much horsepower you really need, but this is a cheap way to build a home cluster. I think they hover around 40W most of the time, so power isn't really too big of a cost either.
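To put the 40W figure in context, here is a rough sketch of the yearly electricity bill for a four-node setup like the one above. The price per kWh is an assumed placeholder, not a number from this thread; substitute your local rate.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_energy_cost(nodes: int, watts_per_node: float, price_per_kwh: float) -> float:
    """Yearly electricity cost for nodes running 24/7 at a constant draw."""
    kwh_per_year = nodes * watts_per_node * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * price_per_kwh


# 4 nodes at roughly 40 W each, as described in the answer above.
# ASSUMPTION: 0.30 per kWh is a placeholder price, not a quoted one.
cost = annual_energy_cost(nodes=4, watts_per_node=40, price_per_kwh=0.30)
print(f"{cost:.2f} per year")
```

At a constant 40W per node that is about 1,400 kWh a year for the whole cluster. Sustained 100% CPU load will draw more than the idle-ish figure quoted, so treat this as a lower bound.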


👤 PragmaticPulp
> As for the potential node spec:

> - Ryzen 4500/5500 (seems best perf/$)

Price/performance is great for the CPU, but you have to spend hundreds of dollars on the motherboard, RAM, power supply, and case for each one.

You need to look at the overall system cost. If you’re building new, it could be cheaper overall to put 12-core or 16-core CPUs into a smaller number of machines than it would be to put a lot of $100 budget CPUs into many machines.

Unless your goal is to build a cluster for the sake of building a cluster, you might have a better price/performance ratio by building a single 16-core 7950X box than you would with three separate Ryzen 4500 or 5500 machines.

Even with perfect scaling, you would need at least 4 separate Ryzen 5500 machines to have a chance at beating a single 7950X for CPU-bound tasks. The 7950X CPU alone is barely more than 4X the cost of a Ryzen 5500, but you only need to buy one motherboard, one power supply, and so on.
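The whole-system argument can be sketched in a few lines. Every price below is a hypothetical placeholder, and the relative performance of 4.0 is taken loosely from the "at least 4 machines" estimate above; none of these are measured numbers.

```python
from dataclasses import dataclass


@dataclass
class Build:
    name: str
    cpu_price: float         # price of one CPU
    platform_price: float    # motherboard + RAM + PSU + case + storage, per machine
    machines: int            # number of identical machines
    perf_per_machine: float  # relative throughput of one machine

    @property
    def total_cost(self) -> float:
        return self.machines * (self.cpu_price + self.platform_price)

    @property
    def perf_per_dollar(self) -> float:
        return self.machines * self.perf_per_machine / self.total_cost


# ASSUMPTION: all prices are illustrative placeholders, not quotes.
small = Build("3x Ryzen 5500", cpu_price=100, platform_price=300,
              machines=3, perf_per_machine=1.0)
big = Build("1x 7950X", cpu_price=450, platform_price=450,
            machines=1, perf_per_machine=4.0)

for b in (small, big):
    print(f"{b.name}: total {b.total_cost:.0f}, perf/$ {b.perf_per_dollar:.5f}")
```

With these placeholder numbers the single box wins because the per-machine platform cost is paid once instead of three times. Plug in real local prices before drawing a conclusion.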


👤 3np
- Cases are overrated. You may not need them. Corkboard is underrated. Just make sure you keep your hardware clean from dust and debris.

- Have you found a mobo? Due to the existence of APUs (I assume), there aren't really many AM4 mobos with an iGPU. You have the ASRock Rack boards, which are great (I operate a few) but maybe over budget if you're on a shoestring. You may not need a GPU at all, but then you probably want an APU or dGPU on hand for troubleshooting and potentially flashing (not all mobos and firmware versions boot headlessly, from what I hear).

- General rule of thumb: if you're going to use something for prod, buy at least 2 of each. It's great to have an extra for experiments and you'll be grateful in case of hardware failure.

- in case you plan to run your host OS straight on the metal (as opposed to VMs): It's recommended to separate your control plane from your workers. Use Pis or similar for this; whatever you can find cheap.

- Rather than HN, I highly recommend you check out ServeTheHome (forums/blog/yt). Lots of great stuff there. The "tinyminimicro" (that would be the small dell/lenovo/HP units other commenters mention) and ali-router-board tracks can be worth considering as well. You should be able to get good ideas about switches here too, maybe even score something on the trade board if you live in US or EU.

- Screw AWS. You should be able to run the money numbers on that yourself.


👤 anonym29
Unixsurplus. Just like with cars, the cost savings of buying used are difficult to beat. Who cares if the hardware is a few years old if you're getting it 90% off original MSRP?

As an example, visiting the site now, the first thing I see is a box with 2x E5-2667 v2 (8c/16t, 3.3GHz base clock, 22nm). These were $2300 each when new. It also comes with 128 GB RAM, case, PSU, 1U rail kit, and two 500 GB SATA SSDs to partially fill its 10 caddies.

The entire thing is $260 + $65 shipping. You can't even get 16c/32t of 3.3ghz compute alone for that price these days, let alone a whole bootable system.

This entire system is about 7% of the price of those two CPUs when new, so you're getting at least 93% off MSRP there (in reality more, after subtracting the cost of the RAM, case/chassis, PSU, disks, etc).

Sure, 4x R5 5500 does give you a PassMark score of around 76k compared to the 24k you're going to get with 2 of those Xeons, but then again, you couldn't even buy four of those R5 5500 CPUs alone (let alone 4 cases, mobos, PSUs, coolers, HDDs, RAM, etc) for the cost of that system on Unixsurplus.

I am not affiliated with Unixsurplus and don't personally know anyone who is, but man do I love their store. It's the technology hardware implementation of "one person's trash is another person's treasure"
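The arithmetic in this comment can be checked in a few lines. All dollar figures and PassMark scores are the ones quoted above; nothing else is assumed.

```python
# Figures quoted in the comment above.
msrp_per_cpu = 2300      # USD, each E5-2667 v2 when new
cpu_count = 2
system_price = 260       # USD, whole used system
shipping = 65            # USD
xeon_passmark = 24_000   # approximate score for the dual-Xeon box
ryzen_passmark = 76_000  # approximate combined score for 4x R5 5500

total_paid = system_price + shipping
fraction_of_msrp = total_paid / (cpu_count * msrp_per_cpu)
passmark_per_dollar = xeon_passmark / total_paid
ryzen_speedup = ryzen_passmark / xeon_passmark

print(f"paid {total_paid} USD, {fraction_of_msrp:.1%} of the CPUs' original MSRP")
print(f"{passmark_per_dollar:.1f} PassMark points per dollar for the used box")
print(f"4x R5 5500 is about {ryzen_speedup:.1f}x faster in aggregate")
```

The 7% figure includes shipping; the four-Ryzen setup is roughly 3x faster in aggregate, so the comparison hinges on what four complete Ryzen systems would cost.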


👤 ericpauley
Cloud vs. on-prem basically comes down to your duration of use and your average-to-peak utilization ratio. For instance, if you run large parallel experiments 1 hour a day and your compute sits idle the other 23, that favors cloud. Fundamentally, on-prem you incur capital costs proportional to peak use, whereas cloud opex is proportional to average use.

Once you have your intended compute lifecycle figured out you can compute the cloud cost and hardware cost and compare. Given you’re mentioning k8s I’m assuming this might be a continuous load in which case you’d amortize your hardware capital costs much faster.
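A minimal sketch of that comparison, assuming a simplified model: owned hardware costs a fixed amount up front plus a flat monthly power bill, while cloud is billed only for busy hours. The function name and every number in the example are hypothetical.

```python
from typing import Optional


def breakeven_months(hardware_cost: float,
                     power_cost_per_month: float,
                     cloud_rate_per_hour: float,
                     busy_hours_per_month: float) -> Optional[float]:
    """Months until owning becomes cheaper than renting, or None if it never does."""
    cloud_per_month = cloud_rate_per_hour * busy_hours_per_month
    monthly_saving = cloud_per_month - power_cost_per_month
    if monthly_saving <= 0:
        return None  # cloud is cheaper every month, so the capex is never recovered
    return hardware_cost / monthly_saving


# ASSUMPTION: placeholder prices, not quotes from any provider.
bursty = breakeven_months(1500, 30, 0.40, busy_hours_per_month=30)       # ~1 hour/day
continuous = breakeven_months(1500, 30, 0.40, busy_hours_per_month=720)  # 24/7

print("1 hour/day:", bursty)
print("24/7:", continuous)
```

With these placeholders the bursty case never breaks even, while the continuous case pays off the hardware in under six months, which is the average-versus-peak effect described above.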


👤 tyingq
What are you trying to optimize for? Solely the initial cost of hardware? Power consumption? Is there some performance target, task that needs to complete in X minutes, etc?

For example, I might suggest buying used Lenovo Tiny M75q's on eBay. The Ryzen 3400GE is significantly slower than your Ryzen 4500, but also lower TDP and very cheap procurement cost. Also fits your "smaller the better" wish. No ECC, though.


👤 h2odragon
How long is this gonna be up and running? Cases may not be needed. If you're optimizing for cheap, then perhaps motherboards and CPUs of a couple generations back are a better bet. "Refurb" deals can save a lot of money there.

Of course "cheap" can cost too much: if you need reliability and want it to run first time after assembly, then it might pay to spend more.


👤 henrixh
There are a few more things to consider:

- Do you need shared storage? If so, how fast? Read or write heavy?

- Do you need performant interconnect? (For, say, MPI? Used IB cards are cheap on eBay.)

- Is your software limited by memory bandwidth? (If so, maybe go with more than 2 memory channels.)

I'd rent a few different configurations from Hetzner to benchmark before buying.

If you don't need more than a few nodes and you are not limited on memory bandwidth, you could consider a single, faster, node. But the sweetspot is probably consumer-grade Ryzen.

As for cloud, as long as you know you'll actually use everything you buy for a long enough time period, buying will be cheaper.


👤 DeathArrow
Get as many cores as you can; use dual socket boards. You can find 12 core Xeon E5-2670V3 for about $12 on Aliexpress and dual socket X79 boards for about $75. Used ECC RAM is cheap.

Build your own boxes.

You can use Kerrighed or OpenSSI for the software side.


👤 cturner
Graphics. You may not need to have a graphics card in nodes once they are installed. So you may find you can get a single low end card and use that to install each host.

Power and heat. Will you have enough power for the nodes? What is the power trade-off if you get low-end chips vs higher-end chips? Have a look at the Ryzen page on wikipedia to get a feel for power use of each chip. How will you understand how much cooling you need? (more cooling takes more power)

RAM. How much does accuracy matter? Should you use ECC RAM? You can get UDIMMs to work in Ryzen kit, but not with the chips with integrated graphics card (i.e. avoid APU chips if you want ECC). Get Asrock or Asus AM4 motherboards, then get RAM like this - Samsung M393A4K40DB3-CWE. If you go cloud, you may find the hardware has ECC.

IO. Once the grid-of-nodes is in place will you be moving data to functions, or functions to data? How much data are you moving over the network per-job? Might there be IO bottlenecks when you scale up? How detailed a model of IO can you build before you settle on hardware?
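One way to start on that IO model is a back-of-the-envelope check of transfer time against compute time per job. This is only a sketch: the 90% link efficiency and the example job sizes are assumptions, and real throughput also depends on disks, protocol overhead and contention.

```python
def transfer_seconds(data_gb: float, link_gbit_per_s: float,
                     efficiency: float = 0.9) -> float:
    """Seconds to move data_gb gigabytes over a link.

    efficiency is the assumed usable fraction of the nominal line rate.
    """
    bits = data_gb * 8e9
    return bits / (link_gbit_per_s * 1e9 * efficiency)


def io_bound(data_gb: float, compute_seconds: float, link_gbit_per_s: float) -> bool:
    """True if moving the data takes longer than computing on it."""
    return transfer_seconds(data_gb, link_gbit_per_s) > compute_seconds


# ASSUMPTION: a hypothetical job that reads 10 GB and computes for 60 s.
for link in (1, 10):
    t = transfer_seconds(10, link)
    print(f"{link:>2} GbE: {t:5.1f} s transfer, IO-bound: {io_bound(10, 60, link)}")
```

For this made-up job, gigabit Ethernet would be the bottleneck and 10GbE would not. If jobs move only a few megabytes each, the network choice barely matters.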


👤 t0mas88
AWS / GCP / Azure don't make a lot of sense if you're looking for cheap compute and don't need the rest of the cloud.

Take a look at dedicated servers at Hetzner. They're very cheap, have enough bandwidth to transfer the things you calculate into and out of there (at no extra cost, unlike the three big cloud providers), and come with some serious CPU power if you pick the right model.

You can even email their support staff to get you a couple of machines in the same rack so you get fast network between them.

And contracts are month to month so at the end of the project you can easily cancel.

Edit: but do note that these are consumer type machines, no dual power supply, no ECC etc. That's why the cost is low. Treat them like a bit more durable version of spot instances but definitely not datacenter level stuff.


👤 pella
> Does it make sense to go cloud (even spot) when I would aim at high utilization?

Hybrid cloud?

"combines and unifies public cloud, private cloud and on-premises infrastructure to create a single, flexible, cost-optimal IT infrastructure."

Hetzner has a dedicated cheap server: ( monthly pricing )

https://www.hetzner.com/de/dedicated-rootserver/matrix-ax

- AMD Ryzen™ 5 3600 ( € 37.30 + VAT) / month

- AMD Ryzen™ 7 7700 ( € 59.00 + VAT ) / month + setup

- AMD Ryzen™ 9 5950X ( € 103.30 + VAT ) / month

- AMD EPYC™ 7502P ( € 119.80 + VAT ) / month + setup
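For a CPU-bound workload, a quick way to compare these offers is monthly price per core. The prices are the ones listed above; the core counts are added here from the public CPU specifications and are not part of the original comment. Setup fees and VAT are ignored.

```python
# name: (EUR per month excl. VAT as quoted above, physical cores)
# ASSUMPTION: core counts come from public CPU specs, not from this thread.
offers = {
    "Ryzen 5 3600": (37.30, 6),
    "Ryzen 7 7700": (59.00, 8),
    "Ryzen 9 5950X": (103.30, 16),
    "EPYC 7502P": (119.80, 32),
}


def price_per_core(offers: dict) -> dict:
    """Monthly price divided by physical core count for each offer."""
    return {name: price / cores for name, (price, cores) in offers.items()}


per_core = price_per_core(offers)
for name, eur in sorted(per_core.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} {eur:5.2f} EUR/core/month")
```

Cores are not equal across these chips (clock speed and generation differ), so price per core is only a first filter; benchmarking the actual workload on a month-to-month rental is the more reliable test.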


👤 jhot
I just upgraded compute hardware at home and went with: ASRock Deskmeet X300 (itx case, motherboard, and 500w power supply), Ryzen 5600g, 32 gb ddr4, 512 gb m.2 for around $400. Pretty good bang for the buck.

👤 ClumsyPilot
As others have said, this is vague.

For my home server, I pick the smallest case that can fit a desktop CPU, so just a bit bigger than an Intel NUC. Those have laptop CPUs, so you are overpaying. I am willing to pay extra for it to be small.

The two best contenders for me are the Asrock DeskMini barebones system (picoITX) and the IN WIN Chopin case, for which you have to buy an ITX motherboard.

I use Chopin with an Intel CPU, they work for my usecase.

Also, some motherboards can boot a Ryzen without any GPU at all. Asrock usually will. If you are willing to deal with a totally headless system, go for it.


👤 dale_glass
I'd go with a rack and rack cases, if you can. There's a reason why the industry uses it.

I find that once you have a bunch of equipment piled up it makes a huge, hard-to-manage mess, and that happens a lot faster than you'd expect. Before finally getting a rack I ended up with a bunch of hardware caked in dust because it was all lying in such a precarious way that I was afraid to touch anything in there.


👤 h0bb3z
Have you checked out the resources in https://www.reddit.com/r/homelabsales/

There are usually some good deals on used gear and things suitable for selfhosting if you want to go that route. I was able to build a 3 node cluster with lots of CPU/RAM (~100 vCPU/256G RAM) and storage (30+TB) on systems with redundant power supplies made for datacenters for under $500.

Upside: one time cost and usually cheaper than cloud-hosting costs.

Downside: power consumption (energy bill) increase unless you go with something like a Pi cluster, and you need to setup security well if you intend to expose any services to the Internet.


👤 DeathArrow
If it's just for fun and learning purposes you can build a RaspberryPi cluster: https://www.youtube.com/watch?v=X9fSMGkjtug

👤 tw1984
If 2-3 Ryzen machines is all you need, you should just go for a Xeon/Epyc server. You can build a decent Epyc workstation/server with 64 cores for as little as $2k USD when using 2nd gen Epyc processors.

👤 thinkmassive
I have a couple similar machines in my homelab, running in ASRock X300 boxes w/Ryzen APUs (cpu+gpu), in a similar role (k3s).

The reason I have two is I started with a 3400g (4c/8t) due to supply limitations, then upgraded to a 4750g (8c/16t) when it became feasible. Over time I upgraded memory and storage, so eventually I had everything but the case for a second “half-power” machine.

Having multiple medium-power machines can be useful for rolling upgrades (and for learning purposes), but otherwise it’s very uneconomical.

If your goal is to maximize cores/$ then a single beefy machine will do best.


👤 dcchambers
If your workload is able to be split up and run efficiently on different machines, just buy as many cheap (or free) 1-to-10 year old used computers as you can, and run HTCondor on them for scheduling the jobs.

👤 psyklic
I built my own CPU cluster with mATX AM3s connected via a switch, powerful processor/RAM and everything else was as cheap as possible. At the time, it easily saved money compared to AWS.

The biggest issue I had was overheating. The small mATX cases don't fit fans sufficient to cool powerful CPUs running at 100% 24/7. So, you may have to get midsize cases, or leave the cases open with fans sticking out, which is louder.


👤 shiftpgdn
You can buy complete Dell Workstations with e5 v4/v3 CPUs or even Xeon Gold CPUs for a pittance on ebay. They already have power management, etc built in and tend to run whisper quiet.

👤 oh-my-god
Have you thought about Mac minis? Compact, cheap and super powerful.

I use AWS c7g instances (64x ARM Graviton3 CPUs) to run some simulations. They are the fastest instances for our work.

If I had to run simulations daily, it would be cheaper to have 8x M1 Mac minis at a purchase cost of around 5200 EUR.


👤 DeathArrow
If you run them 24/7 another idea will be to rent cheap dedicated servers. Hetzner has some: https://www.hetzner.com/dedicated-rootserver

They even have auctions: https://www.hetzner.com/sb


👤 mamcx
Apart from small Lenovo machines, Mac minis can be a good proposition. I think an M1/M2 is hard to beat. You could also go refurb, like at https://eshop.macsales.com/configure-my-mac/mac-mini

👤 SXX
This will heavily depend on power costs in your area, but if your tasks are not just CPU heavy but also require a lot of RAM, it's totally worth considering old Xeons from China with used Samsung RAM.

This used hardware can easily be 2-4 times cheaper than building with modern CPUs, but power usage is also much higher.


👤 DeathArrow
There's also a very good resource for home labs, but I must warn you it can be a rabbit hole: https://www.reddit.com/r/homelab/

👤 Consumer8735
It may not be the best use case for you. But you can use the Oracle Cloud forever free tier to run a basic Kubernetes setup. If you don't want the limitations on the networking then you can always pay extra for another public IP.

👤 so-and-so
Mobos these days don't come with iGPUs. You'll need CPUs with graphics or buy cheap graphic adapters on eBay.

You can buy a couple of 16 to 22 core Xeons on AliExpress and a dual CPU motherboard for them. Plenty of reviews on YouTube.


👤 Gordonjcp
Thousands and thousands of RP2040s at 50p or so a pop, configured as a massive transputer.

It wouldn't be all that fast, really, but it sure would be elegant.


👤 postalrat
I would investigate buying used servers, workstations, desktops. Often you can find them for very cheap and in large quantities.

👤 jononor
How much data do you need to move in/out? How much variation is there in demand?

👤 nubinetwork
If you want high throughput, you may want to consider 10gbe and nvme.

👤 nspattak
CPU cost is only a small fraction of overall cost: for every CPU, there is a power supply, RAM, motherboard and case. You also haven't mentioned anything about data: does it need local storage or will it only be stored locally temporarily?

It sounds like mini PCs can be an excellent solution for you.

For how long will you be using this? AWS may be preferable in the short term while local hardware may be cheaper in the long term/a lot of cpu hours.

There is also the question of your application's performance on different CPUs. There are older servers available at very cheap prices, but is it worth buying a 12/20-core Xeon CPU that consumes 200-300W if its performance is similar to a 5900 at 150W?


👤 PaywallBuster
or hetzner

beefy dedicated servers for 50/100 eur per month

you can use it for a few months and return it any time (monthly contract)