Using only my own box is now proving to be a bottleneck, so I've been thinking of either using AWS spot instances or building my own mini-cluster (2-3 machines + switch) at home. Does it make sense to go cloud (even spot) when I'm aiming for high utilization?
As for the potential node spec:
- Ryzen 4500/5500 (seems best perf/$)
- Some mATX AM4 mobo with integrated GPU
- 2x8GB RAM
- mATX case, the smaller the better. ITX seems pricier.
All the box does is basically run the k8s pod(s).
WDYT?
I use 4 Lenovo M910x's as a Kubernetes cluster and home lab, all connected with a Netgear switch. The whole setup costs about the same as a single new quality workstation. Each has an i7-8700 (6c/12t), 32 GB memory, a 1 TB SSD and a <1 L case, and they're practically silent. Parts are easy to find, and they even use Lenovo laptop chargers. If one dies, I can easily buy a replacement and swap it in within a few days.
You can even go cheaper if you don't need the absolute fastest CPUs. Some of these older tiny computers can be purchased for around 100 bucks if you look for them. It has worked like a charm for me. Not sure how much horsepower you really need, but this is a cheap way to build a home cluster. I think they hover around 40 W most of the time, so power isn't really too big of a cost either.
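To put the "around 40 W" figure in money terms, here is a rough sketch of the yearly energy use for four such nodes running around the clock. The 40 W and the four nodes come from the comment above; the electricity price of 0.30 per kWh is purely an assumed placeholder, so plug in your own tariff.

```python
def annual_energy_kwh(watts: float, nodes: int = 1, hours_per_day: float = 24.0) -> float:
    """Energy used per year by `nodes` machines drawing `watts` each."""
    return watts * nodes * hours_per_day * 365 / 1000.0


def annual_power_cost(watts: float, nodes: int, price_per_kwh: float) -> float:
    """Yearly electricity cost for the same machines at a flat tariff."""
    return annual_energy_kwh(watts, nodes) * price_per_kwh


# 4 nodes at ~40 W each (figures from the comment above).
kwh = annual_energy_kwh(40, nodes=4)

# ASSUMPTION: 0.30 per kWh is a placeholder tariff, not a quoted price.
cost = annual_power_cost(40, 4, 0.30)

print(f"{kwh:.1f} kWh/year, about {cost:.2f} per year at 0.30/kWh")
```

Average draw under sustained load will be higher than the idle-ish 40 W, so treat the result as a lower bound.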
> - Ryzen 4500/5500 (seems best perf/$)
Price/performance is great for the CPU, but you have to spend hundreds of dollars on the motherboard, RAM, power supply, and case for each one.
You need to look at the overall system cost. If you’re building new, it could be cheaper overall to put 12-core or 16-core CPUs into a smaller number of machines than it would be to put a lot of $100 budget CPUs into many machines.
Unless your goal is to build a cluster for the sake of building a cluster, you might have a better price/performance ratio by building a single 16-core 7950X box than you would with three separate Ryzen 4500 or 5500 machines.
Even with perfect scaling, you would need at least 4 separate Ryzen 5500 machines to have a chance at beating a single 7950X for CPU-bound tasks. The 7950X CPU alone is barely more than 4X the cost of a Ryzen 5500, but you only need to buy one motherboard, one power supply, and so on.
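A minimal sketch of the "look at the overall system cost" point: add the per-machine platform cost (motherboard, RAM, PSU, case) to the CPU price before comparing. Every price below is a hypothetical placeholder chosen only to illustrate the shape of the comparison (a $100 budget CPU, a big CPU at a bit over 4x that, and a few hundred dollars of platform per box); check current prices before deciding anything.

```python
from dataclasses import dataclass


@dataclass
class Build:
    cpu_price: float       # price of one CPU
    platform_price: float  # motherboard + RAM + PSU + case + cooler, per machine
    count: int = 1         # number of identical machines

    def total(self) -> float:
        """Total cost of `count` complete machines."""
        return self.count * (self.cpu_price + self.platform_price)


# ASSUMPTION: all prices here are illustrative placeholders, not quotes.
cluster = Build(cpu_price=100, platform_price=300, count=4)  # four budget boxes
single = Build(cpu_price=420, platform_price=500, count=1)   # one 16-core box

print(f"4 budget machines: {cluster.total():.0f}")
print(f"1 big machine:     {single.total():.0f}")
```

With these placeholder numbers the platform cost, not the CPU, dominates the cluster total, which is the effect the comment describes.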
- Have you found a mobo? Because APUs exist (I assume), there aren't really many AM4 mobos with an iGPU. You have the ASRock Rack boards, which are great (I operate a few) but maybe over budget if you're on a shoestring. You may not need a GPU at all, but then you probably want an APU or dGPU on hand for troubleshooting and potentially flashing (not all mobos and firmware versions boot headlessly, from what I hear).
- General rule of thumb: if you're going to use something for prod, buy at least 2 of each. It's great to have an extra for experiments and you'll be grateful in case of hardware failure.
- In case you plan to run your host OS straight on the metal (as opposed to VMs): it's recommended to separate your control plane from your workers. Use Pis or similar for this; whatever you can find cheap.
- Rather than HN, I highly recommend you check out ServeTheHome (forums/blog/yt). Lots of great stuff there. The "tinyminimicro" (that would be the small dell/lenovo/HP units other commenters mention) and ali-router-board tracks can be worth considering as well. You should be able to get good ideas about switches here too, maybe even score something on the trade board if you live in US or EU.
- Screw AWS. You should be able to run the money numbers on that yourself.
As an example, visiting the site now, the first thing I see is a box with 2x E5-2667 v2 (8c/16t, 3.3 GHz base clock, 22 nm). These were $2300 each when new. It also comes with 128 GB RAM, case, PSU, 1U rail kit, and two 500 GB SATA SSDs to partially fill its 10 caddies.
The entire thing is $260 + $65 shipping. You can't even get 16c/32t of 3.3ghz compute alone for that price these days, let alone a whole bootable system.
This entire system is about 7% of the price of those two CPU's when new, so you're getting at least 93% off MSRP there (in reality, higher, after subtracting the cost of the RAM, case/chassis, PSU, disks, etc).
Sure, 4x R5 5500 does give you a passmark of around 76k compared to the 24k you're going to get with 2 of those xeons, but then again, you couldn't even buy four of those R5 5500 CPUs alone (let alone 4 cases, mobos, PSU's, coolers, HDDs, RAM, etc) for the cost of that system on Unixsurplus.
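The arithmetic in this comment can be checked in a few lines. The sketch below uses only the numbers quoted above ($260 + $65 shipping, $2300 per CPU when new, roughly 24k vs 76k Passmark) and derives one extra figure: the price at which a 4x R5 5500 build would match the Xeon box on Passmark points per dollar. It ignores power draw, which matters for old server hardware.

```python
# Figures quoted in the comment above.
xeon_box_price = 260 + 65        # system + shipping, USD
xeon_msrp_two_cpus = 2 * 2300    # launch price of the two CPUs alone, USD
xeon_passmark = 24_000           # approximate combined score of the two Xeons
ryzen_passmark = 76_000          # approximate combined score of 4x R5 5500

# Fraction of the original CPU price paid for the whole used system.
fraction_of_msrp = xeon_box_price / xeon_msrp_two_cpus

# Benchmark points bought per dollar with the used box.
xeon_points_per_dollar = xeon_passmark / xeon_box_price

# Total price at which the 4-machine Ryzen build has the same points per dollar.
ryzen_breakeven_price = ryzen_passmark / xeon_points_per_dollar

print(f"Used box costs {fraction_of_msrp:.1%} of the CPUs' price when new")
print(f"Xeon box: {xeon_points_per_dollar:.1f} Passmark points per dollar")
print(f"4x R5 5500 build breaks even at about ${ryzen_breakeven_price:.0f} total")
```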
I am not affiliated with Unixsurplus and don't personally know anyone who is, but man do I love their store. It's the technology hardware implementation of "one person's trash is another person's treasure"
Once you have your intended compute lifecycle figured out you can compute the cloud cost and hardware cost and compare. Given you’re mentioning k8s I’m assuming this might be a continuous load in which case you’d amortize your hardware capital costs much faster.
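As a rough sketch of that comparison: given a hardware price, a monthly power bill, a cloud hourly rate and the fraction of time you would actually keep the machine busy, you can estimate how many months it takes for owned hardware to pay for itself. The function name and all example numbers are hypothetical placeholders; real spot prices vary by region and over time.

```python
import math


def breakeven_months(
    hardware_cost: float,
    power_cost_per_month: float,
    cloud_cost_per_hour: float,
    utilization: float,
    hours_per_month: float = 730.0,
) -> float:
    """Months until buying hardware is cheaper than renting equivalent cloud capacity.

    `utilization` is the fraction of hours (0..1) you would otherwise pay the cloud for.
    Returns infinity if the cloud is cheaper per month than just powering your own box.
    """
    cloud_monthly = cloud_cost_per_hour * hours_per_month * utilization
    monthly_saving = cloud_monthly - power_cost_per_month
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving


# ASSUMPTION: placeholder numbers, not real quotes.
busy = breakeven_months(1200, 30, 0.20, utilization=0.9)
idle = breakeven_months(1200, 30, 0.20, utilization=0.1)
print(f"90% utilization: {busy:.1f} months; 10% utilization: {idle} months")
```

The higher the utilization, the faster the hardware amortizes; at low utilization the break-even point may never arrive.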
For example, I might suggest buying used Lenovo Tiny M75q's on eBay. The Ryzen 3400GE is significantly slower than your Ryzen 4500, but also lower TDP and very cheap procurement cost. Also fits your "smaller the better" wish. No ECC, though.
Of course "cheap" can cost too much: if you need reliability and want it to run first time after assembly, then it might pay to spend more.
- Do you need shared storage? If so, how fast? Read or write heavy?
- Do you need a performant interconnect (say, for MPI)? Used IB cards are cheap on eBay.
- Is your software limited by memory bandwidth? If so, maybe go with more than 2 memory channels.
I'd rent a few different configurations from Hetzner to benchmark before buying.
If you don't need more than a few nodes and you are not limited on memory bandwidth, you could consider a single, faster, node. But the sweetspot is probably consumer-grade Ryzen.
As for cloud, as long as you know you'll actually use everything you buy for a long enough time period, buying will be cheaper.
Build your own boxes.
You can use Kerrighed or OpenSSI for the software side.
Power and heat. Will you have enough power for the nodes? What is the power trade-off if you get low-end chips vs higher-end chips? Have a look at the Ryzen page on wikipedia to get a feel for power use of each chip. How will you understand how much cooling you need? (more cooling takes more power)
RAM. How much does accuracy matter? Should you use ECC RAM? You can get ECC UDIMMs to work on Ryzen, but not with the chips that have integrated graphics (i.e. avoid APUs if you want ECC). Get ASRock or Asus AM4 motherboards, then get RAM like this - Samsung M393A4K40DB3-CWE. If you go cloud, you may find the hardware has ECC.
IO. Once the grid-of-nodes is in place will you be moving data to functions, or functions to data? How much data are you moving over the network per-job? Might there be IO bottlenecks when you scale up? How detailed a model of IO can you build before you settle on hardware?
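For the IO questions, even a back-of-the-envelope model helps before buying hardware. The sketch below estimates how long it takes to move one job's data over a network link; the 90% link efficiency and the example sizes are assumptions for illustration, and it ignores latency, disk speed and contention between nodes.

```python
def transfer_seconds(data_gb: float, link_gbit_per_s: float, efficiency: float = 0.9) -> float:
    """Time to move `data_gb` gigabytes over a link of `link_gbit_per_s` gigabits/s.

    `efficiency` is an assumed fraction of the nominal link rate that is
    actually usable after protocol overhead.
    """
    bits = data_gb * 8e9
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s


# ASSUMPTION: 10 GB per job is a placeholder; measure your own workload.
for link in (1.0, 10.0):
    print(f"10 GB over {link:g} Gbit/s: {transfer_seconds(10, link):.1f} s")
```

If the transfer time is comparable to the compute time per job, the switch and NICs deserve as much attention as the CPUs.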
Take a look at dedicated servers at Hetzner. They're very cheap, have enough bandwidth to transfer the things you calculate into and out of there (at no extra cost, unlike the three big cloud providers), and come with some serious CPU power if you pick the right model.
You can even email their support staff to get you a couple of machines in the same rack so you get fast network between them.
And contracts are month to month so at the end of the project you can easily cancel.
Edit: but do note that these are consumer-type machines: no dual power supply, no ECC, etc. That's why the cost is low. Treat them like a slightly more durable version of spot instances, but definitely not datacenter-level stuff.
Hybrid cloud?
"combines and unifies public cloud, private cloud and on-premises infrastructure to create a single, flexible, cost-optimal IT infrastructure."
Hetzner has cheap dedicated servers (monthly pricing):
https://www.hetzner.com/de/dedicated-rootserver/matrix-ax
- AMD Ryzen™ 5 3600 ( € 37.30 + VAT) / month
- AMD Ryzen™ 7 7700 ( € 59.00 + VAT ) / month + setup
- AMD Ryzen™ 9 5950X ( € 103.30 + VAT ) / month
- AMD EPYC™ 7502P ( € 119.80 + VAT ) / month + setup
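To compare these rentals against buying, it helps to total them over the length of the project. A minimal sketch using the net monthly prices listed above; the setup fee and VAT rate are left as parameters defaulting to zero because the amounts are not given here, so fill them in from the actual offer.

```python
# Net monthly prices in EUR, copied from the list above.
MONTHLY_EUR = {
    "Ryzen 5 3600": 37.30,
    "Ryzen 7 7700": 59.00,
    "Ryzen 9 5950X": 103.30,
    "EPYC 7502P": 119.80,
}


def rental_total(model: str, months: int, setup_fee: float = 0.0, vat_rate: float = 0.0) -> float:
    """Total rental cost in EUR for `months` months.

    `setup_fee` and `vat_rate` are ASSUMED inputs (defaults of 0 mean
    "not included"), since the list above does not state them.
    """
    net = MONTHLY_EUR[model] * months + setup_fee
    return round(net * (1 + vat_rate), 2)


for model in MONTHLY_EUR:
    print(f"{model}: {rental_total(model, months=12):.2f} EUR net for 12 months")
```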
For my home server, I pick the smallest case that can fit a desktop CPU, so just a bit bigger than an Intel NUC. NUCs have laptop CPUs, so you are overpaying. I am willing to pay extra for it to be small.
The two best contenders for me are the ASRock DeskMini barebones system (picoITX) and the IN WIN Chopin case, for which you have to buy an ITX motherboard.
I use Chopin with an Intel CPU, they work for my usecase.
Also, some motherboards can boot a Ryzen without any GPU at all. ASRock usually will. If you are willing to deal with a totally headless system, go for it.
I find that once you have a bunch of equipment piled up, it becomes a huge, hard-to-manage mess, and that happens a lot faster than you'd expect. Before finally getting a rack, I ended up with a bunch of hardware caked in dust because it was all lying in such a precarious way that I was afraid to touch anything in there.
There are usually some good deals on used gear and things suitable for selfhosting if you want to go that route. I was able to build a 3 node cluster with lots of CPU/RAM (~100 vCPU/256G RAM) and storage (30+TB) on systems with redundant power supplies made for datacenters for under $500.
Upside: one time cost and usually cheaper than cloud-hosting costs.
Downside: power consumption (energy bill) increases unless you go with something like a Pi cluster, and you need to set up security well if you intend to expose any services to the Internet.
The reason I have two is I started with a 3400g (4c/8t) due to supply limitations, then upgraded to a 4750g (8c/16t) when it became feasible. Over time I upgraded memory and storage, so eventually I had everything but the case for a second “half-power” machine.
Having multiple medium-power machines can be useful for rolling upgrades (and for learning purposes), but otherwise it’s very uneconomical.
If your goal is to maximize cores/$ then a single beefy machine will do best.
The biggest issue I had was overheating. The small mATX cases don't fit fans sufficient to cool powerful CPUs running at 100% 24/7. So, you may have to get midsize cases, or leave the cases open with fans sticking out, which is louder.
I use AWS c7g instances (64x Arm Graviton3 CPUs) to run some simulations. They are the fastest instances for our work.
If I had to run simulations daily, it would be cheaper to have 8x M1 Mac minis, at a purchase cost of around 5200 EUR.
They even have auctions: https://www.hetzner.com/sb
This used hardware can easily be 2-4 times cheaper than building with modern CPUs, but power usage is also much higher.
You can buy a couple of 16 to 22 core Xeons on AliExpress and a dual CPU motherboard for them. Plenty of reviews on YouTube.
It wouldn't be all that fast, really, but it sure would be elegant.
It sounds like mini PCs could be an excellent solution for you.
For how long will you be using this? AWS may be preferable in the short term while local hardware may be cheaper in the long term/a lot of cpu hours.
There is also the question of your application's performance on different CPUs. There are older servers available for very cheap prices, but is it worth buying a 12/20-core Xeon CPU that consumes 200-300 W if its performance is similar to a 5900 at 150 W?
Beefy dedicated servers go for 50/100 EUR per month.
You can use one for a few months and return it any time (monthly contract).