I need to buy a rack-mountable machine (ideally one) that has at least one decent GPU (A6000 or A100), a good CPU (thinking Epyc or Threadripper) and > 40TB of storage.
Usage will be all over the place - medium-sized offline (batched) data processing, deep learning with language and visual models, inference, and live data ingestion. Possibly all at the same time.
What should I buy? Budget is 20-40k.
I'm wondering in particular:
- Is the HBM2 memory on an A100 worth the 2x price over an A6000?
- Should I go for a tower (such as the Lenovo P620) or a 2U rack? (space and power are irrelevant to me)
- Are there hidden costs with major suppliers? (I really don't want to pay $1k for a 4TB dell hdd)
- Are there any gotchas with respect to lights-out management? I.e. can I remotely reboot a Threadripper machine?
Thanks!
- You definitely want to go rack mount. It is more extensible and future-proof.
- You will get a server-grade mainboard which features industry standards such as IPMI remote management, redundant power supplies, or SAS (which allows for adding more hard drives, for instance).
- You should probably put as much RAM into this machine as you can afford, since this will be a limiting factor, especially if you do "all at the same time".
- With your budget, you can consider buying a few less powerful compute nodes (think of 5k-10k per node), which can be more suitable if you divide the compute among multiple users.
[edit:formatting]
For CPU I can't say much because we are always limited by PCIe speed in our use case and the Xeons are idle most of the time :P
If you plan to spend a lot of time around it, then I suggest getting some kind of tower. Otherwise rackmount all the way. There are so many benefits to rack machines, here are a few:
* Superior cooling
* Enterprise reliability
* Built-in remote access system (so you can remotely cycle the power in the rare event of a hang)
* Easy to open up and work on
* Tons of HDD slots
Downsides: Rackmount machines tend to be varying degrees of insanely loud.
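For the remote power cycling point, here is a rough sketch of what it can look like in practice, assuming the board's BMC speaks IPMI over LAN and that `ipmitool` is installed on the machine you are running this from. The host name, user name, and the two helper functions are made up for illustration; only the `ipmitool` options themselves (`-I lanplus`, `-H`, `-U`, `-E`, and `chassis power ...`) are the tool's real interface.

```python
import os
import subprocess

# Subset of the "chassis power" subcommands that ipmitool accepts.
ALLOWED_ACTIONS = {"status", "on", "off", "cycle", "reset"}


def ipmi_power_command(host, user, action="status"):
    """Build the ipmitool argument list for a chassis power action.

    host and user are placeholders for your BMC address and account.
    -E tells ipmitool to read the password from the IPMI_PASSWORD
    environment variable instead of taking it on the command line.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-E",
        "chassis", "power", action,
    ]


def run_power_command(host, user, password, action="status"):
    """Run the command against a real BMC and return its text output."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        ipmi_power_command(host, user, action),
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical BMC address; this only prints the command, it does not run it.
    print(" ".join(ipmi_power_command("bmc.example.internal", "admin", "cycle")))
```

Whether a given Threadripper board can do this depends on whether it actually has a BMC, so check the spec sheet for IPMI support before buying.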
Is 2U a hard limit on the size?
I love me a Supermicro rackmount. They are the most cost-effective good quality servers that I know of. Overall quite reliable, easy to work with, easy to source replacement parts for in the rare case a component croaks.
GPU lines:
https://www.supermicro.com/en/products/gpu?pro=pl_grp_type%3...
Please follow up and let us know what you end up getting!
Off topic: Are FatTwins still the big kahunas of high-performance, cost-effective enterprise rackmount servers?
Final food for thought: HN has a lower concentration of hardware nerds than the homelab subreddit. You might like to ask there as well, you'll get better advice.
If you have issues they prioritize new builds over service; my parts went from 2 weeks to 10+ and they finally gave me a huge credit on a Thinkpad because I needed something for work.
So for customer satisfaction I’m happy; the laptop fits my new situation better. The machine was awesome. If something goes wrong, plan on it being down for a LONG time, and I wasn’t the only one hit by one of these firmware bugs (there have been a couple).
Regarding cooling: I designed and printed an upgraded extractor fan (92mm to 120mm) and saw a 15°C drop in temps on a SINGLE loaded 3080. Do NOT plan on loading this up and running it with non-blower cards without a second intake (there is a spot for a second 92mm) and an upgraded exhaust.
You just need one mega machine from the sounds of it, and 40K seems like way more than enough to put together a system that has a remote-management-capable mobo plus all that compute power and drive storage.
Maybe also ask around at the Level1Techs forum https://forum.level1techs.com/
> I really don't want to pay $1k for a 4TB dell hdd
Good thing HDDs do not cost that much at all.
Also, it sounds like you want this... For yourself? At your own place? At your own building? Where will this be used and who is paying? That factors into this.