HACKER Q&A
📣 alpineidyll3

Best way to configure Onprem DB+GPU servers?


Help HN. I'm director of ML at a Series A pharma startup. Although we're small, we chug through a lot of data (typical datasets are 1 TB). The experimental side of the business is less data-intensive and lives on AWS. However, even the data egress to our on-prem GPU training machines makes putting our ML DB on AWS prohibitively expensive. We basically need to configure a rack with a few replicated DB servers hosting ~10 TB of data, and some quad-H100 boxes. How can anyone buy such a rack without getting totally fleeced by a single vendor? Has anyone had any luck hiring someone who could build out and maintain a small server room for these purposes? Where did you find such a person? I'm open to all good opinions.
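
For a rough sense of the egress math, here is a minimal sketch. The per-GB rate and the number of pulls per month are assumptions for illustration, not figures from our bills, and `egress_cost_usd` is a hypothetical helper.

```python
def egress_cost_usd(dataset_tb: float, pulls_per_month: int, usd_per_gb: float) -> float:
    """Estimate the monthly cost of repeatedly pulling a dataset out of the cloud."""
    dataset_gb = dataset_tb * 1000  # decimal TB -> GB
    return dataset_gb * pulls_per_month * usd_per_gb


# Assumed inputs: one 1 TB dataset, pulled 30 times a month,
# at a hypothetical egress rate of $0.09 per GB.
monthly = egress_cost_usd(dataset_tb=1.0, pulls_per_month=30, usd_per_gb=0.09)
print(f"Estimated egress: ${monthly:,.0f} per month")
```

Swap in your own rate and pull frequency; the point is only that the cost scales linearly with every full pass over the data.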


  👤 speedgoose Accepted Answer ✓
You could use tools such as https://maas.io/ and Ansible to set up the software on your machines in an automated way.

You can buy hardware from many vendors. Maybe Dell or HP(E) websites are an easy start.

You also need a good place for your rack, which will generate a lot of heat and noise. Not every closet makes a good datacenter.

I believe you are looking for a competent sysadmin, but good luck attracting one with such a small setup. It’s probably not worth the cost anyway, and perhaps a few people already on your team should do it themselves.

What you want to do is very common in research labs, where people are competent and motivated, and have more time than AWS budget.


👤 mattpallissard
I've got a bunch of HPC experience (similar hardware and database requirements). Feel free to email me. I'd be happy to answer any questions and, depending on your location, I have contacts at reputable vendors in that space.