HACKER Q&A
📣 ai_influencer

Self-Hosted AI Infrastructure Options?


There are a bunch of run-ai-in-the-cloud services like Replicate and RunPod, but what are my options if I have data compliance needs like HIPAA?

If my data can't leave my network, what are my options to self-host AI models like a fine-tuned Llama 2? Is this a solved problem?


  👤 speedgoose Accepted Answer ✓
Yes, you can get on-premises hardware.

If you have a limited budget, you will probably opt to use quantised models.

The 7B and 13B quantised models can run on most gaming GPUs and Apple hardware, or even on a CPU at decent speeds.
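
For a concrete sense of what that looks like, here's a minimal sketch using llama-cpp-python, one common way to run GGUF-quantised models entirely on local hardware. The model path is a placeholder; point it at whatever quantised fine-tune you've downloaded:

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Load a quantised GGUF model from local disk -- nothing leaves your network.
    # Path is a placeholder for your own quantised Llama 2 fine-tune.
    llm = Llama(
        model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
        n_gpu_layers=-1,  # offload all layers to the GPU; set to 0 for CPU-only
        n_ctx=4096,       # context window size
    )

    out = llm("Summarise the key compliance requirements of HIPAA:", max_tokens=256)
    print(out["choices"][0]["text"])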

For bigger models like the 70B, the most cost-effective solution right now seems to be a Mac with a lot of RAM. If you want faster inference while staying relatively cheap, people say a couple of RTX 3090s with NVLink work well too (the 4090 doesn't have NVLink, of course).
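
A rough sketch of the dual-GPU route, assuming Hugging Face transformers with bitsandbytes 4-bit quantisation so the 70B weights fit in 2x24GB of VRAM (device_map="auto" shards the layers across both cards; the model id is just an example, substitute your own fine-tune):

    # pip install transformers accelerate bitsandbytes
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-2-70b-chat-hf"  # or your own fine-tune

    # 4-bit quantisation keeps the 70B weights within ~48GB of total VRAM
    quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant,
        device_map="auto",  # spreads layers across both 3090s automatically
    )

    inputs = tok("Explain what a BAA is:", return_tensors="pt").to(model.device)
    print(tok.decode(model.generate(**inputs, max_new_tokens=128)[0]))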

If budget isn’t a problem, your local reseller of Nvidia datacentre hardware can probably send you a quote.