1. What are the hardware specifications you would recommend for running Language Models?
2. What are the building options available for Language Models and which one is the easiest to set up?
3. Is it better to rent or buy hardware for running Language Models?
4. What are some cost-saving strategies that have worked for you when running Language Models?
If you want to run Vicuna without quantization you need about 25GB of VRAM, which exceeds pretty much every consumer GPU. Vicuna 4-bit GPTQ is decent, though I personally notice a quality difference when comparing it to 16-bit.
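The VRAM figure follows from a back-of-the-envelope calculation: parameter count times bytes per parameter. A rough sketch, assuming Vicuna-13B at 13e9 parameters and counting weights only (KV cache and activations add more on top):

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Gigabytes needed just to hold the model weights."""
    return n_params * bits_per_param / 8 / 1e9

n = 13e9  # assumed parameter count for a 13B model
print(f"fp16:  {weight_gb(n, 16):.1f} GB")  # 26.0 GB -> beyond consumer cards
print(f"4-bit: {weight_gb(n, 4):.1f} GB")   # 6.5 GB -> fits an 8GB card
```

This is why 4-bit quantization is the usual compromise: roughly a 4x reduction in memory versus 16-bit, at some cost in quality.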
CPU is also an option: you can run pretty much any model that fits in your RAM, although performance will obviously suffer. llama.cpp has gotten very popular.
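The "fits in your RAM" check is easy to do programmatically before downloading a multi-gigabyte model file. A minimal sketch (POSIX-only, via `os.sysconf`; the 2GB headroom figure is my own assumption to leave room for the OS and KV cache):

```python
import os

def total_ram_gb() -> float:
    # Total physical RAM in GB; works on Linux/macOS, not Windows.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9

def fits_in_ram(model_file_gb: float, headroom_gb: float = 2.0) -> bool:
    # Leave headroom for the OS and the inference runtime's own buffers.
    return model_file_gb + headroom_gb <= total_ram_gb()

print(f"total RAM: {total_ram_gb():.1f} GB")
print(f"7B 4-bit (~4 GB) fits: {fits_in_ram(4.0)}")
```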
I think the current proliferation of AI and the general awareness of LLMs can be a major selling point if they make sure their neural engine is well optimized for it. It will put them right at the center of the conversation, especially since one of the current concerns is the cost of training these models.
Disclosure: I am the author of the website, and it's extremely light on content currently.
2. Take a look at huggingface.co
3. Rent for short periods, buy if you need it for a long time. You can do the maths.
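"Doing the maths" on rent vs buy is just a break-even calculation. A sketch with hypothetical numbers (the $1.10/hr cloud rate, $800 used-GPU price, and $0.05/hr electricity cost are all my own placeholder assumptions, not figures from the thread):

```python
def breakeven_hours(purchase_usd: float, rent_usd_per_hour: float,
                    power_usd_per_hour: float = 0.05) -> float:
    """Hours of use after which buying becomes cheaper than renting.

    Ignores resale value and hardware failure; electricity is the only
    running cost attributed to the purchased card.
    """
    return purchase_usd / (rent_usd_per_hour - power_usd_per_hour)

# Hypothetical: used 3090 at $800 vs a $1.10/hr cloud instance.
hours = breakeven_hours(800, 1.10)
print(f"break-even after ~{hours:.0f} hours of use")  # ~762 hours
```

If your expected usage is well under the break-even figure, renting wins; well over it, buying wins.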
4. Smaller models, quantization, and running on CPU when the slower speed and increased energy usage aren't a problem.