I'm going to set up a local LLM (e.g., Llama 2) for our team of 10 people, but I'm not quite sure about the process. I'm planning to run the model on one or more H100s. Which tool or framework is best for handling multiple users at the same time? I did a bit of research into vLLM, but I'd still like to hear from people who have set up a local LLM before. Our team members have IT backgrounds, so the UI is not a top concern. What matters more is efficiency: everyone should be able to use it at the same time during work hours. Thanks!
Best,