HACKER Q&A
📣 _boffin_

Calculating Compute Needed for ‘N’ Developers Using LLMs


Hey HN,

I’m trying to figure out how much compute I’d need to run something like StarCoder or StarCoderPlus in inference mode for `N` developers.

- Let’s say I have 10–20 developers who would like to use StarCoder within VSCode, like Copilot.
- How would you calculate the number of tokens/s one would need to produce for this to be a viable service?
- Which version of the model would you use?
- Would you use a compressed model generated via quantization?
- Would you run it on GPU or CPU?
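One way to sketch the tokens/s question is a back-of-envelope estimate. All of the numbers below (requests per minute, tokens per completion, active fraction, burst headroom) are assumptions I'd plug in, not measurements:

```python
# Rough estimate of aggregate generation throughput needed.
# Every constant here is an assumption to be replaced with your own numbers.

n_devs = 20                   # developers served
requests_per_dev_per_min = 4  # completion requests while actively coding
tokens_per_completion = 64    # average generated tokens per suggestion
active_fraction = 0.5         # fraction of devs coding at any given moment
peak_factor = 2.0             # headroom for bursts

avg_tps = (n_devs * active_fraction
           * requests_per_dev_per_min / 60
           * tokens_per_completion)
peak_tps = avg_tps * peak_factor

print(f"average: {avg_tps:.0f} tokens/s, peak: {peak_tps:.0f} tokens/s")
```

With these made-up inputs that works out to roughly 40–90 tokens/s aggregate, which you'd then compare against published per-GPU throughput figures for whichever model variant and quantization you pick.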

I don’t know much about this, but I’m trying to gather information and am willing to read whatever it takes to understand. Truthfully, though, so much is being published on this every day that I feel like I’m drowning.

Any insights would be great.


  👤 gwbas1c Accepted Answer ✓
I bet if you start with yourself, you can get a decent estimate!