HACKER Q&A
📣 _boffin_

Calculating Compute Needed for ‘N’ Developers Using LLMs


Hey HN,

I’m trying to figure out how much compute I’d need to run something like StarCoder or StarCoderPlus in inference mode for `N` developers.

- Let’s say I have 10–20 developers who would like to use StarCoder within VSCode, like Copilot.
- How would you calculate the number of tokens/s one would need to produce for this to be a viable service?
- Which version of the model would you use?
- Would you use a compressed model generated via quantization?
- Would you run it on GPU or CPU?
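One way to sketch the tokens/s question is a back-of-envelope estimate. All of the numbers below (requests per minute, tokens per completion, active fraction, burst headroom) are assumptions I'd plug in, not measurements:

```python
# Rough estimate of aggregate generation throughput needed.
# Every constant here is an assumption to be replaced with your own numbers.

n_devs = 20                   # developers served
requests_per_dev_per_min = 4  # completion requests while actively coding
tokens_per_completion = 64    # average generated tokens per suggestion
active_fraction = 0.5         # fraction of devs coding at any given moment
peak_factor = 2.0             # headroom for bursts

avg_tps = (n_devs * active_fraction
           * requests_per_dev_per_min / 60
           * tokens_per_completion)
peak_tps = avg_tps * peak_factor

print(f"average: {avg_tps:.0f} tokens/s, peak: {peak_tps:.0f} tokens/s")
```

With these made-up inputs that works out to roughly 40–90 tokens/s aggregate, which you'd then compare against published per-GPU throughput figures for whichever model variant and quantization you pick.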

I don’t know much about this, but I’m trying to gather information and am willing to read whatever it takes to understand. Truthfully, though, so much is being published on this every day that I feel like I’m drowning.

Any insights would be great.


  👤 gwbas1c Accepted Answer ✓
I bet if you start with yourself, you can get a decent estimate!