What's the current state of in-browser GPU accelerated LLM inference?
Curious whether anything is approaching usability at GPT-3+ quality, even if it means a large binary download.
WebGPU is getting close to general browser support, which will make things a bit faster, but compute isn't as much of the issue as RAM.
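For anyone poking at this: a minimal TypeScript sketch (assuming the `@webgpu/types` package for the `navigator.gpu` typings) that feature-detects WebGPU and reads the adapter's buffer limits, which is roughly where the memory constraint for holding model weights shows up. Just an illustration, not any particular project's code.

```ts
// Feature-detect WebGPU and inspect adapter limits.
// Buffer/binding size limits bound how much model data one buffer can hold,
// which matters more for LLM weights than raw compute throughput.
async function checkWebGpu(): Promise<void> {
  if (!("gpu" in navigator)) {
    console.log("WebGPU not available in this browser");
    return;
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    console.log("No suitable GPU adapter found");
    return;
  }
  console.log("maxBufferSize:", adapter.limits.maxBufferSize);
  console.log("maxStorageBufferBindingSize:", adapter.limits.maxStorageBufferBindingSize);
}

checkWebGpu();
```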