I feel there's a large opportunity here for a more privacy-friendly, on-device solution that doesn't send the user's data to OpenAI.
Is RAM the current main limitation?
(V)RAM, processing power, and storage (I mean, what kind of average user wants to clog up half their hard drive for a subpar model that outputs one token a second?). A rough sizing sketch below.
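For a rough sense of the (V)RAM side, here's a back-of-the-envelope sketch; the parameter counts and quantization levels are just illustrative assumptions, and it ignores KV cache, activations, and runtime overhead:

```python
# Approximate memory needed just to hold model weights, in GB.
# Illustrative model sizes and quantization levels (assumptions);
# real usage adds KV cache, activations, and framework overhead.

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only footprint: parameters x bytes per parameter."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight / 1e9

for params in (7, 13, 70):        # common local-model sizes (assumed)
    for bits in (16, 8, 4):       # fp16, int8, int4 quantization
        gb = weight_memory_gb(params, bits)
        print(f"{params:>3}B params @ {bits:>2}-bit ~ {gb:6.1f} GB")
```

Even a 7B model at 4-bit is a few GB of weights, which is why consumer hardware tends to cap out well below the larger models people actually want to run.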
IMO the main limitations are access to powerful GPUs for running models locally, and the size of some models, which causes UX problems with cold starts