However, it's unlikely OpenAI will just release the model the way they did with Whisper, due to its size and commercial constraints. For it to be applied across industries, it requires experimentation and adaptation on our side.
When will ChatGPT have its Stable Diffusion moment? Is it possible for any machine to run these kinds of models?
For BLOOM-176B, an alternative to GPT-3, you may need a cluster or machine with 512GB of GPU memory. That's expensive.
On a more normal machine, you can run something like GPT-J 6B, but it's very limited in comparison to ChatGPT.
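Those numbers follow from simple arithmetic on the parameter count. A back-of-envelope sketch (weights only, ignoring activations, KV cache, and framework overhead):

```python
# Rough memory needed just to hold the weights, assuming fp16 (2 bytes per parameter).
def weight_memory_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

print(f"BLOOM-176B: {weight_memory_gb(176):.0f} GB")  # ~328 GB before overhead, hence a 512GB cluster
print(f"GPT-J-6B:   {weight_memory_gb(6):.0f} GB")    # ~11 GB, fits on a single consumer GPU
```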
Maybe we will find a way to shrink these models while preserving their capabilities.
You can already run smaller language models on your own hardware if you have a GPU with enough VRAM. For example, with quantization you can run gpt-neox-20b (512-token context window) or gpt-pythia-13b (full context window) on an RTX 3090 with 24GB of VRAM. Quantization lets the model run in less memory: each parameter uses 8 or 4 bits instead of 16 or 32.
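As a rough illustration, here is a minimal sketch of loading a model with 8-bit weights via Hugging Face transformers plus bitsandbytes (assuming both libraries and accelerate are installed). At 8 bits per parameter, a 20B-parameter model needs roughly 20GB just for weights, which is why it can squeeze onto a 24GB card:

```python
# Minimal sketch: load GPT-NeoX-20B with 8-bit quantized weights.
# Assumes transformers, accelerate, and bitsandbytes are installed
# and a GPU with ~24GB of VRAM is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # place layers on available GPU/CPU memory automatically
    load_in_8bit=True,   # 8-bit weights instead of 16/32-bit
)

prompt = "The advantage of quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```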
Another possibility is to use reinforcement learning from human feedback (RLHF) to tune smaller models so they give results comparable to larger ones.
I've also been using RWKV with good results. It's a language model built on an RNN, so inference only needs matrix-vector multiplications instead of matrix-matrix ones and runs much faster. The 7B model uses about 14GB of VRAM without quantization. A 14B model is currently in training, but progress checkpoints are available. You can also run inference on a CPU, although it's much slower than on a GPU.
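To make the matrix-vector point concrete, here is a toy sketch (not the actual RWKV formulation, and the sizes are made up) of recurrent generation: the model carries a fixed-size state, so each new token costs only a few matrix-vector products instead of attending over the whole sequence with matrix-matrix products:

```python
# Toy RNN-style decoding loop: per-token cost is matrix-vector only.
import torch

d = 1024                              # hidden size (hypothetical)
W_state = torch.randn(d, d) * 0.01    # recurrence weights
W_in = torch.randn(d, d) * 0.01       # input projection
W_out = torch.randn(d, d) * 0.01      # output projection

state = torch.zeros(d)
for step in range(16):                # one iteration per generated token
    x = torch.randn(d)                # stand-in for the current token's embedding
    state = torch.tanh(W_state @ state + W_in @ x)  # matrix-vector update of the state
    logits = W_out @ state            # matrix-vector projection to vocabulary space
    # sample the next token from logits and feed its embedding back in as x ...
```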
Stable Diffusion needed to fit in RAM because you were iterating over all the pixels, but if ChatGPT only emits one token at a time, it might just be possible, right?