If you're only interested in inference, not training, it's not really worth investing in cards; use the online inference tools. And for training, even a pair of 4090s won't perform well without a good CPU and plenty of RAM to keep the cards fed as much as possible.
For example, Llama comes in sizes that still need 32GB of VRAM even after quantization (compression):
https://old.reddit.com/r/LocalLLaMA/comments/1806ksz/informa...
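As a rough back-of-envelope (a sketch with assumed numbers, not exact figures for any particular model or quantization format): the weights alone take roughly parameter count × bits per weight / 8 bytes, plus some overhead for the KV cache and activations.

    # Rough VRAM estimate. The ~20% overhead for KV cache/activations is an
    # illustrative assumption; real usage varies with context length and format.
    def estimate_vram_gb(params_billion: float, bits_per_weight: int) -> float:
        weight_bytes = params_billion * 1e9 * bits_per_weight / 8
        return weight_bytes * 1.2 / 1e9  # add ~20% overhead, convert to GB

    print(estimate_vram_gb(70, 4))  # ~42 GB: a 4-bit 70B model won't fit on one 24GB card
    print(estimate_vram_gb(13, 4))  # ~8 GB: a 4-bit 13B model fits comfortably
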
There are smaller versions too, though, if you're VRAM-constrained.