AI builders, if you're using Llama 3 in your backend, where do you host it, or which API do you use? (For production use cases needing speed and rate limits comparable to ChatGPT or Claude.)
-AWS SageMaker
-Self-host on cloud GPUs
-Replicate API (just found them, $0.05/1M tokens, legit?)
-AWS Bedrock (seems pricey)
-Others - pls comment
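For context, here's the rough back-of-envelope math I'm doing to compare options. The $0.05/1M-token figure is Replicate's quoted price from above; the other number is a hypothetical placeholder, not a real quote:

```python
# Rough monthly-cost estimator for comparing hosted Llama 3 options.
# Only the Replicate figure comes from my post; "Provider A" is a
# made-up placeholder -- swap in real quotes before deciding.

def monthly_cost(price_per_1m_tokens: float, tokens_per_day: float) -> float:
    """Estimated 30-day spend in dollars at a steady token volume."""
    return price_per_1m_tokens * (tokens_per_day / 1_000_000) * 30

providers = {
    "Replicate (quoted)": 0.05,      # $/1M tokens, from their pricing page
    "Hypothetical Provider A": 0.20,  # placeholder, not a real quote
}

for name, price in providers.items():
    cost = monthly_cost(price, 10_000_000)  # assume 10M tokens/day
    print(f"{name}: ~${cost:.2f}/month")
```

At 10M tokens/day that's about $15/month on Replicate's quoted rate, which seems almost too cheap for production traffic, hence my "legit?" question.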
Any feedback is welcome!