HACKER Q&A
📣 andrew_zhong

Best practices for running Llama 3 8B on a production server


The new Llama 3 8B is on par with 22B models: better than GPT-3.5, yet potentially 10x cheaper.

AI builders, if you are using Llama 3 in your backend, where do you host it, or which API do you use? (For production use cases needing good speed and rate limits close to ChatGPT or Claude.)

- AWS SageMaker

- Self-host on cloud GPUs

- Replicate API (just found them, $0.05/1M tokens, legit?)

- AWS Bedrock (seems pricey)

- Others (please comment)
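For the self-hosting route, one common setup is to serve the model behind an OpenAI-compatible endpoint (vLLM, for example, exposes one), so your backend code stays portable across providers. A minimal sketch of building such a request — the base URL and model name here are assumptions, not from the thread:

```python
import json
import urllib.request

def build_chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/chat/completions request for a
    self-hosted Llama 3 8B server (URL and model name are placeholders)."""
    payload = {
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("http://localhost:8000", "Hello!")
# urllib.request.urlopen(req) would send it; omitted since it needs a live server.
```

The upside of this shape is that switching from self-hosted to a hosted API is mostly a change of `base_url` and auth header.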

Any feedback is welcome!


  👤 whereismyacc Accepted Answer ✓
It's $0.05 per million input tokens; output tokens are $0.25 per million.
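At those quoted rates, a quick back-of-the-envelope cost check looks like this (the token volumes are made-up illustration numbers, not from the thread):

```python
# Quoted Replicate-style rates: $0.05 per 1M input tokens, $0.25 per 1M output.
INPUT_PER_M = 0.05
OUTPUT_PER_M = 0.25

def monthly_cost(input_tokens: float, output_tokens: float) -> float:
    """Dollar cost for a month's token volume at the quoted per-million rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a hypothetical 100M input + 20M output tokens per month:
cost = monthly_cost(100e6, 20e6)  # 100 * $0.05 + 20 * $0.25, roughly $10/month
```

Output tokens dominate quickly at a 5x rate difference, so the input/output mix matters as much as raw volume when comparing against GPT-3.5 pricing.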