HACKER Q&A
📣 andrew_zhong

Best practices for running Llama 3 8B on a production server


The new Llama 3 8B is on par with 22B models: better than GPT-3.5, yet potentially 10x cheaper.

AI builders, if you are using Llama 3 in your backend, where do you host it, or which API do you use? (For production use cases needing good speed and rate limits close to ChatGPT or Claude.)

- AWS SageMaker

- Self-host on cloud GPUs

- Replicate API (just found them, $0.05/1M tokens, legit?)

- AWS Bedrock (seems pricey)

- Others (please comment)
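For the self-hosting route, one common setup is to serve the model behind an OpenAI-compatible endpoint (vLLM, for example, exposes one), so your backend code stays portable across providers. A minimal sketch of building such a request — the base URL and model name here are assumptions, not from the thread:

```python
import json
import urllib.request

def build_chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/chat/completions request for a
    self-hosted Llama 3 8B server (URL and model name are placeholders)."""
    payload = {
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("http://localhost:8000", "Hello!")
# urllib.request.urlopen(req) would send it; omitted since it needs a live server.
```

The upside of this shape is that switching from self-hosted to a hosted API is mostly a change of `base_url` and auth header.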

Any feedback is welcome!


  👤 whereismyacc Accepted Answer ✓
It's $0.05 per million input tokens; output tokens are $0.25 per million.
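At those quoted rates, a quick back-of-the-envelope cost check looks like this (the token volumes are made-up illustration numbers, not from the thread):

```python
# Quoted Replicate-style rates: $0.05 per 1M input tokens, $0.25 per 1M output.
INPUT_PER_M = 0.05
OUTPUT_PER_M = 0.25

def monthly_cost(input_tokens: float, output_tokens: float) -> float:
    """Dollar cost for a month's token volume at the quoted per-million rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a hypothetical 100M input + 20M output tokens per month:
cost = monthly_cost(100e6, 20e6)  # 100 * $0.05 + 20 * $0.25, roughly $10/month
```

Output tokens dominate quickly at a 5x rate difference, so the input/output mix matters as much as raw volume when comparing against GPT-3.5 pricing.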