HACKER Q&A
📣 sa-code

What is the best Llama model that you can deploy on a single A10G?


It's hard to choose between a 13B model with 8-bit quantization and a 33B model with 4-bit quantization.

For some context, the idea is to build a text-to-SQL interface. The interface lets you select certain tables from the data warehouse and injects their definitions into the prompt, so the 4096-token context limit matters here.
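
For a concrete picture, the prompt assembly looks roughly like the sketch below. The table DDL, the 512-token completion reserve, and the ~4-characters-per-token estimate are all illustrative placeholders, not our real schema or tokenizer:

    # Hypothetical sketch of the prompt construction; the schema and the
    # chars-per-token heuristic are made up for illustration.
    TABLE_DDL = {
        "orders": "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL, created_at TIMESTAMP);",
        "customers": "CREATE TABLE customers (id INT, name TEXT, country TEXT);",
    }

    CONTEXT_LIMIT = 4096       # model context window
    COMPLETION_RESERVE = 512   # tokens left free for the generated SQL

    def build_prompt(selected_tables: list[str], question: str) -> str:
        ddl = "\n".join(TABLE_DDL[t] for t in selected_tables)
        prompt = (
            "You are a SQL assistant. Given these table definitions:\n"
            f"{ddl}\n\n"
            f"Write a SQL query that answers: {question}\nSQL:"
        )
        # Crude estimate (~4 characters per token); a real deployment would
        # count tokens with the model's actual tokenizer instead.
        est_tokens = len(prompt) // 4
        if est_tokens > CONTEXT_LIMIT - COMPLETION_RESERVE:
            raise ValueError(f"Prompt too long: ~{est_tokens} tokens")
        return prompt

    print(build_prompt(["orders", "customers"], "total revenue per country"))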


  👤 pocketarc Accepted Answer ✓
The 33B model with 4-bit quantisation would be better.

Check this PR out; the chart shows that even the best 13B quantisation is a far cry from the 30B with 2-bit quantisation: https://github.com/ggerganov/llama.cpp/pull/1684
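
As a back-of-envelope check on why both options fit on a 24 GB A10G but the 33B is the better trade: weights at n-bit quantisation take roughly params × n / 8 bytes. The sketch below ignores the KV cache and runtime overhead, which is an assumption, not a measurement:

    def weight_gb(params_billion: float, bits: int) -> float:
        # Approximate weight memory: params * (bits / 8) bytes, in GB.
        return params_billion * bits / 8

    print(f"13B @ 8-bit: ~{weight_gb(13, 8):.1f} GB")  # ~13.0 GB
    print(f"33B @ 4-bit: ~{weight_gb(33, 4):.1f} GB")  # ~16.5 GB

Both leave headroom for the KV cache at 4096 context, and the chart in that PR shows quantised 30B models sitting well below any 13B quantisation on perplexity.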