((Final .safetensors [GB]) / (Total Training Data [GB])) * 100 = ?
You could measure how well it memorizes via prediction accuracy on the training set, but this wouldn't indicate whether it generalizes well.
https://github.com/meta-llama/llama-models/blob/main/models/...
The most heavily quantised Llama 3.1 8B is about 3.4 GB.
Llama 3.1 was reportedly trained on roughly 15T tokens, which at ~4 bytes of raw text per token is about 60 TB. So 3.4 GB / ~60,000 GB gives a compression rate of roughly 0.005%, if you don't mind the intelligence of a heavily quantised 8B model.
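The arithmetic can be sketched in a few lines. The token count (~15T) and the ~4 bytes-per-token figure are rough assumptions, not numbers from the thread:

```python
# Rough compression-ratio arithmetic: model size vs. training data size.
# Assumptions: ~15e12 training tokens, ~4 bytes of raw text per token.
tokens = 15e12
bytes_per_token = 4
training_gb = tokens * bytes_per_token / 1e9  # ~60,000 GB of raw text
model_gb = 3.4  # heavily quantised Llama 3.1 8B

ratio_pct = model_gb / training_gb * 100
print(f"{ratio_pct:.4f}%")  # ~0.0057%
```

With these assumptions the result lands in the 0.005-0.006% range, consistent with the figure above.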