The tokenizer might need tweaking. Base Llama models, for example, are trained on text where consecutive spaces have been collapsed to a single space. That's unhelpful for code, where specific amounts of whitespace are at least very nice to have and can even be meaningful.
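Easy enough to check for whichever tokenizer you're using, by the way. A quick sketch (assumes the Hugging Face transformers tokenizer; the gated Llama checkpoints need access, so swap in whatever model you actually have):

```python
from transformers import AutoTokenizer

# Example checkpoint only -- substitute the tokenizer you care about.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

snippet = "def f(x):\n        return x  # 8-space indent"
print(tok.tokenize(snippet))
# If each space in the indent comes back as its own token (rather than a
# single run-of-spaces token), deep indentation gets expensive and the
# model has more chances to get the exact count wrong.
```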
When you talk about a grammar, that's in the decoder, right? You don't need to retrain a model to use one of those.
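Right, the grammar only filters what the decoder is allowed to emit at each step, so the weights never change. Something like this, where `allowed_ids` is a hypothetical stand-in for whatever your grammar engine (llama.cpp's GBNF, Outlines, etc.) says is legal given the text so far:

```python
import torch

def constrained_step(logits: torch.Tensor, allowed_ids: list[int]) -> int:
    # Mask every token the grammar forbids, then pick greedily from the rest.
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_ids] = 0.0
    return int(torch.argmax(logits + mask))
```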