I can ask it to paraphrase the rules and it totally understands; it just can’t get close to the right answer. Same with other AI chat models that I’ve tried. Any idea why this seemingly simple question is a limitation?
Because of the way the model (i.e. the projection surface) was constructed, the strings returned look plausible. However, you're still just seeing the number-back-to-language translation of a vector which was guessed by statistical inference.
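To make "guessed by statistical inference" concrete, here's a minimal sketch (toy vocabulary and scores invented purely for illustration): the model assigns a score to every token in its vocabulary, the scores become a probability distribution, and the "answer" is just a token sampled from that distribution and translated back to a string.

```python
import numpy as np

# Toy vocabulary and made-up logits, for illustration only.
vocab = ["cat", "dog", "strawberry", "the", "runs"]
logits = np.array([2.1, 1.8, 0.3, 3.0, 0.5])  # scores the model might emit

# Softmax turns raw scores into a probability distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The output is a sample from that distribution,
# then mapped from a number back to text.
token_id = np.random.choice(len(vocab), p=probs)
print(vocab[token_id], probs[token_id])
```

Nothing in that loop checks the answer against the rules of your puzzle; it only checks which continuation looks statistically plausible.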
LLMs typically struggle with tasks that operate on the words themselves, or with basic counting; some providers, like OpenAI, layer on workarounds so the model doesn't fail in a miserable way.
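One concrete reason for the word-level weakness: the model never sees characters, only integer token IDs. A quick sketch with OpenAI's tiktoken library (assuming it's installed) shows how a word gets chopped into opaque chunks before the model ever sees it:

```python
import tiktoken

# Tokenizer used by many recent OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]

# The model receives the integer IDs, not the letters, so questions
# about individual characters have to be answered indirectly.
print(token_ids)  # a short list of integer IDs
print(pieces)     # the chunks those IDs stand for
```

If the word arrives as two or three chunks, "count the letters" or "rearrange the letters" is no longer a lookup, it's a guess.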
You are transforming text using a text transformer. You have input text and output text.
You are asking why this output text is not what you expected. That is because this particular transformer has the weights that it has.