There are very long-range transformers, but so far they don't work very well. They will remain an active research topic, because a longer attention window lets LLMs be applied to more problems, for instance classifying or retrieving documents longer than 4000 tokens.
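The common workaround today is to chunk a long document so each piece fits the context window and then aggregate the per-chunk results. A minimal sketch below, assuming a 4000-token budget; the tokenizer and `classify_chunk` are placeholders, not any specific model's API.

```python
# Sketch: classify a document longer than the context window by chunking
# it and taking a majority vote over chunk-level labels.
from collections import Counter

MAX_TOKENS = 4000  # assumed context budget from the discussion above

def tokenize(text: str) -> list[str]:
    # Crude whitespace tokenizer as a stand-in for a real one.
    return text.split()

def chunk(tokens: list[str], size: int = MAX_TOKENS) -> list[str]:
    # Split the token stream into windows that each fit the model.
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]

def classify_chunk(chunk_text: str) -> str:
    # Placeholder: in practice this would call an LLM or a classifier.
    return "finance" if "invoice" in chunk_text.lower() else "other"

def classify_document(text: str) -> str:
    # Majority vote over the per-chunk labels.
    votes = Counter(classify_chunk(c) for c in chunk(tokenize(text)))
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    print(classify_document("An invoice for services rendered ... " * 3000))
```

Chunking loses cross-chunk context, which is exactly why a genuinely longer attention window would open up more problems than this workaround can.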
The main problem I see, besides outright wrong answers (e.g. on some Excel logic), is that it loses its state in the middle of a long context.
Since an LLM effectively navigates a high-dimensional, graph-like representation space, it often seems to take a wrong shortcut through that space, losing important parts of the information along the way.
The spurious connections in that graph would need to be cleaned out, which may be impossible, or the model would need further training so that the important connections become more "used", i.e. refined.