Maybe AI researchers undervalued the importance of circularity?
You need a vast amount of compute to accelerate matrix computation; that came from scaling DistBelief (https://research.google/pubs/large-scale-distributed-deep-ne...) for Search and Ads. Google developed custom ASICs (TPUs) because using CPUs and GPUs for those workloads would have been too costly.
You need the world's information, which came from Search.
You need the money to pay the researchers, and the willingness to fund discretionary research that may not be directly applicable to your main products.
Transformers themselves came out of neural machine translation research.