This information can be useful if you want to keep stats or attempt to limit or block training.
E.g., OpenAI's user-agent for their bot is `GPTBot`, and Common Crawl's is `CCBot`.
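
For anyone wondering how a list like this might get used, here is a rough Python sketch. It only knows the two tokens mentioned above. The function names and the sample User-Agent strings are made up for illustration, and the matching is a plain case-insensitive substring check, so treat it as a starting point and not a definitive implementation.

```python
from collections import Counter

# Only the two tokens named in this post; a curated list would be longer.
AI_BOT_TOKENS = ("GPTBot", "CCBot")


def match_ai_bot(user_agent):
    """Return the first known AI-bot token found in the header, else None."""
    ua = (user_agent or "").lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None


def count_ai_bots(user_agents):
    """Tally hits per bot token across an iterable of User-Agent strings."""
    hits = (match_ai_bot(ua) for ua in user_agents)
    return Counter(token for token in hits if token is not None)


def robots_txt_rules(tokens=AI_BOT_TOKENS):
    """Build robots.txt rules asking each listed bot not to crawl the site."""
    return "\n\n".join(f"User-agent: {t}\nDisallow: /" for t in tokens) + "\n"


if __name__ == "__main__":
    # Hypothetical User-Agent strings, e.g. pulled from an access log.
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "CCBot/2.0",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
    ]
    print(count_ai_bots(sample))
    print(robots_txt_rules())
```

`count_ai_bots` covers the "keep stats" case and `robots_txt_rules` covers the "limit or block" case. Keep in mind that robots.txt is only a request: a crawler has to choose to honor it, and a User-Agent header can be spoofed.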
If you aren't aware of such a list, would you find one useful?
I think I'd also like to expand beyond just UAs and curate IP ranges, docs, etc.
I'm starting a repo at https://github.com/JoshuaGoode/ai-user-agents