1. Feed each crawled article to an LLM prompt automation like ChatGPT and ask it to score the article against each of your categories to get an idea of how similar they are. You'll have to decide the thresholds yourself (a rough sketch follows this list).
2. Grab lots and lots of examples of articles and label them. Run a model, either on your own servers or via something like OpenAI's embeddings endpoint, to generate a vector embedding for each document. Then take your whole dataset of vector->category pairings, split it in two (a training set and a validation set), and train a classifier that takes a vector embedding and spits out probabilities for each class (a rough sketch follows this list).
3. Fine-tune an existing text->category classification model on your labelled examples (a rough sketch follows this list).
4. Buy services to do any combination of the above work for you.
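
For option 1, here's a minimal sketch using the official `openai` Python client; the category names, model choice, prompt wording, and 0-100 scoring scheme are all assumptions you'd tune for your own setup:

```
# Option 1 sketch: ask a chat model to score an article against each category.
# Categories, model name, and prompt wording are illustrative, not prescriptive.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["politics", "sports", "technology", "finance"]  # example categories

def score_article(text: str) -> dict:
    """Ask the model to rate the article 0-100 against each category."""
    prompt = (
        "Score how well this article matches each category on a 0-100 scale. "
        f"Categories: {', '.join(CATEGORIES)}. "
        "Reply with a JSON object mapping category name to score.\n\n"
        f"Article:\n{text[:8000]}"  # crude truncation to stay inside the context window
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works; pick for cost vs. quality
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# scores = score_article(article_text)
# matched = [c for c, s in scores.items() if s >= 70]  # the threshold is up to you
```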
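
For option 2, a sketch combining OpenAI's embeddings endpoint with a simple scikit-learn classifier; `texts` and `labels` stand in for your hand-labelled dataset, and logistic regression is just one reasonable choice of classifier:

```
# Option 2 sketch: embed labelled articles, then train a classic classifier on the vectors.
# Assumes `texts` (article bodies) and `labels` (their hand-assigned categories) already exist.
from openai import OpenAI
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    """Turn each document into a vector via the embeddings endpoint."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

X = embed(texts)  # one vector per labelled article
X_train, X_val, y_train, y_val = train_test_split(X, labels, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)  # any classifier that outputs probabilities works
clf.fit(X_train, y_train)
print("validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))

# Later, for each new crawl:
# probs = clf.predict_proba(embed([new_article]))[0]  # per-class probabilities
```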
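
For option 3, a sketch of fine-tuning an off-the-shelf text-classification model with Hugging Face `transformers`; the base model, hyperparameters, and dataset columns are placeholders, not recommendations:

```
# Option 3 sketch: fine-tune an existing model for sequence classification.
# Assumes `texts` and `labels` (integer category indices) from your labelled examples.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL = "distilbert-base-uncased"                              # placeholder base model
CATEGORIES = ["politics", "sports", "technology", "finance"]   # example categories

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=len(CATEGORIES))

ds = Dataset.from_dict({"text": texts, "label": labels}).train_test_split(test_size=0.2)
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="classifier", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
)
trainer.train()
```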