But it's advancing very quickly, and it definitely merits further study. People are interested to see how far we can take it.
And the topic hasn't been fully explored yet, so there's a lot of low-hanging fruit.
Another, deeper reason for me is that it employs neural network techniques, which are differentiable. This means you can backpropagate errors from one domain to another. In the future, we'll have video embeddings that can read written signs and interpret voices and what they refer to in the scene.
This is not AGI, but it will be much closer to the sensor fusion we humans do.
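To make the "backpropagate errors from one domain to another" point concrete, here's a minimal sketch (assuming PyTorch, with illustrative names and fake data): two tiny encoders for two domains share a contrastive loss, and a single backward pass pushes gradients into both, so error measured against one domain's embeddings updates the other domain's weights.

```python
# Minimal sketch: cross-domain gradient flow via a shared differentiable loss.
# All module names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    def __init__(self, in_dim, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64),
            nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        # Unit-length embeddings so cosine similarity is a dot product.
        return F.normalize(self.net(x), dim=-1)

image_enc = TinyEncoder(in_dim=128)   # stand-in for a vision encoder
text_enc = TinyEncoder(in_dim=300)    # stand-in for a text encoder

images = torch.randn(8, 128)          # fake batch of image features
texts = torch.randn(8, 300)           # fake batch of paired text features

img_emb = image_enc(images)
txt_emb = text_enc(texts)

# Similarity logits; matching image/text pairs sit on the diagonal.
logits = img_emb @ txt_emb.T / 0.07
labels = torch.arange(8)
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.T, labels)) / 2

# One backward pass sends gradients into *both* encoders: error computed
# against text embeddings pulls on the image weights, and vice versa.
loss.backward()
print(image_enc.net[0].weight.grad.norm(),
      text_enc.net[0].weight.grad.norm())
```

This is the same basic mechanism behind joint image-text embedding models: because everything is differentiable end to end, the two domains train against each other rather than in isolation.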
A few more users than average submit articles on a topic. Other people see them and find them interesting. Nobody is fatigued enough to flag the topic.
The front page doesn't happen in a vacuum. Other sites, conferences, YouTube, etc. all contribute to discovery, background knowledge, and saturation.