This got me thinking: what makes it so hard to build a good recommendation system?
On my side, I'm currently working on a new way to do so. I call it Channel Tree. The goal is to start out with a list of YouTube channels provided by the user.
Then my software will look at the Channels section of each channel to find other channels listed there. This has the effect of building a deep tree of different channels related to each other through the Channels section. The user only has to specify how many layers of the tree they would like, so the algorithm can stop there.
Finally, you'll only have to look at the tree and look out for channels that may seem interesting based on who's the parent in the tree.
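For anyone who wants to picture the Channel Tree idea, here is a minimal sketch of the crawl described above, not the actual implementation. It assumes a hypothetical `get_featured` callable that returns the channels listed in a channel's Channels section (a dict lookup stands in for it here), and it does a breadth-first expansion that stops at the requested number of layers.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def build_channel_tree(
    seeds: Iterable[str],
    get_featured: Callable[[str], List[str]],
    max_depth: int,
) -> Dict[str, dict]:
    """Breadth-first expansion of featured channels, up to max_depth layers.

    Returns a nested dict: {channel: {child: {...}, ...}, ...}.
    Each channel appears once, under the first parent that reached it.
    """
    tree: Dict[str, dict] = {}
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            tree[seed] = {}
            queue.append((seed, tree[seed], 0))

    while queue:
        channel, children, depth = queue.popleft()
        if depth >= max_depth:
            continue  # the user-chosen layer limit stops the crawl here
        for child in get_featured(channel):
            if child in seen:
                continue  # avoid cycles and duplicate subtrees
            seen.add(child)
            children[child] = {}
            queue.append((child, children[child], depth + 1))
    return tree


# Hypothetical data standing in for real Channels sections.
FEATURED = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A"],  # points back at a seed, so it is ignored
    "D": ["E"],  # never expanded when max_depth is 2
}

tree = build_channel_tree(["A"], lambda c: FEATURED.get(c, []), max_depth=2)
print(tree)
```

Keeping each channel under the first parent that reached it is one design choice among several; it keeps the structure a tree, at the cost of hiding the fact that a channel may be featured by more than one parent.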
I would assume that Google has tried various tweaks to their algorithm and that it is the way it is for a reason.
It could be that most people's goal in life is not to watch the maximum amount of YouTube.
One thing I've wanted since I saw a prototype Joe Edelman made for Chrome: after a video is played, prompt the user for a rating and a reason for the rating. Joe had a taxonomy of human values for this, but it could also be a freeform tag-based system.
Then when you go to YouTube next time, you say that the reason you're at YouTube is to "increase my knowledge of machine learning." Or maybe it's Friday evening and you just want to "make me laugh."
Most people probably won't pick "make me outraged or scared," but that does give good engagement metrics...
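A rough sketch of how the rating-plus-reason idea could work with freeform tags. This is not Joe Edelman's prototype; the `Rating` record, the video ids, and the exact-match on the reason tag are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rating:
    video_id: str
    stars: int   # e.g. 1-5, collected after the video ends
    reason: str  # freeform tag explaining the rating


def recommend_for_intent(ratings: List[Rating], intent: str, top_n: int = 3) -> List[str]:
    """Rank videos by average stars among ratings whose reason matches the intent."""
    stars_by_video: Dict[str, List[int]] = defaultdict(list)
    for r in ratings:
        if r.reason == intent:
            stars_by_video[r.video_id].append(r.stars)
    # Highest average first; ties broken by video id for a stable order.
    ranked = sorted(
        stars_by_video,
        key=lambda v: (-sum(stars_by_video[v]) / len(stars_by_video[v]), v),
    )
    return ranked[:top_n]


# Hypothetical ratings; the video ids are made up.
ratings = [
    Rating("vid_attention", 5, "increase my knowledge of machine learning"),
    Rating("vid_attention", 4, "increase my knowledge of machine learning"),
    Rating("vid_gan", 3, "increase my knowledge of machine learning"),
    Rating("vid_cats", 5, "make me laugh"),
]

ml_picks = recommend_for_intent(ratings, "increase my knowledge of machine learning")
friday_picks = recommend_for_intent(ratings, "make me laugh")
print(ml_picks, friday_picks)
```

The point is only that the stated reason for visiting selects which ratings count, so the same catalogue produces different suggestions on a study day and on a Friday evening.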
2) COPPA means YouTube doesn't allow under-13s to have an account. But those children want to use the like, subscribe, and notification bell features. This means that YouTube gets all of my viewing and all of my child's viewing, but can't tell the difference.
3) There's no way to tell youtube what I like. I watch tiny channels doing original songs and covers. Youtube thinks that's music, and so pushes general chart shit at me. Or it thinks it's some genre of music and pushes huge channels that are roughly that genre at me. I have no way of telling YT that it's the small channels with fewer than 1000 / 10000 subs that I want to see.
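To make the ask concrete, this is roughly the kind of knob being described, sketched as a filter over hypothetical channel data. The channel names, subscriber counts, and the `max_subs` parameter are invented for the example; nothing here is an existing YouTube setting.

```python
from typing import Dict, List


def small_channels_only(subscriber_counts: Dict[str, int], max_subs: int = 10_000) -> List[str]:
    """Keep only channels below the subscriber threshold the viewer chose."""
    return sorted(name for name, subs in subscriber_counts.items() if subs < max_subs)


# Hypothetical candidates the recommender might otherwise push.
candidates = {
    "bedroom_covers": 850,
    "indie_songwriter": 7_200,
    "chart_hits_official": 12_000_000,
}

under_10k = small_channels_only(candidates)
under_1k = small_channels_only(candidates, max_subs=1_000)
print(under_10k, under_1k)
```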
Your proposed algorithm wouldn't work for me because the majority of the good videos I'm exposed to in education/science come from channels that are not listed in the Channels sections of the channels I'm subscribed to.
E.g., here's a channel about machine learning that doesn't list any other channels: https://www.youtube.com/c/YannicKilcher/channels
Here's another machine learning channel that doesn't reference any other AI channels: https://www.youtube.com/c/K%C3%A1rolyZsolnai/channels
Neither of the above channels references the other, but both have videos that are relevant to my interests.
Because the YouTube algorithm doesn't depend on building a "channel tree" from whitelisted channel listings, it can suggest quality videos from both of those channels that your algorithm would miss.
Also, as a person working on recommendation algorithms at a large competitor, I would say YouTube does a pretty good job overall.
I just think it's trendy to hate on things even when they give you tools to tweak their behavior.
I just did a small random sampling of my own subscriptions, and I'd say about 30% or more have no channels listed in their Channels section. And of the ones that do, it's all much larger YouTubers in their own circles that I'm usually already aware of.
I understand your approach to be more of a heuristic: build a tree-like structure by traversing the channels listed by each channel.
If I’m interested in A, how do you determine where to start traversal in this tree? And how do you pick out a set of recommendations and rank them?
I have a cursory understanding of ML. In addition to finding the relevant entities, you need to rank them.
An important question is, “what are you optimizing for?” For YouTube, presumably they want to optimize for watch time. If they want to suggest ads, that model might want to optimize for maximizing revenue. I’m going on a tangent here, but when we say “YouTube’s recommendations are bad” we should keep in mind YouTube might be optimizing for revenue... which isn’t the same as optimizing for seconds watched or optimizing for clicks.
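A toy illustration of why the objective matters: the same three candidates come out in a different order depending on whether you rank by clicks, expected watch time, or expected revenue. All of the numbers and field names below are invented for the example; nothing here reflects how YouTube actually scores videos.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    title: str
    p_click: float                 # predicted probability of a click
    expected_watch_seconds: float  # predicted watch time if clicked
    expected_ad_revenue: float     # predicted revenue if clicked


def rank(cands: List[Candidate], objective: Callable[[Candidate], float]) -> List[str]:
    """Order candidate titles from highest to lowest objective value."""
    return [c.title for c in sorted(cands, key=objective, reverse=True)]


# Hypothetical predictions for three videos.
candidates = [
    Candidate("long lecture", 0.05, 1800.0, 0.02),
    Candidate("clickbait short", 0.40, 30.0, 0.01),
    Candidate("product review", 0.15, 400.0, 0.09),
]

by_clicks = rank(candidates, lambda c: c.p_click)
by_watch_time = rank(candidates, lambda c: c.p_click * c.expected_watch_seconds)
by_revenue = rank(candidates, lambda c: c.p_click * c.expected_ad_revenue)

print(by_clicks)
print(by_watch_time)
print(by_revenue)
```

Each objective puts a different video on top, which is the sense in which "bad recommendations" can just mean "optimized for something other than what I wanted."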
The foundational error, I feel, is trying to capitalize on people's attention as opposed to aiding the public good.
Sure, when YouTube and similar services were starting out, it was the wild west and all about gaining users and creating traction. However, that was a long time ago. We now know where that party was headed and how it ended, and it almost ended society!
At this point any improvements I feel should be focused on minimizing damage and maximizing the public good.
This is the antithesis of anticipating interests, as that approach has failed to deliver and, even worse, has only served to exacerbate echo chambers and divide populations.
Better to gauge interests and make suggestions from a large, vetted list of diverse sources as the primary ingredient of a larger cake of suggestions, with actual user-defined interest suggestions being the subtle and minimal spice.
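One way to read the cake-and-spice idea is as a fixed mixing ratio: most slots go to the vetted, diverse list, and a small share goes to interest-based picks. The sketch below assumes both lists already exist and are ordered; the `personal_share` parameter and the item names are made up for illustration.

```python
from typing import List


def blend(vetted: List[str], personalized: List[str], size: int,
          personal_share: float = 0.2) -> List[str]:
    """Fill most slots from the vetted list, then a small share from personalized picks."""
    n_personal = int(size * personal_share)
    result: List[str] = []
    for item in vetted:
        if len(result) >= size - n_personal:
            break
        if item not in result:
            result.append(item)
    for item in personalized:
        if len(result) >= size:
            break
        if item not in result:  # skip anything the vetted list already supplied
            result.append(item)
    return result


# Hypothetical inputs.
vetted = [f"vetted_{i}" for i in range(1, 11)]
personalized = ["personal_1", "vetted_2", "personal_2", "personal_3"]

feed = blend(vetted, personalized, size=10)
print(feed)
```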
This led to much more relevant results than what it produces now, it seems, especially when looking for more niche topics.
Nowadays notifications are irrelevant. A comment you posted gets a notification, and then you have to go searching for what you actually posted. Those are hard to find.
YT is imperfect; I am fine with that.
What I hate, though, is when a video gets deleted from your playlist. They should have the courtesy to say the title of the video that got deleted. It just blindly says the video is deleted or restricted in your country, and I have no clue what video that was.
Feedback from someone who is vaguely disappointed in YouTube's recommendation algorithm: I don't love every video by a given channel. Which videos I specifically enjoyed watching matters quite a bit.