Naturally I was curious about using simple distance-based calculations instead, and I'm hearing about the curse of dimensionality. I think I sort of get it but only on a surface level, not very deeply.
I'm also curious if it's possible to take any high dimensional space and somehow convert it into a weighted graph instead? And if there are recommendation/similarity systems based on this approach instead, and basically how all of this works in general at a holistic level.
I find 3Blue1Brown's videos (like https://www.youtube.com/watch?v=WUvTyaaNkzM) to be so good at explaining subject matter. Is there anything in this format that really helps illustrate all of this new material I'm referring to?
One way to get around this is to take the cosine of the angle between the documents' respective vectors. While this often works better than the Euclidean distance, it is still affected by the curse of dimensionality. Afaik, there is no good way to fully work around the curse of dimensionality, and similarity metrics that don't rely on vector representations often work better in practice. For example, one can model documents as bags of words and divide the size of their intersection by the size of their union (the Jaccard similarity).
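To make the two measures concrete, here is a minimal sketch comparing cosine similarity on term-count vectors with Jaccard similarity on word sets. The toy documents and function names are just illustrative assumptions, not from any particular library.

```python
# Sketch: cosine similarity on term-count vectors vs. Jaccard similarity
# on bag-of-words sets. Toy documents below are made up for illustration.
from collections import Counter
import math

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-count vectors of two documents."""
    counts_a, counts_b = Counter(doc_a.split()), Counter(doc_b.split())
    common_terms = set(counts_a) & set(counts_b)
    dot = sum(counts_a[t] * counts_b[t] for t in common_terms)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard_similarity(doc_a, doc_b):
    """Size of the intersection divided by the size of the union of word sets."""
    words_a, words_b = set(doc_a.split()), set(doc_b.split())
    return len(words_a & words_b) / len(words_a | words_b) if (words_a | words_b) else 0.0

doc1 = "the cat sat on the mat"
doc2 = "the dog sat on the log"
print(cosine_similarity(doc1, doc2))   # 0.75 for these toy documents
print(jaccard_similarity(doc1, doc2))  # ~0.43 (3 shared words out of 7 distinct)
```

Note that the Jaccard version throws away term frequencies entirely, which is exactly why it behaves differently from the vector-based measures in high dimensions.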
Simply put, when training an SVM you want to find a hyperplane that separates the two classes with the largest possible margin, i.e. the largest distance to the closest points of each class. The coordinates can be very high dimensional, which makes the optimization problem hard (primal formulation: $y = w^T x + b$, where the weight vector $w$ you need to estimate has the same, very high, dimension as $x$).
With the help of the Lagrangian of the primal formulation, you can show that $w = \sum_i \alpha_i y_i x_i$, so the decision function becomes $w^T x + b = \sum_i \alpha_i y_i \, x_i^T x + b$. So instead of having to estimate a very high number of components of $w$, you only need the dot products $x_i^T x$ and the coefficients $\alpha_i$, whose number is limited by the number of training points. Replacing that dot product with a kernel function is the kernel trick, which you can use to avoid working explicitly in the high-dimensional space.
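As a sanity check on the dual form above, here is a minimal sketch using scikit-learn that rebuilds the decision function from the support vectors and dual coefficients alone. The toy blob data and variable names are my own assumptions for illustration.

```python
# Sketch: the SVM decision function only needs dot products with the
# support vectors, weighted by the dual coefficients (alpha_i * y_i):
#   f(x) = sum_i alpha_i y_i x_i^T x + b
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes in 20 dimensions (toy data).
X, y = make_blobs(n_samples=100, centers=2, n_features=20, random_state=0)

clf = SVC(kernel="linear")
clf.fit(X, y)

# Recompute the decision function by hand from dot products only.
X_new = X[:5]
manual = (X_new @ clf.support_vectors_.T) @ clf.dual_coef_[0] + clf.intercept_[0]

print(np.allclose(manual, clf.decision_function(X_new)))  # True

# Swapping the dot product x_i^T x for a kernel k(x_i, x), e.g. kernel="rbf",
# gives a non-linear SVM without ever forming the high-dimensional features.
```

The key point is that nothing in the computation touches $w$ directly; everything goes through pairwise dot products, which is what lets a kernel stand in for them.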