HACKER Q&A
📣 Solvency

How to better grok the “curse of dimensionality”?


I've sort of stumbled onto this topic organically and now I'm rather curious. In the context of building a recommendation system, I understand the concept of a high dimensional space made of vectors, and how cosine similarity is a way to derive similarity between vectors regardless of their magnitude.
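To make concrete what I mean by cosine similarity ignoring magnitude, here's a tiny Python sketch (my own toy example, not from any particular library):

```python
# Toy illustration: cosine similarity only compares direction,
# so scaling a vector doesn't change the score.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))     # ~1.0 -- same direction
print(cosine_similarity([1, 2, 3], [10, 20, 30]))  # ~1.0 -- magnitude ignored
```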

Naturally I was curious about using simple distance-based calculations instead, and I'm hearing about the curse of dimensionality. I think I sort of get it but only on a surface level, not very deeply.

I'm also curious whether it's possible to take any high-dimensional space and somehow convert it into a weighted graph instead? And whether there are recommendation/similarity systems based on that approach, and basically how all of this works at a holistic level.


I find 3Blue1Brown's videos (like https://www.youtube.com/watch?v=WUvTyaaNkzM) to be so good at explaining subject matter. Is there anything in this format that really helps illustrate all of this new material I'm referring to?


  👤 bjourne Accepted Answer ✓
The curse of dimensionality is that distances grow as dimensionality grows. Imagine you have a circle with unit radius inscribed in a square whose sides are 2. The circle's area is pi, so pi/2^2 = 79% of all points in the square [-1,1]^2 are less than unit distance away from the origin. For a sphere inside a cube the fraction is (4*pi/3)/2^3 = 52%, for a 4-dimensional hypersphere it is (pi^2/2)/2^4 = 31%, and so on. As the number of dimensions grows the fraction approaches zero; e.g. only 0.25% of all points are inside the 10-dimensional sphere. So the distance between two random points in a high-dimensional space is likely large. Thus, comparing documents as high-dimensional vectors by distance breaks down: they will be far from each other and therefore look dissimilar even though a human may call them "similar".
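If you want to see those fractions fall off yourself, here is a minimal Python sketch (my own illustration, standard library only) that computes the share of the cube [-1,1]^d occupied by the inscribed unit ball, using the closed-form ball volume pi^(d/2)/Gamma(d/2 + 1):

```python
# Fraction of the cube [-1, 1]^d that lies inside the unit d-ball.
import math

def unit_ball_fraction(d):
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube_volume = 2 ** d
    return ball_volume / cube_volume

for d in (2, 3, 4, 10):
    print(f"d={d:2d}: {unit_ball_fraction(d):.4%}")
# d= 2: 78.5398%
# d= 3: 52.3599%
# d= 4: 30.8425%
# d=10: 0.2490%
```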

One way to get around this is to take the cosine of the angle between the documents' respective vectors. While this often works better than Euclidean distance, it is also affected by the curse of dimensionality. Afaik, there is no good way to fully work around the curse of dimensionality, and similarity metrics that don't use vector representations often work better in practice. For example, one can model documents as bags of words and divide the size of the intersection by the size of the union.
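A minimal sketch of that set-based measure (the Jaccard similarity); the whitespace tokenizer here is deliberately naive, just for illustration:

```python
# Jaccard similarity over bag-of-words token sets: |A ∩ B| / |A ∪ B|.
def jaccard(doc_a: str, doc_b: str) -> float:
    a = set(doc_a.lower().split())
    b = set(doc_b.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

print(jaccard("the cat sat on the mat", "the dog sat on the mat"))  # 0.666...
```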


👤 esalman
I first encountered this in my machine learning class in the context of solving ridge regression or SVM.

Simply put, when training an SVM you want to find a hyperplane that maximizes its margin, i.e. its separation from the points belonging to the two classes. The points can be very high dimensional, which makes the optimization problem hard (primal decision function: $y = w^T x + b$, where $w$ is the weight vector you need to estimate, with one component per input dimension).

With the help of the Lagrangian of the primal formulation, you can show that $w = \sum_i \alpha_i y_i x_i$, so the decision function becomes $y = \sum_i \alpha_i y_i x_i^T x + b$. So instead of having to estimate a very high number of $w$'s, you only need the dot products $x_i^T x$, whose number is limited by the number of training samples. This is the basis of the kernel trick, which you can use to avoid the curse of dimensionality.
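Here is a rough sketch of the dual-form idea (the alphas and bias below are made-up placeholders, not a trained model): the decision function only touches training points through dot products with the query, so $w$ never has to be materialized, and the dot product could be swapped for a kernel function.

```python
# Dual decision function: f(x) = sum_i alpha_i * y_i * <x_i, x> + b
import numpy as np

def decision_function(X_train, y_train, alphas, b, x, kernel=np.dot):
    """Only dot products (or kernel evaluations) with x are needed."""
    return sum(a * y * kernel(xi, x)
               for a, y, xi in zip(alphas, y_train, X_train)) + b

X_train = np.array([[1.0, 2.0], [3.0, 1.0]])
y_train = np.array([1, -1])
alphas = np.array([0.5, 0.5])   # placeholder Lagrange multipliers
b = 0.1                         # placeholder bias
print(decision_function(X_train, y_train, alphas, b, np.array([2.0, 2.0])))  # -0.9
```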


👤 abecedarius
Why don't you try explaining what it is you find puzzling, yourself? That's not clear to me from your questions.

👤 nextworddev
As you add more criteria, it becomes harder to find the result you are looking for.