
Matlab K-means Cosine Assigns Everything To One Cluster

I'm using Matlab's regular kmeans algorithm with 'Distance','cosine','EmptyAction','drop' on an L2-normalized feature matrix and I have a problem. The output that Matlab generates

Solution 1:

The cosine distance is what makes it fail; the same call works with sqEuclidean. I think the cosine distance either needs more information than your data provides, or it doesn't make sense on your data set.

Edit: I agree that the documentation is a little vague here, but the definition of cosine distance in Matlab's pdist function is: "One minus the cosine of the included angle between points (treated as vectors)."
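As a minimal sketch of what that definition computes (written in plain Python rather than Matlab, and not Matlab's actual implementation), the function below returns one minus the cosine of the angle between two points treated as vectors. The name cosine_distance and the sample vectors are illustrative assumptions, not something from the question.

```python
import math


def cosine_distance(u, v):
    """Return one minus the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical sample points chosen to cover the three extreme cases.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])       # right angle
opposite = cosine_distance([1.0, 2.0], [-1.0, -2.0])       # antiparallel vectors
```

The distance depends only on direction, not on length: parallel vectors come out at (approximately) 0, orthogonal ones at 1, and opposite ones at 2.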

At first I took that to mean that the angle itself must be included in the data (I assumed in the next column), but that seems to defeat the purpose of cosine similarity.

Edit again: It is more likely that "included" simply means the angle included between the two vectors. In that case I think cosine expects 2 or more columns to work on. With a single column, every nonzero point lies along the same axis, so the angle between any two points can only be 0 or 180 degrees.
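To illustrate the "2 or more columns" guess, here is a small Python sketch. It assumes a feature matrix with a single column of positive values; the data and the helper name are hypothetical, since the asker's actual data is not shown. It is meant only to show why cosine distance cannot separate such points.

```python
import math


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v (nonzero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


# Hypothetical single-feature data: four points, one column each.
one_column = [[0.5], [3.0], [12.0], [40.0]]
pairwise = [cosine_distance(p, q) for p in one_column for q in one_column]
# Every pair points the same way along the only axis, so all distances vanish.
all_zero = all(abs(d) < 1e-12 for d in pairwise)

# Hypothetical two-feature data: directions now differ between points.
two_columns = [[0.5, 4.0], [3.0, 1.0], [12.0, 2.0], [40.0, 90.0]]
spread = max(cosine_distance(p, q) for p in two_columns for q in two_columns)
```

When every pairwise distance is zero, a clustering algorithm has nothing to split on, which would be consistent with everything landing in one cluster. If the single column also contained negative values, those points would sit at distance 2 from the positive ones, so at most two distinct directions could ever be told apart.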

Also, if you're already into Python, there are some good machine learning tools there as well. Here is one I have used. There is also MILK, but I have never used it myself.
