Scikit-learn: Label Not X Is Present In All Training Examples
Solution 1:
There are 2 problems here:
1) The missing label warning 2) You are getting all 0's for predictions
The warning means that some of your classes are missing from the training data. This is a common problem. If you have 400 classes, then some of them must only occur very rarely, and on any split of the data, some classes may be missing from one side of the split. There may also be classes that simply don't occur in your data at all. You could try Y.sum(axis=0).all()
and if that is False, then some classes do not occur even in Y. This all sounds horrible, but realistically, you aren't going to be able to correctly predict classes that occur 0, 1, or any very small number of times anyway, so predicting 0 for those is probably about the best you can do.
As for the all-0 predictions, I'll point out that with 400 classes, probably all of your classes occur much less than half the time. You could check Y.mean(axis=0).max()
to get the highest label frequency. With 400 classes, it might only be a few percent. If so, a binary classifier that has to make a 0-1 prediction for each class will probably pick 0 for all classes on all instances. This isn't really an error, it is just because all of the class frequencies are low.
If you know that each instance has a positive label (at least one), you could get the decision values (clf.decision_function
) and pick the class with the highest one for each instance. You'll have to write some code to do that, though.
I once had a top-10 finish in a Kaggle contest that was similar to this. It was a multilabel problem with ~200 classes, none of which occurred with even a 10% frequency, and we needed 0-1 predictions. In that case I got the decision values and took the highest one, plus anything that was above a threshold. I chose the threshold that worked the best on a holdout set. The code for that entry is on Github: Kaggle Greek Media code. You might take a look at it.
If you made it this far, thanks for reading. Hope that helps.
Post a Comment for "Scikit-learn: Label Not X Is Present In All Training Examples"