Optimizing multi-class multimedia data classification using negative data
First Claim
1. A computer-implemented method comprising:
- accessing a corpus of multimedia data items, the corpus of multimedia data items including positive multimedia data items and negative multimedia data items, wherein:
  - individual positive multimedia data items of the positive multimedia data items are associated with individual labels of a plurality of labels; and
  - the negative multimedia data items are not associated with any label of the plurality of labels;
- extracting a first set of features from the individual positive multimedia data items;
- training a classifier based at least in part on the first set of features, the classifier including a plurality of model vectors each corresponding to one of the individual labels;
- based at least in part on applying the classifier to one or more of the individual positive multimedia data items, collecting statistics corresponding to each of the individual labels;
- extracting a second set of features from a new multimedia data item;
- applying the classifier to the second set of features to determine similarity values corresponding to each of the individual labels;
- determining that the new multimedia data item is one of the negative multimedia data items;
- based at least in part on determining that the new multimedia data item is one of the negative multimedia data items, comparing the statistics with the similarity values corresponding to each of the individual labels; and
- based at least in part on comparing the statistics with the similarity values, updating individual model vectors of the plurality of model vectors.
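The steps recited in this claim can be illustrated with a minimal sketch. The claim does not specify a training rule, a statistic, or an update rule, so everything concrete here is an assumption made for illustration: a multi-class perceptron for training, the per-label mean similarity of positive items as the "statistics", the dot product as the similarity value, and a simple subtractive update for negative items. The function names (`train`, `collect_statistics`, `update_on_negative`) and the toy feature vectors are hypothetical.

```python
def dot(u, v):
    """Similarity value: plain dot product (an assumed choice)."""
    return sum(a * b for a, b in zip(u, v))


def train(positives, labels, dim, epochs=20, lr=0.1):
    """Train one model vector per label with a multi-class perceptron (assumed rule)."""
    W = {y: [0.0] * dim for y in labels}
    for _ in range(epochs):
        for x, y in positives:
            pred = max(labels, key=lambda c: dot(W[c], x))
            if pred != y:
                # Pull the correct label's vector toward x, push the wrong one away.
                W[y] = [w + lr * xi for w, xi in zip(W[y], x)]
                W[pred] = [w - lr * xi for w, xi in zip(W[pred], x)]
    return W


def collect_statistics(W, positives):
    """Per-label statistic: mean similarity of that label's own positives (assumed)."""
    sums = {y: 0.0 for y in W}
    counts = {y: 0 for y in W}
    for x, y in positives:
        sums[y] += dot(W[y], x)
        counts[y] += 1
    return {y: (sums[y] / counts[y] if counts[y] else 0.0) for y in W}


def update_on_negative(W, stats, x_neg, lr=0.1):
    """Compare a negative item's similarity values with the statistics and
    push away every model vector that scores it at least as high as a
    typical positive (assumed update rule). Mutates W in place."""
    sims = {y: dot(W[y], x_neg) for y in W}
    updated = []
    for y, s in sims.items():
        if s >= stats[y]:
            W[y] = [w - lr * xi for w, xi in zip(W[y], x_neg)]
            updated.append(y)
    return sims, updated


# Toy corpus: two labels, two-dimensional features (hypothetical data).
labels = ["cat", "dog"]
positives = [([1.0, 0.0], "cat"), ([0.9, 0.1], "cat"),
             ([0.0, 1.0], "dog"), ([0.1, 0.9], "dog")]
W = train(positives, labels, dim=2)
stats = collect_statistics(W, positives)

# A negative item (no label) whose features happen to resemble "cat".
negative_item = [1.0, 0.0]
sims, updated = update_on_negative(W, stats, negative_item)
```

In this sketch only the model vectors that respond too strongly to the negative item are changed; labels whose similarity stays below their statistic are left untouched.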
Abstract
Techniques for optimizing multi-class image classification by leveraging negative multimedia data items to train and update classifiers are described. The techniques describe accessing positive multimedia data items of a plurality of multimedia data items, extracting features from the positive multimedia data items, and training classifiers based at least in part on the features. The classifiers may include a plurality of model vectors each corresponding to one of the individual labels. The system may iteratively test the classifiers using positive multimedia data and negative multimedia data and may update one or more model vectors associated with the classifiers differently, depending on whether multimedia data items are positive or negative. Techniques for applying the classifiers to determine whether a new multimedia data item is associated with a topic based at least in part on comparing similarity values with corresponding statistics derived from classifier training are also described.
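The final sentence of the abstract, which describes deciding whether a new item is associated with a topic by comparing similarity values with training-time statistics, might look roughly like the sketch below. It assumes dot-product similarity and treats each label's statistic as a minimum acceptance threshold; neither choice is stated in the abstract. The `classify` function, the model vectors, and the threshold values are hypothetical.

```python
def dot(u, v):
    """Similarity value: plain dot product (an assumed choice)."""
    return sum(a * b for a, b in zip(u, v))


def classify(model_vectors, stats, features):
    """Return the best-matching label, or None when even the best
    similarity value falls below that label's statistic."""
    sims = {label: dot(w, features) for label, w in model_vectors.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= stats[best] else None


# Hypothetical trained model vectors and per-label statistics.
model_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
stats = {"cat": 0.5, "dog": 0.5}

confident = classify(model_vectors, stats, [0.9, 0.1])   # clears the "cat" threshold
rejected = classify(model_vectors, stats, [0.2, 0.3])    # below every threshold
```

Returning `None` is one way to represent "not associated with any topic"; a real system could instead return the full similarity map and leave the decision to the caller.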
20 Claims
1. A computer-implemented method comprising:
- accessing a corpus of multimedia data items, the corpus of multimedia data items including positive multimedia data items and negative multimedia data items, wherein:
  - individual positive multimedia data items of the positive multimedia data items are associated with individual labels of a plurality of labels; and
  - the negative multimedia data items are not associated with any label of the plurality of labels;
- extracting a first set of features from the individual positive multimedia data items;
- training a classifier based at least in part on the first set of features, the classifier including a plurality of model vectors each corresponding to one of the individual labels;
- based at least in part on applying the classifier to one or more of the individual positive multimedia data items, collecting statistics corresponding to each of the individual labels;
- extracting a second set of features from a new multimedia data item;
- applying the classifier to the second set of features to determine similarity values corresponding to each of the individual labels;
- determining that the new multimedia data item is one of the negative multimedia data items;
- based at least in part on determining that the new multimedia data item is one of the negative multimedia data items, comparing the statistics with the similarity values corresponding to each of the individual labels; and
- based at least in part on comparing the statistics with the similarity values, updating individual model vectors of the plurality of model vectors.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
- one or more processors; and
- instructions stored in computer storage media executable by the one or more processors to perform operations comprising:
  - accessing a corpus of multimedia data items, the corpus of multimedia data items including positive multimedia data items and negative multimedia data items, wherein:
    - individual positive multimedia data items of the positive multimedia data items are associated with individual labels of a plurality of labels; and
    - the negative multimedia data items are not associated with any label of the plurality of labels;
  - extracting a first set of features from the individual positive multimedia data items;
  - training a classifier based at least in part on the first set of features, the classifier including a plurality of model vectors each corresponding to one of the individual labels;
  - based at least in part on applying the classifier to one or more of the individual positive multimedia data items, collecting statistics corresponding to each of the individual labels;
  - extracting a second set of features from a new multimedia data item;
  - applying the classifier to the second set of features to determine similarity values corresponding to each of the individual labels;
  - determining that the new multimedia data item is one of the negative multimedia data items;
  - based at least in part on determining that the new multimedia data item is one of the negative multimedia data items, comparing the statistics with the similarity values corresponding to each of the individual labels; and
  - based at least in part on comparing the statistics with the similarity values, updating individual model vectors of the plurality of model vectors.

Dependent claims: 10, 11, 12, 13, 14, 15
16. One or more computer storage media encoded with instructions that, when executed by a processor, configure a computer to perform acts comprising:
- accessing a corpus of multimedia data items, the corpus of multimedia data items including positive multimedia data items and negative multimedia data items, wherein:
  - individual positive multimedia data items of the positive multimedia data items are associated with individual labels of a plurality of labels; and
  - the negative multimedia data items are not associated with any label of the plurality of labels;
- extracting a first set of features from the individual positive multimedia data items;
- training a classifier based at least in part on the first set of features, the classifier including a plurality of model vectors each corresponding to one of the individual labels;
- based at least in part on applying the classifier to one or more of the individual positive multimedia data items, collecting statistics corresponding to each of the individual labels;
- extracting a second set of features from a new multimedia data item;
- applying the classifier to the second set of features to determine similarity values corresponding to each of the individual labels;
- determining that the new multimedia data item is one of the negative multimedia data items;
- based at least in part on determining that the new multimedia data item is one of the negative multimedia data items, comparing the statistics with the similarity values corresponding to each of the individual labels; and
- based at least in part on comparing the statistics with the similarity values, updating individual model vectors of the plurality of model vectors.

Dependent claims: 17, 18, 19, 20
Specification