Learning category classifiers for a video corpus
Abstract
A classifier training system learns classifiers for categories by combining data from a category-instance repository comprising relationships between categories and more specific instances of those categories with a set of video classifiers for different concepts. The category-instance repository is derived from the domain of textual documents, such as web pages, and the concept classifiers are derived from the domain of video. Taken together, the category-instance repository and the concept classifiers provide sufficient data for obtaining accurate classifiers for categories that encompass other lower-level concepts, where the categories and their classifiers may not be obtainable solely from the video domain.
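As an illustration of how a category classifier might be assembled from lower-level concept classifiers, the following is a minimal sketch and not the patented implementation. It assumes that each concept classifier returns a score in [0, 1] and that the category score is a strength-weighted average of the scores of the concepts matching the category's instances. The combination rule and the names `make_category_classifier`, `instance_strengths`, and `concept_classifiers` are all hypothetical.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type: a concept classifier maps a video's features to a score in [0, 1].
ConceptClassifier = Callable[[Sequence[float]], float]


def make_category_classifier(
    instance_strengths: Dict[str, float],
    concept_classifiers: Dict[str, ConceptClassifier],
) -> Callable[[Sequence[float]], float]:
    """Build a category classifier from the concept classifiers of its instances.

    Assumption: the category score is the strength-weighted average of the
    concept scores. Instances without a matching concept classifier are ignored.
    """
    usable = {
        name: strength
        for name, strength in instance_strengths.items()
        if name in concept_classifiers and strength > 0
    }
    if not usable:
        raise ValueError("no concept classifier matches any instance of the category")
    total = sum(usable.values())

    def classify(video_features: Sequence[float]) -> float:
        weighted = sum(
            strength * concept_classifiers[name](video_features)
            for name, strength in usable.items()
        )
        return weighted / total

    return classify


# Toy example: "curling" has no concept classifier, so it does not contribute.
sports = make_category_classifier(
    {"tennis": 3.0, "golf": 1.0, "curling": 2.0},
    {"tennis": lambda v: 0.8, "golf": lambda v: 0.4},
)
score = sports([0.0])
```

In this toy setup the text-derived strengths supply the weights and the video-derived classifiers supply the scores, which mirrors the two data sources the abstract describes.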
19 Claims
1. A computer-implemented method for determining category classifiers applicable to videos of a digital video repository, the method comprising:
- accessing a category-instance repository comprising relationships between categories and instances of categories, the category-instance repository derived from a corpus of documents comprising textual portions, the derivation comprising computing strengths for relationships between categories and instances based at least in part on frequencies of co-occurrence of the categories and instances over the corpus of documents;
- accessing a set of video concept classifiers derived from the videos and associated with concepts derived from textual metadata of the videos of the digital video repository;
- computing consistency scores for a plurality of the categories based at least in part on scores obtained from video concept classifiers associated with concepts corresponding to the instances of the plurality of categories;
- selectively removing categories of the category-instance repository based at least in part on whether the computed consistency scores indicate a threshold level of inconsistency; and
- determining, for each category of a plurality of the categories not removed, a category classifier based at least in part on the video concept classifiers of concepts associated with the category, the determined category classifier when applied to a video producing a score indicating whether the video represents the category for which the category classifier was determined.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
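The consistency-based removal step can be sketched as follows. The claim does not specify how a consistency score is computed, so this example assumes one simple possibility: for each sampled video, measure the spread (population standard deviation) of the scores produced by the category's concept classifiers, and define consistency as one minus the average spread. The function names, the 0.8 threshold, and the choice to score categories with insufficient evidence as 0.0 are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def consistency_score(concept_scores_per_video: List[List[float]]) -> float:
    """Assumed measure: 1 minus the mean per-video spread of concept scores.

    Each inner list holds the scores (in [0, 1]) that the category's concept
    classifiers produced for one video. Videos with fewer than two scores
    carry no information about agreement and are skipped.
    """
    spreads = [pstdev(scores) for scores in concept_scores_per_video if len(scores) > 1]
    if not spreads:
        return 0.0  # assumption: no evidence of consistency counts as inconsistent
    return 1.0 - mean(spreads)


def remove_inconsistent(
    scores_by_category: Dict[str, List[List[float]]],
    min_consistency: float = 0.8,  # hypothetical threshold
) -> Dict[str, float]:
    """Keep only categories whose consistency score meets the threshold."""
    kept = {}
    for category, per_video in scores_by_category.items():
        score = consistency_score(per_video)
        if score >= min_consistency:
            kept[category] = score
    return kept


# Toy data: the classifiers for "sports" agree on each video; those for "things" do not.
observed = {
    "sports": [[0.9, 0.9], [0.8, 0.8]],
    "things": [[0.0, 1.0], [1.0, 0.0]],
}
kept = remove_inconsistent(observed)
```

Under this assumed measure, a vague category whose instances look nothing alike in video would yield widely scattered concept scores and be dropped before any category classifier is determined for it.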
9. A non-transitory computer-readable storage medium having executable computer program instructions embodied therein for determining category classifiers applicable to digital media items of a digital media repository, actions of the computer program instructions comprising:
- accessing a category-instance repository comprising relationships between categories and instances of categories, the category-instance repository derived from a corpus of documents comprising textual portions, the derivation comprising computing strengths for relationships between categories and instances based at least in part on frequencies of co-occurrence of the categories and instances over the corpus of documents;
- accessing a set of media item concept classifiers derived from the media items and associated with concepts derived from textual metadata of the digital media items in the digital media repository;
- computing consistency scores for a plurality of the categories based at least in part on scores obtained from media item concept classifiers associated with concepts corresponding to the instances of the plurality of categories;
- selectively removing the categories of the category-instance repository based at least in part on whether the computed consistency scores indicate a threshold level of inconsistency; and
- determining, for each category of a plurality of the categories not removed, a category classifier based at least in part on the media item concept classifiers of concepts associated with the category, the determined category classifier when applied to a media item producing a score indicating whether the media item represents the category for which the category classifier was determined.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A computer system for determining category classifiers applicable to videos of a digital video repository, the system comprising:
- a computer processor; and
- a computer program executable by the computer processor and performing actions comprising:
  - creating a category-instance repository comprising relationships between categories and instances of categories, the creating comprising:
    - applying textual patterns over a corpus of documents comprising textual portions, and
    - computing strengths for category-instance relationships based at least in part on frequencies of co-occurrence of the categories and instances;
  - training a set of video concept classifiers on the videos and on textual metadata associated with the videos, each of a plurality of the classifiers corresponding to a concept derived from the textual metadata;
  - removing instances of categories, responsive to the instances not corresponding to any of the concepts;
  - filtering the categories of the category-instance repository based on the concepts associated with the video concept classifiers to remove categories that are not likely to be accurately recognized in videos;
  - removing inconsistent categories by:
    - identifying, as video concept classifiers associated with a category, video concept classifiers of concepts with labels corresponding to labels of the instances of the category;
    - applying the video concept classifiers associated with the category to a video, thereby obtaining concept scores;
    - computing a consistency score for the category based at least in part on the concept scores; and
    - removing the category responsive to the computed consistency score indicating some threshold level of inconsistency; and
  - determining, for each of a plurality of the categories, a category classifier based at least in part on the video concept classifiers of concepts associated with the category.
Dependent claims: 17
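The repository-creation step (applying textual patterns, then computing co-occurrence-based strengths) can be sketched as follows. The claim does not name specific patterns or a strength formula, so this example makes two assumptions: it uses a single hypothetical pattern of the form "<category> such as <instance>" restricted to single-word terms, and it defines the strength of a relationship as the share of a category's pattern matches that name a given instance.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Hypothetical textual pattern: "<category> such as <instance>", single-word terms only.
PATTERN = re.compile(r"\b(\w+) such as (\w+)", re.IGNORECASE)


def relationship_strengths(documents: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Count category-instance co-occurrences and normalize them per category.

    Assumption: strength = (matches naming this instance) / (all matches for the category).
    """
    counts = defaultdict(Counter)
    for text in documents:
        # findall returns one (category, instance) tuple per pattern match.
        for category, instance in PATTERN.findall(text):
            counts[category.lower()][instance.lower()] += 1

    strengths = {}
    for category, instances in counts.items():
        total = sum(instances.values())
        strengths[category] = {name: n / total for name, n in instances.items()}
    return strengths


corpus = [
    "Sports such as tennis are popular.",
    "I like sports such as golf.",
    "Many sports such as tennis need a racket.",
]
strengths = relationship_strengths(corpus)
```

A real system would presumably use many patterns and multi-word terms; the sketch only shows how pattern matches turn into frequency-based strengths.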
18. A computer-implemented method comprising:
- accessing a category-instance repository comprising relationships between categories and instances of categories, the category-instance repository derived from a corpus of documents comprising textual portions, the derivation comprising computing strengths for relationships between categories and instances based at least in part on frequencies of co-occurrence of the categories and instances over the corpus of documents;
- accessing a set of media item concept classifiers derived from media items and associated with concepts derived from textual metadata of the media items;
- computing consistency scores for a plurality of the categories based at least in part on scores obtained from media item concept classifiers associated with concepts corresponding to the instances of the plurality of categories;
- selectively removing the categories of the category-instance repository based at least in part on whether the computed consistency scores indicate a threshold level of inconsistency; and
- determining, for each category of a plurality of the categories not removed, a category classifier based at least in part on the media item concept classifiers associated with concepts that are associated with the category, the determined category classifier when applied to a media item producing a score indicating whether the media item represents the category for which the category classifier was determined.
Dependent claims: 19
Specification