Training of adapted classifiers for video categorization
First Claim
1. A computer implemented method of training video classifiers, the method comprising:
storing a taxonomy of hierarchically-arranged categories;
storing a set of labeled videos, each of the labeled videos having associated textual metadata and being initially labeled as representing one or more of the categories;
storing labels initially associated with a set of text documents distinct from the labeled videos, each stored label corresponding to one of the categories and indicating that the associated text document represents the category;
identifying, for each of the categories, a positive training subset of the text documents that represent the category based on their stored labels, and a negative training subset of the text documents that do not represent the category based on their stored labels;
training a set of text-based classifiers based on the positive training subsets and the negative training subsets, each text-based classifier associated with one of the categories and producing, when applied to text, a score providing a measure of how strongly the text represents the associated category;
identifying, for each of the categories, a positive training subset of the labeled videos that represent the category based on their labels, and a negative training subset of the labeled videos that do not represent the category based on their labels;
for each video of the positive training subsets of the labeled videos and of the negative training subsets of the labeled videos:
applying the text-based classifiers to the associated textual metadata of the video, thereby producing a vector of scores for the video, the scores providing measures of how strongly the textual metadata of the video represents the categories associated with the text-based classifiers;
extracting a content feature vector from video content of frames of the video;
forming a hybrid feature vector comprising the vector of scores and the content feature vector for that video; and
training a set of adapted classifiers based on the hybrid feature vectors of the videos in the positive training subsets of the labeled videos and on the hybrid feature vectors of the videos in the negative training subsets of the labeled videos, each adapted classifier associated with one of the categories and producing, when applied to an unlabeled video, a score providing a measure of how strongly the unlabeled video represents the associated category.
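The text-classifier stage of the claim — training one classifier per category from positive and negative document subsets, then applying all of them to a video's textual metadata to produce a vector of scores — can be sketched as follows. None of this code appears in the patent; the naive-Bayes-style log-odds scorer, the function names, and the toy documents are all invented for illustration.

```python
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def train_text_classifier(positive_docs, negative_docs):
    """Train a minimal per-category text classifier: per-word log-odds
    (with add-one smoothing) between the positive and negative subsets."""
    pos = Counter(w for d in positive_docs for w in tokenize(d))
    neg = Counter(w for d in negative_docs for w in tokenize(d))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {w: math.log((pos[w] + 1) / pos_total) - math.log((neg[w] + 1) / neg_total)
            for w in vocab}

def score_text(weights, text):
    """Score: a measure of how strongly the text represents the category."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))

# Hypothetical labeled text documents: (positive subset, negative subset) per category.
docs = {
    "sports": (["football match highlights", "tennis final score"],
               ["stock market news", "cooking pasta recipe"]),
    "finance": (["stock market news", "quarterly earnings report"],
                ["football match highlights", "cooking pasta recipe"]),
}
classifiers = {cat: train_text_classifier(p, n) for cat, (p, n) in docs.items()}

# Applying every text-based classifier to a video's textual metadata
# yields the claim's vector of scores, one entry per category.
metadata = "amazing football match final score"
score_vector = [score_text(classifiers[cat], metadata) for cat in sorted(classifiers)]
```

With the toy data above, the "sports" entry of the score vector exceeds the "finance" entry for football-related metadata, which is all the adapted-classifier stage later relies on.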
Abstract
A classifier training system trains adapted classifiers for classifying videos based at least in part on scores produced by application of text-based classifiers to textual metadata of the videos. Each classifier corresponds to a particular category, and when applied to a given video indicates whether the video represents the corresponding category. The classifier training system applies the text-based classifiers to textual metadata of the videos to obtain the scores, and also extracts features from content of the videos, combining the scores and the content features for a video into a set of hybrid features. The adapted classifiers are then trained on the hybrid features. The adaptation of the text-based classifiers from the textual domain to the video domain allows the training of accurate video classifiers (the adapted classifiers) without requiring a large training set of authoritatively labeled videos.
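The hybrid-feature stage the abstract describes — concatenating the text-classifier score vector with the content feature vector, then training a per-category adapted classifier on the result — can be sketched as below. The perceptron learner, the vector values, and all names are invented stand-ins; a real system would derive the score and content vectors from trained text classifiers and video frames.

```python
def hybrid_vector(score_vector, content_vector):
    """Form the hybrid feature vector by concatenating the per-category
    text scores with the content features extracted from video frames."""
    return list(score_vector) + list(content_vector)

def train_adapted_classifier(positive, negative, epochs=20, lr=0.1):
    """Train one adapted classifier (a plain perceptron, purely for
    illustration) on hybrid vectors of positive/negative training videos."""
    w, b = [0.0] * len(positive[0]), 0.0
    for _ in range(epochs):
        for x, y in [(v, 1.0) for v in positive] + [(v, -1.0) for v in negative]:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def score(model, x):
    """Score of an unlabeled video: how strongly it represents the category."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical training videos for one category: each hybrid vector combines
# a 2-entry text-score vector with a 2-entry content feature vector.
pos_videos = [hybrid_vector([2.1, -0.7], [0.9, 0.1]), hybrid_vector([1.8, -0.2], [0.8, 0.2])]
neg_videos = [hybrid_vector([-0.5, 1.9], [0.1, 0.9]), hybrid_vector([-0.9, 1.2], [0.2, 0.7])]
model = train_adapted_classifier(pos_videos, neg_videos)

# Applying the adapted classifier to an unlabeled video's hybrid vector.
unlabeled = hybrid_vector([1.5, -0.4], [0.7, 0.3])
```

The key design point the abstract emphasizes is visible here: the adapted classifier sees both modalities at once, so weight can shift between textual and visual evidence during training rather than being fixed by hand.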
18 Claims
1. A computer implemented method of training video classifiers (recited in full above under "First Claim"). Dependent claims: 2, 3, 4, 5, 6.
7. A non-transitory computer-readable storage medium storing executable computer program instructions comprising:
instructions for storing a taxonomy of hierarchically-arranged categories;
instructions for storing a set of labeled videos, each of the labeled videos having associated textual metadata and being initially labeled as representing one or more of the categories;
instructions for storing labels initially associated with a set of text documents distinct from the labeled videos, each stored label corresponding to one of the categories and indicating that the associated text document represents the category;
instructions for identifying, for each of the categories, a positive training subset of the text documents that represent the category based on their stored labels, and a negative training subset of the text documents that do not represent the category based on their stored labels;
instructions for training a set of text-based classifiers based on the positive training subsets and the negative training subsets, each text-based classifier associated with one of the categories and producing, when applied to text, a score providing a measure of how strongly the text represents the associated category;
instructions for identifying, for each of the categories, a positive training subset of the labeled videos that represent the category based on their labels, and a negative training subset of the labeled videos that do not represent the category based on their labels;
instructions for, for each video of the positive training subsets of the labeled videos and of the negative training subsets of the labeled videos:
applying the text-based classifiers to the associated textual metadata of the video, thereby producing a vector of scores for the video, the scores providing measures of how strongly the textual metadata of the video represents the categories associated with the text-based classifiers;
extracting a content feature vector from video content of frames of the video;
forming a hybrid feature vector comprising the vector of scores and the content feature vector for the video; and
instructions for training a set of adapted classifiers based on the hybrid feature vectors of the videos in the positive training subsets of the labeled videos and on the hybrid feature vectors of the videos in the negative training subsets of the labeled videos, each adapted classifier associated with one of the categories and producing, when applied to an unlabeled video, a score providing a measure of how strongly the unlabeled video represents the associated category.
Dependent claims: 8, 9, 10, 11, 12.
13. A computer system comprising:
a computer processor; and
a computer program executable by the computer processor, the program comprising:
instructions for storing a taxonomy of hierarchically-arranged categories;
instructions for storing a set of labeled videos, each of the labeled videos having associated textual metadata and being initially labeled as representing one or more of the categories;
instructions for storing labels initially associated with a set of text documents distinct from the labeled videos, each stored label corresponding to one of the categories and indicating that the associated text document represents the category;
instructions for identifying, for each of the categories, a positive training subset of the text documents that represent the category based on their stored labels, and a negative training subset of the text documents that do not represent the category based on their stored labels;
instructions for training a set of text-based classifiers based on the positive training subsets and the negative training subsets, each text-based classifier associated with one of the categories and producing, when applied to text, a score providing a measure of how strongly the text represents the associated category;
instructions for identifying, for each of the categories, a positive training subset of the labeled videos that represent the category based on their labels, and a negative training subset of the labeled videos that do not represent the category based on their labels;
instructions for, for each video of the positive training subsets of the labeled videos and of the negative training subsets of the labeled videos:
applying the text-based classifiers to the associated textual metadata of the video, thereby producing a vector of scores for the video, the scores providing measures of how strongly the textual metadata of the video represents the categories associated with the text-based classifiers;
extracting a content feature vector from video content of frames of the video;
forming a hybrid feature vector comprising the vector of scores and the content feature vector for the video; and
instructions for training a set of adapted classifiers based on the hybrid feature vectors of the videos in the positive training subsets of the labeled videos and on the hybrid feature vectors of the videos in the negative training subsets of the labeled videos, each adapted classifier associated with one of the categories and producing, when applied to an unlabeled video, a score providing a measure of how strongly the unlabeled video represents the associated category.
Dependent claims: 14, 15, 16, 17, 18.
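The one claimed step not illustrated above is "extracting a content feature vector from video content of frames of the video." The patent does not specify a feature type; as a hypothetical stand-in, the sketch below averages coarse per-channel color histograms over frames (given here as grids of RGB tuples) to produce a fixed-length vector.

```python
def frame_histogram(frame, bins=4):
    """Coarse per-channel color histogram of one frame, normalized so each
    channel's bins sum to 1. `frame` is a grid of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (3 * bins)
    n = 0
    for row in frame:
        for r, g, b in row:
            hist[min(r * bins // 256, bins - 1)] += 1
            hist[bins + min(g * bins // 256, bins - 1)] += 1
            hist[2 * bins + min(b * bins // 256, bins - 1)] += 1
            n += 1
    return [h / n for h in hist]

def content_feature_vector(frames, bins=4):
    """Average the per-frame histograms into one content feature vector
    for the whole video."""
    per_frame = [frame_histogram(f, bins) for f in frames]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]

# Two tiny synthetic 2x2 frames: one mostly red, one mostly blue.
frames = [
    [[(250, 10, 10), (240, 20, 15)], [(230, 5, 5), (245, 12, 8)]],
    [[(10, 10, 250), (20, 15, 240)], [(5, 5, 230), (12, 8, 245)]],
]
features = content_feature_vector(frames)
```

Any extractor with this shape — frames in, fixed-length vector out — slots into the hybrid feature vector the same way.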
Specification