Training of adapted classifiers for video categorization
First Claim
1. A computer implemented method of training video classifiers, the method comprising:
storing a taxonomy of hierarchically-arranged categories;
storing a set of labeled videos, each of the labeled videos having associated textual metadata and being initially labeled as representing one or more of the categories;
storing labels initially associated with a set of text documents distinct from the labeled videos, each stored label corresponding to one of the categories and indicating that the associated text document represents the category;
identifying, for each of the categories, a positive training subset of the text documents that represent the category based on their stored labels, and a negative training subset of the text documents that do not represent the category based on their stored labels;
training a set of text-based classifiers based on the positive training subsets and the negative training subsets, each text-based classifier associated with one of the categories and producing, when applied to text, a score providing a measure of how strongly the text represents the associated category;
identifying, for each of the categories, a positive training subset of the labeled videos that represent the category based on their labels, and a negative training subset of the labeled videos that do not represent the category based on their labels;
for each video of the positive training subsets of the labeled videos and of the negative training subsets of the labeled videos:
applying the text-based classifiers to the associated textual metadata of the video, thereby producing a vector of scores for the video, the scores providing measures of how strongly the textual metadata of the video represents the categories associated with the text-based classifiers;
extracting a content feature vector from video content of frames of the video;
forming a hybrid feature vector comprising the vector of scores and the content feature vector for that video; and
training a set of adapted classifiers based on the hybrid feature vectors of the videos in the positive training subsets of the labeled videos and on the hybrid feature vectors of the videos in the negative training subsets of the labeled videos, each adapted classifier associated with one of the categories and producing, when applied to an unlabeled video, a score providing a measure of how strongly the unlabeled video represents the associated category.
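The text-classifier stage of the claim — training one classifier per category from positive and negative document subsets, then applying all of them to a video's textual metadata to produce a vector of scores — can be sketched as follows. None of this code appears in the patent; the naive-Bayes-style log-odds scorer, the function names, and the toy documents are all invented for illustration.

```python
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def train_text_classifier(positive_docs, negative_docs):
    """Train a minimal per-category text classifier: per-word log-odds
    (with add-one smoothing) between the positive and negative subsets."""
    pos = Counter(w for d in positive_docs for w in tokenize(d))
    neg = Counter(w for d in negative_docs for w in tokenize(d))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {w: math.log((pos[w] + 1) / pos_total) - math.log((neg[w] + 1) / neg_total)
            for w in vocab}

def score_text(weights, text):
    """Score: a measure of how strongly the text represents the category."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))

# Hypothetical labeled text documents: (positive subset, negative subset) per category.
docs = {
    "sports": (["football match highlights", "tennis final score"],
               ["stock market news", "cooking pasta recipe"]),
    "finance": (["stock market news", "quarterly earnings report"],
                ["football match highlights", "cooking pasta recipe"]),
}
classifiers = {cat: train_text_classifier(p, n) for cat, (p, n) in docs.items()}

# Applying every text-based classifier to a video's textual metadata
# yields the claim's vector of scores, one entry per category.
metadata = "amazing football match final score"
score_vector = [score_text(classifiers[cat], metadata) for cat in sorted(classifiers)]
```

With the toy data above, the "sports" entry of the score vector exceeds the "finance" entry for football-related metadata, which is all the adapted-classifier stage later relies on.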
Abstract
A classifier training system trains adapted classifiers for classifying videos based at least in part on scores produced by application of text-based classifiers to textual metadata of the videos. Each classifier corresponds to a particular category, and when applied to a given video indicates whether the video represents the corresponding category. The classifier training system applies the text-based classifiers to textual metadata of the videos to obtain the scores, and also extracts features from content of the videos, combining the scores and the content features for a video into a set of hybrid features. The adapted classifiers are then trained on the hybrid features. The adaptation of the text-based classifiers from the textual domain to the video domain allows the training of accurate video classifiers (the adapted classifiers) without requiring a large training set of authoritatively labeled videos.
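The hybrid-feature stage the abstract describes — concatenating the text-classifier score vector with the content feature vector, then training a per-category adapted classifier on the result — can be sketched as below. The perceptron learner, the vector values, and all names are invented stand-ins; a real system would derive the score and content vectors from trained text classifiers and video frames.

```python
def hybrid_vector(score_vector, content_vector):
    """Form the hybrid feature vector by concatenating the per-category
    text scores with the content features extracted from video frames."""
    return list(score_vector) + list(content_vector)

def train_adapted_classifier(positive, negative, epochs=20, lr=0.1):
    """Train one adapted classifier (a plain perceptron, purely for
    illustration) on hybrid vectors of positive/negative training videos."""
    w, b = [0.0] * len(positive[0]), 0.0
    for _ in range(epochs):
        for x, y in [(v, 1.0) for v in positive] + [(v, -1.0) for v in negative]:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def score(model, x):
    """Score of an unlabeled video: how strongly it represents the category."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical training videos for one category: each hybrid vector combines
# a 2-entry text-score vector with a 2-entry content feature vector.
pos_videos = [hybrid_vector([2.1, -0.7], [0.9, 0.1]), hybrid_vector([1.8, -0.2], [0.8, 0.2])]
neg_videos = [hybrid_vector([-0.5, 1.9], [0.1, 0.9]), hybrid_vector([-0.9, 1.2], [0.2, 0.7])]
model = train_adapted_classifier(pos_videos, neg_videos)

# Applying the adapted classifier to an unlabeled video's hybrid vector.
unlabeled = hybrid_vector([1.5, -0.4], [0.7, 0.3])
```

The key design point the abstract emphasizes is visible here: the adapted classifier sees both modalities at once, so weight can shift between textual and visual evidence during training rather than being fixed by hand.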
18 Claims
1. A computer implemented method of training video classifiers (recited in full above under "First Claim"). Dependent claims: 2, 3, 4, 5, 6.
7. A non-transitory computer-readable storage medium storing executable computer program instructions comprising:
instructions for storing a taxonomy of hierarchically-arranged categories;
instructions for storing a set of labeled videos, each of the labeled videos having associated textual metadata and being initially labeled as representing one or more of the categories;
instructions for storing labels initially associated with a set of text documents distinct from the labeled videos, each stored label corresponding to one of the categories and indicating that the associated text document represents the category;
instructions for identifying, for each of the categories, a positive training subset of the text documents that represent the category based on their stored labels, and a negative training subset of the text documents that do not represent the category based on their stored labels;
instructions for training a set of text-based classifiers based on the positive training subsets and the negative training subsets, each text-based classifier associated with one of the categories and producing, when applied to text, a score providing a measure of how strongly the text represents the associated category;
instructions for identifying, for each of the categories, a positive training subset of the labeled videos that represent the category based on their labels, and a negative training subset of the labeled videos that do not represent the category based on their labels;
instructions for, for each video of the positive training subsets of the labeled videos and of the negative training subsets of the labeled videos:
applying the text-based classifiers to the associated textual metadata of the video, thereby producing a vector of scores for the video, the scores providing measures of how strongly the textual metadata of the video represents the categories associated with the text-based classifiers;
extracting a content feature vector from video content of frames of the video;
forming a hybrid feature vector comprising the vector of scores and the content feature vector for the video; and
instructions for training a set of adapted classifiers based on the hybrid feature vectors of the videos in the positive training subsets of the labeled videos and on the hybrid feature vectors of the videos in the negative training subsets of the labeled videos, each adapted classifier associated with one of the categories and producing, when applied to an unlabeled video, a score providing a measure of how strongly the unlabeled video represents the associated category.
Dependent claims: 8, 9, 10, 11, 12.
13. A computer system comprising:
a computer processor; and
a computer program executable by the computer processor, the program comprising:
instructions for storing a taxonomy of hierarchically-arranged categories;
instructions for storing a set of labeled videos, each of the labeled videos having associated textual metadata and being initially labeled as representing one or more of the categories;
instructions for storing labels initially associated with a set of text documents distinct from the labeled videos, each stored label corresponding to one of the categories and indicating that the associated text document represents the category;
instructions for identifying, for each of the categories, a positive training subset of the text documents that represent the category based on their stored labels, and a negative training subset of the text documents that do not represent the category based on their stored labels;
instructions for training a set of text-based classifiers based on the positive training subsets and the negative training subsets, each text-based classifier associated with one of the categories and producing, when applied to text, a score providing a measure of how strongly the text represents the associated category;
instructions for identifying, for each of the categories, a positive training subset of the labeled videos that represent the category based on their labels, and a negative training subset of the labeled videos that do not represent the category based on their labels;
instructions for, for each video of the positive training subsets of the labeled videos and of the negative training subsets of the labeled videos:
applying the text-based classifiers to the associated textual metadata of the video, thereby producing a vector of scores for the video, the scores providing measures of how strongly the textual metadata of the video represents the categories associated with the text-based classifiers;
extracting a content feature vector from video content of frames of the video;
forming a hybrid feature vector comprising the vector of scores and the content feature vector for the video; and
instructions for training a set of adapted classifiers based on the hybrid feature vectors of the videos in the positive training subsets of the labeled videos and on the hybrid feature vectors of the videos in the negative training subsets of the labeled videos, each adapted classifier associated with one of the categories and producing, when applied to an unlabeled video, a score providing a measure of how strongly the unlabeled video represents the associated category.
Dependent claims: 14, 15, 16, 17, 18.
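The one claimed step not illustrated above is "extracting a content feature vector from video content of frames of the video." The patent does not specify a feature type; as a hypothetical stand-in, the sketch below averages coarse per-channel color histograms over frames (given here as grids of RGB tuples) to produce a fixed-length vector.

```python
def frame_histogram(frame, bins=4):
    """Coarse per-channel color histogram of one frame, normalized so each
    channel's bins sum to 1. `frame` is a grid of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (3 * bins)
    n = 0
    for row in frame:
        for r, g, b in row:
            hist[min(r * bins // 256, bins - 1)] += 1
            hist[bins + min(g * bins // 256, bins - 1)] += 1
            hist[2 * bins + min(b * bins // 256, bins - 1)] += 1
            n += 1
    return [h / n for h in hist]

def content_feature_vector(frames, bins=4):
    """Average the per-frame histograms into one content feature vector
    for the whole video."""
    per_frame = [frame_histogram(f, bins) for f in frames]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]

# Two tiny synthetic 2x2 frames: one mostly red, one mostly blue.
frames = [
    [[(250, 10, 10), (240, 20, 15)], [(230, 5, 5), (245, 12, 8)]],
    [[(10, 10, 250), (20, 15, 240)], [(5, 5, 230), (12, 8, 245)]],
]
features = content_feature_vector(frames)
```

Any extractor with this shape — frames in, fixed-length vector out — slots into the hybrid feature vector the same way.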
Specification