Learning concepts for video annotation

US 8,396,286 B1
Filed: 06/24/2010
Issued: 03/12/2013
Est. Priority Date: 06/25/2009
Status: Active Grant

Abstract

A concept learning module trains video classifiers associated with a stored set of concepts derived from textual metadata of a plurality of videos, the training based on features extracted from training videos. Each of the video classifiers can then be applied to a given video to obtain a score indicating whether or not the video is representative of the concept associated with the classifier. The learning process does not require any concepts to be known a priori, nor does it require a training set of videos having training labels manually applied by human experts. Rather, in one embodiment the learning is based solely upon the content of the videos themselves and on whatever metadata was provided along with the video, e.g., on possibly sparse and/or inaccurate textual metadata specified by a user of a video hosting service who submitted the video.
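
The abstract's point that no manually applied training labels are needed can be illustrated with a minimal sketch of weak labeling: a video counts as a positive example for a concept when the concept text appears in its user-supplied metadata, and as a negative example otherwise. The function name `weak_labels`, the whitespace-based matching, and the sample metadata are assumptions made for illustration; the patent text here does not specify how matching is done.

```python
from typing import Dict, List, Tuple


def weak_labels(metadata: Dict[str, str], concept: str) -> Tuple[List[str], List[str]]:
    """Split video ids into positives (concept appears in metadata) and negatives."""
    needle = " ".join(concept.lower().split())
    positives: List[str] = []
    negatives: List[str] = []
    for video_id, text in metadata.items():
        haystack = " ".join(text.lower().split())
        # Pad with spaces so that "cat" does not match inside "category".
        if f" {needle} " in f" {haystack} ":
            positives.append(video_id)
        else:
            negatives.append(video_id)
    return positives, negatives


# Hypothetical metadata for three videos.
metadata = {
    "v1": "Funny cat video compilation",
    "v2": "Category five hurricane footage",
    "v3": "my CAT  playing piano",
}
pos, neg = weak_labels(metadata, "cat")
print(pos, neg)
```

Because such metadata may be sparse or inaccurate, labels produced this way are noisy. The sketch also ignores punctuation, which a fuller tokenizer would need to handle.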

70 Citations

20 Claims

1. A computer-implemented method for learning concepts applicable to videos, the method comprising:
- storing a set of concepts derived from textual metadata of a plurality of videos;
  
  initializing a set of candidate classifiers, each candidate classifier associated with one of the concepts;
  
  extracting features from the plurality of videos, including a set of training features from a training set of the videos and a set of validation features from a validation set of the videos;
  
  learning accurate classifiers for the concepts by iteratively performing the steps of:
  
  training the candidate classifiers based at least in part on the set of training features;
  
  determining which of the trained candidate classifiers accurately classify videos, based at least in part on application of the trained candidate classifiers to the set of validation features;
  
  applying the candidate classifiers determined to be accurate to ones of the features, thereby obtaining a set of scores, and adding the set of scores to the set of training features; and
  
  storing the candidate classifiers determined to be accurate.
  - 2. The computer-implemented method of claim 1, further comprising:
    - identifying, in a first iteration of the training, a trained classifier determined not to be accurate;
      
      retraining the trained classifier in a next iteration of the training, based at least in part on the added set of scores; and
      
      determining that the retrained classifier accurately classifies videos.
  - 3. The computer-implemented method of claim 1, further comprising:
    - identifying, in a first iteration of the training, a trained classifier determined to be accurate;
      
      retraining the trained classifier in a next iteration of the training, based at least in part on the added set of scores; and
      
      determining that the retrained classifier is more accurate than the trained classifier before the retraining.
  - 4. The computer-implemented method of claim 1, wherein the concepts are n-grams consisting of at most n words sequentially ordered within the textual metadata.
  - 5. The computer-implemented method of claim 1, wherein the validation set of features comprises, for each of the concepts, a set of positive training features extracted from videos having the concept within their textual metadata, and a set of negative training features extracted from videos lacking the concept within their textual metadata;
    - and wherein determining which of the trained candidate classifiers accurately classify videos comprises:
      
      obtaining validation scores from the application of the trained candidate classifiers to the set of validation features; and
      
      for a first one of the validation scores produced by applying a first one of the candidate classifiers to a first set of the features extracted from a first one of the videos, wherein the first one of the candidate classifiers corresponds to a first one of the concepts:
      
      determining whether the first video represents the first concept, based at least in part on the first validation score, and determining whether the first set of the features belongs to the positive training features or to the negative training features.
  - 6. The computer-implemented method of claim 1, further comprising:
    - for a first one of the stored classifiers corresponding to a first one of the stored concepts:
      
      identifying, within textual metadata of a video, text corresponding to the first concept;
      
      identifying, within the added set of scores, a score obtained by applying the first one of the stored classifiers to the video; and
      
      responsive to the score indicating that the video does not represent the first concept, modifying the textual metadata.
  - 7. The computer-implemented method of claim 6, wherein modifying the textual metadata comprises removing the text corresponding to the first concept from the textual metadata.
  - 8. The computer-implemented method of claim 1, further comprising:
    - for a first one of the stored classifiers corresponding to a first one of the stored concepts:
      
      identifying, within the added set of scores, a score obtained from applying the first one of the stored classifiers to a first video; and
      
      responsive to the score indicating that the video represents the first concept, adding text corresponding to the first concept to the textual metadata.
  - 9. The computer-implemented method of claim 1, wherein training the candidate classifiers comprises applying an ensemble learning classifier to the training set of features.
  - 10. The computer-implemented method of claim 1, wherein none of the plurality of videos has, within textual metadata of the video, a training label from a predefined set of training concepts, the training label manually applied by a human expert.
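
The iterative loop recited in claim 1 (train, validate, score with the accurate classifiers, append the scores as features, repeat) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `train_centroid` is a simple difference-of-means scorer swapped in for whatever learner is actually used (claim 9 mentions an ensemble learning classifier), the accuracy test is plain validation accuracy with a hypothetical `min_accuracy` threshold, and labels come from concept presence in metadata. Unlike claim 3, this sketch does not retrain classifiers that were already accepted.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = List[float]
Scorer = Callable[[Sequence[float]], float]


def train_centroid(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Stand-in learner (an assumption): project onto the difference of class means."""
    dim = len(pos[0])
    mu_p = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mu_n = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    b = sum(wi * (p + n) / 2 for wi, p, n in zip(w, mu_p, mu_n))
    # zip stops at len(w), so score columns appended later are ignored by this scorer.
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) - b


def accuracy(scorer: Scorer, pos: List[Vector], neg: List[Vector]) -> float:
    """Fraction of validation vectors whose score sign matches their weak label."""
    hits = sum(scorer(v) > 0 for v in pos) + sum(scorer(v) <= 0 for v in neg)
    return hits / (len(pos) + len(neg))


def learn_concepts(
    features: Dict[str, Vector],
    labels: Dict[str, Set[str]],  # concept -> ids of videos whose metadata contains it
    train_ids: List[str],
    val_ids: List[str],
    min_accuracy: float = 0.9,    # hypothetical acceptance threshold
    max_iterations: int = 3,
) -> Tuple[Dict[str, Scorer], Dict[str, Vector]]:
    features = {vid: list(vec) for vid, vec in features.items()}  # work on a copy
    accepted: Dict[str, Scorer] = {}

    def split(ids: List[str], members: Set[str]) -> Tuple[List[Vector], List[Vector]]:
        return ([features[i] for i in ids if i in members],
                [features[i] for i in ids if i not in members])

    for _ in range(max_iterations):
        newly_accurate: Dict[str, Scorer] = {}
        for concept, members in labels.items():
            if concept in accepted:
                continue
            scorer = train_centroid(*split(train_ids, members))
            if accuracy(scorer, *split(val_ids, members)) >= min_accuracy:
                newly_accurate[concept] = scorer
        if not newly_accurate:
            break
        # Apply the accurate classifiers and append their scores as new features.
        for vec in features.values():
            vec.extend([scorer(vec) for scorer in newly_accurate.values()])
        accepted.update(newly_accurate)
    return accepted, features


# Toy data: "cat" is separable on the first feature, "dog" is not learnable here.
features = {
    "t1": [1.0, 0.2], "t2": [0.9, 0.8], "t3": [0.1, 0.3], "t4": [0.0, 0.7],
    "v1": [0.8, 0.5], "v2": [1.0, 0.1], "v3": [0.2, 0.9], "v4": [0.1, 0.4],
}
labels = {"cat": {"t1", "t2", "v1", "v2"}, "dog": {"t1", "t3", "v3"}}
accepted, augmented = learn_concepts(
    features, labels, ["t1", "t2", "t3", "t4"], ["v1", "v2", "v3", "v4"])
print(sorted(accepted), len(augmented["v1"]))
```

In this toy run only the "cat" classifier passes validation, so each feature vector gains one score column before the "dog" classifier is retried on the augmented vectors. The sketch assumes every concept has at least one positive and one negative video in both the training and validation sets.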

11. A non-transitory computer-readable storage medium having executable computer program instructions embodied therein for learning concepts applicable to videos, actions of the computer program instructions comprising:
- storing a set of concepts derived from textual metadata of a plurality of videos;
  
  initializing a set of candidate classifiers, each candidate classifier associated with one of the concepts;
  
  extracting features from the plurality of videos, including a set of training features from a training set of the videos and a set of validation features from a validation set of the videos;
  
  learning accurate classifiers for the concepts by iteratively performing the steps of:
  
  training the candidate classifiers based at least in part on the set of training features;
  
  determining which of the trained candidate classifiers accurately classify videos, based at least in part on application of the trained candidate classifiers to the set of validation features;
  
  applying the candidate classifiers determined to be accurate to ones of the features, thereby obtaining a set of scores, and adding the set of scores to the set of training features; and
  
  storing the candidate classifiers determined to be accurate.
  - 12. The non-transitory computer-readable storage medium of claim 11, the actions of the instructions further comprising:
    - identifying, in a first iteration of the training, a trained classifier determined not to be accurate;
      
      retraining the trained classifier in a next iteration of the training, based at least in part on the added set of scores; and
      
      determining that the retrained classifier accurately classifies videos.
  - 13. The non-transitory computer-readable storage medium of claim 11:
    - wherein the validation set of features comprises, for each of the concepts, a set of positive training features extracted from videos having the concept within their textual metadata, and a set of negative training features extracted from videos lacking the concept within their textual metadata; and
      
      wherein determining which of the trained candidate classifiers accurately classify videos comprises:
      
      obtaining validation scores from the application of the trained candidate classifiers to the set of validation features; and
      
      for a first one of the validation scores produced by applying a first one of the candidate classifiers to a first set of the features extracted from a first one of the videos, wherein the first one of the candidate classifiers corresponds to a first one of the concepts:
      
      determining whether the first video represents the first concept, based at least in part on the first validation score, and determining whether the first set of the features belongs to the positive training features or to the negative training features.
  - 14. The non-transitory computer-readable storage medium of claim 11, further comprising:
    - for a first one of the stored classifiers corresponding to a first one of the stored concepts:
      
      identifying, within the added set of scores, a score obtained from applying the first one of the stored classifiers to a first video; and
      
      responsive to the score indicating that the video represents the first concept, adding text corresponding to the first concept to the textual metadata.
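
The metadata adjustments described in claims 7, 8 and 14 (removing concept text when the score indicates the video does not represent the concept, and adding it when the score indicates that it does) can be sketched as below. Modeling the textual metadata as a list of tags, the function name `revise_tags`, and the zero `threshold` are all assumptions for illustration; the claims only say the score "indicates" whether the video represents the concept.

```python
from typing import List


def revise_tags(tags: List[str], concept: str, score: float,
                threshold: float = 0.0) -> List[str]:
    """Return a corrected copy of a video's tags for one concept.

    threshold is a hypothetical cutoff: scores above it are read as
    "the video represents the concept".
    """
    represents = score > threshold
    if represents and concept not in tags:
        return tags + [concept]                  # add missing concept text
    if not represents and concept in tags:
        return [t for t in tags if t != concept]  # remove unsupported concept text
    return list(tags)                            # metadata already agrees with the score


added = revise_tags(["cat", "funny"], "dog", 0.8)
removed = revise_tags(["cat", "funny"], "cat", -0.4)
unchanged = revise_tags(["cat"], "cat", 0.5)
print(added, removed, unchanged)
```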

15. A computer system for learning concepts applicable to videos, the system comprising:
- a computer processor; and
  
  a computer program executable by the computer processor and performing actions comprising:
  
  storing a set of concepts derived from textual metadata of a plurality of videos;
  
  initializing a set of candidate classifiers, each candidate classifier associated with one of the concepts;
  
  extracting features from the plurality of videos, including a set of training features from a training set of the videos and a set of validation features from a validation set of the videos;
  
  learning accurate classifiers for the concepts by iteratively performing the steps of:
  
  training the candidate classifiers based at least in part on the set of training features;
  
  determining which of the trained candidate classifiers accurately classify videos, based at least in part on application of the trained candidate classifiers to the set of validation features;
  
  applying the candidate classifiers determined to be accurate to ones of the features, thereby obtaining a set of scores, and adding the set of scores to the set of training features; and
  
  storing the candidate classifiers determined to be accurate.
  - 16. The computer system of claim 15, the actions further comprising:
    - identifying, in a first iteration of the training, a trained classifier determined not to be accurate;
      
      retraining the trained classifier in a next iteration of the training, based at least in part on the added set of scores; and
      
      determining that the retrained classifier accurately classifies videos.
  - 17. The computer system of claim 15, wherein the concepts are n-grams consisting of at most n words sequentially ordered within the textual metadata.
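
Claims 4, 17 and 19 describe concepts as n-grams of at most n words taken in sequence from the textual metadata. A minimal sketch of that extraction follows; the function name `ngram_concepts`, the lowercasing, and the whitespace tokenization are assumptions made for illustration rather than details from the patent.

```python
from typing import List


def ngram_concepts(text: str, max_n: int) -> List[str]:
    """List every run of 1..max_n consecutive words, without duplicates."""
    words = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(words) - n + 1):
            grams.append(" ".join(words[start:start + n]))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(grams))


concepts = ngram_concepts("Funny cat video", max_n=2)
print(concepts)
```

In practice the candidate set would likely be filtered, for example by how often each n-gram occurs across videos, but no such filter is stated in the claims shown here.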

18. A computer-implemented method for learning concepts applicable to videos, the method comprising:
- extracting a set of concepts from textual metadata of a plurality of videos;
  
  initializing a set of candidate classifiers, each candidate classifier associated with one of the concepts;
  
  extracting a feature vector from each of the plurality of videos, including a set of training feature vectors from a training set of the videos and a set of validation feature vectors from a validation set of the videos;
  
  learning accurate classifiers for the concepts by iteratively performing the steps of:
  
  training the candidate classifiers based at least in part on the validation feature vectors;
  
  determining which of the trained candidate classifiers accurately classify videos and which of the trained candidate classifiers do not accurately classify videos, based at least in part on application of the trained candidate classifiers to the validation feature vectors;
  
  applying the candidate classifiers determined to be accurate to ones of the feature vectors, thereby obtaining a set of scores for each of the ones of the feature vectors;
  
  for each of the ones of the feature vectors, adding the corresponding set of scores to the feature vector, thereby obtaining an augmented feature vector; and
  
  for one of the candidate classifiers determined not to accurately classify videos:
  
  retraining the candidate classifier in a later iteration based at least in part on ones of the augmented feature vectors, and determining that the retrained candidate classifier accurately classifies videos;
  
  storing the candidate classifiers determined to be accurate in association with their associated concepts; and
  
  storing the augmented feature vectors in association with the videos from which they were originally extracted.
  - 19. The computer-implemented method of claim 18, wherein the concepts are n-grams consisting of at most n words sequentially ordered within the textual metadata.
  - 20. The computer-implemented method of claim 18, wherein:
    - the set of validation feature vectors comprises, for each of the concepts, a set of positive training feature vectors extracted from videos having the concept within their textual metadata, and a set of negative training feature vectors extracted from videos lacking the concept within their textual metadata; and
      
      determining which of the trained candidate classifiers accurately classify videos comprises:
      
      obtaining validation scores from the application of the trained candidate classifiers to the set of validation feature vectors; and
      
      for a first one of the validation scores produced by applying a first one of the candidate classifiers to a first feature vector extracted from a first one of the videos, the first one of the candidate classifiers corresponding to a first one of the concepts:
      
      determining whether the first video represents the first concept, based at least in part on the first validation score, and determining whether the first feature vector belongs to the positive training feature vectors or to the negative training feature vectors.
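
The validation check in claims 5, 13 and 20 compares, for each validation score, the classifier's decision with whether the feature vector came from the positive or the negative set. One way to turn those comparisons into an accept or reject decision is through precision and recall, sketched below. The claims shown here do not name a metric, so the use of precision and recall, the `score_threshold`, and the minimum values of 0.7 are assumptions for illustration only.

```python
from typing import List, Tuple


def precision_recall(pos_scores: List[float], neg_scores: List[float],
                     score_threshold: float = 0.5) -> Tuple[float, float]:
    """Compare thresholded decisions against the positive/negative validation sets."""
    tp = sum(s >= score_threshold for s in pos_scores)  # positives accepted
    fp = sum(s >= score_threshold for s in neg_scores)  # negatives wrongly accepted
    fn = len(pos_scores) - tp                           # positives missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def is_accurate(pos_scores: List[float], neg_scores: List[float],
                min_precision: float = 0.7, min_recall: float = 0.7) -> bool:
    """Accept a candidate classifier only if both metrics clear their minimums."""
    precision, recall = precision_recall(pos_scores, neg_scores)
    return precision >= min_precision and recall >= min_recall


good = is_accurate([0.9, 0.8, 0.6, 0.4], [0.1, 0.2, 0.7, 0.3])
bad = is_accurate([0.9, 0.2, 0.3], [0.8, 0.1])
print(good, bad)
```

In the first call three of four positives and one of four negatives pass the threshold, giving precision and recall of 0.75 each, so the classifier is accepted. In the second call precision is 0.5, so it is rejected.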

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Aradhye, Hrishikesh; Toderici, George; Yagnik, Jay
Primary Examiner(s): Alavi, Amir
Application Number: US12/822,727
Time in Patent Office: 992 Days
Field of Search: 382/156, 382/157, 382/159, 382/170, 382/173, 382/190, 382/224-227, 382/289, 345/420-426, 706/20, 707/1, 707/104.1, 707/E17.001
US Class Current: 382/159
CPC Class Codes:
- G06F 18/217   Validation; Performance eva...
- G06V 20/41   Higher-level, semantic clus...
- G06V 20/70   Labelling scene content, e....
- G06V 30/1916   Validation; Performance eva...
