Automatic large scale video object recognition

US 8,254,699 B1
Filed: 02/02/2009
Issued: 08/28/2012
Est. Priority Date: 02/02/2009
Status: Active Grant

Abstract

An object recognition system performs a number of rounds of dimensionality reduction and consistency learning on visual content items such as videos and still images, resulting in a set of feature vectors that accurately predict the presence of a visual object represented by a given object name within a visual content item. The feature vectors are stored in association with the object name which they represent and with an indication of the number of rounds of dimensionality reduction and consistency learning that produced them. The feature vectors and the indication can be used for various purposes, such as quickly determining a visual content item containing a visual representation of a given object name.
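
As an illustration of the repeated reduction rounds described in the abstract, the following is a minimal Python sketch, not the patented implementation. PCA computed via SVD stands in for the reduction technique, which this record does not specify. The names `reduce_once` and `reduction_rounds`, the halving factor `shrink`, and the random input are all assumptions made for illustration.

```python
import numpy as np

def reduce_once(vectors: np.ndarray, out_dim: int) -> np.ndarray:
    """Project row vectors onto their top `out_dim` principal directions."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T

def reduction_rounds(vectors: np.ndarray, num_rounds: int, shrink: float = 0.5) -> dict:
    """Run several rounds; each round's output is the next round's input.

    Returns {round number: reduced vectors}, so every set stays associated
    with the number of rounds that produced it.
    """
    sets = {}
    current = vectors
    for round_number in range(1, num_rounds + 1):
        out_dim = max(1, int(current.shape[1] * shrink))  # assumed halving schedule
        current = reduce_once(current, out_dim)
        sets[round_number] = current
    return sets

# Hypothetical input: 200 feature vectors of dimension 64 for one object name.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))
sets = reduction_rounds(features, num_rounds=3)
shapes = {k: v.shape for k, v in sets.items()}
```

Keeping every intermediate set, keyed by round number, is what would let a later step pick the set that best satisfies the similarity criterion and record how many rounds produced it.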

43 Claims

1. A computer implemented method for generating a classification model of visual objects present in visual content items stored in a visual content repository, each visual content item having a textual description, the method comprising:
- for each of a plurality of object names, automatically selecting a plurality of visual content items from the visual content repository, extracting feature vectors from the visual content items, and performing a number of dimensionality reduction rounds on the feature vectors, each round producing reduced feature vectors as input for the next round, thereby producing multiple sets of reduced feature vectors for each object name;
  
  for each object name, performing consistency learning on the sets of reduced feature vectors, until one of the sets of reduced feature vectors for the object name has a minimum measure of similarity to the other feature vectors associated with the object name; and
  
  storing as the classification model for each object name, the set of reduced feature vectors which have the minimum measure of similarity.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. The method of claim 1, wherein the number of dimensionality reductions performed on the feature vectors for an object name in order to reach the minimum measure of similarity varies with respect to different ones of the object names.
  - 3. The method of claim 1, wherein the classification model for an object name includes an indicator of the number of dimensionality reduction rounds performed on its learned feature vectors.
  - 4. The method of claim 1, wherein the plurality of visual content items are automatically selected from the visual content repository based at least in part on a relationship between the object name and the textual descriptions of the visual content items.
  - 5. The method of claim 1, further comprising:
    - receiving a visual content item for which no classification model has yet been stored;
      
      performing a plurality of dimensionality reduction and consistency learning rounds on the received visual content item, each round resulting in a set of feature vectors associated with the visual content item;
      
      identifying object names of the plurality of object names having classification models similar to the feature vectors associated with the received visual content item;
      
      producing probabilities that the received visual content item contains visual representations corresponding to the object names; and
      
      storing the probabilities in a recognition repository in association with their respective object names and with the received visual content item.
  - 6. The method of claim 5, wherein producing probabilities that the received visual content item contains visual representations corresponding to the object names comprises:
    - for each object name of the plurality of identified object names:
      
      identifying the object name's learned feature vectors that correspond to the indicator of the number of dimensionality reductions for the object name;
      
      identifying the received visual content item's feature vectors that correspond to the indicator of the number of dimensionality reductions for the object name; and
      
      comparing the identified feature vectors for the object name to the identified feature vectors for the received visual content item, thereby producing a probability that the received visual content item contains a visual representation corresponding to the object name.
  - 7. The method of claim 5, further comprising determining, for an object name, a plurality of visual content items in the visual content repository having the highest probabilities of containing a visual representation of the object name, the determining based at least in part on the probabilities of the recognition repository.
  - 8. The method of claim 5, further comprising:
    - identifying a plurality of object names having the highest probabilities of having a visual representation within a first visual content item in the visual content repository; and
      
      revising a list of labels within metadata associated with the first visual content item, based at least in part on the identified plurality of object names.
  - 9. The method of claim 1, wherein a classification model is considered to have the minimum measure of similarity to the feature vectors associated with the received visual content item if the classification model's feature vectors have been stored in the same cluster of feature vectors as the feature vectors associated with the received visual content item, according to a feature vector clustering algorithm.
  - 10. The method of claim 1, wherein the set of object names comprises at least 50,000 entries.
  - 11. The method of claim 1, further comprising extracting the plurality of object names from one of a group consisting of a lexical database and a search engine index.
  - 12. The method of claim 1, wherein the textual descriptions of the visual content items are related to the object names by semantic similarity.
  - 13. The method of claim 1, wherein the textual descriptions of the visual content items literally contain the object names.
  - 14. The method of claim 1, wherein performing consistency learning comprises computing a measure of similarity for a feature vector based at least in part on comparisons between the feature vector and other feature vectors, wherein matches between the feature vector and other feature vectors for the same object name increase the score, and matches between the feature vector and feature vectors for different object names decrease the score.
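
The scoring rule recited in claim 14 can be sketched in Python as follows. This is a hedged illustration only: the claims do not define what counts as a "match", so cosine similarity at or above a fixed `threshold` is an assumption, as are the function names `count_matches` and `consistency_score` and the sample vectors.

```python
import numpy as np

def count_matches(vector: np.ndarray, pool, threshold: float) -> int:
    """Count vectors in `pool` whose cosine similarity to `vector` meets the threshold."""
    pool = np.asarray(pool, dtype=float)
    if pool.size == 0:
        return 0
    sims = pool @ vector / (np.linalg.norm(pool, axis=1) * np.linalg.norm(vector))
    return int(np.sum(sims >= threshold))

def consistency_score(vector, same_name, other_names, threshold: float = 0.8) -> int:
    """Matches under the same object name raise the score; matches under other names lower it."""
    vector = np.asarray(vector, dtype=float)
    return (count_matches(vector, same_name, threshold)
            - count_matches(vector, other_names, threshold))

# Hypothetical reduced feature vectors.
candidate = [1.0, 0.0]
same_name_vectors = [[1.0, 0.1], [0.9, 0.0]]    # both close to the candidate
other_name_vectors = [[0.0, 1.0], [1.0, 0.05]]  # only the second is close
score = consistency_score(candidate, same_name_vectors, other_name_vectors)
```

Here the candidate matches two vectors of its own object name and one vector of a different name, so its score is 2 - 1 = 1. A vector that recurs mainly under its own object name scores high, while one that is common across many names scores low.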

15. An object recognition system for generating a classification model for recognizing a visual object, the system comprising:
- an object name repository storing a plurality of object names;
  
  a visual content repository storing a plurality of visual content items;
  
  a recognition repository storing associations of object names with feature vectors and with a number of dimensionality reduction rounds;
  
  an analysis module adapted to:
  
  for each of a plurality of object names from the object name repository, automatically select a plurality of visual content items from the visual content repository, extract feature vectors from the visual content items, and perform a number of dimensionality reduction rounds on the feature vectors, each round producing reduced feature vectors as input for the next round, thereby producing multiple sets of reduced feature vectors for each object name;
  
  for each object name, perform consistency learning on the sets of reduced feature vectors, until one of the sets of reduced feature vectors for the object name has a minimum measure of similarity to the other feature vectors associated with the object name; and
  
  store as the classification model for each object name, the set of reduced feature vectors which have the minimum measure of similarity.
- View Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
- - 16. The system of claim 15, wherein the number of dimensionality reductions performed on the feature vectors for an object name in order to reach the minimum measure of similarity varies with respect to different ones of the object names.
  - 17. The system of claim 15, wherein the classification model for an object name includes an indicator of the number of dimensionality reduction rounds performed on its learned feature vectors.
  - 18. The system of claim 15, wherein the plurality of visual content items are automatically selected from the visual content repository based at least in part on a relationship between the object name and the textual descriptions of the visual content items.
  - 19. The system of claim 15, the analysis module further adapted to:
    - receive a visual content item for which no classification model has yet been stored;
      
      perform a plurality of dimensionality reduction and consistency learning rounds on the received visual content item, each round resulting in a set of feature vectors associated with the visual content item;
      
      identify object names of the plurality of object names having classification models similar to the feature vectors associated with the received visual content item;
      
      produce probabilities that the received visual content item contains visual representations corresponding to the object names; and
      
      store the probabilities in the recognition repository in association with their respective object names and with the received visual content item.
  - 20. The system of claim 19, wherein producing probabilities that the received visual content item contains visual representations corresponding to the object names comprises:
    - for each object name of the plurality of identified object names:
      
      identifying the object name's learned feature vectors that correspond to the indicator of the number of dimensionality reductions for the object name;
      
      identifying the received visual content item's feature vectors that correspond to the indicator of the number of dimensionality reductions for the object name; and
      
      comparing the identified feature vectors for the object name to the identified feature vectors for the received visual content item, thereby producing a probability that the received visual content item contains a visual representation corresponding to the object name.
  - 21. The system of claim 19, further comprising an object request module that determines, for an object name, a plurality of visual content items in the visual content repository having the highest probabilities of containing a visual representation of the object name, the determining based at least in part on the probabilities of the recognition repository.
  - 22. The system of claim 19, the actions of the analysis module further comprising:
    - identifying a plurality of object names having the highest probabilities of having a visual representation within a first visual content item in the visual content repository; and
      
      revising a list of labels within metadata associated with the first visual content item, based at least in part on the identified plurality of object names.
  - 23. The system of claim 15, wherein a classification model is considered to have the minimum measure of similarity to the feature vectors associated with the received visual content item if the classification model's feature vectors have been stored in the same cluster of feature vectors as the feature vectors associated with the received visual content item, according to a feature vector clustering algorithm.
  - 24. The system of claim 15, wherein the object name repository comprises at least 50,000 object names.
  - 25. The system of claim 15, the actions of the analysis module further comprising extracting the plurality of object names from one of a group consisting of a lexical database and a search engine index.
  - 26. The system of claim 15, wherein the textual descriptions of the visual content items are related to the object names by semantic similarity.
  - 27. The system of claim 15, wherein the image textual descriptions of the visual content items literally contain the object names.
  - 28. The system of claim 15, wherein performing consistency learning comprises computing a measure of similarity for a feature vector based at least in part on comparisons between the feature vector and other feature vectors, wherein matches between the feature vector and other feature vectors for the same object name increase the score, and matches between the feature vector and feature vectors for different object names decrease the score.
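
Claims 17 and 20 describe a classification model that carries an indicator of its round count, and a comparison made against the received item's feature vectors from that same round. A minimal Python sketch of that lookup follows. The `ClassificationModel` type, the `recognition_probability` function, and the mapping from cosine similarity to a probability (the mean of each model vector's best match, clipped to [0, 1]) are assumptions for illustration; the claims do not say how the probability is derived.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ClassificationModel:
    object_name: str
    rounds: int          # indicator: number of reduction rounds behind `vectors`
    vectors: np.ndarray  # learned reduced feature vectors, one per row

def _unit_rows(matrix: np.ndarray) -> np.ndarray:
    # Assumes no all-zero rows; a zero row would divide by zero here.
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

def recognition_probability(model: ClassificationModel, item_sets: list) -> float:
    """item_sets[k] holds the received item's vectors after k + 1 rounds."""
    item_vectors = item_sets[model.rounds - 1]  # same round as the model
    sims = _unit_rows(model.vectors) @ _unit_rows(item_vectors).T
    best = sims.max(axis=1)  # best item match for each model vector
    return float(np.clip(best, 0.0, 1.0).mean())

# Hypothetical model learned after 2 rounds, and two received items.
model = ClassificationModel("cat", rounds=2, vectors=np.array([[1.0, 0.0], [0.0, 1.0]]))
item_a = [np.array([[1.0, 0.0, 0.0, 0.0]]), np.array([[2.0, 0.0], [0.0, 3.0]])]
item_b = [np.array([[1.0, 0.0, 0.0, 0.0]]), np.array([[1.0, 0.0]])]
p_a = recognition_probability(model, item_a)
p_b = recognition_probability(model, item_b)
```

Indexing the item's sets by the model's round indicator keeps the two sides in the same reduced space, so their vectors have matching dimensions and can be compared directly.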

29. A non-transitory computer readable storage medium storing a computer program executable by a processor for generating a classification model of visual objects present in visual content items stored in a visual content repository, each visual content item having a textual description, the actions of the computer program comprising:
- for each of a plurality of object names, automatically selecting a plurality of visual content items from the visual content repository, extracting feature vectors from the visual content items, and performing a number of dimensionality reduction rounds on the feature vectors, each round producing reduced feature vectors as input for the next round, thereby producing multiple sets of reduced feature vectors for each object name;
  
  for each object name, performing consistency learning on the sets of reduced feature vectors, until one of the sets of reduced feature vectors for the object name has a minimum measure of similarity to the other feature vectors associated with the object name; and
  
  storing as the classification model for each object name, the set of reduced feature vectors which have the minimum measure of similarity.
- View Dependent Claims (30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42)
- - 30. The non-transitory computer readable storage medium of claim 29, wherein the number of dimensionality reductions performed on the feature vectors for an object name in order to reach the minimum measure of similarity varies with respect to different ones of the object names.
  - 31. The non-transitory computer readable storage medium of claim 29, wherein the classification model for an object name includes an indicator of the number of dimensionality reduction rounds performed on its learned feature vectors.
  - 32. The non-transitory computer readable storage medium of claim 29, wherein the plurality of visual content items are automatically selected from the visual content repository based at least in part on a relationship between the object name and the textual descriptions of the visual content items.
  - 33. The non-transitory computer readable storage medium of claim 29, further comprising:
    - receiving a visual content item for which no classification model has yet been stored;
      
      performing a plurality of dimensionality reduction and consistency learning rounds on the received visual content item, each round resulting in a set of feature vectors associated with the visual content item;
      
      identifying object names of the plurality of object names having classification models similar to the feature vectors associated with the received visual content item;
      
      producing probabilities that the received visual content item contains visual representations corresponding to the object names; and
      
      storing the probabilities in a recognition repository in association with their respective object names and with the received visual content item.
  - 34. The non-transitory computer readable storage medium of claim 33, wherein producing probabilities that the received visual content item contains visual representations corresponding to the object names comprises:
    - for each object name of the plurality of identified object names:
      
      identifying the object name's learned feature vectors that correspond to the indicator of the number of dimensionality reductions for the object name;
      
      identifying the received visual content item's feature vectors that correspond to the indicator of the number of dimensionality reductions for the object name; and
      
      comparing the identified feature vectors for the object name to the identified feature vectors for the received visual content item, thereby producing a probability that the received visual content item contains a visual representation corresponding to the object name.
  - 35. The non-transitory computer readable storage medium of claim 33, further comprising determining, for an object name, a plurality of visual content items in the visual content repository having the highest probabilities of containing a visual representation of the object name, the determining based at least in part on the probabilities of the recognition repository.
  - 36. The non-transitory computer readable storage medium of claim 33, further comprising:
    - identifying a plurality of object names having the highest probabilities of having a visual representation within a first visual content item in the visual content repository; and
      
      revising a list of labels within metadata associated with the first visual content item, based at least in part on the identified plurality of object names.
  - 37. The non-transitory computer readable storage medium of claim 29, wherein a classification model is considered to have the minimum measure of similarity to the feature vectors associated with the received visual content item if the classification model's feature vectors have been stored in the same cluster of feature vectors as the feature vectors associated with the received visual content item, according to a feature vector clustering algorithm.
  - 38. The non-transitory computer readable storage medium of claim 29, wherein the set of object names comprises at least 50,000 entries.
  - 39. The non-transitory computer readable storage medium of claim 29, further comprising extracting the plurality of object names from one of a group consisting of a lexical database and a search engine index.
  - 40. The non-transitory computer readable storage medium of claim 29, wherein the textual descriptions of the visual content items are related to the object names by semantic similarity.
  - 41. The non-transitory computer readable storage medium of claim 29, wherein the image textual descriptions of the visual content items literally contain the object names.
  - 42. The non-transitory computer readable storage medium of claim 29, wherein performing consistency learning comprises computing a measure of similarity for a feature vector based at least in part on comparisons between the feature vector and other feature vectors, wherein matches between the feature vector and other feature vectors for the same object name increase the score, and matches between the feature vector and feature vectors for different object names decrease the score.

43. A computer implemented method of identifying visual content items relevant to a query, the method comprising:
- storing a recognition repository having:
  
  a plurality of object names, and a plurality of associations between an object name, a visual content item, and a probability that the visual content item contains a visual representation corresponding to the object name;
  
  receiving a query comprising an object name; and
  
  identifying a plurality of visual content items having the highest probabilities of containing a visual representation of an object corresponding to the object name, based at least in part on the probabilities of the recognition repository.
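
The query step of claim 43 amounts to a top-k lookup over stored probabilities. The following Python sketch shows one way it might look. The `RecognitionRepository` class, its method names, the in-memory dictionary layout, and the sample item identifiers are all hypothetical; the claim does not prescribe a storage format.

```python
import heapq
from collections import defaultdict

class RecognitionRepository:
    """Associates (object name, visual content item) pairs with a probability."""

    def __init__(self):
        # object name -> {item id: probability that the item shows the object}
        self._by_name = defaultdict(dict)

    def store(self, object_name: str, item_id: str, probability: float) -> None:
        self._by_name[object_name][item_id] = probability

    def top_items(self, object_name: str, k: int) -> list:
        """Return up to k (item id, probability) pairs, highest probability first."""
        entries = self._by_name.get(object_name, {})
        return heapq.nlargest(k, entries.items(), key=lambda pair: pair[1])

# Hypothetical stored associations.
repo = RecognitionRepository()
repo.store("cat", "video_1", 0.9)
repo.store("cat", "video_2", 0.4)
repo.store("cat", "video_3", 0.7)
repo.store("dog", "video_2", 0.8)
results = repo.top_items("cat", k=2)
```

With these sample entries, a query for "cat" with `k=2` returns `video_1` (0.9) followed by `video_3` (0.7). Because the probabilities are computed ahead of time, answering a query does not require analyzing any video at query time.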

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Yagnik, Jay; Zhao, Ming
Primary Examiner(s): CHAWAN, SHEELA C
Application Number: US12/364,390
Time in Patent Office: 1,303 Days
Field of Search: 382/181, 382/190, 382/195, 382/197, 382/224, 382/155, 382/100, 382/128, 382/191, 382/203, 382/173, 382/232, 382/236, 382/226, 382/107, 707/999, 707/E17.028, 706/12, 706/16, 706/19, 706/20, 706/45, 706/15, 375/E7.11, 375/240.14, 375/E7.026, 375/240, 348/699
US Class Current: 382/224
CPC Class Codes:
G06F 18/213   Feature extraction, e.g. by...
G06F 18/22   Matching criteria, e.g. pro...
G06V 10/761   Proximity, similarity or di...
