Statistical approach to large-scale image annotation

US 8,150,170 B2
Filed: 05/30/2008
Issued: 04/03/2012
Est. Priority Date: 05/30/2008
Status: Active Grant

Abstract

Statistical approaches to large-scale image annotation are described. Generally, the annotation technique includes compiling visual features and textual information from a number of images, hashing the images' visual features, and clustering the images based on their hash values. An example system builds statistical language models from the clustered images and annotates the image by applying one of the statistical language models.
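
The hashing and clustering steps named in the abstract can be illustrated with a minimal sketch. It assumes each image is already reduced to a short numeric feature vector, and it assumes a simple quantizer (one bit per dimension, set when the value exceeds the collection mean). The function names `binary_hash` and `cluster_by_hash` and the thresholding rule are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict


def binary_hash(features, thresholds):
    """Quantize a feature vector into a bit string: '1' where a value exceeds its threshold."""
    return "".join("1" if f > t else "0" for f, t in zip(features, thresholds))


def cluster_by_hash(image_features):
    """Group image ids whose feature vectors map to the same hash code."""
    count = len(image_features)
    dims = len(next(iter(image_features.values())))
    # Assumed quantizer: the per-dimension mean over the whole collection.
    thresholds = [
        sum(vec[d] for vec in image_features.values()) / count for d in range(dims)
    ]
    clusters = defaultdict(list)
    for image_id, vec in image_features.items():
        clusters[binary_hash(vec, thresholds)].append(image_id)
    return dict(clusters)


# Hypothetical feature vectors for three images.
features = {
    "a.jpg": [0.9, 0.1, 0.8],
    "b.jpg": [0.8, 0.2, 0.9],
    "c.jpg": [0.1, 0.9, 0.2],
}
clusters = cluster_by_hash(features)
```

Grouping by exact hash code avoids pairwise distance comparisons, which is one plausible reason a hash-based scheme suits large collections.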

16 Claims

1. A method of annotating an image comprising:
- extracting and indexing both visual features and textual information from a plurality of images and a two step probabilistic modeling technique comprising identifying words from the textual information as candidate annotations and annotating the image with the candidate annotations that have the highest average conditional probabilities;
  
  hashing the plurality of visual features;
  
  clustering the plurality of images based at least in part on hash values derived from the hashing, the clustering creating clustered images;
  
  building one or more statistical language models based at least in part on the visual features and the textual information of the clustered images; and
  
  annotating the image using one or more of the statistical language models.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. A method of annotating an image as recited in claim 1, wherein the plurality of images are gathered by crawling one or more large-scale image databases.
  - 3. A method of annotating an image as recited in claim 1, wherein hashing the plurality of visual features comprises a vector quantization process in which the visual features are transformed into a binary string.
  - 4. A method of annotating an image as recited in claim 1, wherein the images with the same hash code are grouped into clusters.
  - 5. A method of annotating an image as recited in claim 1, wherein the one or more statistical language models is a unigram model that calculates a probability that a word is associated with the image based at least in part on (1) a visual similarity between the image and the clustered images and (2) a prior probability of the clustered images.
  - 6. A method of annotating an image as recited in claim 1, wherein the one or more statistical language models is a bigram model that calculates an average conditional probability that a second word is associated with the clustered images given a first word already associated with the clustered images.
  - 7. A method of annotating an image as recited in claim 1, further comprising extracting visual information from the plurality of images by using a gray block methodology.
  - 8. A method of annotating an image as recited in claim 7, wherein the gray block methodology comprises:
    - partitioning the image into equal size blocks,
      
      measuring an average luminescence for each block, and
      
      representing the image as a vector.
  - 9. A method of annotating an image as recited in claim 7, further comprising reducing the visual information of the plurality of images by employing a projection matrix.
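
The gray block steps in claim 8 and the projection in claim 9 can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a 2D list of per-pixel luminance values whose sides divide evenly into blocks, and the projection matrix is a hand-written placeholder. The claims do not say how the matrix is obtained, and the names `gray_block_features` and `project` are hypothetical.

```python
def gray_block_features(pixels, blocks_per_side):
    """Partition a luminance grid into equal blocks and return each block's mean as a vector."""
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // blocks_per_side, width // blocks_per_side
    vector = []
    for by in range(blocks_per_side):
        for bx in range(blocks_per_side):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            vector.append(sum(block) / len(block))
    return vector


def project(vector, matrix):
    """Reduce dimensionality by multiplying the vector by a projection matrix (one row per output)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical 4x4 luminance grid split into 2x2 blocks.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [50, 50, 100, 100],
    [50, 50, 100, 100],
]
features = gray_block_features(image, 2)

# Placeholder projection from 4 dimensions down to 2 (assumed values).
projection = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
reduced = project(features, projection)
```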

10. A computer readable storage medium comprising computer executable instructions that when executed by a processor perform a method comprising:
- crawling a large-scale image database to gather a plurality of images;
  
  compiling visual features and textual information from the plurality of images;
  
  extracting visual information from the plurality of images by using a gray block methodology;
  
  reducing the visual information by employing a projection matrix;
  
  hashing the reduced visual information, and clustering the plurality of images based on a hash value;
  
  building one or more statistical language models based on the clustered images; and
  
  annotating a query image using one or more of the statistical language models comprising a bigram model that calculates an average conditional probability that a second word is associated with the clustered images given a first word already associated with the clustered images.
- Dependent Claims (11, 12, 13)
  - 11. A computer readable storage medium as recited in claim 10, wherein hashing the reduced visual information comprises a vector quantization process in which the visual features are transformed into a binary string.
  - 12. A computer readable storage medium as recited in claim 10, wherein the images with the same hash code are grouped into clusters.
  - 13. A computer readable storage medium as recited in claim 10, wherein the query image is previously associated with textual information, and the image is annotated by a two step probabilistic modeling technique comprising:
    - identifying words from the textual information as candidate annotations; and
      
      annotating the image with the candidate annotations that have the highest average conditional probabilities.
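
One way to read the bigram model of claim 10 together with the two-step technique of claim 13 is sketched below. The interpretation is an assumption: the conditional probability of a second word given a first word is estimated from co-occurrence counts over the text attached to the clustered images, and each candidate is scored by averaging that probability over the other candidates. The names `conditional_probability` and `rank_candidates` and the sample data are hypothetical.

```python
def conditional_probability(docs, second, first):
    """Estimate p(second | first) as the share of texts containing `first` that also contain `second`."""
    with_first = [doc for doc in docs if first in doc]
    if not with_first:
        return 0.0
    return sum(1 for doc in with_first if second in doc) / len(with_first)


def rank_candidates(docs, candidates):
    """Score each candidate by its average conditional probability given the other candidates."""
    scores = {}
    for word in candidates:
        others = [w for w in candidates if w != word]
        if not others:
            scores[word] = 0.0
            continue
        total = sum(conditional_probability(docs, word, other) for other in others)
        scores[word] = total / len(others)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical word sets attached to the images of one cluster.
cluster_texts = [
    {"beach", "sea", "sunset"},
    {"beach", "sea"},
    {"sea", "boat"},
    {"beach", "sunset"},
]
# Step 1: candidate words taken from the query image's existing text.
candidates = ["beach", "sea", "boat"]
# Step 2: keep the candidates with the highest average conditional probability.
ranked = rank_candidates(cluster_texts, candidates)
annotations = [word for word, _ in ranked[:2]]
```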

14. A computer readable storage medium comprising:
- a digital image;
  
  a textual annotation corresponding to the digital image; and
  
  executable instructions that when executed by a processor, associate the textual annotation with the digital image by;
  
  compiling visual features and textual information from a plurality of images;
  
  extracting visual information from the plurality of images by using a gray block methodology;
  
  hashing the plurality of visual features, wherein hashing the reduced visual information comprises a vector quantization process in which the visual features are transformed into a binary string;
  
  clustering the plurality of images based on the hash value;
  
  building one or more statistical language models based on the clustered images; and
  
  annotating the image using one or more of the statistical language models comprising a unigram model that calculates a probability that a word is associated with the image based at least in part on (1) a visual similarity between the image and the clustered images and (2) a prior probability of the clustered images.
- Dependent Claims (15, 16)
  - 15. A computer readable storage medium as recited in claim 14, wherein the plurality of images are gathered by crawling one or more large-scale image databases.
  - 16. A computer readable storage medium as recited in claim 14, wherein the annotating comprises a two step probabilistic modeling technique including:
    - identifying words from the textual information as candidate annotations; and
      
      annotating the image with the candidate annotations that have the highest average conditional probabilities.
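
The unigram model of claim 14 combines a visual similarity term with a cluster prior. A minimal sketch of one such combination follows, scoring a word as the sum over clusters of p(word | cluster) × similarity(image, cluster) × prior(cluster). The specific choices here are assumptions made for illustration: similarity is 1 / (1 + Euclidean distance to the cluster centroid), the prior is the cluster's share of all images, and the cluster records are invented.

```python
import math


def similarity(query, centroid):
    """Assumed visual similarity: decreases with Euclidean distance to the cluster centroid."""
    return 1.0 / (1.0 + math.dist(query, centroid))


def unigram_scores(query, clusters):
    """Score words by summing p(word | cluster) * similarity * prior over all clusters."""
    total_images = sum(cluster["size"] for cluster in clusters)
    scores = {}
    for cluster in clusters:
        prior = cluster["size"] / total_images  # assumed prior: share of all images
        weight = similarity(query, cluster["centroid"]) * prior
        word_total = sum(cluster["word_counts"].values())
        for word, count in cluster["word_counts"].items():
            scores[word] = scores.get(word, 0.0) + (count / word_total) * weight
    return scores


# Hypothetical clusters with a centroid, an image count, and word counts.
clusters = [
    {"centroid": [0.0, 0.0], "size": 3, "word_counts": {"sea": 3, "beach": 1}},
    {"centroid": [3.0, 4.0], "size": 1, "word_counts": {"city": 2}},
]
query_features = [0.0, 0.0]
scores = unigram_scores(query_features, clusters)
best_word = max(scores, key=scores.get)
```

The scores are left unnormalized because only their relative order matters when picking annotations.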

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Li, Mingjing; Rui, Xiaoguang
Primary Examiner(s): Wu, Jingge
Application Number: US12/130,943
Publication Number: US 20090297050A1
Time in Patent Office: 1,404 Days
Field of Search: 382/224-231
US Class Current: 382/230
CPC Class Codes:
G06V 20/35   Categorising the entire sce...
G06V 20/70   Labelling scene content, e....
G06V 2201/10   Recognition assisted with m...
