Method and apparatus for automatically annotating images

US 8,065,313 B2
Filed: 07/24/2006
Issued: 11/22/2011
Est. Priority Date: 07/24/2006
Status: Active Grant

Abstract

One embodiment of the present invention provides a system that automatically annotates an image. During operation, the system receives the image. Next, the system extracts image features from the image. The system then identifies other images which have similar image features. The system next obtains text associated with the other images, and identifies intersecting keywords in the obtained text. Finally, the system annotates the image with the intersecting keywords.
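
As an illustration only, the keyword-intersection step described in the abstract could be sketched as follows. The tokenizer, the stopword list, and the sample texts are assumptions made for this example; the source does not specify how keywords are extracted from the obtained text.

```python
import re

# Hypothetical stopword list; the source does not specify one.
STOPWORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "with"}

def tokenize(text):
    """Lowercase a text and return its set of non-stopword alphabetic words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def intersecting_keywords(texts):
    """Return the words that occur in the text of every similar image."""
    token_sets = [tokenize(t) for t in texts]
    if not token_sets:
        return set()
    return set.intersection(*token_sets)

# Made-up text standing in for what surrounds three similar images.
surrounding_texts = [
    "Sunset over the Golden Gate bridge",
    "The Golden Gate bridge in fog",
    "Walking on the bridge at Golden Gate park",
]
print(sorted(intersecting_keywords(surrounding_texts)))
# ['bridge', 'gate', 'golden']
```

A strict intersection is the simplest reading of "intersecting keywords"; a real system might instead rank words by how many of the texts contain them.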

38 Claims

1. A method for automatically annotating a video in a computer system, comprising:
- receiving a video comprising a plurality of frames;
  
  obtaining images contained in two or more representative frames from the video;
  
  for each of the images, iteratively extracting image features from the image on different spatial scales, wherein the image features comprise visual characteristics associated with tiles of different sizes within the image;
  
  matching the extracted image features to known image features;
  
  identifying other images with similar image features using one or more combinations of the matched image features;
  
  obtaining text associated with the other images, wherein obtaining the text associated with the other images comprises obtaining text that surrounds each image in a web page in which the image is located;
  
  identifying one or more intersecting keywords in the text associated with the other images; and
  
  annotating the image with the intersecting keywords using the computer system;
  
  analyzing the keywords for the images to determine a common set of keywords; and
  
  annotating the video using the common set of keywords.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, wherein iteratively extracting the image features from the image involves:
    - partitioning the image into the tiles of different sizes; and
      
      extracting the image features from the tiles.
  - 3. The method of claim 2, wherein for each given tile size in the different sizes, partitioning the image into the tiles involves partitioning the image into tiles of the given tile size.
  - 4. The method of claim 1, further comprising combining the matched image features to form one or more image-feature combinations for the image.
  - 5. The method of claim 4, wherein identifying the other images with the similar image features involves identifying similar image-feature combinations in the other images.
  - 6. The method of claim 1, wherein iteratively extracting the image features from the image involves one or more of:
    - generating color histograms;
      
      generating orientation histograms;
      
      using a discrete cosine transform (DCT) technique;
      
      using a principal component analysis (PCA) technique; and
      
      using a Gabor wavelet technique.
  - 7. The method of claim 1, wherein the image features are defined in one or more of the following terms:
    - shapes;
      
      colors; and
      
      textures.
  - 8. The method of claim 1, wherein identifying the other images with the similar image features involves searching through images on the Internet.
  - 9. The method of claim 1, wherein identifying the other images with the similar image features involves using probability models.
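
A minimal sketch of the multi-scale extraction in claims 2, 3 and 6, assuming an image is a nested list of (r, g, b) tuples with channel values from 0 to 255. The function names, the tile sizes, and the histogram quantization are hypothetical choices for this example and are not taken from the claims.

```python
def partition_into_tiles(image, tile_size):
    """Split a 2-D grid of pixels into tiles of tile_size x tile_size.

    Edge tiles are smaller when the image size is not a multiple of tile_size.
    """
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            rows = image[top:top + tile_size]
            tiles.append([row[left:left + tile_size] for row in rows])
    return tiles

def color_histogram(tile, bins=2):
    """Count pixels per quantized (r, g, b) bin; channel values are 0-255."""
    hist = [0] * bins ** 3
    for row in tile:
        for r, g, b in row:
            # Map each channel to a bin index in range(bins).
            rq, gq, bq = r * bins // 256, g * bins // 256, b * bins // 256
            hist[(rq * bins + gq) * bins + bq] += 1
    return hist

def extract_multiscale_features(image, tile_sizes=(4, 2)):
    """Iterate over tile sizes and collect one histogram per tile at each size."""
    return {
        size: [color_histogram(t) for t in partition_into_tiles(image, size)]
        for size in tile_sizes
    }

# A 4x4 toy image: left half red, right half blue.
RED, BLUE = (255, 0, 0), (0, 0, 255)
image = [[RED, RED, BLUE, BLUE] for _ in range(4)]
features = extract_multiscale_features(image)
print(len(features[4]), len(features[2]))  # 1 4
```

Each pass over a tile size yields features at one spatial scale: the single 4x4 tile sees both colors, while each 2x2 tile sees only one.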

10. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for automatically annotating a video, the method comprising:
- receiving a video comprising a plurality of frames;
  
  obtaining images contained in two or more representative frames from the video;
  
  for each of the images, iteratively extracting image features from the image on different spatial scales, wherein the image features comprise visual characteristics associated with tiles of different sizes within the image;
  
  matching the extracted image features to known image features;
  
  identifying other images with similar image features using one or more combinations of the matched image features;
  
  obtaining text associated with the other images, wherein obtaining the text associated with the other images comprises obtaining text that surrounds each image in a web page in which the image is located;
  
  identifying one or more intersecting keywords in the text associated with the other images; and
  
  annotating the image with the intersecting keywords;
  
  analyzing the keywords for the images to determine a common set of keywords; and
  
  annotating the video using the common set of keywords.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
- - 11. The non-transitory computer-readable storage medium of claim 10, wherein iteratively extracting the image features from the image involves:
    - partitioning the image into the tiles of different sizes; and
      
      extracting the image features from the tiles.
  - 12. The non-transitory computer-readable storage medium of claim 11, wherein for each given tile size in the different sizes, partitioning the image into the tiles involves partitioning the image into tiles of the given tile size.
  - 13. The non-transitory computer-readable storage medium of claim 10, wherein the method further comprises combining the matched image features to form one or more image-feature combinations for the image.
  - 14. The non-transitory computer-readable storage medium of claim 13, wherein identifying the other images with the similar image features involves identifying similar image-feature combinations in the other images.
  - 15. The non-transitory computer-readable storage medium of claim 10, wherein iteratively extracting the image features from the image involves one or more of:
    - generating color histograms;
      
      generating orientation histograms;
      
      using a discrete cosine transform (DCT) technique;
      
      using a principal component analysis (PCA) technique; and
      
      using a Gabor wavelet technique.
  - 16. The non-transitory computer-readable storage medium of claim 10, wherein the image features are defined in one or more of the following terms:
    - shapes;
      
      colors; and
      
      textures.
  - 17. The non-transitory computer-readable storage medium of claim 10, wherein identifying the other images with the similar image features involves searching through images on the Internet.
  - 18. The non-transitory computer-readable storage medium of claim 10, wherein identifying the other images with the similar image features involves using probability models.

19. A computer system that automatically annotates an image, comprising:
- a processor;
  
  a memory;
  
  an obtaining mechanism configured to obtain images contained in two or more representative frames from a video that comprises a plurality of frames;
  
  wherein the computer system is configured to process each of the images obtained from the representative frames using the following mechanisms:
  
  an extraction mechanism configured to iteratively extract image features from an image on different spatial scales, wherein the image features comprise visual characteristics associated with different sizes within the image;
  
  a matching mechanism configured to match the extracted image features to known image features;
  
  an identification mechanism configured to:
  
  identify other images with similar image features using one or more combinations of the matched image features; and
  
  obtain text associated with the other images, wherein obtaining the text associated with the other images comprises obtaining text that surrounds each image in a web page in which the image is located;
  
  identify one or more intersecting keywords in the text associated with the other images; and
  
  an annotation mechanism configured to annotate the image with the intersecting keywords;
  
  an analysis mechanism configured to analyze the keywords for the images to determine a common set of keywords, wherein the annotation mechanism is configured to annotate the video using the common set of keywords.
- View Dependent Claims (20, 21, 22)
- - 20. The computer system of claim 19, wherein the extraction mechanism is configured to:
    - partition the image into the tiles of different sizes; and
      
      extract the image features from the tiles.
  - 21. The computer system of claim 19, wherein the matching mechanism is configured to combine the matched image features to form one or more image-feature combinations for the image.
  - 22. The computer system of claim 21, wherein the identification mechanism is configured to identify similar image-feature combinations in the other images.
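
The analysis mechanism in claim 19 determines a common set of keywords from the per-frame annotations. One possible reading, sketched here under assumptions, keeps every keyword that annotates at least a given fraction of the representative frames. The `min_fraction` parameter and the sample keywords are hypothetical; the claim does not say how "common" is decided.

```python
from collections import Counter

def common_keywords(frame_keywords, min_fraction=0.5):
    """Return keywords that annotate at least min_fraction of the frames.

    min_fraction is a hypothetical tuning parameter; 1.0 gives a strict
    intersection across all frames.
    """
    if not frame_keywords:
        return set()
    counts = Counter(kw for keywords in frame_keywords for kw in set(keywords))
    needed = min_fraction * len(frame_keywords)
    return {kw for kw, n in counts.items() if n >= needed}

# Made-up keyword sets for three representative frames of one video.
frame_keywords = [
    {"beach", "surfing", "wave"},
    {"beach", "surfing", "sunset"},
    {"beach", "crowd"},
]
print(sorted(common_keywords(frame_keywords)))       # ['beach', 'surfing']
print(sorted(common_keywords(frame_keywords, 1.0)))  # ['beach']
```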

23. A method for automatically annotating a composite visual medium in a computer system, comprising:
- annotating two or more visual mediums from within a composite visual medium that comprises a plurality of visual mediums by, for each of the visual mediums:
  
  iteratively extracting features from the visual medium on different spatial scales, wherein the features comprise visual characteristics associated with tiles of different sizes within the visual medium;
  
  matching the extracted features to known features;
  
  identifying other visual media with similar features using one or more combinations of the matched features;
  
  obtaining text associated with the other visual media, wherein obtaining the text associated with the other visual media comprises obtaining text that surrounds each visual medium in a web page in which the visual medium is located;
  
  identifying one or more intersecting keywords in the text associated with the other visual media; and
  
  annotating the visual medium with the intersecting keywords using the computer system;
  
  analyzing the keywords for the two or more visual mediums to determine a common set of keywords; and
  
  annotating the composite visual medium using the common set of keywords.
- View Dependent Claims (24, 25, 26, 27, 28, 29)
- - 24. The method of claim 23, wherein iteratively extracting the features from the visual medium involves:
    - partitioning an image within the visual medium into tiles of different sizes; and
      
      extracting image features from the tiles.
  - 25. The method of claim 24, wherein for each given tile size in the different sizes, partitioning the image into the tiles involves partitioning the image into the tiles of the given tile size.
  - 26. The method of claim 23, wherein the method further comprises combining the matched features to form one or more feature combinations for the visual medium.
  - 27. The method of claim 26, wherein identifying the other visual media with similar features involves identifying the one or more feature combinations in the other visual media.
  - 28. The method of claim 23, wherein iteratively extracting the features from the visual medium involves one or more of:
    - generating color histograms;
      
      generating orientation histograms;
      
      using a discrete cosine transform (DCT) technique;
      
      using a principal component analysis (PCA) technique; and
      
      using a Gabor wavelet technique.
  - 29. The method of claim 23, wherein the features are defined in one or more of the following terms:
    - shapes;
      
      colors; and
      
      textures.
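
The matching and combining steps of claims 23 and 26 can be illustrated with a nearest-neighbour lookup against a small table of known features. This is a sketch under assumptions: the codebook, its labels, the squared Euclidean distance, and the use of unordered pairs as combinations are all hypothetical and are not stated in the claims.

```python
from itertools import combinations

def nearest_known_feature(vector, codebook):
    """Return the id of the known feature closest to vector (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(codebook, key=lambda fid: sq_dist(vector, codebook[fid]))

def feature_combinations(vectors, codebook, size=2):
    """Match each vector to a known feature, then form unordered id tuples."""
    matched = sorted({nearest_known_feature(v, codebook) for v in vectors})
    return set(combinations(matched, size))

# Hypothetical codebook of known features, keyed by a readable label.
codebook = {
    "sky": (0.1, 0.2, 0.9),
    "grass": (0.1, 0.8, 0.2),
    "sand": (0.9, 0.8, 0.5),
}
# Made-up feature vectors extracted from one visual medium.
extracted = [(0.2, 0.3, 0.8), (0.8, 0.7, 0.4), (0.0, 0.1, 1.0)]
print(feature_combinations(extracted, codebook))  # {('sand', 'sky')}
```

Sorting the matched ids before combining them makes each combination order-independent, so the same pair of features always yields the same tuple.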

30. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for automatically annotating a composite visual medium, the method comprising:
- annotating two or more visual mediums from within a composite visual medium that comprises a plurality of visual mediums by, for each of the visual mediums:
  
  iteratively extracting features from the received visual medium on different spatial scales, wherein the features comprise visual characteristics associated with tiles of different sizes in an image within the visual medium;
  
  matching the extracted features to known features;
  
  identifying other visual media with similar features using one or more combinations of the matched image features;
  
  obtaining text associated with the other visual media, wherein obtaining the text associated with the other visual media comprises obtaining text that surrounds each visual medium in a web page in which the visual medium is located;
  
  identifying one or more intersecting keywords in the text associated with the other visual media; and
  
  annotating the visual medium with the intersecting keywords;
  
  analyzing the keywords for the two or more visual mediums to determine a common set of keywords; and
  
  annotating the composite visual medium using the common set of keywords.
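
For the step of identifying other visual media using combinations of matched features, one plausible data structure is an inverted index from each combination to the media that contain it. The sketch below is an assumption made for illustration: the index layout, the `min_shared` threshold, and the media ids are hypothetical, and the claims do not prescribe any particular lookup method.

```python
from collections import defaultdict

def build_index(media_combinations):
    """Map each feature combination to the ids of media that contain it."""
    index = defaultdict(set)
    for media_id, combos in media_combinations.items():
        for combo in combos:
            index[combo].add(media_id)
    return index

def find_similar(query_combos, index, min_shared=1):
    """Return ids of indexed media sharing at least min_shared combinations."""
    shared = defaultdict(int)
    for combo in query_combos:
        for media_id in index.get(combo, ()):
            shared[media_id] += 1
    return {m for m, n in shared.items() if n >= min_shared}

# Made-up corpus: media id -> feature combinations found in that medium.
corpus = {
    "img_a": {("sand", "sky"), ("sky", "water")},
    "img_b": {("grass", "sky")},
    "img_c": {("sand", "sky")},
}
index = build_index(corpus)
query = {("sand", "sky"), ("sky", "water")}
print(sorted(find_similar(query, index)))     # ['img_a', 'img_c']
print(sorted(find_similar(query, index, 2)))  # ['img_a']
```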

31. A method for automatically annotating a video in a computer system, comprising:
- iteratively extracting video features from a received video on different spatial scales, wherein the video features comprise visual characteristics associated with the tiles of different sizes in an image within the video;
  
  matching the extracted video features to known video features;
  
  identifying other videos with similar video features to the extracted video features using one or more combinations of the matched image features;
  
  obtaining text associated with the other videos, wherein obtaining the text associated with the other videos comprises obtaining text that surrounds each video in a web page in which the video is located;
  
  identifying one or more intersecting keywords in the text associated with the other videos; and
  
  annotating the received video with the intersecting keywords using the computer system.
- View Dependent Claims (32, 33, 34, 35, 36, 37)
- - 32. The method of claim 31, wherein iteratively extracting the video features from the received video involves:
    - partitioning the image within the video into the tiles of different sizes; and
      
      extracting the image features from the tiles.
  - 33. The method of claim 32, wherein for each given tile size in the different sizes, partitioning the image within the video into the tiles involves partitioning the image within the video into the tiles of the given tile size.
  - 34. The method of claim 31, wherein the method further comprises combining the matched video features to form one or more video-feature combinations for the received video.
  - 35. The method of claim 34, wherein identifying the other videos with the similar video features involves identifying similar video-feature combinations in the other videos.
  - 36. The method of claim 31, wherein iteratively extracting the video features from the received video involves one or more of:
    - generating color histograms;
      
      generating orientation histograms;
      
      using a discrete cosine transform (DCT) technique;
      
      using a principal component analysis (PCA) technique; and
      
      using a Gabor wavelet technique.
  - 37. The method of claim 31, wherein the video features are defined in one or more of the following terms:
    - shapes;
      
      colors; and
      
      textures.
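
Claim 36 lists orientation histograms among the extraction techniques. A minimal sketch follows, assuming a tile is a nested list of grayscale intensities and that orientations are derived from central-difference gradients, folded into [0, pi), and weighted by gradient magnitude. These choices and the bin count are assumptions for this example; the claim does not define how the histogram is computed.

```python
import math

def orientation_histogram(gray, bins=4):
    """Histogram of gradient orientations over the interior pixels of a tile.

    Orientations are folded into [0, pi) and weighted by gradient magnitude.
    """
    hist = [0.0] * bins
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal central difference
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no orientation to record
            angle = math.atan2(gy, gx) % math.pi
            hist[min(int(angle / math.pi * bins), bins - 1)] += magnitude
    return hist

# Toy tile with a vertical edge: intensity changes only along x.
tile = [[0, 0, 10, 10] for _ in range(4)]
print(orientation_histogram(tile))  # [40.0, 0.0, 0.0, 0.0]
```

All of the gradient energy falls in the first bin because the intensity changes only horizontally; a tile with a horizontal edge would fill the bin around pi/2 instead.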

38. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for automatically annotating a video, the method comprising:
- iteratively extracting video features from a received video on different spatial scales, wherein the video features comprise visual characteristics associated with tiles of different sizes in an image within the video;
  
  matching the extracted video features to known video features;
  
  identifying other videos with similar video features to the extracted video features using one or more combinations of the matched image features;
  
  obtaining text associated with the other videos, wherein obtaining the text associated with the other videos comprises obtaining text that surrounds each video in a web page in which the video is located;
  
  identifying one or more intersecting keywords in the text associated with the other videos; and
  
  annotating the received video with the intersecting keywords.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Yagnik, Jay N.
Primary Examiner(s): Ali; Mohammad
Assistant Examiner(s): Corbo; Griselle
Application Number: US11/492,485
Publication Number: US 20080021928A1
Time in Patent Office: 1,947 Days
Field of Search: 707/104.1
US Class Current: 707/758
CPC Class Codes:
G06F 40/169   Annotation, e.g. comment da...
G06V 20/10   Terrestrial scenes scenes u...
G06V 20/70   Labelling scene content, e....
