Method and apparatus for automatically annotating images
Abstract
One embodiment of the present invention provides a system that automatically annotates an image. During operation, the system receives the image. Next, the system extracts image features from the image. The system then identifies other images which have similar image features. The system next obtains text associated with the other images, and identifies intersecting keywords in the obtained text. Finally, the system annotates the image with the intersecting keywords.
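The keyword step in the abstract can be illustrated with a minimal sketch. It assumes that "intersecting keywords" means words that occur in the text associated with every similar image; the source does not define the term more precisely. The function names, the tokenizer, and the stop-word list are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "at", "is", "to"}


def keywords(text):
    """Return the set of lower-cased word tokens in text, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def intersecting_keywords(texts):
    """Return keywords present in the text associated with every similar image.

    Assumption: a strict set intersection stands in for the
    "intersecting keywords" of the abstract.
    """
    sets = [keywords(t) for t in texts]
    return set.intersection(*sets) if sets else set()


# Text that might surround three visually similar images on their web pages.
texts = [
    "Sunset at the Golden Gate Bridge",
    "The Golden Gate Bridge in fog",
    "Walking on the Golden Gate Bridge",
]
print(sorted(intersecting_keywords(texts)))  # ['bridge', 'gate', 'golden']
```

A strict intersection is the simplest reading; a real system might instead keep words shared by most, rather than all, of the similar images.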
38 Claims
1. A method for automatically annotating a video in a computer system, comprising:
receiving a video comprising a plurality of frames;
obtaining images contained in two or more representative frames from the video;
for each of the images, iteratively extracting image features from the image on different spatial scales, wherein the image features comprise visual characteristics associated with tiles of different sizes within the image;
matching the extracted image features to known image features;
identifying other images with similar image features using one or more combinations of the matched image features;
obtaining text associated with the other images, wherein obtaining the text associated with the other images comprises obtaining text that surrounds each image in a web page in which the image is located;
identifying one or more intersecting keywords in the text associated with the other images; and
annotating the image with the intersecting keywords using the computer system;
analyzing the keywords for the images to determine a common set of keywords; and
annotating the video using the common set of keywords.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
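The last two steps of claim 1 (determining a common set of keywords across the frame images, then annotating the video) can be sketched as follows. The claim does not say how "common" is decided, so this sketch assumes a keyword is common when it appears in at least a given fraction of the annotated frames. The name `common_keywords` and the `min_fraction` parameter are hypothetical.

```python
from collections import Counter


def common_keywords(per_frame_keywords, min_fraction=1.0):
    """Return keywords that annotate at least min_fraction of the frames.

    per_frame_keywords: one collection of keywords per representative frame.
    min_fraction: hypothetical threshold; 1.0 means "present in every frame".
    """
    if not per_frame_keywords:
        return set()
    # Count each keyword at most once per frame.
    counts = Counter(kw for kws in per_frame_keywords for kw in set(kws))
    needed = min_fraction * len(per_frame_keywords)
    return {kw for kw, n in counts.items() if n >= needed}


# Keywords already assigned to three representative frames of one video.
frames = [
    {"beach", "surfing", "wave"},
    {"surfing", "wave", "board"},
    {"wave", "surfing", "sunset"},
]
video_annotation = common_keywords(frames)
print(sorted(video_annotation))  # ['surfing', 'wave']
```

Lowering `min_fraction` trades precision for coverage, which may matter when a video contains visually unrelated scenes.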
10. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for automatically annotating a video, the method comprising:
receiving a video comprising a plurality of frames;
obtaining images contained in two or more representative frames from the video;
for each of the images, iteratively extracting image features from the image on different spatial scales, wherein the image features comprise visual characteristics associated with tiles of different sizes within the image;
matching the extracted image features to known image features;
identifying other images with similar image features using one or more combinations of the matched image features;
obtaining text associated with the other images, wherein obtaining the text associated with the other images comprises obtaining text that surrounds each image in a web page in which the image is located;
identifying one or more intersecting keywords in the text associated with the other images; and
annotating the image with the intersecting keywords;
analyzing the keywords for the images to determine a common set of keywords; and
annotating the video using the common set of keywords.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A computer system that automatically annotates an image, comprising:
a processor;
a memory;
an obtaining mechanism configured to obtain images contained in two or more representative frames from a video that comprises a plurality of frames;
wherein the computer system is configured to process each of the images obtained from the representative frames using the following mechanisms:
an extraction mechanism configured to iteratively extract image features from an image on different spatial scales, wherein the image features comprise visual characteristics associated with different sizes within the image;
a matching mechanism configured to match the extracted image features to known image features;
an identification mechanism configured to:
identify other images with similar image features using one or more combinations of the matched image features; and
obtain text associated with the other images, wherein obtaining the text associated with the other images comprises obtaining text that surrounds each image in a web page in which the image is located;
identify one or more intersecting keywords in the text associated with the other images; and
an annotation mechanism configured to annotate the image with the intersecting keywords;
an analysis mechanism configured to analyze the keywords for the images to determine a common set of keywords, wherein the annotation mechanism is configured to annotate the video using the common set of keywords.
Dependent claims: 20, 21, 22
23. A method for automatically annotating a composite visual medium in a computer system, comprising:
annotating two or more visual mediums from within a composite visual medium that comprises a plurality of visual mediums by, for each of the visual mediums:
iteratively extracting features from the visual medium on different spatial scales, wherein the features comprise visual characteristics associated with tiles of different sizes within the visual medium;
matching the extracted features to known features;
identifying other visual media with similar features using one or more combinations of the matched features;
obtaining text associated with the other visual media, wherein obtaining the text associated with the other visual media comprises obtaining text that surrounds each visual media in a web page in which the visual media is located;
identifying one or more intersecting keywords in the text associated with the other visual media; and
annotating the visual medium with the intersecting keywords using the computer system;
analyzing the keywords for the two or more visual mediums to determine a common set of keywords; and
annotating the composite visual medium using the common set of keywords.
Dependent claims: 24, 25, 26, 27, 28, 29
30. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for automatically annotating a composite visual medium, the method comprising:
annotating two or more visual mediums from within a composite visual medium that comprises a plurality of visual mediums by, for each of the visual mediums:
iteratively extracting features from the visual medium on different spatial scales, wherein the features comprise visual characteristics associated with tiles of different sizes in an image within the visual medium;
matching the extracted features to known features;
identifying other visual media with similar features using one or more combinations of the matched image features;
obtaining text associated with the other visual media, wherein obtaining the text associated with the other visual media comprises obtaining text that surrounds each visual media in a web page in which the visual media is located;
identifying one or more intersecting keywords in the text associated with the other visual media; and
annotating the visual medium with the intersecting keywords;
analyzing the keywords for the two or more visual mediums to determine a common set of keywords; and
annotating the composite visual medium using the common set of keywords.
31. A method for automatically annotating a video in a computer system, comprising:
iteratively extracting video features from a received video on different spatial scales, wherein the video features comprise visual characteristics associated with the tiles of different sizes in an image within the video;
matching the extracted video features to known video features;
identifying other videos with similar video features to the extracted video features using one or more combinations of the matched image features;
obtaining text associated with the other videos, wherein obtaining the text associated with the other videos comprises obtaining text that surrounds each video in a web page in which the video is located;
identifying one or more intersecting keywords in the text associated with the other videos; and
annotating the received video with the intersecting keywords using the computer system.
Dependent claims: 32, 33, 34, 35, 36, 37
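The extraction step recited in claim 31 (features on different spatial scales, tied to tiles of different sizes in an image) can be illustrated with a small sketch. The claim does not name a specific visual characteristic, so this example assumes mean intensity per tile as a stand-in feature and a grayscale image stored as a list of rows. The names `tile_means` and `multiscale_features` are hypothetical.

```python
def tile_means(image, tile):
    """Return the mean intensity of each non-overlapping tile x tile block.

    image: 2-D grayscale image as a list of equal-length rows.
    Blocks that would run past the image edge are skipped.
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            block = [
                image[y][x]
                for y in range(top, top + tile)
                for x in range(left, left + tile)
            ]
            features.append(sum(block) / len(block))
    return features


def multiscale_features(image, tile_sizes=(1, 2, 4)):
    """Repeat the tile extraction once per tile size (one pass per scale)."""
    return {tile: tile_means(image, tile) for tile in tile_sizes}


image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
features = multiscale_features(image, tile_sizes=(2, 4))
print(features[2])  # [0.0, 8.0, 4.0, 2.0]
print(features[4])  # [3.5]
```

Small tiles capture local detail while large tiles summarize the overall layout; combinations of both would then feed the matching step.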
38. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for automatically annotating a video, the method comprising:
iteratively extracting video features from a received video on different spatial scales, wherein the video features comprise visual characteristics associated with tiles of different sizes in an image within the video;
matching the extracted video features to known video features;
identifying other videos with similar video features to the extracted video features using one or more combinations of the matched image features;
obtaining text associated with the other videos, wherein obtaining the text associated with the other videos comprises obtaining text that surrounds each video in a web page in which the video is located;
identifying one or more intersecting keywords in the text associated with the other videos; and
annotating the received video with the intersecting keywords.
Specification