Autocaptioning of images

US 9,317,531 B2
Filed: 10/18/2012
Issued: 04/19/2016
Est. Priority Date: 10/18/2012
Status: Expired due to Fees


3 Assignments


0 Petitions


Abstract

The description relates to sentence autocaptioning of images. One example can include a set of information modules and a set of sentence generation modules. The set of information modules can include individual information modules configured to operate on an image or metadata associated with the image to produce image information. The set of sentence generation modules can include individual sentence generation modules configured to operate on the image information to produce a sentence caption for the image.
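
The abstract's two-stage structure can be illustrated with a minimal Python sketch. It assumes that each information module returns a small dictionary of facts and that each sentence generation module fills a fixed template from those facts. All names (`Image`, `time_stamp_module`, `location_module`, `place_sentence`, `autocaption`, and so on) are hypothetical. The GPS-to-location step is faked by reading a place name directly from the metadata. This is an illustration under those assumptions, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Info = Dict[str, str]

@dataclass
class Image:
    # Hypothetical image record; pixel data is omitted, only metadata is kept.
    metadata: Info = field(default_factory=dict)

# An information module turns an image into a dict of image information.
InfoModule = Callable[[Image], Info]
# A sentence generation module turns image information into a caption, or
# returns None when the information it needs is missing.
SentenceModule = Callable[[Info], Optional[str]]

def time_stamp_module(image: Image) -> Info:
    # Hypothetical: copy the capture date from the metadata, if present.
    date = image.metadata.get("date")
    return {"date": date} if date else {}

def location_module(image: Image) -> Info:
    # Hypothetical stand-in for a GPS-to-location lookup: a place name is
    # assumed to be stored in the metadata already.
    place = image.metadata.get("place")
    return {"place": place} if place else {}

def place_and_date_sentence(info: Info) -> Optional[str]:
    if "place" in info and "date" in info:
        return f"A photo taken in {info['place']} on {info['date']}."
    return None

def place_sentence(info: Info) -> Optional[str]:
    if "place" in info:
        return f"A photo taken in {info['place']}."
    return None

def autocaption(image: Image,
                info_modules: List[InfoModule],
                sentence_modules: List[SentenceModule]) -> List[str]:
    # Stage 1: run every information module and merge the results.
    info: Info = {}
    for module in info_modules:
        info.update(module(image))
    # Stage 2: run every sentence generation module on the merged
    # information and keep only the captions that were produced.
    captions = [module(info) for module in sentence_modules]
    return [c for c in captions if c is not None]

if __name__ == "__main__":
    img = Image({"date": "June 3, 2012", "place": "Seattle"})
    print(autocaption(img,
                      [time_stamp_module, location_module],
                      [place_and_date_sentence, place_sentence]))
    # ['A photo taken in Seattle on June 3, 2012.', 'A photo taken in Seattle.']
```

Keeping the two stages separate means another information module, such as face recognition, can be added without changing the sentence generators, because they only see the merged dictionary.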

46 Citations


20 Claims

1. A system, comprising:
- a set of information modules, individual information modules configured to operate on an image or metadata associated with the image, the set of information modules including:
  - a scene analysis module configured to identify a scenario of the image, the scenario involving a human and a non-human object, and
  - a proxemics module configured to receive the scenario identified by the scene analysis module and utilize the scenario to identify a relative relationship between the human and the non-human object; and
- a set of sentence generation modules, individual sentence generation modules configured to produce a sentence caption for the image that reflects the scenario identified by the scene analysis module and the relative relationship between the human and the non-human object identified by the proxemics module; and
- a processing device that executes computer-executable instructions associated with at least the set of sentence generation modules.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The system of claim 1, wherein the set of information modules further includes a face recognition module, a scene analysis module, a GPS-to-location module, and a time stamp module.
  - 3. The system of claim 1, wherein the scenario identified by the scene analysis module includes locations of the human and the non-human object, and the proxemics module is further configured to:
    - determine orientations of the human and the non-human object based on the locations, the orientations comprising the relative relationship between the human and the non-human object.
  - 4. The system of claim 1, wherein the image is a video frame from a video and wherein the set of sentence generation modules is configured to consider sentence captions generated for other video frames from the video to produce the sentence caption for the image.
  - 5. The system of claim 1, further comprising an information fuser configured to:
    - receive image information from the individual information modules, the image information including the scenario and the relative relationship; and
    - evaluate the image information and provide the evaluated image information to the set of sentence generation modules.
  - 6. The system of claim 1, further comprising an evaluator configured to receive a sentence from each of the individual sentence generation modules and to select the sentence caption for the image from the received sentences based upon context provided by related images.
  - 7. The system of claim 6, wherein the evaluator is further configured to receive a sentence caption selection from a user, and to use a template of the sentence caption selection as a negative parameter to influence selection of subsequent templates.
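
Claims 1 and 3 describe a proxemics module that turns the locations of a human and a non-human object into a relative relationship, which a sentence generation module then expresses in words. The sketch below shows one possible way to do this. It assumes that the scene analysis module reports axis-aligned bounding boxes and that the relationship is a coarse spatial phrase. The `Box` type, the `relative_relationship` and `caption` helpers, and the thresholds are all hypothetical. The patent text reproduced here does not specify this heuristic.

```python
from dataclasses import dataclass

@dataclass
class Box:
    # Hypothetical bounding box in pixels; y grows downward, as in most
    # image coordinate systems.
    x: float
    y: float
    w: float
    h: float

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def bottom(self) -> float:
        return self.y + self.h

def relative_relationship(human: Box, obj: Box) -> str:
    """Heuristic spatial relation of obj with respect to human (assumption)."""
    dx = obj.cx - human.cx
    # The boxes overlap horizontally when their centers are closer than
    # half the sum of their widths.
    overlaps = abs(dx) < (human.w + obj.w) / 2
    # An overlapping object whose base sits lower in the frame than the
    # person's feet is assumed to be closer to the camera.
    if overlaps and obj.bottom > human.bottom:
        return "in front of"
    if overlaps or abs(dx) < 1.5 * human.w:
        return "next to"
    # Left and right are from the viewer's point of view.
    return "to the left of" if dx < 0 else "to the right of"

def caption(subject: str, scenario: str, obj_name: str,
            human: Box, obj: Box) -> str:
    # Combine the scenario (an activity phrase) with the relationship.
    rel = relative_relationship(human, obj)
    return f"{subject} is {scenario} {rel} the {obj_name}."

if __name__ == "__main__":
    person = Box(100, 50, 80, 200)
    car = Box(60, 150, 200, 150)
    print(caption("A man", "standing", "car", person, car))
    # A man is standing in front of the car.
```

A 2D heuristic like this cannot distinguish true depth or facing direction. The orientations recited in claim 3 would need richer output from scene analysis than bounding boxes alone.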

8. A computer-readable storage media having instructions stored thereon that when executed by a computing device cause the computing device to perform acts, comprising:
- obtaining an image comprising image data and associated metadata;
- producing information about the image using the image data and the associated metadata;
- receive a label from a user, the label corresponding to an individual non-human element that is visible in the image;
- automatically generating multiple sentence captions or sentence fragment captions for the image from the information and the label of the corresponding individual non-human element in the image;
- presenting a display of the multiple sentence captions or the sentence fragment captions for the user; and
- utilizing a user selection of an individual sentence caption or sentence fragment caption to automatically generate a subsequent sentence caption for a subsequent image.
- Dependent claims (9, 10, 11, 12, 13):
  - 9. The computer-readable storage media of claim 8, wherein the obtaining comprises capturing the image or where the obtaining comprises obtaining the image from a device that captured the image.
  - 10. The computer-readable storage media of claim 9, wherein the device comprises the computing device.
  - 11. The computer-readable storage media of claim 8, wherein the producing comprises evaluating pixel data of the image, metadata related to the image, and other data, and wherein the other data relates to other elements manually labeled by the user in other images.
  - 12. The computer-readable storage media of claim 8, wherein the image comprises a video frame and wherein the information conveys a temporal relationship of the video frame to other video frames.
  - 13. The computer-readable storage media of claim 8, wherein the device is not the computing device.
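
Claim 8 combines three user interactions: a user-supplied label for a non-human element, a choice among several candidate captions, and reuse of that choice for the next image. The following is a minimal sketch. It assumes that captions come from fixed string templates and that "utilizing a user selection" means ranking the chosen template higher next time. Claim 7 describes a different use of the selection, as a negative parameter, and this sketch does not model that. `TEMPLATES` and `CaptionSuggester` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical caption templates. {label} is the user-supplied label for a
# non-human element; {place} and {date} come from automatically produced
# image information.
TEMPLATES: List[str] = [
    "{label} in {place}",
    "{label} in {place} on {date}",
    "On {date}, we saw {label}.",
]

class CaptionSuggester:
    def __init__(self, templates: List[str]) -> None:
        self.templates = list(templates)
        # Every template starts with the same preference score.
        self.scores: Dict[str, float] = {t: 0.0 for t in self.templates}

    def candidates(self, info: Dict[str, str], label: str) -> List[str]:
        # Order templates by preference; sorted() is stable, so ties keep
        # the original template order.
        ranked = sorted(self.templates, key=lambda t: -self.scores[t])
        return [t.format(label=label, **info) for t in ranked]

    def record_selection(self, info: Dict[str, str], label: str,
                         chosen: str) -> None:
        # Credit whichever template produced the caption the user picked.
        for t in self.templates:
            if t.format(label=label, **info) == chosen:
                self.scores[t] += 1.0

if __name__ == "__main__":
    s = CaptionSuggester(TEMPLATES)
    first = {"place": "Paris", "date": "May 1"}
    options = s.candidates(first, "the Eiffel Tower")
    s.record_selection(first, "the Eiffel Tower", options[2])
    # The template the user chose now comes first for the next image.
    print(s.candidates({"place": "Rome", "date": "May 4"}, "the Colosseum")[0])
    # On May 4, we saw the Colosseum.
```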

14. A computing device, comprising:
- an image sensor configured to capture an image comprising image data;
- a processor configured to associate metadata with the image;
- an information fuser configured to:
  - determine weighted reliabilities of portions of the metadata, the weighted reliabilities being particular to the image, and
  - filter the metadata for the image based on the weighted reliabilities that are particular to the image;
- a set of sentence generation modules configured to generate sentences for the image from at least some of the image data and the filtered metadata;
- an evaluator configured to evaluate the sentences generated by the set of sentence generation modules and to select an individual sentence as a sentence caption for the image; and
- a display configured to present the image and the sentence caption.
- Dependent claims (15, 16, 17, 18, 19, 20):
  - 15. The computing device of claim 14, further comprising a set of information modules configured to provide the metadata to the information fuser, wherein the weighted reliabilities are associated with individual information modules that provide the corresponding portions of the metadata.
  - 16. The computing device of claim 14, wherein the evaluator is configured to select the individual sentence based on sentence styles of the sentences.
  - 17. The computing device of claim 14, wherein the evaluator is configured to select the individual sentence by comparing the sentences to a threshold.
  - 18. The computing device of claim 14, wherein the evaluator is further configured to provide feedback to the set of sentence generation modules regarding the selected individual sentence.
  - 19. The computing device of claim 14, further comprising an individual sentence generation module configured to generate at least one of the sentences based upon user defined preference and user feedback.
  - 20. The computing device of claim 14, wherein the evaluator is configured to select the individual sentence based on previously selected sentence captions.
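
Claims 14, 15, and 17 describe an information fuser that weights and filters metadata for each image, and an evaluator that compares candidate sentences against a threshold. The sketch below shows one possible reading of these claims. It assumes that a weighted reliability is a fixed per-module reliability multiplied by that module's confidence for the particular image, and that a sentence's score is the fraction of fused facts it mentions. The reliability values, the 0.4 and 0.5 thresholds, and every name in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class MetadataItem:
    key: str           # kind of fact, e.g. "place" or "person"
    value: str
    source: str        # information module that produced the fact
    confidence: float  # that module's confidence for this image, 0..1

# Hypothetical general reliability of each information module.
MODULE_RELIABILITY: Dict[str, float] = {
    "gps": 0.9,
    "face_recognition": 0.7,
    "scene_analysis": 0.5,
}

def fuse(items: List[MetadataItem], min_weight: float = 0.4) -> Dict[str, str]:
    """Keep, per key, the value with the highest weighted reliability,
    dropping anything below min_weight."""
    best: Dict[str, Tuple[float, str]] = {}
    for item in items:
        # Multiplying by the per-image confidence makes the weight
        # particular to this image, not just to the module.
        weight = MODULE_RELIABILITY.get(item.source, 0.0) * item.confidence
        if weight < min_weight:
            continue
        if item.key not in best or weight > best[item.key][0]:
            best[item.key] = (weight, item.value)
    return {key: value for key, (_, value) in best.items()}

def generate_sentences(info: Dict[str, str]) -> List[str]:
    # Two hypothetical sentence generation modules, inlined for brevity.
    sentences = []
    if "person" in info and "place" in info:
        sentences.append(f"{info['person']} in {info['place']}.")
    if "place" in info:
        sentences.append(f"A photo taken in {info['place']}.")
    return sentences

def evaluate(sentences: List[str], info: Dict[str, str],
             threshold: float = 0.5) -> Optional[str]:
    """Return the best sentence scoring at least threshold, else None."""
    def score(sentence: str) -> float:
        # Fraction of fused facts that the sentence mentions.
        return sum(v in sentence for v in info.values()) / max(len(info), 1)
    passing = [s for s in sentences if score(s) >= threshold]
    # max() returns the first of several equally scored sentences.
    return max(passing, key=score) if passing else None

if __name__ == "__main__":
    items = [
        MetadataItem("place", "Seattle", "gps", 0.95),
        MetadataItem("place", "Portland", "scene_analysis", 0.6),
        MetadataItem("person", "Alice", "face_recognition", 0.8),
    ]
    info = fuse(items)  # Portland's weight (0.3) falls below 0.4
    print(evaluate(generate_sentences(info), info))
    # Alice in Seattle.
```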


Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors
Baker, Simon; Ramnath, Krishnan
Primary Examiner(s)
ALBERTALLI, BRIAN LOUIS

Application Number

US13/654,419
Publication Number

US 20140114643A1
Time in Patent Office

1,279 Days
US Class Current

1/1
CPC Class Codes

G06F 16/58   Retrieval characterised by ...

G06F 16/78   Retrieval characterised by ...

G06F 40/169   Annotation, e.g. comment da...

G06T 11/60   Editing figures and text; C...

G06T 7/70   Determining position or ori...
