Natural language image tags

US 9,141,335 B2
Filed: 11/21/2012
Issued: 09/22/2015
Est. Priority Date: 09/18/2012
Status: Active Grant

Abstract

Natural language image tags are described. In one or more implementations, at least a portion of an image displayed by a display device is defined based on a gesture. The gesture is identified from one or more touch inputs detected using touchscreen functionality of the display device. Text received in a natural language input is located and used to tag the portion of the image using one or more items of the text received in the natural language input.

20 Claims

1. A method comprising:
- displaying an image by a display device;
- defining at least a portion of the image displayed based on a gesture, the gesture identified from one or more touch inputs detected using touchscreen functionality of the display device;
- receiving a processed natural language input subsequent to displaying the image, the processed natural language input processed from audio data that is based at least on a speech input from a user;
- locating one or more items in text received in the processed natural language input;
- tagging the portion of the image defined by the gesture with the one or more items of the text received in the processed natural language input, the tag effective to enable identification of the portion from an entirety of the image; and
- editing the portion of the image defined by the gesture and the processed natural language input.

Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- 2. A method as described in claim 1, wherein the editing the portion of the image is performed subsequent to receiving a subsequent processed natural language input without repeating performance of the gesture.
- 3. A method as described in claim 1, wherein the gesture is formed from a series of the one or more touch inputs that define at least part of a boundary of the portion of the image.
- 4. A method as described in claim 1, wherein the defining includes identifying a base of the image that is to be subject of further processing by an object identification module to determine a boundary of the portion.
- 5. A method as described in claim 4, wherein the object identification module employs one or more facial recognition algorithms to determine the boundary of the portion.
- 6. A method as described in claim 4, wherein the object identification module employs one or more algorithms to identify landmarks to determine the boundary of the portion.
- 7. A method as described in claim 4, wherein the base is identified using a tap involved in the gesture.
- 8. A method as described in claim 1, wherein the one or more items are identified from the text as proper names.
- 9. A method as described in claim 1, wherein the text is received in the processed natural language input in conjunction with performance of the gesture.
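
The defining, locating and tagging steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Box` type, the function names, the bounding-box reduction of the gesture (in the spirit of claim 3) and the capitalized-word heuristic for proper names (in the spirit of claim 8) are all assumptions made for illustration, and the text is assumed to have already been produced from speech.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical rectangular image portion, in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def portion_from_gesture(touch_points):
    """Reduce a traced gesture to the bounding box of its touch inputs."""
    xs = [x for x, _ in touch_points]
    ys = [y for _, y in touch_points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def locate_items(text):
    """Naive proper-name finder: capitalized words other than the first word."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for i, w in enumerate(words) if i > 0 and w[0].isupper()]


def tag_portion(tags, touch_points, text):
    """Tag the gesture-defined portion with each item located in the text."""
    box = portion_from_gesture(touch_points)
    for item in locate_items(text):
        tags[item] = box
    return tags


# A rough circle traced around a face, followed by the spoken phrase.
tags = tag_portion({}, [(40, 30), (90, 35), (95, 120), (38, 118)], "This is Sara")
print(tags)  # {'Sara': Box(left=38, top=30, right=95, bottom=120)}
```

Storing the tag as a name-to-region mapping is what lets a later input refer to the portion by name alone, without repeating the gesture (compare claim 2).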

10. A method comprising:
- receiving a processed natural language input converted from audio data using a speech-to-text engine, the processed natural language input processed from the audio data, the audio data based on at least a speech input from a user; and
- responsive to a determination that the processed natural language input includes a tag corresponding to a portion of an image, the tag effective to enable identification of the portion from an entirety of the image, and specifies one or more image editing operations;
- identifying the portion of the image that corresponds to the tag; and
- initiating performance of the one or more image editing operations on the portion of the image based on the tag and the processed natural language input.

Dependent Claims (11, 12, 13, 14)
- 11. A method as described in claim 10, wherein the portion of the image is tagged responsive to a gesture identified from one or more touch inputs and another processed natural language input received from the user.
- 12. A method as described in claim 11, wherein a boundary of the portion of the image is defined responsive to execution of an object detection algorithm.
- 13. A method as described in claim 10, wherein the tag is a proper name assigned to the portion of the image.
- 14. A method as described in claim 10, wherein the processed natural language input specifies a plurality of said image editing operations and the initiating is performed for the plurality of said image editing operations.
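
The determination in claim 10 (does the input contain a known tag and specify at least one editing operation?) can be sketched in Python as follows. The `OPERATIONS` vocabulary, the function names and the `apply_operation` callback are hypothetical stand-ins, not anything named in the patent, and the input is assumed to be text already converted from speech.

```python
import re

# Hypothetical vocabulary mapping spoken words to editing operations.
OPERATIONS = {
    "brighten": "brightness+",
    "darken": "brightness-",
    "sharpen": "sharpen",
    "blur": "blur",
}


def interpret(text, tags):
    """Return (tag, portion, operations) when the text names a known tag and
    at least one operation; otherwise return None."""
    words = re.findall(r"[a-z']+", text.lower())
    by_lowercase = {name.lower(): name for name in tags}
    tag = next((by_lowercase[w] for w in words if w in by_lowercase), None)
    operations = [OPERATIONS[w] for w in words if w in OPERATIONS]
    if tag is None or not operations:
        return None
    return tag, tags[tag], operations


def initiate(text, tags, apply_operation):
    """Apply every requested operation to the portion identified by the tag."""
    result = interpret(text, tags)
    if result is None:
        return []
    _tag, portion, operations = result
    return [apply_operation(op, portion) for op in operations]


# Portions are plain (left, top, right, bottom) tuples in this sketch.
tags = {"Sara": (38, 30, 95, 120)}
log = initiate("Brighten and sharpen Sara", tags, lambda op, box: (op, box))
print(log)  # [('brightness+', (38, 30, 95, 120)), ('sharpen', (38, 30, 95, 120))]
```

Because every recognized operation word is collected, a single utterance can trigger several edits on the same tagged portion, which mirrors the plurality of operations in claim 14. An utterance that names no known tag, such as "Sharpen the sky" here, yields no edits.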

15. A system comprising:
- a speech-to-text engine configured to convert audio data captured by one or more audio-capture devices into a processed natural language input comprising text, the processed natural language input processed from the audio data, the audio data based on at least a speech input from a user;
- a gesture module configured to recognize a gesture from one or more touch inputs detected using one or more touch sensors, the gesture involving a portion of an image displayed by a display device, the portion comprising less than an entirety of the image;
- an object identification module configured to identify one or more objects in the image corresponding to the portion including a boundary of the identified one or more objects, respectively; and
- a natural language processing module configured to;
  - identify a name from the processed natural language input;
  - initiate operation of the object identification module to identify at least one said object in the image corresponding to the portion that corresponds to the name; and
  - tag the identified object in the image corresponding to the portion using the name such that a subsequent processed natural language input that includes the name and specifies an editing operation is usable to initiate performance of the editing operation using the identified object corresponding to the portion, the tag effective to enable identification of the portion from the entirety of the image for the editing operation, the editing operation performed on the portion of the image based on the tag and the subsequent processed natural language input.

Dependent Claims (16, 17, 18, 19, 20)
- 16. A system as described in claim 15, wherein the gesture is formed from a series of the one or more touch inputs that define at least part of a boundary of a portion of the image, the portion including the at least one said object.
- 17. A system as described in claim 15, wherein the gesture identifies a base of the image that is to be subject of the operation of the object identification module to identify the at least one said object.
- 18. A system as described in claim 15, wherein the object identification module is configured to employ one or more facial recognition algorithms to determine the boundary.
- 19. A system as described in claim 15, wherein the object identification module is configured to employ one or more algorithms to identify landmarks to determine the boundary.
- 20. A system as described in claim 15, wherein the processed natural language input comprises text received from a user.
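
Claims 4, 7 and 17 describe a gesture (for example a tap) that identifies a base from which an object identification module determines a boundary. The claims name facial recognition and landmark identification as possible algorithms; the Python sketch below deliberately swaps in simple region growing (a flood fill over similar intensities) as a stand-in, purely to show the base-to-boundary flow. The function names, the grayscale grid and the tolerance value are assumptions.

```python
from collections import deque


def neighbors(r, c):
    """The four pixels directly adjacent to (r, c)."""
    return ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))


def grow_region(image, tap, tolerance=15):
    """Collect pixels 4-connected to the tap whose intensity is within
    `tolerance` of the tapped pixel's intensity."""
    rows, cols = len(image), len(image[0])
    base = image[tap[0]][tap[1]]
    seen = {tap}
    queue = deque([tap])
    while queue:
        r, c = queue.popleft()
        for nr, nc in neighbors(r, c):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and abs(image[nr][nc] - base) <= tolerance:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def boundary(region):
    """Pixels of the region with at least one 4-neighbor outside the region."""
    return {p for p in region if any(n not in region for n in neighbors(*p))}


# A bright 3x3 object on a dark background; the user taps its center.
image = [
    [0, 0, 0, 0, 0],
    [0, 200, 205, 198, 0],
    [0, 202, 210, 201, 0],
    [0, 199, 204, 200, 0],
    [0, 0, 0, 0, 0],
]
region = grow_region(image, tap=(2, 2))
edge = boundary(region)
print(len(region), len(edge))  # 9 8
```

The region returned here plays the role of the identified object, and `edge` plays the role of its boundary; a production system would replace `grow_region` with whichever detection algorithm the object identification module actually uses.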

Current Assignee: Adobe Inc.
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Wilensky, Gregg D.; Chang, Walter W.; Dontcheva, Lubomira A.; Laput, Gierad P.; Agarwala, Aseem O.
Primary Examiner(s): Pappas, Claire X
Assistant Examiner(s): ADEDIRAN, ABDUL-SAMAD A
Application Number: US13/683,466
Publication Number: US 20140078076A1
Time in Patent Office: 1,035 Days
Field of Search: 345/173, 345/179, 345/619, 715/233, 715/234, 715/255, 715/721, 463/33, 434/98
US Class Current: 1/1
CPC Class Codes:
- G06F 16/5866   using information manually ...
- G06F 2203/0381   Multimodal input, i.e. inte...
- G06F 3/04845   for image manipulation, e.g...
- G06F 3/04883   for inputting data by handw...
- G06F 3/167   Audio in a user interface, ...
- G06F 40/00   Handling natural language d...
- G06T 11/60   Editing figures and text; C...
