Natural Language Image Tags

US 20140078076A1
Filed: 11/21/2012
Published: 03/20/2014
Est. Priority Date: 09/18/2012
Status: Active Grant

Abstract

Natural language image tags are described. In one or more implementations, at least a portion of an image displayed by a display device is defined based on a gesture. The gesture is identified from one or more touch inputs detected using touchscreen functionality of the display device. Text received in a natural language input is located, and one or more items of that text are used to tag the portion of the image.

20 Claims

1. A method comprising:
- defining at least a portion of an image displayed by a display device based on a gesture, the gesture identified from one or more touch inputs detected using touchscreen functionality of the display device;
- locating text received in a natural language input; and
- tagging the portion of the image using one or more items of the text received in the natural language input.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. A method as described in claim 1, wherein the tagging of the portion of the image is usable to identify the portion in relation to performance of an operation using a subsequent natural language input without repeating performance of the gesture.
  - 3. A method as described in claim 1, wherein the gesture is formed from a series of the one or more touch inputs that define at least part of a boundary of the portion of the image.
  - 4. A method as described in claim 1, wherein the defining includes identifying a base of the image that is to be subject of further processing by an object identification module to determine a boundary of the portion.
  - 5. A method as described in claim 4, wherein the object identification module employs one or more facial recognition algorithms to determine the boundary of the portion.
  - 6. A method as described in claim 4, wherein the object identification module employs one or more algorithms to identify landmarks to determine the boundary of the portion.
  - 7. A method as described in claim 4, wherein the base is identified using a tap involved in the gesture.
  - 8. A method as described in claim 1, wherein the one or more items are identified from the text as proper names.
  - 9. A method as described in claim 1, wherein the text is received in the natural language input in conjunction with performance of the gesture.
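
As an illustration only, the steps of claims 1, 3, and 8 can be sketched in Python. Nothing here comes from the patent itself: the `Region` class, the function names, the bounding-box simplification of the traced boundary, and the capitalized-word heuristic for proper names are all assumptions made for the example, and the input phrase is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical container for a tagged portion of an image."""
    left: int
    top: int
    right: int
    bottom: int
    tags: list = field(default_factory=list)


def region_from_touches(points):
    """Approximate the traced boundary by the bounding box of the touch points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Region(min(xs), min(ys), max(xs), max(ys))


def locate_names(text):
    """Assumed heuristic: capitalized words that do not start the sentence."""
    words = text.replace(",", " ").replace(".", " ").split()
    return [w for i, w in enumerate(words) if i > 0 and w[:1].isupper()]


def tag_region(region, text):
    """Attach the located items of text to the region as tags."""
    region.tags.extend(locate_names(text))
    return region


# Made-up touch inputs and natural language input.
touches = [(120, 80), (200, 75), (210, 160), (115, 170)]
region = tag_region(region_from_touches(touches), "This is Sara")
print(region)
```

A real implementation would likely keep the traced outline or hand the gesture to an object identification module (claims 4 to 7) rather than reduce it to a rectangle, and would use a proper named-entity recognizer instead of the capitalization check.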

10. A method comprising:
- receiving a natural language input converted from audio data using a speech-to-text engine; and
- responsive to a determination that the natural language input includes a tag and specifies one or more image editing operations:
  - identifying at least a portion of an image that corresponds to the tag; and
  - initiating performance of the one or more image editing operations on at least the portion of the image.
- Dependent claims (11, 12, 13, 14):
  - 11. A method as described in claim 10, wherein the portion of the image is tagged responsive to a gesture identified from one or more touch inputs and a natural language input received from a user.
  - 12. A method as described in claim 11, wherein a boundary of the portion of the image is defined responsive to execution of an object detection algorithm.
  - 13. A method as described in claim 10, wherein the tag is a proper name assigned to the portion of the image.
  - 14. A method as described in claim 10, wherein the natural language input specifies a plurality of said image editing operations and the initiating is performed for the plurality of said image editing operations.
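
The determination step of claim 10 can be pictured with a small Python sketch. It assumes the speech-to-text conversion has already produced a string, and it uses a hypothetical fixed vocabulary of operation words (`KNOWN_OPERATIONS`) and a hypothetical dictionary mapping tags to image regions; none of these names or values appear in the patent.

```python
import re

# Hypothetical vocabulary of image editing operations (an assumption).
KNOWN_OPERATIONS = {"brighten", "sharpen", "crop", "blur"}


def plan_edits(text, tagged_regions):
    """Return (operation, region) pairs, or [] if no tag or no operation is found."""
    lowered = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    tags = [t for t in tagged_regions if t.lower() in lowered]
    ops = [w for w in lowered if w in KNOWN_OPERATIONS]
    if not tags or not ops:
        return []  # the determination fails, so nothing is initiated
    return [(op, tagged_regions[tag]) for tag in tags for op in ops]


# Made-up tagged region, stored as (left, top, right, bottom).
regions = {"Sara": (115, 75, 210, 170)}
plan = plan_edits("Brighten and sharpen Sara", regions)
print(plan)
```

Returning every matching operation mirrors claim 14, where a single input specifies a plurality of image editing operations. Actually applying the operations to pixels is left out of the sketch.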

15. A system comprising:
- a speech-to-text engine configured to convert audio data captured by one or more audio-capture devices into a natural language input comprising text;
- a gesture module configured to recognize a gesture from one or more touch inputs detected using one or more touch sensors, the gesture involving an image displayed by a display device;
- an object identification module configured to identify one or more objects in the image including a boundary of the identified one or more objects, respectively; and
- a natural language processing module configured to:
  - identify a name from the natural language input;
  - initiate operation of the object identification module to identify at least one said object in the image that corresponds to the name; and
  - tag the identified object in the image using the name such that a subsequent natural language input that includes the proper name and specifies an operation is usable to initiate performance of the operation using the identified object.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. A system as described in claim 15, wherein the object identification module is configured to operate to identify the at least one said object based at least in part on the gesture.
  - 17. A system as described in claim 16, wherein the gesture is formed from a series of the one or more touch inputs that define at least part of a boundary of a portion of the image, the portion including the at least one said object.
  - 18. A system as described in claim 16, wherein the gesture identifies a base of the image that is to be subject of the operation of the object identification module to identify the at least one said object.
  - 19. A system as described in claim 15, wherein the object identification module is configured to employ one or more facial recognition algorithms to determine the boundary.
  - 20. A system as described in claim 15, wherein the object identification module is configured to employ one or more algorithms to identify landmarks to determine the boundary.
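
One way to picture how the modules of claim 15 might fit together is the Python sketch below. It is a toy arrangement under stated assumptions: the class and method names are invented, the object identification module is replaced by a stub (`fake_detector`) that returns a fixed-size box around a tap, the name heuristic is a simple capitalization check, and the speech-to-text engine is omitted so inputs arrive as plain strings.

```python
from typing import Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]              # (left, top, right, bottom)
Detector = Callable[[Tuple[int, int]], Box]  # stands in for object identification


class NaturalLanguageProcessingModule:
    """Hypothetical module that names objects and resolves later references."""

    def __init__(self, detect: Detector):
        self.detect = detect
        self.tags: Dict[str, Box] = {}

    def identify_name(self, text: str) -> Optional[str]:
        # Assumed heuristic: last capitalized word that is not sentence-initial.
        words = text.rstrip(".!?").split()
        names = [w for w in words[1:] if w[:1].isupper()]
        return names[-1] if names else None

    def tag_from_gesture(self, text: str, tap: Tuple[int, int]) -> Optional[str]:
        """Identify a name, run the detector at the tapped base, store the tag."""
        name = self.identify_name(text)
        if name is not None:
            self.tags[name] = self.detect(tap)
        return name

    def resolve(self, text: str) -> Optional[Box]:
        """Find a previously tagged object named in a subsequent input."""
        words = text.split()
        for name, box in self.tags.items():
            if name in words:
                return box
        return None


def fake_detector(tap):
    # Stub: a fixed 50-pixel margin around the tap instead of real detection.
    x, y = tap
    return (x - 50, y - 50, x + 50, y + 50)


nlp = NaturalLanguageProcessingModule(fake_detector)
nlp.tag_from_gesture("This is Bob", tap=(300, 200))
box = nlp.resolve("Crop Bob")
print(box)
```

Passing the detector in as a callable keeps the sketch agnostic about whether facial recognition (claim 19) or landmark identification (claim 20) determines the boundary.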

Current Assignee: Adobe Inc.
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Wilensky, Gregg D.; Chang, Walter W.; Dontcheva, Lubomira A.; Laput, Gierad P.; Agarwala, Aseem O.

Granted Patent: US 9,141,335 B2
US Class Current: 345/173
CPC Class Codes

G06F 16/5866   using information manually ...

G06F 2203/0381   Multimodal input, i.e. inte...

G06F 3/04845   for image manipulation, e.g...

G06F 3/04883   for inputting data by handw...

G06F 3/167   Audio in a user interface, ...

G06F 40/00   Handling natural language d...

G06T 11/60   Editing figures and text; C...
