System and method for initiating multi-modal speech recognition using a long-touch gesture

US 10,276,158 B2
Filed: 10/31/2014
Issued: 04/30/2019
Est. Priority Date: 10/31/2014
Status: Active Grant

1 Assignment

0 Petitions

Accused Products

Abstract

A system, method and computer-readable storage devices are disclosed for multi-modal interactions with a system via a long-touch gesture on a touch-sensitive display. A system operating per this disclosure can receive a multi-modal input comprising speech and a touch on a display, wherein the speech comprises a pronoun. When the touch on the display has a duration longer than a threshold duration, the system can identify an object within a threshold distance of the touch, associate the object with the pronoun in the speech, to yield an association, and perform an action based on the speech and the association.
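The flow described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the assignee's implementation: the threshold values, the `ScreenObject` type, the small pronoun set and the `handle_multimodal` function are all hypothetical, and "performing an action" is reduced to returning a (verb, object) pair.

```python
import math
from dataclasses import dataclass

# Hypothetical values: the abstract mentions a threshold duration and a
# threshold distance but gives no numbers.
LONG_TOUCH_THRESHOLD_S = 0.5
MAX_DISTANCE_PX = 40.0

# Illustrative subset of pronouns (claim 2 lists many more).
PRONOUNS = {"it", "this", "that", "these", "those", "them", "him", "her"}


@dataclass
class ScreenObject:
    name: str
    x: float
    y: float


def handle_multimodal(words, touch_xy, touch_duration_s, objects):
    """Return a (verb, object name) pair, or None if nothing is resolved."""
    # Only a touch held longer than the threshold counts as a multi-modal command.
    if touch_duration_s <= LONG_TOUCH_THRESHOLD_S:
        return None
    pronoun = next((w for w in words if w.lower() in PRONOUNS), None)
    if pronoun is None:
        return None
    # Keep objects within the threshold distance of the touch point.
    tx, ty = touch_xy
    nearby = [(math.hypot(o.x - tx, o.y - ty), o) for o in objects]
    nearby = [(d, o) for d, o in nearby if d <= MAX_DISTANCE_PX]
    if not nearby:
        return None
    # Associate the pronoun with the closest such object.
    _, target = min(nearby, key=lambda pair: pair[0])
    association = {pronoun: target.name}
    # Stand-in for "perform an action": assume the first word is the verb.
    return (words[0].lower(), association[pronoun])


objs = [ScreenObject("Cafe Roma", 100, 100), ScreenObject("Hotel Vega", 300, 120)]
print(handle_multimodal(["Call", "this"], (110, 95), 0.8, objs))  # ('call', 'Cafe Roma')
```

With the same speech and touch point but a 0.2 s tap, the function returns `None`, so ordinary taps keep their usual meaning.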

Citations

15 Claims

1. A method comprising:
- receiving a multi-modal input comprising speech and a single touch on a display, the single touch being at a single point; and
  
  when the single touch on the display has a duration longer than a threshold duration:
  
  identifying, based at least in part on a pronoun in the speech, a first set of coordinates having a first meaning for a first object;
  
  identifying, based at least in part on the pronoun in the speech, a second set of coordinates having a second meaning for a second object;
  
  associating the first object and the second object with the pronoun in the speech, to yield an association; and
  
  performing an action based on the speech and the association.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1, wherein the pronoun comprises one of I, you, he, she, her, him, they, them, their, my, me, it, we, who, us, what, which, whose, whom, himself, herself, itself, myself, someone, anybody, anyone, ours, this, some, none, whichever, those, that, these, neither, nothing, one, each, everyone, everybody, everything, all, some, and most.
  - 3. The method of claim 1, wherein the pronoun is implied in the speech.
  - 4. The method of claim 1, wherein the threshold duration is based on a context for the single touch on the display.
  - 5. The method of claim 1, wherein the threshold duration is based on a recognition certainty of a command recognized in the speech.
  - 6. The method of claim 1, wherein the speech of the multi-modal input is received simultaneously with initiation of the single touch on the display.
  - 7. The method of claim 1, wherein the speech of the multi-modal input is received after a duration of the single touch on the display is determined to meet a long touch threshold.
  - 8. The method of claim 1, wherein the speech of the multi-modal input is received after a duration of the single touch on the display is determined to meet a press and hold threshold.
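Claims 4 and 5 say the threshold duration may depend on the context of the touch and on the recognition certainty of a command recognized in the speech, without saying how. The sketch below assumes one possible policy, a linear scaling of a per-context base value; the context names, the numbers and the function names are hypothetical.

```python
# Hypothetical per-context base thresholds in seconds (claim 4).
BASE_THRESHOLD_S = {"map": 0.6, "text_field": 0.9}
DEFAULT_THRESHOLD_S = 0.5


def threshold_duration(context: str, recognition_certainty: float) -> float:
    """Illustrative long-touch threshold depending on context and certainty.

    Assumed policy: scale the context's base threshold from 1.5x (certainty 0)
    down to 0.5x (certainty 1), so a confidently recognized command needs a
    shorter press.
    """
    if not 0.0 <= recognition_certainty <= 1.0:
        raise ValueError("recognition_certainty must be in [0, 1]")
    base = BASE_THRESHOLD_S.get(context, DEFAULT_THRESHOLD_S)
    return base * (1.5 - recognition_certainty)


def is_long_touch(duration_s: float, context: str, recognition_certainty: float) -> bool:
    # Claim 1 requires a duration longer than (not equal to) the threshold.
    return duration_s > threshold_duration(context, recognition_certainty)


print(is_long_touch(0.4, "map", 0.95))  # True  (threshold about 0.33 s)
print(is_long_touch(0.4, "map", 0.2))   # False (threshold about 0.78 s)
```

A linear rule is only one choice; a step function or a learned mapping would satisfy the same claim language.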

9. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  
  receiving a multi-modal input comprising speech and a single touch on a display, the single touch being at a single point; and
  
  when the single touch on the display has a duration longer than a threshold duration:
  
  identifying, based at least in part on a pronoun in the speech, a first set of coordinates having a first meaning for a first object;
  
  identifying, based at least in part on the pronoun in the speech, a second set of coordinates having a second meaning for a second object;
  
  associating the first object and the second object with the pronoun in the speech, to yield an association; and
  
  performing an action based on the speech and the association.
- Dependent Claims (10, 11, 12, 13, 14)
  - 10. The system of claim 9, wherein the threshold duration is based on a context for the single touch on the display.
  - 11. The system of claim 9, wherein the threshold duration is based on a recognition certainty of a command recognized in the speech.
  - 12. The system of claim 9, wherein the speech of the multi-modal input is received simultaneously with initiation of the single touch on the display.
  - 13. The system of claim 9, wherein the speech of the multi-modal input is received after a duration of the single touch on the display is determined to meet a long touch threshold.
  - 14. The system of claim 9, wherein the speech of the multi-modal input is received after a duration of the single touch on the display is determined to meet a press and hold threshold.
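Claims 12 to 14 (like claims 6 to 8) differ only in when speech capture starts: at the moment the touch begins, or once the touch has lasted past the long-touch or press-and-hold threshold. The hypothetical controller below shows both orderings; `CaptureMode` and `LongTouchSpeechGate` are illustrative names, and where a real system would open and close an audio stream this sketch only records timestamps.

```python
from enum import Enum
from typing import Optional


class CaptureMode(Enum):
    ON_TOUCH_START = "simultaneous"       # claims 6 and 12
    AFTER_THRESHOLD = "after_long_touch"  # claims 7-8 and 13-14


class LongTouchSpeechGate:
    """Hypothetical controller deciding when speech capture begins."""

    def __init__(self, threshold_s: float, mode: CaptureMode):
        self.threshold_s = threshold_s
        self.mode = mode
        self.touch_start: Optional[float] = None
        self.listening_since: Optional[float] = None

    def touch_down(self, t: float) -> None:
        self.touch_start = t
        self.listening_since = None
        if self.mode is CaptureMode.ON_TOUCH_START:
            self.listening_since = t  # start capturing speech immediately

    def tick(self, t: float) -> None:
        """Call periodically while the finger stays on the display."""
        if self.touch_start is None or self.listening_since is not None:
            return
        if t - self.touch_start > self.threshold_s:
            self.listening_since = t  # threshold passed: start capturing now

    def touch_up(self, t: float) -> bool:
        """End the touch; return True if it qualified as a long touch."""
        if self.touch_start is None:
            raise RuntimeError("touch_up called without touch_down")
        held = t - self.touch_start
        self.touch_start = None
        if held <= self.threshold_s:
            self.listening_since = None  # short tap: discard any early capture
            return False
        return True
```

Starting capture at touch-down avoids clipping the first word but records audio for taps that turn out to be short; waiting for the threshold avoids that at the cost of a short delay before listening.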

15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- receiving a multi-modal input comprising speech and a single touch on a display, the single touch being at a single point; and
  
  when the single touch on the display has a duration longer than a threshold duration:
  
  identifying, based at least in part on a pronoun in the speech, a first set of coordinates having a first meaning for a first object;
  
  identifying, based at least in part on the pronoun in the speech, a second set of coordinates having a second meaning for a second object;
  
  associating the first object and the second object with the pronoun in the speech, to yield an association; and
  
  performing an action based on the speech and the association.

Specification

Current Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Original Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors
Vasilieff, Brant J., Ehlen, Patrick, Johnston, Michael J.
Primary Examiner(s)
Roberts, Shaun

Application Number

US14/529,766
Publication Number

US 20160124706A1
Time in Patent Office

1,642 Days
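The "Time in Patent Office" figure is consistent with a plain calendar-day count from the filing date to the issue date listed at the top of this record. The check below assumes that is how the figure is computed.

```python
from datetime import date

filed = date(2014, 10, 31)   # "Filed" date above
issued = date(2019, 4, 30)   # "Issued" date above

days_pending = (issued - filed).days
print(f"{days_pending:,} Days")  # 1,642 Days
```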
Field of Search

704/270, 704/275
US Class Current
CPC Class Codes

G06F 3/04842   Selection of displayed obje...

G06F 3/0488   using a touch-screen or dig...

G06F 3/167   Audio in a user interface, ...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/228   of application context
