System and method for initiating multi-modal speech recognition using a long-touch gesture
Abstract
A system, method and computer-readable storage devices are disclosed for multi-modal interactions with a system via a long-touch gesture on a touch-sensitive display. A system operating per this disclosure can receive a multi-modal input comprising speech and a touch on a display, wherein the speech comprises a pronoun. When the touch on the display has a duration longer than a threshold duration, the system can identify an object within a threshold distance of the touch, associate the object with the pronoun in the speech, to yield an association, and perform an action based on the speech and the association.
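The flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the threshold values, the pronoun list, and every name (`ScreenObject`, `Touch`, `handle_input`, and so on) are hypothetical, and the speech is assumed to have already been transcribed to text.

```python
import math
from dataclasses import dataclass

# Hypothetical values; the abstract only says "threshold duration" and
# "threshold distance" without giving numbers.
LONG_TOUCH_SECONDS = 0.5
MAX_DISTANCE_PX = 40.0
# Hypothetical set of pronouns that can refer to something on screen.
PRONOUNS = {"this", "that", "it", "here", "there", "these", "those"}


@dataclass(frozen=True)
class ScreenObject:
    name: str
    x: float
    y: float


@dataclass(frozen=True)
class Touch:
    x: float
    y: float
    duration: float  # seconds the finger stayed down


def find_pronoun(speech):
    """Return the first known pronoun in the transcribed speech, or None."""
    for word in speech.lower().split():
        token = word.strip(".,!?")
        if token in PRONOUNS:
            return token
    return None


def nearest_object(touch, objects):
    """Return the closest object within MAX_DISTANCE_PX of the touch, or None."""
    best, best_dist = None, MAX_DISTANCE_PX
    for obj in objects:
        dist = math.hypot(obj.x - touch.x, obj.y - touch.y)
        if dist <= best_dist:
            best, best_dist = obj, dist
    return best


def handle_input(speech, touch, objects):
    """Associate a pronoun with a touched object only for a long touch."""
    if touch.duration <= LONG_TOUCH_SECONDS:
        return None  # not a long touch: no multi-modal handling
    pronoun = find_pronoun(speech)
    obj = nearest_object(touch, objects)
    if pronoun is None or obj is None:
        return None
    # The association that a later step would act on.
    return {"speech": speech, "pronoun": pronoun, "object": obj.name}
```

With two objects on screen, a 0.8 second touch near the first one together with the utterance "How do I get here?" would associate "here" with that object, while the same touch lasting 0.2 seconds would be ignored.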
15 Claims
1. A method comprising:
receiving a multi-modal input comprising speech and a single touch on a display, the single touch being at a single point; and
when the single touch on the display has a duration longer than a threshold duration:
identifying, based at least in part on a pronoun in the speech, a first set of coordinates having a first meaning for a first object;
identifying, based at least in part on the pronoun in the speech, a second set of coordinates having a second meaning for a second object;
associating the first object and the second object with the pronoun in the speech, to yield an association; and
performing an action based on the speech and the association.
Dependent claims: 2, 3, 4, 5, 6, 7, 8

9. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
receiving a multi-modal input comprising speech and a single touch on a display, the single touch being at a single point; and
when the single touch on the display has a duration longer than a threshold duration:
identifying, based at least in part on a pronoun in the speech, a first set of coordinates having a first meaning for a first object;
identifying, based at least in part on the pronoun in the speech, a second set of coordinates having a second meaning for a second object;
associating the first object and the second object with the pronoun in the speech, to yield an association; and
performing an action based on the speech and the association.
Dependent claims: 10, 11, 12, 13, 14

15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
receiving a multi-modal input comprising speech and a single touch on a display, the single touch being at a single point; and
when the single touch on the display has a duration longer than a threshold duration:
identifying, based at least in part on a pronoun in the speech, a first set of coordinates having a first meaning for a first object;
identifying, based at least in part on the pronoun in the speech, a second set of coordinates having a second meaning for a second object;
associating the first object and the second object with the pronoun in the speech, to yield an association; and
performing an action based on the speech and the association.
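
The independent claims differ from the abstract in that a single long touch yields two sets of coordinates, each with its own meaning and its own object, and both objects are associated with the pronoun. One possible reading is sketched below, assuming a map-like display where a touched point can fall inside both a small point-of-interest region and a larger neighborhood region. The data model, the pronoun-to-meaning table, the threshold, and all names are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass

LONG_TOUCH_SECONDS = 0.5  # hypothetical threshold duration

# Hypothetical table: which meanings a given pronoun may refer to.
PRONOUN_MEANINGS = {
    "here": {"point_of_interest", "neighborhood"},
    "this": {"point_of_interest"},
}


@dataclass(frozen=True)
class Interpretation:
    obj: str        # object the coordinates stand for
    meaning: str    # what kind of thing the coordinates denote
    coords: tuple   # bounding box: (x_min, y_min, x_max, y_max)

    def contains(self, x, y):
        x0, y0, x1, y1 = self.coords
        return x0 <= x <= x1 and y0 <= y <= y1


def associate(pronoun, x, y, duration, layers):
    """Associate a pronoun with two objects under one long touch, or None."""
    if duration <= LONG_TOUCH_SECONDS:
        return None  # short touch: the claimed steps do not run
    allowed = PRONOUN_MEANINGS.get(pronoun, set())
    # Keep interpretations whose meaning fits the pronoun and whose
    # coordinates cover the touched point.
    hits = [i for i in layers if i.meaning in allowed and i.contains(x, y)]
    if len(hits) < 2:
        return None  # this sketch requires two distinct objects
    first, second = hits[0], hits[1]
    return {"pronoun": pronoun, "objects": (first.obj, second.obj)}
```

In this sketch the pronoun narrows which meanings are considered, so "here" over a cafe inside a neighborhood would associate both objects, while "this" would match only the cafe and produce no two-object association.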
Specification