Multimodal input system
First Claim
1. In a computing environment, a method performed at least in part on at least one processor, the method comprising:
receiving sets of input data corresponding to a plurality of input modalities, the received sets of input data including a first set of input data and a second set of input data, the first set of input data being associated with a first input modality from the plurality of input modalities, the second set of input data being associated with a second input modality from the plurality of input modalities, the first input modality being a speech or text input modality and the second input modality being a gesture input modality;
selecting the first set of input data and the second set of input data;
identifying, within a dictionary, speech or text input for the first set of input data and a gesture for the second set of input data to determine a meaning of a combination of the first and second set of input data; and
providing output data for input by a program, the output data corresponding to the meaning of the first and second set of input data.
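The steps recited in the claim (receiving, selecting, identifying within a dictionary, providing output) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `InputData`, `COMBINED_DICTIONARY`, and `interpret`, the modality labels, and the dictionary entries are all hypothetical, and the selection step here simply takes the first speech-or-text entry and the first gesture entry it finds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InputData:
    """One set of input data tagged with its modality (hypothetical type)."""
    modality: str  # "speech", "text", or "gesture" (assumed labels)
    value: str     # recognized words or a gesture name


# Hypothetical dictionary: (speech/text input, gesture) -> combined meaning.
COMBINED_DICTIONARY = {
    ("delete", "point_at_item"): "DELETE_POINTED_ITEM",
    ("delete", "sweep_hand"): "DELETE_ALL_ITEMS",
}


def interpret(received: List[InputData]) -> Optional[str]:
    """Return output data for a program, or None if no meaning is found."""
    # Selecting: pick one speech-or-text set and one gesture set (assumed rule).
    first = next((d for d in received if d.modality in ("speech", "text")), None)
    second = next((d for d in received if d.modality == "gesture"), None)
    if first is None or second is None:
        return None
    # Identifying: look the pair up in the dictionary to get the combined meaning.
    return COMBINED_DICTIONARY.get((first.value, second.value))
```

In this sketch the same word, "delete", yields different output depending on the accompanying gesture, and a pair that is not in the dictionary yields no output at all.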
1 Assignment
0 Petitions
Abstract
The subject disclosure relates to user input into a computer system, and a technology by which one or more users interact with a computer system via a combination of input modalities. When the input data of two or more input modalities are related, they are combined to interpret an intended meaning of the input. For example, speech when combined with one input gesture has one intended meaning, e.g., convert the speech to verbatim text for consumption by a program, while the exact speech when combined with a different input gesture has a different meaning, e.g., convert the speech to a command that controls the operation of that same program.
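The example in the abstract, where one gesture makes the speech verbatim text and another makes the same speech a command, might look like the sketch below. The gesture names, the command vocabulary, and the function `combine` are assumptions made for illustration; the abstract does not name specific gestures or commands.

```python
# Hypothetical mapping from gesture to how the accompanying speech is treated.
GESTURE_MODES = {"open_palm": "dictation", "closed_fist": "command"}

# Hypothetical command vocabulary understood by the target program.
COMMANDS = {"save file": "SAVE", "close window": "CLOSE"}


def combine(speech: str, gesture: str) -> dict:
    """Interpret speech according to the gesture that accompanies it."""
    mode = GESTURE_MODES.get(gesture)
    if mode == "dictation":
        # Pass the speech through as verbatim text for the program.
        return {"kind": "text", "payload": speech}
    if mode == "command":
        # Convert the speech to a command that controls the program.
        command = COMMANDS.get(speech.lower().strip())
        if command is not None:
            return {"kind": "command", "payload": command}
    # Unknown gesture, or speech that matches no known command.
    return {"kind": "unrecognized", "payload": speech}
```

Calling `combine("save file", "open_palm")` produces a text result carrying the words "save file", while `combine("save file", "closed_fist")` produces a command result carrying `SAVE`.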
16 Citations
20 Claims
1. In a computing environment, a method performed at least in part on at least one processor, the method comprising:
receiving sets of input data corresponding to a plurality of input modalities, the received sets of input data including a first set of input data and a second set of input data, the first set of input data being associated with a first input modality from the plurality of input modalities, the second set of input data being associated with a second input modality from the plurality of input modalities, the first input modality being a speech or text input modality and the second input modality being a gesture input modality;
selecting the first set of input data and the second set of input data;
identifying, within a dictionary, speech or text input for the first set of input data and a gesture for the second set of input data to determine a meaning of a combination of the first and second set of input data; and
providing output data for input by a program, the output data corresponding to the meaning of the first and second set of input data.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system, comprising:
a plurality of input devices that provide data corresponding to input modalities;
one or more processors programmed to:
receive, from the input devices, sets of input data corresponding to the input modalities, the received sets of input data including a first set of input data and a second set of input data, the first set of input data being associated with a first input modality from the plurality of input modalities, the second set of input data being associated with a second input modality from the plurality of input modalities, the first input modality being a speech or text input modality and the second input modality being a gesture input modality;
select the first set of input data and the second set of input data;
identify, within a dictionary, speech or text input for the first set of input data and a gesture for the second set of input data to determine a meaning of a combination of the first and second set of input data; and
provide output data for input by a program, the output data corresponding to the meaning of the first and second set of input data.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17
18. One or more computer-readable storage devices having computer-executable instructions, that cause a processor to perform operations comprising:
receiving sets of input data corresponding to a plurality of input modalities, the received sets of input data including a first set of input data and a second set of input data, the first set of input data being associated with a first input modality from the plurality of input modalities, the second set of input data being associated with a second input modality from the plurality of input modalities, the first input modality being a speech or text input modality and the second input modality being a gesture input modality;
selecting the first set of input data and the second set of input data;
identifying, within a dictionary, speech or text input for the first set of input data and a gesture for the second set of input data to determine a meaning of a combination of the first and second set of input data; and
providing output data for input by a program, the output data corresponding to the meaning of the first and second set of input data.
Dependent claims: 19, 20
Specification