System and method for processing multi-modal device interactions in a natural language voice services environment
First Claim
1. A method for processing one or more multi-modal inputs in a natural language voice services environment that includes one or more electronic devices, comprising:
detecting at one or more electronic devices, a multi-modal input that includes a first input having a first modality type and a second input having a second modality type, wherein the second input is related to the first input, and wherein the first modality type is different than the second modality type;
extracting, at a processor, context information relating to the multi-modal input, from the first input and from the second input;
determining, at the processor, a request from the first input or the second input;
processing, at the processor, the request based on the extracted context information relating to the multi-modal input;
generating at least one transaction lead based on the extracted context information of the multi-modal input;
receiving at least one further input relating to the generated at least one transaction lead; and
processing a transaction click-through in response to receiving the at least one further input.
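The claimed steps, up to processing the request, can be sketched in Python. This is a minimal illustration under stated assumptions, not the patentee's implementation. The claim does not say how the relatedness of the two inputs is judged, how their context is merged, or how the request is chosen. The temporal window `RELATED_WINDOW_S`, the `Modality`, `Input`, and `MultiModalInput` types, and the functions `extract_context`, `determine_request`, and `process_request` are therefore all hypothetical.

```python
import re
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical sketch: names, fields, and thresholds are assumptions, not from the patent.
# Assumption: two inputs count as "related" if they arrive within this many seconds.
RELATED_WINDOW_S = 3.0


class Modality(Enum):
    VOICE = "voice"
    TOUCH = "touch"
    TEXT = "text"


@dataclass
class Input:
    modality: Modality
    content: str
    timestamp: float
    context: dict = field(default_factory=dict)


@dataclass
class MultiModalInput:
    first: Input
    second: Input

    def __post_init__(self) -> None:
        # The claim requires two inputs of different modality types...
        if self.first.modality == self.second.modality:
            raise ValueError("inputs must have different modality types")
        # ...and a second input related to the first (here: close in time).
        if abs(self.second.timestamp - self.first.timestamp) > RELATED_WINDOW_S:
            raise ValueError("second input is not related to the first")


def extract_context(mm: MultiModalInput) -> dict:
    # Context is drawn from both inputs; on a key conflict the second input wins.
    merged = dict(mm.first.context)
    merged.update(mm.second.context)
    return merged


def determine_request(mm: MultiModalInput) -> str:
    # Take the request from the voice input if there is one, else from the first input.
    for inp in (mm.first, mm.second):
        if inp.modality is Modality.VOICE:
            return inp.content
    return mm.first.content


def process_request(mm: MultiModalInput) -> dict:
    context = extract_context(mm)
    request = determine_request(mm)
    # "Processing" here only resolves the word "this" against the tapped item.
    item = context.get("selected_item")
    if item is not None:
        request = re.sub(r"\bthis\b", lambda _m: item, request)
    return {"request": request, "context": context}


tap = Input(Modality.TOUCH, "tap", 10.0, {"selected_item": "Blue Note Jazz Club"})
speech = Input(Modality.VOICE, "book a table at this for two", 11.2, {"party_size": 2})
result = process_request(MultiModalInput(tap, speech))
print(result["request"])  # book a table at Blue Note Jazz Club for two
```

Rejecting same-modality or unrelated pairs in `__post_init__` keeps the two "wherein" conditions of the claim in one place, so every `MultiModalInput` that reaches `process_request` already satisfies them.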
10 Assignments
0 Petitions
Abstract
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
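The flow in the abstract can be sketched as follows. The sketch assumes a simple keyword table in place of real natural language understanding. A tap on a map point (the non-voice interaction) supplies the destination, and the utterance "Get me directions there" supplies the intent. The combined request is then routed to whichever device advertises a matching capability. `Device`, `INTENT_KEYWORDS`, `determine_intent`, and `route_request` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    capabilities: set[str]


# Hypothetical keyword table standing in for a real natural language understanding step.
INTENT_KEYWORDS = {
    "navigate": {"directions", "navigate", "route"},
    "play_media": {"play", "listen"},
    "call": {"call", "dial"},
}


def determine_intent(utterance: str, non_voice_context: dict) -> dict:
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    intent = next(
        (name for name, keys in INTENT_KEYWORDS.items() if words & keys), "unknown"
    )
    # Combine: the utterance supplies the intent, and the non-voice interaction supplies
    # the details the utterance leaves implicit (e.g. "there" -> a tapped map point).
    return {"intent": intent, "slots": dict(non_voice_context)}


def route_request(request: dict, devices: list[Device]) -> Device | None:
    # Send the request to the first device that advertises the needed capability.
    return next((d for d in devices if request["intent"] in d.capabilities), None)


devices = [
    Device("phone", {"call", "play_media"}),
    Device("car_nav", {"navigate"}),
]
request = determine_intent("Get me directions there", {"destination": "37.7749,-122.4194"})
target = route_request(request, devices)
print(request["intent"], target.name if target else None)  # navigate car_nav
```

Routing to the first capable device is the simplest possible policy. A fuller system might rank candidate devices by proximity or user preference instead.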
561 Citations
26 Claims
1. A method for processing one or more multi-modal inputs in a natural language voice services environment that includes one or more electronic devices, comprising:
detecting at one or more electronic devices, a multi-modal input that includes a first input having a first modality type and a second input having a second modality type, wherein the second input is related to the first input, and wherein the first modality type is different than the second modality type; extracting, at a processor, context information relating to the multi-modal input, from the first input and from the second input; determining, at the processor, a request from the first input or the second input; processing, at the processor, the request based on the extracted context information relating to the multi-modal input; generating at least one transaction lead based on the extracted context information of the multi-modal input; receiving at least one further input relating to the generated at least one transaction lead; and processing a transaction click-through in response to receiving the at least one further input. Dependent claims: 2 to 18.
19. A system for processing one or more multi-modal inputs in a natural language voice services environment that includes one or more electronic devices, wherein the system comprises a processing device configured to:
receive from one or more electronic devices a multi-modal input that includes a first input having a first modality type and a second input having a second modality type, wherein the second input is related to the first input, and wherein the first modality type is different than the second modality type; extract context information relating to the multi-modal input, from the first input and from the second input; determine a request based on the non-voice input or the natural language utterance; process the request based on the extracted context information relating to the multi-modal input; generate at least one transaction lead based on the extracted context information of the multi-modal input; receive at least one further input relating to the generated at least one transaction lead; and process a transaction click-through in response to receiving the at least one further input. Dependent claims: 20 to 26.
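The final steps of the claim can be sketched as a small tracker. These steps are generating a transaction lead from the extracted context, receiving a further input about it, and processing a click-through. The sketch makes two hypothetical assumptions: the context lists merchants, and the further input reduces to an accept or decline decision. `TransactionLead`, `LeadTracker`, `generate_leads`, and `handle_further_input` are illustrative names, not the patent's.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TransactionLead:
    lead_id: int
    merchant: str
    offer: str


@dataclass
class LeadTracker:
    leads: dict[int, TransactionLead] = field(default_factory=dict)
    click_throughs: list[int] = field(default_factory=list)

    def generate_leads(self, context: dict) -> list[TransactionLead]:
        # Hypothetical rule: one lead per merchant found in the extracted context.
        new_leads = []
        for merchant in context.get("merchants", []):
            lead = TransactionLead(len(self.leads) + 1, merchant, f"Reserve a table at {merchant}")
            self.leads[lead.lead_id] = lead
            new_leads.append(lead)
        return new_leads

    def handle_further_input(self, lead_id: int, accepted: bool) -> bool:
        # Record a click-through only when the further input accepts a lead we issued.
        if accepted and lead_id in self.leads:
            self.click_throughs.append(lead_id)
            return True
        return False


tracker = LeadTracker()
leads = tracker.generate_leads({"merchants": ["Blue Note", "Sushi Ran"]})
print([lead.offer for lead in leads])
print(tracker.handle_further_input(leads[0].lead_id, accepted=True))  # True
print(tracker.click_throughs)  # [1]
```

Keeping issued leads and recorded click-throughs in one object means a click-through can only be attributed to a lead the system actually generated, which is the relationship the claim describes.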
Specification