System and method for processing multi-modal device interactions in a natural language voice services environment
First Claim
1. A method for processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
detecting a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input;
obtaining an indication of a first time at which the non-voice input was received by the non-voice input component;
obtaining an indication of a second time at which the natural language utterance was received by the voice input component;
determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and
responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, performing the following steps:
determining first context information relating to the non-voice input;
determining second context information relating to the natural language utterance;
determining an intent of the multi-modal user interaction based on the first context information and the second context information;
identifying a transaction lead based on the determined intent; and
transmitting the identified transaction lead to a user via the one or more electronic devices.
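The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim only says that relatedness is determined "based on the first time and the second time", so the fixed time window `MAX_GAP_SECONDS`, the keyword rule for intent, the wording of the transaction lead, and every class and function name below are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Assumption: a fixed window decides relatedness. The claim does not
# specify how the two times are compared.
MAX_GAP_SECONDS = 3.0


@dataclass
class NonVoiceInput:
    target: str         # hypothetical: what the user touched or selected
    received_at: float  # "first time", in seconds


@dataclass
class Utterance:
    text: str
    received_at: float  # "second time", in seconds


def are_related(nv, utt, max_gap=MAX_GAP_SECONDS):
    """Treat the two inputs as one interaction if they arrive close together."""
    return abs(utt.received_at - nv.received_at) <= max_gap


def determine_intent(first_context, second_context):
    """Toy rule (assumption): the utterance gives the action, the selection gives the item."""
    action = "purchase" if "buy" in second_context["words"] else "lookup"
    return {"action": action, "item": first_context["selected"]}


def identify_transaction_lead(intent):
    """Map the intent to an illustrative lead, such as an offer or suggestion."""
    if intent["action"] == "purchase":
        return f"Offer: purchase options for {intent['item']}"
    return f"Suggestion: more information about {intent['item']}"


def process_interaction(nv, utt):
    """Run the claimed steps in order; return None when the inputs are unrelated."""
    if not are_related(nv, utt):
        return None
    first_context = {"selected": nv.target}               # context of non-voice input
    second_context = {"words": utt.text.lower().split()}  # context of utterance
    intent = determine_intent(first_context, second_context)
    return identify_transaction_lead(intent)
```

For example, a tap on a song at t = 10.0 s followed by "Buy this" at t = 11.2 s falls inside the assumed window and yields a purchase lead, while the same utterance at t = 20.0 s would be treated as unrelated and produce no lead.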
7 Assignments
0 Petitions
Abstract
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
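The abstract's flow of combining context and routing a request can be sketched as follows. This is only an illustration under stated assumptions: the context keys (`focus`, `verb`), the `DEVICE_CAPABILITIES` table, the device names, and the rule that the first capable device receives the request are all hypothetical and do not come from the patent.

```python
# Assumption: each device advertises the set of actions it can handle.
DEVICE_CAPABILITIES = {
    "media_player": {"play", "pause"},
    "navigation_unit": {"navigate"},
}


def combine_context(non_voice_context, utterance_context):
    """Merge the two context dictionaries into one view of the interaction."""
    combined = dict(non_voice_context)
    combined.update(utterance_context)
    return combined


def determine_intent(combined):
    """Toy rule (assumption): the spoken verb is the action, the focused item is the target."""
    return {"action": combined["verb"], "target": combined["focus"]}


def route_request(intent, devices=DEVICE_CAPABILITIES):
    """Return (device_name, request) for the first device that supports the action."""
    for name, actions in devices.items():
        if intent["action"] in actions:
            return name, {"action": intent["action"], "target": intent["target"]}
    return None  # no device can serve this intent


# Usage: the user touches a map location and says "navigate there".
combined = combine_context({"focus": "map:Seattle"}, {"verb": "navigate"})
routed = route_request(determine_intent(combined))
```

Here `routed` names the hypothetical navigation unit together with the request it should handle. A real system would likely weigh more signals than a single verb when choosing a device.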
935 Citations
30 Claims
1. A method for processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
detecting a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input;
obtaining an indication of a first time at which the non-voice input was received by the non-voice input component;
obtaining an indication of a second time at which the natural language utterance was received by the voice input component;
determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and
responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, performing the following steps:
determining first context information relating to the non-voice input;
determining second context information relating to the natural language utterance;
determining an intent of the multi-modal user interaction based on the first context information and the second context information;
identifying a transaction lead based on the determined intent; and
transmitting the identified transaction lead to a user via the one or more electronic devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A system of processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the system comprising:
one or more physical processors programmed with one or more computer program instructions which, when executed, cause the one or more physical processors to:
detect a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input;
obtain an indication of a first time at which the non-voice input was received by the non-voice input component;
obtain an indication of a second time at which the natural language utterance was received by the voice input component;
determine that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and
responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, perform the following steps:
determine first context information relating to the non-voice input;
determine second context information relating to the natural language utterance;
determine an intent of the multi-modal user interaction based on the first context information and the second context information;
identify a transaction lead based on the determined intent; and
transmit the identified transaction lead to a user via the one or more electronic devices.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification