System and method for processing multi-modal device interactions in a natural language voice services environment
First Claim
1. A method for facilitating natural language processing of user inputs via multiple input modes where each user input alone may be insufficient to completely and/or accurately determine a user request intended by a user, the method being implemented by a computer system that includes one or more physical processors executing computer program instructions which, when executed, perform the method, the method comprising:
receiving, at the computer system, a first user input of a user from a first input device via a first input mode, wherein the first user input is generated responsive to the user interacting with the first input device in a manner corresponding to the first input mode to provide the first user input;
receiving, at the computer system, a second user input of the user from a second input device via a second input mode, wherein the second user input is generated responsive to the user interacting with the second input device in a manner corresponding to the second input mode to provide the second user input, wherein the first user input and the second user input are related to one another, and wherein one of the first user input or the second user input comprises a voice input received from at least one of the first input device or the second input device via a voice input mode, and the other one of the first user input or the second user input comprises a non-voice input received from at least one of the first input device or the second input device via a non-voice input mode;
determining, by the computer system, based on the second user input, context information for interpreting the first user input, wherein the context information identifies a first item of a first item type;
determining, by the computer system, further context information based on the first user input, wherein the further context information identifies a second item of a second item type that is related to the first item of the first item type;
generating, by the computer system, a query based on the context information and the further context information to obtain one or more intermediary results, wherein the generated query comprises a query related to the second item of the second item type;
determining, by the computer system, a user request based on the one or more intermediary results;
providing, by the computer system, a response to the user request; and
providing, by the computer system, based on at least one of the context information for interpreting the first user input or the further context information, an advertisement for presentation to the user.
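The steps recited in the claim can be illustrated with a minimal Python sketch. It assumes a hypothetical scenario in which the non-voice input is a touch on a song and the voice input asks about "this artist". The catalog data, the function names, and the keyword-based resolution are all illustrative assumptions and are not taken from the patent.

```python
from dataclasses import dataclass

# Hypothetical in-memory catalog; every name and value here is an assumption.
SONG_TO_ARTIST = {"Song A": "Artist X"}
ARTIST_TO_ALBUMS = {"Artist X": ["Album 1", "Album 2"]}


@dataclass
class Item:
    item_type: str
    name: str


def context_from_touch(selected_song: str) -> Item:
    """Non-voice input: the touched entry identifies a first item (a song)."""
    return Item("song", selected_song)


def further_context_from_voice(utterance: str, first_item: Item) -> Item:
    """Voice input: 'this artist' resolves to a second item related to the first."""
    if "artist" in utterance.lower() and first_item.item_type == "song":
        return Item("artist", SONG_TO_ARTIST[first_item.name])
    raise ValueError("utterance could not be resolved against the context")


def handle(selected_song: str, utterance: str) -> dict:
    first = context_from_touch(selected_song)
    second = further_context_from_voice(utterance, first)
    # The query concerns the second item and yields intermediary results.
    intermediary = ARTIST_TO_ALBUMS.get(second.name, [])
    # The user request is determined from those intermediary results.
    action = "list_albums" if intermediary else "search_artist"
    request = {"action": action, "artist": second.name}
    return {
        "request": request,
        "response": intermediary,
        # The advertisement is chosen from the same context information.
        "advertisement": f"Concert tickets for {second.name}",
    }


result = handle("Song A", "What other albums does this artist have?")
```

Neither input is sufficient alone in this sketch: the touch names a song but no request, and the utterance names a request but no artist.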
Abstract
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
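As a rough sketch of the flow the abstract describes, the Python below extracts context from a non-voice interaction and from an utterance, combines them into an intent, and routes the request to a device. The device names, the keyword table, and the capability table are hypothetical assumptions made for illustration; the abstract does not specify how context is extracted or how routing is decided.

```python
# Hypothetical keyword-to-action table and device capabilities (assumptions).
KEYWORDS = {"restaurants": "find_nearby", "play": "play_music"}
DEVICE_CAPABILITIES = {
    "navigation_device": {"find_nearby"},
    "media_player": {"play_music"},
}


def extract_interaction_context(interaction: dict) -> dict:
    """Context from the non-voice interaction: what the user focused on."""
    return {"focus": interaction["selection"], "source": interaction["device"]}


def extract_utterance_context(utterance: str) -> dict:
    """Context from the utterance: the first keyword that maps to an action."""
    for word in utterance.lower().split():
        action = KEYWORDS.get(word.strip(".,?!"))
        if action is not None:
            return {"action": action}
    return {"action": None}


def determine_intent(interaction: dict, utterance: str) -> dict:
    """Combine both contexts into a single intent."""
    intent = extract_interaction_context(interaction)
    intent.update(extract_utterance_context(utterance))
    return intent


def route(intent: dict):
    """Return the name of a device able to handle the intent, or None."""
    for device, capabilities in DEVICE_CAPABILITIES.items():
        if intent["action"] in capabilities:
            return device
    return None


intent = determine_intent(
    {"device": "navigation_device", "selection": "point on map"},
    "Find restaurants around here",
)
target = route(intent)
```

Here the utterance supplies the action and the interaction supplies its target; the routing step only checks which device advertises the required capability.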
20 Claims
1. A method for facilitating natural language processing of user inputs via multiple input modes where each user input alone may be insufficient to completely and/or accurately determine a user request intended by a user, the method being implemented by a computer system that includes one or more physical processors executing computer program instructions which, when executed, perform the method, the method comprising:
receiving, at the computer system, a first user input of a user from a first input device via a first input mode, wherein the first user input is generated responsive to the user interacting with the first input device in a manner corresponding to the first input mode to provide the first user input;
receiving, at the computer system, a second user input of the user from a second input device via a second input mode, wherein the second user input is generated responsive to the user interacting with the second input device in a manner corresponding to the second input mode to provide the second user input, wherein the first user input and the second user input are related to one another, and wherein one of the first user input or the second user input comprises a voice input received from at least one of the first input device or the second input device via a voice input mode, and the other one of the first user input or the second user input comprises a non-voice input received from at least one of the first input device or the second input device via a non-voice input mode;
determining, by the computer system, based on the second user input, context information for interpreting the first user input, wherein the context information identifies a first item of a first item type;
determining, by the computer system, further context information based on the first user input, wherein the further context information identifies a second item of a second item type that is related to the first item of the first item type;
generating, by the computer system, a query based on the context information and the further context information to obtain one or more intermediary results, wherein the generated query comprises a query related to the second item of the second item type;
determining, by the computer system, a user request based on the one or more intermediary results;
providing, by the computer system, a response to the user request; and
providing, by the computer system, based on at least one of the context information for interpreting the first user input or the further context information, an advertisement for presentation to the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A system for facilitating natural language processing of user inputs via multiple input modes where each user input alone may be insufficient to completely and/or accurately determine a user request intended by a user, the system comprising:
one or more physical processors programmed with computer program instructions which, when executed, cause the one or more physical processors to:
receive a first user input of a user from a first input device via a first input mode, wherein the first user input is generated responsive to the user interacting with the first input device in a manner corresponding to the first input mode to provide the first user input;
receive a second user input of the user from a second input device via a second input mode, wherein the second user input is generated responsive to the user interacting with the second input device in a manner corresponding to the second input mode to provide the second user input, wherein the first user input and the second user input are related to one another, and wherein one of the first user input or the second user input comprises a voice input received from at least one of the first input device or the second input device via a voice input mode, and the other one of the first user input or the second user input comprises a non-voice input received from at least one of the first input device or the second input device via a non-voice input mode;
determine, based on the second user input, context information for interpreting the first user input, wherein the context information identifies a first item of a first item type;
determine further context information based on the first user input, wherein the further context information identifies a second item of a second item type that is related to the first item of the first item type;
generate a query based on the context information and the further context information to obtain one or more intermediary results, wherein the generated query comprises a query related to the second item of the second type;
determine a user request based on the one or more intermediary results;
provide a response to the user request; and
provide, based on at least one of the context information for interpreting the first user input or the further context information, an advertisement for presentation to the user.
Dependent claims: 16, 17, 18, 19, 20
Specification