System and method for processing multi-modal device interactions in a natural language voice services environment
Abstract
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
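The flow in the abstract (extract context from both modes, combine it into an intent, route a request) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the class names, the keyword table, the device table, and the string format of the routed request are all hypothetical, and the utterance is assumed to be already transcribed to text.

```python
from dataclasses import dataclass


@dataclass
class NonVoiceInteraction:
    device: str     # device that received the interaction (hypothetical label)
    selection: str  # what the user touched or selected


@dataclass
class Intent:
    action: str
    target: str
    device: str


# Hypothetical lookup tables; a real system would use far richer models.
ACTION_KEYWORDS = {"call": "place_call", "play": "play_media", "navigate": "route_to"}
DEVICE_FOR_ACTION = {"place_call": "phone", "play_media": "media_player", "route_to": "navigation"}


def determine_intent(utterance: str, interaction: NonVoiceInteraction) -> Intent:
    """Combine context from the utterance and the non-voice interaction."""
    words = utterance.lower().split()
    # Context from the utterance: the first word that maps to a known action.
    action = next((ACTION_KEYWORDS[w] for w in words if w in ACTION_KEYWORDS), "unknown")
    # Context from the non-voice interaction: the selected item becomes the target.
    # If no device is known for the action, fall back to the device that was touched.
    device = DEVICE_FOR_ACTION.get(action, interaction.device)
    return Intent(action=action, target=interaction.selection, device=device)


def route_request(intent: Intent) -> str:
    """Describe the request as it would be sent to the chosen device."""
    return f"{intent.device}:{intent.action}({intent.target})"


if __name__ == "__main__":
    touch = NonVoiceInteraction(device="navigation", selection="Joe's Pizza")
    print(route_request(determine_intent("Call this one", touch)))
```

In this sketch the selection made on one device (a map) supplies the target, while the utterance supplies the action, so the request ends up routed to a different device (the phone).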
16 Claims
1. A computer-implemented method for processing a natural language utterance via multiple input modes, the method being implemented by a computer system that includes one or more processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
receiving, by the one or more physical processors, a first user input via a voice input mode and a second user input via a non-voice input mode, wherein the first user input includes a natural language utterance, and the second user input includes a non-voice input relating to the natural language utterance;
processing, by the one or more physical processors, the natural language utterance to recognize one or more words of the natural language utterance, wherein the one or more recognized words include a reference word;
identifying, by the one or more physical processors based on the one or more recognized words, a query;
identifying, by the one or more physical processors based on the non-voice input, context information for the one or more recognized words, wherein the context information indicates context for the reference word;
determining, by the one or more physical processors based on the context information, one or more interpretations of the one or more recognized words, wherein determining the one or more interpretations comprises:
identifying the reference word;
identifying, based on the reference word context, a product or service to which the reference word refers; and
determining a meaning of the reference word based on the identification of the product or service; and
generating, by the one or more physical processors based on the one or more interpretations, a response to the query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A device for processing a natural language utterance via multiple input modes, comprising:
one or more physical processors programmed to execute one or more computer program instructions which, when executed, cause the one or more physical processors to:
receive a first user input via a voice input mode and a second user input via a non-voice input mode, wherein the first user input includes a natural language utterance, and the second user input includes a non-voice input relating to the natural language utterance;
process the natural language utterance to recognize one or more words of the natural language utterance, wherein the one or more recognized words include a reference word;
identify, based on the one or more recognized words, a query;
identify, based on the non-voice input, context information for the one or more recognized words, wherein the context information indicates context for the reference word;
determine, based on the context information, one or more interpretations of the one or more words, wherein determining the one or more interpretations comprises:
identifying the reference word;
identifying, based on the reference word context, a product or service to which the reference word refers; and
determining a meaning of the reference word based on the identification of the product or service; and
generate, based on the one or more interpretations, a response to the query.
13. A non-transitory computer-readable medium for processing a natural language utterance via multiple input modes, the non-transitory computer-readable medium comprising one or more instructions that, when executed by one or more processors, cause the one or more processors to:
receive a first user input via a voice input mode and a second user input via a non-voice input mode, wherein the first user input includes a natural language utterance, and the second user input includes a non-voice input relating to the natural language utterance;
process the natural language utterance to recognize one or more words of the natural language utterance, wherein the one or more recognized words include a reference word;
identify, based on the one or more recognized words, a query;
identify, based on the non-voice input, context information for the one or more recognized words, wherein the context information indicates context for the reference word;
determine, based on the context information, one or more interpretations of the one or more recognized words, wherein determining the one or more interpretations comprises:
identifying the reference word;
identifying, based on the reference word context, a product or service to which the reference word refers; and
determining a meaning of the reference word based on the identification of the product or service; and
generate, based on the one or more interpretations, a response to the query.
14. A computer-implemented method of processing a natural language utterance via multiple input modes, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising:
receiving, by the one or more physical processors, a first user input via a voice input mode and a second user input via a non-voice input mode, wherein the first user input includes a natural language utterance, and the second user input includes a non-voice input relating to the natural language utterance;
processing, by the one or more physical processors, the natural language utterance to recognize one or more words of the natural language utterance, wherein the one or more recognized words include a reference word;
identifying, by the one or more physical processors based on the one or more recognized words, a command;
identifying, by the one or more physical processors based on the non-voice input, context information for the one or more recognized words, wherein the context information indicates context for the reference word;
determining, by the one or more physical processors based on the context information, one or more interpretations of the one or more recognized words, wherein determining the one or more interpretations comprises:
identifying the reference word;
identifying, based on the reference word context, a product or service to which the reference word refers; and
determining a meaning of the reference word based on the identification of the product or service; and
generating, by the one or more physical processors based on the one or more interpretations, a command signal associated with the command.
Dependent claims: 15, 16
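As an informal illustration of the steps recited in the independent claims above (recognize words, identify a query, take context from the non-voice input, resolve the reference word to a product or service, generate a response), a minimal sketch might look like the following. It is not drawn from the specification: the set of reference words, the query keyword table, the `NonVoiceInput` fields, and the dictionary-shaped response are all assumptions made for the example, and speech recognition is replaced by simple tokenization of already transcribed text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed set of reference words; the claims do not enumerate them.
REFERENCE_WORDS = {"this", "that", "it", "these", "those"}

# Hypothetical mapping from recognized words to query types.
QUERY_KEYWORDS = {"price": "price_lookup", "cost": "price_lookup", "reviews": "review_lookup"}


@dataclass
class NonVoiceInput:
    item_name: str  # item the user touched or selected (hypothetical field)
    kind: str       # "product" or "service"


def recognize_words(utterance: str) -> List[str]:
    """Stand-in for speech recognition: tokenize text that is already transcribed."""
    return [w.strip("?.,!").lower() for w in utterance.split()]


def identify_query(words: List[str]) -> str:
    """Pick a query type from the recognized words."""
    return next((QUERY_KEYWORDS[w] for w in words if w in QUERY_KEYWORDS), "general_search")


def resolve_reference(words: List[str], context: NonVoiceInput) -> Optional[Dict[str, str]]:
    """Find the reference word and bind it to the product or service in context."""
    ref = next((w for w in words if w in REFERENCE_WORDS), None)
    if ref is None:
        return None
    return {"reference_word": ref, "refers_to": context.item_name, "kind": context.kind}


def generate_response(utterance: str, non_voice: NonVoiceInput) -> Dict[str, Optional[str]]:
    """Run the steps in order and return a simple description of the response."""
    words = recognize_words(utterance)
    query = identify_query(words)
    meaning = resolve_reference(words, non_voice)
    target = meaning["refers_to"] if meaning else None
    return {"query": query, "target": target}


if __name__ == "__main__":
    selected = NonVoiceInput(item_name="Acme headphones", kind="product")
    print(generate_response("What is the price of this?", selected))
```

The same structure would apply to the command variant in claim 14, with the final step emitting a command signal instead of a query response.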
Specification