System and method of supporting adaptive misrecognition in conversational speech
First Claim
1. A system for processing natural language utterances, comprising:
a multimodal device configured to:
receive a natural language utterance; and
subsequently receive a follow-up multimodal input;
a speech recognition engine configured to recognize one or more words from the natural language utterance;
a parser configured to generate an interpretation of the natural language utterance from the one or more recognized words, and further configured to generate a request based on the interpretation of the natural language utterance;
a domain agent configured to process the generated request; and
an adaptive misrecognition engine configured to:
monitor one or more actions associated with the domain agent processing the request; and
determine that the interpretation of the natural language utterance was incorrect if one or more of the monitored actions include the follow-up multimodal input being presented proximate in time to the prior natural language utterance.
Abstract
A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model. These components determine context and domain knowledge and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command.
673 Citations
50 Claims
1. A system for processing natural language utterances, comprising:
a multimodal device configured to:
receive a natural language utterance; and
subsequently receive a follow-up multimodal input;
a speech recognition engine configured to recognize one or more words from the natural language utterance;
a parser configured to generate an interpretation of the natural language utterance from the one or more recognized words, and further configured to generate a request based on the interpretation of the natural language utterance;
a domain agent configured to process the generated request; and
an adaptive misrecognition engine configured to:
monitor one or more actions associated with the domain agent processing the request; and
determine that the interpretation of the natural language utterance was incorrect if one or more of the monitored actions include the follow-up multimodal input being presented proximate in time to the prior natural language utterance.
Dependent claims: 2, 3
4. A method for processing natural language utterances, comprising:
receiving a natural language utterance at a multimodal device;
subsequently receiving a follow-up multimodal input at the multimodal device;
recognizing one or more words from the natural language utterance using a speech recognition engine coupled to the multimodal device;
generating an interpretation of the natural language utterance from the one or more recognized words using a parser coupled to the multimodal device, wherein the parser further generates a request based on the interpretation of the natural language utterance;
invoking a domain agent configured to process the generated request;
monitoring one or more actions associated with the domain agent processing the request using an adaptive misrecognition engine; and
determining that the interpretation of the natural language utterance was incorrect if one or more of the monitored actions include the follow-up multimodal input being presented proximate in time to the prior natural language utterance.
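The final determining step of this claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim says only "proximate in time" and gives no number, so the five-second window is an assumption, as are the `MonitoredAction` record, its `kind` labels, and the function name.

```python
from dataclasses import dataclass

# Assumed threshold: the claim does not quantify "proximate in time".
PROXIMITY_WINDOW_S = 5.0


@dataclass
class MonitoredAction:
    """Hypothetical record of one action observed while a request is processed."""
    kind: str         # e.g. "follow_up_input" or "result_presented"
    timestamp: float  # seconds, on the same clock as the utterance timestamp


def interpretation_was_incorrect(utterance_time, actions, window_s=PROXIMITY_WINDOW_S):
    """Return True if a follow-up input arrived within window_s after the utterance."""
    return any(
        a.kind == "follow_up_input"
        and 0.0 <= a.timestamp - utterance_time <= window_s
        for a in actions
    )


actions = [
    MonitoredAction("result_presented", 101.0),
    MonitoredAction("follow_up_input", 102.5),
]
print(interpretation_was_incorrect(100.0, actions))  # True: follow-up came 2.5 s later
print(interpretation_was_incorrect(90.0, actions))   # False: follow-up came 12.5 s later
```

A fixed window is the simplest reading of the claim language; a real system might instead scale the window to the request type.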
5. A system for processing natural language utterances, comprising:
a multimodal device configured to receive a natural language utterance;
a speech recognition engine configured to recognize one or more words from the natural language utterance;
a parser configured to generate an interpretation of the natural language utterance from the one or more recognized words, and further configured to generate a request based on the interpretation of the natural language utterance;
a domain agent configured to process the generated request;
an adaptive misrecognition engine configured to:
monitor one or more actions associated with the domain agent processing the request;
determine whether the interpretation of the natural language utterance is correct or incorrect based on the one or more monitored actions; and
generate an unrecognized event if the interpretation of the natural language utterance is determined to be incorrect; and
an analyzer configured to:
analyze the unrecognized event to determine a frequency of incorrect interpretations for the request; and
determine one or more tuning parameters for at least one of the speech recognition engine or the parser based on the frequency of incorrect interpretations for the request, wherein the tuning parameters are used to improve interpretations of subsequent natural language utterances relating to the request.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
29. A system for processing natural language utterances, comprising:
a multimodal device configured to receive a natural language utterance;
a speech recognition engine configured to recognize one or more words from the natural language utterance;
a parser configured to generate an interpretation of the natural language utterance from the one or more recognized words, and further configured to generate a request based on the interpretation of the natural language utterance;
a domain agent configured to process the generated request; and
an adaptive misrecognition engine configured to:
monitor one or more actions associated with the domain agent processing the request; and
determine that the interpretation of the natural language utterance was incorrect if one or more of the monitored actions include a user overriding the request in a time that is shorter than an expected time for processing the request.
30. A method for processing natural language utterances, comprising:
receiving a natural language utterance at a multimodal device;
recognizing one or more words from the natural language utterance using a speech recognition engine coupled to the multimodal device;
generating an interpretation of the natural language utterance from the one or more recognized words using a parser coupled to the multimodal device, wherein the parser further generates a request based on the interpretation of the natural language utterance;
invoking a domain agent configured to process the generated request;
monitoring one or more actions associated with the domain agent processing the request using an adaptive misrecognition engine;
determining whether the interpretation of the natural language utterance is correct or incorrect based on the actions monitored using the adaptive misrecognition engine;
generating an unrecognized event if the adaptive misrecognition engine determines that the interpretation of the natural language utterance is incorrect;
analyzing the unrecognized event to determine a frequency of incorrect interpretations for the request; and
determining one or more tuning parameters for at least one of the speech recognition engine or the parser based on the frequency of incorrect interpretations for the request, wherein the tuning parameters are used to improve interpretations of subsequent natural language utterances relating to the request.
Dependent claims: 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49
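The analyzing and tuning steps of this claim might be sketched as follows. This is a hedged illustration, not the method the specification describes. The class name, the `threshold` value, and the `extra_hypotheses` tuning parameter are all hypothetical; the claim names neither a frequency threshold nor any specific parameter.

```python
from collections import Counter


class UnrecognizedEventAnalyzer:
    """Hypothetical analyzer that counts unrecognized events per request type."""

    def __init__(self, threshold=3):
        # Assumed cutoff: the claim does not say how frequent is "frequent enough".
        self.threshold = threshold
        self.counts = Counter()

    def record(self, request_type):
        """Register one unrecognized event for the given request type."""
        self.counts[request_type] += 1

    def tuning_parameters(self):
        """Suggest tuning only for requests misinterpreted at least `threshold` times.

        `extra_hypotheses` is an invented parameter standing in for whatever
        the recognizer or parser would actually expose.
        """
        return {
            request: {"extra_hypotheses": count}
            for request, count in self.counts.items()
            if count >= self.threshold
        }


analyzer = UnrecognizedEventAnalyzer(threshold=2)
for request in ["weather", "traffic", "weather"]:
    analyzer.record(request)
print(analyzer.tuning_parameters())  # {'weather': {'extra_hypotheses': 2}}
```

Keying the counts by request type mirrors the claim's phrase "frequency of incorrect interpretations for the request".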
50. A method for processing natural language utterances, comprising:
receiving a natural language utterance at a multimodal device;
recognizing one or more words from the natural language utterance using a speech recognition engine coupled to the multimodal device;
generating an interpretation of the natural language utterance from the one or more recognized words using a parser coupled to the multimodal device, wherein the parser further generates a request based on the interpretation of the natural language utterance;
invoking a domain agent configured to process the generated request;
monitoring one or more actions associated with the domain agent processing the request using an adaptive misrecognition engine; and
determining that the interpretation of the natural language utterance was incorrect if one or more of the monitored actions include a user overriding the request in a time that is shorter than an expected time for processing the request.
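The override test in the final step can be expressed as a short Python check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the expected processing time is assumed to be known in advance for each request.

```python
def overridden_too_early(request_start, override_time, expected_processing_s):
    """Return True if the user overrode the request before it could plausibly finish.

    request_start and override_time are timestamps in seconds on the same clock;
    override_time is None when the user never overrode the request.
    """
    if override_time is None:
        return False
    return (override_time - request_start) < expected_processing_s


print(overridden_too_early(10.0, 11.0, 3.0))  # True: overridden after 1 s, expected 3 s
print(overridden_too_early(10.0, 15.0, 3.0))  # False: overridden after 5 s
print(overridden_too_early(10.0, None, 3.0))  # False: no override occurred
```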
Specification