System and method of supporting adaptive misrecognition conversational speech
Abstract
A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model, which are used to determine context and domain knowledge and to invoke prior information when interpreting a spoken utterance or a received non-spoken message. The system and method create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command.
857 Citations
44 Claims
1. A system for processing natural language utterances, comprising:
a multimodal device configured to receive a natural language utterance;
a speech recognition engine configured to recognize one or more words from the natural language utterance;
a parser configured to generate an interpretation of the natural language utterance from the one or more recognized words and generate a request based on the interpretation of the natural language utterance;
a domain agent configured to process the generated request;
an adaptive misrecognition engine configured to monitor one or more actions associated with the domain agent processing the request, determine whether the interpretation of the natural language utterance is correct or incorrect based on the one or more monitored actions, and generate an unrecognized event in response to determining that the interpretation of the natural language utterance is incorrect; and
an analyzer configured to:
track an interaction pattern with the system over time for a user that provided the natural language utterance;
generate a personalized cognitive model for the user based on the interaction pattern tracked for the user;
use the personalized cognitive model to predict the one or more actions associated with the domain agent processing the request; and
update the personalized cognitive model based on a frequency of incorrect interpretations for the request.
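The analyzer steps of claim 1 (track, predict, update on misrecognition frequency) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class and function names, the action labels, and in particular the heuristic that treats a "cancel" or "rephrase" action as evidence of an incorrect interpretation. The claim does not specify how correctness is determined or how the model is represented.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical signals that the user rejected the result (an assumption).
REJECTION_SIGNALS = frozenset({"cancel", "rephrase"})


@dataclass
class PersonalizedCognitiveModel:
    """Hypothetical per-user model built from tracked interactions."""
    action_counts: dict = field(default_factory=lambda: defaultdict(Counter))
    attempts: Counter = field(default_factory=Counter)
    misses: Counter = field(default_factory=Counter)

    def track(self, request, action):
        # Record one observed action for this kind of request.
        self.action_counts[request][action] += 1

    def predict_action(self, request):
        # Predict the action seen most often for this request, if any.
        counts = self.action_counts.get(request)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def update(self, request, interpretation_correct):
        # Maintain the frequency of incorrect interpretations per request.
        self.attempts[request] += 1
        if not interpretation_correct:
            self.misses[request] += 1

    def misrecognition_rate(self, request):
        total = self.attempts[request]
        return self.misses[request] / total if total else 0.0


def interpretation_is_correct(monitored_actions):
    # Assumed heuristic: any rejection signal marks the interpretation wrong.
    return not any(a in REJECTION_SIGNALS for a in monitored_actions)


def observe(model, request, monitored_actions):
    """Monitor actions, update the model, and return an event name or None."""
    correct = interpretation_is_correct(monitored_actions)
    for action in monitored_actions:
        model.track(request, action)
    model.update(request, correct)
    return None if correct else "unrecognized_event"
```

In this sketch the misrecognition rate is simply stored; a fuller system might use it to lower confidence in requests that are often misinterpreted for a given user.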
24. A system for processing natural language utterances, comprising:
a multimodal device configured to receive a natural language utterance;
a speech recognition engine configured to recognize one or more words from the natural language utterance;
a parser configured to generate a plurality of interpretations of the natural language utterance and generate a request based on a best interpretation selected from the plurality of interpretations of the natural language utterance;
a domain agent configured to process the generated request;
an adaptive misrecognition engine configured to monitor one or more actions associated with the domain agent processing the request, determine whether the best interpretation of the natural language utterance is correct or incorrect based on the one or more monitored actions, and generate an unrecognized event in response to determining that the best interpretation of the natural language utterance is incorrect; and
an analyzer configured to:
track an interaction pattern with the system over time for a user that provided the natural language utterance;
generate a personalized cognitive model for the user based on the interaction pattern tracked for the user; and
use the personalized cognitive model to predict the one or more actions associated with the domain agent processing the request and select a next best interpretation from the plurality of interpretations in response to the adaptive misrecognition engine determining that the best interpretation selected by the parser is incorrect.
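The next-best-interpretation fallback in claim 24 can be sketched as follows. This is only an illustration under stated assumptions: the `Interpretation` type, the parser scores, and the weighting of each score by a per-user miss rate (a stand-in for the personalized cognitive model) are all hypothetical and are not taken from the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    request: str   # request the parser would generate (hypothetical label)
    score: float   # parser confidence, higher is better (assumed scale)


def next_best(interpretations, rejected, miss_rate):
    """Pick the best remaining interpretation after some were rejected.

    miss_rate maps a request to the user's historical fraction of incorrect
    interpretations; it stands in for the personalized model (an assumption).
    """
    remaining = [i for i in interpretations if i not in rejected]
    if not remaining:
        return None
    # Weight parser confidence by how reliable the request has been so far.
    return max(
        remaining,
        key=lambda i: i.score * (1.0 - miss_rate.get(i.request, 0.0)),
    )


candidates = [
    Interpretation("call_mom", 0.60),
    Interpretation("call_tom", 0.55),
    Interpretation("call_ron", 0.20),
]

# First attempt: no history, so the highest parser score wins.
best = next_best(candidates, rejected=[], miss_rate={})

# The misrecognition engine reports an unrecognized event for `best`,
# so the analyzer asks for the next best among the remaining candidates.
fallback = next_best(candidates, rejected=[best], miss_rate={})
```

With an empty history the fallback follows parser order; supplying a high miss rate for one request lets a lower-scored candidate overtake it.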
25. A system for processing natural language utterances, comprising:
a multimodal device configured to receive a natural language utterance;
a speech recognition engine configured to recognize one or more words from the natural language utterance;
a parser configured to generate an interpretation of the natural language utterance from the one or more recognized words and generate a request based on the interpretation of the natural language utterance;
a domain agent configured to process the generated request;
an adaptive misrecognition engine configured to monitor one or more actions associated with the domain agent processing the request, determine whether the interpretation of the natural language utterance is correct or incorrect based on the one or more monitored actions, and generate an unrecognized event in response to determining that the interpretation of the natural language utterance is incorrect; and
an analyzer configured to:
track interaction patterns with the system over time for a plurality of users, including a user that provided the natural language utterance;
generate a generalized cognitive model for the plurality of users, wherein the generalized cognitive model includes a statistical abstract that corresponds to the interaction patterns tracked for the plurality of users; and
update the generalized cognitive model based on a frequency of incorrect interpretations for the request.
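One way to picture the "statistical abstract" of claim 25 is an aggregate over per-user error rates. The sketch below is an assumption throughout: the claim does not say which statistics the abstract contains, and the class name, method names, and the choice of a mean error rate are hypothetical.

```python
from collections import defaultdict
from statistics import mean


class GeneralizedCognitiveModel:
    """Hypothetical cross-user model keyed by request type."""

    def __init__(self):
        # request -> user -> [incorrect_count, total_count]
        self._stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def update(self, user, request, interpretation_correct):
        # Update the frequency of incorrect interpretations for this request.
        entry = self._stats[request][user]
        entry[1] += 1
        if not interpretation_correct:
            entry[0] += 1

    def abstract(self, request):
        # Summarize the tracked users as a small statistical abstract.
        per_user = self._stats.get(request)
        if not per_user:
            return {"users": 0, "mean_error_rate": None}
        rates = [bad / total for bad, total in per_user.values()]
        return {"users": len(rates), "mean_error_rate": mean(rates)}


general = GeneralizedCognitiveModel()
general.update("alice", "get_weather", True)
general.update("alice", "get_weather", False)
general.update("bob", "get_weather", True)
summary = general.abstract("get_weather")
```

Averaging per-user rates, rather than pooling all attempts, keeps one very active user from dominating the abstract; that is a design choice of this sketch, not something the claim requires.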
26. A method for processing natural language utterances, comprising:
receiving a natural language utterance at a multimodal device;
recognizing one or more words from the natural language utterance using a speech recognition engine coupled to the multimodal device;
generating an interpretation of the natural language utterance from the one or more recognized words using a parser coupled to the multimodal device, wherein the parser generates a request based on the interpretation of the natural language utterance;
invoking a domain agent configured to process the generated request;
monitoring one or more actions associated with the domain agent processing the request using an adaptive misrecognition engine;
determining, at the adaptive misrecognition engine, whether the interpretation of the natural language utterance is correct or incorrect based on the one or more monitored actions, wherein the adaptive misrecognition engine generates an unrecognized event in response to determining that the interpretation of the natural language utterance is incorrect;
tracking an interaction pattern over time for a user that provided the natural language utterance using an analyzer associated with the adaptive misrecognition engine;
generating, at the analyzer, a personalized cognitive model for the user based on the interaction pattern tracked for the user;
using the personalized cognitive model to predict the one or more actions associated with the domain agent processing the request; and
updating the personalized cognitive model using the analyzer based on a frequency of incorrect interpretations for the request.
43. A method for processing natural language utterances, comprising:
receiving a natural language utterance at a multimodal device;
recognizing one or more words from the natural language utterance using a speech recognition engine coupled to the multimodal device;
generating a plurality of interpretations of the natural language utterance from the one or more recognized words using a parser coupled to the multimodal device, wherein the parser generates a request based on a best interpretation selected from the plurality of interpretations of the natural language utterance;
invoking a domain agent configured to process the generated request;
monitoring one or more actions associated with the domain agent processing the request using an adaptive misrecognition engine;
determining, at the adaptive misrecognition engine, whether the interpretation of the natural language utterance is correct or incorrect based on the one or more monitored actions, wherein the adaptive misrecognition engine generates an unrecognized event in response to determining that the interpretation of the natural language utterance is incorrect;
tracking an interaction pattern over time for a user that provided the natural language utterance using an analyzer associated with the adaptive misrecognition engine;
generating, at the analyzer, a personalized cognitive model for the user based on the interaction pattern tracked for the user; and
using the personalized cognitive model to predict the one or more actions associated with the domain agent processing the request and select a next best interpretation from the plurality of interpretations in response to the adaptive misrecognition engine determining that the best interpretation selected by the parser is incorrect.
44. A method for processing natural language utterances, comprising:
receiving a natural language utterance at a multimodal device;
recognizing one or more words from the natural language utterance using a speech recognition engine coupled to the multimodal device;
generating an interpretation of the natural language utterance from the one or more recognized words using a parser coupled to the multimodal device, wherein generating the interpretation of the natural language utterance includes the parser generating a request based on the interpretation of the natural language utterance;
invoking a domain agent configured to process the generated request;
monitoring one or more actions associated with the domain agent processing the request using an adaptive misrecognition engine;
determining, at the adaptive misrecognition engine, whether the interpretation of the natural language utterance is correct or incorrect based on the one or more monitored actions, wherein the adaptive misrecognition engine generates an unrecognized event in response to determining that the interpretation of the natural language utterance is incorrect;
tracking interaction patterns over time for a plurality of users, including a user that provided the natural language utterance, using an analyzer associated with the adaptive misrecognition engine;
generating, at the analyzer, a generalized cognitive model for the plurality of users, wherein the generalized cognitive model includes a statistical abstract that corresponds to the interaction patterns tracked for the plurality of users; and
updating the generalized cognitive model using the analyzer based on a frequency of incorrect interpretations for the request.
Specification