MOBILE SYSTEMS AND METHODS OF SUPPORTING NATURAL LANGUAGE HUMAN-MACHINE INTERACTIONS
5 Assignments
0 Petitions
Abstract
A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user-specific profile data to create a natural environment for users who submit requests and/or commands in multiple domains. The invention creates, stores, and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain-specific behavior and information into agents, which are distributable or updateable over a wide area network.
673 Citations
32 Claims
1. A mobile device for processing multi-modal natural language inputs, comprising:
- a conversational voice user interface that receives a multi-modal natural language input from a user, the multi-modal natural language input including a natural language utterance and a non-speech input, the conversational voice user interface coupled to a transcription module that transcribes the non-speech input to create a non-speech-based transcription;
- a conversational speech analysis engine that identifies the user that provided the multi-modal natural language input, the conversational speech analysis engine using a speech recognition engine and a semantic knowledge-based model to create a speech-based transcription of the natural language utterance, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the identified user and the mobile device, a general cognitive model derived from one or more prior interactions between a plurality of users and the mobile device, and an environmental model derived from an environment of the identified user and the mobile device;
- a merging module that merges the speech-based transcription and the non-speech-based transcription to create a merged transcription;
- a knowledge-enhanced speech recognition engine that identifies one or more entries in a context stack matching information contained in the merged transcription and determines a most likely context for the multi-modal natural language input based on the identified entries; and
- a response generating module that identifies a domain agent associated with the most likely context for the multi-modal input, communicates a request to the identified domain agent, and generates a response to the user from content provided by the identified domain agent as a result of processing the request.

Dependent claims: 2-14.
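The merging module of claim 1 combines a speech-based transcription with a non-speech-based transcription (for example, a transcribed touch or map input) into a single merged transcription. The claim does not specify a merge policy; the sketch below assumes a minimal ordering policy in which the spoken utterance leads and the non-speech input qualifies it, and all names (`Transcription`, `merge_transcriptions`) are illustrative, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class Transcription:
    source: str        # "speech" or "non-speech" (hypothetical labels)
    text: str
    confidence: float  # assumed per-source confidence score

def merge_transcriptions(speech: Transcription, non_speech: Transcription) -> str:
    """Merge the speech-based and non-speech-based transcriptions.

    Assumed policy: the utterance comes first and the non-speech input
    (e.g. a tapped map location rendered as text) qualifies it.
    """
    parts = sorted([speech, non_speech], key=lambda t: t.source != "speech")
    return " ".join(t.text for t in parts)

merged = merge_transcriptions(
    Transcription("speech", "find a gas station near", 0.92),
    Transcription("non-speech", "<map point 37.77,-122.41>", 1.0),
)
# merged -> "find a gas station near <map point 37.77,-122.41>"
```

The merged transcription is what the knowledge-enhanced speech recognition engine later matches against the context stack.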
15. A system for processing multi-modal natural language inputs, comprising:
- a plurality of mobile devices that support multi-modal natural language interactions with a user;
- a context manager communicatively coupled to the plurality of mobile devices, wherein the context manager synchronizes a semantic knowledge-based model and a context stack among the plurality of mobile devices, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the user and one or more of the mobile devices, a general cognitive model derived from one or more prior interactions between a plurality of users and one or more of the mobile devices, and an environmental model derived from an environment of the user and one or more of the mobile devices;
- a conversational voice user interface communicatively coupled to one or more of the plurality of mobile devices, wherein the conversational voice user interface receives a multi-modal natural language input from the user that includes at least a natural language utterance;
- a conversational speech analysis engine that identifies the user that provided the multi-modal input, the conversational speech analysis engine using a speech recognition engine and the semantic knowledge-based model to create a speech-based transcription of the natural language utterance;
- a knowledge-enhanced speech recognition engine that identifies one or more entries in a context stack matching information contained in the speech-based transcription and determines a most likely context for the multi-modal natural language input based on the identified entries; and
- a response generating module that identifies a domain agent associated with the most likely context for the multi-modal input, communicates a request to the identified domain agent, and generates a response to the user from content provided by the identified domain agent as a result of processing the request.

Dependent claims: 16-18.
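Claim 15 adds a context manager that keeps the semantic knowledge-based model and context stack synchronized across multiple devices. The claim does not specify a synchronization protocol; the sketch below assumes a simple last-writer-wins broadcast, and the names (`Device`, `ContextManager`, `report_interaction`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Device:
    device_id: str
    model_version: int = 0
    context_stack: List[str] = field(default_factory=list)

class ContextManager:
    """Keeps a shared model version and context stack consistent across
    registered devices via last-writer-wins broadcast (an assumption)."""

    def __init__(self) -> None:
        self.devices: Dict[str, Device] = {}
        self.model_version = 0
        self.context_stack: List[str] = []

    def register(self, device: Device) -> None:
        self.devices[device.device_id] = device
        self._push(device)

    def report_interaction(self, device_id: str, context_stack: List[str]) -> None:
        # One device finished an interaction: fold its stack into the
        # shared state and rebroadcast it to every registered device.
        self.model_version += 1
        self.context_stack = list(context_stack)
        for device in self.devices.values():
            self._push(device)

    def _push(self, device: Device) -> None:
        device.model_version = self.model_version
        device.context_stack = list(self.context_stack)

phone, car = Device("phone"), Device("car")
manager = ContextManager()
manager.register(phone)
manager.register(car)
manager.report_interaction("phone", ["navigation"])
# car.context_stack is now ["navigation"] as well
```

A broadcast model is the simplest reading of "synchronizes ... among the plurality of mobile devices"; a real system would likely add conflict resolution and incremental updates.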
19. A method for processing multi-modal natural language inputs, comprising:
- receiving a multi-modal natural language input at a conversational voice user interface, the multi-modal input including a natural language utterance and a non-speech input provided by a user, wherein a transcription module coupled to the conversational voice user interface transcribes the non-speech input to create a non-speech-based transcription;
- identifying the user that provided the multi-modal input;
- creating a speech-based transcription of the natural language utterance using a speech recognition engine and a semantic knowledge-based model, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the identified user and the conversational voice user interface, a general cognitive model derived from one or more prior interactions between a plurality of users and the conversational voice user interface, and an environmental model derived from an environment of the identified user and the conversational voice user interface;
- merging the speech-based transcription and the non-speech-based transcription to create a merged transcription;
- identifying one or more entries in a context stack matching information contained in the merged transcription;
- determining a most likely context for the multi-modal input based on the identified entries;
- identifying a domain agent associated with the most likely context for the multi-modal input;
- communicating a request to the identified domain agent; and
- generating a response to the user from content provided by the identified domain agent as a result of processing the request.

Dependent claims: 20-32.
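The final steps of the claimed method, matching context-stack entries against the merged transcription, determining a most likely context, and dispatching a request to the associated domain agent, can be sketched as follows. The scoring heuristic (keyword overlap, with entries nearer the top of the stack winning ties) and all names (`most_likely_context`, `AGENTS`, `respond`) are assumptions; the claims only require matching entries and selecting a most likely context.

```python
from typing import Callable, Dict, List

def most_likely_context(context_stack: List[dict], merged: str) -> dict:
    """Score each context-stack entry by keyword overlap with the merged
    transcription; more recent entries (top of stack) win ties."""
    tokens = set(merged.lower().split())
    best, best_score = context_stack[-1], -1
    for entry in reversed(context_stack):          # top of stack first
        score = len(tokens & set(entry["keywords"]))
        if score > best_score:
            best, best_score = entry, score
    return best

# Hypothetical domain agents keyed by context domain.
AGENTS: Dict[str, Callable[[str], str]] = {
    "navigation": lambda req: "route computed for: " + req,
    "music":      lambda req: "now playing: " + req,
}

def respond(context_stack: List[dict], merged: str) -> str:
    context = most_likely_context(context_stack, merged)
    agent = AGENTS[context["domain"]]   # identify the associated domain agent
    return agent(merged)                # generate response from agent content

stack = [
    {"domain": "music",      "keywords": {"play", "song"}},
    {"domain": "navigation", "keywords": {"route", "gas", "station"}},
]
answer = respond(stack, "find a gas station near here")
# answer -> "route computed for: find a gas station near here"
```

In this toy run the navigation entry overlaps on "gas" and "station", so the navigation agent handles the request; a production system would score with the semantic knowledge-based model rather than raw keyword overlap.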
Specification