Agent architecture for determining meanings of natural language utterances
First Claim
1. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to:
receive a first natural language utterance;
determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value;
obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination;
look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling;
update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words;
receive a second natural language utterance that comprises a question;
generate a digitized speech signal from the second natural language utterance;
recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables;
tag the one or more words in the second natural language utterance with a user identity determined from voice characteristics associated with the digitized speech signal and one or more user profiles;
determine a context of the question in the second natural language utterance;
select one of the plurality of domain agents based on the context of the question;
generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question;
invoke the selected domain agent to cause the selected domain agent to process the request; and
receive a response to the request from the selected domain agent.
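The recognition-recovery steps recited above (low confidence, phonetic alphabet spelling, table look-up, table update) can be illustrated with a minimal sketch. Everything named here is hypothetical: the claim does not specify a threshold value, a particular phonetic alphabet, or a table layout. The sketch assumes a NATO-style spelling alphabet, in which each code word starts with the letter it stands for, and a simple in-memory table.

```python
CONFIDENCE_THRESHOLD = 0.6  # hypothetical stand-in for the "predetermined value"


def decode_spelling(spelling):
    """Turn 'sierra echo alpha' into 'sea' by taking each code word's initial."""
    return "".join(code_word[0] for code_word in spelling.lower().split())


class PhraseTable:
    """Toy dictionary/phrase table: known words plus learned pronunciations."""

    def __init__(self, words):
        self.words = set(words)
        self.pronunciations = {}  # pronunciation string -> word

    def recognize(self, pronunciation):
        return self.pronunciations.get(pronunciation)

    def learn(self, pronunciation, spelling):
        # Look the word up by its spelled-out form, then record how it sounded.
        word = decode_spelling(spelling)
        if word in self.words:
            self.pronunciations[pronunciation] = word
            return word
        return None


def process(table, pronunciation, confidence, ask_for_spelling):
    """Return the recognized word, asking for a spelling when confidence is low."""
    word = table.recognize(pronunciation)
    if word is not None and confidence >= CONFIDENCE_THRESHOLD:
        return word
    return table.learn(pronunciation, ask_for_spelling())
```

In this sketch a first utterance with low confidence triggers `ask_for_spelling`, and a later utterance with the same pronunciation is recognized directly from the updated table.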
5 Assignments
0 Petitions
Abstract
Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
698 Citations
50 Claims
-
1. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a question; generate a digitized speech signal from the second natural language utterance; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; tag the one or more words in the second natural language utterance with a user identity determined from voice characteristics associated with the digitized speech signal and one or more user profiles; determine a context of the question in the second natural language utterance; select one of the plurality of domain agents based on the context of the question; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; invoke the selected domain agent to cause the selected domain 
agent to process the request; and receive a response to the request from the selected domain agent.
Dependent claims: 2 to 36.
-
37. A system for processing natural language utterances, comprising:
one or more physical processors programmed to execute one or more computer program instructions which, when executed, cause the one or more physical processors to; receive a natural language utterance; determine that one or more words of the natural language utterance were unrecognized or incorrectly recognized in response to a recognition associated with the natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; identify, in one or more dictionary and phrase tables, one or more words that correspond to the one or more unrecognized or incorrectly recognized words based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables with respect to the one or more corresponding words based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a subsequent natural language utterance; generate a digitized speech signal from the subsequent natural language utterance; recognize one or more words in the subsequent natural language utterance based on a pronunciation associated with the one or more words in the subsequent natural language utterance using the one or more updated dictionary and phrase tables; and tag the one or more words in the subsequent natural language utterance with a user identity determined from voice characteristics associated with the digitized speech signal and one or more user profiles.
-
38. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a command; generate a digitized speech signal from the second natural language utterance; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; tag the one or more words in the second natural language utterance with a user identity determined from voice characteristics associated with the digitized speech signal and one or more user profiles; determine a context of the command in the second natural language utterance; select one of the plurality of domain agents based on the context of the command; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the command; and invoke the selected domain agent to cause the selected domain 
agent to process the request.
-
39. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a question; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; determine a context of the question in the second natural language utterance; select one of the plurality of domain agents based on the context of the question; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; invoke the selected domain agent to cause the selected domain agent to process the request; generate one or more response utterances to present a result generated from the selected domain agent, wherein the selected domain agent is configured to select a presentation personality to format the result and use a template associated with the 
selected presentation personality to generate a response string that includes the generated result, wherein the template is associated with a sarcastic personality, a humorous personality, a sympathetic personality, or an irritable personality; and output the one or more response utterances via a speaker.
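The personality-template limitation of claim 39 can be pictured as a table of format strings keyed by personality. The sketch below is only an illustration: the template wording, the `TEMPLATES` name, and the `format_response` helper are all invented here, and the claim does not say how a personality is chosen.

```python
# Hypothetical templates, one per personality named in the claim.
TEMPLATES = {
    "sarcastic": "Oh, what a surprise: {result}.",
    "humorous": "Drum roll, please... {result}!",
    "sympathetic": "I know this matters to you. {result}.",
    "irritable": "Fine. {result}. Anything else?",
}


def format_response(result, personality):
    """Wrap an agent's result in the template for the selected personality."""
    template = TEMPLATES[personality]  # raises KeyError for an unknown personality
    return template.format(result=result)
```

The returned string would then be handed to a text-to-speech stage for output via a speaker, which is outside the scope of this sketch.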
-
40. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a question; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; determine a context of the question in the second natural language utterance; select one of the plurality of domain agents based on the context of the question; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; invoke the selected domain agent to cause the selected domain agent to process the request, wherein to process the request, the plurality of domain agents are each configured to send multiple duplicate queries to multiple local or remote information sources in response to determining that the request includes the question, and asynchronously 
evaluate responses associated with the multiple local or remote information sources; and receive a response to the request from the selected domain agent.
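Claim 40 recites sending duplicate queries to several information sources and evaluating the responses asynchronously. A rough sketch of that pattern using Python's `asyncio` follows. The sources are simulated with `asyncio.sleep`, and the evaluation rule (accept the first non-empty answer and cancel the rest) is an assumption made for illustration; the claim does not state how responses are ranked.

```python
import asyncio


async def query_source(name, delay, answer, question):
    """Simulated information source: waits `delay` seconds, then answers."""
    await asyncio.sleep(delay)
    return name, answer


async def ask_all(question, sources):
    """Send the same question to every source; return the first usable answer."""
    tasks = [
        asyncio.create_task(query_source(name, delay, answer, question))
        for name, delay, answer in sources
    ]
    # Evaluate responses in completion order rather than submission order.
    for finished in asyncio.as_completed(tasks):
        name, answer = await finished
        if answer is not None:
            for task in tasks:
                task.cancel()  # no effect on tasks that already completed
            return name, answer
    return None
```

Calling `asyncio.run(ask_all(question, sources))` with a list of `(name, delay, answer)` tuples returns the name and answer of the quickest source that produced a non-empty result.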
-
41. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a question; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; determine a context of the question in the second natural language utterance; select one of the plurality of domain agents based on the context of the question; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; invoke the selected domain agent to cause the selected domain agent to process the request; and receive a response to the request from the selected domain agent, wherein the plurality of domain agents are part of an agent architecture that the computing device has access to, wherein the agent architecture further includes a system agent 
configured to (i) provide default functionality and services available to the plurality of domain agents, (ii) manage one or more criteria handlers that the computing device is configured to use to determine the context, wherein the one or more criteria handlers associated with the system agent are available to the system agent and the plurality of domain agents, and wherein the plurality of domain agents use different grammars.
-
42. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a question; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; determine a context of the question in the second natural language utterance; select one of the plurality of domain agents based on the context of the question; receive a grammar from the selected domain agent; evaluate the determined context and the question using the grammar, wherein the request includes all tokens that are required to format the question based on the grammar; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question and one or more tokens that are optional to format the question based on the grammar; invoke the selected domain agent to 
cause the selected domain agent to process the request; and receive a response to the request from the selected domain agent.
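The required and optional tokens of claim 42 suggest a grammar that declares which slots a request must carry and which it may carry. A small sketch under that reading is shown below; the `Grammar` class, the slot names, and the dictionary-shaped request are hypothetical and are not drawn from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical agent grammar: names of required and optional tokens."""
    required: tuple
    optional: tuple = ()


def build_request(question, tokens, grammar):
    """Build a request holding every required token and any optional ones present."""
    missing = [name for name in grammar.required if name not in tokens]
    if missing:
        raise ValueError(f"missing required tokens: {missing}")
    request = {"question": question}
    for name in grammar.required + grammar.optional:
        if name in tokens:
            request[name] = tokens[name]
    return request
```

Tokens that the grammar does not mention are dropped, so the selected agent only ever sees slots it declared.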
-
43. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a question; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; determine a context of the question in the second natural language utterance; select one of the plurality of domain agents based on the context of the question; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; invoke the selected domain agent to cause the selected domain agent to process the request; and receive a response to the request from the selected domain agent, wherein the plurality of domain agents are part of an agent architecture that the computing device has access to, wherein the agent architecture further includes (i) a system agent 
configured to provide default functionality and services available to the plurality of domain agents, and (ii) an update manager configured to (a) manage updates relating to one or more of the system agent, the plurality of domain agents, the agent library, one or more databases available to the agent architecture, or entries in the one or more dictionary and phrase tables, and (b) uninstall one or more of the plurality of domain agents that are unused pursuant to a license with a third party to manage the updates relating to the plurality of domain agents, and wherein the plurality of domain agents use different grammars.
-
44. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a question; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; determine a context of the question in the second natural language utterance; select one of the plurality of domain agents based on the context of the question; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; invoke the selected domain agent to cause the selected domain agent to process the request; and receive a response to the request from the selected domain agent, wherein the plurality of domain agents are part of an agent architecture that the computing device has access to, wherein the agent architecture further includes (i) a system agent 
configured to provide default functionality and services available to the plurality of domain agents, and (ii) an update manager configured to manage updates relating to one or more of the system agent, the plurality of domain agents, the agent library, one or more databases available to the agent architecture, or entries in the one or more dictionary and phrase tables, wherein the updates include one or more of a new domain agent, additional domain knowledge associated with one or more of the plurality of domain agents, new keywords associated with one or more of the plurality of domain agents, preferred information sources associated with one or more of the plurality of domain agents, updated domain information associated with one or more of the plurality of domain agents, or updated content associated with one or more of the plurality of domain agents, and wherein the plurality of domain agents use different grammars, wherein the system agent is further configured to (i) use a network interface to locate the new domain agent in response to determining that none of the plurality of domain agents currently loaded in the agent architecture are suitable to process the request, and (ii) cause the update manager to load the new domain agent located via the network interface pursuant to the terms and conditions of the license and invoke the new domain agent to process the request.
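The fallback behaviour in claim 44 (locate and load a new domain agent when no loaded agent is suitable) can be sketched as a two-level look-up. This is a simplified illustration: the `AgentArchitecture` class and its `remote_catalog` dictionary are invented stand-ins for the network interface and update manager, and licence checks are not modelled at all.

```python
class AgentArchitecture:
    """Toy agent registry with a fallback to a remote catalog of agents."""

    def __init__(self, loaded, remote_catalog):
        self.loaded = dict(loaded)            # domain -> callable agent
        self.remote_catalog = remote_catalog  # stand-in for a network look-up

    def handle(self, domain, request):
        agent = self.loaded.get(domain)
        if agent is None:
            # No loaded agent is suitable: try to locate one remotely.
            agent = self.remote_catalog.get(domain)
            if agent is None:
                return None
            self.loaded[domain] = agent       # "load" the new agent
        return agent(request)
```

After the first request for a missing domain, the agent stays loaded, so later requests in that domain skip the remote look-up.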
-
45. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to; receive a first natural language utterance; determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a second natural language utterance that comprises a question; recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables; determine a context of the question in the second natural language utterance by assigning a score to each of a plurality of candidate contexts based on age of the candidate context; select one of the plurality of domain agents based on the context of the question; generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; invoke the selected domain agent to cause the selected domain agent to process the request, wherein to process the request, the plurality of domain agents are each configured to send multiple duplicate queries to multiple local or remote 
information sources in response to determining that the request includes the question, and asynchronously evaluate responses associated with the multiple local or remote information sources; and receive a response to the request from the selected domain agent.
-
46. A system for processing natural language utterances, comprising:
one or more physical processors programmed to execute one or more computer program instructions which, when executed, cause the one or more physical processors to; receive a natural language utterance; determine that one or more words of the natural language utterance were unrecognized or incorrectly recognized in response to a recognition associated with the natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; identify, in one or more dictionary and phrase tables, one or more words that correspond to the one or more unrecognized or incorrectly recognized words based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables with respect to the one or more corresponding words based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a subsequent natural language utterance that comprises a question; recognize one or more words in the subsequent natural language utterance based on a pronunciation associated with the one or more words in the subsequent natural language utterance using the one or more updated dictionary and phrase tables; determine a context of the question in the subsequent natural language utterance; select one of a plurality of domain agents based on the context; generate a request associated with the subsequent natural language utterance based on the one or more words in the subsequent natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; invoke the selected domain agent to cause the selected domain agent to process the request; generate one or more response utterances to present a result generated from the selected domain agent, wherein the selected domain agent is configured to select a presentation personality to format the 
result and use a template associated with the selected presentation personality to generate a response string that includes the generated result, wherein the template is associated with a sarcastic personality, a humorous personality, a sympathetic personality, or an irritable personality; and output the one or more response utterances via a speaker.
-
47. A system for processing natural language utterances, comprising:
one or more physical processors programmed to execute one or more computer program instructions which, when executed, cause the one or more physical processors to; receive a natural language utterance; determine that one or more words of the natural language utterance were unrecognized or incorrectly recognized in response to a recognition associated with the natural language utterance having a confidence level below a predetermined value; obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination; identify, in one or more dictionary and phrase tables, one or more words that correspond to the one or more unrecognized or incorrectly recognized words based on the phonetic alphabet spelling; update the one or more dictionary and phrase tables with respect to the one or more corresponding words based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words; receive a subsequent natural language utterance that comprises a question; recognize one or more words in the subsequent natural language utterance based on a pronunciation associated with the one or more words in the subsequent natural language utterance using the one or more updated dictionary and phrase tables; determine a context of the question in the subsequent natural language utterance; select one of a plurality of domain agents based on the context; generate a request associated with the subsequent natural language utterance based on the one or more words in the subsequent natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; and invoke the selected domain agent to cause the selected domain agent to process the request, wherein to process the request, the plurality of domain agents are each configured to send multiple duplicate queries to multiple local or remote information sources in response to determining that the 
request includes the question, and asynchronously evaluate responses associated with the multiple local or remote information sources.
-
48. A system for processing natural language utterances, comprising:
one or more physical processors programmed to execute one or more computer program instructions which, when executed, cause the one or more physical processors to;
receive a natural language utterance;
determine that one or more words of the natural language utterance were unrecognized or incorrectly recognized in response to a recognition associated with the natural language utterance having a confidence level below a predetermined value;
obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination;
identify, in one or more dictionary and phrase tables, one or more words that correspond to the one or more unrecognized or incorrectly recognized words based on the phonetic alphabet spelling;
update the one or more dictionary and phrase tables with respect to the one or more corresponding words based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words;
receive a subsequent natural language utterance that comprises a question;
recognize one or more words in the subsequent natural language utterance based on a pronunciation associated with the one or more words in the subsequent natural language utterance using the one or more updated dictionary and phrase tables;
determine a context of the question in the subsequent natural language utterance;
select one of a plurality of domain agents based on the context;
receive a grammar from the selected domain agent;
evaluate the determined context and the question using the grammar, wherein the request includes all tokens that are required to format the question based on the grammar;
generate a request associated with the subsequent natural language utterance based on the one or more words in the subsequent natural language utterance and the grammar, wherein the request includes the question and one or more tokens that are optional to format the question based on the grammar; and
invoke the selected domain agent to cause the selected domain agent to process the request.
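By way of non-limiting illustration, the grammar-driven request generation recited in claim 48 might be sketched as follows. This is only a sketch under stated assumptions: the `Grammar` class, the `generate_request` function, the dictionary layout of the request, and the weather example are all hypothetical and are not taken from the claims, which do not prescribe any particular data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical grammar supplied by a selected domain agent."""
    required: tuple        # token names needed to format the question
    optional: tuple = ()   # token names included only when available


def generate_request(question, tokens, grammar):
    """Build a request holding the question, every required token,
    and whichever optional tokens were found in the utterance."""
    missing = [name for name in grammar.required if name not in tokens]
    if missing:
        # In this sketch a request cannot be formatted without all required tokens.
        raise ValueError(f"missing required tokens: {missing}")
    return {
        "question": question,
        "required": {name: tokens[name] for name in grammar.required},
        "optional": {name: tokens[name]
                     for name in grammar.optional if name in tokens},
    }


# Hypothetical weather-domain grammar and recognized tokens.
weather_grammar = Grammar(required=("location",), optional=("date", "units"))
request = generate_request(
    "what is the weather in Seattle tomorrow",
    {"location": "Seattle", "date": "tomorrow"},
    weather_grammar,
)
```

In this sketch the optional token `units` is simply omitted because the utterance did not supply it, while a missing `location` would prevent the request from being generated.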
-
49. A system for processing natural language utterances, comprising:
one or more physical processors programmed to execute one or more computer program instructions which, when executed, cause the one or more physical processors to;
receive a natural language utterance;
determine one or more words of the natural language utterance were unrecognized or incorrectly recognized in response to a recognition associated with the natural language utterance having a confidence level below a predetermined value;
obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination;
identify, in one or more dictionary and phrase tables, one or more words that correspond to the one or more unrecognized or incorrectly recognized words based on the phonetic alphabet spelling;
update the one or more dictionary and phrase tables with respect to the one or more corresponding words based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words;
receive a subsequent natural language utterance that comprises a question;
recognize one or more words in the subsequent natural language utterance based on a pronunciation associated with the one or more words in the subsequent natural language utterance using the one or more updated dictionary and phrase tables;
determine a context of the question in the subsequent natural language utterance by assigning a score to each of a plurality of candidate contexts based on age of the candidate context;
select one of a plurality of domain agents based on the context;
generate a request associated with the subsequent natural language utterance based on the one or more words in the subsequent natural language utterance and a grammar used by the selected domain agent, wherein the request includes the question; and
invoke the selected domain agent to cause the selected domain agent to process the request, wherein to process the request, the plurality of domain agents are each configured to send multiple duplicate queries to multiple local or remote information sources in response to determining that the request includes the question, and asynchronously evaluate responses associated with the multiple local or remote information sources.
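By way of non-limiting illustration, the age-based context scoring and the asynchronous evaluation of duplicate queries recited in claim 49 might be sketched as follows. Everything here is an assumption made for illustration: the claim does not state how age maps to a score (an exponential decay that favors newer contexts is assumed), and the function names, the simulated information sources, and the "first usable response wins" rule are hypothetical.

```python
import asyncio
import math


def score_by_age(candidates, now, half_life=60.0):
    """Score each candidate context from its age in seconds.
    Assumption: newer contexts score higher (exponential decay)."""
    return {name: math.exp(-math.log(2) * (now - last_used) / half_life)
            for name, last_used in candidates.items()}


def pick_context(candidates, now):
    """Return the candidate context with the highest age-based score."""
    scores = score_by_age(candidates, now)
    return max(scores, key=scores.get)


async def query_source(name, delay, answer, question):
    """Stand-in for a local or remote information source (simulated)."""
    await asyncio.sleep(delay)
    return name, answer


async def ask_all(question, sources):
    """Send the same query to every source and evaluate each response
    as soon as it arrives, rather than waiting for all of them."""
    tasks = [asyncio.create_task(query_source(name, delay, answer, question))
             for name, delay, answer in sources]
    for finished in asyncio.as_completed(tasks):
        name, answer = await finished
        if answer is not None:      # assumed rule: first usable response wins
            for task in tasks:
                task.cancel()       # remaining duplicates are no longer needed
            return name, answer
    return None, None


# Hypothetical candidates: context name -> time it was last active (seconds).
context = pick_context({"weather": 100.0, "sports": 40.0}, now=120.0)

# Hypothetical sources: (name, simulated latency, simulated answer).
sources = [("local_cache", 0.01, None),
           ("remote_a", 0.05, "72F"),
           ("remote_b", 0.20, "71F")]
winner = asyncio.run(ask_all("what is the temperature", sources))
```

Here the fastest source returns nothing usable, so the sketch keeps evaluating responses in arrival order and settles on the next source that answers.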
-
50. A system for processing natural language utterances, comprising:
a computing device having access to a plurality of domain agents associated with a plurality of different domains, and programmed to execute one or more computer program instructions which, when executed, cause the computing device to;
receive a first natural language utterance;
determine that the first natural language utterance contains one or more words that were unrecognized or incorrectly recognized in response to a recognition associated with the first natural language utterance having a confidence level below a predetermined value;
obtain a phonetic alphabet spelling associated with the one or more unrecognized or incorrectly recognized words in response to the determination;
look up the one or more unrecognized or incorrectly recognized words in one or more dictionary and phrase tables based on the phonetic alphabet spelling;
update the one or more dictionary and phrase tables based on a pronunciation associated with the one or more unrecognized or incorrectly recognized words;
receive a second natural language utterance that comprises a command;
recognize one or more words in the second natural language utterance based on a pronunciation associated with the one or more words using the one or more dictionary and phrase tables;
determine a context of the command in the second natural language utterance;
select one of the plurality of domain agents based on the context of the command;
receive a grammar from the selected domain agent;
evaluate the determined context and the command using the grammar, wherein the request includes all tokens that are required to format the command based on the grammar;
generate a request associated with the second natural language utterance based on the one or more words in the second natural language utterance and a grammar used by the selected domain agent, wherein the request includes the command and one or more tokens that are optional to format the command based on the grammar; and
invoke the selected domain agent to cause the selected domain agent to process the request.
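By way of non-limiting illustration, the low-confidence handling recited in claim 50 (confidence check, phonetic alphabet spelling, dictionary look-up, and dictionary update) might be sketched as follows. The sketch assumes that the phonetic alphabet spelling is a spoken spelling alphabet of the "alpha, bravo, charlie" kind; the threshold value, the function names, the pronunciation strings, and the dictionary layout are hypothetical.

```python
CONFIDENCE_THRESHOLD = 0.6  # assumed "predetermined value"

# Spelling alphabet: each code word maps to its first letter.
SPELLING = {word: word[0] for word in (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliett "
    "kilo lima mike november oscar papa quebec romeo sierra tango "
    "uniform victor whiskey xray yankee zulu").split()}


def needs_clarification(confidence):
    """True when the recognition confidence falls below the threshold."""
    return confidence < CONFIDENCE_THRESHOLD


def decode_spelling(spoken):
    """Turn a spoken spelling such as 'tango uniform ...' into a word."""
    return "".join(SPELLING[code] for code in spoken.lower().split())


def update_dictionary(dictionary, word, pronunciation):
    """Look up the word (adding it if absent) and record the
    pronunciation that was actually heard."""
    dictionary.setdefault(word, set()).add(pronunciation)


def recognize(dictionary, pronunciation):
    """Return the word whose known pronunciations include the one heard."""
    for word, pronunciations in dictionary.items():
        if pronunciation in pronunciations:
            return word
    return None


# Hypothetical dictionary table: word -> set of known pronunciations.
dictionary = {"tucson": {"T UW S AA N"}}

# First utterance: an unfamiliar pronunciation is heard with low confidence.
heard, confidence = "T UW S AH N", 0.35
before = recognize(dictionary, heard)
if needs_clarification(confidence):
    word = decode_spelling("tango uniform charlie sierra oscar november")
    update_dictionary(dictionary, word, heard)

# Second utterance: the same pronunciation is now recognized.
after = recognize(dictionary, heard)
```

In this sketch the pronunciation that failed on the first utterance resolves to the spelled word on the second, because the table was updated in between.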
Specification