Systems and methods for processing natural language speech utterances with context-specific domain agents

US 8,015,006 B2
Filed: 05/30/2008
Issued: 09/06/2011
Est. Priority Date: 06/03/2002
Status: Active Grant

5 Assignments

1 Petition

Abstract

Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.

856 Citations

11 Claims

1. A method for processing natural language speech utterances with context-specific domain agents, comprising:
- receiving, at a speech unit coupled to a processing device, a natural language speech utterance that contains a request;
  
  recognizing, at a speech recognition engine coupled to the processing device, one or more words or phrases contained in the utterance using information in one or more dictionary and phrase tables, wherein recognizing the one or more words or phrases contained in the utterance includes:
  
  dynamically updating the information in the one or more dictionary and phrase tables based on a dynamic set of prior probabilities or fuzzy possibilities;
  
  determining an identity associated with a user that spoke the utterance based on voice characteristics associated with the utterance; and
  
  associating the one or more recognized words or phrases and a pronunciation associated with the one or more recognized words or phrases with the determined identity and the request contained in the utterance in response to the one or more recognized words or phrases satisfying a predetermined confidence level;
  
  parsing, at a parser coupled to the processing device, the one or more recognized words or phrases to determine a meaning associated with the utterance and a context associated with the request contained in the utterance, wherein the one or more recognized words or phrases are further associated with the determined context in response to the one or more recognized words or phrases satisfying the predetermined confidence level;
  
  formulating, at the parser, the request contained in the utterance in accordance with a grammar used by a domain agent associated with the determined context;
  
  processing the formulated request with the domain agent associated with the determined context to generate a response to the utterance; and
  
  presenting the generated response to the utterance via the speech unit.
- - 2. The method according to claim 1, wherein parsing the one or more recognized words or phrases to determine the context associated with the request includes:
    - matching the one or more recognized words or phrases to a set of keywords associated with the dynamic set of prior probabilities or fuzzy possibilities;
      
      scoring multiple possible contexts for the one or more recognized words or phrases matched to the set of keywords using a profile associated with the determined user identity, data content associated with the domain agent, or a context stack that includes one or more recent contexts; and
      
      selecting one of the scored multiple possible contexts having a highest score that satisfies the predetermined confidence level to be the determined context.
  - 3. The method according to claim 2, wherein parsing the one or more recognized words or phrases to determine the context associated with the request further includes:
    - requesting the user to verify the one or more recognized words or phrases in response to determining that none of the scored multiple possible contexts satisfy the predetermined confidence level; and
      
      determining the context based on information that the user provides in one or more subsequent natural language speech utterances.
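
The context determination recited in claims 1 to 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the keyword table `KEYWORD_PRIORS`, the `Session` class, the additive scoring rule, and the numeric threshold are all assumptions made for the example. Returning `None` stands in for the step where the user is asked to verify the recognized words.

```python
from dataclasses import dataclass, field

# Hypothetical keyword table: keyword -> {candidate context: prior weight}.
KEYWORD_PRIORS = {
    "temperature": {"weather": 0.9, "cooking": 0.4},
    "forecast": {"weather": 0.8, "stocks": 0.3},
    "oven": {"cooking": 0.9},
}


@dataclass
class Session:
    # Hypothetical per-user preference for each context (the "profile").
    profile_bias: dict = field(default_factory=dict)
    # Recently used contexts, most recent last (the "context stack").
    context_stack: list = field(default_factory=list)


def score_contexts(words, session, recency_bonus=0.2):
    """Score every candidate context for the recognized words."""
    scores = {}
    for word in words:
        for context, prior in KEYWORD_PRIORS.get(word, {}).items():
            scores[context] = scores.get(context, 0.0) + prior
    for context in scores:
        scores[context] += session.profile_bias.get(context, 0.0)
        if context in session.context_stack[-2:]:
            scores[context] += recency_bonus
    return scores


def determine_context(words, session, confidence=1.0):
    """Return the highest-scoring context, or None if the user must verify."""
    scores = score_contexts(words, session)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    if scores[best] < confidence:
        return None
    session.context_stack.append(best)
    return best
```

In this sketch a single weak keyword such as "oven" does not reach the threshold on its own, so the caller would fall back to asking the user, as in claim 3.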

4. A method for processing natural language speech utterances with context-specific domain agents, comprising:
- receiving, at a speech unit coupled to a processing device, a natural language speech utterance that contains a request;
  
  recognizing, at a speech recognition engine coupled to the processing device, one or more words or phrases contained in the utterance using information in one or more dictionary and phrase tables, wherein recognizing the one or more words or phrases contained in the utterance includes:
  
  requesting a user that spoke the utterance to spell the one or more words or phrases in response to the one or more recognized words or phrases failing to satisfy a predetermined confidence level; and
  
  updating the information in the one or more dictionary and phrase tables based on a phonetic alphabet spelling associated with the one or more words or phrases, wherein the user provides the phonetic alphabet spelling in one or more subsequent natural language speech utterances;
  
  parsing, at a parser coupled to the processing device, the phonetic alphabet spelling associated with the one or more words or phrases to determine a meaning associated with the utterance and a context associated with the request contained in the utterance;
  
  formulating, at the parser, the request contained in the utterance in accordance with a grammar used by a domain agent associated with the determined context;
  
  processing the formulated request with the domain agent associated with the determined context to generate a response to the utterance; and
  
  presenting the generated response to the utterance via the speech unit.
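
The spelling fallback in claim 4 can be illustrated with a short sketch. The claim does not name a particular phonetic alphabet; the NATO alphabet is used here as an assumption, and `decode_spelling`, `recognize_or_spell`, the `ask_user_to_spell` callback, and the 0.7 threshold are hypothetical names and values chosen for the example.

```python
# NATO phonetic alphabet, assumed here as one plausible "phonetic alphabet".
NATO = [
    "alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliett", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform",
    "victor", "whiskey", "xray", "yankee", "zulu",
]
CODE_WORDS = {word: word[0] for word in NATO}
CODE_WORDS.update({"alpha": "a", "juliet": "j"})  # common variant spellings


def decode_spelling(spoken_words):
    """Turn a sequence of phonetic code words into the spelled word."""
    letters = []
    for spoken in spoken_words:
        key = spoken.lower()
        if key not in CODE_WORDS:
            raise ValueError(f"not a phonetic code word: {spoken!r}")
        letters.append(CODE_WORDS[key])
    return "".join(letters)


def recognize_or_spell(candidate, score, dictionary, ask_user_to_spell,
                       threshold=0.7):
    """Accept a confident recognition, otherwise ask for a spelling.

    `ask_user_to_spell` is a hypothetical callback that returns the code
    words recognized in the user's follow-up utterance.
    """
    if score >= threshold:
        return candidate
    spelled = decode_spelling(ask_user_to_spell(candidate))
    dictionary.add(spelled)  # update the dictionary with the new word
    return spelled
```

A call such as `recognize_or_spell("tim", 0.4, words, ask)` would trigger the callback, decode the reply, and add the decoded word to `words` before returning it.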

5. A method for processing natural language speech utterances with context-specific domain agents, comprising:
- receiving, at a speech unit coupled to a processing device, a natural language speech utterance that contains a request;
  
  recognizing, at a speech recognition engine coupled to the processing device, one or more words or phrases contained in the utterance using information in one or more dictionary and phrase tables;
  
  parsing, at a parser coupled to the processing device, information relating to the utterance to determine a meaning associated with the utterance and a context associated with the request contained in the utterance, wherein the parsed information includes the one or more recognized words or phrases;
  
  formulating, at the parser, the request contained in the utterance in accordance with a grammar used by a domain agent associated with the determined context, wherein formulating the request in accordance with the grammar used by the domain agent includes:
  
  determining one or more required values and one or more optional values associated with formulating the request in the grammar used by the domain agent;
  
  extracting one or more criteria and one or more parameters from one or more keywords contained in the one or more recognized words or phrases, wherein the parser extracts the one or more criteria and the one or more parameters using procedures sensitive to the determined context;
  
  inferring one or more further criteria and one or more further parameters associated with the request using a dynamic set of prior probabilities or fuzzy possibilities; and
  
  transforming the one or more extracted criteria, the one or more extracted parameters, the one or more inferred criteria, and the one or more inferred parameters into one or more tokens having a format compatible with the grammar used by the domain agent, wherein the one or more tokens include all the required values and one or more of the optional values associated with formulating the request in the grammar used by the domain agent;
  
  processing the formulated request with the domain agent associated with the determined context to generate a response to the utterance; and
  
  presenting the generated response to the utterance via the speech unit.
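
The request formulation in claim 5 can be sketched as filling a set of required and optional slots, first from values extracted from the utterance and then from values inferred from priors. The slot names, the `DEFAULT_PRIORS` table, and the `formulate_request` function below are illustrative assumptions for a hypothetical weather agent, not the grammar of any actual domain agent.

```python
# Hypothetical grammar for a weather agent: slot names it understands.
REQUIRED = ("city",)
OPTIONAL = ("date", "units")

# Hypothetical priors used to infer values the user did not state.
DEFAULT_PRIORS = {
    "date": {"today": 0.8, "tomorrow": 0.2},
    "units": {"celsius": 0.6, "fahrenheit": 0.4},
}


def formulate_request(extracted, profile_priors=None):
    """Build agent tokens from extracted values plus inferred ones."""
    priors = {**DEFAULT_PRIORS, **(profile_priors or {})}
    tokens = dict(extracted)
    for name in REQUIRED + OPTIONAL:
        if name not in tokens and name in priors:
            candidates = priors[name]
            # Infer the most probable value for the missing slot.
            tokens[name] = max(candidates, key=candidates.get)
    missing = [name for name in REQUIRED if name not in tokens]
    if missing:
        raise ValueError(f"missing required values: {missing}")
    # Emit tokens in the order the (assumed) grammar expects.
    return {name: tokens[name] for name in REQUIRED + OPTIONAL if name in tokens}
```

Passing per-user priors lets the same utterance produce different tokens for different users, which is one way to read the role of the "dynamic set of prior probabilities" in the claim.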

6. A method for processing natural language speech utterances with context-specific domain agents, comprising:
- receiving, at a speech unit coupled to a processing device, a natural language speech utterance that contains a request, wherein the request includes at least one command and at least one question;
  
  parsing, at a parser coupled to the processing device, information relating to the utterance to determine a meaning associated with the utterance and a context associated with the request contained in the utterance;
  
  formulating, at the parser, the request contained in the utterance in accordance with a grammar used by a domain agent associated with the determined context;
  
  processing the formulated request with the domain agent associated with the determined context to generate a response to the utterance, wherein processing the formulated request with the domain agent includes:
  
  directing the at least one command to one or more local or remote processing devices to execute the at least one command; and
  
  querying one or more local or network information sources to resolve information requested in the at least one question, wherein querying the one or more local or network information sources further includes:
  
  submitting multiple asynchronous queries to the one or more local or network information sources, wherein the multiple queries include one or more duplicate queries submitted to distinct information sources;
  
  asynchronously receiving multiple results to the multiple asynchronous queries from the one or more local or network information sources; and
  
  scoring a relevance associated with the multiple asynchronously received results using a dynamic set of prior probabilities or fuzzy possibilities to determine one or more best responses to the at least one question, wherein scoring the relevance associated with the multiple asynchronously received results to determine the one or more best responses includes:
  
  determining one or more tokens required to formulate a response to the at least one question;
  
  applying one or more scraping criteria to the multiple asynchronously received results to extract one or more values for the asynchronously received results;
  
  evaluating the one or more extracted values using the dynamic set of prior probabilities or fuzzy possibilities to resolve ambiguous, incomplete, or conflicting information associated with the one or more extracted values; and
  
  selecting one or more best values for the one or more required tokens from the one or more evaluated values, wherein the one or more best responses include the one or more best values for the one or more required tokens; and
  
  presenting the generated response to the utterance via the speech unit, wherein the response presented via the speech unit includes results associated with the one or more local or remote processing devices executing the at least one command and the one or more best responses to the at least one question.
- - 7. The method according to claim 6, wherein scoring the relevance associated with the multiple asynchronously received results to determine the one or more best responses further includes:
    - determining that the dynamic set of prior probabilities or fuzzy possibilities failed to resolve the ambiguous, incomplete, or conflicting information associated with the one or more extracted values; and
      
      resolving the ambiguous, incomplete, or conflicting information associated with the one or more extracted values using one or more subsequently received results associated with one or more of the multiple asynchronous queries that remain pending.
  - 8. The method according to claim 6, wherein scoring the relevance associated with the multiple asynchronously received results to determine the one or more best responses further includes:
    - determining that the dynamic set of prior probabilities or fuzzy possibilities failed to resolve the ambiguous, incomplete, or conflicting information associated with the one or more extracted values; and
      
      submitting one or more additional queries to the local or network information sources to resolve the ambiguous, incomplete, or conflicting information associated with the one or more extracted values, wherein the domain agent infers the local or network information sources associated with the one or more additional queries based on the multiple asynchronous results to the multiple asynchronous queries that have already been received.
  - 9. The method according to claim 6, wherein scoring the relevance associated with the multiple asynchronously received results to determine the one or more best responses further includes:
    - determining that the dynamic set of prior probabilities or fuzzy possibilities failed to resolve the ambiguous, incomplete, or conflicting information associated with the one or more extracted values; and
      
      requesting a user that spoke the utterance to provide additional information relating to the request, wherein the domain agent uses the additional information relating to the request to resolve the ambiguous, incomplete, or conflicting information associated with the one or more extracted values.
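
The asynchronous querying and scoring in claims 6 to 9 can be sketched with Python's `asyncio`. This is an illustration under stated assumptions: `query_source` is a stand-in for a real local or network source, the per-source weights in `source_priors` play the role of the prior probabilities, and conflicting answers are resolved by a simple weighted vote, which is a substitute for whatever scoring the patent actually contemplates.

```python
import asyncio


async def query_source(name, delay, answer):
    """Stand-in for a local or network information source (hypothetical)."""
    await asyncio.sleep(delay)
    return name, answer


async def best_answer(sources, source_priors):
    """Send the same query to every source and score results as they arrive."""
    tasks = [asyncio.create_task(query_source(*source)) for source in sources]
    votes = {}
    for finished in asyncio.as_completed(tasks):
        name, answer = await finished
        if answer is None:
            continue  # incomplete result: nothing to extract
        # Weight each extracted value by how much the source is trusted.
        votes[answer] = votes.get(answer, 0.0) + source_priors.get(name, 0.1)
    if not votes:
        return None  # caller could re-query or ask the user (claims 8, 9)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    sources = [("a", 0.02, "72F"), ("b", 0.01, "75F"), ("c", 0.0, "72F")]
    priors = {"a": 0.5, "b": 0.6, "c": 0.3}
    print(asyncio.run(best_answer(sources, priors)))
```

Because results are consumed in arrival order, a caller could also stop early once one value is clearly ahead and leave the remaining queries pending, which loosely mirrors the use of later results in claim 7.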

10. A method for processing natural language speech utterances with context-specific domain agents, comprising:
- receiving, at a speech unit coupled to a processing device, a natural language speech utterance that contains a request;
  
  parsing, at a parser coupled to the processing device, information relating to the utterance to determine a meaning associated with the utterance and a context associated with the request contained in the utterance;
  
  formulating, at the parser, the request contained in the utterance in accordance with a grammar used by a domain agent associated with the determined context;
  
  processing the formulated request with the domain agent associated with the determined context to generate a response to the utterance; and
  
  presenting the generated response to the utterance via the speech unit, wherein presenting the generated response includes:
  
  selecting, by the domain agent, a format template to use in presenting the generated response;
  
  selecting, by the domain agent, a personality to use in presenting the generated response;
  
  determining, by the domain agent, an order to use in presenting one or more tokens contained in the generated response; and
  
  performing, by the domain agent, one or more variable substitutions and transformations on the one or more tokens contained in the generated response to vary a terminology used in presenting the generated response.
- - 11. The method according to claim 10, wherein presenting the generated response further includes formatting the one or more tokens contained in the generated response in accordance with the selected format template, wherein a text to speech engine reads the one or more formatted tokens in accordance with the selected personality to create a system-generated speech utterance that contains the generated response to present via the speech unit.
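
The presentation steps in claims 10 and 11 can be sketched as template selection plus personality-dependent wording. The `TEMPLATES` and `PERSONALITIES` tables and the `present` function are hypothetical; the template fixes the token order, and the per-personality variants stand in for the "variable substitutions and transformations" that vary terminology. The text-to-speech step is not modeled, so the function only returns the string that would be handed to a speech engine.

```python
import random

# Hypothetical format template per context; it also fixes the token order.
TEMPLATES = {
    "weather": "{greeting} {condition_phrase} in {city}, with a high of {high}.",
}

# Hypothetical personalities: alternative wordings for the same slots.
PERSONALITIES = {
    "formal": {
        "greeting": ["Certainly."],
        "condition_phrase": ["Conditions will be {condition}"],
    },
    "casual": {
        "greeting": ["Sure thing!", "Okay!"],
        "condition_phrase": ["Looks {condition}", "It should be {condition}"],
    },
}


def present(tokens, context, personality, rng=None):
    """Render response tokens as text for a text-to-speech engine."""
    rng = rng or random.Random()
    template = TEMPLATES[context]
    wording = PERSONALITIES[personality]
    values = dict(tokens)
    for slot, variants in wording.items():
        # Pick one wording variant and substitute the tokens into it.
        values[slot] = rng.choice(variants).format(**tokens)
    return template.format(**values)
```

With the "casual" personality, repeated calls may produce different phrasings of the same tokens, which is the kind of variation the claim describes.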

Current Assignee: Dialect, LLC
Original Assignee: VoiceBox Technologies, Inc. (Microsoft Corporation)
Inventors: Locke, David; Kennewick, Michael R. Jr.; Freeman, Tom; Kennewick, Michael R. Sr.; Kennewick, Robert A.; Kennewick, Richard
Primary Examiner(s): Lerner, Martin
Application Number: US12/130,397
Publication Number: US 20080235023A1
Time in Patent Office: 1,194 Days
Field of Search: 704/236, 704/246, 704/257, 704/270, 704/244, 707/709, 707/723, 707/728, 707/771
US Class Current: 704/236
CPC Class Codes:
G10L 15/1822   Parsing for meaning underst...
G10L 15/22   Procedures used during a sp...
G10L 2015/228   of application context
Y10S 707/99933   Query processing, i.e. sear...
