Systems and methods for responding to natural language speech utterance
Abstract
Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command.
36 Claims
1. A system for multi-pass speech recognition, comprising:

an input device configured to receive a natural language utterance; and

a multi-pass speech recognition module configured to transcribe the natural language utterance, wherein to transcribe the natural language utterance, the multi-pass speech recognition module is further configured to:

use a dictation grammar to transcribe the natural language utterance in response to a platform associated with the multi-pass speech recognition module having the dictation grammar available; or

use a virtual dictation grammar to transcribe the natural language utterance in response to the platform associated with the multi-pass speech recognition module not having the dictation grammar available.

(Dependent claims: 2, 3, 4)
5. A system for multi-pass speech recognition, comprising:

an input device configured to receive a natural language utterance; and

a multi-pass speech recognition module configured to:

determine whether a platform associated with the multi-pass speech recognition module has a dictation grammar available or a virtual dictation grammar available; and

use the dictation grammar or the virtual dictation grammar to transcribe the natural language utterance based on whether the platform has the dictation grammar available or the virtual dictation grammar available.
6. A method for multi-pass speech recognition, comprising:

receiving a natural language utterance at an input device; and

transcribing the natural language utterance with a multi-pass speech recognition module, wherein transcribing the natural language utterance with the multi-pass speech recognition module includes:

using a dictation grammar to transcribe the natural language utterance in response to determining that a platform associated with the multi-pass speech recognition module has the dictation grammar available; or

using a virtual dictation grammar to transcribe the natural language utterance in response to determining that the platform associated with the multi-pass speech recognition module does not have the dictation grammar available.

(Dependent claims: 7, 8, 9)
10. A method for multi-pass speech recognition, comprising:

receiving a natural language utterance at an input device;

determining whether a platform associated with a multi-pass speech recognition module has a dictation grammar available or a virtual dictation grammar available; and

transcribing the natural language utterance with the multi-pass speech recognition module, wherein the multi-pass speech recognition module uses the dictation grammar or the virtual dictation grammar to transcribe the natural language utterance based on whether the platform has the dictation grammar available or the virtual dictation grammar available.
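Claims 1 through 10 all turn on one decision: use the platform's dictation grammar when it is available, otherwise fall back to a virtual dictation grammar. The claims disclose no implementation, so the following is only a minimal Python sketch of that selection step; every class and name here is hypothetical.

```python
class Grammar:
    """Hypothetical stand-in for a recognition grammar; a real recognizer
    would decode audio against the grammar rather than echo the text."""

    def __init__(self, name):
        self.name = name

    def transcribe(self, utterance):
        # Placeholder transcription: tag the result with the grammar used.
        return {"grammar": self.name, "text": utterance.lower()}


def transcribe(utterance, platform_grammars):
    """Use the platform's dictation grammar when available; otherwise
    fall back to the virtual dictation grammar (the claims' selection)."""
    grammar = platform_grammars.get("dictation") \
        or platform_grammars["virtual_dictation"]
    return grammar.transcribe(utterance)
```

The lookup mirrors the "determine whether a platform ... has a dictation grammar available" step of claims 5 and 10; how the virtual dictation grammar is constructed is left open here, as it is in the independent claims.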
11. A system for knowledge-enhanced speech recognition, comprising:

a context stack configured to store one or more expected contexts associated with a natural language utterance; and

a knowledge-enhanced speech recognition engine, wherein the knowledge-enhanced speech recognition engine includes one or more processors configured to:

access the one or more expected contexts stored in the context stack in response to one or more active grammars in a context description grammar failing to completely match information associated with the natural language utterance;

compare the information associated with the natural language utterance to one or more context specific matchers to determine a most likely context associated with the natural language utterance from the one or more expected contexts stored in the context stack; and

use one or more grammar expression entries in the context description grammar to generate a command or request associated with the most likely context.

(Dependent claims: 12, 13, 14, 15)
16. A system for knowledge-enhanced speech recognition, comprising:

a context stack configured to store one or more expected contexts associated with a natural language utterance;

a knowledge-enhanced speech recognition engine, wherein the knowledge-enhanced speech recognition engine includes one or more processors configured to:

access the one or more expected contexts stored in the context stack in response to one or more active grammars in a context description grammar failing to completely match information associated with the natural language utterance;

compare the information associated with the natural language utterance to one or more context specific matchers to determine a most likely context associated with the natural language utterance from the one or more expected contexts stored in the context stack; and

use one or more grammar expression entries in the context description grammar to generate a command or request associated with the most likely context; and

an agent configured to:

process the generated command or request in the most likely context to generate a response to the natural language utterance; and

update an ordered list associated with the one or more expected contexts in the context stack with information associated with one or more of the most likely context, the generated command or request, or the generated response to enable one or more follow-up commands or requests associated with the most likely context, the generated command or request, or the generated response.
17. A system for knowledge-enhanced speech recognition, comprising:

a context stack configured to store one or more expected contexts associated with a natural language utterance; and

a knowledge-enhanced speech recognition engine, wherein the knowledge-enhanced speech recognition engine includes one or more processors configured to:

access the one or more expected contexts stored in the context stack in response to one or more active grammars in a context description grammar failing to completely match information associated with the natural language utterance;

compare the information associated with the natural language utterance to one or more context specific matchers to determine a most likely context associated with the natural language utterance from the one or more expected contexts stored in the context stack; and

use one or more grammar expression entries in the context description grammar to generate a command or request associated with the most likely context,

wherein the information compared to the one or more context specific matchers includes phonetic information associated with the natural language utterance or text combinations from a transcription associated with the natural language utterance.

(Dependent claims: 18, 19)
20. A method for knowledge-enhanced speech recognition, comprising:

storing one or more expected contexts in a context stack, wherein a knowledge-enhanced speech recognition engine that includes one or more processors accesses the one or more expected contexts in the context stack in response to one or more active grammars in a context description grammar failing to completely match information associated with the natural language utterance;

comparing the information associated with the natural language utterance to one or more context specific matchers to determine a most likely context associated with the natural language utterance, wherein the knowledge-enhanced speech recognition engine determines the most likely context from the one or more expected contexts in the context stack; and

using one or more grammar expression entries in the context description grammar to generate a command or request associated with the most likely context.

(Dependent claims: 21, 22, 23, 24)
25. A method for knowledge-enhanced speech recognition, comprising:

storing one or more expected contexts in a context stack, wherein a knowledge-enhanced speech recognition engine that includes one or more processors accesses the one or more expected contexts in the context stack in response to one or more active grammars in a context description grammar failing to completely match information associated with the natural language utterance;

comparing the information associated with the natural language utterance to one or more context specific matchers to determine a most likely context associated with the natural language utterance, wherein the knowledge-enhanced speech recognition engine determines the most likely context from the one or more expected contexts in the context stack;

using one or more grammar expression entries in the context description grammar to generate a command or request associated with the most likely context;

processing the generated command or request with an agent associated with the most likely context, wherein the agent processes the generated command or request to generate a response to the natural language utterance; and

updating an ordered list associated with the one or more expected contexts in the context stack with information associated with one or more of the most likely context, the generated command or request, or the generated response, wherein the agent updates the ordered list to enable one or more follow-up commands or requests associated with the most likely context, the generated command or request, or the generated response.
26. A method for knowledge-enhanced speech recognition, comprising:

storing one or more expected contexts in a context stack, wherein a knowledge-enhanced speech recognition engine that includes one or more processors accesses the one or more expected contexts in the context stack in response to one or more active grammars in a context description grammar failing to completely match information associated with the natural language utterance;

comparing the information associated with the natural language utterance to one or more context specific matchers to determine a most likely context associated with the natural language utterance, wherein the knowledge-enhanced speech recognition engine determines the most likely context from the one or more expected contexts in the context stack; and

using one or more grammar expression entries in the context description grammar to generate a command or request associated with the most likely context,

wherein the information compared to the one or more context specific matchers includes phonetic information associated with the natural language utterance or text combinations from a transcription associated with the natural language utterance.

(Dependent claims: 27, 28)
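The knowledge-enhanced claims (11 through 28) share one mechanism: when the active grammars in the context description grammar fail to completely match the utterance, the engine scores the expected contexts on the context stack with context-specific matchers and selects the most likely one. The claims leave the matcher internals open, so this Python sketch substitutes a simple keyword-overlap matcher purely for illustration; none of these names come from the patent.

```python
def keyword_matcher(keywords):
    """Hypothetical context-specific matcher: scores an utterance by the
    fraction of the context's keywords present in its tokens."""
    kws = set(keywords)

    def match(tokens):
        return len(kws & set(tokens)) / len(kws)

    return match


def most_likely_context(context_stack, utterance_tokens, matchers):
    """Score each expected context on the (ordered) context stack and
    return the best-scoring one, or None when nothing matches at all."""
    best, best_score = None, 0.0
    for ctx in context_stack:
        score = matchers[ctx](utterance_tokens)
        if score > best_score:
            best, best_score = ctx, score
    return best
```

Per claims 17 and 26, the information fed to the matchers could equally be phonetic information rather than transcription tokens; the scoring loop is the same either way.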
29. A system for synchronizing context across multiple electronic devices, comprising:

one or more processors configured to:

subscribe a first electronic device to one or more context events;

receive a context change event from a second electronic device; and

inform the first electronic device of the context change event to synchronize a context across the first electronic device and the second electronic device; and

a registration module configured to:

register a library specifically associated with the first electronic device to subscribe the first electronic device to the one or more context events; and

remove the library specifically associated with the first electronic device to unsubscribe the first electronic device from the one or more context events.

(Dependent claims: 30, 31, 32)
33. A method for synchronizing context across multiple electronic devices, comprising:

subscribing a first electronic device to one or more context events;

receiving a context change event from a second electronic device;

informing the first electronic device of the context change event to synchronize a context across the first electronic device and the second electronic device;

registering a library specifically associated with the first electronic device to subscribe the first electronic device to the one or more context events; and

removing the library specifically associated with the first electronic device to unsubscribe the first electronic device from the one or more context events.

(Dependent claims: 34, 35, 36)
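Claims 29 through 36 describe a publish/subscribe arrangement: registering a device-specific library subscribes that device to context events, and a context change on one device is propagated to the other subscribed devices. The following is only a minimal sketch under those assumptions; the class and method names are illustrative, not taken from the patent.

```python
class ContextSynchronizer:
    """Illustrative registration module plus change notifier, modeled on
    the subscribe/inform flow of claims 29-36 (hypothetical design)."""

    def __init__(self):
        # Maps a device id to its device-specific library (here, a callback).
        self._libraries = {}

    def register(self, device_id, library):
        # Registering the device-specific library subscribes the device
        # to context events.
        self._libraries[device_id] = library

    def unregister(self, device_id):
        # Removing the library unsubscribes the device.
        self._libraries.pop(device_id, None)

    def context_changed(self, source_device_id, event):
        # Inform every other subscribed device of the change so that
        # context stays synchronized across devices.
        for device_id, library in self._libraries.items():
            if device_id != source_device_id:
                library(event)
```

The claims do not specify the transport or the library interface; a callback per device is simply the smallest structure that exhibits the register/unregister/inform behavior.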
Specification