CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS

US 20120179471A1
Filed: 01/06/2012
Published: 07/12/2012
Est. Priority Date: 01/07/2011
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
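
The abstract names two configurable pieces of logic: a provider policy that trades perceived recognition latency against network usage, and a rule for combining local and remote results. The Python sketch below shows one possible shape for each. It is a minimal illustration under stated assumptions: the two-value `NetworkPolicy`, the `Result` record, the `combine` function, and the 0.8 confidence threshold are all hypothetical and are not the combination logic the patent specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NetworkPolicy(Enum):
    """Hypothetical provider-configurable policy (names are illustrative)."""
    MINIMIZE_LATENCY = "minimize_latency"  # stream audio to the server at once
    MINIMIZE_NETWORK = "minimize_network"  # contact the server only when needed


@dataclass
class Result:
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def should_stream_eagerly(policy: NetworkPolicy) -> bool:
    """Open the network connection before local recognition has finished?"""
    return policy is NetworkPolicy.MINIMIZE_LATENCY


def combine(local: Optional[Result], remote: Optional[Result],
            local_threshold: float = 0.8) -> Optional[Result]:
    """Hypothetical arbitration rule: trust a confident local result,
    otherwise prefer the remote result, falling back to whatever exists."""
    if local is not None and local.confidence >= local_threshold:
        return local
    if remote is not None:
        return remote
    return local
```

Under these assumptions, `combine(Result("call home", 0.92), Result("call Rome", 0.85))` keeps the local hypothesis, while a local result with confidence 0.40 defers to the remote one. A real deployment would likely weigh more signals (grammar type, network state), but a single threshold keeps the trade-off visible.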

20 Claims

1. A method of performing speech recognition in a distributed system comprising an electronic device including an embedded speech recognizer and a network device including a remote speech recognizer remote from the electronic device, the method comprising:
- receiving, by the electronic device, input audio comprising speech;
  
  determining that at least a portion of the input audio matches a recognition grammar associated with the embedded speech recognizer;
  
  determining whether the recognition grammar includes at least one generic speech node;
  
  determining that recognition by the remote speech recognizer is desired in response to determining that the recognition grammar includes at least one generic speech node indicating that the speech in the input audio may include free-form dictation; and
  
  sending at least a portion of the input audio to the network device in response to determining that recognition by the remote speech recognizer is desired.
  - 2. The method of claim 1, further comprising:
    - generating a search tree associated with the recognition grammar, wherein the search tree includes a plurality of nodes, wherein each of the nodes is associated with a type of item in the recognition grammar; and
      
      wherein determining whether the recognition grammar includes at least one generic speech node comprises determining whether the search tree includes only nodes associated with types of items that can be recognized by the embedded speech recognizer with an accuracy above a threshold.
  - 3. The method of claim 1, further comprising:
    - closing a network connection to the network device in response to determining that recognition by the remote speech recognizer is not desired.
  - 4. The method of claim 1, further comprising:
    - determining that at least one element of the recognition grammar has not been completed based on an analysis of the input speech;
      
      determining whether any of the uncompleted elements of the recognition grammar includes at least one generic speech node; and
      
      transmitting at least a portion of the input audio to the network device in response to determining that at least one of the uncompleted elements in the recognition grammar includes at least one generic speech node.
  - 5. The method of claim 1, wherein determining that at least a portion of the input audio matches a recognition grammar comprises determining that at least a portion of the input audio matches a voice command.
  - 6. The method of claim 5, wherein the voice command relates to sending a text message and wherein the recognition grammar is a text message grammar that includes at least one generic speech node corresponding to text inserted into a body of the text message.
  - 7. The method of claim 5, wherein the voice command relates to performing a web-based search and wherein the recognition grammar is a search grammar that includes at least one generic speech node corresponding to web-search parameters.
  - 8. The method of claim 1, further comprising:
    - receiving additional input audio comprising speech;
      
      updating at least one node in the recognition grammar based on the additional input audio; and
      
      re-evaluating the recognition grammar to determine if any of the nodes in the recognition grammar includes at least one generic speech node.
  - 9. The method of claim 8, further comprising:
    - closing a network connection to the network device in response to determining that none of the nodes in the re-evaluated recognition grammar includes at least one generic speech node.
  - 10. The method of claim 1, wherein the at least one generic speech node corresponds to input audio that may include free-flow dictation and/or web-search parameters.
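
The routing decision of claims 1 to 3 can be sketched in Python as follows. This is a minimal sketch that assumes the input audio has already been matched to a grammar and represents that grammar as a search tree of typed nodes, as in claim 2. The item types, the per-type accuracy table `EMBEDDED_ACCURACY`, the 0.9 threshold, and the `Node`, `Connection`, and `route_audio` names are hypothetical illustrations, not taken from the patent; a node whose type is recognized locally at or below the threshold plays the role of a generic speech node.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

# Hypothetical accuracy of the embedded recognizer for each grammar item type.
EMBEDDED_ACCURACY: Dict[str, float] = {
    "command": 0.97,     # fixed command words such as "call" or "text"
    "contact": 0.93,     # names drawn from the on-device contact list
    "dictation": 0.60,   # free-form dictation (e.g. a text-message body)
    "web_search": 0.65,  # open-vocabulary web-search terms
}


@dataclass
class Node:
    """One node of the search tree built from the recognition grammar."""
    item_type: str
    children: List["Node"] = field(default_factory=list)


def iter_nodes(root: Node) -> Iterator[Node]:
    """Depth-first walk over every node in the search tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def has_generic_speech_node(root: Node, threshold: float = 0.9) -> bool:
    """Claim 2 style test: the grammar has a generic speech node unless every
    node's item type is recognized locally with accuracy above the threshold.
    Unknown item types default to 0.0 and therefore count as generic."""
    return not all(EMBEDDED_ACCURACY.get(n.item_type, 0.0) > threshold
                   for n in iter_nodes(root))


class Connection:
    """Hypothetical stand-in for the link to the network device."""

    def __init__(self) -> None:
        self.is_open = True
        self.sent: List[bytes] = []

    def send(self, audio: bytes) -> None:
        self.sent.append(audio)

    def close(self) -> None:
        self.is_open = False


def route_audio(grammar_root: Node, audio: bytes, conn: Connection) -> bool:
    """Claims 1 and 3: send audio to the remote recognizer only when the
    matched grammar contains a generic speech node; otherwise close the link."""
    remote_desired = has_generic_speech_node(grammar_root)
    if remote_desired:
        conn.send(audio)
    else:
        conn.close()
    return remote_desired


# Usage: "text <contact> <dictation>" versus "call <contact>".
text_grammar = Node("command", [Node("contact", [Node("dictation")])])
call_grammar = Node("command", [Node("contact")])
print(route_audio(text_grammar, b"audio", Connection()))  # True
print(route_audio(call_grammar, b"audio", Connection()))  # False
```

Under these assumptions, a text-message grammar (claim 6) whose body is a dictation node is routed to the network device, while a call grammar built only from command and contact nodes stays on the device and the connection is closed (claim 3). A search grammar (claim 7) would behave like the text-message case through its `web_search` node.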

11. A non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by at least one processor on an electronic device in a distributed speech recognition system comprising the electronic device having an embedded speech recognizer and a network device having a remote speech recognizer remote from the electronic device, perform a method comprising:
- receiving, by the electronic device, input audio comprising speech;
  
  determining that at least a portion of the input audio matches a recognition grammar associated with the embedded speech recognizer;
  
  determining whether the recognition grammar includes at least one generic speech node;
  
  determining that recognition by the remote speech recognizer is desired in response to determining that the recognition grammar includes at least one generic speech node indicating that the speech in the input audio may include free-form dictation; and
  
  sending at least a portion of the input audio to the network device in response to determining that recognition by the remote speech recognizer is desired.
  - 12. The computer-readable storage medium of claim 11, wherein the method further comprises:
    - generating a search tree associated with the recognition grammar, wherein the search tree includes a plurality of nodes, wherein each of the nodes is associated with a type of item in the recognition grammar; and
      
      wherein determining whether the recognition grammar includes at least one generic speech node comprises determining whether the search tree includes only nodes associated with types of items that can be recognized by the embedded speech recognizer with an accuracy above a threshold.
  - 13. The computer-readable storage medium of claim 11, wherein the method further comprises:
    - closing a network connection to the network device in response to determining that recognition by the remote speech recognizer is not desired.
  - 14. The computer-readable storage medium of claim 11, wherein the method further comprises:
    - determining that at least one element of the recognition grammar has not been completed based on an analysis of the input speech;
      
      determining whether any of the uncompleted elements of the recognition grammar includes at least one generic speech node; and
      
      transmitting at least a portion of the input audio to the network device in response to determining that at least one of the uncompleted elements in the recognition grammar includes at least one generic speech node.
  - 15. The computer-readable storage medium of claim 11, wherein determining that at least a portion of the input audio matches a recognition grammar comprises determining that at least a portion of the input audio matches a voice command.
  - 16. The computer-readable storage medium of claim 11, wherein the method further comprises:
    - receiving additional input audio comprising speech;
      
      updating at least one node in the recognition grammar based on the additional input audio; and
      
      re-evaluating the recognition grammar to determine if any of the nodes in the recognition grammar includes at least one generic speech node.
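
Claim 16, like claim 8, has the device update the grammar and re-check it as more speech arrives, and claim 9 adds closing the network connection once no generic speech node remains. A minimal Python sketch of that loop follows, assuming a grammar reduced to alternatives keyed by a leading command word. `IncrementalRouter`, `GENERIC_TYPES`, and the pruning rule are illustrative assumptions, not the patent's data structures.

```python
from typing import Dict, List

# Hypothetical item types treated as generic speech nodes.
GENERIC_TYPES = {"dictation", "web_search"}


class IncrementalRouter:
    """Re-evaluate the grammar each time more input audio is recognized."""

    def __init__(self, alternatives: Dict[str, List[str]]) -> None:
        # Hypothetical grammar: leading command word -> remaining item types.
        self.alternatives = dict(alternatives)
        self.connection_open = True

    def has_generic_node(self) -> bool:
        return any(item in GENERIC_TYPES
                   for items in self.alternatives.values()
                   for item in items)

    def update(self, recognized_word: str) -> None:
        """Update the grammar with a newly recognized word (claims 8/16), then
        re-evaluate it and close the connection if no generic node is left
        (claim 9)."""
        if recognized_word in self.alternatives:
            # Keep only the branch consistent with the new input.
            self.alternatives = {recognized_word: self.alternatives[recognized_word]}
        if self.connection_open and not self.has_generic_node():
            self.connection_open = False


# Usage: the grammar initially allows both "call <contact>" and
# "text <contact> <dictation>", so the connection starts open.
router = IncrementalRouter({"call": ["contact"], "text": ["contact", "dictation"]})
router.update("call")
print(router.connection_open)  # False
```

In this sketch the connection stays open while either branch is possible; hearing "call" prunes away the only dictation node, so the link is closed, whereas hearing "text" would keep it open for the message body.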

17. An electronic device for use in a distributed speech recognition system comprising the electronic device and a network device remote from the electronic device, the electronic device comprising:
- an embedded speech recognizer configured to receive input audio comprising speech; and
  
  at least one processor programmed to:
  
  determine that at least a portion of the input audio matches a recognition grammar associated with the embedded speech recognizer;
  
  determine whether the recognition grammar includes at least one generic speech node;
  
  determine that recognition by the remote speech recognizer is desired in response to determining that the recognition grammar includes at least one generic speech node indicating that the speech in the input audio may include free-form dictation; and
  
  send at least a portion of the input audio to the network device in response to determining that recognition by the remote speech recognizer is desired.
  - 18. The electronic device of claim 17, wherein the at least one processor is further programmed to:
    - generate a search tree associated with the recognition grammar, wherein the search tree includes a plurality of nodes, wherein each of the nodes is associated with a type of item in the recognition grammar; and
      
      wherein determining whether the recognition grammar includes at least one generic speech node comprises determining whether the search tree includes only nodes associated with types of items that can be recognized by the embedded speech recognizer with an accuracy above a threshold.
  - 19. The electronic device of claim 17, wherein the at least one processor is further programmed to:
    - determine that at least one element of the recognition grammar has not been completed based on an analysis of the input speech;
      
      determine whether any of the uncompleted elements of the recognition grammar includes at least one generic speech node; and
      
      transmit at least a portion of the input audio to the network device in response to determining that at least one of the uncompleted elements in the recognition grammar includes at least one generic speech node.
  - 20. The electronic device of claim 17, wherein determining that at least a portion of the input audio matches a recognition grammar comprises determining that at least a portion of the input audio matches a voice command.
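
Claims 4, 14, and 19 restrict the generic-node check to grammar elements that the input has not yet completed. A minimal Python sketch follows, assuming the grammar is a flat list of hypothetical `Element` records whose `completed` flag has already been set by analysing the input speech; the element names and the `GENERIC_TYPES` set are illustrative, not from the patent.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical item types treated as generic speech nodes.
GENERIC_TYPES = {"dictation", "web_search"}


@dataclass
class Element:
    name: str
    item_type: str
    completed: bool = False  # set by analysing the input speech so far


def uncompleted(elements: List[Element]) -> List[Element]:
    """Grammar elements the input has not yet filled."""
    return [e for e in elements if not e.completed]


def should_transmit(elements: List[Element]) -> bool:
    """Transmit audio if any uncompleted element is a generic speech node."""
    return any(e.item_type in GENERIC_TYPES for e in uncompleted(elements))


# Usage: "text John ..." has completed the verb and recipient, not the body.
text_grammar = [
    Element("verb", "command", completed=True),
    Element("recipient", "contact", completed=True),
    Element("body", "dictation"),
]
call_grammar = [
    Element("verb", "command", completed=True),
    Element("callee", "contact"),
]
print(should_transmit(text_grammar))  # True
print(should_transmit(call_grammar))  # False
```

Checking only uncompleted elements means a generic node that the input has already passed no longer forces a round trip to the network device, and a grammar whose remaining elements are all locally recognizable stays on the device.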

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Newman, Michael; Gillet, Anthony; Krowitz, David Mark; Edgington, Michael D.

Granted Patent

US 8,898,065 B2
US Class Current

704/270.1
CPC Class Codes

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

G10L 17/00   Speaker identification or v...
