CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS

US 20120179457A1
Filed: 01/06/2012
Published: 07/12/2012
Est. Priority Date: 01/07/2011
Status: Active Grant


3 Assignments


0 Petitions

Abstract

Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
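The abstract names two configurable pieces: a provider policy that decides how eagerly audio is sent to the server, and logic that merges the local and remote results. Below is a minimal Python sketch of both. The policy names, the single confidence threshold, and the rule of keeping the higher-confidence hypothesis are assumptions for illustration, not the combination logic disclosed in the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Policy(Enum):
    # Hypothetical policy names a network provider might choose between.
    LOW_LATENCY = "low_latency"      # always send audio to the server in parallel
    LOW_BANDWIDTH = "low_bandwidth"  # send audio only when the local result is weak


@dataclass
class Result:
    text: str
    confidence: float  # assumed to lie in [0, 1] for both recognizers


def should_send_to_server(policy: Policy, local: Optional[Result], threshold: float) -> bool:
    """Decide whether the remote recognizer should process this utterance."""
    if policy is Policy.LOW_LATENCY:
        return True
    # LOW_BANDWIDTH: use the network only if local recognition failed or is unsure.
    return local is None or local.confidence < threshold


def combine(local: Optional[Result], remote: Optional[Result]) -> Optional[Result]:
    """Keep the higher-confidence hypothesis; fall back to whichever one exists."""
    candidates = [r for r in (local, remote) if r is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.confidence)
```

Under `LOW_LATENCY` the server request can start immediately, in parallel with local recognition. Under `LOW_BANDWIDTH` the device waits for the local result first, which saves network traffic but adds delay whenever the server turns out to be needed. Comparing raw confidences across two different recognizers also assumes their scores are calibrated to the same scale.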

20 Claims

1. A method of performing speech recognition in a distributed speech recognition system comprising an electronic device including an embedded speech recognizer and a network device including a remote speech recognizer remote from the electronic device, the method comprising:
- receiving, by the electronic device, input audio comprising speech;
  
  transmitting at least a portion of the input audio to the network device for processing by the remote speech recognizer;
  
  processing, by the embedded speech recognizer, at least a portion of the input audio to produce a local speech recognition result; and
  
  performing a partial action, based, at least in part, on the local speech recognition result.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. The method of claim 1, further comprising:
    - receiving from the network device a remote speech recognition result; and
      
      performing a full action that completes the partial action, wherein performing the full action is based, at least in part, on the remote speech recognition result received from the network device.
  - 3. The method of claim 1, wherein the partial action is initiated prior to receiving the remote speech recognition result from the network device.
  - 4. The method of claim 1, wherein transmitting at least a portion of the input audio to the network device comprises transmitting a processed version of the at least a portion of the input audio to the network device.
  - 5. The method of claim 4, wherein the processed version is a compressed version of the at least a portion of the input audio.
  - 6. The method of claim 1, further comprising:
    - determining a confidence value associated with the local speech recognition result; and
      
      performing a full action that completes the partial action in response to determining that the confidence value is greater than a predetermined threshold value.
  - 7. The method of claim 1, further comprising:
    - determining a confidence value associated with the local speech recognition result; and
      
      performing the partial action in response to determining that the confidence value is below a predetermined threshold value.
  - 8. The method of claim 1, wherein performing a partial action comprises starting an application on the electronic device.
  - 9. The method of claim 8, wherein the application is configured to send a communication including text and wherein performing the partial action further comprises:
    - filling in at least one first field in the communication based, at least in part, on the local speech recognition result.
  - 10. The method of claim 9, further comprising:
    - receiving from the network device a remote speech recognition result; and
      
      filling in at least one second field in the communication based, at least in part, on the remote speech recognition result.
  - 11. The method of claim 1, further comprising:
    - determining whether a full action to complete the partial action can be performed without receiving a speech recognition result from the network device; and
      
      performing the full action in response to determining that the full action can be performed without receiving a speech recognition result from the network device.
  - 12. The method of claim 1, further comprising:
    - determining that the input audio does not include enough information to perform a full action to complete the partial action; and
      
      prompting the user of the electronic device to provide additional information in response to determining that the input audio does not include enough information to perform the full action.
  - 13. The method of claim 8, wherein starting an application on the electronic device comprises starting an application selected from the group consisting of a text-messaging application, an email application, a phone call application, and a web-based application.
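Claims 1 through 13 describe a flow in which the on-device result drives an immediate partial action, such as opening a messaging app and filling in the recipient, while the server's result completes it. The sketch below walks through that flow for a text message. The field names, the `0.8` threshold, and the callables standing in for the network round trip and the user prompt are hypothetical; a real client would more likely receive the remote result asynchronously.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical value for the claims' "predetermined threshold value".
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class LocalResult:
    """Output of the embedded recognizer (field names are assumptions)."""
    app: str                  # command recognized on-device, e.g. "text_message"
    recipient: Optional[str]  # first field, e.g. a contact name
    body: Optional[str]       # free-form dictation, often left to the server
    confidence: float


@dataclass
class Communication:
    app: str
    fields: Dict[str, str] = field(default_factory=dict)
    sent: bool = False


def partial_action(local: LocalResult) -> Communication:
    """Claims 8-9: start the application and fill in a first field locally."""
    comm = Communication(app=local.app)
    if local.recipient:
        comm.fields["to"] = local.recipient
    return comm


def handle_utterance(
    local: LocalResult,
    fetch_remote_body: Callable[[], Optional[str]],  # blocks until the server replies
    prompt_user: Callable[[str], str],
) -> Communication:
    # The partial action happens before any remote result is awaited (claim 3).
    comm = partial_action(local)

    if local.confidence > CONFIDENCE_THRESHOLD and local.body:
        # Claims 6 and 11: confident and complete, so finish without the server.
        comm.fields["body"] = local.body
    else:
        # Claims 2 and 10: fill in a second field from the remote result.
        body = fetch_remote_body()
        if not body:
            # Claim 12: not enough information, so ask the user.
            body = prompt_user("What should the message say?")
        comm.fields["body"] = body

    if "to" not in comm.fields:
        comm.fields["to"] = prompt_user("Who should receive it?")  # also claim 12

    comm.sent = True  # stands in for the full action: sending the message
    return comm
```

Passing the remote fetch and the prompt in as callables keeps the control flow testable without a network connection or a user interface.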

14. A non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by at least one processor on an electronic device in a distributed speech recognition system comprising the electronic device having an embedded speech recognizer and a network device having a remote speech recognizer remote from the electronic device, perform a method comprising:
- receiving, by the electronic device, input audio comprising speech;
  
  transmitting at least a portion of the input audio to the network device for processing by the remote speech recognizer;
  
  processing, by the embedded speech recognizer, at least a portion of the input audio to produce a local speech recognition result; and
  
  performing a partial action, based, at least in part, on the local speech recognition result.
- Dependent Claims (15, 16, 17)
- - 15. The computer-readable storage medium of claim 14, further comprising:
    - receiving from the network device a remote speech recognition result; and
      
      performing a full action to complete the partial action, wherein performing the full action is based, at least in part, on the remote speech recognition result received from the network device.
  - 16. The computer-readable storage medium of claim 14, further comprising:
    - determining a confidence value associated with the local speech recognition result; and
      
      performing a full action to complete the partial action in response to determining that the confidence value is greater than a predetermined threshold value.
  - 17. The computer-readable storage medium of claim 14, wherein performing the partial action comprises starting an application on the electronic device.

18. An electronic device for use in a distributed speech recognition system comprising the electronic device and a network device remote from the electronic device, the electronic device comprising:
- at least one storage device configured to store one or more applications;
  
  an embedded speech recognizer configured to:
  
  receive input audio comprising speech;
  
  transmit at least a portion of the input audio to the network device for processing by the remote speech recognizer;
  
  process at least a portion of the input audio to produce a local speech recognition result; and
  
  perform a partial action, based, at least in part, on the local speech recognition result.
- Dependent Claims (19, 20)
- - 19. The electronic device of claim 18, wherein the embedded speech recognizer is configured to:
    - receive from the network device a remote speech recognition result; and
      
      perform a full action to complete the partial action, wherein performing the full action is based, at least in part, on the remote speech recognition result received from the network device.
  - 20. The electronic device of claim 19, wherein performing the partial action comprises:
    - starting at least one of the one or more applications stored on the at least one storage device.
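Claims 18 to 20 recast the method as a device whose embedded recognizer launches a stored application while the server works on the same audio. The following sketch shows that concurrency with Python's `concurrent.futures`. The app registry, the recognizer callables, the simulated server, and the use of `zlib` as the compression step (a generic stand-in for the speech codec a real device would more likely use, in the spirit of claims 4 and 5) are all assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class ElectronicDevice:
    """Hypothetical device: stored apps plus an embedded recognizer."""

    def __init__(
        self,
        apps: Dict[str, Callable[[], str]],
        local_recognize: Callable[[bytes], str],
        remote_recognize: Callable[[bytes], str],
    ) -> None:
        self.apps = apps  # stands in for applications on the storage device
        self.local_recognize = local_recognize
        self.remote_recognize = remote_recognize  # a network call in a real system
        self.log: List[str] = []

    def _server(self, payload: bytes) -> str:
        # Simulated network device: decompress, then run the remote recognizer.
        return self.remote_recognize(zlib.decompress(payload))

    def on_audio(self, pcm: bytes) -> str:
        payload = zlib.compress(pcm)  # processed (compressed) copy sent to the server
        with ThreadPoolExecutor(max_workers=1) as pool:
            remote_future = pool.submit(self._server, payload)
            # Meanwhile the embedded recognizer handles the same audio locally.
            command = self.local_recognize(pcm)
            if command in self.apps:
                # Partial action (claim 20): start a stored application.
                self.log.append(self.apps[command]())
            remote_text = remote_future.result()  # block until the server replies
        # Full action (claim 19): complete using the remote result.
        self.log.append(f"complete with: {remote_text}")
        return remote_text
```

Because the app launch happens before `remote_future.result()` is called, the user sees the application open while the server is still transcribing.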


Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Newman, Michael; Gillet, Anthony; Krowitz, David Mark; Edgington, Michael D.

Granted Patent

US 9,953,653 B2
US Class Current

704/201
CPC Class Codes

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

G10L 17/00   Speaker identification or v...
