Configurable speech recognition system using multiple recognizers

US 8,930,194 B2
Filed: 01/06/2012
Issued: 01/06/2015
Est. Priority Date: 01/07/2011
Status: Active Grant

Assignments: 3
Petitions: 0

Abstract

Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
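The abstract does not say how a provider's policy is represented or how the two results are merged. What follows is one plausible reading, not the patent's method. The `Policy` enum, the `Result` type, the `needs_remote` and `combine` functions, and the 0.8 confidence threshold are all hypothetical. Under a latency-first policy the client contacts the server up front, in parallel with local decoding. Under a network-first policy it asks the server only when the local result looks unreliable. A simple confidence comparison then arbitrates between the two results.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Policy(Enum):
    # Hypothetical provider policies for the latency vs. network-usage trade-off.
    MINIMIZE_LATENCY = "minimize_latency"  # always send audio to the server, in parallel with local decoding
    MINIMIZE_NETWORK = "minimize_network"  # contact the server only when the local result looks unreliable


@dataclass
class Result:
    text: str
    confidence: float  # assumed to be normalized to [0, 1]


def needs_remote(policy: Policy, local: Optional[Result], threshold: float = 0.8) -> bool:
    """Decide whether the client should use the remote recognizer at all."""
    if policy is Policy.MINIMIZE_LATENCY:
        return True  # request is issued before the local result is known
    # Network-first: go remote only if there is no usable local result or it is low-confidence.
    return local is None or local.confidence < threshold


def combine(local: Result, remote: Optional[Result], threshold: float = 0.8) -> Result:
    """Hypothetical arbitration: keep a confident local result, else take the more confident one."""
    if remote is None or local.confidence >= threshold:
        return local
    return remote if remote.confidence >= local.confidence else local
```

Keeping the policy as a small enum means a network provider could switch behavior through configuration without changing the combination logic.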

20 Claims

1. A method of performing speech recognition in a distributed speech recognition system comprising an electronic device having an embedded speech recognizer and a network device having a remote speech recognizer remote from the electronic device, the method comprising:
- receiving, by the electronic device, input audio uninterrupted by one or more prompts output from the electronic device, wherein the input audio comprises input speech;
- identifying multiple types of information in the input speech;
- determining whether speech recognition by the remote speech recognizer is desired, wherein the determining is based, at least in part, on the identified types of information in the input speech; and
- in response to determining that speech recognition by the remote speech recognizer is desired, processing a first portion of the input speech by the embedded speech recognizer and sending a second portion of the input speech to the network device for recognition by the remote speech recognizer.
2. The method of claim 1, wherein the second portion of the input speech is processed prior to being sent to the network device.
3. The method of claim 1, wherein the embedded speech recognizer is configured to perform speech recognition on at least one first type of information and the remote speech recognizer is configured to perform speech recognition on at least one second type of information, and wherein determining whether speech recognition by the remote speech recognizer is desired comprises:
  - determining whether the input speech includes input audio corresponding to the at least one second type of information.
4. The method of claim 1, wherein identifying multiple types of information in the input speech comprises:
  - processing the input speech by the embedded speech recognizer to determine the first portion of the input audio as corresponding to a first type of information and the second portion of the input audio as corresponding to a second type of information.
5. The method of claim 4, further comprising:
  - determining whether the embedded speech recognizer is configured to recognize the first type of information and/or the second type of information with an accuracy above a threshold; and
  - determining that speech recognition by the remote speech recognizer is desired in response to determining that the embedded speech recognizer is not configured to recognize the first type of information and/or the second type of information with an accuracy above the threshold.
6. The method of claim 1, further comprising:
  - recognizing the entire input speech by the embedded speech recognizer in response to determining that speech recognition by the remote speech recognizer is not desired; and
  - performing at least one action based, at least in part, on a local speech recognition result produced by the embedded speech recognizer.
7. The method of claim 6, wherein the at least one action is selected from the group consisting of initiating a phone call, sending a communication including text, and performing a web-based operation.
8. The method of claim 1, wherein the multiple types of information are selected from the group consisting of voice command information, name information, free-flow dictation, and web-search parameters.
9. The method of claim 1, wherein at least one of the multiple types of information in the input audio is identified as free-flow dictation and/or web-search parameters, and wherein the method further comprises:
  - determining that speech recognition by the remote speech recognizer is desired in response to identifying at least one of the multiple types of information in the input audio as free-flow dictation and/or web-search parameters.
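Claims 1, 3, 4, 5, 8 and 9 together describe a routing decision. The following is a minimal sketch under stated assumptions. It assumes the embedded recognizer has already labeled spans of the audio with one of the four information types of claim 8 (as in claim 4). It also assumes the device keeps a per-type accuracy estimate for its embedded recognizer (claim 5). The accuracy values, the 0.9 threshold, and every name here (`InfoType`, `Segment`, `EMBEDDED_ACCURACY`, `remote_desired`, `split_segments`) are hypothetical. With these values, dictation and web-search spans always fall below the threshold, which matches claim 9.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class InfoType(Enum):
    # The four types of information listed in claim 8.
    VOICE_COMMAND = "voice_command"
    NAME = "name"
    DICTATION = "free_flow_dictation"
    WEB_SEARCH = "web_search_parameters"


@dataclass
class Segment:
    info_type: InfoType
    start: int  # sample offset, inclusive
    end: int    # sample offset, exclusive


# Hypothetical per-type accuracy estimates for the embedded recognizer (claim 5).
EMBEDDED_ACCURACY: Dict[InfoType, float] = {
    InfoType.VOICE_COMMAND: 0.97,
    InfoType.NAME: 0.93,
    InfoType.DICTATION: 0.70,
    InfoType.WEB_SEARCH: 0.65,
}


def remote_desired(segments: List[Segment], threshold: float = 0.9) -> bool:
    """Remote recognition is desired if any identified type is not handled
    by the embedded recognizer with an accuracy at or above `threshold`."""
    return any(EMBEDDED_ACCURACY[s.info_type] < threshold for s in segments)


def split_segments(segments: List[Segment], threshold: float = 0.9) -> Tuple[List[Segment], List[Segment]]:
    """Return (first portion kept on the device, second portion sent to the network device)."""
    local = [s for s in segments if EMBEDDED_ACCURACY[s.info_type] >= threshold]
    remote = [s for s in segments if EMBEDDED_ACCURACY[s.info_type] < threshold]
    return local, remote
```

For an utterance such as "text Bob I'm running late", the command and name spans would stay on the device and the dictated message would go to the server. When no span falls below the threshold, `split_segments` returns an empty remote list, which corresponds to the local-only path of claim 6.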

10. A non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by at least one processor on an electronic device in a distributed speech recognition system comprising the electronic device having an embedded speech recognizer and a network device having a remote speech recognizer remote from the electronic device, perform a method comprising:
- receiving, by the electronic device, input audio uninterrupted by one or more prompts output from the electronic device, wherein the input audio comprises input speech;
- identifying multiple types of information in the input speech;
- determining whether speech recognition by the remote speech recognizer is desired, wherein the determining is based, at least in part, on the identified types of information in the input speech; and
- in response to determining that speech recognition by the remote speech recognizer is desired, processing a first portion of the input speech by the embedded speech recognizer and sending a second portion of the input speech to the network device for recognition by the remote speech recognizer.
11. The computer-readable storage medium of claim 10, wherein the second portion of the input speech is processed prior to being sent to the network device.
12. The computer-readable storage medium of claim 10, wherein identifying multiple types of information in the input speech comprises:
  - processing the input speech by the embedded speech recognizer to determine the first portion of the input audio as corresponding to a first type of information and the second portion of the input audio as corresponding to a second type of information.
13. The computer-readable storage medium of claim 12, wherein the method further comprises:
  - determining whether the embedded speech recognizer is configured to recognize the first type of information and/or the second type of information with an accuracy above a threshold; and
  - determining that speech recognition by the remote speech recognizer is desired in response to determining that the embedded speech recognizer is not configured to recognize the first type of information and/or the second type of information with an accuracy above the threshold.
14. The computer-readable storage medium of claim 10, wherein the method further comprises:
  - recognizing the entire input speech by the embedded speech recognizer in response to determining that speech recognition by the remote speech recognizer is not desired; and
  - performing at least one action based, at least in part, on a local speech recognition result produced by the embedded speech recognizer.
15. The computer-readable storage medium of claim 10, wherein at least one of the multiple types of information in the input audio is identified as free-flow dictation and/or web-search parameters, wherein the method further comprises:
  - determining that speech recognition by the remote speech recognizer is desired in response to identifying at least one of the multiple types of information in the input audio as free-flow dictation and/or web-search parameters.

16. An electronic device for use in a distributed speech recognition system comprising the electronic device and a network device remote from the electronic device, the electronic device, comprising:
- an embedded speech recognizer configured to receive input audio uninterrupted by one or more prompts output from the electronic device, wherein the input audio comprises input speech; and
- at least one processor programmed to:
  - identify multiple types of information in the input speech;
  - determine whether speech recognition by the remote speech recognizer is desired, wherein the determining is based, at least in part, on the identified types of information in the input speech; and
  - in response to determining that speech recognition by the remote speech recognizer is desired, process a first portion of the input speech by the embedded speech recognizer and send a second portion of the input speech to the network device for recognition by the remote speech recognizer.
17. The electronic device of claim 16, wherein the second portion of the input speech is processed prior to being sent to the network device.
18. The electronic device of claim 16, wherein identifying multiple types of information in the input speech comprises processing the input speech by the embedded speech recognizer to determine the first portion of the input audio as corresponding to a first type of information and the second portion of the input audio as corresponding to a second type of information and wherein the processor is further programmed to:
  - determine whether the embedded speech recognizer is configured to recognize the first type of information and/or the second type of information with an accuracy above a threshold; and
  - determine that speech recognition by the remote speech recognizer is desired in response to determining that the embedded speech recognizer is not configured to recognize the first type of information and/or the second type of information with an accuracy above the threshold.
19. The electronic device of claim 16, wherein the embedded speech recognizer is configured to recognize the entire input speech in response to determining that speech recognition by the remote speech recognizer is not desired; and wherein the at least one processor is further programmed to perform at least one action based, at least in part, on a local speech recognition result produced by the embedded speech recognizer.
20. The distributed speech recognition system of claim 16, wherein at least one of the multiple types of information in the input audio is identified as free-flow dictation and/or web-search parameters, wherein the at least one processor is further programmed to:
  - determine that speech recognition by the remote speech recognizer is desired in response to identifying at least one of the multiple types of information in the input audio as free-flow dictation and/or web-search parameters.
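Claims 6, 7, 14 and 19 cover the local-only path, where the embedded result directly triggers an action. The sketch below maps a local transcript to one of the three action kinds named in claim 7. It only describes the action rather than performing it. The command words ("call", "text", "open") and the `Action` and `action_from_local_result` names are a hypothetical grammar chosen for illustration, not one taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    kind: str       # "phone_call", "send_text" or "web_operation" (the three actions of claim 7)
    target: str     # contact name, or the web resource to open
    body: str = ""  # message text, used only by "send_text"


def action_from_local_result(text: str) -> Optional[Action]:
    """Map a local recognition result to an action; returns None if nothing matches."""
    words = text.strip().split()
    if not words:
        return None
    verb, rest = words[0].lower(), words[1:]
    if verb == "call" and rest:
        return Action("phone_call", " ".join(rest))
    if verb == "text" and len(rest) >= 2:
        # Hypothetical convention: the first word after "text" is the recipient.
        return Action("send_text", rest[0], " ".join(rest[1:]))
    if verb == "open" and rest:
        return Action("web_operation", " ".join(rest))
    return None
```

A transcript that matches none of the patterns returns `None`. In a fuller system, that might be a cue to fall back to remote recognition rather than act on the local result.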

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Newman, Michael; Gillet, Anthony; Krowitz, David Mark; Edgington, Michael D.
Primary Examiner(s): Abebe, Daniel D
Application Number: US13/345,238
Publication Number: US 20120179464A1
Time in Patent Office: 1,096 Days
Field of Search: 704/275
US Class Current: 704/275
CPC Class Codes:
- G10L 15/22 Procedures used during a sp...
- G10L 15/30 Distributed recognition, e....
- G10L 15/32 Multiple recognisers used i...
- G10L 17/00 Speaker identification or v...