Configurable speech recognition system using multiple recognizers

US 10,049,669 B2
Filed: 01/06/2012
Issued: 08/14/2018
Est. Priority Date: 01/07/2011
Status: Active Grant

Abstract

Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
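The abstract does not spell out the combination logic, so the following is only a minimal sketch of how a provider-configurable policy might pick between a local and a remote result. The names `Result`, `Policy`, `local_accept_threshold`, `allow_remote`, and `combine` are hypothetical, and the highest-confidence rule is an assumption rather than the patented method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]
    source: str        # "local" or "remote"


@dataclass
class Policy:
    # Hypothetical provider-configurable settings (not named in the patent).
    local_accept_threshold: float = 0.8  # at or above this, skip the network
    allow_remote: bool = True            # False trades accuracy for bandwidth


def combine(local: Result, remote: Optional[Result], policy: Policy) -> Result:
    """Pick one result from the local and (optional) remote recognizers."""
    # A confident local result is returned at once, which keeps latency low.
    if local.confidence >= policy.local_accept_threshold:
        return local
    # Without a usable remote result, the local one is all there is.
    if not policy.allow_remote or remote is None:
        return local
    # Assumed rule: prefer whichever recognizer reported higher confidence.
    return remote if remote.confidence > local.confidence else local
```

Raising `local_accept_threshold` in this sketch sends more utterances to the network (more bandwidth, potentially better accuracy), while lowering it favors latency.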

22 Claims

1. A method of performing speech recognition in a distributed system comprising an electronic device including an embedded speech recognizer and a network device including a remote speech recognizer remote from the electronic device, the method comprising:
- receiving, by the electronic device, input audio comprising speech;
  
  accessing, prior to sending at least a portion of the input audio to the network device, personal information stored on the electronic device, wherein the personal information is associated with the electronic device and/or a user of the electronic device;
  
  processing, by the embedded speech recognizer prior to sending at least a portion of the input audio to the network device, at least a portion of the input audio to produce recognized speech and a confidence value for the recognized speech, wherein the processing is based, at least in part, on the accessed personal information stored on the electronic device;
  
  determining based, at least in part, on the confidence value, whether to send at least a portion of the input audio to the network device for speech recognition by the remote speech recognizer, wherein the remote speech recognizer does not have access to the personal information stored on the electronic device; and
  
  sending, in response to determining to send at least a portion of the input audio to the network device, the at least a portion of the input audio and at least some of the accessed personal information stored on the electronic device to the network device.
  - 2. The method of claim 1, wherein processing at least a portion of the input audio comprises:
    - identifying a type of voice command associated with the input audio;
      
      selecting a speech recognition grammar and/or a recognition vocabulary based, at least in part, on the identified type of voice command; and
      
      wherein processing the at least a portion of the input audio comprises processing the at least a portion of the input audio using the selected speech recognition grammar and/or the selected recognition vocabulary.
  - 3. The method of claim 1, further comprising:
    - selecting a recognition vocabulary based, at least in part, on the personal information; and
      
      wherein processing at least a portion of the input audio comprises processing the at least a portion of the input audio using the selected recognition vocabulary.
  - 4. The method of claim 3, wherein the personal information includes a recent call list stored on the electronic device and wherein the selected recognition vocabulary includes words associated with the recent call list.
  - 5. The method of claim 1, further comprising:
    - receiving a remote speech recognition result from the network device; and
      
      performing at least one action based, at least in part, on the remote speech recognition result.
  - 6. The method of claim 1, wherein the personal information is selected from the group consisting of a recent call list, a task list, and calendar information.
  - 7. The method of claim 1, further comprising:
    - determining that the user of the electronic device is requesting to initiate a call; and
      
      wherein processing the at least a portion of the input audio comprises processing at least a portion of the input audio based, at least in part, on a recognition vocabulary associated with entries in a contact list stored on the electronic device.
  - 8. The method of claim 1, further comprising:
    - determining that the user of the electronic device is requesting to initiate a web-based search;
      
      selecting in response to the determining that the user of the electronic device is requesting to initiate a web-based search, a search grammar; and
      
      wherein processing the at least a portion of the input audio comprises processing the at least a portion of the input audio using the selected search grammar.
  - 9. The method of claim 1, wherein processing the at least a portion of the input audio comprises:
    - configuring the embedded speech recognizer based, at least in part, on the personal information; and
      
      performing at least one action based, at least in part, on the recognized speech produced by the embedded speech recognizer.
  - 10. The method of claim 9, wherein the at least one action is selected from the group consisting of initiating a call, sending a communication including text, and performing a web-based search.
  - 11. The method of claim 1, further comprising:
    - sending, in response to determining to send at least a portion of the input audio to the network device, the recognized speech corresponding to the at least a portion of the input audio to the network device.
  - 12. The method of claim 1, wherein the at least some of the accessed personal information stored on the electronic device that is sent to the network device includes information associated with a contact list.
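The steps of claim 1, together with the contact-list detail of claim 12, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the names `PersonalInfo`, `LocalResult`, `recognize_on_device`, `embedded`, and `send_to_network` are hypothetical, and comparing the confidence value against a fixed `threshold` of 0.7 is only one possible way to make the claimed determination.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PersonalInfo:
    # Personal information stored on the device (hypothetical fields).
    contacts: List[str] = field(default_factory=list)
    recent_calls: List[str] = field(default_factory=list)


@dataclass
class LocalResult:
    text: str          # recognized speech from the embedded recognizer
    confidence: float  # confidence value for that recognized speech


def recognize_on_device(
    audio: bytes,
    info: PersonalInfo,
    embedded: Callable[[bytes, PersonalInfo], LocalResult],
    send_to_network: Callable[[bytes, Dict[str, List[str]]], None],
    threshold: float = 0.7,  # assumed cut-off; the claim names no value
) -> Tuple[LocalResult, bool]:
    """Run local recognition first, then decide whether to involve the network."""
    # The embedded recognizer processes the audio using personal information
    # before anything is sent to the network device.
    local = embedded(audio, info)
    # Determine, based on the confidence value, whether to send the audio.
    needs_remote = local.confidence < threshold
    if needs_remote:
        # Send the audio plus some of the accessed personal information
        # (here, the contact list, as in claim 12).
        send_to_network(audio, {"contacts": list(info.contacts)})
    return local, needs_remote
```

Passing the recognizer and the network call in as functions keeps the sketch independent of any particular speech engine or transport.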

13. A non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by at least one processor on an electronic device in a distributed speech recognition system comprising the electronic device having an embedded speech recognizer and a network device having a remote speech recognizer remote from the electronic device, perform a method comprising:
- receiving, by the electronic device, input audio comprising speech;
  
  accessing, prior to sending at least a portion of the input audio to the network device, personal information stored on the electronic device, wherein the personal information is associated with the electronic device and/or a user of the electronic device;
  
  processing, by the embedded speech recognizer prior to sending at least a portion of the input audio to the network device, at least a portion of the input audio to produce recognized speech and a confidence value for the recognized speech, wherein the processing is based, at least in part, on the accessed personal information stored on the electronic device;
  
  determining based, at least in part, on the confidence value, whether to send at least a portion of the input audio to the network device for speech recognition by the remote speech recognizer, wherein the remote speech recognizer does not have access to the personal information stored on the electronic device; and
  
  sending, in response to determining to send at least a portion of the input audio to the network device, the at least a portion of the input audio and at least some of the accessed personal information stored on the electronic device to the network device.
  - 14. The computer-readable storage medium of claim 13, wherein processing the at least a portion of the input audio comprises:
    - identifying a type of voice command associated with the input audio;
      
      selecting a speech recognition grammar and/or a recognition vocabulary based, at least in part, on the identified type of voice command; and
      
      wherein processing the at least a portion of the input audio comprises processing the at least a portion of the input audio using the selected speech recognition grammar and/or the selected recognition vocabulary.
  - 15. The computer-readable storage medium of claim 13, further comprising:
    - selecting a recognition vocabulary based, at least in part, on the personal information; and
      
      wherein processing the at least a portion of the input audio comprises processing the at least a portion of the input audio using the selected recognition vocabulary.
  - 16. The computer-readable storage medium of claim 13, further comprising:
    - determining that the user of the electronic device is requesting to initiate a web-based search;
      
      selecting in response to the determining that the user of the electronic device is requesting to initiate a web-based search, a search grammar; and
      
      wherein processing the at least a portion of the input audio comprises processing the at least a portion of the input audio using the selected search grammar.
  - 17. The computer-readable storage medium of claim 13, wherein processing the at least a portion of the input audio comprises:
    - configuring the embedded speech recognizer based, at least in part, on the personal information; and
      
      performing at least one action based, at least in part, on the recognized speech produced by the embedded speech recognizer.
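Claims 14 through 16 describe selecting a grammar or vocabulary from the type of voice command and from personal information. One hedged way to picture that selection is shown below. The keyword-based `identify_command_type`, the `select_vocabulary` helper, the dictionary keys, and the `<search-grammar>` placeholder are all hypothetical; the claims do not say how the command type is identified.

```python
from typing import Dict, List


def identify_command_type(leading_words: str) -> str:
    """Classify a command from its leading words (assumed keyword heuristic)."""
    words = leading_words.strip().lower()
    if words.startswith(("call", "dial")):
        return "call"
    if words.startswith(("search", "find")):
        return "web_search"
    return "other"


def select_vocabulary(command_type: str, personal_info: Dict[str, List[str]]) -> List[str]:
    """Choose the recognition vocabulary for the embedded recognizer."""
    if command_type == "call":
        # Bias recognition toward names stored on the device.
        names = personal_info.get("contacts", []) + personal_info.get("recent_calls", [])
        return sorted(set(names))
    if command_type == "web_search":
        # Placeholder standing in for a search grammar.
        return ["<search-grammar>"]
    # No specialised vocabulary for other command types in this sketch.
    return []
```

For example, `select_vocabulary("call", {"contacts": ["Bob", "Alice"], "recent_calls": ["Alice"]})` yields the de-duplicated, sorted name list `["Alice", "Bob"]`.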

18. An electronic device for use in a distributed speech recognition system comprising the electronic device and a network device remote from the electronic device, the electronic device, comprising:
- at least one storage device configured to store personal information associated with the electronic device and/or a user of the electronic device; and
  
  an embedded speech recognizer configured to:
  
  receive input audio comprising speech;
  
  access, prior to sending at least a portion of the input audio to the network device, personal information stored on the electronic device, wherein the personal information is associated with the electronic device and/or a user of the electronic device;
  
  process, prior to sending at least a portion of the input audio to the network device, at least a portion of the input audio to produce recognized speech and a confidence value for the recognized speech, wherein the processing is based, at least in part, on the accessed personal information stored on the electronic device; and
  
  at least one processor programmed to:
  
  determine based, at least in part, on the confidence value, whether to send at least a portion of the input audio to the network device for speech recognition by the remote speech recognizer, wherein the remote speech recognizer does not have access to the personal information stored on the electronic device; and
  
  send, in response to determining to send at least a portion of the input audio to the network device, the at least a portion of the input audio and at least some of the accessed personal information stored on the electronic device to the network device.
  - 19. The electronic device of claim 18, wherein processing at least a portion of the input audio comprises:
    - identifying a type of voice command associated with the input audio;
      
      selecting a speech recognition grammar and/or a recognition vocabulary based, at least in part, on the identified type of voice command; and
      
      processing the at least a portion of the input audio using the selected speech recognition grammar and/or the selected recognition vocabulary.
  - 20. The electronic device of claim 18, wherein the at least one processor is further programmed to select a recognition vocabulary based, at least in part, on the personal information; and
    - wherein processing the at least a portion of the input audio comprises processing the at least a portion of the input audio using the selected recognition vocabulary.
  - 21. The electronic device of claim 18, wherein the at least one processor is further programmed to:
    - receive a remote speech recognition result from the network device; and
      
      perform at least one action based, at least in part, on the remote speech recognition result.
  - 22. The electronic device of claim 18, wherein the at least one processor is further programmed to:
    - configure the embedded speech recognizer based, at least in part, on the personal information; and
      
      perform at least one action based, at least in part, on the recognized speech produced by the embedded speech recognizer.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Newman, Michael; Gillet, Anthony; Krowitz, David Mark; Edgington, Michael D.
Primary Examiner(s): Godbold, Douglas
Assistant Examiner(s): Villena, Mark
Application Number: US13/345,173
Publication Number: US 20120179463A1
Time in Patent Office: 2,412 Days
Field of Search: 704/231, 704/270.1, 704/275
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
G10L 17/00   Speaker identification or v...
