Allocation of speech recognition tasks and combination of results thereof

US 8,589,156 B2
Filed: 07/12/2004
Issued: 11/19/2013
Est. Priority Date: 07/12/2004
Status: Active Grant

Abstract

A system, method, computer-readable medium, and computer-implemented system for optimizing allocation of speech recognition tasks among multiple speech recognizers and combining recognizer results is described. An allocation determination is performed to allocate speech recognition among multiple speech recognizers using at least one of an accuracy-based allocation mechanism, a complexity-based allocation mechanism, and an availability-based allocation mechanism. The speech recognition is allocated among the speech recognizers based on the determined allocation. Recognizer results received from multiple speech recognizers in accordance with the speech recognition task allocation are combined.
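
The three allocation criteria named in the abstract (accuracy, complexity, availability) could be combined roughly as in the sketch below. It is only an illustration: the `Request` fields, the `VOCAB_THRESHOLD` and `HIGH_ACCURACY` values, and the rule that an unavailable recognizer is skipped are all assumptions, not details taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical description of one speech recognition task."""
    vocabulary_size: int       # size of the active vocabulary (complexity proxy)
    required_accuracy: float   # 0.0 to 1.0, demanded by the transaction
    server_available: bool
    mobile_available: bool = True


# Assumed tuning values; the text above gives no numbers.
VOCAB_THRESHOLD = 500
HIGH_ACCURACY = 0.95


def determine_allocation(req: Request) -> set:
    """Return the names of the recognizers that should receive the task."""
    # Availability-based: never allocate to a recognizer that cannot be reached.
    if not req.server_available:
        return {"mobile"} if req.mobile_available else set()
    if not req.mobile_available:
        return {"server"}

    # Complexity-based: a vocabulary above the threshold counts as complex.
    complex_speech = req.vocabulary_size > VOCAB_THRESHOLD
    # Accuracy-based: demanding transactions also use both recognizers.
    if complex_speech or req.required_accuracy >= HIGH_ACCURACY:
        return {"mobile", "server"}
    return {"mobile"}
```

Under these assumptions, a small-vocabulary request with modest accuracy needs stays on the device, while a large vocabulary or a strict accuracy requirement sends the task to both recognizers so that their results can later be combined.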

19 Claims

1. A system for using multiple speech recognizers, the system comprising:
- an allocation determination mechanism to determine an allocation of speech recognition tasks among multiple speech recognizers based on a complexity of a speech, wherein the multiple speech recognizers include a mobile-based speech recognizer on a mobile device and a server-based speech recognizer on a server,
  
  wherein said allocation determination mechanism is to use a threshold set on a vocabulary size to determine the complexity level of the speech,
  
  a task allocation mechanism to allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on a determination by the allocation determination mechanism; and
  
  a combination mechanism to receive results from the multiple speech recognizers and combine the results into a single result,
  
  wherein the results from each of the multiple speech recognizers include recognized words and a confidence score for each of the recognized words, and
  
  wherein, to combine the results, the combination mechanism is to compare the results from the multiple speech recognizers on a word-to-word basis and select a word from one of the multiple speech recognizers as a recognized word for the single result based on the confidence score of that word.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The system of claim 1, wherein the allocation determination mechanism is further to determine the allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.
  - 3. The system of claim 1, wherein the combination mechanism is further to use multiple confusion matrices, each corresponding to an audio environment type at the mobile device, to combine the results received from the multiple speech recognizers.
  - 4. The system of claim 3, further comprising:
    - an audio environment determination mechanism to (i) determine an environment condition of the mobile device, and (ii) based on the determined environment condition, select one of multiple confusion matrices for the mobile-device-based speech recognizer for use by the combination mechanism in combining the results.
  - 5. The system of claim 4, wherein said audio environment determination mechanism is to determine a signal to noise ratio of the speech.
  - 6. The system of claim 1, wherein the threshold for complexity is further based on a number of times a user of the mobile device has to repeat what was spoken.
  - 7. The system of claim 1, wherein the allocation determination mechanism is further to determine the allocation of the speech recognition tasks based on an accuracy requirement of a transaction attempted, and a noise level of the speech.
  - 8. The system of claim 1, wherein each of recognized words in the results from the multiple speech recognizers further includes a weighting factor for the word, and wherein the combination mechanism is further to select a word from one of the multiple speech recognizers as a recognized word for the single result based on the weighting factor of that word.
  - 9. The system of claim 8, wherein, if a word from the mobile-device-based speech recognizer matches a word from the server-based speech recognizer, the combination mechanism is to select that word as a recognized word for the single result, and if a word from the mobile-device-based speech recognizer does not match a corresponding word from the server-based speech recognizer, the combination mechanism is to combine the confidence score and weighting factor of that word to generate a comparison value, and select one of the words based on the comparison values of the words.
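
The word-to-word combination in claims 1, 8, and 9 can be pictured with the following minimal sketch. The claims do not say how the confidence score and weighting factor are combined, so multiplying them is an assumption, as are the `Word` type, the tie-breaking in favor of the server, and the requirement that both word sequences are already aligned to the same length.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    """One recognized word as returned by a (hypothetical) recognizer."""
    text: str
    confidence: float    # confidence score for this word
    weight: float = 1.0  # weighting factor; 1.0 means no adjustment


def comparison_value(word: Word) -> float:
    # Assumption: the comparison value is the product of the two numbers.
    return word.confidence * word.weight


def combine(mobile: List[Word], server: List[Word]) -> List[str]:
    """Merge two aligned word sequences into a single result."""
    if len(mobile) != len(server):
        raise ValueError("this sketch assumes the sequences are already aligned")
    result = []
    for m, s in zip(mobile, server):
        if m.text == s.text:
            # Both recognizers agree, so the word is taken as is.
            result.append(m.text)
        else:
            # Disagreement: keep the word with the larger comparison value.
            # Ties fall to the server word (an arbitrary choice).
            best = m if comparison_value(m) > comparison_value(s) else s
            result.append(best.text)
    return result
```

For example, if the mobile recognizer returns "john" with confidence 0.4 and weight 0.8 while the server returns "joan" with confidence 0.6 and weight 1.0, the comparison values are 0.32 and 0.6, so this sketch keeps "joan".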

10. A method of using multiple speech recognizers, said method comprising:
- determining an allocation of speech recognition tasks among the multiple speech recognizers based on a complexity level of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size, and wherein the multiple speech recognizers include a mobile-device-based speech recognizer on a mobile device and a server-based speech recognizer on a server;
  
  allocating the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation;
  
  receiving results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words; and
  
  combining the results to generate a single result, including comparing the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and selecting a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.
- View Dependent Claims (11, 12, 13)
- - 11. The method of claim 10, wherein determining the allocation of the speech recognition tasks is further based on at least one of a required accuracy of speech recognition output and an availability of the multiple speech recognizers.
  - 12. The method of claim 10, further comprising:
    - generating multiple confusion matrices based on different predetermined audio environment types for the mobile-device-based speech recognizer;
      
      determining an audio environment type at the mobile device; and
      
      selecting an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.
  - 13. The method of claim 10, further comprising:
    - if the complexity of the speech is below the threshold, allocating the speech recognition tasks to the mobile-device-based speech recognizer, and if the results provided by the mobile-device-based speech recognizer are below a predetermined threshold, allocating the speech recognition tasks to the server-based speech recognizer for re-processing.
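
The re-processing step in claim 13 might look like the sketch below. It reads "results below a predetermined threshold" as an overall confidence value falling under `MIN_CONFIDENCE`, which is an interpretation rather than something the claim states. The recognizers are stand-in callables, both threshold values are invented, and the branch for complex speech (sending it straight to the server) is a simplification that claim 13 does not cover.

```python
from typing import Callable, List, Tuple

# A hypothetical recognizer takes audio bytes and returns (words, confidence).
Recognizer = Callable[[bytes], Tuple[List[str], float]]

VOCAB_THRESHOLD = 500   # assumed complexity threshold on vocabulary size
MIN_CONFIDENCE = 0.6    # assumed "predetermined threshold" on result quality


def recognize_with_fallback(
    audio: bytes,
    vocabulary_size: int,
    mobile: Recognizer,
    server: Recognizer,
) -> Tuple[List[str], str]:
    """Return the recognized words and the name of the recognizer used."""
    if vocabulary_size >= VOCAB_THRESHOLD:
        # Not part of claim 13: this sketch simply uses the server here.
        words, _ = server(audio)
        return words, "server"

    # Simple speech: try the on-device recognizer first.
    words, confidence = mobile(audio)
    if confidence < MIN_CONFIDENCE:
        # The mobile result is too weak, so the server re-processes the audio.
        words, _ = server(audio)
        return words, "server"
    return words, "mobile"
```

Trying the device first keeps simple requests local and only pays the network cost when the on-device result is not trusted.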

14. A non-transitory computer-readable medium, on which is stored machine executable instructions which when executed by a processor cause the processor to:
- determine an allocation of speech recognition tasks among multiple speech recognizers based on a complexity of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size and wherein the multiple speech recognizers include a mobile-device-based speech recognizer on a mobile device and a server-based speech recognizer on a server;
  
  allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation;
  
  receive results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words; and
  
  combine the results to generate a single result, including compare the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and select a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.
- View Dependent Claims (15, 16)
- - 15. The non-transitory computer-readable medium of claim 14, wherein the machine readable instructions, when executed by the processor, are further to cause the processor to determine the allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.
  - 16. The non-transitory computer-readable medium of claim 14, further comprising instructions which, when executed by the processor, cause the processor to:
    - generate, for the mobile-device-based speech recognizer, multiple confusion matrices based on different predetermined audio environment types; and
      
      determine an audio environment type at the mobile device and select an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.
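
Claims 3 to 5, 12, and 16 describe picking a confusion matrix according to the audio environment, with claim 5 naming the signal-to-noise ratio as one measurement. A minimal sketch of that selection step follows. The environment names, the 20 dB and 10 dB cut-offs, and the nested-dictionary layout of a confusion matrix are assumptions made for illustration; only the decibel formula is standard.

```python
import math
from typing import Dict, Sequence

# Assumed layout: recognized word -> intended word -> probability.
ConfusionMatrix = Dict[str, Dict[str, float]]


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels from raw sample values.

    Assumes both sequences are non-empty and the noise is not all zeros.
    """
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def environment_type(snr: float) -> str:
    """Map an SNR value to one of three assumed environment types."""
    if snr >= 20.0:
        return "quiet"
    if snr >= 10.0:
        return "moderate"
    return "noisy"


def select_matrix(
    matrices: Dict[str, ConfusionMatrix], snr: float
) -> ConfusionMatrix:
    """Pick the pre-generated matrix that matches the measured environment."""
    return matrices[environment_type(snr)]
```

The selected matrix would then be handed to the combination step, for instance to discount mobile words that are often confused in noisy conditions; how exactly it is applied is not spelled out in the claims.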

17. A computer-implemented system for allocating speech recognition tasks among multiple speech recognizers, the system comprising:
- a processor; and
  
  a memory coupled to the processor, the memory having stored therein instructions causing the processor to;
  
  determine an allocation of the speech recognition tasks among multiple speech recognizers based on a complexity of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size, and wherein the multiple speech recognizers include a mobile-based speech recognizer on a mobile device and a server-based speech recognizer on a server;
  
  allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation, and
  
  receive results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words;
  
  combine the results to generate a single result, including compare the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and select a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.
- View Dependent Claims (18, 19)
- - 18. The system of claim 17, wherein the instructions, when executed, are further to cause the processor to determine an allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.
  - 19. The system of claim 17, further comprising instructions which, when executed by the processor, cause the processor to:
    - generate, for the mobile-device-based speech recognizer, multiple confusion matrices based on different predetermined audio environment types; and
      
      determine an audio environment type at the mobile device and select an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.

Current Assignee: Valtrus Innovations Limited (f/k/a Dolya Holdco 9 Limited) (Key Patent Innovations Limited)
Original Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Inventors: Burke, Paul M.; Yacoub, Sherif
Primary Examiner(s): Godbold, Douglas
Application Number: US10/888,593
Publication Number: US 20060009980A1
Time in Patent Office: 3,417 Days
Field of Search: 704/231, 704/233, 704/236, 704/251, 704/270, 455/563
US Class Current: 704/231

CPC Class Codes
G10L 15/10   using distance or distortio...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
G10L 2015/227   of the speaker; Human-fact...
