Speech recognition system having multiple speech recognizers

US 7,228,275 B1
Filed: 01/13/2003
Issued: 06/05/2007
Est. Priority Date: 10/21/2002
Status: Expired due to Fees

Abstract

A speech recognition system recognizes an input speech signal by using a first speech recognizer and a second speech recognizer each coupled to a decision module. Each of the first and second speech recognizers outputs first and second recognized speech texts and first and second associated confidence scores, respectively, and the decision module selects either the first or the second speech text depending upon which of the first or second confidence score is higher. The decision module may also adjust the first and second confidence scores to generate first and second adjusted confidence scores, respectively, and select either the first or second speech text depending upon which of the first or second adjusted confidence scores is higher. The first and second confidence scores may be adjusted based upon the location of a speaker, the identity or accent of the speaker, the context of the speech, and the like.
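
The basic selection described in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `Recognition` type, the `select_output` function, and the tie-handling rule (ties go to the first recognizer) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognition:
    """Output of one recognizer (hypothetical type for illustration)."""
    text: str
    confidence: float  # assumed to be on a scale shared by both recognizers


def select_output(first: Recognition, second: Recognition) -> str:
    """Return the speech text whose confidence score is higher.

    Assumption: on a tie the first recognizer's text is returned.
    """
    return first.text if first.confidence >= second.confidence else second.text
```

For example, `select_output(Recognition("call home", 0.9), Recognition("cold foam", 0.4))` returns the first recognizer's text because its score is higher.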

26 Claims

1. A speech recognition system for recognizing an input speech signal, the speech recognition system comprising:
- a first speech recognizer recognizing the input speech signal to generate a first speech text and a first confidence score indicating a level of accuracy of the first speech text;
  
  a second speech recognizer recognizing the input speech signal to generate a second speech text and a second confidence score indicating a level of accuracy of the second speech text;
  
  a computerized decision module coupled to the first speech recognizer and the second speech recognizer for selecting either the first speech text or the second speech text as an output speech text, wherein the decision module receives external data selected from a group consisting of location information of a speaker of the input speech signal, the accent of the speaker, and the identity of the speaker;
  
  the decision module adjusts the first confidence score to generate a first adjusted confidence score based upon the external data; and
  
  the decision module selects the first speech text if the first adjusted confidence score is higher than the second confidence score.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The speech recognition system of claim 1, wherein the first adjusted confidence score is generated by multiplying a first speech selection parameter with the first confidence score and adding a second speech selection parameter.
  - 3. The speech recognition system of claim 1, wherein:
    - the decision module also adjusts the second confidence score to generate a second adjusted confidence score, based upon the external data; and
      
      the decision module selects the first speech text if the first adjusted confidence score is higher than the second adjusted confidence score.
  - 4. The speech recognition system of claim 1, further comprising a natural interaction and dialog interpretation module coupled to the decision module for receiving the output speech text and discerning the meaning of the output speech text, and wherein:
    - the decision module receives control data indicating the context of the input speech signal from the natural interaction and dialog interpretation module; and
      
      the decision module adjusts the first confidence score to generate the first adjusted confidence score based upon the control data.
  - 5. The speech recognition system of claim 1, further comprising a natural interaction and dialog interpretation module coupled to the decision module for receiving the output speech text and discerning the meaning of the output speech text, and wherein:
    - the decision module receives control data indicating the context of the input speech signal from the natural interaction and dialog interpretation module;
      
      the decision module adjusts the first confidence score and the second confidence score to generate the first adjusted confidence score and a second adjusted confidence score, respectively, based upon the control data; and
      
      the decision module selects the first speech text if the first adjusted confidence score is higher than the second adjusted confidence score.
  - 6. The speech recognition system of claim 1, further comprising a natural interaction and dialog interpretation module coupled to the decision module for receiving the output speech text and discerning the meaning of the output speech text, the natural interaction and dialog interpretation module generating a control signal for controlling a back-end application coupled to the speech recognition system based upon the discerned meaning of the output speech text.
  - 7. The speech recognition system of claim 1, wherein the first speech recognizer is a grammar-based speech recognizer and the second speech recognizer is a statistical speech recognizer.
  - 8. The speech recognition system of claim 1, wherein the first and second confidence scores are normalized to a common range.
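
Claims 2 and 8 together suggest a simple arithmetic pipeline: bring both scores onto a common range, then apply a linear adjustment to the first score before comparing. A minimal sketch, assuming a target range of [0, 1] and using hypothetical names (`normalize`, `adjust`, `first_wins`) that do not appear in the patent:

```python
def normalize(score: float, lo: float, hi: float) -> float:
    """Map a recognizer-specific score in [lo, hi] onto the common range [0, 1].

    The [0, 1] target range is an assumption; the claims only require
    that both scores share a common range.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (score - lo) / (hi - lo)


def adjust(score: float, scale: float, offset: float) -> float:
    """Multiply the score by one parameter and add a second (claim 2)."""
    return scale * score + offset


def first_wins(first_score: float, second_score: float,
               scale: float, offset: float) -> bool:
    """True if the adjusted first score is higher than the second score (claim 1)."""
    return adjust(first_score, scale, offset) > second_score
```

With `scale = 1.0` and `offset = 0.25`, a first score of 0.5 is adjusted to 0.75 and beats a second score of 0.6, whereas the unadjusted 0.5 would lose.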

9. A computer-implemented method of recognizing an input speech signal to generate an output speech text, the method comprising:
- recognizing the input speech signal using a first speech recognizer to generate a first speech text and a first confidence score indicating a level of accuracy of the first speech text;
  
  recognizing the input speech signal using a second speech recognizer to generate a second speech text and a second confidence score indicating a level of accuracy of the second speech text;
  
  receiving external data selected from a group consisting of location information of a speaker of the input speech signal, the accent of the speaker, and the identity of the speaker;
  
  adjusting the first confidence score to generate a first adjusted confidence score based upon the external data; and
  
  selecting the first speech text as the output speech text if the first adjusted confidence score is higher than the second confidence score.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19):
  - 10. The method of claim 9, wherein adjusting the first confidence score comprises multiplying a first speech selection parameter with the first confidence score and adding a second speech selection parameter.
  - 11. The method of claim 9, wherein adjusting the first confidence score comprises:
    - determining a state corresponding to the received external data;
      
      selecting a first parameter and a second parameter corresponding to the determined state; and
      
      modifying the first confidence score by multiplying the first parameter with the first confidence score and adding the second parameter to generate the first adjusted confidence score.
  - 12. The method of claim 9, further comprising:
    - adjusting the second confidence score to generate a second adjusted confidence score based upon the external data; and
      
      selecting the first speech text if the first adjusted confidence score is higher than the second adjusted confidence score.
  - 13. The method of claim 9, further comprising:
    - receiving control data indicating the context of the input speech signal;
      
      adjusting the first confidence score to generate the first adjusted confidence score based upon the control data.
  - 14. The method of claim 13, wherein adjusting the first confidence score comprises:
    - determining a state corresponding to the received control data;
      
      selecting a first parameter and a second parameter corresponding to the determined state; and
      
      modifying the first confidence score by multiplying the first parameter with the first confidence score and adding the second parameter to generate the first adjusted confidence score.
  - 15. The method of claim 9, further comprising:
    - receiving control data indicating the context of the input speech signal;
      
      adjusting the first confidence score and the second confidence score to generate the first adjusted confidence score and a second adjusted confidence score, respectively, based upon the control data; and
      
      selecting the first speech text if the first adjusted confidence score is higher than the second adjusted confidence score.
  - 16. The method of claim 9, wherein the first speech recognizer is a grammar-based speech recognizer and the second speech recognizer is a statistical speech recognizer.
  - 17. The method of claim 9, further comprising rejecting both the first speech text and the second speech text as being unreliable if both the first confidence score and the second confidence score are below a threshold.
  - 18. The method of claim 9, further comprising selecting the first speech text as a default output speech text if the first confidence score and the second confidence score are same.
  - 19. The method of claim 9, wherein the first confidence score and the second confidence score are normalized to a common range.
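
The state-based adjustment of claim 11, the rejection rule of claim 17, and the default rule of claim 18 can be combined into one decision routine. The sketch below is hedged throughout: the state names, the parameter values in `PARAMS`, the threshold value, the order in which the rules are checked, and the choice to fall back to the second text when the first does not win are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical state table: state name -> (multiplier, offset).
PARAMS = {
    "default": (1.0, 0.0),
    "known_speaker": (1.0, 0.125),
    "strong_accent": (0.5, 0.0),
}


def state_for(external_data: dict) -> str:
    """Determine a state from the external data (claim 11, first step)."""
    if external_data.get("speaker_id") is not None:
        return "known_speaker"
    if external_data.get("accent") is not None:
        return "strong_accent"
    return "default"


def decide(first_text: str, first_score: float,
           second_text: str, second_score: float,
           external_data: dict, threshold: float = 0.3) -> Optional[str]:
    """Return the selected text, or None if both results are rejected."""
    # Claim 17: reject both texts if both raw scores are below a threshold.
    if first_score < threshold and second_score < threshold:
        return None
    # Claim 18: equal raw scores default to the first text.
    if first_score == second_score:
        return first_text
    # Claim 11: select parameters for the state, then scale and offset.
    scale, offset = PARAMS[state_for(external_data)]
    adjusted_first = scale * first_score + offset
    # Claim 9: the first text wins if its adjusted score is higher.
    return first_text if adjusted_first > second_score else second_text
```

For instance, `decide("call home", 0.5, "cold foam", 0.6, {"speaker_id": "u1"})` adjusts the first score to 0.625 under the assumed `known_speaker` parameters, so the first text is selected.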

20. A computerized decision module for use in a speech recognition system that recognizes an input speech signal to generate an output speech text by using a first speech recognizer and a second speech recognizer, the first speech recognizer recognizing the input speech signal to generate a first speech text and a first confidence score, and the second speech recognizer recognizing the input speech signal to generate a second speech text and a second confidence score, wherein:
- the computerized decision module is coupled to the first speech recognizer and the second speech recognizer to select either the first speech text or the second speech text as the output speech text;
  
  the decision module receives external data selected from a group consisting of location information of a speaker of the input speech signal, the accent of the speaker, and the identity of the speaker;
  
  the decision module adjusts the first confidence score to generate a first adjusted confidence score based upon the external data; and
  
  the decision module selects the first speech text if the first adjusted confidence score is higher than the second confidence score.
- Dependent claims (21, 22, 23, 24, 25, 26):
  - 21. The decision module of claim 20, wherein the first adjusted confidence score is generated by multiplying a first speech selection parameter with the first confidence score and adding a second speech selection parameter.
  - 22. The decision module of claim 20, wherein:
    - the decision module also adjusts the second confidence score to generate a second adjusted confidence score, based upon the external data; and
      
      the decision module selects the first speech text if the first adjusted confidence score is higher than the second adjusted confidence score.
  - 23. The decision module of claim 20, wherein:
    - the decision module is further coupled to a natural interaction and dialog interpretation module that receives the output speech text and discerns the meaning of the output speech text, the decision module receiving control data indicating the context of the input speech signal from the natural interaction and dialog interpretation module; and
      
      the decision module adjusts the first confidence score to generate the first adjusted confidence score based upon the control data.
  - 24. The decision module of claim 20, wherein:
    - the decision module is further coupled to a natural interaction and dialog interpretation module that receives the output speech text and discerns the meaning of the output speech text, the decision module receiving control data indicating the context of the input speech signal from the natural interaction and dialog interpretation module;
      
      the decision module adjusts the first confidence score and the second confidence score to generate the first adjusted confidence score and a second adjusted confidence score, respectively, based upon the control data; and
      
      the decision module selects the first speech text if the first adjusted confidence score is higher than the second adjusted confidence score.
  - 25. The decision module of claim 20, wherein the first speech recognizer is a grammar-based speech recognizer and the second speech recognizer is a statistical speech recognizer.
  - 26. The decision module of claim 20, wherein the first and second confidence scores are normalized to a common range.
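
Claims 23 and 24 describe a feedback loop: the dialog interpretation module sends control data about the context back to the decision module, which then adjusts both scores. One way this might look is sketched below. The class name, the `set_context` method, the context labels, and the use of a linear adjustment for both scores are assumptions; the claims do not prescribe an interface.

```python
from typing import Dict, Tuple

Params = Tuple[float, float]  # (multiplier, offset)


class DecisionModule:
    """Hypothetical decision module that keeps the latest dialog context."""

    def __init__(self, params_by_context: Dict[str, Tuple[Params, Params]]):
        # Each context maps to (first-recognizer params, second-recognizer params).
        # Assumption: the table always contains a "default" entry.
        self._params = params_by_context
        self._context = "default"

    def set_context(self, context: str) -> None:
        """Accept control data from the interpretation module."""
        self._context = context if context in self._params else "default"

    def select(self, first: Tuple[str, float], second: Tuple[str, float]) -> str:
        """Adjust both scores for the current context and pick the higher one."""
        (a1, b1), (a2, b2) = self._params[self._context]
        adjusted_first = a1 * first[1] + b1
        adjusted_second = a2 * second[1] + b2
        return first[0] if adjusted_first > adjusted_second else second[0]
```

A usage sketch: with `{"default": ((1.0, 0.0), (1.0, 0.0)), "menu_command": ((1.0, 0.25), (1.0, 0.0))}`, calling `set_context("menu_command")` favors the first recognizer (for example a grammar-based one, as in claim 25) while the dialog expects a command.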

Current Assignee: iAnywhere Solutions Incorporated (SAP SE); Toyota Infotechnology Center Co., Ltd. (Toyota Motor Corporation)
Original Assignee: iAnywhere Solutions Incorporated (SAP SE); Toyota Infotechnology Center Co., Ltd. (Toyota Motor Corporation)
Inventors: Endo, Norikazu; Hodjat, Babak; Funaki, Masahiko; Brookes, John R.; Reaves, Benjamin K.
Primary Examiner(s): Chawan, Vijay
Application Number: US10/341,873
Time in Patent Office: 1,604 Days
Field of Search: 704/235, 704/255, 704/256, 704/240, 704/231, 704/251, 704/260
US Class Current: 704/235
CPC Class Codes: G10L 15/32 Multiple recognisers used i...
