USING WORD CONFIDENCE SCORE, INSERTION AND SUBSTITUTION THRESHOLDS FOR SELECTED WORDS IN SPEECH RECOGNITION

US 20100106505A1
Filed: 10/24/2008
Published: 04/29/2010
Est. Priority Date: 10/24/2008
Status: Active Grant

First Claim

1. A method for recognizing speech in acoustic data, comprising:

generating at least one hypothetical word (HYP) in a decoder;

deriving a word confidence score (WCS) for each HYP; and

determining a modified hypothetical word (mHYP) for each HYP based on the HYP and the WCS for each HYP.

1 Assignment

0 Petitions

Abstract

A method and system for improving the accuracy of a speech recognition system using word confidence score (WCS) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate in which deletion errors are weighted more heavily than substitution and insertion errors. The occurrence distribution of WCS differs depending on whether a word was correctly identified and, if it was not, on the type of error. These distributions are used to determine WCS thresholds for insertion and substitution errors. The hypothetical word (HYP) output by the decoder is then processed to determine a modified HYP (mHYP). In some circumstances, depending on how the WCS compares with the insertion and substitution threshold values, mHYP is set to null, to a substituted HYP, or to HYP itself.
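The per-word decision summarized above (and spelled out in claims 7 to 11) can be sketched as a small rule. This is a minimal illustration, assuming each word on the selected word list stores an insertion threshold, a substitution threshold, and a single substitute word learned during tuning. The `WordThresholds` structure, the function name `modify_hyp`, and the handling of a WCS exactly equal to a threshold are hypothetical choices, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordThresholds:
    """Per-word values from a tuning phase (hypothetical structure)."""
    insertion: float     # WCS below this: treat HYP as an insertion error
    substitution: float  # WCS below this (but not below insertion): substitute
    substitute: str      # word that most often replaced HYP during tuning


def modify_hyp(hyp: str, wcs: float,
               selected: dict[str, WordThresholds]) -> Optional[str]:
    """Return mHYP for one decoder output word; None stands for a null output."""
    params = selected.get(hyp)
    if params is None:
        # HYP is not on the selected word list: pass it through unchanged.
        return hyp
    if wcs < params.insertion:
        # Very low confidence: likely an insertion, so output nothing.
        return None
    if wcs < params.substitution:
        # Middling confidence: likely a substitution, so swap in the
        # substitute word. (Ties at a threshold fall upward: an assumption.)
        return params.substitute
    # Confidence at or above both thresholds: keep the decoder's word.
    return hyp


# Usage with made-up thresholds for one selected word.
selected = {"five": WordThresholds(insertion=0.3, substitution=0.6, substitute="nine")}
print([modify_hyp("five", s, selected) for s in (0.2, 0.45, 0.8)])  # [None, 'nine', 'five']
```

Keeping the thresholds per word, rather than one global cut-off, mirrors the claims' per-word insertion and substitution threshold values.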

24 Citations

20 Claims

1. A method for recognizing speech in acoustic data, comprising:
- generating at least one hypothetical word (HYP) in a decoder;
  
  deriving a word confidence score (WCS) for each HYP; and
  
  determining a modified hypothetical word (mHYP) for each HYP based on the HYP and the WCS for each HYP.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14):
  - 2. The method of claim 1, further comprising developing a selected word list.
  - 3. The method of claim 2, further comprising determining an insertion threshold value for each word on the selected word list.
  - 4. The method of claim 3, further comprising determining a substitution threshold value for each word on the selected word list.
  - 5. The method of claim 4 wherein the substitution threshold value is greater than the insertion threshold value for each word on the selected word list.
  - 6. The method of claim 4, further comprising:
    - conducting a tuning phase on each word to provide an occurrence distribution in WCS for such situations as:
      
      word is correctly identified, word is substituted, and word is inserted, wherein the insertion and substitution threshold values are based at least in part on WCS occurrence distributions.
  - 7. The method of claim 6, wherein mHYP is equal to HYP when HYP is absent from the selected word list.
  - 8. The method of claim 6, wherein mHYP is equal to HYP when the WCS is greater than the HYP's insertion threshold value and the WCS is greater than the HYP's substitution threshold value.
  - 9. The method of claim 6, wherein mHYP is a null when HYP is on the selected word list and the WCS is less than the HYP's insertion threshold value.
  - 10. The method of claim 6, wherein mHYP is a substituted HYP when HYP is on the selected word list, the WCS is less than the HYP's substitution threshold value, and WCS is greater than the HYP's insertion threshold value.
  - 11. The method of claim 10, wherein the substituted HYP is determined in the tuning phase and the substituted HYP is a frequently substituted word for HYP when a substitution error occurs.
  - 12. The method of claim 2, further comprising:
    - determining at least two substitution HYPs for at least one word on the selected word list;
      
      determining a substitution threshold for each substitution HYP;
      
      outputting mHYP as one of the substitution HYPs based on a comparison of WCS with the substitution thresholds.
  - 13. The method of claim 2, further comprising:
    - comparing a transcription of an audio file with resulting HYP words and determining an error rate, wherein the selected word list comprises HYP words that have a high error rate.
  - 14. The method of claim 13, wherein the selected word list is based on the frequency of occurrence of the word, with words occurring more often being more likely to be on the selected word list and words occurring less often being less likely to be on the selected word list.
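
Claim 6 says the thresholds are "based at least in part on" the WCS occurrence distributions, without fixing a rule. One possible rule, sketched here purely as an assumption, is to place each threshold where it misclassifies the fewest tuning samples: the insertion threshold separates inserted words from everything else, and the substitution threshold, searched only at or above it (cf. claim 5), separates substituted words from correct ones. The substitute word is taken as the reference word that most often replaced HYP (cf. claim 11). `TuningSample`, `best_split`, and `estimate_thresholds` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class TuningSample(NamedTuple):
    wcs: float               # confidence score the decoder gave this HYP
    label: str               # "correct", "sub" or "ins", from aligning HYP to the reference
    ref_word: Optional[str]  # reference word for substitutions, else None


def best_split(low: list[float], high: list[float], candidates: list[float]) -> float:
    """Threshold t minimising errors under the rule 'WCS < t -> low class'.
    min() keeps the first minimum, so ties go to the smallest t."""
    def errors(t: float) -> int:
        return sum(s >= t for s in low) + sum(s < t for s in high)
    return min(candidates, key=errors)


def estimate_thresholds(samples: Iterable[TuningSample]) -> tuple[float, float, Optional[str]]:
    samples = list(samples)
    ins = [s.wcs for s in samples if s.label == "ins"]
    sub = [s.wcs for s in samples if s.label == "sub"]
    cor = [s.wcs for s in samples if s.label == "correct"]
    candidates = sorted({s.wcs for s in samples})
    # Insertion threshold: inserted words versus all other occurrences.
    t_ins = best_split(ins, sub + cor, candidates)
    # Substitution threshold: substituted versus correct, never below t_ins.
    t_sub = best_split(sub, cor, [c for c in candidates if c >= t_ins])
    # Substitute word: most frequent reference word behind substitution errors.
    refs = Counter(s.ref_word for s in samples if s.label == "sub" and s.ref_word)
    substitute = refs.most_common(1)[0][0] if refs else None
    return t_ins, t_sub, substitute


samples = [TuningSample(0.1, "ins", None), TuningSample(0.2, "ins", None),
           TuningSample(0.4, "sub", "nine"), TuningSample(0.5, "sub", "nine"),
           TuningSample(0.7, "correct", None), TuningSample(0.8, "correct", None),
           TuningSample(0.9, "correct", None)]
print(estimate_thresholds(samples))  # (0.4, 0.7, 'nine')
```

A real tuning phase would likely smooth the distributions or weight the two error types differently; the simple count-minimising split is only the most direct reading.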

15. A method for recognizing speech in acoustic data, comprising:
- performing a tuning phase, the tuning phase further comprising:
  
  generating a series of hypothetical words (HYP) from a tuning audio data set in a decoder; and
  
  setting values of tunable parameters in the decoder to minimize a weighted total error rate.
- Dependent Claims (16, 17):
  - 16. The method of claim 15, wherein the weighted total error is calculated according to an algorithm:
    - $\mathrm{Wt\,E_{total}} = (\lambda_{sub} \cdot \mathrm{num\_error\_sub\_word} + \lambda_{ins} \cdot \mathrm{num\_error\_ins\_word} + \lambda_{del} \cdot \mathrm{num\_error\_del\_word}) / \mathrm{total\_num\_RefWord}$, where $\lambda_{sub}$, $\lambda_{ins}$, and $\lambda_{del}$ are weighting factors.
  - 17. The method of claim 16, wherein $\lambda_{del} > \lambda_{sub} > \lambda_{ins}$.
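
The weighted total error rate in claims 16 and 19 needs per-type error counts. A minimal sketch, assuming those counts come from a standard minimum edit-distance (Levenshtein) alignment of the HYP sequence against the reference transcription; the function names and the example weights are illustrative, with the weights chosen only to respect the ordering in claim 17.

```python
def align_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int]:
    """(substitutions, insertions, deletions) on a minimum edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion: reference word missing
                          d[i][j - 1] + 1)         # insertion: extra HYP word
    # Backtrace one optimal path and attribute each edit to an error type.
    sub = ins = dele = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            sub += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return sub, ins, dele


def weighted_total_error(total_ref_words: int, n_sub: int, n_ins: int, n_del: int,
                         lam_sub: float, lam_ins: float, lam_del: float) -> float:
    """Weighted total error rate as written in claims 16 and 19."""
    if total_ref_words <= 0:
        raise ValueError("need at least one reference word")
    return (lam_sub * n_sub + lam_ins * n_ins + lam_del * n_del) / total_ref_words


ref = "turn left heading two seven zero".split()
hyp = "turn heading two five zero now".split()
counts = align_counts(ref, hyp)
print(counts)  # (1, 1, 1)
print(weighted_total_error(len(ref), *counts, lam_sub=1.0, lam_ins=0.5, lam_del=2.0))
```

In a tuning phase (claim 15), this value would be computed for each candidate setting of the decoder's tunable parameters, keeping the setting with the lowest result. Minimum-cost alignments are not always unique, so on some inputs the split between error types can depend on the backtrace's tie-breaking order.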

18. A system for recognizing speech in acoustic data, comprising:
- means for generating at least one hypothetical word (HYP) based on the acoustic data;
  
  means for determining a word confidence score (WCS) for each HYP; and
  
  evaluating means for outputting a modified hypothetical word (mHYP) for each HYP based on the HYP and the WCS.
- Dependent Claims (19, 20):
  - 19. The system of claim 18, further comprising:
    - means for processing a tuning audio data set to obtain a series of hypothetical words (HYPs); and
      
      means for setting values of tunable parameters in the decoding means such that a weighted total error rate is minimized, wherein the weighted total error rate is:
      
      $\mathrm{Wt\,E_{total}} = (\lambda_{sub} \cdot \mathrm{num\_error\_sub\_word} + \lambda_{ins} \cdot \mathrm{num\_error\_ins\_word} + \lambda_{del} \cdot \mathrm{num\_error\_del\_word}) / \mathrm{total\_num\_RefWord}$, and $\lambda_{sub}$, $\lambda_{ins}$, and $\lambda_{del}$ are weighting factors.
  - 20. The system of claim 19, further comprising means for determining a selected word list and at least one of an insertion threshold value and a substitution threshold value based on the tuning audio data set for words on the selected word list, wherein the evaluating means outputs mHYP based on the selected word list and the at least one of an insertion threshold value and a substitution threshold value.
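
The selected word list of claims 13, 14, and 20 comes from comparing a tuning transcription with the decoder's HYP words. A minimal sketch, assuming each tuning HYP has already been labelled "correct", "sub", or "ins" by an alignment; the cut-offs `min_count` and `min_error_rate` are hypothetical parameters, with `min_count` standing in for claim 14's preference for frequently occurring words. Deletions are not counted because they have no HYP word to attribute them to.

```python
from collections import Counter
from typing import Iterable


def select_words(tuning_hyps: Iterable[tuple[str, str]],
                 min_count: int = 5,
                 min_error_rate: float = 0.2) -> list[str]:
    """Return HYP words that occur often enough and are wrong often enough."""
    totals = Counter()  # occurrences of each HYP word in the tuning output
    errors = Counter()  # occurrences that were substitutions or insertions
    for word, label in tuning_hyps:
        totals[word] += 1
        if label != "correct":
            errors[word] += 1
    return sorted(w for w, n in totals.items()
                  if n >= min_count and errors[w] / n >= min_error_rate)


data = ([("five", "sub")] * 3 + [("five", "correct")] * 7
        + [("climb", "correct")] * 20 + [("climb", "ins")]
        + [("niner", "ins")] * 2 + [("niner", "correct")])
print(select_words(data))  # ['five']
```

Here "niner" has a high error rate but too few occurrences to be selected, which is one way to read the frequency criterion of claim 14.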

Current Assignee
Adacel, Inc. (Adacel Technologies Ltd.)
Original Assignee
Adacel, Inc. (Adacel Technologies Ltd.)
Inventors
Shu, Chang-Qing

Granted Patent

US 9,478,218 B2
Field of Search
US Class Current

704/251
CPC Class Codes

G10L 15/01   Assessment or evaluation of...

G10L 15/187   Phonemic context, e.g. pron...

G10L 25/51   for comparison or discrimin...
