Error correction in speech recognition

US 20020138265A1
Filed: 05/02/2001
Published: 09/26/2002
Est. Priority Date: 05/02/2000
Status: Active Grant

Abstract

New techniques and systems may be implemented to improve error correction in speech recognition. These techniques and systems may be used in a standard desktop environment, in a mobile environment, or in any other type of environment that can receive and/or present recognized speech.

367 Citations

53 Claims

1. A method of correcting incorrect text associated with recognition errors in computer-implemented speech recognition, the method comprising:
- performing speech recognition on an utterance to produce a recognition result for the utterance;
  
  receiving a selection of a word from the recognized utterance, the selection indicating a bound of a portion of the recognized utterance to be corrected;
  
  comparing a first alternative transcript to the recognized utterance to be corrected;
  
  producing a first recognition correction based on the comparison;
  
  comparing a second alternative transcript to the recognized utterance to be corrected;
  
  producing a second recognition correction based on the second comparison; and
  
  replacing a portion of the recognition result with one of the first recognition correction and the second recognition correction;
  
  wherein a duration of the first recognition correction differs from a duration of the second recognition correction, and the portion of the recognition result replaced includes at one bound a word indicated by the selection and extends for the duration of the one of the first recognition correction and the second recognition correction with which the portion is replaced.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
- - 2. The method of claim 1 wherein the selection indicates a beginning bound of a recognized utterance to be corrected.
  - 3. The method of claim 1 wherein the selection indicates a finishing bound of a recognized utterance to be corrected.
  - 4. The method of claim 1 wherein comparing an alternative transcript to the recognized utterance comprises:
    - selecting from the alternative transcript a test word that is not identical to the selected word and that begins at a time that is nearest a time at which the selected word begins; and
      
      searching in time, through the recognized utterance relative to the selected word, and through the alternative transcript relative to the test word, until finding a word common to the recognized utterance and the alternative transcript.
  - 5. The method of claim 4 wherein the common word begins at a time in the recognized utterance that is approximately equal to a time at which the common word begins in the alternative transcript.
  - 6. The method of claim 5 wherein producing a recognition correction comprises selecting a word string from the alternative transcript, the word string bound by the test word from the alternative transcript and by a word from the alternative transcript that is adjacent to the common word and between the test word and the common word.
  - 7. The method of claim 6 further comprising receiving a selection of one of the first recognition correction and the second recognition correction.
  - 8. The method of claim 6 wherein searching in time through the recognized utterance and through the alternative transcript comprises:
    - designating a word adjacent to the test word as an alternative transcript word;
      
      designating a word adjacent to the selected word as an original transcript word; and
      
      comparing the original transcript word to the alternative transcript word.
  - 9. The method of claim 8 wherein searching in time through the recognized utterance and through the alternative transcript comprises:
    - designating the original transcript word and the alternative transcript word as the common word if:
      
      the original transcript word is identical to the alternative transcript word, and a time at which the original transcript word begins is near a time at which the alternative transcript word begins.
  - 10. The method of claim 9 wherein searching in time through the recognized utterance and through the alternative transcript comprises:
    - designating a word adjacent to the alternative transcript word in the alternative transcript as the alternative transcript word if:
      
      the original transcript word is not identical to the alternative transcript word, and a time at which the original transcript word begins is later than a time at which the alternative transcript word begins.
  - 11. The method of claim 9 wherein searching in time through the recognized utterance and through the alternative transcript comprises:
    - designating a word adjacent to the alternative transcript word in the alternative transcript as the alternative transcript word if:
      
      the original transcript word is identical to the alternative transcript word, and a time at which the original transcript word begins is later than a time at which the alternative transcript word begins.
  - 12. The method of claim 9 wherein searching in time through the recognized utterance and through the alternative transcript comprises:
    - designating a word adjacent to the original transcript word in the original transcript as the original transcript word if:
      
      the original transcript word is not identical to the alternative transcript word, and a time at which the original transcript word begins is earlier than a time at which the alternative transcript word begins.
  - 13. The method of claim 9 wherein searching in time through the recognized utterance and through the alternative transcript comprises:
    - designating a word adjacent to the original transcript word in the original transcript as the original transcript word if:
      
      the original transcript word is identical to the alternative transcript word, and a time at which the original transcript word begins is earlier than a time at which the alternative transcript word begins.
  - 14. The method of claim 9 wherein searching in time through the recognized utterance and through the alternative transcript comprises:
    - designating a word adjacent to the original transcript word in the original transcript as the original transcript word and designating a word adjacent to the alternative transcript word in the alternative transcript as the alternative transcript word if:
      
      the original transcript word is not identical to the alternative transcript word, and a time at which the original transcript word begins is near a time at which the alternative transcript word begins.
  - 17. The method of claim 16, further comprising:
    - producing a recognition correction based on the comparison;
      
      replacing a portion of the recognition result with the recognition correction;
      
      wherein the portion of the recognition result replaced includes at one bound a word indicated by the selection and extends for the duration of the recognition correction with which the portion is replaced.
  - 19. The method of claim 18 in which generating the sequence of phonemes for the corrected text includes determining whether the corrected text is in the vocabulary.
  - 20. The method of claim 18 further comprising outputting the text document.
  - 21. The method of claim 18 in which a list of recognition candidates is associated with each recognized speech utterance.
  - 22. The method of claim 18 in which generating the sequence of phonemes for the corrected text comprises using a phonetic alphabet.
  - 23. The method of claim 18 further comprising generating the general confusability matrix using empirical data.
  - 24. The method of claim 23 in which the empirical data comprises information relating to a rate of confusion of phonemes for a preselected population.
  - 25. The method of claim 21 in which the empirical data comprises information relating to frequency characteristics of different phonemes.
  - 26. The method of claim 21 in which the empirical data comprises information acquired during an adaptive training of a user.
  - 27. The method of claim 18 in which searching the text document for the corrected text comprises searching the text document for the sequence of phonemes for the corrected text.
  - 28. The method of claim 18 in which searching the text document for the corrected text comprises searching the text document for a sequence of phonemes that is likely to be confused with the sequence of phonemes for the corrected text.
  - 29. The method of claim 18 in which searching the text document for the corrected text comprises scoring a portion of the text document and comparing the score of the portion to an empirically determined threshold value to determine whether the portion of the text document includes a word that is not in the vocabulary.
  - 30. The method of claim 29 further comprising outputting a result if it is determined that the portion of the text document includes a word that is not in the vocabulary.
  - 31. The method of claim 30 in which outputting the result comprises highlighting the portion of the text document.
  - 32. The method of claim 30 in which outputting the result comprises re-recognizing the portion of the text document.

15. A method of correcting incorrect text associated with recognition errors in computer-implemented speech recognition, the method comprising:
- performing speech recognition on an utterance to produce a recognition result for the utterance;
  
  receiving a selection of a word from the recognized utterance, the selection indicating a bound of a portion of the recognized utterance to be corrected;
  
  comparing an alternative transcript to the recognized utterance to be corrected, the comparing comprising:
  
  selecting from the alternative transcript a test word that begins at a time that is nearest a time at which the selected word begins; and
  
  searching in time, relative to the selected word, through the recognized utterance, and searching in time, relative to the test word, through the alternative transcript until finding a word common to the recognized utterance and the alternative transcript;
  
  producing a recognition correction based on the comparison; and
  
  replacing a portion of the recognition result with the recognition correction.
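
The time-based search recited in claim 15, together with the stepping rules of dependent claims 8 through 14, can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Word` type, the function names, the forward-only search direction, and the 0.05-second `tolerance` that stands in for "near" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # start time in seconds (hypothetical representation)


def nearest_index(words, t):
    """Index of the word whose start time is closest to t."""
    return min(range(len(words)), key=lambda i: abs(words[i].start - t))


def find_correction(original, alternative, selected, tolerance=0.05):
    """Return (correction_texts, words_replaced), or None if no common word is found.

    Searches forward in time, treating the selected word as the beginning bound.
    """
    test = nearest_index(alternative, original[selected].start)
    i, j = selected + 1, test + 1
    while i < len(original) and j < len(alternative):
        o, a = original[i], alternative[j]
        near = abs(o.start - a.start) <= tolerance
        if near and o.text == a.text:
            # Common word found: the correction runs from the test word up to,
            # but not including, the common word.
            return [w.text for w in alternative[test:j]], i - selected
        if near:
            i, j = i + 1, j + 1  # same time, different words: advance both
        elif o.start > a.start:
            j += 1               # alternative transcript is behind: advance it
        else:
            i += 1               # original transcript is behind: advance it
    return None


def apply_correction(original, selected, correction, words_replaced):
    """Replace the span that starts at the selected word with the correction."""
    texts = [w.text for w in original]
    return texts[:selected] + correction + texts[selected + words_replaced:]


original = [Word("I", 0.0), Word("wreck", 0.2), Word("a", 0.6),
            Word("nice", 0.7), Word("beach", 1.0), Word("today", 1.4)]
alternative = [Word("I", 0.0), Word("recognize", 0.2),
               Word("speech", 1.0), Word("today", 1.4)]

correction, words_replaced = find_correction(original, alternative, selected=1)
print(apply_correction(original, 1, correction, words_replaced))
# ['I', 'recognize', 'speech', 'today']
```

In this example the user selects only "wreck", yet four words are replaced, because the replaced span extends for the duration of the correction rather than for the duration of the selection.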

16. A method of correcting incorrect text associated with recognition errors in computer-implemented speech recognition, the method comprising:
- performing speech recognition on an utterance to produce a recognition result for the utterance;
  
  receiving a selection of a word from the recognized utterance, the selection indicating a bound of a portion of the recognized utterance to be corrected; and
  
  comparing an alternative transcript to the recognized utterance to be corrected, the comparing including:
  
  selecting from the alternative transcript a test word that is not identical to the selected word and that occurs at a time that is nearest a time at which the selected word occurs, and searching in time, through the recognized utterance relative to the selected word, and through the alternative transcript relative to the test word, until finding a word common to the recognized utterance and the alternative transcript.

18. A method of correcting incorrect text associated with recognition errors in computer-implemented speech recognition, the method comprising:
- receiving a text document formed by recognizing speech utterances using a vocabulary;
  
  receiving a general confusability matrix having one or more values each indicating a likelihood of confusion between a first phoneme and a second phoneme;
  
  receiving corrected text that corresponds to misrecognized text from the text document;
  
  generating a sequence of phonemes for the corrected text;
  
  aligning the generated sequence of phonemes with phonemes of the misrecognized text;
  
  adjusting one or more values of the general confusability matrix based on the alignment to form a specific confusability matrix; and
  
  searching the text document for additional instances of the corrected text using the specific confusability matrix.
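
The steps of claim 18 can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the claimed implementation: the matrix is modeled as a dictionary keyed by (spoken phoneme, recognized phoneme), the alignment uses the standard library's `difflib.SequenceMatcher` rather than an acoustic alignment, the additive `boost` is an invented adjustment rule, and the document is assumed to be already converted into per-word phoneme lists.

```python
from difflib import SequenceMatcher


def specialize(general, corrected, misrecognized, boost=0.2):
    """Copy the general matrix and raise the values of phoneme pairs
    that the alignment shows were actually confused."""
    specific = dict(general)
    matcher = SequenceMatcher(a=corrected, b=misrecognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            for said, heard in zip(corrected[i1:i2], misrecognized[j1:j2]):
                key = (said, heard)
                specific[key] = min(1.0, specific.get(key, 0.0) + boost)
    return specific


def confusion_score(matrix, target, candidate):
    """Product of per-phoneme confusion likelihoods; 0.0 if lengths differ."""
    if len(target) != len(candidate):
        return 0.0
    score = 1.0
    for said, heard in zip(target, candidate):
        score *= 1.0 if said == heard else matrix.get((said, heard), 0.0)
    return score


def find_instances(matrix, target, document, threshold):
    """Indices of document words whose phonemes are likely confusions of target."""
    return [i for i, phones in enumerate(document)
            if confusion_score(matrix, target, phones) >= threshold]


general = {("p", "b"): 0.3, ("t", "d"): 0.2}
corrected = ["p", "ae", "t"]       # user typed "pat"
misrecognized = ["b", "ae", "t"]   # recognizer produced "bat"
specific = specialize(general, corrected, misrecognized)

document = [["b", "ae", "t"], ["k", "ae", "t"], ["p", "ae", "d"]]
print(find_instances(general, corrected, document, threshold=0.4))   # []
print(find_instances(specific, corrected, document, threshold=0.4))  # [0]
```

With the general matrix, no word in the document clears the threshold. After the correction raises the p/b value, the first word is flagged as a probable additional instance of the same misrecognition.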

33. A computer-implemented method for speech recognition, the method comprising:
- receiving dictated text;
  
  generating recognized speech based on the received dictated text, the generating comprising determining acoustic models for the dictated text that best match acoustic data for the dictated text;
  
  receiving an edited text of the recognized speech, the edited text indicating a replacement for a portion of the dictated text;
  
  determining an acoustic model for the edited text;
  
  determining whether to adapt acoustic models for the edited text based on the acoustic model for the edited text and the acoustic model for the dictated text portion.
- Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 47, 48, 49, 50, 51, 52, 53)
- - 34. The method of claim 33 further comprising calculating an acoustic model score based on a comparison between the acoustic model for the edited text and the acoustic data for the dictated text portion.
  - 35. The method of claim 34 in which determining whether to adapt acoustic models for the edited text is based on the calculated acoustic model score.
  - 36. The method of claim 35 in which determining whether to adapt acoustic models for the edited text comprises calculating an original acoustic model score based on a comparison between the acoustic model for the dictated text portion and the acoustic data for the dictated text portion.
  - 37. The method of claim 36 in which determining whether to adapt acoustic models for the edited text comprises calculating a difference between the acoustic model score and the original acoustic model score.
  - 38. The method of claim 37 in which determining whether to adapt acoustic models for the edited text comprises determining whether the difference is less than a predetermined value.
  - 39. The method of claim 38 in which determining whether to adapt acoustic models for the edited text comprises adapting acoustic models for the edited text if the difference is less than a predetermined value.
  - 40. The method of claim 38 in which determining whether to adapt acoustic models for the edited text comprises bypassing adapting acoustic models for the edited text if the difference is greater than or equal to a predetermined value.
  - 41. The method of claim 33 in which receiving the edited text of the recognized speech occurs during a recognition session in which the recognized speech is generated.
  - 42. The method of claim 33 in which receiving the edited text of the recognized speech occurs after a recognition session in which the recognized speech is generated.
  - 43. The method of claim 33 in which receiving the edited text of the recognized speech comprises receiving a selection of the portion of the dictated text.
  - 44. The method of claim 33 in which determining an acoustic model for the edited text comprises searching for the edited text in a vocabulary or a backup dictionary used to generate the recognized speech.
  - 45. The method of claim 33 in which determining an acoustic model for the edited text comprises selecting an acoustic model that best matches the edited text.
  - 47. The method of claim 46 further comprising generating a replacement result for the recognition result based on the correction.
  - 48. The method of claim 46 in which the constraint grammar includes a spelling portion and a dictation vocabulary portion.
  - 49. The method of claim 48 in which the spelling portion indicates that the first utterance from the user is a letter in an alphabet.
  - 50. The method of claim 48 in which the vocabulary portion indicates that the first utterance from the user is a word from the dictation vocabulary.
  - 51. The method of claim 48 in which the spelling portion indicates a frequency with which letters occur in a language model.
  - 52. The method of claim 48 in which the dictation vocabulary portion indicates a frequency with which words occur in a language model.
  - 53. The method of claim 48 further comprising introducing a biasing value between the spelling and the dictation vocabulary portions of the constraint grammar.
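
The adaptation decision of claims 33 through 40 reduces to comparing two acoustic model scores against a predetermined value. The Python sketch below illustrates that comparison under loud assumptions: `acoustic_score` is a toy mean-squared-distance measure in which lower is better, models and acoustic data are plain lists of numbers, and the difference is taken as edited score minus original score. The claims do not specify any of these details.

```python
def acoustic_score(model, frames):
    """Toy score: mean squared distance between a model and the acoustic
    data (lower means a better match). A real recognizer would compute a
    likelihood from its acoustic models instead."""
    return sum((m - f) ** 2 for m, f in zip(model, frames)) / len(frames)


def should_adapt(edited_model, original_model, frames, max_difference):
    """Adapt only if the edited text fits the audio nearly as well as
    (or better than) the originally recognized text."""
    edited = acoustic_score(edited_model, frames)      # claim 34
    original = acoustic_score(original_model, frames)  # claim 36
    difference = edited - original                     # claim 37
    return difference < max_difference                 # claims 38 to 40


frames = [1.0, 2.0, 3.0]          # acoustic data for the dictated portion
original_model = [1.0, 2.0, 3.5]  # model for the text the recognizer produced

# The edit matches the audio well, so it is probably a fixed recognition error.
print(should_adapt([1.0, 2.0, 3.0], original_model, frames, max_difference=0.5))  # True

# The edit does not match the audio, so the user probably rewrote the text.
print(should_adapt([5.0, 5.0, 5.0], original_model, frames, max_difference=0.5))  # False
```

The threshold keeps the system from adapting its acoustic models on edits that change what was said rather than repair what was misheard.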

46. A computer-implemented method of speech recognition, the method comprising:
- performing speech recognition on an utterance to produce a recognition result for the utterance;
  
  receiving a selection of the recognition result;
  
  receiving a correction of the recognition result;
  
  performing speech recognition on the correction using a constraint grammar that permits spelling and pronunciation in parallel; and
  
  identifying whether the correction comprises a spelling or a pronunciation using the constraint grammar.
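
Claim 46 and its dependent claims 48 through 53 describe a constraint grammar with a spelling portion and a dictation vocabulary portion evaluated in parallel. A minimal Python sketch of the final identification step is shown below. It assumes the correction has already been turned into tokens, scores each grammar portion with a unigram log-probability, and applies an optional bias to the spelling portion. The frequency tables, function names, and bias convention are hypothetical; a real recognizer would score both grammar paths against the audio during decoding.

```python
import math

# Illustrative frequencies only; not taken from any real language model.
LETTER_FREQ = {"a": 0.08, "c": 0.03, "t": 0.09}
WORD_FREQ = {"a": 0.02, "cat": 0.001}


def path_score(tokens, freq):
    """Unigram log-probability of the tokens under one grammar portion,
    or -inf if any token is not allowed by that portion."""
    total = 0.0
    for tok in tokens:
        if tok not in freq:
            return -math.inf
        total += math.log(freq[tok])
    return total


def classify_correction(tokens, spelling_bias=0.0):
    """Return 'spelling', 'pronunciation', or None if neither portion matches."""
    spelling = path_score(tokens, LETTER_FREQ) + spelling_bias
    dictation = path_score(tokens, WORD_FREQ)
    if spelling == -math.inf and dictation == -math.inf:
        return None
    return "spelling" if spelling > dictation else "pronunciation"


print(classify_correction(["c", "a", "t"]))           # spelling
print(classify_correction(["cat"]))                   # pronunciation
print(classify_correction(["a"]))                     # spelling
print(classify_correction(["a"], spelling_bias=-2.0)) # pronunciation
```

The token "a" is valid in both portions, so the outcome depends on the relative frequencies and on the biasing value, which is the kind of ambiguity the parallel grammar has to resolve.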

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Stevens, Daniell; Roth, Robert; Newman, Michael J.; Sturtevant, Dean; Abrahams, David; Gould, Joel M.; Ingold, Charles E.; Gold, Allan
Granted Patent: US 6,912,498 B2
US Class Current: 704/251
CPC Class Codes:
G10L 15/22 Procedures used during a sp...
G10L 2015/0631 Creating reference template...