Apparatus and methods for identifying homophones among words in a speech recognition system

US 6,269,335 B1
Filed: 08/14/1998
Issued: 07/31/2001
Est. Priority Date: 08/14/1998
Status: Expired due to Term

3 Assignments

0 Petitions

Abstract

A method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: a user uttering the word; decoding the uttered word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the at least a portion of other existing words; if at least one measure is within a threshold range, indicating, to the user, results associated with the at least one measure, the results preferably including the decoded word and the other existing vocabulary word associated with the at least one measure; and the user preferably making a selection depending on the word the user intended to utter.
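
The flow in the abstract (decode, measure, threshold, present) can be sketched in a few lines of Python. This is a minimal illustration only, not the patented implementation: the function name `find_homophones`, the toy distance table, and the threshold values are all hypothetical, and a real system would derive the measure from the recognizer's acoustic models.

```python
from typing import Callable, Iterable, List

def find_homophones(
    decoded: str,
    vocabulary: Iterable[str],
    measure: Callable[[str, str], float],
    low: float,
    high: float,
) -> List[str]:
    """Return vocabulary words whose measure to `decoded` lies in [low, high].

    Results are ordered from most to least acoustically similar
    (smallest measure first).
    """
    hits = []
    for word in vocabulary:
        if word == decoded:
            continue  # only compare against the *other* vocabulary words
        d = measure(decoded, word)
        if low <= d <= high:
            hits.append((word, d))
    hits.sort(key=lambda pair: pair[1])
    return [word for word, _ in hits]

# Hypothetical, hand-made distances standing in for a real acoustic measure.
TOY_DISTANCES = {
    frozenset(("to", "two")): 0.0,
    frozenset(("to", "too")): 0.1,
    frozenset(("to", "toe")): 0.9,
}

def toy_measure(a: str, b: str) -> float:
    # Unknown pairs are treated as maximally dissimilar.
    return TOY_DISTANCES.get(frozenset((a, b)), float("inf"))

candidates = find_homophones(
    "to", ["to", "two", "too", "toe", "table"], toy_measure, 0.0, 0.5
)
print(candidates)
```

The returned candidates, together with the decoded word, are what the user would be shown in order to pick the intended word.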

102 Citations

39 Claims

1. A method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine, the method comprising the steps of:
- decoding the uttered word using the speech recognition engine to yield a decoded word;
- computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the other existing words;
- identifying the other existing words, associated with measures which correspond to a threshold range, as homophones of the uttered word; and
- outputting the identified homophones, wherein the user can select an identified homophone that corresponds to the word uttered by the user.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The method of claim 1, wherein the respective distance measures are calculated via a Kullback-Leibler distance metric.
  - 3. The method of claim 1, wherein if at least one measure is not within the threshold range, providing the capability for the user to confirm that the decoded word is the uttered word.
  - 4. The method of claim 1, wherein the outputting step comprises displaying the homophones to the user during a real-time decoding session.
  - 5. The method of claim 1, wherein the outputting step comprises speech synthesizing the homophones for playback to the user.
  - 6. The method of claim 1, further comprising the step of adding the homophones to an N-best list generated during the decoding step to form an augmented N-best list.
  - 7. The method of claim 6, further comprising the step of performing a second decoding step using the augmented N-best list to determine a word with the highest likelihood of being the word uttered by the user.
  - 8. The method of claim 7, wherein the second decoding step includes an acoustic re-scoring step.
  - 9. The method of claim 7, wherein the second decoding step includes a language model re-scoring step.
  - 10. The method of claim 6, further comprising the step of indicating, to the user, the augmented N-best list during a correction session.
  - 11. The method of claim 1, further comprising the step of indicating, to the user, the identified homophones during a correction session.
  - 12. The method of claim 1, wherein the step of computing respective measures further comprises the steps of:
  - 13. The method of claim 12, wherein the leaf sequence comparison step further comprises performing a best match alignment process between leaf sequences of unequal phonetic length.
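
Claims 2 and 13 name a Kullback-Leibler distance metric and a best match alignment between leaf sequences of unequal length, but the claim text here does not say how either is computed. The sketch below is therefore an assumption-laden illustration rather than the patent's procedure: it assumes each leaf is modelled as a one-dimensional Gaussian `(mean, variance)`, uses a symmetrized KL divergence as the per-leaf distance, and aligns the two sequences with an edit-distance style dynamic program whose `gap` penalty is a hypothetical parameter.

```python
import math
from typing import List, Tuple

Leaf = Tuple[float, float]  # (mean, variance) of an assumed 1-D Gaussian

def kl_gauss(mp: float, vp: float, mq: float, vq: float) -> float:
    """KL(p || q) for univariate Gaussians p = N(mp, vp), q = N(mq, vq)."""
    return 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)

def sym_kl(p: Leaf, q: Leaf) -> float:
    """Symmetrized KL divergence, so the distance does not depend on order."""
    return kl_gauss(*p, *q) + kl_gauss(*q, *p)

def align_cost(seq_a: List[Leaf], seq_b: List[Leaf], gap: float = 1.0) -> float:
    """Minimum total cost of aligning two leaf sequences of any lengths.

    Matching two leaves costs their symmetrized KL divergence; skipping a
    leaf in either sequence costs `gap` (an assumed, tunable penalty).
    """
    n, m = len(seq_a), len(seq_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sym_kl(seq_a[i - 1], seq_b[j - 1]),  # match
                cost[i - 1][j] + gap,  # skip a leaf of seq_a
                cost[i][j - 1] + gap,  # skip a leaf of seq_b
            )
    return cost[n][m]

a, b, c = (0.0, 1.0), (5.0, 1.0), (10.0, 1.0)
print(align_cost([a, b, c], [a, c]))  # cheapest alignment skips the middle leaf
```

A small alignment cost would then play the role of the "measure" that is compared against the threshold range in claim 1.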

14. Computer-based apparatus for identifying homophones of a word uttered by a user from at least a portion of a vocabulary associated with a speech recognition system, the speech recognition system includes a speech input processor for receiving the uttered word and a speech recognition engine for decoding the uttered word to generate a decoded word, the apparatus comprising:
- a processor, operatively coupled to the speech recognition engine, for computing respective measures between the decoded word output from the speech recognition engine and the at least a portion of other existing vocabulary words, wherein the respective measures are indicative of acoustic similarity between the decoded word and the at least a portion of other existing vocabulary words, and wherein the processor identifies the other existing words, associated with measures which correspond to a threshold range, as homophones of the uttered word; and
- an output device for presenting the homophones identified by the processor to the user, wherein the user can select an identified homophone that corresponds to the word uttered by the user.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26):
  - 15. The apparatus of claim 14, wherein the output device comprises a display and the processor causes display of the identified homophones to the user on the display during a real-time decoding session.
  - 16. The apparatus of claim 14, wherein the output device comprises a text-to-speech system and the processor causes speech synthesis of the identified homophones for playback to the user via the text-to-speech system.
  - 17. The apparatus of claim 14, wherein the processor causes the addition of the homophones to an N-best list generated during the decoding to form an augmented N-best list.
  - 18. The apparatus of claim 17, wherein the processor causes a second decoding pass to be performed using the augmented N-best list to determine a word with the highest likelihood of being the word uttered by the user.
  - 19. The apparatus of claim 18, wherein the second decoding pass includes an acoustic re-scoring step.
  - 20. The apparatus of claim 18, wherein the second decoding pass includes a language model re-scoring step.
  - 21. The apparatus of claim 17, wherein the processor causes indication in accordance with the output device, to the user, of the augmented N-best list during a correction session.
  - 22. The apparatus of claim 14, wherein the processor causes indication in accordance with the output device, to the user, of the identified homophones during a correction session.
  - 23. The apparatus of claim 19, wherein the processor further performs the steps of:
  - 24. The apparatus of claim 23, wherein the processor further performs a best match alignment process between leaf sequences of unequal phonetic length.
  - 25. The apparatus of claim 17, wherein the processor calculates the respective distance measures via a Kullback-Leibler distance metric.
  - 26. The apparatus of claim 17, wherein if at least one measure is not within the threshold range, the user confirming, via the input device, that the decoded word is the uttered word.
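
Claims 17 to 20 describe adding the identified homophones to the decoder's N-best list and running a second pass with acoustic and language model re-scoring. One way this could look is sketched below. The helper names (`augment_n_best`, `rescore`), the score tables, and the `lm_weight` parameter are all hypothetical, and summing log-domain scores is an assumed combination rule, not something the claims specify.

```python
from typing import Callable, Iterable, List

def augment_n_best(n_best: List[str], homophones: Iterable[str]) -> List[str]:
    """Append homophones not already present, keeping the original order."""
    seen = set(n_best)
    augmented = list(n_best)
    for word in homophones:
        if word not in seen:
            augmented.append(word)
            seen.add(word)
    return augmented

def rescore(
    candidates: List[str],
    acoustic_score: Callable[[str], float],
    lm_score: Callable[[str], float],
    lm_weight: float = 1.0,
) -> str:
    """Pick the candidate with the highest combined (log-domain) score."""
    return max(
        candidates,
        key=lambda w: acoustic_score(w) + lm_weight * lm_score(w),
    )

# Made-up log scores for an utterance in a context such as "I have ___ cats".
ACOUSTIC = {"to": -1.0, "two": -1.0, "too": -1.1, "toe": -3.0}
LANGUAGE = {"to": -4.0, "two": -0.5, "too": -3.0, "toe": -6.0}

augmented = augment_n_best(["to", "toe"], ["two", "too", "to"])
best = rescore(augmented, ACOUSTIC.__getitem__, LANGUAGE.__getitem__)
print(augmented, best)
```

In this toy case the first pass never proposed "two", yet it wins the second pass because the homophone step put it on the list and the language model score favors it.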

27. A program storage device, readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine, the method steps comprising:
- decoding the uttered word using the speech recognition engine to yield a decoded word;
- computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the other existing words;
- identifying the other existing words, associated with measures which correspond to a threshold range, as homophones of the uttered word; and
- outputting the identified homophones, wherein the user can select an identified homophone that corresponds to the word uttered by the user.
- Dependent claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39):
  - 28. The program storage device of claim 27, wherein the instructions for outputting comprise instructions for displaying the homophones to the user during a real-time decoding session.
  - 29. The program storage device of claim 27, wherein the instructions for outputting comprise instructions for speech synthesizing the homophones for playback to the user.
  - 30. The program storage device of claim 27, further comprising instructions for adding the homophones to an N-best list generated during the decoding step to form an augmented N-best list.
  - 31. The program storage device of claim 30, further comprising the step of performing a second decoding step using the augmented N-best list to determine a word with the highest likelihood of being the word uttered by the user.
  - 32. The program storage device of claim 31, wherein the instructions for the second decoding step comprise instructions for an acoustic re-scoring step.
  - 33. The program storage device of claim 31, wherein the instructions for the second decoding step comprise instructions for a language model re-scoring step.
  - 34. The program storage device of claim 30, further comprising instructions for indicating, to the user, the augmented N-best list during a correction session.
  - 35. The program storage device of claim 27, further comprising instructions for indicating, to the user, the identified homophones during a correction session.
  - 36. The program storage device of claim 27, wherein the instructions for computing respective measures comprise instructions for performing the steps of:
  - 37. The program storage device of claim 36, wherein the instructions for the leaf sequence comparison step further comprise instructions for performing a best match alignment process between leaf sequences of unequal phonetic length.
  - 38. The program storage device of claim 27, wherein the respective distance measures are calculated via a Kullback-Leibler distance metric.
  - 39. The program storage device of claim 27, further comprising instructions for providing the capability for the user to confirm that the decoded word is the uttered word if at least one measure is not within the threshold range.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Monkowski, Michael Daniel; Ittycheriah, Abraham; Maes, Stephane Herman; Sorensen, Jeffrey Scott
Primary Examiner(s): Hudspeth, David
Assistant Examiner(s): McFadden, Susan Iris
Application Number: US09/134,261
Time in Patent Office: 1,082 Days
Field of Search: 704/270, 704/271, 704/272, 704/275, 704/278, 704/246, 704/268
US Class Current: 704/270
CPC Class Codes: G10L 15/22 Procedures used during a sp...
