Apparatus and methods for rejecting confusible words during training associated with a speech recognition system

US 6,192,337 B1
Filed: 08/14/1998
Issued: 02/20/2001
Est. Priority Date: 08/14/1998
Status: Expired due to Term

2 Assignments

0 Petitions

Abstract

A method of training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words comprises the steps of: a user uttering the at least one new word; computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; if no measure is within a threshold range, automatically adding the at least one newly uttered word to the vocabulary; and if at least one measure is within the threshold range, refraining from automatically adding the at least one newly uttered word to the vocabulary.
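
The accept/reject logic in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions and is not the patented implementation. Each word is represented by a hypothetical sequence of acoustic unit labels. The similarity measure is a stand-in: Levenshtein distance normalized by the longer sequence. The "threshold range" is assumed to run from 0 up to a hypothetical `threshold` value. The names `Vocabulary`, `try_add`, and `acoustic_distance` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two label sequences, unit costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def acoustic_distance(a: list[str], b: list[str]) -> float:
    """Hypothetical stand-in for the acoustic similarity measure."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


@dataclass
class Vocabulary:
    """Hypothetical vocabulary: word -> sequence of acoustic unit labels."""
    words: dict[str, list[str]] = field(default_factory=dict)

    def try_add(self, word: str, labels: list[str],
                threshold: float = 0.34) -> list[tuple[float, str]]:
        """Add `word` only if no existing word is within [0, threshold].

        Returns the conflicting (distance, word) pairs, closest first;
        an empty list means the word was added.
        """
        conflicts = sorted(
            (acoustic_distance(labels, ref), name)
            for name, ref in self.words.items()
        )
        conflicts = [(d, name) for d, name in conflicts if d <= threshold]
        if not conflicts:
            self.words[word] = labels  # no measure in range: add automatically
        return conflicts               # non-empty: word is not added
```

The return value of `try_add` lists the conflicting existing words. This is the kind of result claims 2 and 5 suggest reporting to the user before prompting for an alternative word or for additional information.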

35 Claims

1. A method of training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words, the method comprising the steps of:
   - (a) a user uttering the at least one new word;
   - (b) computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words;
   - (c) if no measure is within the threshold range, automatically adding the at least one newly uttered word to the vocabulary; and
   - (d) if at least one measure is within a threshold range, refraining from automatically adding the at least one newly uttered word to the vocabulary.
2. The method of claim 1, further comprising the step of prompting the user to input an alternative word or additional information pertaining to the at least one new word.
3. The method of claim 2, wherein the additional information pertaining to the at least one new word includes contextual information.
4. The method of claim 2, wherein the additional information pertaining to the at least one new word includes an instruction by the user to temporarily exclude the existing word associated with a measure within the threshold range from the vocabulary when the at least one newly uttered word is uttered in a real-time decoding session.
5. The method of claim 1, further comprising the step of indicating results associated with the at least one measure to the user.
6. The method of claim 5, wherein the indicating step comprises displaying the results to the user.
7. The method of claim 5, wherein the indicating step comprises speech synthesizing the results for playback to the user.
8. The method of claim 5, wherein the indicating step further comprises the step of prompting the user to request an additional search.
9. The method of claim 8, wherein the additional search includes increasing a beamwidth associated with a Viterbi algorithm performed during the search.
10. The method of claim 1, wherein the step of computing respective measures further comprises the steps of:
11. The method of claim 10, wherein the leaf sequence comparison step further comprises performing a best match alignment process between leaf sequences.
12. The method of claim 10, wherein the respective distance measures are calculated via a Kullback-Leibler distance metric.
13. The method of claim 10, wherein the leaf sequence generating step also includes generating at least one additional leaf sequence representative of an alternate pronunciation of the newly uttered word.
14. The method of claim 13, wherein a Viterbi alignment is performed with the at least one additional leaf sequence and the first leaf sequence generated with respect to the newly uttered word.
15. The method of claim 14, wherein only additional leaf sequences resulting in acceptable scores are added to the vocabulary as alternate pronunciations.
16. The method of claim 1, wherein step (b) further comprises the step of performing an additional search, if at least one measure is within a threshold range, the additional search including increasing a beamwidth associated with a Viterbi algorithm performed during the search.
17. The method of claim 1, wherein step (a) further comprises the user uttering a first plurality of new words and a second plurality of new words and further wherein steps (b) through (d) are performed for each word such that words from the pluralities which are not acoustically confusing are added to the vocabulary while words from the pluralities which are acoustically confusing are rejected.
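
Claims 11 and 12 compare leaf sequences using a best match alignment and a Kullback-Leibler distance. The sketch below rests on three assumptions made for illustration only. First, each leaf is modelled by a single diagonal-covariance Gaussian, whereas leaf models in practical systems are often Gaussian mixtures. Second, the divergence is symmetrized by summing both directions. Third, dynamic time warping serves as the best match alignment, and the accumulated cost is divided by the number of alignment steps. The function names and the `leaves` table are hypothetical.

```python
import math


def kl_diag(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def sym_kl(p, q):
    """Symmetrized divergence, used here as the local leaf-to-leaf distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def leaf_sequence_distance(seq_a, seq_b, leaves):
    """DTW (best match) alignment cost between two leaf-id sequences.

    `leaves` maps a leaf id to its (means, variances). The accumulated cost
    of the cheapest path is divided by that path's number of steps.
    """
    if not seq_a or not seq_b:
        raise ValueError("leaf sequences must be non-empty")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j]: cheapest alignment of seq_a[:i] with seq_b[:j];
    # steps[i][j]: number of aligned pairs on that path.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sym_kl(leaves[seq_a[i - 1]], leaves[seq_b[j - 1]])
            # Predecessors: diagonal (advance both) or vertical/horizontal (stretch one).
            best_cost, best_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = best_cost + local
            steps[i][j] = best_steps + 1
    return cost[n][m] / steps[n][m]
```

Suppose a new word's leaf sequence aligns cheaply against an existing word's sequence. The result is then a small distance, which would be tested against the threshold range of claim 1.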

18. Computer-based apparatus for training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words, the apparatus comprising:
   - an input device for receiving the at least one new word uttered by a user;
   - a processor, operatively coupled to the input device, for computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; and
   - if no measure is within the threshold range, the processor automatically adding the at least one newly uttered word to the vocabulary, and if at least one measure is within a threshold range, the processor refraining from automatically adding the at least one newly uttered word to the vocabulary.
19. The apparatus of claim 18, wherein the processor prompts the user to input an alternative word or additional information pertaining to the at least one new word.
20. The apparatus of claim 18, further comprising an output device for indicating results associated with the at least one measure to the user.
21. The apparatus of claim 20, wherein the output device is a display and further wherein the processor causes display of the results to the user on the display.
22. The apparatus of claim 20, wherein the output device is a text-to-speech system and further wherein the processor causes speech synthesis of the results for playback to the user via the text-to-speech system.
23. The apparatus of claim 18, wherein the processor performs an additional search, if at least one measure is within a threshold range, the additional search including increasing a beamwidth associated with a Viterbi algorithm performed during the search.
24. The apparatus of claim 18, wherein the additional information pertaining to the at least one newly uttered word includes contextual information.
25. The apparatus of claim 18, wherein the additional information pertaining to the at least one newly uttered word includes an instruction by the user to temporarily exclude the existing word associated with a measure within the threshold range from the vocabulary when the at least one new word is uttered in a real-time decoding session.
26. The apparatus of claim 18, wherein the input device receives a first plurality of new words and a second plurality of new words uttered by the user and further wherein the processor performs the computing, adding or refraining steps for each word such that words from the pluralities which are not acoustically confusing are added to the vocabulary while words from the pluralities which are acoustically confusing are rejected.
27. The apparatus of claim 18, wherein the processor causes prompting of the user to request an additional search.
28. The apparatus of claim 27, wherein the additional search includes increasing a beamwidth associated with a Viterbi algorithm performed during the search.
29. The apparatus of claim 18, wherein the processor further performs the steps of:
30. The apparatus of claim 29, wherein the processor further performs a best match alignment process between leaf sequences.
31. The apparatus of claim 29, wherein the processor calculates the respective distance measures via a Kullback-Leibler distance metric.
32. The apparatus of claim 29, wherein the leaf sequence generating step also includes generating at least one additional leaf sequence representative of an alternate pronunciation of the newly uttered word.
33. The apparatus of claim 32, wherein a Viterbi alignment is performed with the at least one additional leaf sequence and the first leaf sequence generated with respect to the newly uttered word.
34. The apparatus of claim 33, wherein only additional leaf sequences resulting in acceptable scores are added to the vocabulary as alternate pronunciations.
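
Claims 23, 27 and 28 describe an additional search that uses a wider Viterbi beam. Below is a minimal sketch of beam-pruned Viterbi decoding in the log domain, run on a toy two-state model whose numbers are invented for illustration. At every frame, any state scoring more than `beam` below that frame's best score is discarded. The helper `search_with_widening` is hypothetical: it retries with progressively larger beams and stops once the best score no longer improves. That stopping rule is an assumption made here and does not come from the patent.

```python
import math


def _prune(active, beam):
    """Keep only states whose score is within `beam` of the frame's best score."""
    top = max(score for score, _ in active.values())
    return {s: v for s, v in active.items() if v[0] >= top - beam}


def viterbi_beam(log_init, log_trans, log_emit, beam):
    """Beam-pruned Viterbi decoding in the log domain.

    log_init[s]: initial score of state s; log_trans[r][s]: transition r -> s;
    log_emit[t][s]: score of frame t under state s. Returns (score, path).
    """
    n_states = len(log_init)
    active = {s: (log_init[s] + log_emit[0][s], [s]) for s in range(n_states)}
    active = _prune(active, beam)
    for frame in log_emit[1:]:
        nxt = {}
        for r, (score, path) in active.items():  # only surviving states expand
            for s in range(n_states):
                cand = score + log_trans[r][s] + frame[s]
                if s not in nxt or cand > nxt[s][0]:
                    nxt[s] = (cand, path + [s])
        active = _prune(nxt, beam)
    best = max(active, key=lambda s: active[s][0])
    return active[best]


def search_with_widening(log_init, log_trans, log_emit, beams):
    """Hypothetical helper: retry with wider beams until the best score stops improving."""
    best = None
    for beam in beams:
        result = viterbi_beam(log_init, log_trans, log_emit, beam)
        if best is not None and result[0] <= best[0]:
            break
        best = result
    return best


# Toy two-state model; the numbers are made up for illustration.
LOG_INIT = [0.0, 0.0]
LOG_TRANS = [[-5.0, -5.0], [-5.0, 0.0]]
LOG_EMIT = [[-1.0, -2.0], [-3.0, -1.0]]

if __name__ == "__main__":
    print(viterbi_beam(LOG_INIT, LOG_TRANS, LOG_EMIT, 0.5))
    print(search_with_widening(LOG_INIT, LOG_TRANS, LOG_EMIT, [0.5, 1.0, 4.0]))
```

On the toy model, a beam of 0.5 prunes state 1 after the first frame and returns the path [0, 1] with score -7.0. A beam of 1.0 keeps state 1 and finds the better path [1, 1] with score -3.0.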

35. Computer-based apparatus for training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words, the apparatus comprising:
   - user input means for receiving at least one new word uttered by the user;
   - computing means for computing respective measures between an acoustic model of the at least one newly uttered word and acoustic models of at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words;
   - adding means for automatically adding the at least one newly uttered word to the vocabulary, if no measure is within the threshold range; and
   - rejecting means for automatically rejecting the at least one newly uttered word, if at least one measure is within a threshold range.
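
Claims 17 and 26 apply the same screening to a batch of new words. The sketch below makes several assumptions for illustration. Accepted words are inserted into the vocabulary immediately, so later words in the same batch are also screened against them. The distance is a `difflib`-based stand-in. The name `enroll_batch` is illustrative only.

```python
from difflib import SequenceMatcher


def seq_distance(a, b):
    """Hypothetical stand-in measure: 1 - difflib similarity ratio of label sequences."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def enroll_batch(vocab, new_words, distance=seq_distance, threshold=0.34):
    """Screen each (word, labels) pair, in order, against `vocab`.

    Accepted words are inserted into `vocab` at once, so later words in the
    batch are screened against them too (an assumption, not from the patent).
    Returns (accepted words, {rejected word: clashing existing words}).
    """
    accepted, rejected = [], {}
    for word, labels in new_words:
        clashes = [name for name, ref in vocab.items()
                   if distance(labels, ref) <= threshold]
        if clashes:
            rejected[word] = clashes  # acoustically confusing: reject
        else:
            vocab[word] = labels      # not confusing: add to the vocabulary
            accepted.append(word)
    return accepted, rejected
```

Because accepted words are inserted immediately, the outcome depends on batch order. Enrolling "dog" before "dot" accepts "dog" and rejects "dot". Reversing the order accepts "dot" and rejects "dog".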

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Maes, Stephane H.; Ittycheriah, Abraham
Primary Examiner: Hudspeth, David R.
Assistant Examiner: Wieland, Susan
Application Number: US09/134,259
Time in Patent Office: 921 days
Field of Search: 704/231, 704/232, 704/251, 704/255, 704/254, 704/238, 704/239
US Class Current: 704/231
CPC Class Codes: G10L 15/063 (Training)
