Apparatus and methods for rejecting confusible words during training associated with a speech recognition system
Abstract
A method of training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words comprises the steps of: a user uttering the at least one new word; computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; if no measure is within a threshold range, automatically adding the at least one newly uttered word to the vocabulary; and if at least one measure is within the threshold range, refraining from automatically adding the at least one newly uttered word to the vocabulary.
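The accept-or-reject rule described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `try_add_word`, the representation of each word as a small feature vector, the use of Euclidean distance (`math.dist`) as the measure, and the bounds `low` and `high` are all hypothetical and are not taken from the patent.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for an acoustic model: a short feature vector.
Model = Sequence[float]


def try_add_word(
    new_word: str,
    new_model: Model,
    vocabulary: Dict[str, Model],
    measure: Callable[[Model, Model], float],
    low: float,
    high: float,
) -> Tuple[bool, List[str]]:
    """Add new_word unless some measure falls inside [low, high].

    Returns (added, confusable_words). The vocabulary is modified
    only when the word is accepted.
    """
    # Collect every existing word whose measure lies in the threshold range.
    confusable = [
        word
        for word, model in vocabulary.items()
        if low <= measure(new_model, model) <= high
    ]
    if confusable:
        # At least one measure is in range: refrain from adding.
        return False, confusable
    # No measure is in range: add automatically.
    vocabulary[new_word] = new_model
    return True, []
```

With `vocabulary = {"cat": (0.0, 0.0)}`, `math.dist` as the measure, and a range of `[0.0, 1.0]`, a candidate at `(0.1, 0.0)` would be held back as confusable with "cat", while a candidate at `(5.0, 5.0)` would be added.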
35 Claims
-
1. A method of training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words, the method comprising the steps of:
(a) a user uttering the at least one new word;
(b) computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words;
(c) if no measure is within a threshold range, automatically adding the at least one newly uttered word to the vocabulary; and
(d) if at least one measure is within the threshold range, refraining from automatically adding the at least one newly uttered word to the vocabulary.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
(a) generating a leaf sequence for the at least one newly uttered word;
(b) comparing the leaf sequence for the at least one newly uttered word to respective leaf sequences associated with the at least a portion of existing words; and
(c) generating respective distance measures in response to the comparisons, the respective distance measures indicative of acoustic distances between the compared leaf sequences.
-
11. The method of claim 10, wherein the leaf sequence comparison step further comprises performing a best match alignment process between leaf sequences.
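One common way to realize a best match alignment between two sequences is dynamic programming, as in edit distance. The sketch below is an assumption about how such an alignment could look, not the procedure the patent specifies: leaves are represented as integer identifiers, `leaf_distance` is a caller-supplied function, and `gap_cost` (the penalty for skipping a leaf) is a hypothetical parameter.

```python
from typing import Callable, Sequence


def aligned_distance(
    seq_a: Sequence[int],
    seq_b: Sequence[int],
    leaf_distance: Callable[[int, int], float],
    gap_cost: float = 1.0,
) -> float:
    """Return the minimum total cost of aligning two leaf sequences."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                # Match or substitute the two current leaves.
                dp[i - 1][j - 1] + leaf_distance(seq_a[i - 1], seq_b[j - 1]),
                # Skip a leaf in seq_a.
                dp[i - 1][j] + gap_cost,
                # Skip a leaf in seq_b.
                dp[i][j - 1] + gap_cost,
            )
    return dp[n][m]
```

The returned cost could serve as the distance measure between two words; a small value would suggest that their leaf sequences line up closely.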
-
12. The method of claim 10, wherein the respective distance measures are calculated via a Kullback-Leibler distance metric.
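For illustration, the Kullback-Leibler divergence has a closed form when each leaf is modeled as a Gaussian with diagonal covariance. That modeling choice is an assumption made here for the sketch; the claims do not state how leaves are parameterized. The function names and the symmetrized variant (the sum of both directions) are likewise hypothetical.

```python
import math
from typing import Sequence


def kl_diag_gaussian(
    mean_p: Sequence[float],
    var_p: Sequence[float],
    mean_q: Sequence[float],
    var_q: Sequence[float],
) -> float:
    """KL(p || q) for two Gaussians with diagonal covariance."""
    # Per dimension: 0.5 * (log(vq/vp) + (vp + (mp - mq)^2) / vq - 1).
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(
    mean_p: Sequence[float],
    var_p: Sequence[float],
    mean_q: Sequence[float],
    var_q: Sequence[float],
) -> float:
    """Sum of both directions, so the result does not depend on argument order."""
    return kl_diag_gaussian(mean_p, var_p, mean_q, var_q) + kl_diag_gaussian(
        mean_q, var_q, mean_p, var_p
    )
```

The plain divergence is not symmetric, so a symmetrized form is often more convenient when it is used as a distance between two leaves.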
-
13. The method of claim 10, wherein the leaf sequence generating step also includes generating at least one additional leaf sequence representative of an alternate pronunciation of the newly uttered word.
-
14. The method of claim 13, wherein a Viterbi alignment is performed with the at least one additional leaf sequence and the first leaf sequence generated with respect to the newly uttered word.
-
15. The method of claim 14, wherein only additional leaf sequences resulting in acceptable scores are added to the vocabulary as alternate pronunciations.
-
16. The method of claim 1, wherein step (b) further comprises the step of performing an additional search, if at least one measure is within a threshold range, the additional search including increasing a beamwidth associated with a Viterbi algorithm performed during the search.
-
17. The method of claim 1, wherein step (a) further comprises the user uttering a first plurality of new words and a second plurality of new words and further wherein steps (b) through (d) are performed for each word such that words from the pluralities which are not acoustically confusing are added to the vocabulary while words from the pluralities which are acoustically confusing are rejected.
-
18. Computer-based apparatus for training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words, the apparatus comprising:
an input device for receiving the at least one new word uttered by a user;
a processor, operatively coupled to the input device, for computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; and
if no measure is within a threshold range, the processor automatically adding the at least one newly uttered word to the vocabulary, and if at least one measure is within the threshold range, the processor refraining from automatically adding the at least one newly uttered word to the vocabulary.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34
(a) generating a leaf sequence for the at least one newly uttered word;
(b) comparing the leaf sequence for the at least one newly uttered word to respective leaf sequences associated with the at least a portion of the existing words; and
(c) generating respective distance measures in response to the comparisons, the respective distance measures indicative of acoustic distances between the compared leaf sequences.
-
30. The apparatus of claim 29, wherein the processor further performs a best match alignment process between leaf sequences.
-
31. The apparatus of claim 29, wherein the processor calculates the respective distance measures via a Kullback-Leibler distance metric.
-
32. The apparatus of claim 29, wherein the leaf sequence generating step also includes generating at least one additional leaf sequence representative of an alternate pronunciation of the newly uttered word.
-
33. The apparatus of claim 32, wherein a Viterbi alignment is performed with the at least one additional leaf sequence and the first leaf sequence generated with respect to the newly uttered word.
-
34. The apparatus of claim 33, wherein only additional leaf sequences resulting in acceptable scores are added to the vocabulary as alternate pronunciations.
-
35. Computer-based apparatus for training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words, the apparatus comprising:
user input means for receiving at least one new word uttered by the user;
computing means for computing respective measures between an acoustic model of the at least one newly uttered word and acoustic models of at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words;
adding means for automatically adding the at least one newly uttered word to the vocabulary, if no measure is within a threshold range; and
rejecting means for automatically rejecting the at least one newly uttered word, if at least one measure is within the threshold range.
Specification