Methods and apparatus for forming compound words for use in a continuous speech recognition system
Abstract
A method of forming an augmented textual training corpus with compound words for use with an associated speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair. A fourth measure is based on a difference between an average phone recognition score for a particular compound word and a sum of respective average phone recognition scores of the words of the pair.
64 Citations
41 Claims
1. A method of forming an augmented textual corpus associated with a speech recognition system, the method comprising the steps of:
computing a measure for an element set in a textual corpus for comparison to a threshold value, the measure being an average of a direct n-gram probability value and a reverse n-gram probability value; and
replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the measure and the threshold value.
2. The method of claim 1, further comprising the steps of:
computing a second measure for an element set in the textual corpus for comparison to a threshold value, the second measure being based on mutual information between elements in the set; and
replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the second measure and the threshold value.
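For illustration, the second measure can be sketched as pointwise mutual information estimated from raw counts. The claim says only that the measure is "based on mutual information"; the pointwise form, the maximum-likelihood estimates, the corpus format, and the names `pair_mutual_information` and `corpus` are assumptions made for this example.

```python
import math
from collections import Counter

def pair_mutual_information(corpus):
    """Return {(w1, w2): PMI} for every consecutive word pair in `corpus`.

    `corpus` is a list of sentences, each a list of word tokens (assumed format).
    PMI(w1, w2) = log( P(w1, w2) / (P(w1) * P(w2)) ), estimated from counts.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))

    total_words = sum(unigrams.values())
    total_pairs = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / total_pairs
        p_w1 = unigrams[w1] / total_words
        p_w2 = unigrams[w2] / total_words
        scores[(w1, w2)] = math.log(p_pair / (p_w1 * p_w2))
    return scores

# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["i", "am", "going", "to", "new", "york"],
    ["new", "york", "is", "going", "to", "be", "cold"],
    ["i", "want", "to", "be", "in", "new", "york"],
]
pmi = pair_mutual_information(corpus)
```

In this toy corpus the pair ("new", "york") scores higher than ("to", "new"), so a suitably chosen threshold would select the former as a compound candidate and reject the latter.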
3. The method of claim 1, further comprising the steps of:
computing a third measure for an element set in the textual corpus for comparison to a threshold value, the third measure being based on a comparison of the number of times a co-articulated baseform for the set is preferred over a concatenation of non-co-articulated individual baseforms of the elements forming the set; and
replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the third measure and the threshold value.
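A minimal sketch of the third measure follows. It assumes that an aligner (not shown) has already scored each spoken occurrence of the pair against both the co-articulated baseform and the plain concatenation, and that a higher score means a better match. Expressing the count as a fraction, the threshold value, and all names and numbers are assumptions; the claim states only that the measure is based on how often the co-articulated baseform is preferred.

```python
def coarticulation_preference(alignment_scores):
    """Fraction of occurrences in which the co-articulated baseform wins.

    `alignment_scores` is a list of (coarticulated_score, concatenated_score)
    tuples, one per occurrence of the word pair; higher scores are assumed
    to mean a better acoustic match.
    """
    if not alignment_scores:
        return 0.0
    preferred = sum(1 for co, cat in alignment_scores if co > cat)
    return preferred / len(alignment_scores)

# Hypothetical log-likelihood scores for five occurrences of one word pair.
scores_for_pair = [
    (-41.2, -47.9),
    (-39.5, -44.1),
    (-50.3, -49.8),  # here the plain concatenation scores better
    (-38.0, -45.6),
    (-42.7, -46.0),
]
preference = coarticulation_preference(scores_for_pair)

THRESHOLD = 0.5  # assumed value; the source does not give one
make_compound = preference > THRESHOLD
```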
4. The method of claim 1, further comprising the steps of:
computing a fourth measure for an element set in the textual corpus for comparison to a threshold value, the fourth measure being based on a difference between an average phone recognition score for a particular compound element and a sum of respective average phone recognition scores of the elements of the set; and
replacing the element set in the textual corpus with the particular compound element depending on a result of the comparison between the fourth measure and the threshold value.
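The fourth measure can be sketched as a simple difference of averages. The example assumes per-phone recognition scores are already available as plain numbers for the compound and for each constituent word; the function name `phone_score_gain` and the sample values are hypothetical, and the direction of the threshold comparison is not stated in the claim.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def phone_score_gain(compound_phone_scores, word_phone_scores):
    """Average phone score of the compound minus the sum of per-word averages.

    `compound_phone_scores`: per-phone scores for the compound element.
    `word_phone_scores`: one list of per-phone scores per constituent word.
    """
    compound_avg = mean(compound_phone_scores)
    words_sum = sum(mean(scores) for scores in word_phone_scores)
    return compound_avg - words_sum

# Hypothetical scores, for illustration only.
gain = phone_score_gain(
    compound_phone_scores=[0.9, 0.8, 1.0],
    word_phone_scores=[[0.5, 0.3], [0.4, 0.2]],
)
```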
5. The method of claim 1, wherein the compound element is added to a language model vocabulary associated with the speech recognition system.
6. The method of claim 5, wherein a language model associated with the speech recognition system is recomputed based on the augmented textual corpus.
7. The method of claim 1, wherein at least one pronunciation variant of the compound element is added to an acoustic vocabulary associated with the speech recognition system.
8. The method of claim 7, wherein an acoustic model associated with the speech recognition system is retrained based on the augmented textual corpus.
9. The method of claim 1, wherein the element set is a pair of consecutive words in the textual corpus and the compound element is a compound word.
10. The method of claim 1, further comprising the steps of:
computing an occurrence count for the element set; and
comparing the occurrence count to a threshold value such that the measure is not computed for the set depending on a result of the occurrence count comparison.
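The occurrence-count check of claim 10 can be sketched as a pre-filter that keeps only pairs seen often enough to be worth scoring. The minimum count, the decision to skip pairs below it, and the names `frequent_pairs` and `min_count` are assumptions for illustration.

```python
from collections import Counter

def frequent_pairs(corpus, min_count):
    """Count consecutive word pairs and keep those seen at least `min_count` times.

    Pairs below the count threshold are dropped, so no measure is computed for them.
    """
    counts = Counter()
    for sentence in corpus:
        counts.update(zip(sentence, sentence[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Toy corpus (hypothetical data).
corpus = [
    ["i", "am", "going", "to", "new", "york"],
    ["new", "york", "is", "going", "to", "be", "cold"],
    ["i", "want", "to", "be", "in", "new", "york"],
]
candidates = frequent_pairs(corpus, min_count=2)
```

Only the surviving candidates would then be passed to one of the measures, which avoids unreliable estimates for pairs seen once.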
11. The method of claim 1, wherein an output of the speech recognition system is a sequence of elements associated with the augmented textual corpus.
12. The method of claim 11, wherein the speech recognition system computes a score for a hypothesized sequence using a language model and an acoustic vocabulary augmented by one or more compound elements.
13. The method of claim 12, wherein the speech recognition system outputs the sequence with the highest score as the decoded sequence.
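The selection step of claims 12 and 13 can be sketched as picking the highest-scoring hypothesis. The example assumes each hypothesis already carries an acoustic log score and a language model log score and that the two are combined by a weighted sum; that combination, the weight, and the sample hypotheses are assumptions, not details from the claims.

```python
def decode(hypotheses, lm_weight=1.0):
    """Return the word sequence of the highest-scoring hypothesis.

    `hypotheses` is a list of (words, acoustic_log_score, lm_log_score) tuples.
    The total score is assumed to be acoustic + lm_weight * lm.
    """
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])
    return best[0]

# Hypothetical hypotheses; "going_to" and "new_york" stand for compound words.
hypotheses = [
    (["i", "am", "going", "to", "new", "york"], -120.0, -18.5),
    (["i", "am", "going_to", "new_york"], -118.0, -14.0),
    (["i", "am", "going", "two", "new", "york"], -119.0, -25.0),
]
decoded = decode(hypotheses)
```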
14. Apparatus for forming an augmented textual corpus associated with a speech recognition system, the apparatus comprising:
a memory which stores a textual corpus;
at least one processor operable to compute a measure for an element set in the textual corpus for comparison to a threshold value, the measure being an average of a direct n-gram probability value and a reverse n-gram probability value, and to replace the element set in the textual corpus with a compound element depending on a result of the comparison between the measure and the threshold value.
27. An article of manufacture for forming an augmented textual corpus associated with a speech recognition system, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
computing a measure for an element set in a textual corpus for comparison to a threshold value, the measure being an average of a direct n-gram probability value and a reverse n-gram probability value; and
replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the measure and the threshold value.
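One way such a program might look for word pairs is sketched below. It assumes the "average" is an arithmetic mean (the claim does not say which kind), that the direct and reverse bigram probabilities are maximum-likelihood estimates from counts, that a pair is replaced when the measure exceeds the threshold, and that compounds are written by joining the words with an underscore. All function names and the threshold value are hypothetical.

```python
from collections import Counter

def bigram_average_measure(corpus):
    """Average of direct P(w2 | w1) and reverse P(w1 | w2) for each word pair."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    measures = {}
    for (w1, w2), n in bigrams.items():
        direct = n / unigrams[w1]    # estimate of P(w2 follows | w1)
        reverse = n / unigrams[w2]   # estimate of P(w1 precedes | w2)
        measures[(w1, w2)] = 0.5 * (direct + reverse)
    return measures

def replace_pairs(corpus, measures, threshold, joiner="_"):
    """Rewrite the corpus, merging pairs whose measure exceeds `threshold`."""
    augmented = []
    for sentence in corpus:
        out = []
        i = 0
        while i < len(sentence):
            pair = tuple(sentence[i:i + 2])
            if len(pair) == 2 and measures.get(pair, 0.0) > threshold:
                out.append(joiner.join(pair))  # emit the compound word
                i += 2
            else:
                out.append(sentence[i])
                i += 1
        augmented.append(out)
    return augmented

# Toy corpus (hypothetical data).
corpus = [
    ["i", "am", "going", "to", "new", "york"],
    ["new", "york", "is", "going", "to", "be", "cold"],
    ["i", "want", "to", "be", "in", "new", "york"],
]
measures = bigram_average_measure(corpus)
augmented = replace_pairs(corpus, measures, threshold=0.9)
```

With this toy data only ("new", "york") clears the assumed threshold, so the augmented corpus contains the single token "new_york" wherever the pair occurred.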
28. A method of forming an augmented textual corpus associated with a speech recognition system, the method comprising the steps of:
computing a measure for an element set in a textual corpus for comparison to a threshold value, the measure being based on a comparison of the number of times a co-articulated baseform for the set is preferred over a concatenation of non-co-articulated individual baseforms of the elements forming the set; and
replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the measure and the threshold value.
29. A method of forming an augmented textual corpus associated with a speech recognition system, the method comprising the steps of:
computing a measure for an element set in a textual corpus for comparison to a threshold value, the measure being based on a difference between an average phone recognition score for a particular compound element and a sum of respective average phone recognition scores of the elements of the set; and
replacing the element set in the textual corpus with the particular compound element depending on a result of the comparison between the measure and the threshold value.
30. A method for use with a speech recognition system, the method comprising the steps of:
computing a measure for a pair of consecutive words in a textual corpus for comparison to a threshold value, the measure being an average of a direct bigram probability value and a reverse bigram probability value; and
replacing the pair in the textual corpus with a compound word depending on a result of the comparison between the measure and the threshold value.
31. The method of claim 30, further comprising the steps of:
computing a second measure for a pair of consecutive words in the textual corpus for comparison to a threshold value, the second measure being based on mutual information between words in the pair; and
replacing the word pair in the textual corpus with a compound word depending on a result of the comparison between the second measure and the threshold value.
32. The method of claim 30, further comprising the steps of:
computing a third measure for a pair of consecutive words in the textual corpus for comparison to a threshold value, the third measure being based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair; and
replacing the word pair in the textual corpus with a compound word depending on a result of the comparison between the third measure and the threshold value.
33. The method of claim 30, further comprising the steps of:
computing a fourth measure for a pair of consecutive words in the textual corpus for comparison to a threshold value, the fourth measure being based on a difference between an average phone recognition score for a particular compound word and a sum of respective average phone recognition scores of the words of the pair; and
replacing the word pair in the textual corpus with the particular compound word depending on a result of the comparison between the fourth measure and the threshold value.
34. The method of claim 30, wherein the compound word is added to a language model vocabulary associated with the speech recognition system.
35. The method of claim 34, wherein a language model associated with the speech recognition system is recomputed based on the textual corpus including the compound word.
36. The method of claim 30, wherein at least one pronunciation variant of the compound word is added to an acoustic vocabulary associated with the speech recognition system.
37. The method of claim 36, wherein an acoustic model associated with the speech recognition system is retrained based on the textual corpus including the compound word.
38. The method of claim 30, further comprising the steps of:
computing an occurrence count for the word pair; and
comparing the occurrence count to a threshold value such that the measure is not computed for the pair depending on a result of the occurrence count comparison.
39. The method of claim 30, wherein an output of the speech recognition system is a sequence of words associated with the textual corpus which includes one or more compound words.
40. The method of claim 39, wherein the speech recognition system computes a score for a hypothesized sequence using a language model and an acoustic vocabulary augmented by one or more compound words.
41. The method of claim 40, wherein the speech recognition system outputs the sequence with the highest score as the decoded sequence.
Specification