Methods and apparatus for forming compound words for use in a continuous speech recognition system

US 6,385,579 B1
Filed: 04/29/1999
Issued: 05/07/2002
Est. Priority Date: 04/29/1999
Status: Expired due to Fees

Abstract

A method of forming an augmented textual training corpus with compound words for use with an associated speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair. A fourth measure is based on a difference between an average phone recognition score for a particular compound word and a sum of respective average phone recognition scores of the words of the pair.
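
The first measure and the replacement step can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract says only "average", so the geometric mean used here is an assumption, as are the function names `bigram_measure` and `replace_pairs`, the underscore joiner, the toy corpus, and the threshold of 0.9. Probabilities are estimated from raw counts without smoothing.

```python
from collections import Counter
from math import sqrt


def bigram_measure(tokens):
    """Map each consecutive word pair to the average of its direct and
    reverse bigram probabilities (geometric mean; an assumption)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        direct = c12 / unigrams[w1]   # estimate of P(w2 | w1)
        reverse = c12 / unigrams[w2]  # estimate of P(w1 | w2)
        scores[(w1, w2)] = sqrt(direct * reverse)
    return scores


def replace_pairs(tokens, pairs, joiner="_"):
    """Scan left to right and merge every selected pair into one compound token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pairs:
            out.append(tokens[i] + joiner + tokens[i + 1])
            i += 2  # both words are consumed by the compound
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical toy corpus and threshold, for illustration only.
corpus = "i am going to go and i am going to stay".split()
scores = bigram_measure(corpus)
selected = {pair for pair, score in scores.items() if score >= 0.9}
augmented = replace_pairs(corpus, selected)
print(augmented)
```

On this toy corpus, "going to" scores 1.0 and is merged, while "to go" scores about 0.71 and is left alone. The pair "go and" also reaches 1.0 despite occurring only once, which shows why the claims separately describe an occurrence count check before the measure is computed.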

41 Claims

1. A method of forming an augmented textual corpus associated with a speech recognition system, the method comprising the steps of:
- computing a measure for an element set in a textual corpus for comparison to a threshold value, the measure being an average of a direct n-gram probability value and a reverse n-gram probability value; and
  
  replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the measure and the threshold value.
2. The method of claim 1, further comprising the steps of:
3. The method of claim 1, further comprising the steps of:
- computing a third measure for an element set in the textual corpus for comparison to a threshold value, the third measure being based on a comparison of the number of times a co-articulated baseform for the set is preferred over a concatenation of non-co-articulated individual baseforms of the elements forming the set; and
  
  replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the third measure and the threshold value.
4. The method of claim 1, further comprising the steps of:
- computing a fourth measure for an element set in the textual corpus for comparison to a threshold value, the fourth measure being based on a difference between an average phone recognition score for a particular compound element and a sum of respective average phone recognition scores of the elements of the set; and
  
  replacing the element set in the textual corpus with the particular compound element depending on a result of the comparison between the fourth measure and the threshold value.
5. The method of claim 1, wherein the compound element is added to a language model vocabulary associated with the speech recognition system.
6. The method of claim 5, wherein a language model associated with the speech recognition system is recomputed based on the augmented textual corpus.
7. The method of claim 1, wherein at least one pronunciation variant of the compound element is added to an acoustic vocabulary associated with the speech recognition system.
8. The method of claim 7, wherein an acoustic model associated with the speech recognition system is retrained based on the augmented textual corpus.
9. The method of claim 1, wherein the element set is a pair of consecutive words in the textual corpus and the compound element is a compound word.
10. The method of claim 1, further comprising the steps of:
- computing an occurrence count for the element set; and
  
  comparing the occurrence count to a threshold value such that the measure is not computed for the set depending on a result of the occurrence count comparison.
11. The method of claim 1, wherein an output of the speech recognition system is a sequence of elements associated with the augmented textual corpus.
12. The method of claim 11, wherein the speech recognition system computes a score for a hypothesized sequence using a language model and an acoustic vocabulary augmented by one or more compound elements.
13. The method of claim 12, wherein the speech recognition system outputs the sequence with the highest score as the decoded sequence.
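
The occurrence count check of claim 10 can be sketched as a pre-filter that decides which pairs get a measure computed at all. This is only an illustration under assumptions: the name `candidate_pairs`, the parameter `min_count`, its value of 2, and the "at least" direction of the comparison are not taken from the claim text.

```python
from collections import Counter


def candidate_pairs(tokens, min_count=2):
    """Return the consecutive word pairs whose occurrence count reaches
    min_count; the measure would be skipped for all other pairs."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical toy corpus, for illustration only.
tokens = "i am going to go and i am going to stay".split()
candidates = candidate_pairs(tokens)
print(sorted(candidates))
```

Filtering first keeps rare pairs, whose probability estimates are unreliable, from passing the threshold by accident.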

14. Apparatus for forming an augmented textual corpus associated with a speech recognition system, the apparatus comprising:
- a memory which stores a textual corpus;
  
  at least one processor operable to compute a measure for an element set in the textual corpus for comparison to a threshold value, the measure being an average of a direct n-gram probability value and a reverse n-gram probability value, and to replace the element set in the textual corpus with a compound element depending on a result of the comparison between the measure and the threshold value.
15. The apparatus of claim 14, wherein the processor is further operable to compute a second measure for an element set in the textual corpus for comparison to a threshold value, the second measure being based on mutual information between elements in the set, and to replace the element set in the textual corpus with a compound element depending on a result of the comparison between the second measure and the threshold value.
16. The apparatus of claim 14, wherein the processor is further operable to compute a third measure for an element set in the textual corpus for comparison to a threshold value, the third measure being based on a comparison of the number of times a co-articulated baseform for the set is preferred over a concatenation of non-co-articulated individual baseforms of the elements forming the set, and to replace the element set in the textual corpus with a compound element depending on a result of the comparison between the third measure and the threshold value.
17. The apparatus of claim 14, wherein the processor is further operable to compute a fourth measure for an element set in the textual corpus for comparison to a threshold value, the fourth measure being based on a difference between an average phone recognition score for a particular compound element and a sum of respective average phone recognition scores of the elements of the set, and to replace the element set in the textual corpus with the particular compound element depending on a result of the comparison between the fourth measure and the threshold value.
18. The apparatus of claim 14, wherein the compound element is added to a language model vocabulary associated with the speech recognition system.
19. The apparatus of claim 18, wherein a language model associated with the speech recognition system is recomputed based on the augmented textual corpus.
20. The apparatus of claim 14, wherein at least one pronunciation variant of the compound element is added to an acoustic vocabulary associated with the speech recognition system.
21. The apparatus of claim 20, wherein an acoustic model associated with the speech recognition system is retrained based on the augmented textual corpus.
22. The apparatus of claim 14, wherein the element set is a pair of consecutive words in the textual corpus and the compound element is a compound word.
23. The apparatus of claim 14, wherein the processor is further operable to compute an occurrence count for the element set, and to compare the occurrence count to a threshold value such that the measure is not computed for the set depending on a result of the occurrence count comparison.
24. The apparatus of claim 14, wherein an output of the speech recognition system is a sequence of elements associated with the augmented textual corpus.
25. The apparatus of claim 24, wherein the speech recognition system computes a score for a hypothesized sequence using a language model and an acoustic vocabulary augmented by one or more compound elements.
26. The apparatus of claim 25, wherein the speech recognition system outputs the sequence with the highest score as the decoded sequence.
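
The second measure, based on mutual information between the elements of a set, might be computed along the following lines. The claims do not give a formula, so the pointwise form log2(P(w1, w2) / (P(w1) P(w2))), the count-based probability estimates, and the name `pointwise_mutual_information` are all assumptions made for this sketch.

```python
from collections import Counter
from math import log2


def pointwise_mutual_information(tokens):
    """Map each consecutive word pair to log2(P(w1, w2) / (P(w1) * P(w2))),
    with all probabilities estimated from raw corpus counts."""
    n_tokens = len(tokens)
    n_pairs = n_tokens - 1  # number of consecutive pairs in the corpus
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pmi = {}
    for (w1, w2), c12 in bigrams.items():
        p12 = c12 / n_pairs
        p1 = unigrams[w1] / n_tokens
        p2 = unigrams[w2] / n_tokens
        pmi[(w1, w2)] = log2(p12 / (p1 * p2))
    return pmi


# Hypothetical toy corpus, for illustration only.
tokens = "new york is big and new york is old".split()
pmi = pointwise_mutual_information(tokens)
print(round(pmi[("new", "york")], 3))
```

A positive value means the two words occur together more often than independence would predict. Pairs seen only once can also score highly under this estimate, so in practice it would be combined with a count filter or a second measure.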

27. An article of manufacture for forming an augmented textual corpus associated with a speech recognition system, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
- computing a measure for an element set in a textual corpus for comparison to a threshold value, the measure being an average of a direct n-gram probability value and a reverse n-gram probability value; and
  
  replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the measure and the threshold value.

28. A method of forming an augmented textual corpus associated with a speech recognition system, the method comprising the steps of:
- computing a measure for an element set in a textual corpus for comparison to a threshold value, the measure being based on a comparison of the number of times a co-articulated baseform for the set is preferred over a concatenation of non-co-articulated individual baseforms of the elements forming the set; and
  
  replacing the element set in the textual corpus with a compound element depending on a result of the comparison between the measure and the threshold value.
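
One way to read the measure of claim 28 is as the fraction of occurrences in which the co-articulated baseform was preferred. The sketch below assumes exactly that; the claim says only that the measure is "based on a comparison of the number of times", so the ratio, the name `coarticulation_measure`, the threshold of 0.5, and the per-occurrence flags (which would come from an acoustic alignment step not shown here) are hypothetical.

```python
def coarticulation_measure(preferred_flags):
    """Fraction of occurrences in which the co-articulated baseform was
    preferred over the concatenated individual baseforms."""
    if not preferred_flags:
        return 0.0
    return sum(preferred_flags) / len(preferred_flags)


# Hypothetical alignment outcomes: True means the co-articulated baseform
# was preferred for that occurrence of the pair.
observations = {
    ("going", "to"): [True, True, True, False],
    ("to", "stay"): [False, False, True, False],
}
threshold = 0.5
selected = {
    pair
    for pair, flags in observations.items()
    if coarticulation_measure(flags) > threshold
}
print(selected)
```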

29. A method of forming an augmented textual corpus associated with a speech recognition system, the method comprising the steps of:
- computing a measure for an element set in a textual corpus for comparison to a threshold value, the measure being based on a difference between an average phone recognition score for a particular compound element and a sum of respective average phone recognition scores of the elements of the set; and
  
  replacing the element set in the textual corpus with the particular compound element depending on a result of the comparison between the measure and the threshold value.
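
The measure of claim 29 is a difference between one average and a sum of two averages, which can be written down directly. In this sketch the name `phone_score_gain` and the score values are hypothetical, and how phone recognition scores are obtained is outside its scope. The claim also does not state which direction of the threshold comparison triggers replacement.

```python
from statistics import mean


def phone_score_gain(compound_scores, first_word_scores, second_word_scores):
    """Average phone recognition score of the compound minus the sum of the
    average phone recognition scores of the two individual words."""
    return mean(compound_scores) - (
        mean(first_word_scores) + mean(second_word_scores)
    )


# Hypothetical phone recognition scores, for illustration only.
gain = phone_score_gain(
    compound_scores=[0.9, 0.8, 0.85],
    first_word_scores=[0.3, 0.4],
    second_word_scores=[0.4, 0.2],
)
print(round(gain, 2))
```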

30. A method for use with a speech recognition system, the method comprising the steps of:
- computing a measure for a pair of consecutive words in a textual corpus for comparison to a threshold value, the measure being an average of a direct bigram probability value and a reverse bigram probability value; and
  
  replacing the pair in the textual corpus with a compound word depending on a result of the comparison between the measure and the threshold value.
31. The method of claim 30, further comprising the steps of:
32. The method of claim 30, further comprising the steps of:
- computing a third measure for a pair of consecutive words in the textual corpus for comparison to a threshold value, the third measure being based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair; and
  
  replacing the word pair in the textual corpus with a compound word depending on a result of the comparison between the third measure and the threshold value.
33. The method of claim 30, further comprising the steps of:
- computing a fourth measure for a pair of consecutive words in the textual corpus for comparison to a threshold value, the fourth measure being based on a difference between an average phone recognition score for a particular compound word and a sum of respective average phone recognition scores of the words of the pair; and
  
  replacing the word pair in the textual corpus with the particular compound word depending on a result of the comparison between the fourth measure and the threshold value.
34. The method of claim 30, wherein the compound word is added to a language model vocabulary associated with the speech recognition system.
35. The method of claim 34, wherein a language model associated with the speech recognition system is recomputed based on the textual corpus including the compound word.
36. The method of claim 30, wherein at least one pronunciation variant of the compound word is added to an acoustic vocabulary associated with the speech recognition system.
37. The method of claim 36, wherein an acoustic model associated with the speech recognition system is retrained based on the textual corpus including the compound word.
38. The method of claim 30, further comprising the steps of:
- computing an occurrence count for the word pair; and
  
  comparing the occurrence count to a threshold value such that the measure is not computed for the pair depending on a result of the occurrence count comparison.
39. The method of claim 30, wherein an output of the speech recognition system is a sequence of words associated with the textual corpus which includes one or more compound words.
40. The method of claim 39, wherein the speech recognition system computes a score for a hypothesized sequence using a language model and an acoustic vocabulary augmented by one or more compound words.
41. The method of claim 40, wherein the speech recognition system outputs the sequence with the highest score as the decoded sequence.
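
Claims 39 to 41 describe scoring hypothesized sequences and outputting the highest-scoring one. A minimal sketch of that selection step follows. It assumes each hypothesis already carries an acoustic score and a language model score in the log domain and that the two are combined by a weighted sum; the claims do not specify the combination, and the name `decode`, the `lm_weight` parameter, and all score values are hypothetical.

```python
def decode(hypotheses, lm_weight=1.0):
    """Return the word sequence of the hypothesis with the highest combined
    score (acoustic log score plus weighted language model log score)."""
    best = max(hypotheses, key=lambda h: h["acoustic"] + lm_weight * h["lm"])
    return best["words"]


# Hypothetical hypotheses; the second uses compound words in its vocabulary.
hypotheses = [
    {"words": ["i", "am", "going", "to", "stay"], "acoustic": -42.0, "lm": -9.5},
    {"words": ["i_am", "going_to", "stay"], "acoustic": -40.5, "lm": -6.0},
]
decoded = decode(hypotheses)
print(decoded)
```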

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Padmanabhan, Mukund; Saon, George Andrei
Primary Examiner(s): Knepper, David D.
Application Number: US09/302,032
Time in Patent Office: 1,104 Days
Field of Search: 704/243-245, 704/235, 704/255-257
US Class Current: 704/243
CPC Class Codes: G10L 15/063 Training; G10L 15/197 Probabilistic grammars, e.g...
