Speech processing system

US 7,337,116 B2
Filed: 11/05/2001
Issued: 02/26/2008
Est. Priority Date: 11/07/2000
Status: Expired due to Fees

Abstract

A system is provided for allowing a user to add word models to a speech recognition system. In particular, the system allows a user to input a number of renditions of the new word and generates from these a sequence of phonemes representative of the new word. This representative sequence of phonemes is stored in a word-to-phoneme dictionary together with the typed version of the word for subsequent use by the speech recognition system.

167 Citations

78 Claims

1. An apparatus for generating a sequence of sub-word units representative of a new word to be added to a dictionary of a speech recognition system, the apparatus comprising:
- receiving means for receiving signals representative of first and second spoken renditions of the new word;
  
  speech recognition means for comparing the received first and second spoken renditions with pre-stored sub-word unit models to generate first and second sequences of sub-word units representative of said first and second spoken renditions of the new word respectively;
  
  means for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  first comparing means for comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  second comparing means for comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set; and
  
  means for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing means for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
- - 2. An apparatus according to claim 1, wherein said determining means is operable to determine said sequence of sub-word units by determining, for each aligned pair of sub-word units, a sub-word unit that is confusingly similar to the first and second sub-word units of the aligned pair.
  - 3. An apparatus according to claim 1, further comprising:
    - means for combining the comparison scores obtained when comparing the first and second sequence sub-word units in the aligned pair with the same sub-word unit from the set, to generate a plurality of combined comparison scores; and
      
      third comparing means for comparing, for each aligned pair, the combined comparison scores generated by said combining means for the aligned pair, wherein said determining means is operable to determine, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon a comparison result output by said third comparing means for the aligned pair.
  - 4. An apparatus according to claim 3, wherein said first and second comparing means are operable to generate comparison scores which are indicative of a probability of confusing the corresponding sub-word unit taken from the set of predetermined sub-word units as the sub-word unit in the aligned pair.
  - 5. An apparatus according to claim 4, wherein said combining means is operable to combine the comparison scores in order to multiply the probabilities of confusing the corresponding sub-word unit taken from the set as the sub-word units in the aligned pair.
  - 6. An apparatus according to claim 5, wherein each of said sub-word units in said set of predetermined sub-word units has a predetermined probability of occurring within a sequence of sub-word units and wherein said combining means is operable to weight each of said combined comparison scores in dependence upon the respective probability of occurrence for the sub-word unit of the set used to generate the combined comparison score.
  - 7. An apparatus according to claim 6, wherein said combining means is operable to combine said comparison scores by calculating:
    - $P(d^1_i \mid p_r)\,P(d^2_j \mid p_r)\,P(p_r)$, where $d^1_i$ and $d^2_j$ are an aligned pair of first and second sequence sub-word units respectively;
      
      $P(d^1_i \mid p_r)$ is the comparison score generated by said first comparing means and is representative of the probability of confusing set sub-word unit $p_r$ as first sequence sub-word unit $d^1_i$;
      
      $P(d^2_j \mid p_r)$ is the comparison score generated by said second comparing means and is representative of the probability of confusing set sub-word unit $p_r$ as second sequence sub-word unit $d^2_j$; and
      
      $P(p_r)$ is a weight which represents the probability of set sub-word unit $p_r$ occurring in a sequence of sub-word units.
  - 8. An apparatus according to claim 7, wherein said third comparing means is operable to identify the set sub-word unit which gives the maximum combined comparison score and wherein said determining means is operable to determine said sub-word unit representative of the sub-word units in the aligned pair as being the sub-word unit which provides the maximum combined comparison score.
  - 9. An apparatus according to claim 5, wherein said comparison scores represent log probabilities and wherein said combining means is operable to multiply said probabilities by adding the respective comparison scores.
  - 10. An apparatus according to claim 3, wherein each of the sub-word units in said first and second sequences of sub-word units belong to said set of predetermined sub-word units and wherein said first and second comparing means are operable to generate said comparison scores using predetermined data which relate the sub-word units in said set to each other.
  - 11. An apparatus according to claim 10, wherein said predetermined data comprises, for each sub-word unit in the set of sub-word units, a probability for confusing that sub-word unit with each of the other sub-word units in the set of sub-word units.
  - 12. An apparatus according to claim 1, wherein said first and second comparing means are operable to compare the first sequence sub-word unit and the second sequence sub-word unit respectively with each of the sub-word units in said set of sub-word units.
  - 13. An apparatus according to claim 1, wherein said aligning means comprises dynamic programming means for aligning said first and second sequences of sub-word units using a dynamic programming technique.
  - 14. An apparatus according to claim 13, wherein said dynamic programming means is operable to determine an optimum alignment between said first and second sequences of sub-word units.
  - 15. An apparatus according to claim 1, wherein each of said sub-word units represents a phoneme.
  - 16. An apparatus according to claim 1, wherein said receiving means is operable to receive signals representative of a third spoken rendition of the new word, wherein said recognition means is operable to compare the third rendition of the new word with said pre-stored sub-word unit models to generate a third sequence of sub-word units representative of said third rendition of the new word, wherein said aligning means is operable to align simultaneously the sub-word units of the first, second and third sequences of sub-word units to generate a number of aligned groups of sub-word units, each aligned group comprising a sub-word unit from each of the renditions, and wherein said determining means is operable to determine said representative sequence of sub-word units in dependence upon the aligned groups of sub-word units.
  - 17. An apparatus according to claim 1, wherein said receiving means is operable to receive signals representative of a third spoken rendition of the new word, wherein said recognition means is operable to compare the third rendition of the new word with said pre-stored sub-word unit models to generate a third sequence of sub-word units representative of said third rendition of the new word and wherein said aligning means is operable to align two sequences of sub-word units at a time.
  - 18. An apparatus according to claim 1, wherein said receiving means is operable to receive signals representative of a plurality of spoken renditions of the new word, wherein said speech recognition means is operable to compare the received spoken renditions with pre-stored sub-word unit models to generate a sequence of sub-word units for each of the plurality of spoken renditions, wherein said aligning means is operable to align the sub-word units of the plurality of sequences of sub-word units to form a number of aligned groups of sub-word units, each group including a sub-word unit from each sequence;
    - wherein said determining means is operable to determine a sequence of sub-word units representative of the spoken renditions of the new word;
      
      wherein the apparatus further comprises (i) means for comparing each sequence of sub-word units with said representative sequence of sub-word units to determine a score representative of the similarity therebetween; and
      
      (ii) means for processing the scores output by the comparing means to identify clusters within the scores indicating one or more different pronunciations of the spoken rendition of the new word; and
      
      wherein said determining means is operable to determine a sequence of sub-word units representative of the spoken renditions of the new word within each cluster.
  - 19. An apparatus according to claim 18, wherein said comparing means, processing means and determining means are operable iteratively until a predetermined convergence criterion is met.
  - 20. An apparatus according to claim 18, further comprising means for combining the sequences of sub-word units for each of the clusters into a sub-word unit lattice.
  - 21. An apparatus according to claim 1, wherein said generated sequence of sub-word units is representative of a new command to be added to a command dictionary of said speech recognition system.
  - 22. An apparatus according to claim 1, wherein said generated sequence of sub-word units is representative of a new word to be added to a word dictionary of a speech recognition system together with a text rendition of the new word.
  - 23. An apparatus according to claim 1, wherein said generated sequence of sub-word units is representative of a new word to be added to a word dictionary of a speech recognition system together with a text rendition of the new word.
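
As an illustration of claims 7 to 9, the choice of a representative sub-word unit for a single aligned pair can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `representative_unit`, the nested-dictionary layout of the confusion table, and the toy probabilities are all invented for the example. Scores are held as log probabilities, so the product of claim 7 becomes a sum as in claim 9, and the maximum is taken as in claim 8.

```python
import math


def representative_unit(d1, d2, log_confusion, log_prior):
    """Pick the set unit p_r maximising log P(d1|p_r) + log P(d2|p_r) + log P(p_r).

    log_confusion[p_r][d] is assumed to hold log P(d | p_r), the log probability
    of decoding set unit p_r as unit d; log_prior[p_r] holds log P(p_r).
    """
    best_unit, best_score = None, -math.inf
    for p_r, prior in log_prior.items():
        # Adding log scores multiplies the underlying probabilities.
        score = log_confusion[p_r][d1] + log_confusion[p_r][d2] + prior
        if score > best_score:
            best_unit, best_score = p_r, score
    return best_unit, best_score


# Hypothetical two-unit set with made-up confusion probabilities and priors.
confusion = {
    "t": {"t": 0.8, "d": 0.2},
    "d": {"t": 0.3, "d": 0.7},
}
prior = {"t": 0.6, "d": 0.4}

log_confusion = {
    p: {d: math.log(v) for d, v in row.items()} for p, row in confusion.items()
}
log_prior = {p: math.log(v) for p, v in prior.items()}

# Aligned pair: first rendition decoded as "t", second as "d".
unit, score = representative_unit("t", "d", log_confusion, log_prior)
print(unit, math.exp(score))
```

With these toy numbers the candidate "t" scores 0.8 × 0.2 × 0.6 = 0.096 and "d" scores 0.3 × 0.7 × 0.4 = 0.084, so "t" is selected. Working in the log domain avoids underflow when many small probabilities are combined.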

24. An apparatus for adding a new word and sub-word representation of the new word to a word dictionary of a speech recognition system, the apparatus comprising:
- means for receiving a first sequence of sub-word units representative of a first spoken rendition of the new word and for receiving a second sequence of sub-word units representative of a second spoken rendition of the new word;
  
  means for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  first comparing means for comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  second comparing means for comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  means for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing means for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word; and
  
  means for adding the new word and the representative sequence of sub-word units to said word dictionary.
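
The final element of claim 24 can be pictured as an update to a mapping from words to sub-word sequences. The sketch below is only one possible layout and rests on assumptions not found in the claim: the name `add_word`, the use of a Python dictionary keyed by the word's text, the decision to allow several pronunciations per word, and the phoneme labels in the usage lines.

```python
from collections import defaultdict


def add_word(dictionary, text, sub_word_units):
    """Store a representative sub-word sequence under the word's text form.

    Assumption: a word may carry several pronunciations, so each entry is a
    list of sequences, and an identical sequence is not stored twice.
    """
    pronunciation = tuple(sub_word_units)
    if pronunciation not in dictionary[text]:
        dictionary[text].append(pronunciation)
    return dictionary


# Hypothetical usage with made-up phoneme labels.
word_dictionary = defaultdict(list)
add_word(word_dictionary, "tomato", ["t", "ah", "m", "aa", "t", "ow"])
add_word(word_dictionary, "tomato", ["t", "ah", "m", "ey", "t", "ow"])
add_word(word_dictionary, "tomato", ["t", "ah", "m", "ey", "t", "ow"])  # repeat, ignored
print(len(word_dictionary["tomato"]))
```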

25. A speech recognition system comprising:
- means for receiving speech signals to be recognised;
  
  means for storing sub-word unit models;
  
  means for matching received speech with the sub-word unit models to generate one or more sequences of sub-word units representative of the received speech signals;
  
  a word dictionary relating sequences of sub-word units to words;
  
  a word decoder for processing the one or more sequences of sub-word units output by said matching means using the word dictionary to generate one or more words corresponding to the received speech signals;
  
  an apparatus for adding a new word and a sub-word representation of the new word to the word dictionary; and
  
  means for controllably connecting the output of said matching means to either said word decoder or said apparatus for adding the new word and a sub-word representation of the new word to the word dictionary;
  
  characterised in that said apparatus for adding the new word and a sub-word representation of the new word to the word dictionary comprises:
  
  means for receiving a first sequence of sub-word units representative of a first spoken rendition of the new word output by said comparing means and for receiving a second sequence of sub-word units representative of a second spoken rendition of the new word output by said comparing means;
  
  means for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  first comparing means for comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  second comparing means for comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  means for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing means for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word;
  
  means for receiving a text rendition of the new word; and
  
  means for adding said text rendition of the new word and the representative sequence of sub-word units to said word dictionary.
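
The controllable connection recited in claim 25 amounts to routing the matcher's output to one of two consumers. A minimal sketch follows, assuming a boolean mode flag and two callables; the class name `MatcherOutputSwitch`, the attribute `adding_word`, and the stand-in consumers are hypothetical and serve only to show the routing idea.

```python
class MatcherOutputSwitch:
    """Sends sub-word sequences either to the word decoder or to the word adder."""

    def __init__(self, word_decoder, word_adder):
        self.word_decoder = word_decoder  # normal recognition path
        self.word_adder = word_adder      # new-word training path
        self.adding_word = False          # assumed mode flag, off by default

    def route(self, sequences):
        # Exactly one consumer receives the matcher output at any time.
        target = self.word_adder if self.adding_word else self.word_decoder
        return target(sequences)


# Stand-in consumers that only report what they received.
switch = MatcherOutputSwitch(
    word_decoder=lambda seqs: ("decoded", len(seqs)),
    word_adder=lambda seqs: ("added", len(seqs)),
)

normal = switch.route([["k", "ae", "t"]])
switch.adding_word = True
training = switch.route([["k", "ae", "t"], ["k", "ah", "t"]])
print(normal, training)
```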

26. A method of generating a sequence of sub-word units representative of a new word to be added to a dictionary of a speech recognition system, the method comprising:
- receiving signals representative of first and second spoken renditions of the new word;
  
  comparing the received first and second spoken renditions with pre-stored sub-word unit models to generate a first sequence of sub-word units representative of said first spoken rendition of the new word and a second sequence of sub-word units representative of said second spoken rendition of the new word;
  
  aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set; and
  
  determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing steps for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word.
- View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47)
- - 27. A method according to claim 26, wherein said determining step determines said sequence of sub-word units by determining, for each aligned pair of sub-word units, a sub-word unit that is confusingly similar to the first and second sub-word units of the aligned pair.
  - 28. A method according to claim 26, further comprising:
    - combining the comparison scores obtained when comparing the first and second sequence sub-word units in the aligned pair with the same sub-word unit from the set, to generate a plurality of combined comparison scores; and
      
      a third comparing step of comparing, for each aligned pair, the combined comparison scores generated by said combining step for the aligned pair;
      
      wherein said determining step determines, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon a comparison result output by said third comparing step for the aligned pair.
  - 29. A method according to claim 28, wherein said first and second comparing steps generate comparison scores which are indicative of a probability of confusing the corresponding sub-word unit taken from the set of predetermined sub-word units as the sub-word unit in the aligned pair.
  - 30. A method according to claim 29, wherein said combining step combines the comparison scores in order to multiply the probabilities of confusing the corresponding sub-word unit taken from the set as the sub-word units in the aligned pair.
  - 31. A method according to claim 30, wherein each of said sub-word units in said set of predetermined sub-word units has a predetermined probability of occurring within a sequence of sub-word units and wherein said combining step weights each of said combined comparison scores in dependence upon the respective probability of occurrence for the sub-word unit of the set used to generate the combined comparison score.
  - 32. A method according to claim 31, wherein said combining step combines said comparison scores by calculating:
    - P(d¹_i|p_r)P(d²_j|p_r)P(p_r)where d¹_iand d²_jare an aligned pair of first and second sequence sub-word units respectively;
      
      P(d¹_i|p_r) is the comparison score generated by said first comparing step and is representative of the probability of confusing set sub-word unit p_r as first sequence sub-word unit d¹_i;
      
      P(d²_j|p_r) is the comparison score generated by said second comparing step and is representative of the probability of confusing set sub-word unit p_ras second sequence sub-word unit d²_j; and
      
      P(p_r) is a weight which represents the probability of set sub-word unit p_roccurring in a sequence of sub-word units.
  - 33. A method according to claim 32, wherein said third comparing step identifies the set sub-word unit which gives the maximum combined comparison score and wherein said determining step determines said sub-word unit representative of the sub-word units in the aligned pair as being the sub-word unit which provides the maximum combined comparison score.
  - 34. A method according to claim 30, wherein said comparison scores represent log probabilities and wherein said combining step multiplies said probabilities by adding the respective comparison scores.
  - 35. A method according to claim 28, wherein each of the sub-word units in said first and second sequences of sub-word units belong to said set of predetermined sub-word units and wherein said first and second comparing steps generate said comparison scores using predetermined data which relate the sub-word units in said set to each other.
  - 36. A method according to claim 35, wherein said predetermined data comprises, for each sub-word unit in the set of sub-word units, a probability for confusing that sub-word unit with each of the other sub-word units in the set of sub-word units.
  - 37. A method according to claim 26, wherein said first and second comparing steps compare the first sequence sub-word unit and the second sequence sub-word unit respectively with each of the sub-word units in said set of sub-word units.
  - 38. A method according to claim 26, wherein said aligning step uses a dynamic programming technique to align said first and second sequences of sub-word units.
  - 39. A method according to claim 38, wherein said dynamic programming technique determines an optimum alignment between said first and second sequences of sub-word units.
  - 40. A method according to claim 26, wherein each of said sub-word units represents a phoneme.
  - 41. A method according to claim 26, wherein said receiving step receives signals representative of a third spoken rendition of the new word, wherein said comparing step compares the third rendition of the new word with said pre-stored sub-word unit models to generate a third sequence of sub-word units representative of said third rendition of the new word, wherein said aligning step simultaneously aligns the sub-word units of the first, second and third sequences of sub-word units to generate a number of aligned groups of sub-word units, each aligned group comprising a sub-word unit from each of the renditions, and wherein said determining step determines said representative sequence of sub-word units in dependence upon the aligned groups of sub-word units.
  - 42. A method according to claim 26, wherein said receiving step receives signals representative of a third spoken rendition of the new word, wherein said comparing step compares the third rendition of the new word with said pre-stored sub-word unit models to generate a third sequence of sub-word units representative of said third rendition of the new word and wherein said aligning step aligns two sequences of sub-word units at a time.
  - 43. A method according to claim 26, wherein said receiving step receives signals representative of a plurality of spoken renditions of the new word, wherein said comparing step compares the received spoken renditions with pre-stored sub-word unit models to generate a sequence of sub-word units for each of the plurality of spoken renditions, wherein said aligning step aligns the sub-word units of the plurality of sequences of sub-word units to form a number of aligned groups of sub-word units, each group including a sub-word unit from each sequence;
    - wherein said determining step determines a sequence of sub-word units representative of the spoken renditions of the new word;
      
      wherein the method further comprises the steps of:
      
      (i) comparing each sequence of sub-word units with said representative sequence of sub-word units to determine a score representative of the similarity therebetween; and
      
      (ii) processing the scores output by the comparing step to identify clusters within the scores indicating one or more different pronunciations of the spoken rendition of the new word; and
      
      wherein said determining step determines a sequence of sub-word units representative of the spoken renditions of the new word within each cluster.
  - 44. A method according to claim 43, wherein said comparing step, processing step and determining step operate iteratively until a predetermined convergence criterion is met.
  - 45. A method according to claim 43, further comprising means for combining the sequences of sub-word units for each of the clusters into a sub-word unit lattice.
  - 46. A method according to claim 26, wherein said generated sequence of sub-word units are representative of a new word to be added to a word dictionary of a speech recognition system.
  - 47. A method according to claim 26, wherein said generated sequence of sub-word units are representative of a new word to be added to a command dictionary of a speech recognition system.
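
Claims 38 and 39 recite a dynamic programming technique that finds an optimum alignment, without stating a cost function in the claim text. The sketch below assumes a plain edit-distance cost (unit substitution and gap costs) and marks a position where one rendition has no counterpart with `None`. Both choices, along with the function name `align` and the phoneme labels, are assumptions of this example rather than details taken from the claims.

```python
def align(seq1, seq2, sub_cost=1, gap_cost=1):
    """Align two sub-word sequences by dynamic programming (edit-distance costs).

    Returns a list of aligned pairs and the total alignment cost.
    """
    n, m = len(seq1), len(seq2)
    # cost[i][j] is the minimum cost of aligning seq1[:i] with seq2[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if seq1[i - 1] == seq2[j - 1] else sub_cost
            cost[i][j] = min(
                cost[i - 1][j - 1] + match,  # pair the two units
                cost[i - 1][j] + gap_cost,   # unit only in seq1
                cost[i][j - 1] + gap_cost,   # unit only in seq2
            )

    # Trace back from the end to recover one optimum alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            match = 0 if seq1[i - 1] == seq2[j - 1] else sub_cost
            if cost[i][j] == cost[i - 1][j - 1] + match:
                pairs.append((seq1[i - 1], seq2[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            pairs.append((seq1[i - 1], None))
            i -= 1
        else:
            pairs.append((None, seq2[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs, cost[n][m]


# Two hypothetical decodings of the same spoken word.
first = ["t", "ah", "m", "aa", "t", "ow"]
second = ["t", "m", "ey", "t", "ow"]
pairs, total_cost = align(first, second)
for pair in pairs:
    print(pair)
```

For these two sequences the alignment pairs "aa" with "ey" and leaves "ah" unmatched, for a total cost of 2. A scoring scheme based on confusion probabilities could replace the unit costs without changing the structure of the recursion.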

48. A method of adding a new word and sub-word representation of the new word to a word dictionary of a speech recognition system, the method comprising the steps of:
- receiving a first sequence of sub-word units representative of a first spoken rendition of the new word and receiving a second sequence of sub-word units representative of a second spoken rendition of the new word;
  
  aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing step for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word; and
  
  adding the new word and the representative sequence of sub-word units to said word dictionary.

49. A speech recognition method comprising the steps of:
- receiving speech signals to be recognised;
  
  storing sub-word unit models;
  
  matching received speech signals with the sub-word unit models to generate one or more sequences of sub-word units representative of the received speech signals;
  
  storing a word dictionary relating sequences of sub-word units to words;
  
  processing the one or more sequences of sub-word units output by said matching step using the stored word dictionary to generate one or more words corresponding to the received speech signals;
  
  a step of adding a new word and a sub-word representation of the new word to the word dictionary; and
  
  controllably feeding the output of said matching step to either said processing step or said adding step;
  
  characterised in that said adding step comprises:
  
  receiving a first sequence of sub-word units representative of a first spoken rendition of the new word output by said comparing step and receiving a second sequence of sub-word units representative of a second spoken rendition of the new word output by said comparing step;
  
  aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing steps for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word;
  
  receiving a text rendition of the new word; and
  
  adding said text rendition of the new word and the representative sequence of sub-word units to said word dictionary.

50. A storage medium storing processor implementable instructions for controlling a processor to carry out a method of generating a sequence of sub-word units representative of a new word to be added to a dictionary of a speech recognition system, the processor instructions comprising:
- receiving instructions for receiving signals representative of first and second spoken renditions of the new word;
  
  instructions for comparing the received first and second spoken renditions with pre-stored sub-word unit models to generate a first sequence of sub-word units representative of said first spoken rendition of the new word and a second sequence of sub-word units representative of said second spoken rendition of the new word;
  
  instructions for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  instructions for a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set; and
  
  instructions for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing steps for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word.
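
The aligning step recited above can be illustrated with a small sketch. It assumes a dynamic programming alignment of the edit-distance kind; the independent claim does not fix the technique. The function name `align`, the unit costs, and the use of `None` to mark a unit with no counterpart in the other sequence are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# An aligned pair; None marks a unit with no counterpart (an assumption,
# the claims only speak of "aligned pairs of sub-word units").
Pair = Tuple[Optional[str], Optional[str]]


def align(seq1: List[str], seq2: List[str],
          sub_cost: float = 1.0, gap_cost: float = 1.0) -> List[Pair]:
    """Align two sub-word unit sequences with minimum total cost."""
    n, m = len(seq1), len(seq2)
    # cost[i][j] is the cheapest alignment of seq1[:i] with seq2[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap_cost
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0.0 if seq1[i - 1] == seq2[j - 1] else sub_cost
            cost[i][j] = min(cost[i - 1][j - 1] + step,   # pair the two units
                             cost[i - 1][j] + gap_cost,   # seq1 unit unpaired
                             cost[i][j - 1] + gap_cost)   # seq2 unit unpaired

    # Trace back from the end to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = 0.0 if seq1[i - 1] == seq2[j - 1] else sub_cost
            if cost[i][j] == cost[i - 1][j - 1] + step:
                pairs.append((seq1[i - 1], seq2[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + gap_cost):
            pairs.append((seq1[i - 1], None))
            i -= 1
        else:
            pairs.append((None, seq2[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    print(align(["k", "ae", "t"], ["k", "t"]))
```

A practical system would likely replace the fixed substitution and gap costs with scores derived from confusion statistics; the fixed costs here only keep the sketch short.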

51. A storage medium storing processor implementable instructions for controlling a processor to carry out a method of adding a new word and sub-word representation of the new word to a word dictionary of a speech recognition system, the process instructions comprising:
- instructions for receiving a first sequence of sub-word units representative of a first spoken rendition of the new word and for receiving a second sequence of sub-word units representative of a second spoken rendition of the new word;
  
  instructions for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  instructions for a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing steps for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word; and
  
  instructions for adding the new word and the representative sequence of sub-word units to said word dictionary.

52. A storage medium storing processor implementable instructions for controlling a processor to carry out a speech recognition method, the process instructions comprising:
- instructions for receiving speech signals to be recognised;
  
  instructions for storing sub-word unit models;
  
  instructions for matching received speech signals with the sub-word unit models to generate one or more sequences of sub-word units representative of the received speech signals;
  
  instructions for storing a word dictionary relating sequences of sub-word units to words;
  
  instructions for processing the one or more sequences of sub-word units output by said matching step using the stored word dictionary to generate one or more words corresponding to the received speech signals;
  
  instructions for adding a new word and a sub-word representation of the new word to the word dictionary; and
  
  instructions for controllably feeding the output of said matching step to either said processing step or said adding step;
  
  characterised in that said adding instructions comprise:
  
  instructions for receiving a first sequence of sub-word units representative of a first spoken rendition of the new word output by said matching step and for receiving a second sequence of sub-word units representative of a second spoken rendition of the new word output by said matching step;
  
  instructions for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  instructions for a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing steps for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word;
  
  instructions for receiving a text rendition of the new word; and
  
  instructions for adding said text rendition of the new word and the representative sequence of sub-word units to said word dictionary.

53. Processor implementable instructions for controlling a processor to carry out a method of generating a sequence of sub-word units representative of a new word to be added to a dictionary of a speech recognition system, the processor instructions comprising:
- instructions for receiving signals representative of first and second spoken renditions of the new word;
  
  instructions for matching the received first and second spoken renditions with pre-stored sub-word unit models to generate a first sequence of sub-word units representative of said first spoken rendition of the new word and a second sequence of sub-word units representative of said second spoken rendition of the new word;
  
  instructions for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  instructions for a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set; and
  
  instructions for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing steps for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word.

54. Processor implementable instructions for controlling a processor to carry out a method of adding a new word and sub-word representation of the new word to a word dictionary of a speech recognition system, the processor instructions comprising:
- instructions for receiving a first sequence of sub-word units representative of a first spoken rendition of the new word and for receiving a second sequence of sub-word units representative of a second spoken rendition of the new word;
  
  instructions for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  instructions for a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing steps for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word; and
  
  instructions for adding the new word and the representative sequence of sub-word units to said word dictionary.

55. Processor implementable instructions for controlling a processor to carry out a speech recognition method, the processor instructions comprising:
- instructions for receiving speech signals to be recognised;
  
  instructions for storing sub-word unit models;
  
  instructions for matching received speech signals with the sub-word unit models to generate one or more sequences of sub-word units representative of the received speech signals;
  
  instructions for storing a word dictionary relating sequences of sub-word units to words;
  
  instructions for processing the one or more sequences of sub-word units output by said matching step using the stored word dictionary to generate one or more words corresponding to the received speech signals;
  
  instructions for adding a new word and a sub-word representation of the new word to the word dictionary; and
  
  instructions for controllably feeding the output of said matching step to either said processing step or said adding step;
  
  characterised in that said adding instructions comprise:
  
  instructions for receiving a first sequence of sub-word units representative of a first spoken rendition of the new word output by said matching step and for receiving a second sequence of sub-word units representative of a second spoken rendition of the new word output by said matching step;
  
  instructions for aligning sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  instructions for a first comparing step of comparing, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for a second comparing step of comparing, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  instructions for determining, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparing steps for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word;
  
  instructions for receiving a text rendition of the new word; and
  
  instructions for adding said text rendition of the new word and the representative sequence of sub-word units to said word dictionary.

56. An apparatus for generating a sequence of sub-word units representative of a new word to be added to a dictionary of a speech recognition system, the apparatus comprising:
- a receiver operable to receive signals representative of first and second spoken renditions of the new word;
  
  a speech recogniser operable to compare the received first and second spoken renditions with pre-stored sub-word unit models to generate first and second sequences of sub-word units representative of said first and second spoken renditions of the new word respectively;
  
  a sub-word unit aligner operable to align sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  a first comparator operable to compare, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  a second comparator operable to compare, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set; and
  
  a determiner operable to determine, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparators for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word.
- Dependent claims 57 to 76:
  - 57. An apparatus according to claim 56, wherein said determiner is operable to determine said sequence of sub-word units by determining, for each aligned pair of sub-word units, a sub-word unit that is confusingly similar to the first and second sub-word units of the aligned pair.
  - 58. An apparatus according to claim 56, further comprising:
    - a combined score generator operable to combine the comparison scores obtained when comparing the first and second sequence sub-word units in the aligned pair with the same sub-word unit from the set, to generate a plurality of combined comparison scores; and
      
      a third comparator operable to compare, for each aligned pair, the combined comparison scores generated by said combined score generator for the aligned pair, wherein said determiner is operable to determine, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon a comparison result output by said third comparator for the aligned pair.
  - 59. An apparatus according to claim 58, wherein said first and second comparators are operable to generate comparison scores which are indicative of a probability of confusing the corresponding sub-word unit taken from the set of predetermined sub-word units as the sub-word unit in the aligned pair.
  - 60. An apparatus according to claim 59, wherein said combined score generator is operable to combine the comparison scores in order to multiply the probabilities of confusing the corresponding sub-word unit taken from the set as the sub-word units in the aligned pair.
  - 61. An apparatus according to claim 60, wherein each of said sub-word units in said set of predetermined sub-word units has a predetermined probability of occurring within a sequence of sub-word units and wherein said combined score generator is operable to weight each of said combined comparison scores in dependence upon the respective probability of occurrence for the sub-word unit of the set used to generate the combined comparison score.
  - 62. An apparatus according to claim 61, wherein said combined score generator is operable to combine said comparison scores by calculating:
    - P(d¹_i|p_r) P(d²_j|p_r) P(p_r) where d¹_i and d²_j are an aligned pair of first and second sequence sub-word units respectively;
      
      P(d¹_i|p_r) is the comparison score generated by said first comparator and is representative of the probability of confusing set sub-word unit p_r as first sequence sub-word unit d¹_i;
      
      P(d²_j|p_r) is the comparison score generated by said second comparator and is representative of the probability of confusing set sub-word unit p_r as second sequence sub-word unit d²_j; and
      
      P(p_r) is a weight which represents the probability of set sub-word unit p_r occurring in a sequence of sub-word units.
  - 63. An apparatus according to claim 62, wherein said third comparator is operable to identify the set sub-word unit which gives the maximum combined comparison score and wherein said determiner is operable to determine said sub-word unit representative of the sub-word units in the aligned pair as being the sub-word unit which provides the maximum combined comparison score.
  - 64. An apparatus according to claim 60, wherein said comparison scores represent log probabilities and wherein said combined score generator is operable to multiply said probabilities by adding the respective comparison scores.
  - 65. An apparatus according to claim 58, wherein each of the sub-word units in said first and second sequences of sub-word units belong to said set of predetermined sub-word units and wherein said first and second comparators are operable to generate said comparison scores using predetermined data which relate the sub-word units in said set to each other.
  - 66. An apparatus according to claim 65, wherein said predetermined data comprises, for each sub-word unit in the set of sub-word units, a probability for confusing that sub-word unit with each of the other sub-word units in the set of sub-word units.
  - 67. An apparatus according to claim 56, wherein said first and second comparators are operable to compare the first sequence sub-word unit and the second sequence sub-word unit respectively with each of the sub-word units in said set of sub-word units.
  - 68. An apparatus according to claim 56, wherein said sub-word unit aligner comprises a dynamic programming unit operable to align said first and second sequences of sub-word units using a dynamic programming technique.
  - 69. An apparatus according to claim 68, wherein said dynamic programming unit is operable to determine an optimum alignment between said first and second sequences of sub-word units.
  - 70. An apparatus according to claim 56, wherein each of said sub-word units represents a phoneme.
  - 71. An apparatus according to claim 56, wherein said receiver is operable to receive signals representative of a third spoken rendition of the new word, wherein said speech recogniser is operable to compare the third rendition of the new word with said pre-stored sub-word unit models to generate a third sequence of sub-word units representative of said third rendition of the new word, wherein said sub-word unit aligner is operable to align simultaneously the sub-word units of the first, second and third sequences of sub-word units to generate a number of aligned groups of sub-word units, each aligned group comprising a sub-word unit from each of the renditions, and wherein said determiner is operable to determine said representative sequence of sub-word units in dependence upon the aligned groups of sub-word units.
  - 72. An apparatus according to claim 56, wherein said receiver is operable to receive signals representative of a third spoken rendition of the new word, wherein said speech recogniser is operable to compare the third rendition of the new word with said pre-stored sub-word unit models to generate a third sequence of sub-word units representative of said third rendition of the new word and wherein said sub-word unit aligner is operable to align two sequences of sub-word units at a time.
  - 73. An apparatus according to claim 56, wherein said receiver is operable to receive signals representative of a plurality of spoken renditions of the new word, wherein said speech recogniser is operable to compare the received spoken renditions with pre-stored sub-word unit models to generate a sequence of sub-word units for each of the plurality of spoken renditions, wherein said sub-word unit aligner is operable to align the sub-word units of the plurality of sequences of sub-word units to form a number of aligned groups of sub-word units, each group including a sub-word unit from each sequence;
    - wherein said determiner is operable to determine a sequence of sub-word units representative of the spoken renditions of the new word;
      
      wherein the apparatus further comprises (i) a comparator operable to compare each sequence of sub-word units with said representative sequence of sub-word units to determine a score representative of the similarity therebetween; and
      
      (ii) a cluster identifier operable to process the scores output by the comparator to identify clusters within the scores indicating one or more different pronunciations of the spoken rendition of the new word; and
      
      wherein said determiner is operable to determine a sequence of sub-word units representative of the spoken renditions of the new word within each cluster.
  - 74. An apparatus according to claim 73, wherein said comparator, cluster identifier and determiner operate iteratively until a predetermined convergence criterion is met.
  - 75. An apparatus according to claim 73, further comprising a sub-word unit combiner operable to combine the sequences of sub-word units for each of the clusters into a sub-word unit lattice.
  - 76. An apparatus according to claim 56, wherein said generated sequence of sub-word units is representative of a new command to be added to a command dictionary of said speech recognition system.
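
The combined score of claims 58 to 64 can be sketched as follows. For each aligned pair, every candidate unit p from the set is scored as P(d1|p) P(d2|p) P(p), computed in the log domain so that the multiplication becomes an addition, and the candidate with the maximum score is kept. The function names, the nested-dictionary layout of the confusion data, and the three-unit toy probabilities are assumptions made for illustration; they are not taken from the patent. The sketch also assumes all probabilities are non-zero.

```python
import math
from typing import Dict, List, Tuple

# confusion[p][d] is taken to mean P(d | p): the probability that set unit p
# is decoded as unit d. This layout is a hypothetical choice.
Confusion = Dict[str, Dict[str, float]]


def representative_unit(d1: str, d2: str,
                        confusion: Confusion,
                        prior: Dict[str, float]) -> str:
    """Return the set unit p maximising log P(d1|p) + log P(d2|p) + log P(p)."""
    best_unit, best_score = "", -math.inf
    for p, p_prior in prior.items():
        score = (math.log(confusion[p][d1])
                 + math.log(confusion[p][d2])
                 + math.log(p_prior))
        if score > best_score:
            best_unit, best_score = p, score
    return best_unit


def representative_sequence(pairs: List[Tuple[str, str]],
                            confusion: Confusion,
                            prior: Dict[str, float]) -> List[str]:
    """Pick one representative unit for each aligned pair."""
    return [representative_unit(d1, d2, confusion, prior) for d1, d2 in pairs]


# Toy data, invented for the example.
CONFUSION: Confusion = {
    "m": {"m": 0.7, "n": 0.2, "b": 0.1},
    "n": {"m": 0.3, "n": 0.6, "b": 0.1},
    "b": {"m": 0.1, "n": 0.1, "b": 0.8},
}
PRIOR = {"m": 0.4, "n": 0.4, "b": 0.2}

if __name__ == "__main__":
    pairs = [("m", "n"), ("b", "b"), ("m", "m")]
    print(representative_sequence(pairs, CONFUSION, PRIOR))
```

With this toy data the pair ("m", "n") resolves to "n", because 0.3 × 0.6 × 0.4 = 0.072 exceeds the 0.7 × 0.2 × 0.4 = 0.056 obtained for "m".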

77. An apparatus for adding a new word and sub-word representation of the new word to a word dictionary of a speech recognition system, the apparatus comprising:
- a receiver operable to receive a first sequence of sub-word units representative of a first spoken rendition of the new word and to receive a second sequence of sub-word units representative of a second spoken rendition of the new word;
  
  a sub-word unit aligner operable to align sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  a first comparator operable to compare, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  a second comparator operable to compare, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  a determiner operable to determine, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparators for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word; and
  
  an apparatus operable to add the new word and the representative sequence of sub-word units to said word dictionary.

78. A speech recognition system comprising:
- a first receiver operable to receive speech signals to be recognised;
  
  a store operable to store sub-word unit models;
  
  a comparator operable to compare received speech with the sub-word unit models to generate one or more sequences of sub-word units representative of the received speech signals;
  
  a word dictionary relating sequences of sub-word units to words;
  
  a word decoder operable to process the one or more sequences of sub-word units output by said comparator using the word dictionary to generate one or more words corresponding to the received speech signals;
  
  a first apparatus operable to add a new word and a sub-word representation of the new word to the word dictionary; and
  
  a switch operable to controllably connect the output of said comparator to either said word decoder or said apparatus for adding the new word and a sub-word representation of the new word to the word dictionary;
  
  characterised in that said apparatus for adding the new word and a sub-word representation of the new word to the word dictionary comprises:
  
  a second receiver operable to receive a first sequence of sub-word units representative of a first spoken rendition of the new word output by said comparator and to receive a second sequence of sub-word units representative of a second spoken rendition of the new word output by said comparator;
  
  a sub-word unit aligner operable to align sub-word units of the first sequence with sub-word units of the second sequence to form a number of aligned pairs of sub-word units;
  
  a first comparator operable to compare, for each aligned pair, the first sequence sub-word unit in the aligned pair with each of a plurality of sub-word units taken from a set of predetermined sub-word units, to generate a corresponding plurality of comparison scores representative of the similarities between the first sequence sub-word unit and the respective sub-word units of the set;
  
  a second comparator operable to compare, for each aligned pair, the second sequence sub-word unit in the aligned pair with each of said plurality of sub-word units from the set, to generate a further corresponding plurality of comparison scores representative of the similarities between said second sequence sub-word unit and the respective sub-word units of the set;
  
  a determiner operable to determine, for each aligned pair of sub-word units, a sub-word unit representative of the sub-word units in the aligned pair in dependence upon the comparison scores generated by said first and second comparators for the aligned pair, to determine a sequence of sub-word units representative of the spoken renditions of the new word;
  
  a third receiver operable to receive a text rendition of the new word; and
  
  a second apparatus operable to add said text rendition of the new word and the representative sequence of sub-word units to said word dictionary.
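
The switching arrangement of claim 78 can be pictured with a minimal sketch in which the recogniser output is routed either to dictionary lookup or to the word-adding path. The class `SpeechSystem`, its method names, the exact-match lookup standing in for the word decoder, and the injected `merge` callable (standing in for the aligner, comparators and determiner) are all hypothetical simplifications.

```python
from typing import Callable, Dict, List, Optional, Tuple

Units = Tuple[str, ...]


class SpeechSystem:
    """Routes sub-word unit sequences to decoding or to word adding."""

    def __init__(self, merge: Callable[[Units, Units], Units]) -> None:
        self.dictionary: Dict[Units, str] = {}  # unit sequence -> word text
        self.merge = merge                      # derives the representative sequence
        self.add_mode = False                   # plays the role of the switch
        self._renditions: List[Units] = []

    def on_units(self, units: Units) -> Optional[str]:
        """Handle one sequence of sub-word units from the recogniser."""
        if self.add_mode:
            # Switch set to the adding path: collect the rendition.
            self._renditions.append(units)
            return None
        # Switch set to the decoder: exact lookup stands in for decoding.
        return self.dictionary.get(units)

    def finish_add(self, text: str) -> Units:
        """Store the text rendition with the representative sequence."""
        if len(self._renditions) < 2:
            raise ValueError("two spoken renditions are required")
        first, second = self._renditions[:2]
        representative = self.merge(first, second)
        self.dictionary[representative] = text
        self._renditions.clear()
        self.add_mode = False
        return representative


if __name__ == "__main__":
    system = SpeechSystem(merge=lambda a, b: a)  # trivial stand-in merge
    system.add_mode = True
    system.on_units(("k", "ae", "t"))
    system.on_units(("k", "ae", "d"))
    system.finish_add("cat")
    print(system.on_units(("k", "ae", "t")))
```

Passing the merge logic in as a callable keeps the routing separate from the way the representative sequence is determined, which mirrors the way the claim separates the switch from the adding apparatus.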

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Charlesworth, Jason Peter Andrew; Rajan, Jebu Jacob
Primary Examiner(s): HARPER, V PAUL
Application Number: US09/985,543
Publication Number: US 20020120447A1
Time in Patent Office: 2,304 Days
Field of Search: 704/254, 704/256, 704/251, 704/244, 434/169
US Class Current: 704/254
CPC Class Codes:
G10L 15/06 Creation of reference templ...
G10L 2015/0631 Creating reference template...

1 Assignment

0 Petitions

167 Citations

78 Claims
