Small footprint language and vocabulary independent word recognizer using registration by word spelling
Abstract
A phoneticizer converts spelled words or names into one or an n-best number of phonetic transcriptions. The n-best transcriptions may be generated from a single transcription using a confusion matrix. These n-best transcriptions are then transformed into hybrid units. Preferably only the most frequently encountered units are stored as syllables, with the remainder being stored as smaller units such as demi-syllables or phonemes. Voice input is then used to rescore the n-best transcriptions and these are stored preferably as speaker-independent, similarity-based hybrid units concatenated into a string representing the spelled word.
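As an illustration of how n-best transcriptions might be derived from a single transcription using a confusion matrix, the following is a minimal sketch only. The matrix entries, the phoneme labels, and the function name n_best are hypothetical and do not come from the patent; the sketch enumerates every substitution combination, which is practical only for short words.

```python
import heapq
from itertools import product
from math import prod

# Hypothetical confusion matrix: each phoneme maps to the phonemes it may be
# confused with, and an assumed probability for each alternative.
CONFUSION = {
    "ae": {"ae": 0.7, "eh": 0.3},
    "t": {"t": 0.9, "d": 0.1},
}


def n_best(transcription, confusion, n):
    """Return the n highest-scoring variants of a single transcription.

    Phonemes absent from the matrix are kept unchanged with probability 1.0.
    Each result is a (score, phoneme_tuple) pair, best first.
    """
    options = [sorted(confusion.get(p, {p: 1.0}).items()) for p in transcription]
    candidates = []
    for combo in product(*options):
        phones = tuple(phone for phone, _ in combo)
        score = prod(prob for _, prob in combo)
        candidates.append((score, phones))
    return heapq.nlargest(n, candidates)


if __name__ == "__main__":
    for score, phones in n_best(["k", "ae", "t"], CONFUSION, 3):
        print(round(score, 3), " ".join(phones))
```

In a full system along the lines of the abstract, these variants would then be converted to hybrid units and rescored against voice input; the sketch stops at candidate generation.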
26 Claims
-
1. A speech recognizer having a lexicon updatable by spelled word input, comprising:
-
a phoneticizer for generating a first phonetic transcription of said spelled word input using probabilistic rules and a second phonetic transcription of said spelled word input using probabilistic rules;
a hybrid unit generator receptive of said first phonetic transcription and said second phonetic transcription for generating a first hybrid unit representation of said spelled word input and a second hybrid unit representation of said spelled word input;
a transcription selector that selects one of said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription; and
a word template constructor that generates for said word a sequence of symbols indicative of said selected hybrid unit representation for storing in said lexicon, wherein said hybrid unit generator has a dictionary of hybrid units selected to ensure that a class of larger sound units represents sounds in the lexicon that are more frequently used, and to ensure that a class of smaller sound units represent sounds in the lexicon that are less frequently used in comparison to the sounds that are more frequently used.
a decision tree data structure stored in said memory that defines a plurality of internal nodes and a plurality of leaf nodes, said internal nodes adapted for storing yes-no questions and said leaf nodes adapted for storing probability data;
a first plurality of said internal nodes being populated with letter questions about a given letter and its neighboring letters in said spelled word input;
a second plurality of said internal nodes being populated with phoneme questions about a phoneme and its neighboring phonemes in said spelled word input;
said leaf nodes being populated with probability data that associates said given letter with a plurality of phoneme pronunciations.
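The decision tree structure recited above (internal nodes holding yes-no questions about letters or phonemes, leaf nodes holding probability data for a given letter) could be represented roughly as follows. This is a sketch under stated assumptions: the class names, the context keys next_letter and prev_phoneme, the questions, and all probabilities are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    # Probability data associating the given letter with phoneme pronunciations.
    probs: Dict[str, float]


@dataclass
class Node:
    # A yes-no question about the letter context or the phoneme context.
    question: Callable[[dict], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def lookup(tree, context):
    """Walk from the root to a leaf, answering each question for this context."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(context) else node.no
    return node.probs


# Hypothetical mixed tree for the letter "c": one letter question at the root,
# one phoneme question below it. All probabilities are invented.
TREE_FOR_C = Node(
    question=lambda ctx: ctx["next_letter"] == "h",      # letter question
    yes=Leaf({"ch": 0.8, "k": 0.2}),
    no=Node(
        question=lambda ctx: ctx["prev_phoneme"] == "s",  # phoneme question
        yes=Leaf({"k": 0.9, "s": 0.1}),
        no=Leaf({"k": 0.6, "s": 0.4}),
    ),
)

if __name__ == "__main__":
    print(lookup(TREE_FOR_C, {"next_letter": "h", "prev_phoneme": None}))
    print(lookup(TREE_FOR_C, {"next_letter": "a", "prev_phoneme": "s"}))
```

Keeping letter questions and phoneme questions in the same node type is one simple way to model a mixed tree; a letter-only tree is the special case in which no node inspects the phoneme context.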
-
-
10. The speech recognizer of claim 1 further wherein said hybrid units are represented as similarity parameters.
-
11. The speech recognizer of claim 1 wherein said hybrid units are represented as phone similarity parameters based on an average similarity derived from a plurality of training examples.
-
12. The speech recognizer of claim 1 further comprising hybrid unit duration modification rules for expanding or compressing duration of selected hybrid units based on length of said spelled word.
-
13. The speech recognizer of claim 1 further comprising pattern matching mechanism for comparing a voiced input to said lexicon, said pattern matching mechanism having weighted mechanism for increasing the importance of selected portions of said hybrid units during pattern matching.
-
14. The speech recognizer of claim 1, wherein said transcription selector selects one of said first hybrid unit representation and said second hybrid unit representation based on voiced pronunciation by said user of the word corresponding to the spelled word input such that the voiced pronunciation of the word is sufficient to identify said selected hybrid unit representation.
-
15. The speech recognizer of claim 1, wherein said transcription selector has a rescoring mechanism that assigns new probability scores to said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription.
-
16. The speech recognizer of claim 15, wherein said rescoring selector employs mixed decision trees comprising questions based on letters and questions based on phonemes.
-
17. A speech recognizer having a lexicon user updateable by spelled word input, comprising:
-
a phoneticizer for generating a first phonetic transcription of said spelled word input using probabilistic rules and a second phonetic transcription of said spelled word input using probabilistic rules, and generating stress level indicators for different phonemes;
a hybrid unit generator receptive of said first phonetic transcription and said second phonetic transcription for generating a first hybrid unit representation and a second hybrid unit representation of said spelled word input based on a syllabic transcription of said first phonetic transcription and second phonetic transcription using said stress level indicators;
a transcription selector that selects one of said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription; and
a word template constructor that generates for said spelled word a sequence of symbols indicative of said selected hybrid unit representation for storing in said lexicon, wherein said hybrid unit generator has a dictionary of hybrid units selected to ensure that a class of larger sound units represent sounds in the lexicon that are more frequently used, and to ensure that a class of smaller sound units represent sounds in the lexicon that are less frequently used in comparison to the sounds that are more frequently used.
a decision tree data structure stored in said memory that defines a plurality of internal nodes and a plurality of leaf nodes, said internal nodes adapted for storing yes-no questions and said leaf nodes adapted for storing probability data;
a first plurality of said internal nodes being populated with letter questions about a given letter and its neighboring letters in said spelled word input;
a second plurality of said internal nodes being populated with phoneme questions about a phoneme and its neighboring phonemes in said spelled word input;
said leaf nodes being populated with probability data that associates said given letter with a plurality of phoneme pronunciations.
-
-
23. The speech recognizer of claim 17, wherein said transcription selector selects one of said first hybrid unit representation and said second hybrid unit representation based on voiced pronunciation by said user of the word corresponding to the spelled word input such that the voiced pronunciation of the word is sufficient to identify said selected hybrid unit representation.
-
24. The speech recognizer of claim 23, wherein said transcription selector has a rescoring mechanism that assigns new probability scores to said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription.
-
25. The speech recognizer of claim 24, wherein said rescoring selector employs mixed decision trees comprising questions based on letters and questions based on phonemes.
-
26. A speech recognizer having a lexicon user updateable by spelled word input, comprising:
-
a phoneticizer for generating a first phonetic transcription of said spelled word input;
a hybrid unit generator receptive of said phonetic transcription for generating at least one hybrid unit representation of said spelled word input based on said phonetic transcription; and
a word template constructor that generates for said spelled word a sequence of symbols indicative of said hybrid unit representation for storing in said lexicon, wherein said hybrid unit generator has a dictionary of hybrid units selected to ensure that a class of larger sound units represent sounds in the lexicon that are more frequently used, and to ensure that a class of smaller sound units represent sounds in the lexicon that are less frequently used in comparison to the sounds that are more frequently used.
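The frequency-based dictionary of hybrid units described in the claims (larger units for frequently used sounds, smaller units for less frequent ones) can be illustrated with the sketch below. It is an assumption-laden simplification: syllables are the only larger unit, phonemes are the only smaller unit (demi-syllables are omitted), and the function names, the toy lexicon, and the size limit are hypothetical.

```python
from collections import Counter


def build_unit_dictionary(syllabified_lexicon, max_syllable_units):
    """Keep only the most frequent syllables as whole (larger) units.

    syllabified_lexicon: list of words, each a list of syllables,
    each syllable a tuple of phoneme labels.
    """
    counts = Counter(syl for word in syllabified_lexicon for syl in word)
    return {syl for syl, _ in counts.most_common(max_syllable_units)}


def to_hybrid_units(syllables, syllable_units):
    """Convert one syllabified word into a sequence of hybrid unit symbols."""
    units = []
    for syl in syllables:
        if syl in syllable_units:
            units.append("-".join(syl))  # frequent: store as one syllable unit
        else:
            units.extend(syl)            # infrequent: fall back to phonemes
    return units


if __name__ == "__main__":
    # Hypothetical training lexicon, already split into syllables.
    lexicon = [
        [("t", "ah"), ("m", "ey"), ("t", "ow")],
        [("t", "ah"), ("d", "ey")],
        [("m", "ey"), ("b", "iy")],
    ]
    unit_dict = build_unit_dictionary(lexicon, max_syllable_units=2)
    print(to_hybrid_units([("t", "ah"), ("b", "iy")], unit_dict))
```

Capping the number of syllable units is one way to bound the size of the unit dictionary, in keeping with the small footprint named in the title; the actual selection criteria are those given in the specification.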
-
Specification