System and method for generating and using context dependent sub-syllable models to recognize a tonal language
Abstract
A speech recognition system for Mandarin Chinese comprises a preprocessor, HMM storage, a speech identifier, and a speech determinator. The speech identifier includes pseudo initials that represent the glottal stops preceding syllables that consist of a lone final. The HMM storage stores context dependent models of the initials, finals, and pseudo initials that make up the syllables of Mandarin Chinese speech. The models may depend on the associated initial or final and on the tone of the syllable. The speech determinator joins initials to finals, and pseudo initials to finals, according to the syllables of the speech identifier. It then compares input signals of syllables to the joined models to determine the phonetic structure and the tone of each syllable. The system also includes a smoother that smooths the models to make recognition more robust. The smoother comprises a less detailed model (LDM) generator and a detailed model modifier. The LDM generator generates less detailed models from the detailed models, and the detailed model modifier smooths the detailed models with the less detailed models. A method for recognizing Mandarin Chinese speech includes the steps of arranging context dependent sub-syllable models; comparing an input signal to the arranged models; and selecting the arrangement of models that best matches the input signal to recognize the phonetic structure and tone of the input signal.
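As an illustration of the pseudo initials and context dependent models described in the abstract, the following minimal Python sketch splits a pinyin syllable into an initial and a final, substitutes a pseudo initial when the syllable is a lone final, and derives model names that depend on the neighboring sub-syllable and on the tone. The label `_GS`, the function names, and the naming scheme for the models are assumptions made for this example; the abstract does not specify them.

```python
from typing import Tuple

# The 21 standard pinyin initials. Two-letter initials come first so that
# "zh" is matched before "z". Spellings that begin with y or w are not
# normalized in this sketch and are treated as lone finals.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

# Hypothetical label for the pseudo initial that stands for the glottal
# stop preceding a syllable made of a lone final.
PSEUDO_INITIAL = "_GS"


def split_syllable(pinyin: str) -> Tuple[str, str]:
    """Return (initial, final); use the pseudo initial for a lone final."""
    for ini in INITIALS:
        if pinyin.startswith(ini) and len(pinyin) > len(ini):
            return ini, pinyin[len(ini):]
    return PSEUDO_INITIAL, pinyin


def model_names(pinyin: str, tone: int) -> Tuple[str, str]:
    """Build illustrative context dependent model names.

    The initial model is named after the final that follows it; the final
    model is named after its tone and the initial that precedes it.
    This naming scheme is an assumption, not the patent's notation.
    """
    ini, fin = split_syllable(pinyin)
    return f"{ini}+{fin}", f"{fin}{tone}-{ini}"


if __name__ == "__main__":
    for syllable, tone in [("zhang", 1), ("ma", 3), ("an", 4)]:
        print(syllable, tone, model_names(syllable, tone))
```

Under this scheme every syllable, including a lone final such as "an", is represented by exactly two sub-syllable models, which keeps the joining step uniform.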
26 Claims
1. A speech recognition system for recognizing syllables of a language, the syllables of the language each being formed from an initial sub-syllable and a final sub-syllable, the speech recognition system comprising:
- a speech identifier for storing a plurality of valid combinations of initial sub-syllables and final sub-syllables;
- a storage device for storing a plurality of initial sub-syllable models and final sub-syllable models; and
- a speech determinator for receiving:
  - an input signal to be recognized via a first input;
  - the plurality of valid combinations from the speech identifier via a second input; and
  - the plurality of models from the storage device via a third input;
wherein, after the speech determinator receives the input signal, the plurality of valid combinations and the plurality of models, the speech determinator creates appended models from the received plurality of models according to the received plurality of valid combinations, each appended model comprising a final sub-syllable model appended to the end of an initial sub-syllable model, compares the input signal to each appended model, and then generates and outputs a signal indicating one of the appended models that most closely matches the input signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method of recognizing an input signal including a syllable of a language, the syllable having an initial sub-syllable and a final sub-syllable, the method comprising the steps of:
- receiving the input signal;
- receiving a plurality of valid combinations of initial sub-syllables and final sub-syllables;
- receiving a plurality of initial sub-syllable models and final sub-syllable models;
- creating appended models from the received plurality of models according to the received plurality of valid combinations, each appended model comprising a final sub-syllable model appended to the end of an initial sub-syllable model;
- comparing the input signal to each appended model; and
- generating a signal indicating one of the appended models that most closely matches the input signal.
Dependent claims: 14, 15, 16, 17, 18, 19
20. A system for recognizing an input signal including a syllable of a language, the syllable having an initial sub-syllable and a final sub-syllable, the system comprising:
- means for receiving the input signal;
- means for receiving a plurality of valid combinations of initial sub-syllables and final sub-syllables;
- means for receiving a plurality of initial sub-syllable models and final sub-syllable models;
- means for creating appended models from the received plurality of models according to the received plurality of valid combinations, each appended model comprising a final sub-syllable model appended to the end of an initial sub-syllable model;
- means for comparing the input signal to each appended model; and
- means for generating a signal indicating one of the appended models that most closely matches the input signal.
Dependent claims: 21, 22, 23, 24, 25, 26
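The steps recited in the independent claims (receive the valid combinations and the sub-syllable models, append each final model to its initial model, compare the input to every appended model, and report the closest match) can be sketched in Python as follows. This is a toy illustration, not the patented implementation: each model is reduced to a list of per-state mean values over one-dimensional features, and a left-to-right alignment cost with squared error stands in for the HMM likelihood that the abstract implies. All function names and data below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for a sub-syllable HMM: one mean feature value per
# left-to-right state. Real models would carry distributions and
# transition probabilities.
Model = List[float]


def append_models(initial: Model, final: Model) -> Model:
    """Append the final sub-syllable model to the end of the initial model."""
    return initial + final


def alignment_cost(frames: List[float], model: Model) -> float:
    """Lowest squared-error cost of aligning frames to the model's states.

    The alignment starts in the first state, ends in the last state, and
    at each frame either stays in the current state or advances by one.
    """
    inf = float("inf")
    if not frames:
        return inf
    cost = [inf] * len(model)
    cost[0] = (frames[0] - model[0]) ** 2
    for x in frames[1:]:
        new_cost = [inf] * len(model)
        for s in range(len(model)):
            best_prev = cost[s] if s == 0 else min(cost[s], cost[s - 1])
            if best_prev < inf:
                new_cost[s] = best_prev + (x - model[s]) ** 2
        cost = new_cost
    return cost[-1]


def recognize(
    frames: List[float],
    valid_combinations: List[Tuple[str, str]],
    initial_models: Dict[str, Model],
    final_models: Dict[str, Model],
) -> Optional[Tuple[Tuple[str, str], float]]:
    """Return the (initial, final) pair whose appended model fits best."""
    best: Optional[Tuple[Tuple[str, str], float]] = None
    for ini, fin in valid_combinations:
        appended = append_models(initial_models[ini], final_models[fin])
        cost = alignment_cost(frames, appended)
        if best is None or cost < best[1]:
            best = ((ini, fin), cost)
    return best


if __name__ == "__main__":
    initials = {"m": [1.0], "b": [5.0]}
    finals = {"a": [2.0, 3.0], "o": [8.0, 9.0]}
    combos = [("m", "a"), ("b", "a"), ("b", "o")]
    print(recognize([1.1, 1.0, 2.1, 2.9, 3.0], combos, initials, finals))
```

Only the combinations listed as valid are ever assembled and scored, so invalid initial and final pairings cannot be returned as a result.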
Specification