System and method of intelligent Mandarin speech input for Chinese computers
First Claim
1. A Mandarin speech input method for directly translating a plurality of spoken words of Mandarin speech into corresponding Chinese characters, comprising steps of:
acoustic processing of the Mandarin speech, the acoustic processing step employing "Segmental Probability Models" to calculate probabilities of each of a plurality of mono-syllables in the Mandarin speech input and each of a plurality of tones thereof for further recognition; and
linguistic decoding of the plurality of mono-syllables recognized by the acoustic processing step, the linguistic decoding step employing "Word-class-based Markov Chinese Language Models" to locate the corresponding Chinese characters for a series of the plurality of mono-syllables.
Abstract
A Mandarin speech input system and method for directly translating arbitrary sentences of Mandarin speech into corresponding Chinese characters is disclosed. The system and method comprise an acoustic processing section and a linguistic decoding section. The acoustic processing step employs "Segmental Probability Models" to calculate the probabilities of each of the mono-syllables in the Mandarin speech input and each of the tones thereof for further recognition. The linguistic decoding step employs "Word-class-based Chinese Language Models" to locate the corresponding Chinese characters for the series of recognized syllables provided by said acoustic processing step. A Mandarin dictation machine translates the input speech into characters in accordance with the above method and displays the characters. The dictation machine is characterized by its "intelligence": it can "learn" if taught, through several "intelligent learning techniques". These include automatic learning of a new user's voice, so that the new user can start using the dictation machine quickly; automatic environmental noise learning, to adapt to the noise in the user's environment; and continuous on-line learning of the user's voice, special words, wording and sentence styles, to continuously improve the correct recognition rate.
19 Claims
1. A Mandarin speech input method for directly translating a plurality of spoken words of Mandarin speech into corresponding Chinese characters, comprising steps of:
acoustic processing of the Mandarin speech, the acoustic processing step employing "Segmental Probability Models" to calculate probabilities of each of a plurality of mono-syllables in the Mandarin speech input and each of a plurality of tones thereof for further recognition; and
linguistic decoding of the plurality of mono-syllables recognized by the acoustic processing step, the linguistic decoding step employing "Word-class-based Markov Chinese Language Models" to locate the corresponding Chinese characters for a series of the plurality of mono-syllables.
Dependent claims: 2, 3, 4, 5, 6, 7
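The linguistic decoding step of claim 1 can be illustrated with a minimal sketch. It assumes a generic class-based bigram model decoded with a Viterbi search, which is one common way to realize a word-class-based Markov language model; the claim text does not spell out the patent's actual formulation. The syllable lexicon, the class labels, all probabilities and the names `HOMOPHONES`, `CHAR_CLASS`, `P_CHAR_GIVEN_CLASS`, `P_CLASS_BIGRAM` and `decode` are hypothetical toy values chosen for illustration, and the acoustic step is assumed to have already produced one toneless syllable per character.

```python
import math

# Hypothetical toy lexicon: recognized syllable -> homophone character candidates.
HOMOPHONES = {"wo": ["我", "握"], "chi": ["吃", "痴"], "fan": ["饭", "犯"]}

# Hypothetical word class of each character.
CHAR_CLASS = {"我": "PRON", "握": "VERB", "吃": "VERB",
              "痴": "ADJ", "饭": "NOUN", "犯": "VERB"}

# Hypothetical P(character | class).
P_CHAR_GIVEN_CLASS = {"我": 0.5, "握": 0.05, "吃": 0.2,
                      "痴": 0.1, "饭": 0.1, "犯": 0.05}

# Hypothetical P(class | previous class); "<s>" marks the sentence start.
P_CLASS_BIGRAM = {("<s>", "PRON"): 0.5, ("<s>", "VERB"): 0.2,
                  ("PRON", "VERB"): 0.6, ("PRON", "ADJ"): 0.1,
                  ("VERB", "NOUN"): 0.5, ("VERB", "VERB"): 0.1,
                  ("VERB", "ADJ"): 0.1, ("ADJ", "NOUN"): 0.2,
                  ("ADJ", "VERB"): 0.1}
FLOOR = 1e-6  # probability assigned to class pairs missing from the table


def step_logp(prev_class, char):
    """Log-probability of emitting `char` after a character of `prev_class`."""
    cls = CHAR_CLASS[char]
    transition = P_CLASS_BIGRAM.get((prev_class, cls), FLOOR)
    return math.log(transition) + math.log(P_CHAR_GIVEN_CLASS[char])


def decode(syllables):
    """Viterbi search for the most probable character string."""
    # best maps the last character of a partial path to (log-prob, path).
    best = {"<s>": (0.0, "")}
    for syllable in syllables:
        nxt = {}
        for char in HOMOPHONES[syllable]:
            nxt[char] = max(
                (score + step_logp(CHAR_CLASS.get(prev, "<s>"), char), path + char)
                for prev, (score, path) in best.items()
            )
        best = nxt
    return max(best.values())[1]


print(decode(["wo", "chi", "fan"]))  # 我吃饭
```

With these toy tables the pronoun-verb-noun path outscores every other homophone combination. A fuller sketch would also weight each candidate by the syllable and tone probabilities produced by the acoustic processing step.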
8. A learning method of a Mandarin speech recognition system for quickly adapting to a voice of a new user to recognize a Mandarin speech input of the new user, the learning method training, in advance, each mono-syllable of a plurality of mono-syllables as "Segmental Probability Models" including feature parameters of each of the mono-syllables of the plurality of mono-syllables pronounced by different users, comprising:
training a plurality of pronunciations by many speakers with respect to one mono-syllable of the plurality of mono-syllables as the "Segmental Probability Models", in which a plurality of Mixtures of Gaussian Probabilities is required to describe each state of the one mono-syllable in consideration of different feature parameters of the many speakers;
pronouncing the one mono-syllable by the new user and establishing the "Segmental Probability Models" of the new user by selecting a plurality of Mixtures of Gaussian Probabilities having feature parameters close to the feature parameters of the new user from the plurality of Mixtures of Gaussian Probabilities under the "Segmental Probability Models" for the many speakers and by de-emphasizing other unnecessary Mixtures of Gaussian Probabilities;
calculating new Mixtures of Gaussian Probabilities and updating new "Segmental Probability Models" by averaging feature vectors of a plurality of segments of a new pronunciation of the one mono-syllable when the new user continuously pronounces the one mono-syllable; and
repeating the calculating step so that a ratio of the Mandarin speech of the new user in the new "Segmental Probability Models" will be gradually increased to result in the new "Segmental Probability Models" that can better describe the Mandarin speech of the new user.
Dependent claims: 9
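The adaptation steps of claim 8 can be sketched roughly as follows. This is an illustrative simplification, not the patent's procedure: each Gaussian mixture component is reduced to its mean vector, closeness is measured by squared Euclidean distance, de-emphasis is modeled as a small fixed weight, and the update is a plain running average. The function names and the parameters `keep`, `small` and `prior_count` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_mixtures(means, user_vec, keep=2, small=0.02):
    """Weight the `keep` (>= 1) components nearest the user, de-emphasize the rest.

    Returns one weight per component; the weights sum to 1.
    """
    order = sorted(range(len(means)), key=lambda i: sq_dist(means[i], user_vec))
    kept = set(order[:keep])
    n_other = len(means) - len(kept)
    big = (1.0 - small * n_other) / len(kept)
    return [big if i in kept else small for i in range(len(means))]


def adapt_mean(mean, prior_count, user_vecs):
    """Fold each new-user feature vector into the mean as a running average.

    `prior_count` is the assumed number of multi-speaker vectors behind `mean`.
    Returns the adapted mean and the share of new-user data in it.
    """
    count = prior_count
    n_user = 0
    for vec in user_vecs:
        mean = [(m * count + v) / (count + 1) for m, v in zip(mean, vec)]
        count += 1
        n_user += 1
    return mean, n_user / count


# Three multi-speaker component means for one state of one mono-syllable.
means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
weights = select_mixtures(means, [0.9, 1.1], keep=1)
adapted, share = adapt_mean(means[1], prior_count=2, user_vecs=[[0.9, 1.1]] * 2)
print(weights, adapted, share)
```

Each further pronunciation raises the returned share, which mirrors the claim's requirement that the ratio of the new user's speech in the model increases gradually with repetition.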
10. A Mandarin dictation machine for recognizing Mandarin speech, comprising:
an analog-to-digital converter with a filter for filtering and converting speech input signals into digital signals;
a computer coupled with a digital signal processing board for receiving and processing the digital signals provided by the analog-to-digital converter;
a pitch frequency detector;
a feature abstraction apparatus, the feature abstraction apparatus and the pitch frequency detector both being coupled to the computer for detecting and calculating a pitch frequency and other feature parameters of the digital signals received by the computer;
Segmental Probability Models processing means coupled with a Mixed Gaussian Probabilities processing means, after calculating an endpoint of each of a plurality of mono-syllables, for recognizing a basic mono-syllable of the plurality of mono-syllables and a tone thereof;
word-class-based Markov Chinese Language Models processing means, which calculates probabilities with characters, for calculating probabilities of all possible homonym characters of each of the plurality of mono-syllables input and transferring recognized results to the computer; and
training means for training first probabilities of all of the basic mono-syllables and tones under "Segmental Probability Models" and training second probabilities under "Word-class-based Markov Chinese Language Models", and for transferring both the first probabilities and the second probabilities to the computer.
Dependent claims: 11, 12, 13, 14, 15
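Claim 10 names a pitch frequency detector but does not say how the pitch is computed. As an illustration only, the sketch below uses time-domain autocorrelation, a common textbook technique that is assumed here rather than taken from the patent. The function name `estimate_pitch`, the search range `fmin` to `fmax` and the 8 kHz sample rate are hypothetical choices.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation.

    Returns None when no lag in the search range has positive correlation
    (for example, for silence).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None


# A 0.1 s synthetic 200 Hz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
print(estimate_pitch(tone, rate))
```

In a tone recognizer, a pitch track obtained this way over successive frames of a mono-syllable would be one plausible input for distinguishing the Mandarin tones, alongside the other feature parameters the claim mentions.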
16. A learning method for training a Mandarin dictation machine to be adaptive to a voice of a new user, comprising:
repetitively pronouncing a plurality of selected sentences that include all basic acoustic units of Mandarin speech including initials, finals and basic mono-syllables within a minimum number of possible characters such that frequently used basic acoustic units will occur frequently in the plurality of selected sentences, wherein the repetitive pronouncing step better trains "Segmental Probability Models" and trains the Mandarin dictation machine to be adaptive to pronunciations of the new user, the pronunciations of the new user being stored in the Mandarin dictation machine.
Dependent claims: 17, 18, 19
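Choosing training sentences that cover every basic acoustic unit in few characters, as claim 16 requires, resembles a weighted set-cover problem. The sketch below uses a greedy heuristic that repeatedly picks the sentence covering the most still-missing units per character. This is an assumed approach for illustration, not the patent's selection procedure; greedy selection does not guarantee the true minimum, and the claim's preference for frequently used units is not modeled. The corpus, the unit inventory and the name `select_sentences` are hypothetical, and each syllable is treated as one character.

```python
def select_sentences(sentences, required_units):
    """Greedily pick sentences until every required unit is covered.

    `sentences` is a list of tuples of units (one unit per character).
    Returns the chosen sentences and any units that never occur in the corpus.
    """
    uncovered = set(required_units)
    remaining = [s for s in sentences if s]  # ignore empty sentences
    chosen = []
    while uncovered and remaining:
        # Newly covered units per character spoken.
        best = max(remaining, key=lambda s: len(uncovered & set(s)) / len(s))
        if not uncovered & set(best):
            break  # nothing left in the corpus covers the missing units
        chosen.append(best)
        uncovered -= set(best)
        remaining.remove(best)
    return chosen, uncovered


corpus = [("wo", "ai", "ni"), ("ni", "hao"), ("wo", "hao"), ("ai",)]
chosen, missing = select_sentences(corpus, {"wo", "ai", "ni", "hao"})
print(chosen, missing)
```

Here two sentences totalling five characters cover all four units. Weighting each unit by its frequency in ordinary text would be one way to bias the selection toward frequently used units.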
Specification