System and method of recognizing continuous Mandarin speech utilizing Chinese hidden Markov models
1 Assignment
0 Petitions
Abstract
A Mandarin speech input method for directly translating arbitrary sentences of Mandarin speech into corresponding Chinese characters. The present invention is capable of processing a sequence of "mono-syllables," "words" (where each of the characters in a poly-character word is continuous), "prosodic segments," or even a "whole sentence of continuous Mandarin speech." A prosodic segment, comprising one or more words, is a segment that a speaker automatically isolates by pausing; the characters within the prosodic segment are continuous.
111 Citations
35 Claims
-
1. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
-
acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising; employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to calculate probabilities of each of the mono-syllables in the Mandarin speech input for recognition; and linguistic decoding an output of the acoustic processing step, the linguistic decoding comprising; employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein;
the Mandarin speech input comprises continuous speech, and the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 30, 31
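The acoustic step of claim 1 scores each mono-syllable using both a base syllable model and a tone model. The following minimal Python sketch shows one way such scores could be combined. It assumes the two models are independent and simply adds their log-probabilities, which the claim does not state. The function name `rank_mono_syllables` and the toy probabilities are hypothetical.

```python
import math

def rank_mono_syllables(base_logp, tone_logp, top_n=3):
    """Rank toned mono-syllables by summed log-probability.

    base_logp: base syllable -> log-probability from a base syllable model.
    tone_logp: tone number   -> log-probability from a tone model.
    Assumption (not from the claim): the two scores are independent.
    """
    scored = [
        (b + t, f"{base}{tone}")
        for base, b in base_logp.items()
        for tone, t in tone_logp.items()
    ]
    scored.sort(reverse=True)  # highest combined score first
    return [syllable for _, syllable in scored[:top_n]]

# Hypothetical model outputs for one speech segment.
base_scores = {"ma": math.log(0.6), "na": math.log(0.4)}
tone_scores = {1: math.log(0.7), 3: math.log(0.3)}
candidates = rank_mono_syllables(base_scores, tone_scores)
```

The ranked candidates would then be passed on to the linguistic decoding step.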
-
-
2. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
-
acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising; employing "Sub-Syllable Unit Models", developed for characteristics of the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to directly compare with the Mandarin speech input and to locate corresponding mono-syllables from a resultant "sub-syllable unit string" and a "tone string" for recognition; and linguistic decoding an output of the acoustic processing, the linguistic decoding comprising; employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein; the Mandarin speech input comprises continuous speech, and the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
-
-
10. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
-
acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing comprising; employing "Sub-Syllable Unit Models", developed for characteristics of the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to directly compare with the Mandarin speech input and to locate corresponding mono-syllables from a resultant "sub-syllable unit string" and a "tone string" for recognition; and linguistic decoding an output of the acoustic processing, the linguistic decoding comprising; employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, comparing base syllable string candidates and tone string candidates provided by the acoustic processing with a built-in dictionary by means of a "character and word string formation means" to locate all possible homonym characters or homonym words formed therefrom for generating a word lattice, calculating probabilities for generation of sentences, each of the sentences composed of the words in the word lattice in accordance with the "Chinese Language Models" and linguistic knowledge, and outputting the sentence with the highest probability or score, wherein; the Mandarin speech input comprises continuous speech, and the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
Dependent claims: 11, 12, 13, 14, 15
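The lattice-building and sentence-scoring steps of claim 10 can be illustrated with a minimal Python sketch. It assumes a word-bigram language model searched by dynamic programming, which is only one possible reading of the "Chinese Language Models". The dictionary entries, the probabilities, and the names `build_lattice` and `best_sentence` are hypothetical.

```python
import math

# Hypothetical built-in dictionary: toned syllable sequence -> homonym words.
DICTIONARY = {
    ("zhong1",): ["中", "钟"],
    ("guo2",): ["国"],
    ("zhong1", "guo2"): ["中国"],
    ("ren2",): ["人", "仁"],
}

# Hypothetical bigram log-probabilities; unseen pairs fall back to FLOOR.
BIGRAM_LOGP = {
    ("<s>", "中国"): math.log(0.2),
    ("中国", "人"): math.log(0.3),
    ("<s>", "中"): math.log(0.05),
    ("中", "国"): math.log(0.1),
    ("国", "人"): math.log(0.05),
}
FLOOR = math.log(1e-4)

def build_lattice(syllables, max_len=4):
    """Return lattice edges (start, end, word) for every dictionary match."""
    edges = []
    for start in range(len(syllables)):
        limit = min(len(syllables), start + max_len)
        for end in range(start + 1, limit + 1):
            for word in DICTIONARY.get(tuple(syllables[start:end]), []):
                edges.append((start, end, word))
    return edges

def best_sentence(syllables):
    """Return the highest-scoring sentence in the lattice, or None."""
    edges = build_lattice(syllables)
    # best[(position, last word)] = (log score, words so far)
    best = {(0, "<s>"): (0.0, [])}
    for pos in range(len(syllables)):
        states = [(k, v) for k, v in best.items() if k[0] == pos]
        for (_, prev), (score, words) in states:
            for start, end, word in edges:
                if start != pos:
                    continue
                new_score = score + BIGRAM_LOGP.get((prev, word), FLOOR)
                key = (end, word)
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, words + [word])
    finals = [v for k, v in best.items() if k[0] == len(syllables)]
    if not finals:
        return None
    return "".join(max(finals, key=lambda v: v[0])[1])
```

With these toy values, `best_sentence(["zhong1", "guo2", "ren2"])` prefers the two-word path 中国 + 人 over the character-by-character paths.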
-
-
16. A Mandarin dictation machine for receiving Mandarin speech input, comprising:
-
filtering and analog-to-digital converting means for filtering and converting speech input signals into digital signals; a personal computer and add-in digital signal processor board for receiving and processing the digital signals provided by the analog-to-digital converting means; feature extracting means and pitch frequency detecting means, connected to the personal computer, for detecting and calculating pitch frequencies and other feature parameters of the digital signals provided by the personal computer; endpoint detection means and "Hidden Markov Models" processing means, in conjunction with Mixtures of Gaussian Probabilities processing means, for calculating endpoints of each speech segment of the Mandarin speech input and for recognizing base syllables and tones thereof; a set of "character-based", "word-based" or "word-class-based" Chinese Language Models, set up by calculating occurrence probabilities and including a linguistic knowledge, for calculating probabilities of each homonym character and words for syllables of the Mandarin input speech and for further forming word strings or sentences and providing recognized results to the personal computer; and training and learning means for training and learning probabilities of "Hidden Markov Models" for all "sub-syllable units", base syllables and tones and probabilities or knowledge of the "Chinese Language Models" and for providing the probabilities or the knowledge to the personal computer, wherein the Mandarin speech input comprises continuous speech, and wherein the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with the linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
Dependent claims: 17, 18, 19, 20
-
-
21. A method for training the Mandarin dictation machine to be adapted to a voice and an environment of a user, comprising using a plurality of learning algorithms comprising:
-
a first learning algorithm; a second learning algorithm; a third learning algorithm; and a fourth learning algorithm, wherein (1) the first learning algorithm is automatic learning of a user's voice through "learning sentences" arranged in a plurality of learning stages; (2) the second learning algorithm is automatic "on-line" real-time learning for the user's voice, and the second learning algorithm can be used in conjunction with the first learning algorithm; (3) the third learning algorithm is automatic learning for environmental noise; and (4) the fourth learning algorithm is automatic learning for special words, a wording and a sentence style of the user, wherein; input to the Mandarin dictation machine is in a form comprising continuous speech, the fourth learning algorithm dynamically adjusts statistical parameters and linguistic knowledge in "Chinese Language Models" and can add new words to a global dictionary, while the fourth learning algorithm stores the wording and idioms of the user or the special words which have a plurality of occurrences in a certain input text in a dynamic memory device which will be accessed in first priority, and the wording, the idioms or the special words are stored in different memory areas in accordance with occurrence frequencies of the wording, the idioms or the special words, the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with the linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
Dependent claims: 22, 23, 24, 25, 27
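The fourth learning algorithm of claim 21 keeps the user's recurring words in a dynamic store that is consulted first and is partitioned by occurrence frequency. The Python sketch below is a minimal illustration of that idea. The class `UserWordCache`, the two-area split ("hot" and "warm"), and the threshold value are assumptions made for illustration, not details given in the claim.

```python
from collections import Counter

class UserWordCache:
    """Hypothetical per-user word store consulted before the global dictionary."""

    def __init__(self, global_dictionary, hot_threshold=3):
        self.global_dictionary = set(global_dictionary)
        self.counts = Counter()
        self.hot_threshold = hot_threshold  # assumed cut-off between areas

    def observe(self, word):
        """Record one occurrence; unknown words join the global dictionary."""
        self.counts[word] += 1
        self.global_dictionary.add(word)

    def areas(self):
        """Split words seen more than once into 'hot' and 'warm' areas."""
        ordered = self.counts.most_common()
        hot = [w for w, c in ordered if c >= self.hot_threshold]
        warm = [w for w, c in ordered if 2 <= c < self.hot_threshold]
        return hot, warm

    def candidates(self, homonyms):
        """Order homonyms: hot area first, then warm, then all others."""
        hot, warm = self.areas()
        rank = {w: i for i, w in enumerate(hot + warm)}
        return sorted(homonyms, key=lambda w: rank.get(w, len(rank)))

cache = UserWordCache(["事实", "实施"])
for word in ["时事", "时事", "时事", "实施", "实施"]:
    cache.observe(word)
ordered = cache.candidates(["事实", "实施", "时事"])
```

Because `sorted` is stable, homonyms that the user has not repeated keep their original relative order behind the cached ones.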
-
-
26. A training method for training a Mandarin dictation machine to recognize a Mandarin speech input of a new user, comprising:
(1) training "Hidden Markov Models" of each "Sub-syllable Units" and "Tone Model" in Mandarin with voices from multiple speakers, wherein an amount of Mixtures of Gaussian Probabilities is required to describe each state since feature parameters of the multiple speakers are different, the training step further comprising; (i) inputting a training speech of the new user, the training speech comprising continuous Mandarin speech; (ii) setting up "Hidden Markov Models" of the new user, the setting up step comprising; obtaining a "sub-syllable unit" segment from the training speech of the new user; selecting a plurality of the Mixtures of Gaussian Probabilities, from a group of the Mixtures of Gaussian Probabilities in the "Hidden Markov Models" for the multiple speakers, and de-emphasizing other Mixtures of Gaussian Probabilities; (iii) generating a new "Hidden Markov Models", the generating step further comprising; continuously obtaining the "sub-syllable unit" segments of the new user, and averaging feature parameters of the continuously pronounced "sub-syllable unit" segments into the "Hidden Markov Models" of the new user, set up in step (ii), to calculate new Mixtures of Gaussian Probabilities; and (iv) repeating step (iii) to include more features of the new user in the "Hidden Markov Models" so that a "Hidden Markov Models" which can better describe a voice of the new user is thus generated.
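Steps (ii) and (iii) of claim 26 can be sketched for a single HMM state in Python. The sketch makes several assumptions the claim leaves open: mixtures are selected by Euclidean distance to the mean of the user's frames, "de-emphasizing" is modeled as scaling down mixture weights, and "averaging" is a running mean over the selected mixture means. Covariances are omitted, and the function names are hypothetical.

```python
def select_mixtures(means, weights, user_frames, keep=2, damp=0.1):
    """Step (ii): keep the mixtures closest to the user's frames, damp the rest.

    Returns renormalized weights and the set of kept mixture indices.
    """
    centroid = [sum(col) / len(user_frames) for col in zip(*user_frames)]
    dist = [sum((m - c) ** 2 for m, c in zip(mean, centroid)) for mean in means]
    kept = set(sorted(range(len(means)), key=dist.__getitem__)[:keep])
    scaled = [w if i in kept else w * damp for i, w in enumerate(weights)]
    total = sum(scaled)
    return [w / total for w in scaled], kept

def average_in(means, counts, kept, frame):
    """Step (iii): fold one new frame into the nearest kept mixture mean."""
    nearest = min(
        kept,
        key=lambda i: sum((m - x) ** 2 for m, x in zip(means[i], frame)),
    )
    counts[nearest] += 1
    n = counts[nearest]
    means[nearest] = [m + (x - m) / n for m, x in zip(means[nearest], frame)]
    return nearest

# Hypothetical multi-speaker state with three 2-D mixture means.
means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
weights = [0.2, 0.3, 0.5]
new_weights, kept = select_mixtures(means, weights, [[0.8, 1.0], [1.2, 1.0]])
counts = [1, 1, 1]
updated = average_in(means, counts, kept, [2.0, 2.0])
```

Calling `average_in` repeatedly on further frames from the new user corresponds to step (iv).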
-
28. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
-
inputting the Mandarin speech, the Mandarin speech being continuous speech; acoustic processing of the arbitrary sentences of the continuous Mandarin speech, the acoustic processing step comprising; employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to calculate probabilities of each of the mono-syllables in the Mandarin speech input for further recognition; and linguistic decoding an output of the acoustic processing, the linguistic decoding comprising; employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein words and sentences are formed by the "Base Syllable Models" for further recognition.
Dependent claims: 29
-
-
32. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
-
acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising; employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to calculate probabilities of each of the mono-syllables in the Mandarin speech input for recognition, detecting endpoints of the Mandarin speech input to locate a starting point and an ending point of the Mandarin speech input, recognizing base syllables and the tones of the Mandarin speech input by respectively comparing the "Base Syllable Models" or the "Sub-syllable Unit Models" with the Mandarin speech input to locate corresponding base syllables and to locate corresponding tones from the "Tone Models" for forming words and sentences, the recognizing the base syllables and the tones utilizing a "Pattern Matching Algorithm for Continuous Syllables" and a "Word-based Matching Algorithm for Syllables", and selecting base syllable strings and tone strings from possible base syllables and tones obtained from the recognizing as base syllable and tone string candidates and providing the base syllable and the tone string candidates for the linguistic decoding; and linguistic decoding an output of the acoustic processing step, the linguistic decoding comprising; employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein;
the Mandarin speech input comprises continuous speech, the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language, the "Word-based Matching Algorithm for Syllables" comprises; (1) setting up a "tree dictionary data structure" representing all words in a built-in dictionary of a computer in accordance with an order of base syllables, disregarding tones, or mono-syllables, with tones, (2) moving along the tree dictionary data structure to find a word, each node of the tree dictionary data structure representing a base syllable or a mono-syllable, and (3) considering in first priority base syllables or mono-syllables adjacent to each other in each word along paths in the tree dictionary data structure in accordance with probabilities of each base syllable or mono-syllable being adjacent to a preceding and a following base syllable or mono-syllable thereof so as to reduce a search space and improve a correct recognition rate, a linguistic knowledge included in the "Chinese Language Models" includes knowledge, rules and information, obtained from a linguistic analysis of parts-of-speech, syntax and semantics of Chinese, in combination with linguistic information obtained from an analysis of a Chinese text corpus, and word occurrence frequencies are used to locate the words in such a manner that frequently used words are considered before less frequently used words.
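The "tree dictionary data structure" in claim 32 resembles a trie keyed on syllables. The following Python sketch illustrates steps (1) and (2) and the frequency ordering, using base syllables without tones. It does not implement the adjacency-probability pruning of step (3). The class names, the example words, and their frequencies are hypothetical.

```python
class SyllableTrieNode:
    def __init__(self):
        self.children = {}  # base syllable -> child node
        self.words = []     # (frequency, word) pairs that end at this node

class SyllableTrie:
    """Tree dictionary in which each node represents one base syllable."""

    def __init__(self):
        self.root = SyllableTrieNode()

    def insert(self, syllables, word, frequency):
        node = self.root
        for syllable in syllables:
            node = node.children.setdefault(syllable, SyllableTrieNode())
        node.words.append((frequency, word))
        node.words.sort(reverse=True)  # most frequent word first

    def lookup(self, syllables):
        """Return the homonym words for an exact syllable sequence."""
        node = self.root
        for syllable in syllables:
            node = node.children.get(syllable)
            if node is None:
                return []
        return [word for _, word in node.words]

    def words_from(self, syllables, start=0):
        """Walk the tree from syllables[start]; return (end, word) matches."""
        node = self.root
        found = []
        for end in range(start, len(syllables)):
            node = node.children.get(syllables[end])
            if node is None:
                break
            found.extend((end + 1, word) for _, word in node.words)
        return found

trie = SyllableTrie()
trie.insert(["shi"], "是", 1000)
trie.insert(["shi", "shi"], "事实", 500)
trie.insert(["shi", "shi"], "实施", 800)
trie.insert(["shi", "shi"], "时事", 100)
```

A single walk down the tree returns every dictionary word that starts at a given position, so the search stops as soon as no word continues with the next syllable.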
-
-
33. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
-
acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising; employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech to calculate probabilities of each of the mono-syllables in the Mandarin speech input and each of a plurality of tones of the Mandarin speech input for further recognition; and linguistic decoding an output of the acoustic processing, the linguistic decoding comprising; employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein a sub-syllable unit of the sub-syllable units is a phoneme which is affected by a following phoneme, and the Mandarin speech input comprises continuous speech.
-
-
34. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
-
inputting the Mandarin speech, the Mandarin speech being continuous speech; acoustic processing of the arbitrary sentences of the continuous Mandarin speech, the acoustic processing step comprising; employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech to calculate probabilities of each of the mono-syllables in the Mandarin speech input and each of a plurality of tones of the Mandarin speech input for further recognition; and linguistic decoding an output of the acoustic processing, the linguistic decoding comprising; employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein each one of the "sub-syllable units" is formed from an "initial" affected by a starting phoneme of a following "final" thereof or an other "final" unaffected by a preceding and a following phoneme.
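Claims 33 and 34 describe sub-syllable units in which an "initial" depends on the first phoneme of the "final" that follows it, while the "final" is context-independent. A minimal Python sketch of such a decomposition follows. It assumes Pinyin spelling as the notation and approximates the "starting phoneme" by the first letter of the final. The unit naming scheme (`zh+o`) and the function name are hypothetical.

```python
# Pinyin initials, with two-letter initials listed first so they match first.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

def sub_syllable_units(base_syllable):
    """Split a toneless Pinyin syllable into sub-syllable units.

    Returns [context-dependent initial, context-independent final], or just
    [final] when the syllable has no initial from the list above.
    """
    for initial in INITIALS:
        if base_syllable.startswith(initial) and len(base_syllable) > len(initial):
            final = base_syllable[len(initial):]
            # The initial is tagged with the first letter of the following final.
            return [f"{initial}+{final[0]}", final]
    return [base_syllable]
```

For example, `sub_syllable_units("zhong")` gives `["zh+o", "ong"]`, so the "zh" in "zhong" and the "zh" in "zhi" would be modeled by different units.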
-
-
35. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
-
acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising; employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to calculate probabilities of each of the mono-syllables in the Mandarin speech input for recognition; and linguistic decoding an output of the acoustic processing step, the linguistic decoding comprising; employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables; classifying words into word classes, comprising; (1) classifying the words into a plurality of word groups, each of the word groups having common parts-of-speech, semantics and syntax in accordance with a linguistic knowledge, (2) dividing the words, which are classified into any of the word groups during step 1 into a plurality of word sub-groups with consistent statistical characteristics including statistical characteristics pertaining to preceding words, following words, and word-pairs that tend to be present in a same sentence, obtained from a Chinese text corpus, and (3) recombining the word sub-groups into a final word class, wherein;
the Mandarin speech input comprises continuous speech, and the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
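The three-step word classification in claim 35 can be sketched in Python as follows. The sketch rests on several assumptions: the linguistic grouping of step (1) is a part-of-speech tag, the statistical characteristic of step (2) is the most common part of speech of the following word, and the recombination of step (3) merges sub-groups smaller than `min_size` into one leftover class per part of speech. None of these specifics come from the claim, and the corpus and tags are toy data.

```python
from collections import Counter, defaultdict

def word_classes(corpus, pos_of, min_size=2):
    """Return {(pos, signature): set of words} as final word classes."""
    # Step (2) statistics: what tends to follow each word in the corpus.
    following = defaultdict(Counter)
    for sentence in corpus:
        for word, next_word in zip(sentence, sentence[1:]):
            following[word][pos_of[next_word]] += 1

    # Steps (1) and (2): group by part of speech, then split by signature.
    sub_groups = defaultdict(set)
    for word, pos in pos_of.items():
        stats = following[word]
        signature = stats.most_common(1)[0][0] if stats else None
        sub_groups[(pos, signature)].add(word)

    # Step (3): recombine undersized sub-groups into a leftover class.
    classes = defaultdict(set)
    for (pos, signature), words in sub_groups.items():
        key = (pos, signature) if len(words) >= min_size else (pos, "other")
        classes[key] |= words
    return dict(classes)

# Hypothetical tags and a three-sentence toy corpus.
POS = {"我": "PRON", "你": "PRON", "吃": "VERB", "喝": "VERB",
       "走": "VERB", "饭": "NOUN", "茶": "NOUN"}
CORPUS = [["我", "吃", "饭"], ["你", "喝", "茶"], ["我", "走"]]
classes = word_classes(CORPUS, POS)
```

Here the verbs that take a following noun (吃, 喝) end up in a different class from the verb that does not (走), even though all three share the same part of speech.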
-
Specification