System and method of recognizing continuous Mandarin speech utilizing Chinese hidden Markov models

US 6,067,520 A
Filed: 12/29/1995
Issued: 05/23/2000
Est. Priority Date: 12/29/1995
Status: Expired due to Term

Abstract

A Mandarin speech input method for directly translating arbitrary sentences of Mandarin speech into corresponding Chinese characters. The present invention is capable of processing a sequence of "mono-syllables," "words" (where the characters within a poly-character word are spoken continuously), "prosodic segments," or even a "whole sentence of continuous Mandarin speech." A prosodic segment, comprising one or more words, is a segment that the speaker naturally sets off by pausing; the characters within a prosodic segment are spoken continuously.

111 Citations

35 Claims

1. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
- acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising;
  
  employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to calculate probabilities of each of the mono-syllables in the Mandarin speech input for recognition; and
  
  linguistic decoding an output of the acoustic processing step, the linguistic decoding comprising;
  
  employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein;
  
  the Mandarin speech input comprises continuous speech, and the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
- View Dependent Claims (3, 4, 5, 6, 7, 8, 9, 30, 31)
- - 3. A method as claimed in one of claims 1 and 2, wherein the acoustic processing step comprises:
    - (1) detecting endpoints of the Mandarin speech input to locate a starting point and an ending point of the Mandarin speech input;
      
      (2) recognizing base syllables and the tones of the Mandarin speech input by respectively comparing the "Base Syllable Models" or the "Sub-syllable Unit Models" with the Mandarin speech input to locate corresponding base syllables and to locate corresponding tones from the "Tone Models" for forming words and sentences; and
      
      (3) selecting base syllable strings and tone strings from possible base syllables and tones obtained from the recognizing step as base syllable and tone string candidates and providing the base syllable and the tone string candidates for the linguistic decoding step.
  - 4. A method as claimed in claim 3, wherein the step of recognizing the base syllables and the tones utilizes a "Pattern Matching Algorithm for Continuous Syllables" and a "Word-based Matching Algorithm for Syllables".
  - 5. A method as claimed in claim 4, wherein the "Word-based Matching Algorithm for Syllables" comprises:
    - (1) setting up a "tree dictionary data structure" representing all words in a built-in dictionary of a computer in accordance with an order of base syllables, disregarding tones, or mono-syllables, with tones;
      
      (2) moving along the tree dictionary data structure to find a word, each node of the tree dictionary data structure representing a base syllable or a mono-syllable; and
      
      (3) considering in first priority base syllables or mono-syllables adjacent to each other in each word along paths in the tree dictionary data structure in accordance with probabilities of each base syllable or mono-syllable being adjacent to a preceding and a following base syllable or mono-syllable thereof so as to reduce a search space and improve a correct recognition rate.
  - 6. A method as claimed in claim 5, wherein word occurrence frequencies are used to locate the words in such a manner that frequently used words are considered before less frequently used words.
  - 7. A method as claimed in claim 4, wherein the "Pattern Matching Algorithm for Continuous Syllables" comprises:
    - (1) locating starting points and ending points of each possible syllable by means of an instantaneous energy and a range of syllable duration of an input speech segment;
      
      (2) comparing each of the possible mono-syllables between each pair of the starting points and the ending points with the "Sub-syllable Unit Models" or the "Base Syllable Models" and the "Tone Models";
      
      (3) calculating and accumulating a score after the comparing step with respect to each of the possible mono-syllables between each pair of the starting points and the ending points from a beginning to an end of a whole speech utterance in accordance with "Dynamic Programming" to locate a possible combination of "base syllable strings" and "tone strings" of the whole speech utterance; and
      
      (4) outputting the "base syllable strings" and the "tone strings" with the highest scores.
  - 8. A method as claimed in claim 3, wherein the "sub-syllable Unit Models" and "tone models", for tone recognition, are the "Hidden Markov Models" trained by interpolation training, the interpolation training comprising:
    - first stage training of models;
      
      performing second stage training by processing an output of the first stage training to produce required models; and
      
      interpolating the models generated from each recursive training iteration during the second stage training step with the models generated from the first stage training step for utilizing a precision of the models generated during the first stage training step.
  - 9. A method as claimed in one of claims 1 and 2, further comprising:
    - establishing a set of "tone models" for tone recognition of continuous Mandarin speech in which a tone is affected by a preceding and a following tone, wherein the "tone models" judge a feature of each tone affected by a corresponding preceding tone and a corresponding following tone to combine the feature with other features of tone in order to reduce a number of all 175 "tone models" while the tones can be fully recognized.
  - 30. A method as claimed in claim 1 or 2, wherein the "word class" includes a plurality of words comprising a same ending character, a same beginning character, or common syntactic characteristics, semantic characteristics and statistical characteristics.
  - 31. A method as claimed in claim 30, further comprising:
    - classifying words into word classes, comprising;
      
      (1) classifying the words into a plurality of word groups, each of the word groups having common parts-of-speech, semantics and syntax in accordance with a linguistic knowledge;
      
      (2) dividing the words, which are classified into any of the word groups during step 1, into a plurality of word sub-groups with consistent statistical characteristics including statistical characteristics pertaining to preceding words, following words, and word-pairs that tend to be present in a same sentence, obtained from a Chinese text corpus; and
      
      (3) recombining the word sub-groups into a final word class.
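
The dynamic-programming search recited in claim 7 can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: `best_segmentation`, `score_segment`, and the toy score table are hypothetical names and made-up values. `score_segment` stands in for the comparison of a speech segment against the syllable and tone models, and the frame-length limits stand in for the "range of syllable duration".

```python
from typing import Callable, Dict, List, Tuple

def best_segmentation(
    boundaries: List[int],
    score_segment: Callable[[int, int], Tuple[float, str]],
    min_len: int,
    max_len: int,
) -> Tuple[float, List[str]]:
    """Return (best total log-score, syllable labels) covering the utterance.

    boundaries: sorted candidate frame indices; the first is the utterance
    start and the last is the utterance end.
    score_segment(s, e): hypothetical scorer returning the log-score and label
    of the best syllable-plus-tone model for frames s..e.
    """
    # best[b] holds the best accumulated score and label path ending at b.
    best: Dict[int, Tuple[float, List[str]]] = {boundaries[0]: (0.0, [])}
    for end in boundaries[1:]:
        for start in boundaries:
            if start >= end or start not in best:
                continue
            if not (min_len <= end - start <= max_len):
                continue  # outside the allowed syllable duration
            seg_score, label = score_segment(start, end)
            total = best[start][0] + seg_score
            if end not in best or total > best[end][0]:
                best[end] = (total, best[start][1] + [label])
    return best.get(boundaries[-1], (float("-inf"), []))

# Toy example with made-up log-scores for each candidate segment.
TOY_SCORES = {
    (0, 10): (-1.0, "ni3"),
    (10, 20): (-1.0, "hao3"),
    (20, 30): (-1.0, "ma5"),
    (0, 20): (-5.0, "niao3"),
    (10, 30): (-3.0, "hua1"),
}
toy_score, toy_labels = best_segmentation(
    [0, 10, 20, 30], lambda s, e: TOY_SCORES[(s, e)], 5, 25
)
```

With these toy values the three-syllable path wins because its accumulated score (-3.0) beats every path that uses a longer, poorly matching segment.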

2. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
- acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising;
  
  employing "Sub-Syllable Unit Models", developed for characteristics of the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to directly compare with the Mandarin speech input and to locate corresponding mono-syllables from a resultant "sub-syllable unit string" and a "tone string" for recognition; and
  
  linguistic decoding an output of the acoustic processing, the linguistic decoding comprising;
  
  employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein;
  
  the Mandarin speech input comprises continuous speech, and the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.

10. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
- acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing comprising;
  
  employing "Sub-Syllable Unit Models", developed for characteristics of the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to directly compare with the Mandarin speech input and to locate corresponding mono-syllables from a resultant "sub-syllable unit string" and a "tone string" for recognition; and
  
  linguistic decoding an output of the acoustic processing, the linguistic decoding comprising;
  
  employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables,
  
  comparing base syllable string candidates and tone string candidates provided by the acoustic processing with a built-in dictionary by means of a "character and word string formation means" to locate all possible homonym characters or homonym words formed therefrom for generating a word lattice,
  
  calculating probabilities for generation of sentences, each of the sentences composed of the words in the word lattice in accordance with the "Chinese Language Models" and linguistic knowledge, and
  
  outputting the sentence with the highest probability or score, wherein;
  
  the Mandarin speech input comprises continuous speech, and the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
- View Dependent Claims (11, 12, 13, 14, 15)
- - 11. A method as claimed in claim 10, wherein the probabilities for generation of the sentences, each composed of words in word family candidates, are calculated in accordance with the "Chinese Language Models", which include occurrence probabilities of a "character", a "word", a "word class", two "characters", "words" or "word classes" adjacent to each other, three "characters", "words" or "word classes" adjacent to each other, and a plurality of "characters", "words" or "word classes" present in a same sentence.
  - 12. A method as claimed in claim 11, further comprising:
    - classifying words into the word classes, comprising;
      
      (1) classifying the words into a plurality of word groups, each of the word groups having common parts-of-speech, semantics and syntax in accordance with a linguistic knowledge;
      
      (2) dividing the words, which are classified into any of the word groups during step 1 into a plurality of word sub-groups with consistent statistical characteristics including statistical characteristics pertaining to preceding words, following words, and word-pairs that tend to be present in a same sentence, obtained from a Chinese text corpus; and
      
      (3) recombining the word sub-groups into a final word class.
  - 13. A method as claimed in claim 10, wherein each of the mono-syllables provided by the acoustic processing step has a score corresponding to the recognition in the acoustic processing step, and wherein a character or a word composed of the ones of the mono-syllables with first scores shall be considered before another character or word composed of other ones of the mono-syllables with second scores, wherein the second scores are less than the first scores.
  - 14. A method as claimed in claim 10, wherein the "Chinese Language Models" further calculate occurrence probabilities of one mono-syllable, two mono-syllables adjacent to each other, and three mono-syllables adjacent to each other.
  - 15. A method as claimed in claim 10, wherein the "Chinese Language Models" are capable of correcting errors generated by the acoustic processing step.
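
The linguistic decoding of claim 10 (form a word lattice from homonym candidates, score sentences, output the best one) can be sketched as follows. This is a simplified illustration, not the claimed method: it assumes a single recognized syllable string, a word-bigram model in place of the combined "Chinese Language Models", and a hypothetical `decode` function with made-up lexicon entries and probabilities.

```python
import math
from typing import Dict, List, Tuple

def decode(
    syllables: List[str],
    lexicon: Dict[Tuple[str, ...], List[str]],
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> List[str]:
    """Return the word sequence with the highest bigram probability.

    lexicon maps a syllable sequence to its homonym words; bigram maps
    (previous word, word) to a probability; unseen pairs get `floor`.
    """
    n = len(syllables)
    max_word_len = max(len(key) for key in lexicon)
    # best[i] maps the last word ending at syllable i to (log-prob, path).
    best: List[Dict[str, Tuple[float, List[str]]]] = [{} for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            # Every homonym spanning start..end is one arc of the lattice.
            for word in lexicon.get(tuple(syllables[start:end]), []):
                for prev, (log_prob, path) in best[start].items():
                    cand = log_prob + math.log(bigram.get((prev, word), floor))
                    if word not in best[end] or cand > best[end][word][0]:
                        best[end][word] = (cand, path + [word])
    if not best[n]:
        return []  # no dictionary path covers the whole syllable string
    return max(best[n].values(), key=lambda item: item[0])[1]

# Toy lexicon and made-up bigram probabilities.
TOY_LEXICON = {
    ("zhong1",): ["中", "終"],
    ("guo2",): ["國"],
    ("ren2",): ["人", "仁"],
    ("zhong1", "guo2"): ["中國"],
}
TOY_BIGRAM = {
    ("<s>", "中國"): 0.5,
    ("中國", "人"): 0.4,
    ("<s>", "中"): 0.1,
    ("中", "國"): 0.1,
    ("國", "人"): 0.1,
}
toy_sentence = decode(["zhong1", "guo2", "ren2"], TOY_LEXICON, TOY_BIGRAM)
```

Keeping one best path per final word at each position is what lets the bigram score depend on the preceding word without enumerating every sentence in the lattice.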

16. A Mandarin dictation machine for receiving Mandarin speech input, comprising:
- filtering and analog-to-digital converting means for filtering and converting speech input signals into digital signals;
  
  a personal computer and add-in digital signal processor board for receiving and processing the digital signals provided by the analog-to-digital converting means;
  
  feature extracting means and pitch frequency detecting means, connected to the personal computer, for detecting and calculating pitch frequencies and other feature parameters of the digital signals provided by the personal computer;
  
  endpoint detection means and "Hidden Markov Models" processing means, in conjunction with Mixtures of Gaussian Probabilities processing means, for calculating endpoints of each speech segment of the Mandarin speech input and for recognizing base syllables and tones thereof;
  
  a set of "character-based", "word-based" or "word-class-based" Chinese Language Models, set up by calculating occurrence probabilities and including a linguistic knowledge, for calculating probabilities of each homonym character and words for syllables of the Mandarin input speech and for further forming word strings or sentences and providing recognized results to the personal computer; and
  
  training and learning means for training and learning probabilities of "Hidden Markov Models" for all "sub-syllable units", base syllables and tones and probabilities or knowledge of the "Chinese Language Models" and for providing the probabilities or the knowledge to the personal computer, wherein the Mandarin speech input comprises continuous speech, wherein the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with the linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
- View Dependent Claims (17, 18, 19, 20)
- - 17. A Mandarin dictation machine as claimed in claim 16, wherein the Mandarin speech input uses speech segments including a mono-syllable, a word, a prosodic segment, or a whole sentence as an input unit.
  - 18. A Mandarin dictation machine as claimed in claim 16, further comprising a display screen for displaying input phonetic symbols and Chinese characters;
    - and a correction software provided for a user to directly correct errors on the display screen by means of a mouse.
  - 19. A Mandarin dictation machine as claimed in claim 16, further comprising:
    - a dynamic memory device for storing a wording and idioms of the user or special words which have a plurality of occurrences in certain input texts, wherein the wording, the idioms or the special words are stored in different memory areas in accordance with occurrence frequencies of the wording, the idioms or the special words which can be included in a global dictionary and Chinese Language Models along with corresponding messages and which can be deleted after use.
  - 20. A Mandarin dictation machine as claimed in claim 16, further comprising:
    - a first memory device for storing a first group of words; and
      
      a second memory device for storing a second group of words, wherein during operation the Mandarin dictation machine will first search in the first memory device for the first group of words and will then search in the second memory device for the second group of words if required words cannot be found in the first memory device, the found words of the second group of words will be moved to the first memory device, wherein some of the words stored in the first memory device are moved to the second memory device if the some of the words stored in the first memory device are used less than a given amount over time.
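
The two-level word storage of claim 20 behaves like a small promote-and-demote cache. The sketch below is only an illustration under assumptions: the class `TwoTierDictionary`, its method names, and the `min_uses` threshold (standing in for "a given amount over time") are hypothetical and do not come from the patent.

```python
from typing import Dict, Optional

class TwoTierDictionary:
    """First (fast) and second (slow) word stores with promotion and demotion."""

    def __init__(self, first: Dict[str, str], second: Dict[str, str], min_uses: int = 1):
        self.first = dict(first)
        self.second = dict(second)
        self.min_uses = min_uses
        # Number of look-ups per first-store entry since the last demotion pass.
        self.uses = {key: 0 for key in self.first}

    def lookup(self, key: str) -> Optional[str]:
        """Search the first store, then the second; promote second-store hits."""
        if key in self.first:
            self.uses[key] += 1
            return self.first[key]
        if key in self.second:
            self.first[key] = self.second.pop(key)  # move to the first store
            self.uses[key] = 1
            return self.first[key]
        return None

    def demote_rarely_used(self) -> None:
        """Move first-store entries used fewer than min_uses times to the second store."""
        for key in list(self.first):
            if self.uses[key] < self.min_uses:
                self.second[key] = self.first.pop(key)
                del self.uses[key]
            else:
                self.uses[key] = 0  # start a new counting period

# Toy usage: one word starts in each store.
store = TwoTierDictionary({"ni3 hao3": "你好"}, {"dian4 nao3": "電腦"})
found = store.lookup("dian4 nao3")   # found in the second store, then promoted
store.demote_rarely_used()           # "ni3 hao3" was never used, so it is demoted
```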

21. A method for training the Mandarin dictation machine to be adapted to a voice and an environment of a user, comprising using a plurality of learning algorithms comprising:
- a first learning algorithm;
  
  a second learning algorithm;
  
  a third learning algorithm; and
  
  a fourth learning algorithm, wherein (1) the first learning algorithm is automatic learning of a user's voice through "learning sentences" arranged in a plurality of learning stages;
  
  (2) the second learning algorithm is automatic "on-line" real-time learning for the user's voice, and the second learning algorithm can be used in conjunction with the first learning algorithm;
  
  (3) the third learning algorithm is automatic learning for environmental noise; and
  
  (4) the fourth learning algorithm is automatic learning for special words, a wording and a sentence style of the user, wherein;
  
  input to the Mandarin dictation machine is in a form comprising continuous speech,
  
  the fourth learning algorithm dynamically adjusts statistical parameters and linguistic knowledge in "Chinese Language Models" and can add new words to a global dictionary, while the fourth learning algorithm stores the wording and idioms of the user or the special words which have a plurality of occurrences in a certain input text in a dynamic memory device which will be accessed in first priority, and
  
  the wording, the idioms or the special words are stored in different memory areas in accordance with occurrence frequencies of the wording, the idioms or the special words,
  
  the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with the linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
- View Dependent Claims (22, 23, 24, 25, 27)
- - 22. A method as claimed in claim 21, wherein in each of the plurality of learning stages of the automatic learning algorithm a new user shall utter a set of specially-designed sentences which include all basic acoustic units of Mandarin speech, including a sub-syllable unit, a phoneme, an "initial", a "final", a mono-syllable, and a tone, in a number of sentences in which certain ones of acoustic units will be present at least a given number of times so that after several utterances the "Hidden Markov Models" can be trained and the Mandarin dictation machine will be adapted to pronouncing styles of the new user;
    - the pronouncing styles of the new user are recorded, the Mandarin dictation machine learning the pronouncing styles of the new user when the new user repeatedly utters the specially-designed sentences, wherein with a different emphasis of the basic acoustic units arranged in the "learning sentences" of each learning stage, a correct recognition rate for recognizing a voice of the new user can be improved in such a manner that a plurality of basic acoustic units are uttered through the number of sentences in the first learning stage;
      
      said Mandarin dictation machine will learn the voice of the new user and the correct recognition rate will be improved in successive learning stages.
  - 23. A method as claimed in claim 21, wherein the on-line real-time learning algorithm can be carried out during the learning stages or during use of the Mandarin dictation machine, wherein during the on-line real-time learning algorithm the user corrects erroneously recognized voices or texts generated by the Mandarin dictation machine on a real-time basis so that the dictation machine will learn correct voices and texts on the real-time basis and will store corresponding texts of the corrected voice.
  - 24. A method as claimed in claim 21, wherein the automatic learning algorithm for environmental noise is carried out in conjunction with learning algorithms for the voice of the user so that the environmental noise is also averaged into feature parameters of "Hidden Markov Models" for the Mandarin dictation machine to be adapted to the environmental noise.
  - 25. A method as claimed in claim 21, wherein the "learning sentences" for the Mandarin dictation machine are selected by a computer from a Chinese text corpus through a selection step, the selection step comprising:
    - giving different scores for all basic acoustic units in Mandarin;
      
      selecting a sentence in a high priority if the sentence is composed of basic acoustic units corresponding to a total score being greater than total scores of other sentences composed of basic acoustic units; and
      
      using a parameter, which describes an occurrence frequency distribution of each of the basic acoustic units in the selected learning sentences as compared to a specific distribution, as a criterion of the selection step.
  - 27. A training method as claimed in claim 23, wherein the training method further comprises using a computer to perform steps 1, including steps i-iv, of claim 26, the training method further comprising:
    - correcting, on an on-line real-time basis, errors generated while the computer executes a process to recognize the Mandarin speech input of the new user;
      
      providing results of the correcting step to a memory device;
      
      repeating the steps i-iv of claim 23 so that the Mandarin dictation machine can learn a new voice on a real-time basis; and
      
      using the new "Hidden Markov Models" when repeating the steps i-iv of claim 23 to continuously improve a correct recognition rate.
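
The learning-sentence selection of claim 25 (score the basic acoustic units, then prefer sentences with the highest total score) can be approximated by a greedy loop. The sketch is a simplification under assumptions: `select_sentences`, the unit scores, and `target_count` are hypothetical, and `target_count` is only a crude stand-in for the claim's comparison against "a specific distribution".

```python
from collections import Counter
from typing import Dict, List

def select_sentences(
    corpus: List[List[str]],
    unit_scores: Dict[str, float],
    target_count: int,
    how_many: int,
) -> List[int]:
    """Greedily pick sentence indices whose acoustic units have the highest total score.

    corpus: each sentence as a list of basic acoustic units.
    A unit already selected target_count times no longer adds to a sentence's score.
    """
    seen: Counter = Counter()
    remaining = set(range(len(corpus)))
    chosen: List[int] = []

    def gain(index: int) -> float:
        return sum(
            unit_scores.get(unit, 0.0)
            for unit in corpus[index]
            if seen[unit] < target_count
        )

    for _ in range(min(how_many, len(corpus))):
        # sorted() makes ties resolve to the lowest sentence index.
        pick = max(sorted(remaining), key=gain)
        chosen.append(pick)
        remaining.remove(pick)
        seen.update(corpus[pick])
    return chosen

# Toy corpus of three "sentences" with made-up unit scores.
TOY_CORPUS = [
    ["b", "a", "zh", "ong"],
    ["b", "a", "b", "a"],
    ["x", "ue", "sh", "i"],
]
TOY_UNIT_SCORES = {"b": 1.0, "a": 1.0, "zh": 3.0, "ong": 1.0,
                   "x": 2.0, "ue": 1.0, "sh": 1.0, "i": 1.0}
toy_choice = select_sentences(TOY_CORPUS, TOY_UNIT_SCORES, target_count=1, how_many=2)
```

In the toy run the second sentence is never chosen, because after the first pick its units are already covered and contribute no score.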

26. A training method for training a Mandarin dictation machine to recognize a Mandarin speech input of a new user, comprising:
- (1) training "Hidden Markov Models" of each "Sub-syllable Units" and "Tone Model" in Mandarin with voices from multiple speakers, wherein an amount of Mixtures of Gaussian Probabilities is required to describe each state since feature parameters of the multiple speakers are different, the training step further comprising;
  
  (i) inputting a training speech of the new user, the training speech comprising continuous Mandarin speech;
  
  (ii) setting up "Hidden Markov Models" of the new user, the setting up step comprising;
  
  obtaining a "sub-syllable unit" segment from the training speech of the new user;
  
  selecting a plurality of the Mixtures of Gaussian Probabilities, from a group of the Mixtures of Gaussian Probabilities in the "Hidden Markov Models" for the multiple speakers, and de-emphasizing other Mixtures of Gaussian Probabilities;
  
  (iii) generating a new "Hidden Markov Models", the generating step further comprising;
  
  continuously obtaining the "sub-syllable unit" segments of the new user, and averaging feature parameters of the continuously pronounced "sub-syllable unit" segments into the "Hidden Markov Models" of the new user, set up in step (ii), to calculate new Mixtures of Gaussian Probabilities; and
  
  (iv) repeating step (iii) to include more features of the new user in the "Hidden Markov Models" so that a "Hidden Markov Models" which can better describe a voice of the new user is thus generated.
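
Steps (ii) and (iii) of claim 26 can be pictured for a single HMM state with the sketch below. It makes several assumptions the claim does not state: mixtures are reduced to mean vectors and weights, "selecting" means keeping the mixtures nearest the new user's data, "de-emphasizing" means scaling the other weights by a hypothetical `damp` factor, and "averaging" is a running mean seeded by hypothetical prior counts.

```python
from typing import List, Tuple

Vector = List[float]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_mixtures(
    means: List[Vector], weights: List[float], frames: List[Vector],
    keep: int, damp: float = 0.1,
) -> Tuple[List[float], List[int]]:
    """Keep the `keep` mixtures nearest the user's frames; damp the other weights."""
    dim = len(frames[0])
    centre = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    order = sorted(range(len(means)), key=lambda k: sq_dist(means[k], centre))
    kept = set(order[:keep])
    scaled = [w if k in kept else w * damp for k, w in enumerate(weights)]
    total = sum(scaled)
    return [w / total for w in scaled], sorted(kept)

def adapt_means(
    means: List[Vector], counts: List[int], frames: List[Vector], kept: List[int],
) -> List[Vector]:
    """Average each new frame into the nearest kept mixture mean (running mean)."""
    for frame in frames:
        k = min(kept, key=lambda j: sq_dist(means[j], frame))
        counts[k] += 1
        means[k] = [m + (x - m) / counts[k] for m, x in zip(means[k], frame)]
    return means

# Toy one-dimensional state with three multi-speaker mixtures.
toy_means = [[0.0], [10.0], [20.0]]
toy_weights, toy_kept = select_mixtures(toy_means, [0.25, 0.25, 0.5], [[9.0], [11.0]], keep=1)
toy_means = adapt_means(toy_means, [4, 4, 4], [[9.0], [11.0]], toy_kept)
```

Calling `adapt_means` again with further user segments corresponds to the repetition in step (iv).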

28. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
- inputting the Mandarin speech, the Mandarin speech being continuous speech;
  
  acoustic processing of the arbitrary sentences of the continuous Mandarin speech, the acoustic processing step comprising;
  
  employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to calculate probabilities of each of the mono-syllables in the Mandarin speech input for further recognition; and
  
  linguistic decoding an output of the acoustic processing, the linguistic decoding comprising;
  
  employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein words and sentences are formed by the "Base Syllable Models" for further recognition.
- View Dependent Claims (29)
- - 29. A method as claimed in claim 28, wherein the "sub-syllable Unit Models" and "tone models", for tone recognition, are the "Hidden Markov Models" trained by interpolation training, the interpolation training comprising:
    - first stage training of models; and
      
      performing second stage training by processing an output of the first stage training to produce required models;
      
      interpolating the models generated from each recursive training iteration during the second stage training step with the models generated from the first stage training step for utilizing a precision of the models generated during the first stage training step.
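
The interpolation training of claims 8 and 29 can be sketched as a loop that blends each second-stage iteration with the first-stage model. The claims do not give a formula, so the linear blend, the weight `lam`, and the names `interpolated_training` and `refine` are assumptions made purely for illustration; models are reduced to flat parameter lists.

```python
from typing import Callable, List

def interpolated_training(
    stage1: List[float],
    refine: Callable[[List[float]], List[float]],
    iterations: int,
    lam: float = 0.7,
) -> List[float]:
    """Run second-stage training, interpolating every iteration with stage 1.

    stage1: parameters produced by first-stage training.
    refine: hypothetical function performing one second-stage re-estimation pass.
    lam: weight on the refined parameters; (1 - lam) goes to the stage-1 model.
    """
    model = list(stage1)
    for _ in range(iterations):
        updated = refine(model)
        # Blend the new estimate with the first-stage parameters.
        model = [lam * u + (1.0 - lam) * s for u, s in zip(updated, stage1)]
    return model

# Toy run: the "refinement" simply shifts every parameter by 1.
toy_model = interpolated_training([0.0, 1.0], lambda m: [x + 1.0 for x in m], 2, lam=0.5)
```

Because the first-stage parameters re-enter at every iteration, the second-stage model cannot drift arbitrarily far from them.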

32. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
- acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising;
  
  employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to calculate probabilities of each of the mono-syllables in the Mandarin speech input for recognition,
  
  detecting endpoints of the Mandarin speech input to locate a starting point and an ending point of the Mandarin speech input,
  
  recognizing base syllables and the tones of the Mandarin speech input by respectively comparing the "Base Syllable Models" or the "Sub-syllable Unit Models" with the Mandarin speech input to locate corresponding base syllables and to locate corresponding tones from the "Tone Models" for forming words and sentences, the recognizing the base syllables and the tones utilizing a "Pattern Matching Algorithm for Continuous Syllables" and a "Word-based Matching Algorithm for Syllables", and
  
  selecting base syllable strings and tone strings from possible base syllables and tones obtained from the recognizing as base syllable and tone string candidates and providing the base syllable and the tone string candidates for the linguistic decoding; and
  
  linguistic decoding an output of the acoustic processing step, the linguistic decoding comprising;
  
  employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein;
  
  the Mandarin speech input comprises continuous speech,
  
  the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language,
  
  the "Word-based Matching Algorithm for Syllables" comprises;
  
  (1) setting up a "tree dictionary data structure" representing all words in a built-in dictionary of a computer in accordance with an order of base syllables, disregarding tones, or mono-syllables, with tones,
  
  (2) moving along the tree dictionary data structure to find a word, each node of the tree dictionary data structure representing a base syllable or a mono-syllable, and
  
  (3) considering in first priority base syllables or mono-syllables adjacent to each other in each word along paths in the tree dictionary data structure in accordance with probabilities of each base syllable or mono-syllable being adjacent to a preceding and a following base syllable or mono-syllable thereof so as to reduce a search space and improve a correct recognition rate,
  
  a linguistic knowledge included in the "Chinese Language Models" includes knowledge, rules and information, obtained from a linguistic analysis of parts-of-speech, syntax and semantics of Chinese, in combination with linguistic information obtained from an analysis of a Chinese text corpus, and
  
  word occurrence frequencies are used to locate the words in such a manner that frequently used words are considered before less frequently used words.
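
The "tree dictionary data structure" recited in claims 5, 6 and 32 is, in effect, a trie keyed by syllables. The sketch below is a minimal illustration with hypothetical names (`Node`, `build_tree`, `words_from`) and made-up frequencies. It covers steps (1) and (2) and the frequency ordering, and it deliberately omits the adjacency-probability pruning of step (3).

```python
from typing import Dict, List, Tuple

class Node:
    """One node per base syllable (or toned mono-syllable) on a dictionary path."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.words: List[Tuple[float, str]] = []  # (frequency, word) ending here

def build_tree(entries: List[Tuple[List[str], str, float]]) -> Node:
    """Build the tree from (syllable sequence, word, occurrence frequency) entries."""
    root = Node()
    for syllables, word, freq in entries:
        node = root
        for syllable in syllables:
            node = node.children.setdefault(syllable, Node())
        node.words.append((freq, word))
    return root

def words_from(root: Node, syllables: List[str], start: int) -> List[Tuple[int, str]]:
    """Walk the tree from syllables[start]; return (end index, word) pairs,
    most frequently used words first."""
    found: List[Tuple[float, int, str]] = []
    node = root
    for i in range(start, len(syllables)):
        child = node.children.get(syllables[i])
        if child is None:
            break  # no dictionary word continues along this path
        node = child
        found.extend((freq, i + 1, word) for freq, word in node.words)
    found.sort(key=lambda item: -item[0])
    return [(end, word) for _, end, word in found]

# Toy dictionary with made-up occurrence frequencies.
toy_root = build_tree([
    (["dian4"], "電", 50.0),
    (["dian4", "nao3"], "電腦", 120.0),
    (["dian4", "hua4"], "電話", 200.0),
])
toy_words = words_from(toy_root, ["dian4", "nao3", "hen3"], 0)
```

Sharing prefixes in the tree means a failed syllable match abandons every word on that branch at once, which is one way a search space can shrink.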

33. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
- acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising;
  
  employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech to calculate probabilities of each of the mono-syllables in the Mandarin speech input and each of a plurality of tones of the Mandarin speech input for further recognition; and
  
  linguistic decoding an output of the acoustic processing, the linguistic decoding comprising;
  
  employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables, wherein a sub-syllable unit of the sub-syllable units is a phoneme which is affected by a following phoneme, and the Mandarin speech input comprises continuous speech.

34. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
- inputting the Mandarin speech, the Mandarin speech being continuous speech;
  
  acoustic processing of the arbitrary sentences of the continuous Mandarin speech, the acoustic processing step comprising;
  
  employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech to calculate probabilities of each of the mono-syllables in the Mandarin speech input and each of a plurality of tones of the Mandarin speech input for further recognition; and
  
  linguistic decoding an output of the acoustic processing, the linguistic decoding comprising;
  
  employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables,
  
  wherein each one of the "sub-syllable units" is formed from an "initial" affected by a starting phoneme of a following "final" thereof or an other "final" unaffected by a preceding and a following phoneme.
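
The unit inventory described in this claim can be illustrated with a small sketch: an initial is labeled together with the first phoneme of the final that follows it, while the final is kept context-independent. The use of toneless Pinyin spellings, the `INITIALS` table, the `initial(context)` label format, and the shortcut of taking the first letter of the final as its starting phoneme are all assumptions made for illustration; the patent does not specify them.

```python
# Pinyin initials; two-letter initials come first so they match before z/c/s.
# Syllables starting with y or w are treated as having no initial (a simplification).
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def sub_syllable_units(syllable):
    """Split a toneless Pinyin syllable into hypothetical sub-syllable unit labels.

    The initial unit depends on the starting phoneme of the following final;
    the final unit does not depend on any neighbouring phoneme.
    """
    for ini in INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            final = syllable[len(ini):]
            return [f"{ini}({final[0]})", final]
    return [syllable]  # no initial: the whole syllable is a final


print(sub_syllable_units("zhong"))  # ['zh(o)', 'ong']
print(sub_syllable_units("ban"))    # ['b(a)', 'an']
print(sub_syllable_units("an"))     # ['an']
```

Under this labeling, "ba" and "ban" share the initial unit `b(a)` while "bo" gets a separate unit `b(o)`, which is the sense in which the initial is "affected by" the final that follows it.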

35. A Mandarin speech input method for directly translating arbitrary sentences of a Mandarin speech input into corresponding Chinese characters, comprising:
- acoustic processing of the arbitrary sentences of the Mandarin speech, the acoustic processing step comprising;
  
  employing "Base Syllable Models", formed from "sub-syllable units" of the Mandarin speech, based on "Hidden Markov Models" developed for characteristics of Mandarin mono-syllables in the Mandarin speech, and based on "Tone Models" developed for characteristics of tone of the Mandarin speech, to calculate probabilities of each of the mono-syllables in the Mandarin speech input for recognition; and
  
  linguistic decoding an output of the acoustic processing step, the linguistic decoding comprising;
  
  employing "Chinese Language Models" to locate the corresponding Chinese characters for a sequence of the recognized mono-syllables;
  
  classifying words into word classes, comprising;
  
  (1) classifying the words into a plurality of word groups, each of the word groups having common parts-of-speech, semantics and syntax in accordance with a linguistic knowledge,
  
  (2) dividing the words, which are classified into any of the word groups during step 1 into a plurality of word sub-groups with consistent statistical characteristics including statistical characteristics pertaining to preceding words, following words, and word-pairs that tend to be present in a same sentence, obtained from a Chinese text corpus, and
  
  (3) recombining the word sub-groups into a final word class, wherein;
  
  the Mandarin speech input comprises continuous speech, and
  
  the "Chinese Language Models" are generated by combining statistical information, resulting from an analysis of probabilities of associativity among "characters", "words" and "word classes" of a Chinese language, with a linguistic knowledge or rules obtained from an analysis of parts-of-speech, syntax and semantics of the Chinese language.
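
The claim does not give a formula for how word-class statistics enter the language model. One common way to use them, shown here purely as an assumed example, is a class-based bigram, P(w | w_prev) ≈ P(c | c_prev) · P(w | c), where c is the class of w. The function names, the three-sentence corpus and the part-of-speech style class labels below are invented for this sketch and are not taken from the patent.

```python
from collections import Counter


def train(corpus, word_class):
    """Collect word, class and class-bigram counts from a segmented corpus."""
    word_counts, class_counts = Counter(), Counter()
    left_counts, class_bigrams = Counter(), Counter()
    for sentence in corpus:
        classes = [word_class[w] for w in sentence]
        word_counts.update(sentence)
        class_counts.update(classes)
        for a, b in zip(classes, classes[1:]):
            class_bigrams[(a, b)] += 1
            left_counts[a] += 1  # times class a is followed by something
    return word_counts, class_counts, left_counts, class_bigrams


def class_bigram_prob(prev_word, word, word_class, stats):
    """Estimate P(word | prev_word) as P(class | prev class) * P(word | class)."""
    word_counts, class_counts, left_counts, class_bigrams = stats
    c_prev, c = word_class[prev_word], word_class[word]
    if left_counts[c_prev] == 0 or class_counts[c] == 0:
        return 0.0
    p_class = class_bigrams[(c_prev, c)] / left_counts[c_prev]
    p_word = word_counts[word] / class_counts[c]
    return p_class * p_word


# Toy data: hypothetical classes assigned by hand.
word_class = {"我": "PRON", "你": "PRON", "喝": "VERB", "買": "VERB",
              "茶": "NOUN", "水": "NOUN"}
corpus = [["我", "喝", "茶"], ["你", "喝", "水"], ["我", "買", "水"]]
stats = train(corpus, word_class)

# "買 茶" never occurs in the corpus, yet it receives a non-zero
# estimate because VERB -> NOUN is well attested at the class level.
p = class_bigram_prob("買", "茶", word_class, stats)
```

The point of the sketch is the sharing of statistics: word pairs that never co-occur in the corpus still get a usable estimate through their classes, which is one motivation for grouping words into classes before collecting statistics.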

Current Assignee: National Science Council
Original Assignee: Ji-Na Lee
Inventors: Lee, Lin-Shan
Primary Examiner(s): Voeltz, Emanuel Todd
Assistant Examiner(s): Sofocleous, Michael D
Application Number: US08/580,594
Time in Patent Office: 1,607 Days
Field of Search: 395/2.65, 395/2.51, 395/2.52, 704/2, 704/9, 704/251, 704/253, 704/256, 704/231, 704/233, 704/255, 704/270, 704/275
US Class Current: 704/270
CPC Class Codes: G10L 15/144 Training of HMMs
