System and method of intelligent Mandarin speech input for Chinese computers

US 5,787,230 A
Filed: 12/09/1994
Issued: 07/28/1998
Est. Priority Date: 12/09/1994
Status: Expired due to Term

1 Assignment

0 Petitions

Abstract

A Mandarin speech input system and method for directly translating arbitrary sentences of Mandarin speech into corresponding Chinese characters is disclosed. The system and method comprise an acoustic processing section and a linguistic decoding section. The acoustic processing step employs "Segmental Probability Models" to calculate the probabilities of each of the mono-syllables in the Mandarin speech input and each of the tones thereof for further recognition. The linguistic decoding step employs "Word-class-based Chinese Language Models" to locate the corresponding Chinese characters for the series of recognized syllables provided by said acoustic processing step. A Mandarin dictation machine translates the input speech into characters in accordance with the above method and displays the characters. The dictation machine features "intelligence": it can "learn" when taught, through several "intelligent learning techniques" such as automatic learning of a new user's voice, so that the new user can use the dictation machine quickly; automatic learning of environmental noise, to adapt to the noise in the user's environment; and continuous on-line learning of the user's voice, special words, wording and sentence styles, to continuously improve the correct recognition rate.
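
The linguistic decoding step can be illustrated with a small dynamic-programming decoder over homonym candidates. This is a minimal sketch under stated assumptions, not the patented model: the lexicon (`LEXICON`), the table of ending-to-beginning character associativity (`ASSOC`) and the back-off constant (`ASSOC_FLOOR`) are hypothetical toy data. Candidate words are scored by occurrence frequency plus the associativity between the ending character of the preceding word and the beginning character of the following word, as claims 2 and 3 describe, but the word classes themselves are not modelled, and the Viterbi-style search is an illustrative choice rather than something the patent states.

```python
import math

# Hypothetical toy lexicon: syllable sequence -> [(word, occurrence frequency)].
# Entries and counts are illustrative only, not data from the patent.
LEXICON = {
    ("zhong",): [("中", 50), ("鐘", 10)],
    ("guo",): [("國", 40), ("過", 30)],
    ("zhong", "guo"): [("中國", 80)],
    ("ren",): [("人", 60)],
}

# Hypothetical associativity: P(beginning char of next word | ending char of previous word).
ASSOC = {("國", "人"): 0.5, ("中", "國"): 0.3}
ASSOC_FLOOR = 1e-3  # assumed back-off value for unseen character pairs

TOTAL = sum(freq for entries in LEXICON.values() for _, freq in entries)


def decode(syllables, max_word_len=4):
    """Return the most probable word sequence for a list of recognized syllables."""
    n = len(syllables)
    # best[i] maps the ending character of the last word that ends at position i
    # to (log score, word sequence); future scores depend only on that character.
    best = [{} for _ in range(n + 1)]
    best[0][None] = (0.0, [])
    for i in range(n):
        for last_char, (score, seq) in best[i].items():
            for j in range(i + 1, min(n, i + max_word_len) + 1):
                for word, freq in LEXICON.get(tuple(syllables[i:j]), []):
                    s = score + math.log(freq / TOTAL)  # word occurrence frequency
                    if last_char is not None:  # ending/beginning character associativity
                        s += math.log(ASSOC.get((last_char, word[0]), ASSOC_FLOOR))
                    key = word[-1]
                    if key not in best[j] or s > best[j][key][0]:
                        best[j][key] = (s, seq + [word])
    if not best[n]:
        return None  # no word segmentation covers every syllable
    return max(best[n].values(), key=lambda entry: entry[0])[1]


print(decode(["zhong", "guo", "ren"]))  # ['中國', '人']
```

Keying each DP cell by the last character, rather than the last word, is enough here because the associativity term only looks at that character.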

81 Citations

19 Claims

1. A Mandarin speech input method for directly translating a plurality of spoken words of Mandarin speech into corresponding Chinese characters, comprising steps of:
- acoustic processing of the Mandarin speech, the acoustic processing step employing "Segmental Probability Models" to calculate probabilities of each of a plurality of mono-syllables in the Mandarin speech input and each of a plurality of tones thereof for further recognition; and
  
  linguistic decoding of the plurality of mono-syllables recognized by the acoustic processing step, the linguistic decoding step employing "Word-class-based Markov Chinese Language Models" to locate the corresponding Chinese characters for a series of the plurality of mono-syllables.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The Mandarin speech input method of claim 1, wherein the "Word-class-based Markov Chinese Language Models" are based on "Chinese word classes", the "Word-class-based Markov Chinese Language Models" using beginning characters and ending characters to calculate the probabilities.
  - 3. The Mandarin speech input method of claim 2, whereinthe "Word-class-based Markov Chinese Language Models" divide homonyms corresponding to the series of the plurality of mono-syllables into a plurality of words and determine exact characters in each of the plurality of mono-syllables by comparing probabilities of an associativity between each of the ending characters and each of the beginning characters that represent each preceding word and each following word respectively and by comparing occurrence frequencies of each of the plurality of words.
  - 4. The Mandarin speech input method of claim 1, wherein a training algorithm of the "Segmental Probability Models" comprises:
    - dividing one of the plurality of mono-syllables having T frames of duration into a plurality of equal N segments, each of the plurality of equal N segments including a plurality of equal T/N frames;
      
      pronouncing the one of the plurality of mono-syllables by a user for L times to constitute L utterances of the one of the plurality of mono-syllables, and dividing each of the L utterances, which may be different in duration, into the plurality of equal N segments;
      
      using a first plurality of combined feature vectors in a first segment of the plurality of equal N segments to train a state of the first segment of the plurality of equal N segments;
      
      using a second plurality of combined feature vectors in each successive segment of the plurality of equal N segments to train a state of each of the successive segments of the plurality of equal N segments, and repeating the step of using the plurality of combined feature vectors in each successive segment until all N states of the plurality of equal N segments have been trained;
      
      describing each one of the N states with M Mixtures of Gaussian Probabilities, and training a plurality of parameters of each of the Gaussian Probabilities with the first plurality of combined feature vectors and the second plurality of combined feature vectors in the T frames; and
      
      establishing the "Segmental Probability Models" of the one of the plurality of mono-syllables with the N states.
  - 5. The Mandarin speech input method of claim 4, wherein the training algorithm of the "Segmental Probability Models" further comprises a "Segment Sharing" training algorithm comprising:
    - dividing the input one of the plurality of mono-syllables into N segments, in which a first plurality of the N segments describe an "initial" of the one of the plurality of mono-syllables and a following plurality of the N segments describe a "final" of the one of the plurality of mono-syllables; and
      
      training states of a plurality of particular segments of a plurality of other mono-syllables, with a common "initial" or a common "final" that corresponds to the "initial" or the "final", by means of the first plurality of the N segments or the following plurality of the N segments, using the L utterances of the one of the plurality of mono-syllables.
  - 6. The Mandarin speech input method of claim 4, wherein a recognition algorithm of the "Segmental Probability Models" comprises:
    - training the "Segmental Probability Models" of all 408 basic mono-syllables;
      
      dividing an unknown input mono-syllable into a plurality of N segments;
      
      applying the plurality of combined feature vectors of each of the T/N frames in each of the plurality of N segments to the M Mixtures of the Gaussian Probabilities representing one segment of one of the 408 basic mono-syllables respectively under the "Segmental Probability Models" to calculate corresponding probabilities;
      
      multiplying the corresponding probabilities of each of the plurality of N segments to get a probability of the unknown mono-syllable with respect to the one of the 408 basic mono-syllables under the "Segmental Probability Models"; and
      
      calculating probabilities of the unknown mono-syllable with respect to each of the 408 basic mono-syllables in a way similar to the multiplying step, and determining a recognition result by selecting one of the 408 basic mono-syllables corresponding to a highest probability under the "Segmental Probability Models".
  - 7. The Mandarin speech input method of claim 1, wherein the "Word-class-based Chinese Language Models" can be used to correct some errors of the plurality of mono-syllables provided by the acoustic processing step.
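
The segment-based training of claim 4 and the recognition of claim 6 can be sketched together. This is a minimal illustration under stated assumptions, not the patented implementation: feature vectors are plain Python lists, each state is a single diagonal Gaussian rather than M mixtures, each segment's probability is taken as the product of its per-frame likelihoods (accumulated as a sum of logs), and the two toy syllables "a" and "b" stand in for the 408 basic mono-syllables. The helper names `split_equal`, `train_spm`, `log_score` and `recognize` are hypothetical.

```python
import math


def split_equal(frames, n_segments):
    """Divide a T-frame utterance into N equal segments of about T/N frames each."""
    t = len(frames)
    if t < n_segments:
        raise ValueError("utterance shorter than the number of segments")
    bounds = [k * t // n_segments for k in range(n_segments + 1)]
    return [frames[bounds[k]:bounds[k + 1]] for k in range(n_segments)]


def train_spm(utterances, n_segments, var_floor=1e-3):
    """Train N segment states from L utterances of one mono-syllable.

    Simplification: each state is one diagonal Gaussian (M = 1), whereas the
    claims describe M mixtures of Gaussian probabilities per state."""
    split = [split_equal(u, n_segments) for u in utterances]
    states = []
    for k in range(n_segments):
        # Pool segment k of every utterance; utterances may differ in duration.
        pooled = [f for segs in split for f in segs[k]]
        dim = len(pooled[0])
        mean = [sum(f[d] for f in pooled) / len(pooled) for d in range(dim)]
        var = [max(var_floor, sum((f[d] - mean[d]) ** 2 for f in pooled) / len(pooled))
               for d in range(dim)]
        states.append([(1.0, mean, var)])  # mixture list of (weight, mean, variance)
    return states


def log_mixture(x, mixtures):
    """log of sum_m w_m * N(x; mean_m, diag(var_m)), via log-sum-exp."""
    logs = []
    for w, mean, var in mixtures:
        ll = math.log(w)
        for xd, md, vd in zip(x, mean, var):
            ll -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        logs.append(ll)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def log_score(frames, states):
    """Product of the per-segment probabilities, accumulated as a sum of logs."""
    segments = split_equal(frames, len(states))
    return sum(log_mixture(f, state) for seg, state in zip(segments, states) for f in seg)


def recognize(frames, models):
    """Pick the syllable whose model gives the highest probability."""
    return max(models, key=lambda syllable: log_score(frames, models[syllable]))


# Toy 1-D "feature vectors": syllable "a" rises, syllable "b" falls (hypothetical data).
utts_a = [[[0.0], [0.1], [1.0], [1.1]], [[0.1], [0.0], [0.2], [0.9], [1.0], [1.1]]]
utts_b = [list(reversed(u)) for u in utts_a]
models = {"a": train_spm(utts_a, 2), "b": train_spm(utts_b, 2)}
print(recognize([[0.05], [0.1], [1.05], [0.95]], models))  # a
```

Working in the log domain keeps the product over many frames and segments from underflowing once T grows beyond a few dozen frames.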

8. A learning method of a Mandarin speech recognition system for quickly adapting to a voice of a new user to recognize a Mandarin speech input of the new user, the learning method training, in advance, each mono-syllable of a plurality of mono-syllables as "Segmental Probability Models" including feature parameters of each of the mono-syllables of the plurality of mono-syllables pronounced by different users, comprising:
- training a plurality of pronunciations by many speakers with respect to one mono-syllable of the plurality of mono-syllables as the "Segmental Probability Models", in which a plurality of Mixtures of Gaussian Probabilities is required to describe each state of the one mono-syllable in consideration of different feature parameters of the many speakers;
  
  pronouncing the one mono-syllable by the new user and establishing the "Segmental Probability Models" of the new user by selecting a plurality of Mixtures of Gaussian Probabilities having feature parameters close to the feature parameters of the new user from the plurality of Mixtures of Gaussian Probabilities under the "Segmental Probability Models" for the many speakers and by de-emphasizing other unnecessary Mixtures of Gaussian Probabilities;
  
  calculating new Mixtures of Gaussian Probabilities and updating new "Segmental Probability Models" by averaging feature vectors of a plurality of segments of a new pronunciation of the one mono-syllable when the new user continuously pronounces the one mono-syllable; and
  
  repeating the calculating step so that a ratio of the Mandarin speech of the new user in the new "Segmental Probability Models" will be gradually increased to result in the new "Segmental Probability Models" that can better describe the Mandarin speech of the new user.
- Dependent Claims (9)
- - 9. The learning method of claim 8, further comprising:
    - correcting recognition errors generated by the Mandarin speech recognition system on an on-line basis by means of a screen display; and
      
      repeating the calculating step and the repeating step immediately so that the Mandarin speech recognition system can learn new speech and can use the new "Segmental Probability Models" in a next recognition to continuously increase a correct recognition rate.
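
The adaptation steps of claims 8 and 9 can be sketched as two pieces: selecting the speaker-independent mixtures closest to the new user while de-emphasizing the rest, and a running average that gradually increases the new user's share of each mixture mean. This is a minimal sketch under assumptions not found in the patent: closeness is measured as squared Euclidean distance from the user's average frame, de-emphasis scales weights by a hypothetical `shrink` factor, and the original multi-speaker data is represented by a hypothetical pseudo-count `prior_count`.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(frames):
    """Average feature vector of a list of frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def select_mixtures(mixtures, user_frames, keep, shrink=0.01):
    """Keep the `keep` multi-speaker mixtures closest to the new user's frames and
    de-emphasize the rest by scaling their weights by `shrink`, then renormalize.
    mixtures: list of (weight, mean, variance) tuples."""
    centre = mean_vec(user_frames)
    order = sorted(range(len(mixtures)), key=lambda m: sq_dist(mixtures[m][1], centre))
    kept = set(order[:keep])
    raw = [w if m in kept else shrink * w for m, (w, _, _) in enumerate(mixtures)]
    total = sum(raw)
    return [(r / total, mean, var) for r, (_, mean, var) in zip(raw, mixtures)]


class AdaptiveMean:
    """Running average that folds each new pronunciation into a mixture mean,
    so the new user's share of the model grows with every repetition."""

    def __init__(self, prior_mean, prior_count):
        self.mean = list(prior_mean)
        self.count = prior_count  # pseudo-count for the multi-speaker data (assumed)

    def update(self, segment_frames):
        avg = mean_vec(segment_frames)
        self.count += 1
        self.mean = [m + (a - m) / self.count for m, a in zip(self.mean, avg)]
        return self.mean


si_state = [(0.5, [0.0], [1.0]), (0.5, [5.0], [1.0])]  # two speaker groups (hypothetical)
adapted = select_mixtures(si_state, [[4.8], [5.2]], keep=1)
print([round(w, 3) for w, _, _ in adapted])  # [0.01, 0.99]
```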

10. A Mandarin dictation machine for recognizing Mandarin speech, comprising:
- an analog-to-digital converter with a filter for filtering and converting speech input signals into digital signals;
  
  a computer coupled with a digital signal processing board for receiving and processing the digital signals provided by the analog-to-digital converter;
  
  a pitch frequency detector;
  
  a feature abstraction apparatus, the feature abstraction apparatus and the pitch frequency detector both being coupled to the computer for detecting and calculating a pitch frequency and other feature parameters of the digital signals received by the computer;
  
  Segmental Probability Models processing means coupled with a Mixed Gaussian Probabilities processing means, after calculating an endpoint of each of a plurality of mono-syllables, for recognizing a basic mono-syllable of the plurality of mono-syllables and a tone thereof;
  
  word-class-based Markov Chinese Language Models processing means, which calculates probabilities with characters, for calculating probabilities of all possible homonym characters of each of the plurality of mono-syllables input and transferring recognized results to the computer; and
  
  training means for training first probabilities of all of the basic mono-syllables and tones under "Segmental Probability Models" and training second probabilities under "Word-class-based Markov Chinese Language Models", and for transferring both the first probabilities and the second probabilities to the computer.
- Dependent Claims (11, 12, 13, 14, 15)
- - 11. The Mandarin dictation machine of claim 10, wherein speech is input to the Mandarin dictation machine employing an isolated mono-syllable as an input unit.
  - 12. The Mandarin dictation machine of claim 10, further comprising:
    - a display screen for displaying input phonetic symbols and Chinese characters corresponding to the input Mandarin speech; and
      
      error-correction computer code means for a user to directly correct errors on the display screen by using a mouse without touching a keyboard.
  - 13. The Mandarin dictation machine of claim 10, further comprising dynamic short-term cache memory means for temporarily storing a vocabulary and a plurality of favorite words of the user or a plurality of specific words that are repetitively present in a block of input texts, wherein the plurality of favorite words or the plurality of specific words can be stored in different memory areas in accordance with a respective occurrence frequency, and the plurality of favorite words or the plurality of specific words along with respective occurrence frequency information thereof can be merged in global Chinese Language Models of the Mandarin dictation machine.
  - 14. The Mandarin dictation machine of claim 13, wherein the dynamic short-term cache memory means further comprises a Frequently Used Dictionary and a Less Frequently Used Dictionary such that the Mandarin dictation machine will first search the Frequently Used Dictionary during operation, and will then search the Less Frequently Used Dictionary if a required word cannot be located in the Frequently Used Dictionary; the required word located in the Less Frequently Used Dictionary will be stored in the Frequently Used Dictionary, while some words of a plurality of words in the Frequently Used Dictionary can be moved to the Less Frequently Used Dictionary when those words have not been frequently used within a specific time period.
  - 15. The Mandarin dictation machine of claim 10, further comprising dynamic short-term cache memory means for temporarily storing a vocabulary and a plurality of favorite words of the user or a plurality of specific words that are repetitively present in a block of input texts, wherein the plurality of favorite words or the plurality of specific words can be stored in different memory areas in accordance with a respective occurrence frequency, and the plurality of favorite words or the plurality of specific words along with respective occurrence frequency information thereof can be cleared from the different memory areas after a completion of inputting the block of input texts.
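
The Frequently Used / Less Frequently Used Dictionary behaviour of claim 14, together with the end-of-block merge (claim 13) or clear (claim 15) of per-block occurrence counts, can be sketched as a small two-level cache. The class name, the `idle_limit` parameter and the choice to measure the "specific time period" in lookups rather than wall-clock time are all assumptions made for illustration.

```python
class TwoLevelDictionary:
    """Frequently Used Dictionary (FUD) searched before a Less Frequently Used
    Dictionary (LFUD). "Time" is counted in lookups (an assumption)."""

    def __init__(self, less_frequent_words, idle_limit):
        self.fud = {}                         # word -> time of last use
        self.lfud = set(less_frequent_words)
        self.idle_limit = idle_limit
        self.clock = 0
        self.block_counts = {}                # occurrences within the current text block

    def lookup(self, word):
        """Search the FUD first, then the LFUD; return where the word was found."""
        self.clock += 1
        if word in self.fud:
            source = "FUD"
        elif word in self.lfud:
            self.lfud.remove(word)            # promote the word to the FUD
            source = "LFUD"
        else:
            return None
        self.fud[word] = self.clock
        self.block_counts[word] = self.block_counts.get(word, 0) + 1
        self._demote_idle()
        return source

    def _demote_idle(self):
        """Move words unused for more than `idle_limit` lookups back to the LFUD."""
        for word in [w for w, t in self.fud.items() if self.clock - t > self.idle_limit]:
            del self.fud[word]
            self.lfud.add(word)

    def end_of_block(self, global_counts=None):
        """Merge the block's counts into global counts (claim 13 style), or just
        clear them when no global store is given (claim 15 style)."""
        if global_counts is not None:
            for word, count in self.block_counts.items():
                global_counts[word] = global_counts.get(word, 0) + count
        self.block_counts.clear()


d = TwoLevelDictionary(["電腦", "語音"], idle_limit=2)
print([d.lookup(w) for w in ["電腦", "電腦", "語音", "語音", "語音"]])
# ['LFUD', 'FUD', 'LFUD', 'FUD', 'FUD']
```

After the last lookup, "電腦" has been idle for three lookups, so it has been moved back to the LFUD.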

16. A learning method for training a Mandarin dictation machine to be adaptive to a voice of a new user, comprising:
- repetitively pronouncing a plurality of selected sentences that include all basic acoustic units of Mandarin speech including initials, finals and basic mono-syllables within a minimum number of possible characters such that frequently used basic acoustic units will occur frequently in the plurality of selected sentences, wherein the repetitive pronouncing step better trains "Segmental Probability Models" and trains the Mandarin dictation machine to be adaptive to pronunciations of the new user, the pronunciations of the new user being stored in the Mandarin dictation machine.
- Dependent Claims (17, 18, 19)
- - 17. The learning method of claim 16, wherein the plurality of selected sentences for training the Mandarin dictation machine to be adaptive to the voice of the new user is selected from a source text file by a computer performing steps of:
    - setting different scores for all of the basic acoustic units of the Mandarin speech;
      
      calculating a total score of each sentence of a plurality of sentences of the source text file such that a sentence of the plurality of sentences including more different basic acoustic units will obtain a higher total score;
      
      selecting, with a higher priority, the plurality of sentences with higher total scores; and
      
      describing an occurrence distribution of each of all of the basic acoustic units by means of a parameter which is also used as a criterion for selection of the plurality of selected sentences.
  - 18. The learning method of claim 16, further comprising:
    - on-line learning during a learning stage or during practical use of the Mandarin dictation machine, wherein the Mandarin dictation machine learns correct pronunciation and words when the new user corrects text errors resulting from a recognition by the Mandarin dictation machine, and the Mandarin dictation machine stores corresponding parameters of pronunciation corrected by the new user.
  - 19. The learning method of claim 16, further comprising:
    - pronouncing a mono-syllable by the new user and establishing the "Segmental Probability Models" of the new user by selecting a plurality of Mixtures of Gaussian Probabilities under the "Segmental Probability Models" for many speakers and de-emphasizing other unnecessary Mixtures of Gaussian Probabilities;
      
      calculating new Mixtures of Gaussian Probabilities and updating the "Segmental Probability Models" by averaging feature vectors of a plurality of segments of a new pronunciation of the mono-syllable when the new user continuously pronounces the mono-syllable;
      
      repeating the calculating step so that a ratio of the Mandarin speech of the new user in the "Segmental Probability Models" will be gradually increased, resulting in "Segmental Probability Models" that can better describe the Mandarin speech of the new user; and
      
      automatically averaging environmental noise in an environment of the new user into the "Segmental Probability Models" to make the Mandarin dictation machine adaptive to ambient noise in the environment of the new user, wherein the step of automatically averaging the environmental noise is performed at a same time as the pronouncing step, the calculating step, and the repeating step.
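
The sentence-selection procedure of claim 17 can be sketched as a greedy loop. This is an assumed reading of the claim rather than the patent's exact algorithm: each sentence is already converted into basic acoustic units, unit scores default to 1.0 when not given, and the "occurrence distribution" parameter is approximated by discounting units that earlier selections already cover. The function names and the toy corpus are hypothetical.

```python
def gain(units, unit_score, covered):
    """Score a sentence by its distinct acoustic units, so sentences with more
    different units score higher. A unit's value shrinks each time it is already
    covered (an assumed stand-in for the occurrence-distribution criterion)."""
    return sum(unit_score.get(u, 1.0) / (1 + covered.get(u, 0)) for u in set(units))


def select_sentences(sentences, unit_score, max_sentences):
    """Greedily pick training sentences, highest total score first.
    sentences: sentence id -> list of basic acoustic units (e.g. syllables)."""
    remaining = dict(sentences)
    covered = {}  # unit -> number of selected sentences containing it
    chosen = []
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: gain(remaining[s], unit_score, covered))
        chosen.append(best)
        for u in set(remaining.pop(best)):
            covered[u] = covered.get(u, 0) + 1
    return chosen


corpus = {  # hypothetical sentences already converted to syllables
    "s1": ["ba", "ma", "ba"],
    "s2": ["ba", "ma", "la", "ka"],
    "s3": ["la", "zi"],
}
print(select_sentences(corpus, {}, max_sentences=2))  # ['s2', 's3']
```

In the second round "s3" wins over "s1" because "la" is already covered once but "zi" is new, whereas both units of "s1" are already covered.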

Current Assignee
National Science Council
Original Assignee
Lee, Lin-Shan
Inventors
Lee, Lin-Shan
Primary Examiner(s)
MacDonald, Allen R.
Assistant Examiner(s)
Chawan, Vijay B.

Application Number

US08/352,587
Time in Patent Office

1,327 Days
Field of Search

395/2.44, 395/2, 395/2.6, 395/2.5, 395/2.65, 395/2.42, 395/2.49, 395/2.64, 395/2.62, 395/2.86, 381/43, 381/41, 381/42, 381/44, 364/419
US Class Current

704/235
CPC Class Codes

G10L 15/063   Training

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/197   Probabilistic grammars, e.g...

G10L 25/15   the extracted parameters be...
