Constructing Markov models of words from multiple utterances

US 4,759,068 A
Filed: 05/29/1985
Issued: 07/19/1988
Est. Priority Date: 05/29/1985
Status: Expired due to Fees

Abstract

The present invention addresses the problem of constructing fenemic baseforms which take into account variations in pronunciation of words from one utterance thereof to another. The approach splits each feneme string at a consistent point into a left portion and a right portion and then repeats the split on the resulting portions. Specifically, the invention relates to a method of constructing a fenemic baseform for a word in a vocabulary of word segments including the steps of: (a) transforming multiple utterances of the word into respective strings of fenemes; (b) defining a set of fenemic Markov model phone machines; (c) determining the best single phone machine P₁ for producing the multiple feneme strings; (d) determining the best two phone baseform of the form P₁ P₂ or P₂ P₁ for producing the multiple feneme strings; (e) aligning the best two phone baseform against each feneme string; (f) splitting each feneme string into a left portion and a right portion with the left portion corresponding to the first phone machine of the two phone baseform and the right portion corresponding to the second phone machine of the two phone baseform; (g) identifying each left portion as a left substring and each right portion as a right substring; (h) processing the set of left substrings and the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances including the further step of inhibiting further splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform; and (k) concatenating the unsplit single phones in an order corresponding to the order of the feneme substrings to which they correspond.

61 Citations

13 Claims

1. In a speech recognition system having an acoustic processor, a method of processing multiple utterances of a word in the construction of a fenemic baseform for the word, the method comprising the steps of:
- (a) providing as input a string of fenemes generated by the acoustic processor in response to an utterance of the word;
  
  (b) repeating step (a) for each utterance of the multiple utterances; and
  
  (c) locating a consistent point in each input string of fenemes, wherein each string of fenemes is divided by the consistent point thereof into a left portion and a right portion (i) each of the left portions corresponding to a first sound-representing model in a set of sound-representing models and (ii) each of the right portions corresponding to a second sound-representing model in the set of sound-representing models.
- - 2. The method of claim 1 wherein said consistent point locating step comprises the steps of:
    - (d) storing a set of fenemic phone machines, each phone machine having (i) a plurality of states;
      
      (ii) transitions between states, each transition having a probability associated therewith; and
      
      (iii), for at least some transitions, a respective probability of producing each feneme at a given transition; and
      
      (e) determining the probability of a phone machine producing each of the input feneme strings;
      
      (f) repeating step (e) for each phone machine; and
      
      (g) selecting the phone machine that has the highest joint probability of producing the input feneme strings.
  - 3. The method of claim 2 wherein said consistent point locating step comprises the further steps of:
    - (h) appending a phone machine in front of the selected phone machine to form an ordered pair of phone machines and determining the probability of the ordered pair of phone machines producing each of the input strings of fenemes;
      
      (j) repeating step (h) for each phone machine as the appended phone machine;
      
      (k) appending a phone machine at the end of the selected phone machine to form an ordered pair of phone machines and determining the probability of the ordered pair of phone machines producing each of the input strings of fenemes;
      
      (l) repeating step (k) for each phone machine as the appended phone machine;
      
      (m) selecting the ordered pair of the appended phone machine and the selected phone machine that has the highest joint probability of producing the input strings of fenemes.
  - 4. The method of claim 3 wherein said consistent point locating step comprises the further step of:
    - (n) performing an alignment process between the selected ordered pair of phone machines and each input string of fenemes, the most probable point in each string where the two phone machines meet being the consistent point.
  - 5. The method of claim 4 comprising the further steps of:
    - (p) splitting the left portion from the right portion of each input string of fenemes at the respective consistent point thereof;
      
      (q) finding the single phone P_L having the highest joint probability for the left portions of the input strings;
      
      (r) finding the two phone baseform, from among all two phone baseforms that include the phone P_L, which has the highest joint probability of producing the left portions;
      
      (s) if the highest probability two phone baseform including phone P_L is higher than the probability associated with the single phone P_L, (i) aligning each utterance against the found two phone baseform and (ii) splitting the found two phone baseform apart at the point of meeting into a resultant left portion and a resultant right portion; and
      
      (t) performing steps (p) through (s) with the resultant left portion and the resultant right portion being substituted for the left portion and the right portion respectively.
  - 6. The method of claim 5 comprising the further steps of:
    - (u) discontinuing the splitting when a highest probability single phone machine has a higher probability than any two phone baseform that includes the highest probability single phone and an appended phone; and
      
      (v) concatenating the unsplit single phones;
      
      the concatenated baseform forming a basic baseform of the word.
  - 7. The method of claim 6 comprising the further steps of:
    - (w) aligning each input string of fenemes against the baseform of concatenated single phones; and
      
      (x) for a phone in the concatenated baseform, determining the fenemes which are aligned thereagainst and either (i) if there are no aligned fenemes, deleting the phone from the concatenated baseform or (ii) finding the phone which maximizes the probability of producing the determined fenemes and replacing the phone in the concatenated baseform by the found phone if they differ; and
      
      (y) repeating step (x) for each phone in the concatenated baseform.
  - 8. The method of claim 7 comprising the further step of:
    - (z) repeating steps (w), (x), and (y) until each phone in the concatenated sequence has the maximum probability of producing the fenemes aligned therewith;
      
      the baseform resulting from step (z) being a refined baseform for the word.
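
The selection described in claim 2, steps (d) through (g), can be illustrated with a small sketch. It is not taken from the patent: the `PhoneMachine` class, the function names `production_prob` and `select_best_phone`, the two-state topology, and every probability value are assumptions chosen for illustration. Null transitions are assumed to run from a lower-numbered state to a higher-numbered one, and each character of a string stands in for one feneme label.

```python
import math
from dataclasses import dataclass

@dataclass
class PhoneMachine:
    name: str
    n_states: int
    # Each transition is (source, target, probability, emissions).
    # emissions maps feneme label -> probability, or is None for a null transition.
    transitions: list

def production_prob(machine, fenemes):
    """Forward probability that the machine produces the whole feneme string,
    starting in state 0 and ending in the last state."""
    n = len(fenemes)
    alpha = [[0.0] * machine.n_states for _ in range(n + 1)]
    alpha[0][0] = 1.0
    # Null transitions consume no feneme; process them in source-state order.
    nulls = sorted((t for t in machine.transitions if t[3] is None),
                   key=lambda t: t[0])
    for t in range(n + 1):
        for src, dst, p, _ in nulls:
            alpha[t][dst] += alpha[t][src] * p
        if t < n:
            for src, dst, p, emit in machine.transitions:
                if emit is not None:
                    alpha[t + 1][dst] += alpha[t][src] * p * emit.get(fenemes[t], 0.0)
    return alpha[n][machine.n_states - 1]

def select_best_phone(machines, strings):
    """Pick the machine with the highest joint probability over all strings."""
    return max(machines,
               key=lambda m: math.prod(production_prob(m, s) for s in strings))

def toy_machine(name, emit):
    # Hypothetical two-state machine: emitting self-loop, emitting forward
    # transition, and a null forward transition.
    return PhoneMachine(name, 2, [(0, 0, 0.6, emit), (0, 1, 0.3, emit), (0, 1, 0.1, None)])

A = toy_machine("A", {"a": 0.9, "b": 0.1})
B = toy_machine("B", {"a": 0.1, "b": 0.9})
best = select_best_phone([A, B], ["aaab", "aabb"])
```

With these toy values the two utterances are dominated by the label `a`, so machine `A` is selected. A practical implementation would probably work in log probabilities to avoid underflow on long feneme strings.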

9. A speech recognition method using a speech input subsystem which converts utterances to feneme strings and a computer, the method being characterized by the steps of:
- (a) finding a best first baseform of phone length one which maximizes the joint probability of producing the feneme strings resulting from multiple utterances of a given word in a vocabulary of words;
  
  (b) finding a best second baseform of phone length two and of the form either (i) P₁ P₂ or (ii) P₂ P₁ which has a higher joint probability than any other baseform of length two;
  
  (c) comparing the joint probability of the first baseform with the joint probability of the second baseform and, if the second baseform joint probability is higher than the joint probability of the first baseform, splitting each feneme string into a left portion and a right portion at the point which maximizes the probability that the left portion is produced by the left phone and the right portion is produced by the right phone;
  
  (d) repeating steps (a) through (c) until all baseforms are of single phone length and no second baseform has a higher probability than its respective first baseform; and
  
  (e) concatenating the baseforms of phone length one remaining after step (d) to form a basic fenemic baseform of the entire word.
- - 10. The method of claim 9 comprising the further steps of:
    - (f) aligning the concatenated baseform against the feneme strings using the Viterbi algorithm and identifying a feneme substring in each string corresponding to each phone in the concatenated baseform; and
      
      (g) determining after alignment, for each phone in the concatenated baseform, any other phone in the set having a higher joint probability of producing the feneme substrings corresponding thereto in the multiple feneme strings.
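
The recursion in claim 9, steps (a) through (e), can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the phone model is a deliberately simplified stand-in (a skip probability for an empty substring, a stay probability per extra feneme, a leave probability, and one emission table), the probability of a two-phone baseform is approximated by its single best split point rather than a sum over all alignments, and the names `log_prob`, `best_cut`, `build_baseform`, and `PHONES` are hypothetical. Each character stands in for one feneme label.

```python
import math

def log_prob(phone, fenemes):
    """Toy log probability that one phone produces a (possibly empty) substring."""
    p_stay, p_leave, p_skip, emit = phone
    if not fenemes:
        return math.log(p_skip)
    lp = (len(fenemes) - 1) * math.log(p_stay) + math.log(p_leave)
    return lp + sum(math.log(emit.get(f, 1e-6)) for f in fenemes)

def best_cut(left, right, s):
    """Most probable point in s where the left phone hands over to the right phone."""
    return max(range(len(s) + 1),
               key=lambda k: log_prob(left, s[:k]) + log_prob(right, s[k:]))

def build_baseform(strings, phones):
    """Return a list of phone names for the given set of feneme strings."""
    def joint(name):
        return sum(log_prob(phones[name], s) for s in strings)
    p1 = max(phones, key=joint)                      # step (a): best single phone
    best_total, best_cuts = -math.inf, None
    for q in phones:                                 # step (b): P1 Q or Q P1
        for l, r in ((p1, q), (q, p1)):
            cuts = [best_cut(phones[l], phones[r], s) for s in strings]
            total = sum(log_prob(phones[l], s[:k]) + log_prob(phones[r], s[k:])
                        for s, k in zip(strings, cuts))
            if total > best_total:
                best_total, best_cuts = total, cuts
    lefts = [s[:k] for s, k in zip(strings, best_cuts)]
    rights = [s[k:] for s, k in zip(strings, best_cuts)]
    # Step (c): split only if the two-phone baseform is more probable.
    # The extra emptiness guard is an added safety check, not part of the claim.
    if best_total <= joint(p1) or not any(lefts) or not any(rights):
        return [p1]
    # Steps (d) and (e): recurse on each side and concatenate in order.
    return build_baseform(lefts, phones) + build_baseform(rights, phones)

PHONES = {
    "A": (0.6, 0.3, 0.1, {"a": 0.9, "b": 0.1}),
    "B": (0.6, 0.3, 0.1, {"a": 0.1, "b": 0.9}),
}
baseform = build_baseform(["aaab", "aabb"], PHONES)
```

In this toy run the strings split once, between the run of `a` labels and the run of `b` labels, and neither half is split further, so the resulting baseform is `["A", "B"]`.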

11. A method of constructing a fenemic baseform for a word in a vocabulary of word segments, the method comprising the steps of:
- (a) transforming multiple utterances of the word into respective strings of fenemes;
  
  (b) defining a set of fenemic Markov model phone machines;
  
  (c) determining the best single phone machine P₁ for producing the multiple feneme strings;
  
  (d) determining the best two phone baseform of the form P₁ P₂ or P₂ P₁ for producing the multiple feneme strings;
  
  (e) aligning the best two phone baseform against each feneme string;
  
  (f) splitting each feneme string into a left portion and a right portion with the left portion corresponding to the first phone machine of the two phone baseform and the right portion corresponding to the second phone machine of the two phone baseform;
  
  (g) identifying each left portion as a left substring and each right portion as a right substring;
  
  (h) processing the set of left substrings in the same manner as the set of feneme strings corresponding to the multiple utterances, including the further step of inhibiting splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform;
  
  (j) processing the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances, including the further step of inhibiting splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform; and
  
  (k) concatenating the unsplit single phones in an order corresponding to the order of the feneme substrings to which they correspond.
- - 12. The method of claim 11 comprising the further steps of:
    - (l) aligning the concatenated baseform against each of the feneme strings and identifying, for each phone in the concatenated baseform, the substring in each feneme string which corresponds thereto, the substrings corresponding to a given phone being a set of common substrings;
      
      (m) for each set of common substrings, determining the phone machine having the highest joint probability of producing the common substrings; and
      
      (n) for each common substring, replacing the phone therefor in the concatenated baseform by the determined phone of highest joint probability;
      
      the baseform resulting from the replacing of phones being a refined baseform.
  - 13. The method of claim 12 comprising the further step of:
    - (o) repeating steps (l) through (n) until no phones are replaced.
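
The refinement loop of claims 12 and 13 (align, collect the common substrings for each phone, replace, repeat until nothing changes) can be sketched as below. The sketch rests on assumptions that are not in the patent: the same simplified stand-in phone model as a tuple of skip, stay, and leave probabilities plus an emission table, a Viterbi-style segmentation for the alignment, a replacement rule that only fires on a strictly higher joint probability, and a `max_rounds` cap as a safety limit. The names `log_prob`, `align`, `refine`, and `PHONES` are hypothetical, and each character stands in for one feneme label.

```python
import math

def log_prob(phone, fenemes):
    """Toy log probability that one phone produces a (possibly empty) substring."""
    p_stay, p_leave, p_skip, emit = phone
    if not fenemes:
        return math.log(p_skip)
    lp = (len(fenemes) - 1) * math.log(p_stay) + math.log(p_leave)
    return lp + sum(math.log(emit.get(f, 1e-6)) for f in fenemes)

def align(baseform, s, phones):
    """Step (l): return the substring of s assigned to each phone in the baseform."""
    n, m = len(baseform), len(s)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        phone = phones[baseform[i - 1]]
        for j in range(m + 1):
            for k in range(j + 1):           # phone i covers s[k:j]
                score = best[i - 1][k] + log_prob(phone, s[k:j])
                if score > best[i][j]:
                    best[i][j], back[i][j] = score, k
    parts, j = [], m
    for i in range(n, 0, -1):                # trace the boundaries backwards
        k = back[i][j]
        parts.append(s[k:j])
        j = k
    return parts[::-1]

def refine(baseform, strings, phones, max_rounds=20):
    """Steps (l) through (o): replace phones until no phone is replaced."""
    baseform = list(baseform)
    for _ in range(max_rounds):
        segments = [align(baseform, s, phones) for s in strings]
        changed = False
        for i, current in enumerate(baseform):
            common = [seg[i] for seg in segments]        # set of common substrings
            def joint(name):
                return sum(log_prob(phones[name], c) for c in common)
            winner = max(phones, key=joint)              # step (m)
            if joint(winner) > joint(current):           # step (n)
                baseform[i], changed = winner, True
        if not changed:
            break
    return baseform

PHONES = {
    "A": (0.6, 0.3, 0.1, {"a": 0.8, "b": 0.1, "c": 0.1}),
    "B": (0.6, 0.3, 0.1, {"a": 0.1, "b": 0.8, "c": 0.1}),
    "C": (0.6, 0.3, 0.1, {"a": 0.05, "b": 0.15, "c": 0.8}),
}
refined = refine(["A", "C"], ["aaab", "aabb"], PHONES)
```

Starting from the deliberately poor baseform `["A", "C"]`, the first round replaces `C` with `B` because the substrings aligned to the second phone contain only `b` labels, and the second round makes no replacement, so the loop stops with `["A", "B"]`. Claim 7 additionally deletes a phone that has no aligned fenemes; that variant is not shown here.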

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Picheny, Michael A.; Bahl, Lalit R.; Mercer, Robert L.; DeSouza, Peter V.
Primary Examiner(s): KEMENY, EMANUEL
Application Number: US06/738,933
Time in Patent Office: 1,147 Days
Field of Search: 381/41-43, 364/513.5
US Class Current: 704/242
CPC Class Codes: G10L 15/14 using statistical models, e...
