Method and apparatus for modeling words with multi-arc Markov models

US 5,129,001 A
Filed: 04/25/1990
Issued: 07/07/1992
Est. Priority Date: 04/25/1990
Status: Expired due to Fees

Abstract

Modeling a word is done by concatenating a series of elemental models to form a word model. At least one elemental model in the series is a composite elemental model formed by combining the starting states of at least first and second primitive elemental models. Each primitive elemental model represents a speech component. The primitive elemental models are combined by a weighted combination of their parameters in proportion to the values of the weighting factors. To tailor the word model to closely represent variations in the pronunciation of the word, the word is uttered a plurality of times by a plurality of different speakers. Constructing word models from composite elemental models, and constructing composite elemental models from primitive elemental models enables word models to represent many variations in the pronunciation of a word. Providing a relatively small set of primitive elemental models for a relatively large vocabulary of words enables models to be trained to the voice of a new speaker by having the new speaker utter only a small subset of the words in the vocabulary.
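
The combination described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each primitive elemental model is reduced to a single output distribution over component-sound labels, and the names `combine`, `primitive_a`, `primitive_b`, and the labels `s1` to `s3` are hypothetical.

```python
# Hypothetical layout: each primitive model is a dict mapping a
# component-sound label to the probability of producing that label.

def combine(primitives, weights):
    """Linear weighted combination of primitive output distributions."""
    if len(primitives) != len(weights):
        raise ValueError("need one weighting factor per primitive model")
    labels = set().union(*primitives)  # every label any primitive can produce
    return {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, primitives))
        for label in labels
    }

# Two toy primitive models over three component-sound labels.
primitive_a = {"s1": 0.7, "s2": 0.2, "s3": 0.1}
primitive_b = {"s1": 0.1, "s2": 0.3, "s3": 0.6}

# Composite model with equal prior weighting factors.
composite = combine([primitive_a, primitive_b], [0.5, 0.5])

# A word model is a series of elemental models, at least one of them composite.
word_model = [primitive_a, composite, primitive_b]
```

Because the weighting factors sum to one, the composite distribution still sums to one, so it can stand wherever a primitive model could.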

18 Claims

1. A method of modeling a word, said method comprising the steps of:
- defining a finite set of n speech components, where n is an integer greater than or equal to two;
  
  providing a primitive elemental model for each speech component, each primitive elemental model having at least first and second states, at least one transition from the first state to the second state, and at least one parameter having a value;
  
  combining the first states of at least first and second primitive elemental models of different speech components to form a composite elemental model having at least first and second weighting factors, respectively, each weighting factor having a prior value, said primitive elemental models being combined by a weighted combination of their parameters in proportion to the values of the weighting factors;
  
  concatenating a series of elemental models to form a word model, at least one elemental model in the series being the composite elemental model;
  
  uttering the word one or more times, each utterance of the word producing an observed sequence of component sounds;
  
  estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the conditional probability of occurrence of the first primitive elemental model given the occurrence of the composite elemental model and given the occurrence of the observed sequence of component sounds; and
  
  estimating a posterior value for the first weighting factor from the conditional probability.
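
The two estimating steps of claim 1 can be illustrated with a small sketch. It rests on simplifying assumptions that the claim does not state: every observed component sound is attributed to the composite model, each primitive model's parameter is the probability of producing a label, and the posterior weight is taken as the average of the per-sound conditional probabilities. The function name `reestimate_first_weight` is hypothetical.

```python
def reestimate_first_weight(w1, w2, p1, p2, observed):
    """One re-estimation pass for the first weighting factor.

    w1, w2   -- prior values of the two weighting factors
    p1, p2   -- dicts: probability that each primitive model produces a label
    observed -- labels assumed to have been produced by the composite model
    Assumes every observed label has nonzero probability under the composite.
    """
    total = 0.0
    for label in observed:
        joint = w1 * p1[label]              # first primitive chosen and label produced
        composite = joint + w2 * p2[label]  # composite model produced the label
        total += joint / composite          # P(first primitive | composite, label)
    return total / len(observed)            # posterior value of the first weight


p1 = {"a": 0.8, "b": 0.2}
p2 = {"a": 0.2, "b": 0.8}
posterior_w1 = reestimate_first_weight(0.5, 0.5, p1, p2, ["a", "a", "a", "b"])
print(posterior_w1)
```

With these toy numbers the first weight moves from its prior of 0.5 to about 0.65, because three of the four observed sounds are better explained by the first primitive model.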

2. A method of modeling a word, said method comprising the steps of:
- defining a finite set of n speech components, where n is an integer greater than or equal to two;
  
  providing a primitive elemental model for each speech component, each primitive elemental model having at least first and second states, at least one transition from the first state to the second state, and at least one parameter having a value;
  
  combining the first states of at least first and second primitive elemental models of different speech components to form a composite elemental model having at least first and second weighting factors, respectively, each weighting factor having a prior value, said primitive elemental models being combined by a weighted combination of their parameters in proportion to the values of the weighting factors;
  
  concatenating a series of elemental models to form a word model, at least one elemental model in the series being the composite elemental model;
  
  uttering the word one or more times, each utterance of the word producing an observed sequence of component sounds;
  
  estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the conditional probability of occurrence of the first primitive elemental model given the occurrence of the composite elemental model and given the occurrence of the observed sequence of component sounds; and
  
  estimating a posterior value for the first weighting factor from the conditional probability;
  
  characterized in that the step of estimating the conditional probability of occurrence of the first primitive elemental model given the occurrence of the composite elemental model and given the occurrence of the observed sequence of component sounds comprises the steps of:
  
  estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the probability of occurrence of the composite elemental model given the occurrence of the observed sequence of component sounds;
  
  estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the joint probability of occurrence of the first primitive elemental model and the composite elemental model given the occurrence of the observed sequence of component sounds; and
  
  estimating the conditional probability as the ratio of the joint probability to the probability of occurrence of the composite elemental model given the observed sequence of component sounds.
- Dependent claims (3, 4, 5, 6, 7)
- - 3. A method as claimed in claim 2, characterized in that:
    - the step of estimating the probability of occurrence of the composite elemental model given the occurrence of the observed sequence of component sounds comprises the step of estimating, for each component sound in the observed sequence of component sounds, the probability that the component sound was produced by the composite elemental model given the occurrence of the observed sequence of component sounds; and
      
      the step of estimating the joint probability of occurrence of the first primitive elemental model and the composite elemental model given the occurrence of the observed sequence of component sounds comprises the step of estimating, for each component sound in the observed sequence of component sounds, the probability that the component sound was produced by the first primitive elemental model and the composite elemental model given the occurrence of the observed sequence of component sounds.
  - 4. A method as claimed in claim 3, characterized in that the method further comprises the steps of:
    - estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the conditional probability of occurrence of the second primitive elemental model given the occurrence of the composite elemental model and given the occurrence of the observed sequence of component sounds; and
      
      estimating a posterior value for the second weighting factor from the second conditional probability.
  - 5. A method as claimed in claim 4, characterized in that the step of combining the first states of the first and second primitive elemental models comprises the step of combining the first states of the first and second primitive elemental models by a linear weighted combination.
  - 6. A method as claimed in claim 5, characterized in that the step of uttering the word one or more times comprises the step of uttering the word a plurality of times by a plurality of different speakers.
  - 7. A method as claimed in claim 6, characterized in that the value of the parameter of each primitive elemental model represents a probability of producing a component sound.
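
The ratio of claim 2 and the per-sound estimates of claim 3 can be written compactly. The notation is assumed for illustration only, since the claims define no symbols. Summing the per-sound estimates over the utterance, and applying the weighting factors to the output probabilities of claim 7, is one common reading rather than something the claims spell out.

```latex
% Assumed notation (not from the claims):
%   Y = y_1, ..., y_T   observed sequence of component sounds
%   C                   composite elemental model
%   M_1, M_2            first and second primitive elemental models
%   w_1, w_2            prior weighting factors, w_1 + w_2 = 1
%   b_i(y)              parameter of M_i: probability of producing sound y
%   \gamma_t(\cdot)     probability, given Y, that y_t was produced by the named model(s)
\begin{align*}
P(M_1 \mid C, Y) &= \frac{P(M_1, C \mid Y)}{P(C \mid Y)}
  \approx \frac{\sum_{t=1}^{T} \gamma_t(M_1, C)}{\sum_{t=1}^{T} \gamma_t(C)}, \\
\gamma_t(M_1, C) &= \gamma_t(C)\,
  \frac{w_1\, b_1(y_t)}{w_1\, b_1(y_t) + w_2\, b_2(y_t)}, \\
\hat{w}_1 &= P(M_1 \mid C, Y), \qquad
\hat{w}_2 = P(M_2 \mid C, Y) = 1 - \hat{w}_1 .
\end{align*}
```

Under these assumptions the two posterior weighting factors again sum to one, because the per-sound joint probabilities for the two primitive models add up to the per-sound probability of the composite model.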

8. An apparatus for modeling a word, said apparatus comprising:
- means for storing a finite set of n primitive elemental models, where n is an integer greater than or equal to two, each primitive elemental model representing a speech component, each primitive elemental model having at least first and second states, at least one transition from the first state to the second state, and at least one parameter having a value;
  
  means for combining the first states of at least first and second primitive elemental models of different speech components to form a composite elemental model having at least first and second weighting factors, respectively, each weighting factor having a prior value, said primitive elemental models being combined by a weighted combination of their parameters in proportion to the values of the weighting factors;
  
  means for concatenating a series of elemental models to form a word model, at least one elemental model in the series being the composite elemental model;
  
  means for measuring the value of at least one feature of one or more utterances of the word, each utterance occurring over a series of successive time intervals, said means measuring the feature value of the utterance during each time interval to produce a sequence of observed acoustic vector signals representing the feature values;
  
  means for estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the conditional probability of occurrence of the first primitive elemental model given the occurrence of the composite elemental model and given the occurrence of the observed sequence of acoustic vector signals; and
  
  means for estimating a posterior value for the first weighting factor from the conditional probability.
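
The measuring means of claim 8 can be pictured with a short sketch. The claim does not name the features, so short-time energy and zero-crossing count are used here purely as stand-ins, and `measure_features` and `frame_len` are hypothetical names.

```python
def measure_features(samples, frame_len):
    """Measure features over successive, non-overlapping time intervals.

    Returns one (energy, zero_crossings) vector per complete interval.
    The two features are illustrative stand-ins, not the patent's choice.
    """
    vectors = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        vectors.append((energy, crossings))
    return vectors


# Eight samples, four per interval: an alternating frame, then a constant one.
acoustic_vectors = measure_features([1, -1, 1, -1, 2, 2, 2, 2], frame_len=4)
print(acoustic_vectors)
```

Each tuple plays the role of one observed acoustic vector signal; the sequence of tuples is what the estimating means would consume.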

9. An apparatus for modeling a word, said apparatus comprising:
- means for storing a finite set of n primitive elemental models, where n is an integer greater than or equal to two, each primitive elemental model representing a speech component, each primitive elemental model having at least first and second states, at least one transition from the first state to the second state, and at least one parameter having a value;
  
  means for combining the first states of at least first and second primitive elemental models of different speech components to form a composite elemental model having at least first and second weighting factors, respectively, each weighting factor having a prior value, said primitive elemental models being combined by a weighted combination of their parameters in proportion to the values of the weighting factors;
  
  means for concatenating a series of elemental models to form a word model, at least one elemental model in the series being the composite elemental model;
  
  means for measuring the value of at least one feature of one or more utterances of the word, each utterance occurring over a series of successive time intervals, said means measuring the feature value of the utterance during each time interval to produce a sequence of observed acoustic vector signals representing the feature values;
  
  means for estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the conditional probability of occurrence of the first primitive elemental model given the occurrence of the composite elemental model and given the occurrence of the observed sequence of acoustic vector signals; and
  
  means for estimating a posterior value for the first weighting factor from the conditional probability;
  
  characterized in that the means for estimating the conditional probability of occurrence of the first primitive elemental model given the occurrence of the composite elemental model and given the occurrence of the observed sequence of acoustic vector signals comprises:
  
  means for estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the probability of occurrence of the composite elemental model given the occurrence of the observed sequence of acoustic vector signals;
  
  means for estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the joint probability of occurrence of the first primitive elemental model and the composite elemental model given the occurrence of the observed sequence of acoustic vector signals; and
  
  means for estimating the conditional probability as the ratio of the joint probability to the probability of occurrence of the composite elemental model given the observed sequence of acoustic vector signals.
- Dependent claims (10, 11, 12)
- - 10. An apparatus as claimed in claim 9, characterized in that:
    - the means for estimating the probability of occurrence of the composite elemental model given the occurrence of the observed sequence of acoustic vector signals comprises means for estimating, for each acoustic vector signal in the observed sequence of acoustic vector signals, the probability that the acoustic vector signal was produced by the composite elemental model given the occurrence of the observed sequence of acoustic vector signals; and
      
      the means for estimating the joint probability of occurrence of the first primitive elemental model and the composite elemental model given the occurrence of the observed sequence of acoustic vector signals comprises means for estimating, for each acoustic vector signal in the observed sequence of acoustic vector signals, the probability that the acoustic vector signal was produced by the first primitive elemental model and the composite elemental model given the occurrence of the observed sequence of acoustic vector signals.
  - 11. An apparatus as claimed in claim 10, further comprising:
    - means for estimating, from the prior values of the first and second weighting factors and from the values of the parameters of the first and second primitive elemental models, the conditional probability of occurrence of the second primitive elemental model given the occurrence of the composite elemental model and given the occurrence of the observed sequence of acoustic vector signals; and
      
      means for estimating a posterior value for the second weighting factor from the second conditional probability.
  - 12. An apparatus as claimed in claim 11, characterized in that the means for combining the first states of the first and second primitive elemental models comprises means for combining the first states of the first and second primitive elemental models by a linear weighted combination.

13. A method of modeling a word, said method comprising the steps of:
- defining a finite set of n speech components, where n is an integer greater than or equal to two;
  
  providing a primitive elemental model for each speech component, each primitive elemental model having at least first and second states, at least one transition from the first state to the second state, and at least one parameter having a value;
  
  combining the first states of all n primitive elemental models to form a set of composite elemental models, each composite elemental model having n weighting factors Wⁿ for the n primitive elemental models, respectively, each weighting factor having a prior value, for each composite elemental model said primitive elemental models being combined by a weighted combination of their parameters in proportion to the values of the weighting factors;
  
  concatenating a series of composite elemental models to form a word model;
  
  uttering the word one or more times, each utterance of the word producing an observed sequence of component sounds;
  
  estimating, from the prior values of the weighting factors and from the values of the parameters of the primitive elemental models, the conditional probability of occurrence of each primitive elemental model given the occurrence of each composite elemental model and given the occurrence of the observed sequence of component sounds; and
  
  estimating a posterior value for each weighting factor from the conditional probabilities.
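
Claim 13 generalizes the update to n primitive models and a set of composite models. The sketch below assumes that the probability that each observed sound was produced by each composite model (`occupancy`) has already been estimated by some alignment procedure not shown here, and that the weighting factors act on per-label output probabilities. All names are hypothetical.

```python
def update_weights(weights, emissions, observed, occupancy):
    """Re-estimate every weighting factor of every composite model.

    weights[c][i]   -- prior weight of primitive model i in composite model c
    emissions[i][y] -- probability that primitive model i produces label y
    observed[t]     -- t-th observed component-sound label
    occupancy[t][c] -- assumed given: probability that observed[t] was
                       produced by composite model c
    """
    n_comp, n_prim = len(weights), len(emissions)
    joint = [[0.0] * n_prim for _ in range(n_comp)]
    total = [0.0] * n_comp
    for t, label in enumerate(observed):
        for c in range(n_comp):
            terms = [weights[c][i] * emissions[i][label] for i in range(n_prim)]
            norm = sum(terms)
            if norm == 0.0:
                continue  # composite c cannot produce this label at all
            for i in range(n_prim):
                # joint probability of primitive i and composite c for sound t
                joint[c][i] += occupancy[t][c] * terms[i] / norm
            total[c] += occupancy[t][c]
    # Ratio of joint to composite probability; keep the prior if c was never used.
    return [
        [joint[c][i] / total[c] if total[c] > 0 else weights[c][i]
         for i in range(n_prim)]
        for c in range(n_comp)
    ]


emissions = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}, {"a": 0.5, "b": 0.5}]
prior = [[1 / 3] * 3, [1 / 3] * 3]
posterior = update_weights(prior, emissions, ["a", "b"], [[1.0, 0.0], [0.0, 1.0]])
print(posterior)
```

In this toy run the first composite model is aligned to the sound "a" and the second to "b", so each shifts weight toward the primitive model that best explains its sound while every row of weights still sums to one.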

14. A method of modeling a word, said method comprising the steps of:
- defining a finite set of n speech components, where n is an integer greater than or equal to two;
  
  providing a primitive elemental model for each speech component, each primitive elemental model having at least first and second states, at least one transition from the first state to the second state, and at least one parameter having a value;
  
  combining the first states of all n primitive elemental models to form a set of composite elemental models, each composite elemental model having n weighting factors Wⁿ for the n primitive elemental models, respectively, each weighting factor having a prior value, for each composite elemental model said primitive elemental models being combined by a weighted combination of their parameters in proportion to the values of the weighting factors;
  
  concatenating a series of composite elemental models to form a word model;
  
  uttering the word one or more times, each utterance of the word producing an observed sequence of component sounds;
  
  estimating, from the prior values of the weighting factors and from the values of the parameters of the primitive elemental models, the conditional probability of occurrence of each primitive elemental model given the occurrence of each composite elemental model and given the occurrence of the observed sequence of component sounds; and
  
  estimating a posterior value for each weighting factor from the conditional probabilities;
  
  characterized in that the step of estimating the conditional probability of occurrence of a primitive elemental model given the occurrence of a composite elemental model and given the occurrence of the observed sequence of component sounds comprises the steps of:
  
  estimating, from the prior values of the weighting factors and from the values of the parameters of the primitive elemental models, the probability of occurrence of the composite elemental model given the occurrence of the observed sequence of component sounds;
  
  estimating, from the prior values of the weighting factors and from the values of the parameters of the primitive elemental models, the joint probability of occurrence of the primitive elemental model and the composite elemental model given the occurrence of the observed sequence of component sounds; and
  
  estimating the conditional probability as the ratio of the joint probability to the probability of occurrence of the composite elemental model given the observed sequence of component sounds.
- Dependent claims (15, 16, 17, 18)
- - 15. A method as claimed in claim 14, characterized in that:
    - the step of estimating the probability of occurrence of a composite elemental model given the occurrence of the observed sequence of component sounds comprises the step of estimating, for each component sound in the observed sequence of component sounds, the probability that the component sound was produced by the composite elemental model given the occurrence of the observed sequence of component sounds; and
      
      the step of estimating the joint probability of occurrence of a primitive elemental model and a composite elemental model given the occurrence of the observed sequence of component sounds comprises the step of estimating, for each component sound in the observed sequence of component sounds, the probability that the component sound was produced by the first primitive elemental model and the composite elemental model given the occurrence of the observed sequence of component sounds.
  - 16. A method as claimed in claim 15, characterized in that the step of combining the first states of the primitive elemental models comprises the step of combining the first states of the primitive elemental models by a linear weighted combination.
  - 17. A method as claimed in claim 16, characterized in that the step of uttering the word one or more times comprises the step of uttering the word a plurality of times by a plurality of different speakers.
  - 18. A method as claimed in claim 17, characterized in that the value of the parameter of each primitive elemental model represents a probability of producing a component sound.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bellegarda, Jerome R.; Nahamoo, David; Picheny, Michael A.; Gopalakrishnan, Ponani S.; Bahl, Lalit R.; De Souza, Peter V.
Primary Examiner(s): Shaw, Dale M.
Assistant Examiner(s): Knepper, David D.
Application Number: US07/514,075
Time in Patent Office: 804 Days
Field of Search: 364/513.5, 381/41-45, 395/2
US Class Current: 704/251
CPC Class Codes: G10L 15/144 Training of HMMs
