Phonetic Hidden Markov model speech synthesizer

  • US 5,230,037 A
  • Filed: 06/07/1991
  • Issued: 07/20/1993
  • Est. Priority Date: 10/16/1990
  • Status: Expired due to Fees
First Claim

1. A method for generating synthesized speech wherein an acoustic ergodic hidden Markov model (AEHMM) reflecting constraints on the acoustic arrangement of speech is correlated to a phonetic ergodic hidden Markov model (PhEHMM), the method comprising the steps of:

    a) building an AEHMM in which an observations sequence comprises speech features vectors extracted from frames in which the speech uttered during the training of said AEHMM is divided, and in which a hidden sequence comprises a sequence of sources that most probably emitted the speech utterance frames;

    b) initializing said AEHMM by a vector quantization clustering scheme having the same size as said AEHMM;

    c) training said AEHMM by the Forward-Backward algorithm and Baum-Welch re-estimation formulas;

    d) associating with each frame a label representing a most probable source;

    e) building a PhEHMM of the same size as said AEHMM in which an observations sequence comprises phoneme sequence obtained from a written text, and in which a hidden sequence comprises a sequence of labels;

    f) initializing a PhEHMM transition probability matrix by assigning to state transition probabilities the same values as the transition probabilities of the corresponding states of said AEHMM;

    g) initializing PhEHMM observation probability functions by:

        (g.1) using a speech corpus aligned with a sequence of phonemes,

        (g.2) generating for said speech corpus a sequence of most probable labels, using said AEHMM, and

        (g.3) computing the observations probability function for each phoneme, counting the number of occurrences of the phoneme in a state divided by the total number of phonemes emitted by said state;

    h) training said PhEHMM by the Baum-Welch algorithm on a proper synthetic observations corpus;

    h.1) providing an input text of one or more words to be synthesized;

    i) determining for each word to be synthesized a phoneme sequence and through said PhEHMM a sequence of labels corresponding to the word to be synthesized by means of a proper optimality criterion;

    j) determining from the input text a set of additional parameters, as energy, prosody contours and voicing, by a prosodic processor;

    k) determining, for the sequence of labels corresponding to the word to be synthesized, a set of speech features vectors corresponding to the word to be synthesized through said AEHMM;

    l) transforming said speech features vectors corresponding to the word to be synthesized into a set of filter coefficients representing spectral information; and

    m) using said set of filter coefficients and said additional parameters in a synthesis filter to produce a synthetic speech output.
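
As an illustration only, steps (f), (g.3) and (i) of the claim can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the toy two-state model, the uniform fallback for states that emit no phonemes, and the choice of Viterbi decoding as the "proper optimality criterion" are all hypothetical and do not come from the claim text.

```python
from collections import Counter
import math

def init_observation_probs(labels, phonemes, n_states, phoneme_set):
    """Step (g.3): P(phoneme | state) = occurrences of the phoneme in the
    state divided by the total number of phonemes emitted by that state."""
    counts = [Counter() for _ in range(n_states)]
    for state, ph in zip(labels, phonemes):
        counts[state][ph] += 1
    probs = []
    for c in counts:
        total = sum(c.values())
        # Assumption: a state that never appears gets a uniform distribution.
        probs.append({ph: (c[ph] / total if total else 1.0 / len(phoneme_set))
                      for ph in phoneme_set})
    return probs

def viterbi_labels(phoneme_seq, trans, obs, initial, floor=1e-12):
    """Step (i), assuming Viterbi decoding as the optimality criterion:
    return the most probable label (state) sequence for a phoneme sequence."""
    n = len(trans)

    def lg(p):
        # Floor probabilities so that log(0) never occurs.
        return math.log(max(p, floor))

    score = [lg(initial[s]) + lg(obs[s][phoneme_seq[0]]) for s in range(n)]
    back = []
    for ph in phoneme_seq[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + lg(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + lg(trans[best][s]) + lg(obs[s][ph]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Toy data (hypothetical): AEHMM labels aligned with phonemes, steps (g.1)-(g.2).
aehmm_labels = [0, 0, 1, 1, 1]
aligned_phonemes = ["a", "a", "a", "t", "t"]
aehmm_trans = [[0.9, 0.1], [0.2, 0.8]]

# Step (f): the PhEHMM transition matrix starts as a copy of the AEHMM's.
phehmm_trans = [row[:] for row in aehmm_trans]
# Step (g.3): observation probabilities from relative counts.
phehmm_obs = init_observation_probs(aehmm_labels, aligned_phonemes, 2, ["a", "t"])
# Step (i): decode a label sequence for a new phoneme sequence.
labels = viterbi_labels(["a", "a", "t"], phehmm_trans, phehmm_obs, [0.5, 0.5])
print(phehmm_obs[1], labels)
```

In this toy case state 1 emits "a" once and "t" twice, so its initial observation probabilities are 1/3 and 2/3. The sketch omits the Baum-Welch training of step (h), which would refine these initial values before decoding.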
