Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system

  • US 10,255,903 B2
  • Filed: 10/06/2015
  • Issued: 04/09/2019
  • Est. Priority Date: 05/28/2014
  • Status: Active Grant
First Claim

1. A method performed by a processing circuit for creating parametric models for use in training a speech synthesis system, wherein the system comprises at least a training text corpus, a speech database, and a model training module, the method comprising:

    a. obtaining, by the model training module, speech data from the speech database wherein the speech data comprises recorded speech signals and corresponding portions of the training text corpus;

    b. converting, by the model training module, the training text corpus into context dependent phone labels;

    c. extracting, by the model training module, for each frame of speech in the speech signal from the speech data, at least one of: spectral features, a plurality of band excitation energy coefficients, and fundamental frequency values using the context dependent phone labels;

    d. forming, by the model training module, a feature vector stream for each frame of speech in the speech signal from the speech data using the at least one of: the spectral features, the plurality of band excitation energy coefficients, and the fundamental frequency values;

    e. labeling, by the model training module, each frame of speech in the speech signal with the context dependent phone labels;

    f. extracting, by the model training module, durations of each of the context dependent phone labels from the labeled speech;

    g. forming, by the model training module, context dependent Hidden Markov Models (HMMs) using the feature vector streams and the context dependent phone labels from the labeled speech;

    h. performing, by a parameter generation module, parameter estimation of the speech signal, wherein the parameter estimation is performed comprising the feature vector streams, the HMMs, and decision trees;

    i. identifying a plurality of sub-band Eigen glottal pulses from the speech signal, wherein the sub-band Eigen glottal pulses comprise separate models used to form excitation during synthesis; and

    j. applying the identified plurality of sub-band Eigen glottal pulses from the speech signal to form an excitation signal, wherein the excitation signal is applied in the speech synthesis system to synthesize speech.
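
Steps i and j can be illustrated with a minimal sketch. The claim does not say how the sub-band Eigen glottal pulses are computed or combined, so everything below is an assumption made for illustration: pulses are split into equal-width frequency bands with an FFT mask, the "Eigen" pulse of each band is taken as the leading principal direction (via SVD) of the band-limited pulses, and the excitation is built by overlap-adding gain-weighted band pulses at pitch-period spacing. The function names (`split_bands`, `leading_eigen_pulse`, `sub_band_eigen_pulses`, `build_excitation`) and all parameters are hypothetical, and unvoiced frames are simply left silent.

```python
import numpy as np


def split_bands(pulse: np.ndarray, n_bands: int) -> np.ndarray:
    """Split one pulse into n_bands band-limited copies (assumed FFT masking)."""
    spec = np.fft.rfft(pulse)
    edges = np.linspace(0, len(spec), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]  # keep only this band's frequency bins
        bands.append(np.fft.irfft(masked, n=len(pulse)))
    return np.stack(bands)  # shape: (n_bands, pulse_len)


def leading_eigen_pulse(pulses: np.ndarray) -> np.ndarray:
    """Leading principal direction of a (n_pulses, pulse_len) matrix (assumed PCA)."""
    centered = pulses - pulses.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]  # unit-norm vector of length pulse_len


def sub_band_eigen_pulses(pulses: np.ndarray, n_bands: int) -> np.ndarray:
    """One eigen pulse per sub-band, shape (n_bands, pulse_len)."""
    banded = np.stack([split_bands(p, n_bands) for p in pulses])
    return np.stack([leading_eigen_pulse(banded[:, b]) for b in range(n_bands)])


def build_excitation(band_pulses: np.ndarray, f0: np.ndarray,
                     band_gains: np.ndarray, sample_rate: int,
                     frame_len: int) -> np.ndarray:
    """Overlap-add gain-weighted band pulses at pitch-period spacing.

    f0 holds one value per frame (<= 0 marks an unvoiced frame);
    band_gains has shape (n_frames, n_bands).
    """
    n_samples = len(f0) * frame_len
    pulse_len = band_pulses.shape[1]
    out = np.zeros(n_samples + pulse_len)
    pos = 0
    while pos < n_samples:
        frame = pos // frame_len
        if f0[frame] <= 0:
            pos = (frame + 1) * frame_len  # unvoiced: skip, leaving silence
            continue
        out[pos:pos + pulse_len] += band_gains[frame] @ band_pulses
        pos += max(1, int(round(sample_rate / f0[frame])))
    return out[:n_samples]
```

In this sketch the per-frame band gains stand in for the band excitation energy coefficients named in step c, and `f0` for the fundamental frequency values; how a real system maps those parameters onto pulse weights is not stated in the claim.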
