SPEECH PROCESSING SYSTEM
First Claim
1. A method of deriving speech synthesis parameters from an audio signal, the method comprising:
- receiving an input speech signal;
- estimating the position of glottal closure incidents from said audio signal;
- deriving a pulsed excitation signal from the position of the glottal closure incidents;
- segmenting said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal;
- processing the segments of the audio signal to obtain the complex cepstrum and deriving a synthesis filter from said complex cepstrum;
- reconstructing said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter;
- comparing said reconstructed speech signal with said input speech signal; and
- calculating the difference between the reconstructed speech signal and the input speech signal and modifying either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstructed speech signal and the input speech.
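The excitation and segmentation steps can be illustrated with a minimal Python sketch. It assumes the glottal closure positions have already been estimated and are supplied as sample indices; the estimation itself is not shown. The function names, the unit pulse amplitude, and the choice of segments running from the previous to the next closure position are illustrative assumptions, not details taken from the claim.

```python
from typing import List


def pulse_excitation(gci: List[int], length: int) -> List[float]:
    """Build a pulsed excitation: a unit pulse at each closure position (assumed amplitude 1.0)."""
    excitation = [0.0] * length
    for position in gci:
        if 0 <= position < length:
            excitation[position] = 1.0
    return excitation


def segment_at_gci(signal: List[float], gci: List[int]) -> List[List[float]]:
    """Cut one segment per interior closure position.

    Assumption for illustration: each segment spans from the previous
    closure position up to (not including) the next one.
    """
    segments = []
    for i in range(1, len(gci) - 1):
        segments.append(signal[gci[i - 1]:gci[i + 1]])
    return segments


if __name__ == "__main__":
    signal = [float(n) for n in range(20)]   # placeholder samples
    gci = [2, 7, 13, 18]                     # hypothetical closure positions
    print(pulse_excitation(gci, len(signal)))
    print([len(s) for s in segment_at_gci(signal, gci)])
```

In practice each segment would typically be windowed before further analysis; that step is omitted here to keep the sketch short.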
Abstract
A method of deriving speech synthesis parameters from an audio signal, the method comprising:
- receiving an input speech signal;
- estimating the position of glottal closure incidents from said audio signal;
- deriving a pulsed excitation signal from the position of the glottal closure incidents;
- segmenting said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal;
- processing the segments of the audio signal to obtain the complex cepstrum and deriving a synthesis filter from said complex cepstrum;
- reconstructing said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter;
- comparing said reconstructed speech signal with said input speech signal; and
- calculating the difference between the reconstructed speech signal and the input speech signal and modifying either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstructed speech signal and the input speech.
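As an illustration of the complex cepstrum step, the following Python sketch computes the complex cepstrum of one segment as the inverse transform of the log magnitude plus the unwrapped phase of its spectrum, and then recovers a synthesis filter impulse response by reversing those operations. The function names are hypothetical, the plain O(N²) transform is chosen only to avoid external libraries, and the sketch assumes the segment has no spectral zeros on the unit circle and no linear phase (delay) component that would need removing first.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct discrete Fourier transform (slow, for illustration only)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def unwrap(phases: Sequence[float]) -> List[float]:
    """Remove 2*pi jumps so the phase is continuous across frequency bins."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2.0 * math.pi * round(step / (2.0 * math.pi))
        out.append(out[-1] + step)
    return out


def complex_cepstrum(segment: Sequence[float]) -> List[float]:
    """Inverse transform of log|X| + j * unwrapped phase of X."""
    spectrum = dft(segment)
    log_mag = [math.log(abs(v)) for v in spectrum]
    phase = unwrap([cmath.phase(v) for v in spectrum])
    log_spectrum = [complex(m, p) for m, p in zip(log_mag, phase)]
    return [v.real for v in dft(log_spectrum, inverse=True)]


def synthesis_impulse_response(cepstrum: Sequence[float]) -> List[float]:
    """Impulse response of the filter whose log spectrum is the transform of the cepstrum."""
    spectrum = [cmath.exp(v) for v in dft(cepstrum)]
    return [v.real for v in dft(spectrum, inverse=True)]


if __name__ == "__main__":
    segment = [1.0, 0.5] + [0.0] * 30        # simple minimum-phase test segment
    c = complex_cepstrum(segment)
    h = synthesis_impulse_response(c)
    print([round(v, 4) for v in c[:4]])
    print([round(v, 4) for v in h[:4]])
```

Because the complex cepstrum keeps the phase as well as the magnitude, the recovered impulse response matches the analysed segment, which is what allows a mixed-phase synthesis filter to be derived from it.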
18 Claims
1. A method of deriving speech synthesis parameters from an audio signal, the method comprising:
- receiving an input speech signal;
- estimating the position of glottal closure incidents from said audio signal;
- deriving a pulsed excitation signal from the position of the glottal closure incidents;
- segmenting said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal;
- processing the segments of the audio signal to obtain the complex cepstrum and deriving a synthesis filter from said complex cepstrum;
- reconstructing said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter;
- comparing said reconstructed speech signal with said input speech signal; and
- calculating the difference between the reconstructed speech signal and the input speech signal and modifying either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstructed speech signal and the input speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 16, 17
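The reconstruction, comparison and modification steps of claim 1 form an analysis-by-synthesis loop. The Python sketch below shows one possible form of such a loop under stated assumptions: the synthesis filter is represented by a fixed impulse response `h`, the difference is measured as a sum of squared errors, and only the pulse positions of the excitation are modified, by a simple local search. The claim does not specify the error measure or the update strategy, so these choices and all function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def synthesize(pulses: Sequence[int], h: Sequence[float], length: int) -> List[float]:
    """Pass a unit-pulse excitation through the filter by adding a copy of h at each pulse."""
    out = [0.0] * length
    for pos in pulses:
        for k, hk in enumerate(h):
            if 0 <= pos + k < length:
                out[pos + k] += hk
    return out


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Difference between reconstructed and input signals (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_pulse_positions(target: Sequence[float], pulses: Sequence[int],
                           h: Sequence[float], max_shift: int = 2) -> Tuple[List[int], float]:
    """Shift each pulse by up to max_shift samples, keeping the shift that lowers the error."""
    pulses = list(pulses)
    best = squared_error(target, synthesize(pulses, h, len(target)))
    for i in range(len(pulses)):
        original = pulses[i]
        best_pos = original
        for shift in range(-max_shift, max_shift + 1):
            candidate = original + shift
            if candidate < 0 or candidate >= len(target):
                continue
            pulses[i] = candidate
            err = squared_error(target, synthesize(pulses, h, len(target)))
            if err < best:
                best, best_pos = err, candidate
        pulses[i] = best_pos
    return pulses, best


if __name__ == "__main__":
    h = [1.0, 0.6, 0.3]                           # hypothetical impulse response
    target = synthesize([5, 20, 36], h, 50)       # stands in for the input speech
    refined, err = refine_pulse_positions(target, [4, 22, 35], h)
    print(refined, err)
```

The same loop structure would apply if the cepstrum coefficients were adjusted instead of the excitation; only the set of candidate modifications would change.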
11. A text to speech method, the method comprising:
- receiving input text;
- extracting labels from said input text;
- using said labels to extract speech parameters which have been stored in a memory; and
- generating a speech signal from said extracted speech parameters, wherein said speech signal is generated using a source filter model which produces speech using an excitation signal and a synthesis filter, said speech parameters comprising complex cepstrum parameters.
Dependent claims: 12, 18
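The text to speech method of claim 11 can be pictured as a lookup followed by source-filter synthesis. The Python sketch below is a toy illustration under several assumptions that are not in the claim: labels are simply lower-cased words, the memory is a dictionary, each stored entry holds only causal cepstrum coefficients (so the filter is minimum phase and its impulse response follows from the standard cepstral recursion), the excitation is a unit pulse train at a stored pitch period, and the filter state is reset for every label. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeechParams:
    cepstrum: List[float]   # causal cepstrum coefficients c[0], c[1], ... (assumed layout)
    pitch_period: int       # samples between excitation pulses
    n_samples: int          # number of samples to generate for this label


def extract_labels(text: str) -> List[str]:
    """Placeholder label extraction: one label per lower-cased word."""
    return text.lower().split()


def impulse_response(c: List[float], length: int) -> List[float]:
    """Minimum-phase recursion: h[n] = sum_{k=1..n} (k/n) * c[k] * h[n-k], h[0] = exp(c[0])."""
    h = [math.exp(c[0])] + [0.0] * (length - 1)
    for n in range(1, length):
        h[n] = sum((k / n) * c[k] * h[n - k]
                   for k in range(1, min(n, len(c) - 1) + 1))
    return h


def synthesize(text: str, memory: Dict[str, SpeechParams]) -> List[float]:
    """Look up stored parameters per label and filter a pulse train with the derived filter."""
    speech: List[float] = []
    for label in extract_labels(text):
        p = memory[label]
        h = impulse_response(p.cepstrum, p.pitch_period)
        excitation = [1.0 if n % p.pitch_period == 0 else 0.0
                      for n in range(p.n_samples)]
        for n in range(p.n_samples):
            speech.append(sum(h[k] * excitation[n - k]
                              for k in range(len(h)) if n - k >= 0))
    return speech


if __name__ == "__main__":
    memory = {"a": SpeechParams([0.0, 0.5, -0.125, 0.5 ** 3 / 3], 4, 8)}
    print([round(v, 4) for v in synthesize("A", memory)])
```

A stored complex cepstrum would in general also carry anticausal coefficients describing the maximum-phase part of the filter; handling those is left out of this sketch.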
14. A system for extracting speech synthesis parameters from an audio signal, the system comprising a processor adapted to:
- receive an input speech signal;
- estimate the position of glottal closure incidents from said audio signal;
- derive a pulsed excitation signal from the position of the glottal closure incidents;
- segment said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal;
- process the segments of the audio signal to obtain the complex cepstrum and derive a synthesis filter from said complex cepstrum;
- reconstruct said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter;
- compare said reconstructed speech signal with said input speech signal; and
- calculate the difference between the reconstructed speech signal and the input speech signal and modify either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstructed speech signal and the input speech.
15. A text to speech system, the system comprising a memory and a processor adapted to:
- receive input text;
- extract labels from said input text;
- use said labels to extract speech parameters which have been stored in the memory; and
- generate a speech signal from said extracted speech parameters, wherein said speech signal is generated using a source filter model which produces speech using an excitation signal and a synthesis filter, said speech parameters comprising complex cepstrum parameters.
Specification