Speech synthesis using complex spectral modeling
Abstract
A method for processing a speech signal includes dividing the speech signal into a succession of frames, identifying one or more of the frames as click frames, and extracting phase information from the click frames. The speech signal is encoded using the phase information. Methods are also provided for modeling phase spectra of voiced frames and click frames.
24 Claims
1. A method for processing a speech signal, comprising:
dividing the speech signal into a succession of frames;
identifying one or more of the frames as click frames;
extracting phase information from the click frames; and
encoding the speech signal using the phase information.
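The steps of claim 1 can be illustrated with a minimal sketch. The claim does not say how click frames are identified, so the peak-to-RMS (crest factor) test, its threshold, the frame length, and all function names below are assumptions made purely for illustration; the phase is taken from a direct DFT of each click frame.

```python
import cmath
import math

def split_into_frames(signal, frame_len, hop):
    """Divide a sample sequence into a succession of frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def is_click_frame(frame, crest_threshold=4.0):
    """Hypothetical click test (an assumption, not the claimed criterion):
    a frame counts as a click when its peak-to-RMS ratio is high."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    if rms == 0.0:
        return False
    return max(abs(x) for x in frame) / rms >= crest_threshold

def phase_spectrum(frame):
    """Phase in radians of DFT bins 0..N/2, computed with a direct DFT."""
    n = len(frame)
    phases = []
    for k in range(n // 2 + 1):
        bin_value = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
        phases.append(cmath.phase(bin_value))
    return phases

def encode(signal, frame_len=32, hop=32):
    """Attach phase information only to frames flagged as clicks."""
    encoded = []
    for index, frame in enumerate(split_into_frames(signal, frame_len, hop)):
        record = {"index": index, "click": is_click_frame(frame)}
        if record["click"]:
            record["phase"] = phase_spectrum(frame)
        encoded.append(record)
    return encoded
```

Calling `encode(samples)` on a list of floats yields one record per frame; only records with `"click": True` carry a `"phase"` list.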
6. A method for processing a speech signal, comprising:
dividing the speech signal into a succession of frames;
identifying some of the frames as unvoiced frames;
processing the unvoiced frames to identify one or more click frames among the unvoiced frames; and
encoding the speech signal by applying a first modeling method to the click frames and a second modeling method, different from the first modeling method, to the unvoiced frames that are not click frames.
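The two-model dispatch of claim 6 might look like the sketch below. It assumes the unvoiced frames have already been identified upstream. The click test (peak-to-RMS ratio) and the two placeholder models (amplitude plus phase for clicks, amplitude only for other unvoiced frames) are assumptions chosen for illustration; the claim only requires that the two modeling methods differ.

```python
import cmath
import math

def dft_half(frame):
    """Complex DFT bins 0..N/2 of a real frame (direct evaluation)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]

def looks_like_click(frame, crest_threshold=4.0):
    """Hypothetical click detector based on the peak-to-RMS ratio."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms > 0.0 and max(abs(x) for x in frame) / rms >= crest_threshold

def model_click_frame(frame):
    """First (assumed) model: keep both amplitude and phase spectra."""
    spectrum = dft_half(frame)
    return {"kind": "click",
            "amplitude": [abs(c) for c in spectrum],
            "phase": [cmath.phase(c) for c in spectrum]}

def model_noise_frame(frame):
    """Second (assumed) model: keep the amplitude spectrum only."""
    return {"kind": "noise",
            "amplitude": [abs(c) for c in dft_half(frame)]}

def encode_unvoiced(unvoiced_frames):
    """Apply a different model to click and non-click unvoiced frames."""
    return [model_click_frame(f) if looks_like_click(f) else model_noise_frame(f)
            for f in unvoiced_frames]
```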
9. A method for processing a speech signal, comprising:
dividing the speech signal into a succession of frames;
identifying some of the frames as voiced frames;
modeling a phase spectrum of each of at least some of the voiced frames as a linear combination of basis functions covering different, respective frequency channels, wherein the model parameters correspond to respective coefficients of the basis functions; and
encoding the speech signal using the modeled phase spectrum.
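As an illustration of the phase model in claim 9, the sketch below evaluates a phase spectrum as a weighted sum of basis functions, one per frequency channel. The triangular shape of the basis functions, the channel centers, and the function names are assumptions; the claim does not specify them. Fitting the coefficients to a measured phase spectrum is left out.

```python
def triangular_basis(index, centers, freq):
    """Value at `freq` of a triangle that equals 1 at centers[index]
    and falls linearly to 0 at the neighbouring centers (assumed shape)."""
    center = centers[index]
    if freq == center:
        return 1.0
    if freq < center:
        if index == 0:
            return 0.0
        left = centers[index - 1]
        return max(0.0, (freq - left) / (center - left))
    if index == len(centers) - 1:
        return 0.0
    right = centers[index + 1]
    return max(0.0, (right - freq) / (right - center))

def modeled_phase(coefficients, centers, freq):
    """Phase at `freq` as a linear combination of the channel basis functions."""
    return sum(coef * triangular_basis(k, centers, freq)
               for k, coef in enumerate(coefficients))

def harmonic_phases(coefficients, centers, pitch_hz, num_harmonics):
    """Sample the modeled phase at the harmonics of the pitch frequency."""
    return [modeled_phase(coefficients, centers, h * pitch_hz)
            for h in range(1, num_harmonics + 1)]
```

With this choice, the encoded model of a frame is just the coefficient list, whose length is set by the number of channels rather than by the number of pitch harmonics.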
17. A method for processing a speech signal, comprising:
dividing the speech signal into a succession of frames;
identifying some of the frames as voiced frames;
computing a time-domain model of a phase spectrum of each of at least some of the voiced frames; and
encoding the speech signal using the modeled phase spectrum.
19. A method for synthesizing speech, comprising:
receiving spectral model parameters with respect to a voiced frame of the speech to be synthesized, the parameters comprising high-frequency parameters and low-frequency parameters;
determining a pitch frequency of the voiced frame;
applying the low-frequency parameters to one or more low harmonics of the pitch frequency in order to generate a low-frequency speech component;
applying the high-frequency parameters to one or more high harmonics of the pitch frequency while applying a frequency jitter to the high harmonics in order to generate a high-frequency speech component; and
combining the low- and high-frequency components of the voiced frame into a sequence of frames of the speech in order to generate an output speech signal.
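The synthesis steps of claim 19 can be sketched as a sum of sinusoids. The representation of the parameters as (amplitude, phase) pairs per harmonic, the uniform random jitter, its size, the sample rate, and the frame length are all assumptions made for illustration; the claim does not fix any of them.

```python
import math
import random

def synthesize_voiced_frame(pitch_hz, low_params, high_params,
                            sample_rate=16000, frame_len=160,
                            jitter_fraction=0.01, rng=None):
    """Sum-of-sinusoids sketch. low_params and high_params are lists of
    (amplitude, phase) pairs for consecutive harmonics; the high harmonics
    start right after the last low harmonic."""
    if rng is None:
        rng = random.Random(0)
    low = [0.0] * frame_len
    high = [0.0] * frame_len
    # Low harmonics sit exactly at integer multiples of the pitch.
    for h, (amp, phase) in enumerate(low_params, start=1):
        freq = h * pitch_hz
        for n in range(frame_len):
            low[n] += amp * math.cos(2 * math.pi * freq * n / sample_rate + phase)
    # High harmonics get a small random frequency offset (assumed jitter model).
    first_high = len(low_params) + 1
    for h, (amp, phase) in enumerate(high_params, start=first_high):
        jitter = rng.uniform(-jitter_fraction, jitter_fraction)
        freq = h * pitch_hz * (1.0 + jitter)
        for n in range(frame_len):
            high[n] += amp * math.cos(2 * math.pi * freq * n / sample_rate + phase)
    # Combine the low- and high-frequency components of the frame.
    return [lo + hi for lo, hi in zip(low, high)]
```

Joining successive frames into a continuous output signal (for example by overlap-add) is not shown.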
20. Apparatus for processing a speech signal, comprising a speech processor, which is arranged to divide the speech signal into a succession of frames, to identify one or more of the frames as click frames, to extract phase information from the click frames, and to encode the speech signal using the phase information.
21. Apparatus for processing a speech signal, comprising a speech processor, which is arranged to divide the speech signal into a succession of frames, to identify some of the frames as voiced frames, to model a phase spectrum of each of at least some of the voiced frames as a linear combination of basis functions covering different, respective frequency channels, wherein the model parameters correspond to respective coefficients of the basis functions, and to encode the speech signal using the modeled phase spectrum.
22. Apparatus for processing a speech signal, comprising a speech processor, which is arranged to divide the speech signal into a succession of frames, to identify some of the frames as voiced frames, to compute a time-domain model of a phase spectrum of each of at least some of the voiced frames, and to encode the speech signal using the modeled phase spectrum.
23. Apparatus for synthesizing a speech signal, comprising:
a memory, which is arranged to store a database of speech segments, each segment comprising a succession of frames, such that at least some of the frames are identified as voiced frames, and the database comprises an encoded model of a phase spectrum of each of at least some of the voiced frames; and
a speech synthesizer, which is arranged to synthesize a speech output comprising one or more of the voiced frames using the encoded model of the phase spectrum in the database.
24. A computer software product for processing a speech signal, the product comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to divide the speech signal into a succession of frames, to identify one or more of the frames as click frames, to extract phase information from the click frames, and to encode the speech signal using the phase information.
Specification