Apparatus and method for creating pitch wave signals, apparatus and method for compressing, expanding, and synthesizing speech signals using these pitch wave signals and text-to-speech conversion using unit pitch wave signals
3 Assignments
0 Petitions
Abstract
A pitch wave signal creation method is provided as a preliminary process for efficiently coding a speech wave signal having a fluctuating pitch period. A speech signal compressing/expanding apparatus and a speech signal synthesizing apparatus using the method, and signal processing associated therewith, are further provided. The pitch wave signal creation method of the invention essentially comprises a process of detecting the instantaneous pitch period of each pitch wave element of the speech wave signal, and a process of converting each pitch wave element into a normalized pitch wave element having a predetermined fixed time length by expanding or compressing the pitch wave element on the time axis, based on its detected instantaneous pitch period, while retaining its wave pattern. A speech signal having a pitch fluctuation can be compressed with high quality and high efficiency by coding or synthesizing the speech wave signal using the pitch wave signal creation method of the invention. Text-to-speech conversion using pitch wave signals is also provided.
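The normalization described in the abstract can be illustrated with a minimal sketch. It assumes the pitch-period boundaries have already been detected and uses plain linear interpolation for the time-axis expansion and compression; the function names `resample_linear` and `normalize_pitch_periods`, the boundary list, and the choice of interpolation are illustrative assumptions, not details taken from the patent.

```python
def resample_linear(segment, n_out):
    """Resample one pitch wave element to n_out samples by linear interpolation.

    Linear interpolation is an assumption made for this sketch; the source
    only says the element is expanded or compressed on the time axis.
    """
    if n_out < 2 or len(segment) < 2:
        raise ValueError("need at least two samples on each side")
    # Map output index i onto a fractional position in the input segment.
    scale = (len(segment) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1.0 - frac) + segment[hi] * frac)
    return out


def normalize_pitch_periods(signal, boundaries, n_out):
    """Cut signal at the given pitch-period boundaries and resample every
    period to n_out samples.

    Returns (elements, lengths). The original lengths are returned as well,
    since they carry the pitch information that normalization removes.
    """
    elements, lengths = [], []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        elements.append(resample_linear(signal[start:end], n_out))
        lengths.append(end - start)
    return elements, lengths


# Hypothetical input: two pitch periods of 4 and 6 samples.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 0.8, 0.9, 0.0, -0.9, -0.8]
elements, lengths = normalize_pitch_periods(signal, [0, 4, 10], n_out=5)
```

After this step every element has the same number of samples, so successive periods can be compared or coded sample by sample, while `lengths` keeps what is needed to restore the original pitch.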
17 Citations
4 Claims
1. A speech synthesizing apparatus, the apparatus comprising:
division means for dividing an input speech signal into a plurality of unit speech samples;
signal creating means for creating a pitch wave signal from each of the unit speech samples, the pitch wave signal comprising a plurality of normalized pitch wave elements which have a substantially identical time length and uniform phase, wherein the pitch wave signal is created in such a way that a pitch signal representing pitch periods in the unit speech sample is generated and the phase of a speech wave in each pitch period is shifted so as to maximize the correlation between the speech wave in the pitch period and the pitch signal and that the phase shifted speech wave in each pitch period is resampled with the same number of samples to make uniform the time length of the speech wave in each pitch period to the same time length;
storage means for storing rhythm information representing the rhythm of each unit speech sample, pitch information representing the pitch of the sample, the spectrum information showing variation with time in the fundamental frequency component and harmonic wave component of the pitch wave signal in such a manner that each of the rhythm information, the pitch information and the spectrum information corresponds to the sample;
prediction means for inputting text information representing a text, and creating prediction information representing the result of predicting the pitch and spectrum of a unit speech constituting the text based on the text information;
retrieval means for identifying a sample having a pitch and spectrum having the highest correlation with the pitch and spectrum of the unit speech constituting the text based on the pitch information, spectrum information and prediction information; and
signal synthesizing means for creating a synthesized speech signal representing a speech in which the speech has a rhythm represented by the rhythm information brought into correspondence with the sample identified by the retrieval means, the variation with time in the fundamental frequency component and harmonic wave component is represented by the spectrum information brought into correspondence with the sample identified by the retrieval means, and the time length of one pitch period is a time length represented by the pitch information brought into correspondence with the sample identified by the retrieval means.
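The phase-shifting step recited in claim 1 (shifting the speech wave in each pitch period so that its correlation with the pitch signal is maximized) can be sketched as follows. The sketch assumes that the pitch signal over one period is a single sinusoid cycle, that the shift is a circular rotation of the period's samples, and that correlation is a plain inner product; these choices and the names `circular_shift`, `correlation` and `align_phase` are assumptions for illustration only.

```python
import math


def circular_shift(wave, k):
    """Rotate wave left by k samples (a phase shift within one pitch period)."""
    k %= len(wave)
    return wave[k:] + wave[:k]


def correlation(a, b):
    """Inner product of two equal-length sequences, used as the score here."""
    return sum(x * y for x, y in zip(a, b))


def align_phase(wave, pitch_signal):
    """Try every circular shift and keep the one whose correlation with
    pitch_signal is largest. Returns (best_shift, shifted_wave)."""
    best_shift = max(
        range(len(wave)),
        key=lambda k: correlation(circular_shift(wave, k), pitch_signal),
    )
    return best_shift, circular_shift(wave, best_shift)


# Hypothetical example: the pitch signal is one sine cycle of 16 samples,
# and the speech wave is the same cycle delayed by 5 samples.
n = 16
pitch_signal = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
wave = [pitch_signal[(i - 5) % n] for i in range(n)]
shift, aligned = align_phase(wave, pitch_signal)
```

Applying the same procedure to every pitch period gives all elements a common phase reference, which is what the claim calls uniform phase. The exhaustive search over shifts is quadratic in the period length; it is used here only because it is easy to verify.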
3. A speech synthesizing method, the method comprising the steps of:
dividing an input speech signal into a plurality of unit speech samples;
creating a pitch wave signal from each of the unit speech samples, the pitch wave signal comprising a plurality of normalized pitch wave elements which have a substantially identical time length and uniform phase, wherein the pitch wave signal is created in such a way that a pitch signal representing pitch periods in the unit speech sample is generated and the phase of a speech wave in each pitch period is shifted so as to maximize the correlation between the speech wave in the pitch period and the pitch signal and that the phase shifted speech wave in each pitch period is resampled with the same number of samples to make uniform the time length of the speech wave in each pitch period to the same time length;
storing rhythm information representing the rhythm of each unit speech sample, pitch information representing the pitch of the sample, and spectrum information showing variation with time in the fundamental frequency component and harmonic wave component of the pitch wave signal in such a manner that each of the rhythm information, the pitch information and the spectrum information corresponds to the sample;
inputting text information representing a text to create prediction information representing the result of predicting the pitch and spectrum of a unit speech constituting the text on the basis of the text information;
identifying a sample having a pitch and spectrum having the highest correlation with the pitch and spectrum of the unit speech constituting the text on the basis of the pitch information, spectrum information and prediction information; and
creating a synthesized speech signal representing a speech in which the speech has a rhythm represented by the rhythm information brought into correspondence with the identified sample, the variation with time in the fundamental frequency component and harmonic wave component is represented by the spectrum information brought into correspondence with the identified sample, and the time length of one pitch period is a time length represented by the pitch information brought into correspondence with the identified sample.
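The identifying step of claim 3 selects the stored sample whose pitch and spectrum correlate most strongly with the predicted pitch and spectrum. A minimal sketch of such a lookup is given below. It assumes that pitch and spectrum are stored as equal-length numeric sequences, that correlation is measured with the Pearson coefficient, and that the two coefficients are simply added to rank candidates; the record layout and the names `pearson` and `retrieve_best` are hypothetical and not taken from the patent.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant, to avoid dividing by zero.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def retrieve_best(database, predicted_pitch, predicted_spectrum):
    """Return the stored entry with the highest combined correlation.

    Adding the pitch and spectrum coefficients is an assumption of this
    sketch; the claim only requires the highest correlation.
    """
    def score(entry):
        return (pearson(entry["pitch"], predicted_pitch)
                + pearson(entry["spectrum"], predicted_spectrum))
    return max(database, key=score)


# Hypothetical stored samples and prediction.
database = [
    {"name": "a", "pitch": [100, 110, 120, 115], "spectrum": [1.0, 0.5, 0.25, 0.1]},
    {"name": "b", "pitch": [120, 115, 105, 100], "spectrum": [0.2, 0.4, 0.9, 1.0]},
]
best = retrieve_best(database, [98, 108, 121, 116], [0.9, 0.6, 0.2, 0.1])
```

In this example the prediction rises and falls like entry "a" and runs opposite to entry "b", so "a" is selected. The rhythm, pitch and spectrum information stored with the selected entry would then drive the synthesis step.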
Specification