Method and apparatus for speech analysis and synthesis
Abstract
The present invention provides a speech analysis method comprising steps of obtaining a speech signal and a corresponding DEGG/EGG signal; regarding the speech signal as the output of a vocal tract filter in a source-filter model taking the DEGG/EGG signal as the input; and estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the features of the vocal tract filter are expressed by the state vectors of the vocal tract filter at selected time points, and the step of estimating is performed using Kalman filtering.
8 Claims
1. A speech analysis method, comprising the steps of:
obtaining a speech signal and a corresponding DEGG/EGG signal;
providing the speech signal as the output of a vocal tract filter in a source-filter model taking the DEGG/EGG signal as the input; and
estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the features of the vocal tract filter are expressed by the state vectors of the vocal tract filter at selected time points, and the step of estimating is performed using Kalman filtering, wherein the Kalman filtering is a two-way, bi-directional Kalman filtering comprising a forward Kalman filtering in which a future state is estimated from a past state and a backward Kalman filtering in which a past state is estimated from a future state, and wherein the forward Kalman filtering comprises forward estimation, correction and forward recursion, the backward Kalman filtering comprises backward estimation, correction and backward recursion, and estimation results of the two-way Kalman filtering are a combination of estimation results of the forward Kalman filtering and estimation results of the backward Kalman filtering, wherein Kalman filtering is based on:
a state function
$$x_k = x_{k-1} + d_k,$$
and an observation function
$$v_k = e_k^T x_k + n_k,$$
wherein $x_k = [x_k(0), x_k(1), \ldots, x_k(N-1)]^T$ represents the state vector to be estimated of the vocal tract filter at time point $k$, wherein $x_k(0), x_k(1), \ldots, x_k(N-1)$ represent $N$ samples of the expected unit impulse response of the vocal tract filter at time $k$;
$d_k = [d_k(0), d_k(1), \ldots, d_k(N-1)]^T$ represents the disturbance added to the state vector of the vocal tract filter at time $k$;
$e_k = [e_k, e_{k-1}, \ldots, e_{k-N+1}]^T$ is a vector, of which the element $e_k$ represents the DEGG signal inputted at time $k$;
$v_k$ represents the speech signal outputted at time $k$; and
$n_k$ represents the observation noise added to the outputted speech signal at time $k$, and wherein
the forward Kalman filtering comprises the steps of:
forward estimation:
$$\tilde{x}_k = x_{k-1}^*,$$
$$\tilde{P}_k = P_{k-1} + Q$$
correction:
$$K_k = \tilde{P}_k e_k [e_k^T \tilde{P}_k e_k + r]^{-1}$$
$$x_k^* = \tilde{x}_k + K_k [v_k - e_k^T \tilde{x}_k]$$
$$P_k = [I - K_k e_k^T] \tilde{P}_k$$
forward recursion:
$$k = k + 1;$$
the backward Kalman filtering comprises the steps of:
backward estimation:
$$\tilde{x}_k = x_{k+1}^*;$$
$$\tilde{P}_k = P_{k+1} + Q$$
correction:
$$K_k = \tilde{P}_k e_k [e_k^T \tilde{P}_k e_k + r]^{-1}$$
$$x_k^* = \tilde{x}_k + K_k [v_k - e_k^T \tilde{x}_k]$$
$$P_k = [I - K_k e_k^T] \tilde{P}_k$$
backward recursion:
$$k = k - 1;$$
wherein $\tilde{x}_k$ represents the estimated state value at time point $k$, $x_k^*$ represents the corrected state value at time point $k$, $\tilde{P}_k$ represents the pre-estimated value of the covariance matrix of the estimation error, $P_k$ represents the corrected value of the covariance matrix of the estimation error, $Q$ represents the covariance matrix of disturbance $d_k$, $K_k$ represents the Kalman gain, $r$ represents the variance of the observation noise $n_k$, $I$ represents the unit matrix; and
the estimation results of the two-way Kalman filtering are the combination of the estimation results of the forward Kalman filtering and those of the backward Kalman filtering using the following formula:
$$P_k = (P_{k+}^{-1} + P_{k-}^{-1})^{-1},$$
$$x_k^* = P_k (P_{k+}^{-1} x_{k+}^* + P_{k-}^{-1} x_{k-}^*),$$
wherein $x_{k+}^*$, $P_{k+}$ are the estimated state value and the covariance of the estimation obtained by the forward Kalman filtering respectively, and $x_{k-}^*$, $P_{k-}$ represent the estimated state value and the covariance of the estimation obtained by the backward Kalman filtering respectively.
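The recursions recited in claim 1 can be illustrated with a minimal pure-Python sketch. It is not the patentee's implementation. It assumes Q = q·I, a zero initial state and an initial covariance p0·I for both passes, and one state estimate per sample; the function names and the default values of `q`, `r` and `p0` are hypothetical choices made for this illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

def identity(n: int, scale: float = 1.0) -> Matrix:
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (adequate for small N)."""
    n = len(a)
    aug = [row[:] + unit for row, unit in zip(a, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [value / p for value in aug[col]]
        for row in range(n):
            if row != col:
                f = aug[row][col]
                aug[row] = [rv - f * cv for rv, cv in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]

def regressor(degg: Sequence[float], k: int, n: int) -> Vector:
    """e_k = [e_k, e_{k-1}, ..., e_{k-N+1}], zero-padded before the signal starts."""
    return [degg[k - i] if k - i >= 0 else 0.0 for i in range(n)]

def one_pass(degg: Sequence[float], speech: Sequence[float], n: int, q: float,
             r: float, p0: float, order: Iterable[int]) -> Dict[int, Tuple[Vector, Matrix]]:
    """Estimation and correction over the time indices in `order`."""
    x = [0.0] * n              # assumed initial state
    P = identity(n, p0)        # assumed initial covariance
    results: Dict[int, Tuple[Vector, Matrix]] = {}
    for k in order:
        # estimation: x~ is the previous corrected state, P~ = P + Q with Q = q*I (assumed)
        Pt = [[P[i][j] + (q if i == j else 0.0) for j in range(n)] for i in range(n)]
        e = regressor(degg, k, n)
        Pe = matvec(Pt, e)
        gain = [value / (dot(e, Pe) + r) for value in Pe]        # K_k
        innovation = speech[k] - dot(e, x)                        # v_k - e_k^T x~
        x = [xi + gi * innovation for xi, gi in zip(x, gain)]     # x_k*
        # P_k = [I - K e^T] P~; e^T P~ equals Pe^T because P~ is symmetric
        P = [[Pt[i][j] - gain[i] * Pe[j] for j in range(n)] for i in range(n)]
        results[k] = (x, P)
    return results

def combine(fwd: Tuple[Vector, Matrix], bwd: Tuple[Vector, Matrix]) -> Tuple[Vector, Matrix]:
    """P = (Pf^-1 + Pb^-1)^-1 and x = P (Pf^-1 xf + Pb^-1 xb)."""
    (xf, Pf), (xb, Pb) = fwd, bwd
    inv_f, inv_b = invert(Pf), invert(Pb)
    n = len(xf)
    P = invert([[inv_f[i][j] + inv_b[i][j] for j in range(n)] for i in range(n)])
    info = [a + b for a, b in zip(matvec(inv_f, xf), matvec(inv_b, xb))]
    return matvec(P, info), P

def bidirectional_kalman(degg: Sequence[float], speech: Sequence[float], n: int,
                         q: float = 1e-4, r: float = 1e-2, p0: float = 1.0) -> List[Vector]:
    """Return one combined impulse-response estimate per time index."""
    last = len(speech) - 1
    fwd = one_pass(degg, speech, n, q, r, p0, range(last + 1))
    bwd = one_pass(degg, speech, n, q, r, p0, range(last, -1, -1))
    return [combine(fwd[k], bwd[k])[0] for k in range(last + 1)]
```

Because the observation is a scalar, the bracketed term in the gain is a scalar and no matrix inverse is needed inside either pass; inversion appears only in the combination step.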
3. A speech synthesis method, comprising the steps of:
obtaining a DEGG/EGG signal;
obtaining the features of a vocal tract filter by:
obtaining a speech signal and a corresponding DEGG/EGG signal;
providing the speech signal as the output of a vocal tract filter in a source-filter model taking the DEGG/EGG signal as the input; and
estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the features of the vocal tract filter are expressed by the state vectors of the vocal tract filter at selected time points, and the step of estimating is performed using Kalman filtering, wherein the Kalman filtering is a two-way, bi-directional Kalman filtering comprising a forward Kalman filtering in which a future state is estimated from a past state and a backward Kalman filtering in which a past state is estimated from a future state, and wherein the forward Kalman filtering comprises forward estimation, correction and forward recursion, the backward Kalman filtering comprises backward estimation, correction and backward recursion, and estimation results of the two-way Kalman filtering are a combination of estimation results of the forward Kalman filtering and estimation results of the backward Kalman filtering; and
synthesizing speech based on the DEGG/EGG signal and the obtained features of the vocal tract filter, wherein Kalman filtering is based on:
a state function
$$x_k = x_{k-1} + d_k,$$
and an observation function
$$v_k = e_k^T x_k + n_k,$$
wherein $x_k = [x_k(0), x_k(1), \ldots, x_k(N-1)]^T$ represents the state vector to be estimated of the vocal tract filter at time point $k$, wherein $x_k(0), x_k(1), \ldots, x_k(N-1)$ represent $N$ samples of the expected unit impulse response of the vocal tract filter at time $k$;
$d_k = [d_k(0), d_k(1), \ldots, d_k(N-1)]^T$ represents the disturbance added to the state vector of the vocal tract filter at time $k$;
$e_k = [e_k, e_{k-1}, \ldots, e_{k-N+1}]^T$ is a vector, of which the element $e_k$ represents the DEGG signal inputted at time $k$;
$v_k$ represents the speech at time $k$; and
$n_k$ represents the observation noise added to the outputted speech signal at time $k$, and wherein
the forward Kalman filtering comprises the steps of:
forward estimation:
$$\tilde{x}_k = x_{k-1}^*,$$
$$\tilde{P}_k = P_{k-1} + Q$$
correction:
$$K_k = \tilde{P}_k e_k [e_k^T \tilde{P}_k e_k + r]^{-1}$$
$$x_k^* = \tilde{x}_k + K_k [v_k - e_k^T \tilde{x}_k]$$
$$P_k = [I - K_k e_k^T] \tilde{P}_k$$
forward recursion:
$$k = k + 1;$$
the backward Kalman filtering comprises the steps of:
backward estimation:
$$\tilde{x}_k = x_{k+1}^*;$$
$$\tilde{P}_k = P_{k+1} + Q$$
correction:
$$K_k = \tilde{P}_k e_k [e_k^T \tilde{P}_k e_k + r]^{-1}$$
$$x_k^* = \tilde{x}_k + K_k [v_k - e_k^T \tilde{x}_k]$$
$$P_k = [I - K_k e_k^T] \tilde{P}_k$$
backward recursion:
$$k = k - 1;$$
wherein $\tilde{x}_k$ represents the estimated state value at time point $k$, $x_k^*$ represents the corrected state value at time point $k$, $\tilde{P}_k$ represents the pre-estimated value of the covariance matrix of the estimation error, $P_k$ represents the corrected value of the covariance matrix of the estimation error, $Q$ represents the covariance matrix of disturbance $d_k$, $K_k$ represents the Kalman gain, $r$ represents the variance of the observation noise $n_k$, $I$ represents the unit matrix; and
the estimation results of the two-way Kalman filtering are the combination of the estimation results of the forward Kalman filtering and those of the backward Kalman filtering using the following formula:
$$P_k = (P_{k+}^{-1} + P_{k-}^{-1})^{-1},$$
$$x_k^* = P_k (P_{k+}^{-1} x_{k+}^* + P_{k-}^{-1} x_{k-}^*),$$
wherein $x_{k+}^*$, $P_{k+}$ are the estimated state value and the covariance of the estimation obtained by the forward Kalman filtering respectively, and $x_{k-}^*$, $P_{k-}$ represent the estimated state value and the covariance of the estimation obtained by the backward Kalman filtering respectively.
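The synthesizing step of claim 3 can be pictured as applying the observation function without the noise term: each output sample is the inner product of the most recent DEGG samples with the impulse response in force at that time. The sketch below is a hypothetical illustration only. It assumes the state vectors are available as (time index, impulse response) pairs and that the latest state at or before each sample is held until the next selected time point; the claim does not specify how states are applied between selected time points.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

def synthesize(degg: Sequence[float],
               states: Sequence[Tuple[int, Sequence[float]]]) -> List[float]:
    """Synthesize speech from a DEGG signal and time-stamped impulse responses.

    `states` holds (time index, impulse response) pairs sorted by time index.
    """
    times = [t for t, _ in states]
    out: List[float] = []
    for k in range(len(degg)):
        # zero-order hold: latest state at or before k (an assumption of this sketch)
        _, h = states[max(bisect_right(times, k) - 1, 0)]
        # v_k = e_k^T x_k with e_k = [e_k, e_{k-1}, ...], zero before the signal starts
        out.append(sum(h[i] * degg[k - i] for i in range(len(h)) if k - i >= 0))
    return out

# Toy input: two glottal pulses and a filter that changes at sample 3.
degg = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
states = [(0, [0.5, 0.25]), (3, [1.0, -0.5])]
speech = synthesize(degg, states)
print(speech)  # [0.5, 0.25, 0.0, 1.0, -0.5, 0.0]
```

Each pulse in the toy input reproduces the impulse response that is active when it arrives, which is the behaviour the source-filter model describes.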
5. A speech analysis apparatus, comprising:
a processor and a storage device encoded with modules for execution by the processor, the modules including:
a module for obtaining a speech signal;
a module for obtaining the corresponding DEGG/EGG signal; and
an estimation module for, by regarding the speech signal as the output of a vocal tract filter in a source-filter model with the DEGG/EGG signal as the input, estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the estimation module uses the state vectors of the vocal tract filter at selected time points to express the features of the vocal tract filter, and uses Kalman filtering to perform the estimation, wherein the Kalman filtering is a two-way, bi-directional Kalman filtering comprising a forward Kalman filtering in which a future state is estimated from a past state and a backward Kalman filtering in which a past state is estimated from a future state, and wherein the forward Kalman filtering comprises forward estimation, correction and forward recursion, the backward Kalman filtering comprises backward estimation, correction and backward recursion, and estimation results of the two-way Kalman filtering are a combination of estimation results of the forward Kalman filtering and estimation results of the backward Kalman filtering, wherein the Kalman filtering is based on:
a state function
$$x_k = x_{k-1} + d_k,$$
and an observation function
$$v_k = e_k^T x_k + n_k,$$
wherein $x_k = [x_k(0), x_k(1), \ldots, x_k(N-1)]^T$ represents the state vector to be estimated of the vocal tract filter at time point $k$, wherein $x_k(0), x_k(1), \ldots, x_k(N-1)$ represent $N$ samples of the expected unit impulse response of the vocal tract filter at time $k$;
$d_k = [d_k(0), d_k(1), \ldots, d_k(N-1)]^T$ represents the disturbance added to the state vector of the vocal tract filter at time $k$;
$e_k = [e_k, e_{k-1}, \ldots, e_{k-N+1}]^T$ is a vector, of which the element $e_k$ represents the DEGG signal inputted at time $k$;
$v_k$ represents the speech signal outputted at time $k$; and
$n_k$ represents the observation noise added to the outputted speech signal at time $k$, and wherein
the forward Kalman filtering comprises the following steps:
forward estimation:
$$\tilde{x}_k = x_{k-1}^*,$$
$$\tilde{P}_k = P_{k-1} + Q$$
correction:
$$K_k = \tilde{P}_k e_k [e_k^T \tilde{P}_k e_k + r]^{-1}$$
$$x_k^* = \tilde{x}_k + K_k [v_k - e_k^T \tilde{x}_k]$$
$$P_k = [I - K_k e_k^T] \tilde{P}_k$$
forward recursion:
$$k = k + 1;$$
the backward Kalman filtering comprises the following steps:
backward estimation:
$$\tilde{x}_k = x_{k+1}^*;$$
$$\tilde{P}_k = P_{k+1} + Q$$
correction:
$$K_k = \tilde{P}_k e_k [e_k^T \tilde{P}_k e_k + r]^{-1}$$
$$x_k^* = \tilde{x}_k + K_k [v_k - e_k^T \tilde{x}_k]$$
$$P_k = [I - K_k e_k^T] \tilde{P}_k$$
backward recursion:
$$k = k - 1;$$
wherein $\tilde{x}_k$ represents the pre-estimated state value at time point $k$, $x_k^*$ represents the corrected state value at time point $k$, $\tilde{P}_k$ represents the pre-estimated value of the covariance matrix of the estimation error, $P_k$ represents the corrected value of the covariance matrix of the estimation error, $Q$ represents the covariance matrix of disturbance $d_k$, $K_k$ represents the Kalman gain, $r$ represents the variance of the observation noise $n_k$, $I$ represents the unit matrix; and
the estimation results of the two-way Kalman filter are the combination of estimation results of the forward Kalman filter and those of the backward Kalman filtering using the following formula:
$$P_k = (P_{k+}^{-1} + P_{k-}^{-1})^{-1},$$
$$x_k^* = P_k (P_{k+}^{-1} x_{k+}^* + P_{k-}^{-1} x_{k-}^*),$$
wherein $x_{k+}^*$, $P_{k+}$ are the estimated state value and the covariance of the estimation obtained by the forward Kalman filtering respectively, and $x_{k-}^*$, $P_{k-}$ represent the estimated state value and the covariance of the estimation obtained by the backward Kalman filtering respectively.
7. A speech synthesis apparatus, comprising:
a processor and a storage device encoded with modules for execution by the processor, the modules including:
a module for obtaining a DEGG/EGG signal;
a speech analysis module comprising:
a module for obtaining a speech signal;
a module for obtaining the corresponding DEGG/EGG signal; and
an estimation module for, by regarding the speech signal as the output of a vocal tract filter in a source-filter model with the DEGG/EGG signal as the input, estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the estimation module uses the state vectors of the vocal tract filter at selected time points to express the features of the vocal tract filter, and uses Kalman filtering to perform the estimation, wherein the Kalman filtering is a two-way, bi-directional Kalman filtering comprising a forward Kalman filtering in which a future state is estimated from a past state and a backward Kalman filtering in which a past state is estimated from a future state, and wherein the forward Kalman filtering comprises forward estimation, correction and forward recursion, the backward Kalman filtering comprises backward estimation, correction and backward recursion, and estimation results of the two-way Kalman filtering are a combination of estimation results of the forward Kalman filtering and estimation results of the backward Kalman filtering; and
a speech synthesis module for synthesizing a speech signal based on the DEGG/EGG signal obtained by the module for obtaining a DEGG/EGG signal and the features of the vocal tract filter estimated by the speech analysis apparatus, wherein the Kalman filtering is based on:
a state function
$$x_k = x_{k-1} + d_k,$$
and an observation function
$$v_k = e_k^T x_k + n_k,$$
wherein $x_k = [x_k(0), x_k(1), \ldots, x_k(N-1)]^T$ represents the state vector to be estimated of the vocal tract filter at time point $k$, wherein $x_k(0), x_k(1), \ldots, x_k(N-1)$ represent $N$ samples of the expected unit impulse response of the vocal tract filter at time $k$;
$d_k = [d_k(0), d_k(1), \ldots, d_k(N-1)]^T$ represents the disturbance added to the state vector of the vocal tract filter at time $k$;
$e_k = [e_k, e_{k-1}, \ldots, e_{k-N+1}]^T$ is a vector, of which the element $e_k$ represents the DEGG signal inputted at time $k$;
$v_k$ represents the speech signal outputted at time $k$; and
$n_k$ represents the observation noise added to the outputted speech signal at time $k$, and wherein
the forward Kalman filtering comprises the following steps:
forward estimation:
$$\tilde{x}_k = x_{k-1}^*,$$
$$\tilde{P}_k = P_{k-1} + Q$$
correction:
$$K_k = \tilde{P}_k e_k [e_k^T \tilde{P}_k e_k + r]^{-1}$$
$$x_k^* = \tilde{x}_k + K_k [v_k - e_k^T \tilde{x}_k]$$
$$P_k = [I - K_k e_k^T] \tilde{P}_k$$
forward recursion:
$$k = k + 1;$$
the backward Kalman filtering comprises the following steps:
backward estimation:
$$\tilde{x}_k = x_{k+1}^*;$$
$$\tilde{P}_k = P_{k+1} + Q$$
correction:
$$K_k = \tilde{P}_k e_k [e_k^T \tilde{P}_k e_k + r]^{-1}$$
$$x_k^* = \tilde{x}_k + K_k [v_k - e_k^T \tilde{x}_k]$$
$$P_k = [I - K_k e_k^T] \tilde{P}_k$$
backward recursion:
$$k = k - 1;$$
wherein $\tilde{x}_k$ represents the pre-estimated state value at time point $k$, $x_k^*$ represents the corrected state value at time point $k$, $\tilde{P}_k$ represents the pre-estimated value of the covariance matrix of the estimation error, $P_k$ represents the corrected value of the covariance matrix of the estimation error, $Q$ represents the covariance matrix of disturbance $d_k$, $K_k$ represents the Kalman gain, $r$ represents the variance of the observation noise $n_k$, $I$ represents the unit matrix; and
the estimation results of the two-way Kalman filter are the combination of estimation results of the forward Kalman filter and those of the backward Kalman filtering using the following formula:
$$P_k = (P_{k+}^{-1} + P_{k-}^{-1})^{-1},$$
$$x_k^* = P_k (P_{k+}^{-1} x_{k+}^* + P_{k-}^{-1} x_{k-}^*),$$
wherein $x_{k+}^*$, $P_{k+}$ are the estimated state value and the covariance of the estimation obtained by the forward Kalman filtering respectively, and $x_{k-}^*$, $P_{k-}$ represent the estimated state value and the covariance of the estimation obtained by the backward Kalman filtering respectively.
Specification