Speech synthesis with dynamic constraints
Abstract
A method is disclosed for providing speech parameters to be used for synthesis of a speech utterance. In at least one embodiment, the method includes receiving an input time series of first speech parameter vectors; preparing at least one input time series of second speech parameter vectors consisting of dynamic speech parameters; extracting from the input time series of first and second speech parameter vectors partial time series of first speech parameter vectors and corresponding partial time series of second speech parameter vectors; and converting the corresponding partial time series of first and second speech parameter vectors into partial time series of third speech parameter vectors, wherein the conversion is done independently for each set of partial time series and can be started as soon as the corresponding vectors of the input time series have been received. The speech parameter vectors of the partial time series of third speech parameter vectors are combined to form a time series of output speech parameter vectors to be used for synthesis of the speech utterance. At least one embodiment of the method allows speech parameter vectors to be provided continuously for synthesis of the speech utterance, reducing both the latency and the memory requirements of the synthesis.
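The chunked flow the abstract describes can be sketched as a small streaming loop. This is a minimal sketch, not the patented implementation: `stream_chunks`, `chunk_len`, and the pluggable `convert` function are hypothetical names, and fixed-length chunks are an assumption (the claims only require partial series with indices p to q).

```python
import numpy as np

def stream_chunks(x_series, delta_series, chunk_len, convert):
    """Sketch of the chunked processing described in the abstract: as soon
    as vectors p..q of the static series {x_i} and the dynamic series
    {Delta_i} are available, that chunk is converted independently and its
    output vectors are appended, so synthesis can start before the whole
    utterance has been processed. `convert` is a hypothetical per-chunk
    conversion function taking (x_chunk, delta_chunk)."""
    output = []
    for p in range(0, len(x_series), chunk_len):
        q = min(p + chunk_len, len(x_series))        # last index of this chunk
        y_chunk = convert(x_series[p:q], delta_series[p:q])
        output.extend(y_chunk)                       # combine into {y_hat_i}1..m
    return np.array(output)
```

For example, with an identity `convert` the output series simply reproduces the input, which makes the chunk boundaries easy to verify; the point of the structure is that each `convert` call only ever sees vectors p to q, never the full utterance.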
19 Claims
1. A method for providing speech parameters to be used for synthesis of a speech utterance, comprising:
- receiving an input time series of first speech parameter vectors {xi}1 . . . m allocated to synchronisation points 1 to m indexed by i, wherein each synchronisation point is defining a point in time or a time interval of the speech utterance and each first speech parameter vector xi consists of a number of n1 static speech parameters of a time interval of the speech utterance,
- preparing at least one input time series of second speech parameter vectors {Δi}1 . . . m allocated to the synchronisation points 1 to m, wherein each second speech parameter vector Δi consists of a number of n2 dynamic speech parameters of a time interval of the speech utterance,
- extracting from the input time series of first and second speech parameter vectors {xi}1 . . . m and {Δi}1 . . . m partial time series of first speech parameter vectors {xi}p . . . q and corresponding partial time series of second speech parameter vectors {Δi}p . . . q, wherein p is the index of the first and q is the index of the last extracted speech parameter vector,
- converting the corresponding partial time series of first and second speech parameter vectors {xi}p . . . q and {Δi}p . . . q into partial time series of third speech parameter vectors {yi}p . . . q, wherein the partial time series of third speech parameter vectors {yi}p . . . q minimises differences to the partial time series of first speech parameter vectors {xi}p . . . q, the dynamic characteristics of {yi}p . . . q minimise differences to the partial time series of second speech parameter vectors {Δi}p . . . q, and the conversion is done independently for each partial time series of third speech parameter vectors {yi}p . . . q and can be started as soon as the vectors p to q of the input time series of the first speech parameter vectors {xi}1 . . . m have been received and corresponding vectors p to q of second speech parameter vectors {Δi}1 . . . m have been prepared, and
- combining the speech parameter vectors of the partial time series of third speech parameter vectors {yi}p . . . q to form a time series of output speech parameter vectors {ŷi}1 . . . m allocated to the synchronisation points, wherein the time series of output speech parameter vectors {ŷi}1 . . . m is provided to be used for synthesis of the speech utterance.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19)
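The converting step of claim 1 can be read as a joint least-squares problem: find {yi}p . . . q that stays close to the static series {xi}p . . . q while its dynamic characteristics stay close to {Δi}p . . . q. The sketch below assumes the dynamic parameters are frame-to-frame first differences, which the claim does not fix; the operator D, the equal weighting of the two terms, and the function name `convert_chunk` are all illustrative assumptions.

```python
import numpy as np

def convert_chunk(x, delta):
    """Convert a partial series {x_i}p..q and {Delta_i}p..q into {y_i}p..q
    by minimising ||y - x||^2 + ||D y - delta||^2, where D is a
    first-difference operator standing in for the 'dynamic characteristics'
    (an assumption; the claim does not specify the operator).
    x, delta: (T, n) arrays of static and dynamic parameter vectors."""
    T = x.shape[0]
    # D maps a series y to its frame differences: (D y)_i = y_i - y_{i-1}
    D = np.eye(T) - np.eye(T, k=-1)
    D[0, 0] = 0.0                        # first frame has no predecessor
    # Setting the gradient of the objective to zero gives the normal
    # equations (I + D^T D) y = x + D^T delta, solved column-wise:
    A = np.eye(T) + D.T @ D
    b = x + D.T @ delta
    return np.linalg.solve(A, b)
```

A useful sanity check: when the prepared deltas are exactly the differences of x (e.g. a linear ramp with unit deltas), both terms of the objective can be driven to zero simultaneously and the solver returns y = x; when the deltas disagree with x, y becomes a compromise between the two.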
16. A speech synthesis processor for providing output speech parameters to be used for synthesis of a speech utterance, said processor comprising:
- receiving means for receiving an input time series of first speech parameter vectors {xi}1 . . . m allocated to synchronisation points 1 to m indexed by i, wherein each synchronisation point is defining a point in time or a time interval of the speech utterance and each first speech parameter vector xi consists of a number of n1 static speech parameters of a time interval of the speech utterance,
- preparing means for preparing at least one input time series of second speech parameter vectors {Δi}1 . . . m allocated to the synchronisation points 1 to m, wherein each second speech parameter vector Δi consists of a number of n2 dynamic speech parameters of a time interval of the speech utterance,
- extracting means for extracting from the input time series of first and second speech parameter vectors {xi}1 . . . m and {Δi}1 . . . m partial time series of first speech parameter vectors {xi}p . . . q and corresponding partial time series of second speech parameter vectors {Δi}p . . . q, wherein p is the index of the first and q is the index of the last extracted speech parameter vector,
- converting means for converting the corresponding partial time series of first and second speech parameter vectors {xi}p . . . q and {Δi}p . . . q into partial time series of third speech parameter vectors {yi}p . . . q, wherein the partial time series of third speech parameter vectors {yi}p . . . q minimises differences to the partial time series of first speech parameter vectors {xi}p . . . q, the dynamic characteristics of {yi}p . . . q minimise differences to the partial time series of second speech parameter vectors {Δi}p . . . q, and the conversion is done independently for each partial time series of third speech parameter vectors {yi}p . . . q and can be started as soon as the vectors p to q of the input time series of the first speech parameter vectors {xi}1 . . . m have been received and corresponding vectors p to q of second speech parameter vectors {Δi}1 . . . m have been prepared, and
- combining means for combining the speech parameter vectors of the partial time series of third speech parameter vectors {yi}p . . . q to form a time series of output speech parameter vectors {ŷi}1 . . . m allocated to the synchronisation points, wherein the time series of output speech parameter vectors {ŷi}1 . . . m is provided to be used for synthesis of the speech utterance.
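The "means" of claim 16 map naturally onto a small stateful processor: buffer incoming static and dynamic vectors, and as soon as a partial series is complete, convert and append it. This is a hypothetical sketch, not the claimed apparatus: the class name, the fixed chunk length, and the first-difference model of the dynamic characteristics are all assumptions made for illustration.

```python
import numpy as np

class SpeechParameterProcessor:
    """Sketch of the claimed processor: each 'means' of claim 16 appears as
    a method or a marked step. The conversion solves the stated joint
    minimisation ||y - x||^2 + ||D y - delta||^2 with D a first-difference
    operator (an assumed model of the dynamic characteristics)."""

    def __init__(self, chunk_len):
        self.chunk_len = chunk_len
        self.x_buf, self.d_buf, self.output = [], [], []

    def receive(self, x_i, delta_i):
        # receiving/preparing means: buffer static and dynamic vectors
        self.x_buf.append(x_i)
        self.d_buf.append(delta_i)
        if len(self.x_buf) == self.chunk_len:   # vectors p..q are ready
            self._flush()

    def finish(self):
        # flush any final partial chunk and return {y_hat_i}1..m
        if self.x_buf:
            self._flush()
        return np.array(self.output)

    def _flush(self):
        # extracting means: take the buffered partial series p..q
        x = np.array(self.x_buf)
        d = np.array(self.d_buf)
        self.x_buf, self.d_buf = [], []
        # converting means: solve (I + D^T D) y = x + D^T d
        T = x.shape[0]
        D = np.eye(T) - np.eye(T, k=-1)
        D[0, 0] = 0.0
        y = np.linalg.solve(np.eye(T) + D.T @ D, x + D.T @ d)
        # combining means: append the converted chunk to the output series
        self.output.extend(y)
```

Because each `_flush` depends only on the buffered chunk, conversion can start as soon as vectors p to q have arrived, matching the latency property the claims emphasise; one design caveat of this sketch is that zeroing the first row of D decouples adjacent chunks, so chunk-boundary continuity is not enforced here.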
Specification