SPEECH SYNTHESIZER, AND SPEECH SYNTHESIS METHOD AND COMPUTER PROGRAM PRODUCT
Abstract
A speech synthesizer includes a statistical-model sequence generator, a multiple-acoustic feature parameter sequence generator, and a waveform generator. The statistical-model sequence generator generates, based on context information corresponding to an input text, a statistical model sequence that comprises a first sequence of a statistical model comprising a plurality of states. The multiple-acoustic feature parameter sequence generator, for each speech section corresponding to each state of the statistical model sequence, selects a first plurality of acoustic feature parameters from a first set of acoustic feature parameters extracted from a first speech waveform stored in a speech database and generates a multiple-acoustic feature parameter sequence that comprises a sequence of the first plurality of acoustic feature parameters. The waveform generator generates a distribution sequence based on the multiple-acoustic feature parameter sequence and generates a second speech waveform based on a second set of acoustic feature parameters generated based on the distribution sequence.
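The selection and distribution-sequence steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `State`, `select_candidates`, and `distribution_sequence` are hypothetical, features are one-dimensional, candidates are chosen by distance to the state mean (the abstract does not specify a selection criterion), and each state's distribution is taken to be the mean and variance of its selected candidates.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class State:
    mean: float   # mean of the state's output distribution (assumed 1-D)
    frames: int   # duration of the corresponding speech section, in frames


def select_candidates(state, database, n):
    """Pick the n stored parameters closest to the state's mean.

    Distance to the mean is an assumed criterion, used only for illustration.
    """
    return sorted(database, key=lambda x: abs(x - state.mean))[:n]


def distribution_sequence(model_sequence, database, n=3):
    """Return one (mean, variance) pair per frame.

    Each state's pair is computed from its n selected candidates and
    repeated for the state's duration.
    """
    dists = []
    for state in model_sequence:
        chosen = select_candidates(state, database, n)
        dist = (fmean(chosen), pvariance(chosen))
        dists.extend([dist] * state.frames)
    return dists


# Hypothetical feature values standing in for parameters extracted from
# stored speech (for example, log F0).
database = [4.6, 4.7, 4.8, 5.0, 5.1, 5.2, 5.4]
model_sequence = [State(mean=4.7, frames=2), State(mean=5.1, frames=3)]

dists = distribution_sequence(model_sequence, database)

# Simplest possible generation step: with static features only, the
# maximum-likelihood trajectory is the per-frame mean.
trajectory = [mean for mean, _ in dists]
```

Because several candidates contribute to each distribution, the generated parameters are an average of stored speech rather than a single selected unit or a single model mean.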
16 Claims
1. A speech synthesizer comprising:
a statistical-model sequence generator, implemented in computer hardware, configured to generate, based at least in part on context information corresponding to an input text, a statistical model sequence that comprises a first sequence of a statistical model comprising a plurality of states;
a multiple-acoustic feature parameter sequence generator, implemented in computer hardware, configured to, for each speech section corresponding to each state of the statistical model sequence, select a first plurality of acoustic feature parameters from a first set of acoustic feature parameters extracted from a first speech waveform stored in a speech database and generate a multiple-acoustic feature parameter sequence that comprises a sequence of the first plurality of acoustic feature parameters; and
a waveform generator, implemented in computer hardware, configured to generate a distribution sequence based at least in part on the multiple-acoustic feature parameter sequence and generate a second speech waveform based at least in part on a second set of acoustic feature parameters generated based at least in part on the distribution sequence.
Dependent claims: 2 to 14.
15. A speech synthesis method executed in a speech synthesizer, the speech synthesis method comprising:
generating, based at least in part on context information corresponding to an input text, a statistical model sequence that comprises a first sequence of a statistical model comprising a plurality of states;
selecting a first plurality of acoustic feature parameters out of a first set of acoustic feature parameters extracted from a first speech waveform stored in a speech database and generating a multiple-acoustic feature parameter sequence that comprises a sequence of the selected first plurality of acoustic feature parameters, for each speech section corresponding to each state of the statistical model sequence; and
generating a distribution sequence based at least in part on the multiple-acoustic feature parameter sequence and generating a second speech waveform based at least in part on a second set of acoustic feature parameters generated from the distribution sequence.
16. A computer program product comprising a non-transitory computer-readable medium that stores therein a computer program that causes a computer to execute:
generating, based at least in part on context information corresponding to an input text, a statistical model sequence that comprises a first sequence of a statistical model comprising a plurality of states;
selecting a first plurality of acoustic feature parameters from a first set of acoustic feature parameters extracted from a first speech waveform stored in a speech database and generating a multiple-acoustic feature parameter sequence that comprises a sequence of the selected first plurality of acoustic feature parameters, for each speech section corresponding to each state of the statistical model sequence; and
generating a distribution sequence based at least in part on the multiple-acoustic feature parameter sequence and generating a second speech waveform based at least in part on a second set of acoustic feature parameters generated from the distribution sequence.
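The claims do not state how the second set of acoustic feature parameters is generated from the distribution sequence. One technique commonly used in statistical parametric synthesis is maximum-likelihood parameter generation with dynamic (delta) features, which yields a smooth trajectory instead of a step-wise sequence of means. The sketch below assumes that technique; the function name `generate_trajectory`, the one-dimensional features, and the simple delta definition `c[t] - c[t-1]` are all illustrative assumptions.

```python
def generate_trajectory(static, delta):
    """Maximum-likelihood trajectory under static and delta constraints.

    static: list of (mean, variance) per frame for the parameter c[t].
    delta:  list of (mean, variance) per frame for c[t] - c[t-1];
            the entry for frame 0 is ignored.
    Minimizes the sum of variance-weighted squared errors, which leads to
    a symmetric tridiagonal linear system.
    """
    n = len(static)
    diag = [1.0 / var for _, var in static]      # main diagonal
    off = [0.0] * n                              # off[t] couples frames t-1 and t
    rhs = [mean / var for mean, var in static]
    for t in range(1, n):
        d_mean, d_var = delta[t]
        w = 1.0 / d_var
        diag[t] += w
        diag[t - 1] += w
        off[t] = -w
        rhs[t] += w * d_mean
        rhs[t - 1] -= w * d_mean

    # Thomas algorithm: forward elimination, then back substitution.
    for t in range(1, n):
        m = off[t] / diag[t - 1]
        diag[t] -= m * off[t]
        rhs[t] -= m * rhs[t - 1]
    c = [0.0] * n
    c[-1] = rhs[-1] / diag[-1]
    for t in range(n - 2, -1, -1):
        c[t] = (rhs[t] - off[t + 1] * c[t + 1]) / diag[t]
    return c


# Hypothetical distribution sequence: a step from 0 to 1 in the static
# means, with delta means of zero that pull neighbouring frames together.
static = [(0.0, 1.0), (0.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
delta = [(0.0, 0.1)] * 4
smooth = generate_trajectory(static, delta)
```

In this example the step between the second and third frames is spread over the whole trajectory, because the small delta variance penalizes abrupt changes.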
Specification