Employing speech models in concatenative speech synthesis
First Claim
1. An arrangement for creating synthesized speech from an applied sequence of desired speech unit features parameter sets, D-SUF(i), i=2,3, . . . , comprising:
a database that contains a plurality of sets, E(k), k=1,2, . . . ,K, where K is an integer, each set E(k) including a plurality of associated frames in sequence, each of said frames being represented by a collection of model feature parameters, and T-D data representing a time-domain speech signal corresponding to said frame, and a collection of unit selection parameters which characterize the model feature parameters of the speech frames in the set E(k);
a database search engine that, for each applied D-SUF(i), selects from said database a set E(i) having a collection of unit selection parameters that match best said D-SUF(i), and said plurality of frames that are associated with said E(i), thus creating a sequence of frames;
an evaluator that determines, based on assessment of information obtained from said database and pertaining to said E(i), whether modifications are needed to frames of said E(i);
a modification and synthesis module that, when said evaluator concludes that modifications to frames are needed, modifies the collection of model parameters of those frames that need modification, and generates, for each frame having a modified collection of model parameters, T-D data corresponding to said frame; and
a combiner that concatenates T-D data of successive frames in said sequence of frames, by employing, for each concatenated frame, the T-D data generated for said concatenated frame by said modification and synthesis module, if such T-D data was generated, or T-D data retrieved for said concatenated frame from said database.
Abstract
A text-to-speech synthesizer employs a database that includes units. For each unit there is a collection of unit selection parameters and a plurality of frames. Each frame has a set of model parameters derived from a base speech frame, and a speech frame synthesized from the frame's model parameters. A text to be synthesized is converted to a sequence of desired unit feature sets, and for each such set the database is searched to retrieve a best-matching unit. An assessment is made whether modifications to the frames are needed, either because of discontinuities in the model parameters at unit boundaries or because of differences between the desired and selected unit features. When modifications are necessary, the model parameters of the frames that need to be altered are modified, and new frames are synthesized from the modified model parameters and concatenated to the output. Otherwise, the speech frames previously stored in the database are retrieved and concatenated to the output.
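The boundary assessment described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Frame` class, the `boundary_jump` measure (largest absolute parameter difference), and the fixed `threshold` are all assumptions chosen for illustration, and the sketch covers only the discontinuity test, not the comparison against desired unit features.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    # Hypothetical frame layout: the abstract only says each frame has
    # model parameters and a stored synthesized speech frame.
    model_params: List[float]
    td_data: List[float]


def boundary_jump(left: Frame, right: Frame) -> float:
    """Largest absolute model-parameter difference across a unit boundary
    (an assumed discontinuity measure)."""
    return max(abs(a - b) for a, b in zip(left.model_params, right.model_params))


def frames_needing_modification(
    units: List[List[Frame]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (unit index, frame index) pairs on either side of every
    boundary whose jump exceeds the assumed threshold."""
    flagged = []
    for u in range(len(units) - 1):
        last, first = units[u][-1], units[u + 1][0]
        if boundary_jump(last, first) > threshold:
            flagged.append((u, len(units[u]) - 1))
            flagged.append((u + 1, 0))
    return flagged
```

Frames that are not flagged would keep their stored time-domain data; only flagged frames would go through modification and resynthesis.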
51 Citations
41 Claims
1. An arrangement for creating synthesized speech from an applied sequence of desired speech unit features parameter sets, D-SUF(i), i=2,3, . . . , comprising:
a database that contains a plurality of sets, E(k), k=1,2, . . . ,K, where K is an integer, each set E(k) including a plurality of associated frames in sequence, each of said frames being represented by a collection of model feature parameters, and T-D data representing a time-domain speech signal corresponding to said frame, and a collection of unit selection parameters which characterize the model feature parameters of the speech frames in the set E(k);
a database search engine that, for each applied D-SUF(i), selects from said database a set E(i) having a collection of unit selection parameters that match best said D-SUF(i), and said plurality of frames that are associated with said E(i), thus creating a sequence of frames;
an evaluator that determines, based on assessment of information obtained from said database and pertaining to said E(i), whether modifications are needed to frames of said E(i);
a modification and synthesis module that, when said evaluator concludes that modifications to frames are needed, modifies the collection of model parameters of those frames that need modification, and generates, for each frame having a modified collection of model parameters, T-D data corresponding to said frame; and
a combiner that concatenates T-D data of successive frames in said sequence of frames, by employing, for each concatenated frame, the T-D data generated for said concatenated frame by said modification and synthesis module, if such T-D data was generated, or T-D data retrieved for said concatenated frame from said database.
Dependent claims: 2 through 21.
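The database and search engine of claim 1 can be sketched in Python as follows. This is a hedged illustration only: the `Entry` and `Frame` classes, the squared Euclidean `distance`, and the representation of D-SUF(i) as a plain list of numbers are assumptions, since the claim does not specify how "match best" is computed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    model_params: List[float]  # collection of model feature parameters
    td_data: List[float]       # stored time-domain (T-D) samples


@dataclass
class Entry:
    # One set E(k): unit selection parameters plus its frames in sequence.
    selection_params: List[float]
    frames: List[Frame]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed matching criterion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_entry(database: List[Entry], desired: Sequence[float]) -> Entry:
    """Pick the entry whose selection parameters best match one D-SUF(i)."""
    return min(database, key=lambda e: distance(e.selection_params, desired))


def build_frame_sequence(
    database: List[Entry], desired_sequence: Sequence[Sequence[float]]
) -> List[Frame]:
    """Select one entry per desired feature set and chain their frames."""
    frames: List[Frame] = []
    for desired in desired_sequence:
        frames.extend(select_entry(database, desired).frames)
    return frames
```

A linear scan is used here for clarity; a real unit-selection system would likely use a more elaborate cost and search, which the claim leaves open.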
22. A method for creating synthesized speech from an applied sequence of desired speech unit features parameter sets, D-SUF(i), i=2,3, . . . , comprising the steps of:
for each of said D-SUF(i), selecting from a database information of an entry E(i) the E(i) having a set of speech unit characterization parameters that best match said D-SUF(i), which entry also includes a plurality of frames represented by a corresponding plurality of model parameter sets, and a corresponding plurality of time domain speech frames, said information including at least said plurality of model parameter sets, thereby resulting in a sequence of model parameter sets, corresponding to which a sequence of output speech frames is to be concatenated;
determining, based on assessment of information obtained from said database and pertaining to said E(i), whether modifications are needed to said frames of said E(i);
when said evaluator concludes that modifications to frames are needed, modifying the collection of model parameters of those frames that need modification;
generating, for each frame having a modified collection of model parameters, T-D data corresponding to said frame; and
concatenating T-D data of successive frames in said sequence of frames, by employing, for each concatenated frame, the T-D data generated for said step of generating, if such T-D data was generated, or T-D data retrieved for said concatenated frame from said database.
Dependent claims: 23 through 41.
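The modifying, generating, and concatenating steps of claim 22 can be sketched in Python as below. Everything specific here is an assumption made for illustration: the toy `synthesize` function treats the model parameters as an (amplitude, cycles per frame) pair for a single sinusoid, `smooth` averages a frame's parameters with those of the preceding frame, and `flagged` stands in for the output of the determining step. The claim itself does not prescribe any of these choices.

```python
import math
from typing import List, Set, Tuple

# Hypothetical frame layout: (model parameters, stored T-D samples).
Frame = Tuple[List[float], List[float]]


def synthesize(params: List[float], n_samples: int) -> List[float]:
    """Toy synthesizer: one sinusoid from (amplitude, cycles per frame)."""
    amp, cycles = params
    return [amp * math.sin(2 * math.pi * cycles * n / n_samples)
            for n in range(n_samples)]


def smooth(params: List[float], neighbor: List[float]) -> List[float]:
    """Assumed modification: average the parameters with a neighbor's."""
    return [(p + q) / 2 for p, q in zip(params, neighbor)]


def render(frames: List[Frame], flagged: Set[int]) -> List[float]:
    """Concatenate T-D data, resynthesizing only the flagged frames."""
    out: List[float] = []
    for i, (params, stored) in enumerate(frames):
        if i in flagged:
            # The first frame has no predecessor, so it is smoothed
            # against itself and its parameters stay unchanged.
            neighbor = frames[i - 1][0] if i > 0 else params
            out.extend(synthesize(smooth(params, neighbor), len(stored)))
        else:
            # No modification needed: reuse the stored T-D data as-is.
            out.extend(stored)
    return out
```

The point of the design is that unflagged frames bypass synthesis entirely, so their stored waveforms reach the output unaltered.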
Specification