Voice personalization of speech synthesizer
Abstract
A speech synthesizer is personalized to sound like, or mimic the speech characteristics of, an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short sample of speech, and the system modifies the base synthesis parameters so that they more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters. To adapt the parameters with only a small amount of enrollment data, an eigenspace is constructed and used to constrain the position of the new speaker, so that context independent parameters not provided by the new speaker can be estimated.
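The eigenspace constraint described above can be illustrated with a minimal sketch. It assumes that each training speaker's speaker dependent parameters are flattened into a fixed-length "supervector," and that the eigenspace is built with principal component analysis. The enrollment data then supplies values for only some supervector dimensions, and the new speaker's position in the eigenspace is fitted by ordinary least squares on those dimensions. The function names, the PCA construction, and the least-squares fit are all illustrative assumptions, not the patent's stated procedure. Reconstructing from the fitted weights fills in the dimensions the speaker never provided.

```python
import numpy as np


def build_eigenspace(train_supervectors, n_components):
    """Hypothetical helper: PCA over training-speaker supervectors.

    train_supervectors: array of shape (n_speakers, D), one row per speaker.
    Returns (mean, basis), where basis has shape (D, n_components) and its
    columns are orthonormal principal directions.
    """
    X = np.asarray(train_supervectors, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def place_new_speaker(mean, basis, observed_idx, observed_values):
    """Hypothetical helper: fit eigenspace weights from a partial supervector.

    Only the dimensions in observed_idx are used for the least-squares fit.
    The returned full supervector also contains estimates for every
    unobserved dimension, because it is constrained to lie in the eigenspace.
    """
    idx = np.asarray(observed_idx)
    E = basis[idx]  # rows of the basis for the observed dimensions
    residual = np.asarray(observed_values, dtype=float) - mean[idx]
    weights, *_ = np.linalg.lstsq(E, residual, rcond=None)
    return mean + basis @ weights
```

Ordinary least squares matches the maximum likelihood fit only when every observed dimension has equal, independent Gaussian noise. A variance-weighted version is a natural refinement.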
21 Claims
1. A method of personalizing a speech synthesizer, comprising:
obtaining a corpus of speech data expressed as a set of parameters useable by said speech synthesizer to generate synthesized speech;
decomposing said set of parameters into a set of speaker dependent parameters and a set of speaker independent parameters;
obtaining enrollment data from a new speaker and using said enrollment data to adapt said speaker dependent parameters and thereby generate adapted speaker dependent parameters by selecting a supervector in an eigenspace trained on speaker dependent parameters of multiple training speakers, said supervector selected to be most consistent with the enrollment data;
combining said speaker independent parameters and said adapted speaker dependent parameters to construct personalized synthesis parameters for use by said speech synthesizer in generating synthesized speech.
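Claim 1 does not specify how the parameter set is split or recombined. The abstract pairs speaker dependent parameters with context independent ones and speaker independent parameters with context dependent ones. One plausible reading of that pairing is an additive split: each phoneme's context independent mean is treated as speaker dependent, and each context's offset from that mean is treated as speaker independent. The sketch below assumes that additive model and a hypothetical representation, a dict keyed by `(phoneme, context)` pairs. Both assumptions are for illustration only.

```python
from collections import defaultdict

import numpy as np


def decompose(params):
    """Hypothetical additive split of context dependent parameter vectors.

    params: {(phoneme, context): vector}
    Returns (sd, si):
      sd[phoneme]            context independent mean (speaker dependent part)
      si[(phoneme, context)] offset from that mean (speaker independent part)
    """
    groups = defaultdict(list)
    for (phoneme, _context), vec in params.items():
        groups[phoneme].append(np.asarray(vec, dtype=float))
    sd = {ph: np.mean(vecs, axis=0) for ph, vecs in groups.items()}
    si = {key: np.asarray(vec, dtype=float) - sd[key[0]]
          for key, vec in params.items()}
    return sd, si


def combine(sd, si):
    """Rebuild context dependent parameters from (possibly adapted) sd and si."""
    return {key: sd[key[0]] + offset for key, offset in si.items()}
```

To personalize, only `sd` is replaced with adapted values for the new speaker. `combine` then carries the unchanged context dependent offsets over to the new speaker's voice.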
10. A method of constructing a personalized speech synthesizer, comprising:
providing a base synthesizer employing a predetermined synthesis method and having an initial set of parameters used by said synthesis method to generate synthesized speech;
representing said initial set of parameters as speaker dependent parameters and speaker independent parameters;
obtaining enrollment data from a speaker; and
using said enrollment data to modify said speaker dependent parameters and thereby personalize said base synthesizer to mimic speech qualities of said speaker by selecting a supervector in an eigenspace trained on speaker dependent parameters of multiple training speakers, said supervector selected to be most consistent with the enrollment data.
11. A personalized speech synthesizer comprising:
a synthesis processor having a set of instructions for performing a predefined synthesis method that operates upon a data store of synthesis parameters represented as speaker dependent parameters and speaker independent parameters;
a memory containing a data store of synthesis parameters represented as speaker dependent parameters and speaker independent parameters;
an input for providing a set of enrollment data from a given speaker; and
an adaptation module receptive of said enrollment data that adapts said speaker dependent parameters to personalize said parameters to said given speaker by selecting a supervector in an eigenspace trained on speaker dependent parameters of multiple training speakers, said supervector selected to be most consistent with said enrollment data.
17. A speech synthesis system comprising:
a speech synthesizer that performs a predefined synthesis method by operating upon a data store of decomposed speaker independent synthesis parameters and speaker dependent synthesis parameters;
a personalizer receptive of enrollment data from a given speaker that modifies said speaker dependent synthesis parameters to personalize the sound of the synthesizer to mimic said given speaker's speech, wherein said personalizer extracts speaker dependent parameters from said synthesis parameters and then modifies said speaker dependent parameters using said enrollment data by constraining context independent parameters extracted from said enrollment data to an eigenspace trained on speaker dependent parameters of multiple training speakers using a maximum likelihood technique, thereby estimating context independent parameters of said given speaker by selecting a supervector in the eigenspace that is most consistent with the enrollment data.
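Claim 17 names a maximum likelihood technique but does not spell it out. The sketch below shows one simplified case under stated assumptions. Each enrollment observation is taken to be a vector of values for a subset of supervector dimensions. It is modeled as Gaussian, with a mean given by the eigenspace reconstruction `mean + basis @ w` and a fixed, known diagonal variance. Under that model, setting the gradient of the log-likelihood with respect to `w` to zero gives a closed-form, variance-weighted least-squares solution. The function names and observation format are hypothetical. A full HMM-based estimator would additionally weight each observation by its model-state occupancy, and this sketch omits that.

```python
import numpy as np


def ml_eigen_weights(mean, basis, variances, observations):
    """Hypothetical ML estimate of eigenspace weights w (simplified model).

    observations: iterable of (idx, x), where x holds observed values for
    supervector dimensions idx, modeled as
    x ~ N(mean[idx] + basis[idx] @ w, diag(variances[idx])).
    Solves (sum E^T P E) w = sum E^T P (x - mean[idx]), with P = diag(1/var).
    """
    mean = np.asarray(mean, dtype=float)
    variances = np.asarray(variances, dtype=float)
    k = basis.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for idx, x in observations:
        idx = np.asarray(idx)
        E = basis[idx]                   # basis rows for the observed dims
        precision = 1.0 / variances[idx]  # diagonal of P
        A += E.T @ (precision[:, None] * E)
        b += E.T @ (precision * (np.asarray(x, dtype=float) - mean[idx]))
    return np.linalg.solve(A, b)


def estimate_supervector(mean, basis, variances, observations):
    """Full supervector for the new speaker, including unobserved dimensions."""
    w = ml_eigen_weights(mean, basis, variances, observations)
    return np.asarray(mean, dtype=float) + basis @ w
```

Dimensions with lower variance pull the estimate harder than noisy ones. When all variances are equal and each dimension is observed once, the result reduces to the unweighted least-squares fit.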
Specification