Voice personalization of speech synthesizer
Abstract
The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context-independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters. To adapt the parameters with a small amount of enrollment data, an eigenspace is constructed and used to constrain the position of the new speaker so that context independent parameters not provided by the new speaker may be estimated.
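The eigenspace constraint described in the abstract can be illustrated with a minimal sketch. It assumes the eigenspace is already available as a mean vector plus a few basis vectors, and it uses an ordinary least-squares fit over the parameters the new speaker actually supplied; this fit is a stand-in chosen for illustration, not necessarily the estimation technique the patent uses. The names `solve_linear`, `estimate_speaker`, `mean`, `basis`, and `observed` are hypothetical.

```python
def solve_linear(a, b):
    """Solve the small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("enrollment data does not constrain every weight")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_speaker(mean, basis, observed):
    """Place a new speaker in the eigenspace from partial enrollment data.

    mean     -- mean parameter vector of the training speakers (assumed given)
    basis    -- list of eigenspace basis vectors, each as long as `mean`
    observed -- dict mapping parameter index -> value measured from the speaker
    Returns (weights, full_vector); full_vector also covers unobserved indices.
    """
    idx = sorted(observed)
    k = len(basis)
    resid = [observed[i] - mean[i] for i in idx]
    # Normal equations of the least-squares fit, restricted to observed indices.
    gram = [[sum(basis[p][i] * basis[q][i] for i in idx) for q in range(k)]
            for p in range(k)]
    rhs = [sum(basis[p][i] * r for i, r in zip(idx, resid)) for p in range(k)]
    weights = solve_linear(gram, rhs)
    full = [mean[d] + sum(weights[p] * basis[p][d] for p in range(k))
            for d in range(len(mean))]
    return weights, full


# Toy data: four parameters, two basis vectors, only indices 0 and 1 observed.
mean = [1.0, 2.0, 3.0, 4.0]
basis = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
weights, full = estimate_speaker(mean, basis, {0: 1.5, 1: 0.0})
print(weights, full)
```

Because the speaker is restricted to a point in the low-dimensional eigenspace, two observed values are enough here to fix both weights, and the remaining two parameters follow from the reconstruction.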
22 Claims
1. A method of personalizing a speech synthesizer, comprising:
obtaining a corpus of speech data expressed as a set of parameters useable by said speech synthesizer to generate synthesized speech;
decomposing said set of parameters into a set of speaker dependent parameters and a set of speaker independent parameters;
obtaining enrollment data from a new speaker and using said enrollment data to adapt said speaker dependent parameters and thereby generate adapted speaker dependent parameters;
combining said speaker independent parameters and said adapted speaker dependent parameters to construct personalized synthesis parameters for use by said speech synthesizer in generating synthesized speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method of constructing a personalized speech synthesizer, comprising:
providing a base synthesizer employing a predetermined synthesis method and having an initial set of parameters used by said synthesis method to generate synthesized speech;
representing said initial set of parameters as speaker dependent parameters and speaker independent parameters;
obtaining enrollment data from a speaker; and
using said enrollment data to modify said speaker dependent parameters and thereby personalize said base synthesizer to mimic speech qualities of said speaker.
11. A personalized speech synthesizer comprising:
a synthesis processor having a set of instructions for performing a predefined synthesis method that operates upon a data store of synthesis parameters represented as speaker dependent parameters and speaker independent parameters;
a memory containing a data store of synthesis parameters represented as speaker dependent parameters and speaker independent parameters;
an input for providing a set of enrollment data from a given speaker; and
an adaptation module receptive of said enrollment data that operates upon said speaker dependent parameters to personalize said parameters to said given speaker.
Dependent claims: 12, 13, 14, 15, 16, 18, 19, 20, 21, 22
17. A speech synthesis system comprising:
a speech synthesizer that performs a predefined synthesis method by operating upon a data store of synthesis parameters;
a personalizer receptive of enrollment data from a given speaker that modifies at least a portion of said synthesis parameters to personalize the sound of the synthesizer to mimic said given speaker's speech.
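The decompose, adapt, and combine steps recited in claim 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: each synthesis parameter is taken to be a single number keyed by a sound unit and its context, the decomposition is assumed to be additive (a per-unit centroid treated as the speaker dependent part, plus a per-context offset treated as the speaker independent part), and adaptation simply replaces a centroid with the mean of the enrollment values. The function names and data layout are hypothetical and are not taken from the claims.

```python
from collections import defaultdict
from statistics import fmean


def decompose(corpus):
    """Split corpus parameters into per-unit centroids and per-context offsets.

    corpus -- dict mapping (unit, context) -> parameter value
    """
    by_unit = defaultdict(list)
    for (unit, _context), value in corpus.items():
        by_unit[unit].append(value)
    # Assumed speaker dependent part: one context-independent centroid per unit.
    centroids = {unit: fmean(values) for unit, values in by_unit.items()}
    # Assumed speaker independent part: what each context adds to the centroid.
    offsets = {key: value - centroids[key[0]] for key, value in corpus.items()}
    return centroids, offsets


def adapt(centroids, enrollment):
    """Replace centroids for units the new speaker actually produced.

    enrollment -- dict mapping unit -> list of values measured from the speaker
    """
    adapted = dict(centroids)
    for unit, values in enrollment.items():
        if unit in adapted and values:
            adapted[unit] = fmean(values)
    return adapted


def combine(adapted, offsets):
    """Rebuild personalized parameters: adapted centroid plus original offset."""
    return {key: adapted[key[0]] + offset for key, offset in offsets.items()}


# Toy corpus: a formant-like value for two units in three contexts.
corpus = {("a", "b_a_t"): 700.0, ("a", "c_a_p"): 740.0, ("i", "s_i_t"): 300.0}
centroids, offsets = decompose(corpus)
adapted = adapt(centroids, {"a": [800.0, 820.0]})
personalized = combine(adapted, offsets)
print(personalized)
```

In this sketch a unit missing from the enrollment data (here "i") simply keeps its base centroid; a fuller treatment would estimate it from the enrollment data that is available.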
Specification