Voice personalization of speech synthesizer

US 6,970,820 B2
Filed: 02/26/2001
Issued: 11/29/2005
Est. Priority Date: 02/26/2001
Status: Expired due to Term

Abstract

The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context-independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters. To adapt the parameters with a small amount of enrollment data, an eigenspace is constructed and used to constrain the position of the new speaker so that context independent parameters not provided by the new speaker may be estimated.
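
The decompose, adapt, and recombine flow described in the abstract can be illustrated with a minimal sketch. It assumes an additive split that the source does not spell out: the context-independent part of each phoneme is taken as the mean over its contexts, and the context-dependent part is the per-context offset from that mean. The names `decompose`, `recombine`, the context labels, and the formant values are hypothetical, and the adaptation step is replaced by hand-picked numbers.

```python
from typing import Dict, List, Tuple

Vector = List[float]
UnitKey = Tuple[str, str]  # (phoneme, context label); hypothetical naming


def decompose(units: Dict[UnitKey, Vector]):
    """Split per-context parameters into per-phoneme means and per-context offsets."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for (phone, _context), vec in units.items():
        acc = sums.setdefault(phone, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[phone] = counts.get(phone, 0) + 1
    # Context-independent part: treated here as speaker dependent.
    ci = {p: [s / counts[p] for s in acc] for p, acc in sums.items()}
    # Context-dependent part: treated here as speaker independent (assumption).
    cd = {key: [v - m for v, m in zip(vec, ci[key[0]])] for key, vec in units.items()}
    return ci, cd


def recombine(ci, cd):
    """Add the (possibly adapted) per-phoneme means back onto the offsets."""
    return {key: [m + d for m, d in zip(ci[key[0]], off)] for key, off in cd.items()}


# Base synthesizer: two formant values (Hz) for the vowel "a" in two contexts.
base = {("a", "b_a_t"): [700.0, 1200.0], ("a", "k_a_t"): [740.0, 1300.0]}
ci, cd = decompose(base)

# Stand-in for the adaptation step: values assumed to come from enrollment data.
adapted_ci = {"a": [800.0, 1400.0]}
personalized = recombine(adapted_ci, cd)
print(personalized[("a", "k_a_t")])  # [820.0, 1450.0]
```

Only the small per-phoneme table changes for the new speaker; the per-context offsets are reused unchanged, which is why a short quantity of enrollment speech can be enough.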

21 Claims

1. A method of personalizing a speech synthesizer, comprising:
- obtaining a corpus of speech data expressed as a set of parameters useable by said speech synthesizer to generate synthesized speech;
  
  decomposing said set of parameters into a set of speaker dependent parameters and a set of speaker independent parameters;
  
  obtaining enrollment data from a new speaker and using said enrollment data to adapt said speaker dependent parameters and thereby generate adapted speaker dependent parameters by selecting a supervector in an eigenspace trained on speaker dependent parameters of multiple training speakers, said supervector selected to be most consistent with the enrollment data;
  
  combining said speaker independent parameters and said adapted speaker dependent parameters to construct personalized synthesis parameters for use by said speech synthesizer in generating synthesized speech.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1 wherein the number of speaker independent parameters exceeds the number of speaker dependent parameters.
  - 3. The method of claim 1 wherein said decomposing step is performed by identifying context dependent information and using said context dependent information to represent said speaker independent parameters.
  - 4. The method of claim 1 wherein said decomposing step is performed by identifying context independent information and using said context independent information to represent said speaker dependent parameters.
  - 5. The method of claim 1 wherein said speech data comprise a set of frequency parameters corresponding to formant trajectories associated with human speech.
  - 6. The method of claim 1 wherein said speech data comprise a set of time domain parameters corresponding to glottal source information associated with human speech.
  - 7. The method of claim 1 wherein said speech data comprise a set of parameters corresponding to prosody information associated with human speech.
  - 8. The method of claim 1 further comprising constructing an eigenspace using speaker dependent parameters from a population of training speakers and using said eigenspace and said enrollment data to adapt said speaker dependent parameters.
  - 9. The method of claim 1 further comprising constructing an eigenspace using speaker dependent parameters from a population of training speakers and using said eigenspace and said enrollment data to adapt said speaker dependent parameters if said enrollment data alone does not represent all phonemes used by the synthesizer.
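
Claims 8 and 9 refer to constructing an eigenspace from the speaker dependent parameters of a population of training speakers. The following is a minimal sketch of one way to do that, assuming each training speaker is represented by a fixed-length supervector and that the eigenspace is obtained by principal component analysis (computed here with power iteration and deflation). The claims do not prescribe this particular procedure; the function names and the toy data are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_eigenspace(supervectors: List[Vector], k: int,
                     iters: int = 200) -> Tuple[Vector, List[Vector]]:
    """Return the mean supervector and up to k orthonormal principal directions."""
    n = len(supervectors)
    mean = [sum(col) / n for col in zip(*supervectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in supervectors]
    dim = len(mean)
    basis: List[Vector] = []
    for j in range(k):
        vec = [1.0 / (i + j + 1) for i in range(dim)]  # arbitrary deterministic start
        for _ in range(iters):
            # Multiply by the scatter matrix sum(c c^T) without forming it.
            new = [0.0] * dim
            for c in centered:
                w = dot(c, vec)
                new = [a + w * x for a, x in zip(new, c)]
            # Deflation: remove directions that were already found.
            for b in basis:
                p = dot(new, b)
                new = [a - p * x for a, x in zip(new, b)]
            norm = dot(new, new) ** 0.5
            if norm < 1e-12:
                return mean, basis  # no variance left to explain
            vec = [a / norm for a in new]
        basis.append(vec)
    return mean, basis


def project(x: Vector, mean: Vector, basis: List[Vector]) -> Vector:
    """Coordinates of x in the eigenspace."""
    centered = [a - m for a, m in zip(x, mean)]
    return [dot(centered, b) for b in basis]


def reconstruct(weights: Vector, mean: Vector, basis: List[Vector]) -> Vector:
    """Supervector in the eigenspace for the given coordinates."""
    return [m + sum(w * b[i] for w, b in zip(weights, basis)) for i, m in enumerate(mean)]


# Three toy training speakers, one three-dimensional supervector each.
training = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
mean, basis = build_eigenspace(training, k=1)
weights = project([4.0, 8.0, 11.0], mean, basis)
nearest = reconstruct(weights, mean, basis)  # closest point inside the eigenspace
```

When the enrollment data covers every coordinate of the supervector, plain projection as in `project` picks the point of the eigenspace closest to the new speaker. The case where some sound units are missing needs a fit restricted to the observed coordinates.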

10. A method of constructing a personalized speech synthesizer, comprising:
- providing a base synthesizer employing a predetermined synthesis method and having an initial set of parameters used by said synthesis method to generate synthesized speech;
  
  representing said initial set of parameters as speaker dependent parameters and speaker independent parameters;
  
  obtaining enrollment data from a speaker; and
  
  using said enrollment data to modify said speaker dependent parameters and thereby personalize said base synthesizer to mimic speech qualities of said speaker by selecting a supervector in an eigenspace trained on speaker dependent parameters of multiple training speakers, said supervector selected to be most consistent with the enrollment data.

11. A personalized speech synthesizer comprising:
- a synthesis processor having a set of instructions for performing a predefined synthesis method that operates upon a data store of synthesis parameters represented as speaker dependent parameters and speaker independent parameters;
  
  a memory containing a data store of synthesis parameters represented as speaker dependent parameters and speaker independent parameters;
  
  an input for providing a set of enrollment data from a given speaker; and
  
  an adaptation module receptive of said enrollment data that adapts said speaker dependent parameters to personalize said parameters to said given speaker by selecting a supervector in an eigenspace trained on speaker dependent parameters of multiple training speakers, said supervector selected to be most consistent with said enrollment data.
- Dependent claims (12, 13, 14, 15, 16):
  - 12. The synthesizer of claim 11 wherein said synthesis parameters are context independent parameters.
  - 13. The synthesizer of claim 11 wherein said synthesis parameters are context dependent parameters.
  - 14. The synthesizer of claim 11 wherein said input includes a microphone for acquisition of said enrollment data from provided speech utterances of said given speaker.
  - 15. The synthesizer of claim 11 wherein said adaptation module includes an estimation system employing an eigenspace developed from a training corpus.
  - 16. The synthesizer of claim 15 wherein said enrollment data comprises extracted parameters taken from speech utterances of said given speaker and wherein said estimation system estimates sound units not found in said enrollment data by constraining said extracted parameters from the speech utterance of said given speaker to said eigenspace.
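
Claim 16 describes estimating sound units that are absent from the enrollment data by constraining the extracted parameters to the eigenspace. A minimal sketch of that idea follows, assuming the eigenspace is given as a mean supervector plus a few basis vectors and that "most consistent" is read as a least-squares fit over the observed coordinates only. That reading, the function names, and the numbers are assumptions for illustration.

```python
from typing import Dict, List

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("enrollment data does not determine all eigenspace weights")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_supervector(observed: Dict[int, float], mean: Vector,
                         basis: List[Vector]) -> Vector:
    """Fit eigenspace weights on observed coordinates, then fill in every coordinate."""
    k = len(basis)
    # Normal equations restricted to the coordinates present in the enrollment data.
    a = [[sum(basis[j][i] * basis[l][i] for i in observed) for l in range(k)]
         for j in range(k)]
    b = [sum(basis[j][i] * (x - mean[i]) for i, x in observed.items()) for j in range(k)]
    w = solve(a, b)
    return [m + sum(wj * e[i] for wj, e in zip(w, basis)) for i, m in enumerate(mean)]


# Supervector layout (hypothetical): [F1 of /a/, F2 of /a/, F1 of /i/, F2 of /i/] in Hz.
mean = [500.0, 1500.0, 300.0, 2200.0]
basis = [[1.0, 0.0, 1.0, 0.0],   # direction that shifts F1 of both vowels together
         [0.0, 1.0, 0.0, 1.0]]   # direction that shifts F2 of both vowels together
observed = {0: 520.0, 1: 1450.0}  # the new speaker only uttered /a/
print(estimate_supervector(observed, mean, basis))  # [520.0, 1450.0, 320.0, 2150.0]
```

The values for /i/ were never observed, yet they are filled in because the eigenspace ties them to the observed values for /a/. If the observed coordinates do not constrain every basis direction, `solve` raises an error instead of guessing.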

17. A speech synthesis system comprising:
- a speech synthesizer that performs a predefined synthesis method by operating upon a data store of decomposed speaker independent synthesis parameters and speaker dependent synthesis parameters;
  
  a personalizer receptive of enrollment data from a given speaker that modifies said speaker dependent synthesis parameters to personalize the sound of the synthesizer to mimic said given speaker's speech, wherein said personalizer extracts speaker dependent parameters from said synthesis parameters and then modifies said speaker dependent parameters using said enrollment data by constraining context independent parameters extracted from said enrollment data to an eigenspace trained on speaker dependent parameters of multiple training speakers using a maximum likelihood technique, thereby estimating context independent parameters of said given speaker by selecting a supervector in the eigenspace that is most consistent with the enrollment data.
- Dependent claims (18, 19, 20, 21):
  - 18. The system of claim 17 wherein said personalizer decomposes said synthesis parameters into speaker dependent parameters and speaker independent parameters and then modifies said speaker dependent parameters using said enrollment data, and said speech synthesizer performs speech synthesis by combining said speaker independent parameters with modified speaker dependent parameters.
  - 19. The system of claim 17 further comprising a parameter estimation system for augmenting said enrollment data to supply estimates of parameters corresponding to sound units that are missing in said enrollment data.
  - 20. The system of claim 19 wherein said estimation system employs an eigenspace trained upon a population of training speakers.
  - 21. The system of claim 19 wherein said estimation system employs an eigenspace trained upon a population of training speakers and uses said eigenspace to supply said estimates of parameters by constraining said enrollment data to said eigenspace.
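
Claim 17 names a maximum likelihood technique without giving its form. As a rough illustration only, the sketch below assumes each supervector coordinate is modeled by a Gaussian with a known variance, that the enrollment data supplies a frame count and sample mean per observed coordinate, and that the eigenspace has a single direction. Under those assumptions, setting the derivative of the log-likelihood to zero gives a closed-form weight in which well-observed, low-variance coordinates count more. The name `ml_weight` and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def ml_weight(stats: Dict[int, Tuple[int, float]], mean: List[float],
              eigenvoice: List[float], variance: List[float]) -> float:
    """Maximum likelihood weight along one eigenspace direction.

    stats maps a coordinate index to (frame_count, sample_mean) from enrollment data.
    The modeled mean of coordinate i is mean[i] + w * eigenvoice[i].
    """
    num = 0.0
    den = 0.0
    for i, (count, sample_mean) in stats.items():
        precision = count / variance[i]  # more frames or less variance: more trust
        num += precision * eigenvoice[i] * (sample_mean - mean[i])
        den += precision * eigenvoice[i] ** 2
    if den == 0.0:
        raise ValueError("enrollment data carries no information about this direction")
    return num / den


mean = [0.0, 0.0]
eigenvoice = [1.0, 1.0]
variance = [1.0, 4.0]                 # the second coordinate is noisier
stats = {0: (10, 2.0), 1: (10, 6.0)}  # ten frames observed for each coordinate
w = ml_weight(stats, mean, eigenvoice, variance)
print(w)  # 2.8
```

An unweighted least-squares fit on the same data would give 4.0; the maximum likelihood estimate of 2.8 sits closer to the offset suggested by the lower-variance coordinate.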

Current Assignee: Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Junqua, Jean-Claude; Perronnin, Florent; Nguyen, Patrick; Kuhn, Roland
Primary Examiner(s): McFadden, Susan
Assistant Examiner(s): Vo, Huyen X.
Application Number: US09/792,928
Publication Number: US 20020120450A1
Time in Patent Office: 1,737 Days
Field of Search: 704/258, 704/246, 704/261, 704/266, 704/250, 704/262, 704/245, 704/255, 704/274, 395/2, 395/25.9, 705/17, 715/530, 359/430
US Class Current: 704/258
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 2021/0135 Voice conversion or morphing
