Cross-lingual speaker adaptation for multi-lingual speech synthesis
Abstract
The subject matter of the disclosure is embodied in a method that includes receiving input speech data from a speaker in a first language, and estimating, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data. The method also includes accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language. The method further includes modifying the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and generating speech data in the second language using the speaker-specific speech model.
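As a rough illustration of the modification step described in the abstract, the sketch below applies a speaker transform to the cepstral mean vectors of a speaker-independent model to obtain a speaker-specific model. It assumes the transform is affine (a matrix `A` plus a bias `b`) and that the model is a plain mapping from state names to mean vectors. The function name `apply_speaker_transform`, the state names, and all numeric values are hypothetical and are not taken from the patent.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def apply_speaker_transform(
    si_model: Dict[str, Vector], A: Matrix, b: Vector
) -> Dict[str, Vector]:
    """Return speaker-specific cepstral means mu' = A * mu + b for each state.

    Assumption: the speaker transform is affine; the source also allows
    non-linear transforms, which this sketch does not cover.
    """
    adapted = {}
    for state, mu in si_model.items():
        adapted[state] = [
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ]
    return adapted


# Toy speaker-independent model for the second language (hypothetical values).
si_model = {"a_1": [1.0, 2.0], "a_2": [0.5, -1.0]}

# Toy transform, assumed to have been estimated from first-language speech.
A = [[1.0, 0.0], [0.0, 2.0]]
b = [0.5, 0.0]

speaker_model = apply_speaker_transform(si_model, A, b)
print(speaker_model["a_1"])  # [1.5, 4.0]
```

The speaker-independent model is left unmodified, so the same second-language model could be reused with transforms estimated for other speakers.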
26 Claims
1. A method comprising:
receiving input speech data from a speaker in a first language;
estimating, by a processor, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data;
accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language;
modifying, by a processor, cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model; and
generating speech data in the second language using the speaker-specific speech model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
a speech synthesis engine including a processor, the speech synthesis engine configured to:
receive input speech data from a speaker in a first language,
estimate, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data,
access a speaker-independent speech model for generating speech data in a second language that is different from the first language,
modify the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and
generate speech data in the second language using the speaker-specific speech model.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer program product comprising computer readable instructions encoded on a storage device, the instructions configured to cause one or more processors to:
receive input speech data from a speaker in a first language,
estimate, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data,
access a speaker-independent speech model for generating speech data in a second language that is different from the first language,
modify cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model, and
generate speech data in the second language using the speaker-specific speech model.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
25. A method comprising:
receiving input speech data from a speaker in a first language;
estimating, by a processor, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data, wherein the speaker transform is one of a linear transform and a non-linear transform;
accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language;
modifying the speaker-independent speech model using the estimated speaker transform coefficients to obtain a speaker-specific speech model; and
generating speech data in the second language using the speaker-specific speech model.
Dependent claims: 26
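The estimating step recited in the claims above can be pictured with a deliberately simplified sketch. It assumes a linear transform restricted to one scale and one bias per cepstral dimension, fitted by ordinary least squares between mean vectors of a universal model and the corresponding means observed for the speaker. This is a stand-in chosen for brevity and is not presented as the estimation procedure of the patent. The function name `estimate_diagonal_transform` and all values are hypothetical.

```python
from typing import List, Tuple


def estimate_diagonal_transform(
    universal_means: List[List[float]],
    speaker_means: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Fit y = scale * x + bias per cepstral dimension by least squares.

    Assumption: row i of both inputs refers to the same model state, so the
    rows can be paired directly.
    """
    n = len(universal_means)
    dims = len(universal_means[0])
    scales, biases = [], []
    for d in range(dims):
        xs = [m[d] for m in universal_means]
        ys = [m[d] for m in speaker_means]
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        var_x = sum((x - x_bar) ** 2 for x in xs)
        cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        # Fall back to an identity scale when the dimension has no spread.
        scale = cov_xy / var_x if var_x > 0 else 1.0
        scales.append(scale)
        biases.append(y_bar - scale * x_bar)
    return scales, biases


# Toy paired means (hypothetical): universal model vs. the input speaker.
universal_means = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
speaker_means = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

scales, biases = estimate_diagonal_transform(universal_means, speaker_means)
print(scales, biases)  # [2.0, 2.0] [1.0, 0.0]
```

Because the coefficients are expressed relative to a universal model rather than to a model of the first language alone, the same scale and bias values could in principle be applied to a speaker-independent model of a different language.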
Specification