Frame mapping approach for cross-lingual voice transformation
Abstract
Frame mapping-based cross-lingual voice transformation may transform a target speech corpus in a particular language into a transformed target speech corpus that remains recognizable and retains the voice characteristics of the target speaker who provided the target speech corpus. Formant-based frequency warping is performed on the fundamental frequencies and the linear predictive coding (LPC) spectrums of source speech waveforms in a first language to produce transformed fundamental frequencies and transformed LPC spectrums. The transformed fundamental frequencies and the transformed LPC spectrums are then used to generate warped parameter trajectories. The warped parameter trajectories are in turn used to transform the target speech waveforms of the target speaker, which are in a second language, to produce transformed target speech waveforms with voice characteristics of the first language that nevertheless retain at least some voice characteristics of the target speaker.
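The formant-based frequency warping step described in the abstract can be illustrated with a minimal sketch. It assumes a piecewise-linear warping function anchored at paired source and target formant frequencies, applied to a spectral envelope sampled uniformly from 0 Hz to the Nyquist frequency. The function names, the anchor construction, and the use of linear interpolation are illustrative assumptions and are not taken from the patent; the handling of fundamental frequencies is not shown.

```python
import numpy as np


def formant_warp_function(src_formants, tgt_formants, nyquist):
    """Build a piecewise-linear map from source to warped frequencies (Hz).

    Hypothetical helper: the map passes through (0, 0), each
    (source formant, target formant) pair, and (nyquist, nyquist).
    """
    src = np.concatenate(([0.0], np.asarray(src_formants, dtype=float), [nyquist]))
    tgt = np.concatenate(([0.0], np.asarray(tgt_formants, dtype=float), [nyquist]))
    if src.shape != tgt.shape:
        raise ValueError("source and target formant lists must have equal length")
    if np.any(np.diff(src) <= 0) or np.any(np.diff(tgt) <= 0):
        raise ValueError("formants must be strictly increasing and below Nyquist")
    return lambda f: np.interp(f, src, tgt)


def warp_spectrum(envelope, src_formants, tgt_formants, sample_rate):
    """Warp a spectral envelope so source formants move to target formants.

    The envelope is assumed to be sampled uniformly on [0, sample_rate / 2].
    """
    envelope = np.asarray(envelope, dtype=float)
    nyquist = sample_rate / 2.0
    freqs = np.linspace(0.0, nyquist, len(envelope))
    # The warped envelope at output frequency f equals the original envelope
    # at the inverse-warped frequency. For a monotonic piecewise-linear map,
    # the inverse is obtained by swapping the anchor lists.
    inverse = formant_warp_function(tgt_formants, src_formants, nyquist)
    return np.interp(inverse(freqs), freqs, envelope)
```

With a 16 kHz sample rate and an 81-point envelope (100 Hz spacing), calling `warp_spectrum(envelope, [500.0], [600.0], 16000)` on an envelope that peaks at 500 Hz moves that peak to 600 Hz, while passing identical source and target formants leaves the envelope unchanged.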
51 Citations
20 Claims
1. A computer-readable memory storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
- performing formant-based frequency warping on fundamental frequencies and linear predictive coding (LPC) spectrums of source speech waveforms in a first language to produce transformed fundamental frequencies and transformed LPC spectrums;
- generating warped parameter trajectories based at least on the transformed fundamental frequencies and the transformed LPC spectrums; and
- producing transformed target speech waveforms with voice characteristics of the first language that retain at least some voice characteristics of a target speaker using the warped parameter trajectories and features from target speech waveforms of the target speaker in a second language.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-implemented method, comprising:
under control of one or more computing systems configured with executable instructions,
- performing formant-based frequency warping on fundamental frequencies and coding spectrums of source speech waveforms in a first language to produce transformed fundamental frequencies and transformed coding spectrums;
- generating warped parameter trajectories based at least on the transformed fundamental frequencies and the transformed coding spectrums; and
- producing transformed target speech waveforms with voice characteristics of the first language that retain at least some voice characteristics of a target speaker using the warped parameter trajectories and features from target speech waveforms of the target speaker in the second language;
- training models based at least on the transformed speech target waveforms; and
- generating synthesized speech for an input text using the trained models.
Dependent claims: 11, 12, 13, 14, 15
16. A system, comprising:
- one or more processors; and
- a memory that includes a plurality of computer-executable components, the plurality of computer-executable components comprising:
- a frequency warping component to perform formant-based frequency warping on fundamental frequencies and coding spectrums of source speech waveforms in a first language to produce transformed fundamental frequencies and transformed coding spectrums;
- a trajectory generation component to generate warped parameter trajectories based at least on the transformed fundamental frequencies and the transformed coding spectrums; and
- a trajectory tiling component to produce transformed target speech waveforms with voice characteristics of the first language that retain at least some voice characteristics of a target speaker using the warped parameter trajectories and features from target speech waveforms of the target speaker in the second language.
Dependent claims: 17, 18, 19, 20
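As a rough illustration of how warped parameter trajectories might be combined with features from the target speaker's speech, the sketch below maps each frame of a warped trajectory to the closest frame in a pool of target-speaker feature frames. The claims do not state a distance measure or selection rule; the unweighted Euclidean distance, the function name `map_frames`, and the array layout are assumptions made for illustration only. A full system would also need to assemble waveform segments from the selected frames, which is not shown.

```python
import numpy as np


def map_frames(warped_trajectory, target_frames):
    """Return, for each warped frame, the index of the nearest target frame.

    warped_trajectory: array of shape (T, D), one feature vector per frame.
    target_frames:     array of shape (N, D), target-speaker feature frames.
    Nearest is measured by squared Euclidean distance (an assumption).
    """
    warped = np.asarray(warped_trajectory, dtype=float)
    target = np.asarray(target_frames, dtype=float)
    if warped.ndim != 2 or target.ndim != 2 or warped.shape[1] != target.shape[1]:
        raise ValueError("inputs must be 2-D with matching feature dimension")
    # Pairwise squared distances with shape (T, N), computed by broadcasting.
    diff = warped[:, None, :] - target[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    return np.argmin(dist2, axis=1)
```

The returned indices can be used to look up the corresponding target-speaker frames, for example `target_frames[map_frames(warped_trajectory, target_frames)]`.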
Specification