Codebook-less speech conversion method and system
Abstract
Speech conversion can be used to transform an utterance by a source speaker so that it matches the speech characteristics of a target speaker, for applications such as dubbing a motion picture. During a training phase, utterances of the same sentences by both the target speaker and the source speaker are force-aligned according to the phonemes within the sentences. A transformation, or mapping, is trained so that each frame of the source utterances is mapped to the corresponding frame of the target utterances. After the training phase is complete, a source utterance is divided into frames, which are transformed into target frames. Once target frames have been created from the entire sequence of source-utterance frames, they are assembled into a target utterance that carries the speech content of the source speaker but the vocal characteristics of the target speaker.
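The abstract does not say what form the trained transformation takes. As a minimal sketch only, assuming each force-aligned frame pair has already been reduced to a feature vector of the same length (for example an LSF vector), one codebook-free choice is an independent least-squares line per feature dimension. The names `fit_linear_map` and `apply_linear_map` are hypothetical, and this regression is a stand-in for illustration, not the patented mapping.

```python
from typing import List, Sequence, Tuple


def fit_linear_map(source: Sequence[Sequence[float]],
                   target: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Least-squares fit of target[d] ~= slope * source[d] + offset, separately
    for each feature dimension d, over force-aligned (source, target) frame pairs.
    Hypothetical stand-in for the trained source-to-target transformation."""
    if not source or len(source) != len(target):
        raise ValueError("need the same, non-zero number of aligned frames")
    n = len(source)
    params = []
    for d in range(len(source[0])):
        xs = [frame[d] for frame in source]
        ys = [frame[d] for frame in target]
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # A constant source dimension carries no information: predict the target mean.
        slope = sxy / sxx if sxx > 0 else 0.0
        params.append((slope, mean_y - slope * mean_x))
    return params


def apply_linear_map(params: Sequence[Tuple[float, float]],
                     frame: Sequence[float]) -> List[float]:
    """Map one source feature vector to an estimated target feature vector."""
    return [slope * x + offset for (slope, offset), x in zip(params, frame)]
```

For example, fitting on the aligned pairs [0.1, 0.5]→[0.15, 0.4], [0.2, 0.6]→[0.25, 0.5] and [0.3, 0.7]→[0.35, 0.6] gives a slope of about 1 in both dimensions, so `apply_linear_map` sends [0.4, 0.8] to roughly [0.45, 0.7]. A full affine (matrix) regression would also capture correlations between dimensions; the per-dimension version only keeps the sketch short and dependency-free. If the features are LSFs, a mapped vector should still be checked to be strictly increasing in (0, π) before it is turned back into a filter, because an unconstrained regression does not guarantee that.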
14 Claims
1. A method of speech conversion comprising the steps of:
dividing a source signal into multiple source frames;
for each source frame, deriving at least one line spectral frequency (LSF) vector, and mapping said at least one LSF vector to an LSF vector of a respective target frame; and
assembling said respective target frames into a target source signal.
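Claim 1 does not say how the LSF vector is derived. A common textbook route, assumed here rather than taken from the patent, is to estimate linear-prediction (LPC) coefficients for the frame and then locate the zeros of the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z) on the unit circle. This sketch uses a Hamming window, the autocorrelation method with Levinson-Durbin recursion, and a grid search with bisection for the zeros; the function names `lpc` and `lpc_to_lsf`, the even-order restriction and the grid size are illustrative choices.

```python
import math
from typing import List, Sequence


def lpc(frame: Sequence[float], order: int) -> List[float]:
    """Return [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p, using a
    Hamming window, the autocorrelation method and Levinson-Durbin recursion."""
    n = len(frame)
    if order < 1 or n <= order:
        raise ValueError("need order >= 1 and a frame longer than the order")
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    r = [sum(w[i] * w[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    if r[0] <= 0.0:
        raise ValueError("silent frame: LPC is undefined")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # remaining prediction-error energy
    return a


def lpc_to_lsf(a: Sequence[float], grid: int = 8192) -> List[float]:
    """Return the p line spectral frequencies (radians, ascending) of an
    even-order, minimum-phase LPC polynomial a = [1, a1, ..., ap]."""
    p = len(a) - 1
    if p < 2 or p % 2:
        raise ValueError("this sketch handles even LPC orders >= 2 only")
    m = p + 1                      # degree of P(z) and Q(z)
    ext = list(a) + [0.0]          # treat a[p+1] as 0
    pc = [ext[k] + ext[m - k] for k in range(m + 1)]  # symmetric P(z)
    qc = [ext[k] - ext[m - k] for k in range(m + 1)]  # antisymmetric Q(z)

    # On z = e^{jw}, e^{jwm/2} P is real and e^{jwm/2} Q is j times a real value,
    # so the zeros of P and Q are the sign changes of these two real functions.
    def g_p(w: float) -> float:
        return sum(c * math.cos(w * (m / 2 - k)) for k, c in enumerate(pc))

    def g_q(w: float) -> float:
        return sum(c * math.sin(w * (m / 2 - k)) for k, c in enumerate(qc))

    # A midpoint grid on the open interval (0, pi) skips the trivial zeros
    # of P at w = pi and of Q at w = 0.
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    lsf = []
    for g in (g_p, g_q):
        vals = [g(w) for w in ws]
        for i in range(grid - 1):
            if (vals[i] < 0) == (vals[i + 1] < 0):
                continue
            lo, hi, g_lo = ws[i], ws[i + 1], vals[i]
            for _ in range(50):  # bisection refines the bracketed zero
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if (g_mid < 0) == (g_lo < 0):
                    lo, g_lo = mid, g_mid
                else:
                    hi = mid
            lsf.append(0.5 * (lo + hi))
    return sorted(lsf)
```

For a 10th-order analysis, `lpc_to_lsf(lpc(frame, 10))` should return ten strictly increasing frequencies between 0 and π, alternating between zeros of P and Q; a silent frame raises `ValueError` because its autocorrelation is zero. Production codecs usually use a cheaper Chebyshev-polynomial root search, but the grid search keeps this sketch easy to check.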
10. A method of speech conversion comprising the steps of:
training a source to target frame transformation, using a source training set of source utterances and a target training set of target utterances, that transforms frames with vocal characteristics of the source speaker to frames with vocal characteristics of the target speaker;
recognizing phonemes in a source utterance spoken by a source speaker having source speaker vocal characteristics;
subdividing the source utterance into at least one source frame, each comprising only one phoneme;
transforming each of said at least one source frame into a target frame based on a source to target frame transformation that transforms frames with vocal characteristics of the source speaker to frames with vocal characteristics of the target speaker; and
assembling the target frames transformed from each of said at least one source frame into a target utterance.
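Claim 10 takes the phoneme boundaries from a recognizer and the transformation from the training step. Assuming both are already available, the subdivide, transform and assemble steps reduce to the short sketch below. The `(label, start, end)` segment format, the `transform` callback and the names `split_by_phoneme` and `convert_utterance` are hypothetical, and plain concatenation is the simplest possible assembly; a practical system would likely smooth the joins between frames.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical recognizer output: (phoneme label, start sample, end sample), end exclusive.
Segment = Tuple[str, int, int]


def split_by_phoneme(samples: Sequence[float],
                     segments: Sequence[Segment]) -> List[Tuple[str, List[float]]]:
    """Cut the utterance into source frames that each cover exactly one phoneme.
    Samples not covered by any segment are dropped."""
    frames = []
    prev_end = 0
    for label, start, end in segments:
        # Segments must be in order, non-overlapping, non-empty and inside the signal.
        if not prev_end <= start < end <= len(samples):
            raise ValueError(f"bad segment {label!r}: [{start}, {end})")
        frames.append((label, list(samples[start:end])))
        prev_end = end
    return frames


def convert_utterance(samples: Sequence[float],
                      segments: Sequence[Segment],
                      transform: Callable[[str, List[float]], List[float]]) -> List[float]:
    """Transform each one-phoneme source frame, then concatenate the target frames."""
    target: List[float] = []
    for label, frame in split_by_phoneme(samples, segments):
        target.extend(transform(label, frame))
    return target
```

For instance, with segments `[("a", 0, 2), ("b", 2, 6)]` and a placeholder transform that doubles every sample, the input samples 1 through 6 come back as 2, 4, 6, 8, 10, 12. Passing the phoneme label to `transform` leaves room for a phoneme-dependent mapping, which the per-phoneme framing in the claim makes possible.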
12. A system for speech conversion comprising:
a processor; a communication bus coupled to the processor; a main memory coupled to the communication bus; an audio input coupled to the communication bus; an audio output coupled to the communication bus; wherein the processor receives, from the audio input, a source utterance spoken by a source speaker having source speaker vocal characteristics;
the processor receives instructions from the main memory which cause the processor to: recognize phonemes in the source utterance spoken by the source speaker; subdivide the source utterance into at least one source frame, each comprising only one phoneme; transform each of said at least one source frame into a target frame based on a frame transformation that transforms frames with vocal characteristics of the source speaker to frames with vocal characteristics of the target speaker; and assemble the target frames transformed from each of said at least one source frame into a target utterance.
13. A method of creating a dubbed soundtrack, the method comprising the steps:
receiving a first soundtrack comprising a first vocal track of a first speaker's speech, wherein said first vocal track includes vocal characteristics of said first speaker's speech;
receiving a second soundtrack comprising a second vocal track of a second speaker's speech, wherein said second vocal track includes vocal characteristics of said second speaker's speech; and
converting said second soundtrack into a dubbed soundtrack, wherein said dubbed soundtrack includes a third vocal track of said second speaker's speech, wherein said third vocal track includes vocal characteristics of said first speaker's speech.
Specification