Speech transformation system
First Claim
1. For use with a costume depicting a character having a defined voice with a pre-established voice characteristic, a voice transformation system comprising:
- a microphone that is positionable to receive and transduce speech that is spoken by a person wearing the costume into a source speech signal;
a mask that is positionable to cover the mouth of the person wearing the costume to muffle the speech of the person wearing the costume to tend to prevent communication of the speech beyond the costume, the mask enabling placement of the microphone between the mouth and the mask;
a speaker disposed on or within the costume to broadcast acoustic waves carrying speech in the defined voice of the character depicted by the costume; and
a voice transformation device coupled to receive the signal from the microphone representing source speech spoken by a person wearing the costume, the voice transformation device transforming the received source speech signal to a target speech signal representing the utterances of the source speech signals in the defined voice of the character depicted by the costume;
wherein the voice transformation device stores a plurality of representations of the defined voice and transforms the voice of the person wearing the costume into the same defined voice of the character depicted by the costume, based upon association of the voice of the particular person with particular ones of the stored representations.
6 Assignments
0 Petitions
Accused Products
Abstract
A high quality voice transformation system and method operates during a training mode to store voice signal characteristics representing target and source voices. Thereafter, during a real time transformation mode, a signal representing source speech is segmented into overlapping segments, analyzed to separate the excitation spectrum from the tone quality spectrum. A stored target tone quality spectrum is substituted for the source spectrum and then convolved with the actual source speech excitation spectrum to produce a transformed speech signal having the word and excitation content of the source, but the acoustical characteristics of a target speaker. The system may be used to enable a talking, costumed character, or in other applications where a source speaker wishes to imitate the voice characteristics of a different, target speaker.
412 Citations
11 Claims
-
1. For use with a costume depicting a character having a defined voice with a pre-established voice characteristic, a voice transformation system comprising:
-
a microphone that is positionable to receive and transduce speech that is spoken by a person wearing the costume into a source speech signal; a mask that is positionable to cover the mouth of the person wearing the costume to muffle the speech of the person wearing the costume to tend to prevent communication of the speech beyond the costume, the mask enabling placement of the microphone between the mouth and the mask; a speaker disposed on or within the costume to broadcast acoustic waves carrying speech in the defined voice of the character depicted by the costume; and a voice transformation device coupled to receive the signal from the microphone representing source speech spoken by a person wearing the costume, the voice transformation device transforming the received source speech signal to a target speech signal representing the utterances of the source speech signals in the defined voice of the character depicted by the costume; wherein the voice transformation device stores a plurality of representations of the defined voice and transforms the voice of the person wearing the costume into the same defined voice of the character depicted by the costume, based upon association of the voice of the particular person with particular ones of the stored representations. - View Dependent Claims (2)
-
-
3. A voice transformation system comprising:
-
a preprocessing subsystem receiving a source voice signal and digitizing and segmenting the source voice signal to generate a segmented time domain signal; an analysis subsystem responding to each segment of the segmented time domain signal by generating a source speech pitch signal representative of a pitch thereof, an excitation signal representative of the excitation thereof and a source vector that is representative of a smoothed spectrum of the segment; a transformation subsystem storing a plurality of source and target vectors and voice pitch indications for the source voice and a target voice different from the source voice, a correspondence between the source and target vectors and the source and target voice pitch indications, the transformation subsystem using the stored information to substitute a target vector for each received source vector, adjusting the pitch of the frequency domain excitation spectrum in response to the source and target pitch indications to generate a pitch adjusted excitation spectrum, and convolving the pitch adjusted excitation spectrum with a signal represented by the substituted target vector to generate a sequence of segmented target voice segments defining a segmented target voice signal; and a post processing subsystem converting the segmented target voice signal into a segmented time domain target voice signal that represents the words of the source signal with vocal characteristics of the different target voice. - View Dependent Claims (4, 5, 6, 7, 8, 9)
-
-
10. A method of transforming a source signal representing a source voice to a target signal representing a target voice comprising the steps of:
-
preprocessing the source signal to produce a time domain sampled and segmented source signal in response thereto; analyzing the sampled and segmented source signal, the analysis including executing a transformation of the source signal to the frequency domain, generating a cepstrum vector representation of a smoothed spectrum of each segment of the source signal, generating an excitation signal representing the excitation of each segment of the source signal, determining a pitch for each segment of the source signal, and adjusting the excitation signal for each segment of the source signal in response to the pitch for each segment of the source signal; transforming each segment by storing cepstrum vectors representing target speech and corresponding cepstrum vectors representing source speech, substituting a stored target speech cepstrum vector for an analyzed source cepstrum vector and convolving the substituted target cepstrum vector with the excitation signal to generate a target segmented frequency domain signal; and post processing the target segmented frequency domain signal to provide transformation to the time domain and inverse segmentation to generate the target voice signal.
-
-
11. For use with a costume depicting a predefined character having a voice with a pre-established voice characteristic, a voice transformation system comprising:
-
a microphone that is positionable to receive and transduce speech that is spoken by a person wearing the costume into a source speech signal; a mask that is positionable to cover the mouth of the person wearing the costume to muffle the speech of the person wearing the costume to tent to prevent communication of the speech beyond the costume, the mask enabling placement of the microphone between the mouth and the mask; a speaker disposed on or within the costume to broadcast acoustic waves carrying speech in the voice of the character depicted by the costume; and a voice transformation device coupled to receive the signal from the microphone representing source speech spoken by a person wearing the costume, the voice transformation device transforming the received source speech signal to a target speech signal by replacing vocal characteristics of the speaker, represented by the signal, with predefined and stored substitute vocal characteristics of the voice of the character depicted by the costume, the target speech signal being communication to the speaker to be transduced and acoustically broadcast by the speaker.
-
Specification