Speech transformation system

US 5,327,521 A
Filed: 08/31/1993
Issued: 07/05/1994
Est. Priority Date: 03/02/1992
Status: Expired due to Term


6 Assignments

0 Petitions

Abstract

A high quality voice transformation system and method operates during a training mode to store voice signal characteristics representing target and source voices. Thereafter, during a real-time transformation mode, a signal representing source speech is segmented into overlapping segments and analyzed to separate the excitation spectrum from the tone quality spectrum. A stored target tone quality spectrum is substituted for the source tone quality spectrum and then convolved with the actual source speech excitation spectrum to produce a transformed speech signal having the word and excitation content of the source but the acoustical characteristics of a target speaker. The system may be used to give a costumed character a talking voice, or in other applications where a source speaker wishes to imitate the voice characteristics of a different, target speaker.
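The separation and substitution described in the abstract can be illustrated on a single segment with a minimal, dependency-free sketch. It assumes the "tone quality spectrum" is approximated by a liftered real cepstrum (a stand-in; claim 6 names an LPC cepstrum instead), uses a naive DFT for self-containment, and all names (`smoothed_log_envelope`, `transform_frame`, `n_lifter`) are hypothetical. The target envelope is applied by multiplying spectra, which corresponds to a circular convolution of the excitation with the target filter in the time domain.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; adequate for one short frame, avoids external dependencies."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def smoothed_log_envelope(frame, n_lifter=20, floor=1e-9):
    """Hypothetical 'tone quality' estimate: smoothed log-magnitude spectrum via liftering."""
    log_mag = [math.log(abs(c) + floor) for c in dft(frame)]
    cepstrum = [c.real for c in idft(log_mag)]
    n = len(cepstrum)
    # Keep low quefrencies (and their mirror images); zero everything else.
    liftered = [c if q < n_lifter or q > n - n_lifter else 0.0
                for q, c in enumerate(cepstrum)]
    return [c.real for c in dft(liftered)]


def transform_frame(source_frame, target_log_envelope, n_lifter=20):
    """Swap the source envelope for a target envelope, keeping the source excitation."""
    spectrum = dft(source_frame)
    source_env = smoothed_log_envelope(source_frame, n_lifter)
    # excitation = spectrum / exp(source_env); output = excitation * exp(target_env)
    out = [s * math.exp(t - e)
           for s, t, e in zip(spectrum, target_log_envelope, source_env)]
    return [c.real for c in idft(out)]
```

Passing the source's own envelope back in returns the original segment, which is a convenient sanity check; in practice the target envelope would come from the stored training data, and each segment would be windowed and overlap-added afterwards.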

412 Citations

11 Claims

1. For use with a costume depicting a character having a defined voice with a pre-established voice characteristic, a voice transformation system comprising:
- a microphone that is positionable to receive and transduce speech that is spoken by a person wearing the costume into a source speech signal;

- a mask that is positionable to cover the mouth of the person wearing the costume to muffle the speech of the person wearing the costume to tend to prevent communication of the speech beyond the costume, the mask enabling placement of the microphone between the mouth and the mask;

- a speaker disposed on or within the costume to broadcast acoustic waves carrying speech in the defined voice of the character depicted by the costume; and

- a voice transformation device coupled to receive the signal from the microphone representing source speech spoken by a person wearing the costume, the voice transformation device transforming the received source speech signal to a target speech signal representing the utterances of the source speech signals in the defined voice of the character depicted by the costume;

wherein the voice transformation device stores a plurality of representations of the defined voice and transforms the voice of the person wearing the costume into the same defined voice of the character depicted by the costume, based upon association of the voice of the particular person with particular ones of the stored representations.

2. A voice transformation system according to claim 1, wherein the voice transformation device includes:
- a processing subsystem segmenting and windowing the received source speech signal to generate a sequence of preprocessed speech signal segments;

- an analysis subsystem processing the received preprocessed speech signal segments to generate for each segment a pitch signal indicating a dominant pitch of the segment, a frequency domain vector representing a smoothed frequency characteristic of the segment and an excitation signal representing excitation characteristics of the segment;

- a transformation subsystem storing target frequency domain vectors that are representative of the target speech, substituting a corresponding target frequency domain vector for the frequency domain vector derived by the analysis subsystem, adjusting the pitch of the target excitation spectrum in response to the pitch signal derived by the analysis subsystem, and convolving the substituted target frequency domain vector with the adjusted excitation spectrum to produce a segmented frequency domain representation of the target voice; and

- a post processing subsystem performing an inverse Fourier transform and an inverse segmenting and windowing operation on each segmented frequency domain representation of the target voice to generate a time domain signal representing the source speech in the voice of the character depicted by the costume.
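The segmenting and windowing step of claim 2, and its inverse in the post processing subsystem, can be sketched as a framing and overlap-add round trip. The 256-sample segment and quarter-segment shift are taken from claims 4 and 5; the periodic Hann window, the zero padding, and the names `segment_and_window` and `overlap_add` are assumptions for illustration, since the claims do not name a window function.

```python
import math

SEGMENT = 256        # segment length recited in claim 5
HOP = SEGMENT // 4   # claim 4: shift distance of at most 1/4 segment


def hann(size):
    """Periodic Hann window (assumed; tapers segment edges before the DFT)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / size) for k in range(size)]


def segment_and_window(samples, size=SEGMENT, hop=HOP):
    """Split samples into overlapping, windowed segments.

    The signal is zero-padded by (size - hop) at both ends so that every
    original sample lies under several overlapping segments.
    """
    pad = size - hop
    x = [0.0] * pad + list(samples) + [0.0] * pad
    x += [0.0] * (-(len(x) - size) % hop)   # make the last segment complete
    w = hann(size)
    return [[x[start + k] * w[k] for k in range(size)]
            for start in range(0, len(x) - size + 1, hop)]


def overlap_add(segments, length, size=SEGMENT, hop=HOP):
    """Inverse segmenting and windowing: overlap-add, then divide by the summed window."""
    pad = size - hop
    total = (len(segments) - 1) * hop + size
    acc = [0.0] * total
    norm = [0.0] * total
    w = hann(size)
    for i, seg in enumerate(segments):
        start = i * hop
        for k in range(size):
            acc[start + k] += seg[k]
            norm[start + k] += w[k]
    out = [a / n if n > 1e-12 else 0.0 for a, n in zip(acc, norm)]
    return out[pad:pad + length]
```

Dividing by the accumulated window makes the round trip exact when the segments are left unmodified; after the spectral substitution it acts as a weighted overlap-add that smooths transitions between segments.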

3. A voice transformation system comprising:
- a preprocessing subsystem receiving a source voice signal and digitizing and segmenting the source voice signal to generate a segmented time domain signal;

- an analysis subsystem responding to each segment of the segmented time domain signal by generating a source speech pitch signal representative of a pitch thereof, an excitation signal representative of the excitation thereof and a source vector that is representative of a smoothed spectrum of the segment;

- a transformation subsystem storing a plurality of source and target vectors and voice pitch indications for the source voice and a target voice different from the source voice, a correspondence between the source and target vectors and the source and target voice pitch indications, the transformation subsystem using the stored information to substitute a target vector for each received source vector, adjusting the pitch of the frequency domain excitation spectrum in response to the source and target pitch indications to generate a pitch adjusted excitation spectrum, and convolving the pitch adjusted excitation spectrum with a signal represented by the substituted target vector to generate a sequence of segmented target voice segments defining a segmented target voice signal; and

- a post processing subsystem converting the segmented target voice signal into a segmented time domain target voice signal that represents the words of the source signal with vocal characteristics of the different target voice.

4. A voice transformation system according to claim 3, wherein the preprocessing subsystem includes a digitizing sampling circuit that samples the source voice signal to produce digital samples that are representative thereof and a segmenting and windowing circuit that divides the digital samples into overlapping segments having a shift distance of at most 1/4 of a segment and applies a windowing function to each segment that reduces aliasing during a subsequent transformation to the frequency domain to produce a sequence of windowed source segments.

5. A voice transformation system according to claim 4, wherein each of the segments represents 256 voice samples.

6. A voice transformation system according to claim 3, wherein the analysis subsystem includes:
- a discrete Fourier transform unit generating a frequency domain representation of each segment;

- an LPC cepstrum parametrization unit generating source cepstrum coefficient voice vectors representing a smoothed spectrum of each frequency domain segment;

- an inverse convolution unit deconvolving each frequency domain segment with the smoothed cepstrum coefficient representation thereof to produce the excitation signal in the form of a frequency domain excitation spectrum;

- a pitch adjustment unit responding to the source speech pitch signal and adjusting the pitch of the excitation spectrum to generate a pitch adjusted excitation spectrum;

- a substitution unit substituting target cepstrum coefficient voice vectors for the source cepstrum coefficient voice vectors for each corresponding segment; and

- a convolver convolving the pitch adjusted excitation spectrum with the substituted target cepstrum coefficient voice vectors.

7. A voice transformation system according to claim 3, wherein the transformation subsystem includes:
- a store storing the target voice pitch information, a plurality of the target vectors, a plurality of the source vectors and the correspondence between the source and target vectors;

- a pitch adjustment unit adjusting the pitch of the frequency domain excitation spectrum to generate a pitch adjusted excitation spectrum;

- a substitution unit receiving source vectors and responsive to the stored voice and target vectors and substituting one of the stored target vectors for each received source vector; and

- a convolver convolving each substituted target vector with the corresponding pitch adjusted excitation spectrum to generate a segmented frequency domain target voice signal.

8. A voice transformation system according to claim 3, wherein the post processing subsystem includes:
- an inverse Fourier transform unit transforming the segmented target voice signal to the segmented time domain target voice signal;

- an inverse segmenting and windowing unit converting the segmented time domain target voice signal to a sampled nonsegmented target voice signal; and

- a time duration adjustment unit adjusting the time duration of representations of the sampled nonsegmented target voice signal.

9. A voice transformation system according to claim 8, further comprising a digital-to-analog converter converting the time duration adjusted sampled nonsegmented target voice signal to a continuous time varying signal representing spoken utterances of the source voice with acoustical characteristics of the target voice.
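Claim 6 does not say how its "LPC cepstrum parametrization unit" computes the cepstrum vectors. A conventional route, sketched here under that assumption, is the autocorrelation method (Levinson-Durbin recursion) followed by the standard LPC-to-cepstrum recursion. For simplicity the sketch starts from the time-domain windowed segment, whereas the claim describes working from each frequency-domain segment; the order of 12, the 16 coefficients, and all function names are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one windowed segment."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p with x[n] ~ sum_k a_k x[n-k], plus residual energy."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # silent (or perfectly predictable) segment
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(pred, n_ceps):
    """Cepstrum c_1..c_n of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(pred)
    c = [0.0] * (n_ceps + 1)         # c[0] (log gain) is omitted
    for n in range(1, n_ceps + 1):
        acc = pred[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * pred[n - k - 1]
        c[n] = acc
    return c[1:]


def lpc_cepstrum(frame, order=12, n_ceps=16):
    """Hypothetical stand-in for the claimed parametrization unit, for one segment."""
    pred, _ = levinson_durbin(autocorrelation(frame, order), order)
    return lpc_to_cepstrum(pred, n_ceps)
```

As a check, a first-order model with a_1 = 0.9 yields cepstrum coefficients c_n = 0.9^n / n, which matches the series expansion of ln(1 / (1 - 0.9 z^-1)).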

10. A method of transforming a source signal representing a source voice to a target signal representing a target voice comprising the steps of:
- preprocessing the source signal to produce a time domain sampled and segmented source signal in response thereto;

- analyzing the sampled and segmented source signal, the analysis including executing a transformation of the source signal to the frequency domain, generating a cepstrum vector representation of a smoothed spectrum of each segment of the source signal, generating an excitation signal representing the excitation of each segment of the source signal, determining a pitch for each segment of the source signal, and adjusting the excitation signal for each segment of the source signal in response to the pitch for each segment of the source signal;

- transforming each segment by storing cepstrum vectors representing target speech and corresponding cepstrum vectors representing source speech, substituting a stored target speech cepstrum vector for an analyzed source cepstrum vector and convolving the substituted target cepstrum vector with the excitation signal to generate a target segmented frequency domain signal; and

- post processing the target segmented frequency domain signal to provide transformation to the time domain and inverse segmentation to generate the target voice signal.
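The substitution step of claim 10 can be read as a codebook lookup: training stores paired source and target cepstrum vectors, and at run time each analyzed source vector is matched to a stored source vector whose paired target vector is substituted. The claim only requires a stored correspondence; the Euclidean nearest-neighbour rule and the names below are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two cepstrum vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def substitute_targets(source_vectors: Sequence[Vector],
                       pairs: Sequence[Tuple[Vector, Vector]]) -> List[Vector]:
    """For each analyzed source vector, return the target vector paired with
    the closest stored source vector (nearest-neighbour codebook lookup)."""
    if not pairs:
        raise ValueError("no stored source/target pairs")
    result = []
    for vec in source_vectors:
        best = min(range(len(pairs)),
                   key=lambda i: squared_distance(vec, pairs[i][0]))
        result.append(pairs[best][1])
    return result
```

How the pairs are assembled during the training mode is not covered by this sketch; it only shows the run-time substitution once a correspondence exists.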

11. For use with a costume depicting a predefined character having a voice with a pre-established voice characteristic, a voice transformation system comprising:
- a microphone that is positionable to receive and transduce speech that is spoken by a person wearing the costume into a source speech signal;

- a mask that is positionable to cover the mouth of the person wearing the costume to muffle the speech of the person wearing the costume to tend to prevent communication of the speech beyond the costume, the mask enabling placement of the microphone between the mouth and the mask;

- a speaker disposed on or within the costume to broadcast acoustic waves carrying speech in the voice of the character depicted by the costume; and

- a voice transformation device coupled to receive the signal from the microphone representing source speech spoken by a person wearing the costume, the voice transformation device transforming the received source speech signal to a target speech signal by replacing vocal characteristics of the speaker, represented by the signal, with predefined and stored substitute vocal characteristics of the voice of the character depicted by the costume, the target speech signal being communicated to the speaker to be transduced and acoustically broadcast by the speaker.


Current Assignee
Disney Enterprises Incorporated (The Walt Disney Company)
Original Assignee
The Walt Disney Company
Inventors
Tan, Seow-Hwee; Nam, Il-Hyun; Savic, Michael I.
Primary Examiner(s)
Knepper, David D.

Application Number
US08/114,603
Time in Patent Office
308 Days
Field of Search
381/61, 381/62, 381/36-40, 381/43, 381/45, 381/49, 381/50, 381/53, 381/54, 395/2.67, 395/2, 395/2.7, 395/2.79, 395/2.81, 395/2.87, 395/2.12
US Class Current
704/272
CPC Class Codes
G10L 2021/0135 Voice conversion or morphing
G10L 21/00 Speech or voice signal proc...
