Targeted vocal transformation

US 6,336,092 B1
Filed: 04/28/1997
Issued: 01/01/2002
Est. Priority Date: 04/28/1997
Status: Expired due to Fees

Abstract

The invention is a method for transforming a source individual's voice so as to adopt the characteristics of a target individual's voice. The excitation signal component of the target individual's voice is extracted and the spectral envelope of the source individual's voice is extracted. The transformed voice is synthesized by applying the spectral envelope of the source individual to the excitation signal component of the voice of the target individual. A higher quality transformation is achieved using an enhanced excitation signal created by replacing unvoiced regions of the signal with interpolated data from adjacent voiced regions. Various methods of transforming the spectral characteristics of the source individual's voice are also disclosed.
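The three steps in the abstract (extract the target's excitation, extract the source's spectral envelope, apply one to the other) can be sketched on a single frame. This record does not include the specification, so the sketch assumes linear prediction as the envelope model, which is one common choice and not necessarily the patented implementation. The function names, the model order, and the toy signals are all hypothetical.

```python
import math

def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion. Returns a with a[0] = 1 so that
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order whitens x."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def extract_excitation(x, a):
    """Inverse (FIR) filter A(z): flattens the spectral envelope of x."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def apply_envelope(excitation, a):
    """All-pole filter 1/A(z): imposes the envelope described by a."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[j] * y[n - j]
                         for j in range(1, len(a)) if n - j >= 0))
    return y

def transform_frame(source, target, order=8):
    """Source envelope applied to target excitation, for one frame."""
    excitation = extract_excitation(target, lpc(target, order))
    return apply_envelope(excitation, lpc(source, order))

# Toy frames standing in for real recordings (hypothetical test data).
source = [math.sin(0.30 * n) + 0.1 * (((n * 7919) % 101) / 101 - 0.5)
          for n in range(200)]
target = [math.sin(0.11 * n) + 0.1 * (((n * 104729) % 103) / 103 - 0.5)
          for n in range(200)]
converted = transform_frame(source, target, order=4)
```

A practical system would process overlapping windowed frames and carry filter state across frame boundaries; the sketch omits both to keep the signal flow visible.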

39 Claims

1. A method of transforming the voice of a source individual so as to adopt characteristics of a target individual, comprising:
- providing a spectral envelope derived from the voice of the source individual;
  
  providing an excitation signal component derived from the voice of the target individual; and
  
  applying the spectral envelope from the source individual to the excitation signal component from the target individual.
2. The method according to claim 1 further comprising the step of extracting and storing the excitation signal component from the voice of the target individual.
3. The method according to claim 2 wherein the step of extracting the excitation signal is performed by flattening the spectral envelope of the target vocal signal.
4. The method according to claim 2 further comprising the step of storing said extracted excitation signal.
5. The method of claim 4 wherein said step of storing comprises storing said extracted excitation signal in compressed form.
6. The method according to claim 2 wherein the step of extracting the excitation signal comprises the steps of:
7. The method according to claim 6 further comprising the steps of identifying voiced and unvoiced signal segments in the excitation signal component and replacing unvoiced signal segments with interpolated data from the voiced signal segments.
8. The method according to claim 7 wherein unvoiced segments in the signal are identified by comparing the parameters of the segments to thresholds selected from among the group of parameters comprising:
- average segment power, average low-band segment power, zero crossings per segment.
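The three parameters named in claim 8 can be computed per segment and compared to thresholds as sketched below. The claim does not give threshold values or say in which direction each comparison runs, so the values and the rule here (enough power, enough low-band power, few zero crossings means voiced) are assumptions for illustration. The one-pole lowpass used to approximate the low band is also a hypothetical stand-in.

```python
import math

def segment_features(segment):
    """Return (average power, average low-band power, zero crossings)."""
    n = len(segment)
    power = sum(s * s for s in segment) / n
    # Crude low band: one-pole lowpass with a hypothetical coefficient.
    state, low_energy = 0.0, 0.0
    for s in segment:
        state = 0.9 * state + 0.1 * s
        low_energy += state * state
    low_power = low_energy / n
    crossings = sum(1 for a, b in zip(segment, segment[1:])
                    if (a >= 0) != (b >= 0))
    return power, low_power, crossings

def is_voiced(segment, min_power=1e-3, min_low_power=1e-4,
              max_crossing_rate=0.25):
    """Assumed rule: voiced segments are loud, strong in the low band,
    and cross zero rarely. All three thresholds are illustrative."""
    power, low_power, crossings = segment_features(segment)
    return (power >= min_power
            and low_power >= min_low_power
            and crossings / len(segment) <= max_crossing_rate)
```

For example, a 20 ms segment of a 100 Hz tone sampled at 8 kHz passes all three checks, while a rapidly sign-alternating segment fails the zero-crossing check and a silent segment fails the power check.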
9. The method according to claim 7 wherein said step of replacing with interpolated data comprises using sinusoidal synthesis to morph between the edges of the voiced signals adjacent said silence portions.
10. The method according to claim 1 further comprising the steps of:
- storing said excitation signal; and
  
  performing spectral analysis on a vocal signal representative of the voice of the source individual so as to determine the spectral envelope of said vocal signal.
11. The method according to claim 1 or 10 further comprising the step of transforming the spectral envelope of said vocal signal prior to applying said spectral envelope of said vocal signal to said excitation signal.
12. The method according to claim 10 further comprising the steps of:
- obtaining a digital transfer function corresponding to the spectral envelope of said vocal signal;
  
  decomposing said digital transfer function into a plurality of lower order sections; and
  
  modifying the spectral characteristics of at least one of said lower-order sections.
13. The method according to claim 10 further comprising the step of transforming the spectral envelope by applying conformal mapping to the difference equation of the time-varying synthesis filter.
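Claim 13 does not say which conformal map is applied, and the specification is not part of this record. A minimal sketch, assuming the widely used first-order all-pass substitution z^-1 -> (z^-1 - alpha) / (1 - alpha z^-1) applied to the denominator polynomial A(z) of the synthesis filter, looks like this. The function names and the parameter alpha are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_pow(p, n):
    """Raise polynomial p to the non-negative integer power n."""
    out = [1.0]
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def warp_all_pole(a, alpha):
    """Warp A(z) = sum a[i] z^-i by substituting
    z^-1 -> (z^-1 - alpha) / (1 - alpha z^-1), then clear the common
    denominator (1 - alpha z^-1)^p and renormalise so the result starts at 1.
    Requires |alpha| < 1."""
    p = len(a) - 1
    warped = [0.0] * (p + 1)
    for i, ai in enumerate(a):
        # a[i] * (x - alpha)^i * (1 - alpha x)^(p - i), with x = z^-1
        term = poly_mul(poly_pow([-alpha, 1.0], i),
                        poly_pow([1.0, -alpha], p - i))
        for j, t in enumerate(term):
            warped[j] += ai * t
    return [w / warped[0] for w in warped]
```

Under this sign convention a pole at z0 moves to (z0 + alpha) / (1 + alpha z0), so a positive alpha shifts envelope features toward lower frequencies and a negative alpha shifts them upward; alpha = 0 leaves the filter unchanged.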
14. The method according to claim 13 wherein said vocal signal represents singing.
15. The method according to any of claim 23 or 13 further comprising the steps of splitting said vocal signal into a plurality of frequency bands and independently transforming the spectral envelopes corresponding to said bands.
16. The method according to claim 10 wherein at least one of the source individual and the target individual is a singer and further comprising the step of applying conformal mapping to the difference equation of the time-varying synthesis filter.
17. The method according to claim 1 further comprising the step of determining the pitch of the vocal signal representative of the target individual.
18. The method according to claim 17 further comprising the step of transforming the pitch of the target excitation signal to match the pitch of the source vocal signal.
19. The method according to claim 18 further comprising the step of determining the average pitch of the vocal signal of the source individual over periods of at least 50 milliseconds.
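Claims 17 to 19 involve measuring pitch, averaging the source pitch over at least 50 milliseconds, and shifting the target excitation to match. The claims do not name a pitch detector, so the sketch below assumes a plain autocorrelation peak picker; the frame length, search range, and function names are illustrative choices, and the pitch shifting step itself is left out.

```python
import math

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Pick the lag with the largest autocorrelation in [fmin, fmax].
    Returns 0.0 when no positive peak is found (treated as unvoiced)."""
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if r > best:
            best, best_lag = r, lag
    return sample_rate / best_lag if best_lag else 0.0

def average_pitch(signal, sample_rate, frame_ms=25.0, window_ms=50.0):
    """Average per-frame pitch over consecutive windows of window_ms,
    ignoring frames where no pitch was found."""
    frame_len = int(sample_rate * frame_ms / 1000)
    per_window = max(1, math.ceil(window_ms / frame_ms))
    pitches = [estimate_pitch(signal[i:i + frame_len], sample_rate)
               for i in range(0, len(signal) - frame_len + 1, frame_len)]
    averages = []
    for i in range(0, len(pitches) - per_window + 1, per_window):
        voiced = [p for p in pitches[i:i + per_window] if p > 0.0]
        averages.append(sum(voiced) / len(voiced) if voiced else 0.0)
    return averages

def pitch_shift_ratio(source_pitch, target_pitch):
    """Factor by which the target excitation pitch would need scaling."""
    return source_pitch / target_pitch
```

With a 200 Hz source tone and a 100 Hz target tone, the ratio comes out near 2, meaning the target excitation would be shifted up by about an octave before the source envelope is applied.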
20. The method according to claim 1 further comprising the steps of:
- segmenting a signal representative of the voice of said source individual into voiced and non-voiced regions;
  
  if a given region represents voiced input, generating output by applying a spectral envelope derived from said region to said excitation signal component; and
  
  if said given region represents unvoiced input, generating output based on said region without reference to said excitation signal component.
21. The method according to claim 1 further comprising the steps of:
- transforming the spectral envelope of said second signal prior to applying said spectral envelope of said second signal to said excitation signal;
  
  determining the amplitude envelope of the source vocal signal; and
  
  applying said amplitude envelope to an output signal resulting from applying the spectral envelope of the voice of the source individual to an excitation signal derived from the voice of the target individual.
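The last two steps of claim 21 (measure the source amplitude envelope, then impose it on the synthesized output) can be sketched as follows. The claim does not say how the envelope is measured, so frame-wise RMS is assumed here, and the function names and frame length are hypothetical.

```python
import math

def amplitude_envelope(signal, frame_len):
    """RMS level of each non-overlapping frame of frame_len samples."""
    env = []
    for i in range(0, len(signal), frame_len):
        frame = signal[i:i + frame_len]
        env.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return env

def match_amplitude(output, source, frame_len, eps=1e-12):
    """Scale each frame of `output` so its RMS follows that of `source`.
    Frames beyond the end of `source` are muted."""
    src_env = amplitude_envelope(source, frame_len)
    out_env = amplitude_envelope(output, frame_len)
    shaped = []
    for k, i in enumerate(range(0, len(output), frame_len)):
        gain = src_env[k] / (out_env[k] + eps) if k < len(src_env) else 0.0
        shaped.extend(gain * s for s in output[i:i + frame_len])
    return shaped
```

A stepwise gain like this changes abruptly at frame boundaries; interpolating the gain between frames would avoid audible discontinuities, at the cost of a slightly longer example.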
22. The method according to claim 1 wherein said source individual and said target individual are singers.
23. The method according to claim 1 further comprising the step of transforming the spectral envelope of said second vocal signal prior to applying said spectral envelope of said vocal signal to said excitation signal and wherein said step of transforming comprises modifying the temporal extent of a block of samples of vocal signals representative of the voice of the source individual prior to the step of performing spectral analysis.
24. The method according to claim 1 further comprising the step of splitting the vocal signal representative of the voice of the source individual into a low frequency band and a high frequency band and processing only said low frequency band according to the method of claim 1.
25. The method according to claim 24 further comprising the steps of:
- decimating the low frequency portion;
  
  analyzing the low frequency portion and generating reflection coefficients k_i;
  
  sampling the excitation signal at the same rate as a rate at which the source vocal signal is sampled;
  
  filtering the sampled excitation signal using an interpolated lattice filter;
  
  post-filtering the excitation signal by a lowpass filter to remove the spectral image of the interpolated lattice filter; and
  
  applying gain compensation.
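Claim 25 filters a full-rate excitation with reflection coefficients k_i that were analyzed at the decimated rate. One reading of "interpolated lattice filter", assumed here because the specification is not part of this record, is an all-pole lattice in which every unit delay is replaced by a delay of L samples, where L is the decimation factor. That substitution repeats the envelope L times across the spectrum, which would explain the lowpass post-filter the claim uses to remove the spectral image. The function name and argument layout are hypothetical.

```python
def interpolated_lattice(excitation, k, factor):
    """All-pole lattice synthesis filter with reflection coefficients
    k[0..p-1], where each unit delay is stretched to `factor` samples.
    With factor = 1 this is the ordinary lattice synthesis filter."""
    order = len(k)
    # One circular delay line per backward signal b_0 .. b_{p-1}.
    delays = [[0.0] * factor for _ in range(order)]
    out = []
    for n, x in enumerate(excitation):
        pos = n % factor
        delayed = [line[pos] for line in delays]  # b_i[n - factor]
        f = x                                     # f_p[n]
        new_b = [0.0] * (order + 1)
        for i in range(order, 0, -1):
            f = f - k[i - 1] * delayed[i - 1]     # f_{i-1}[n]
            new_b[i] = k[i - 1] * f + delayed[i - 1]
        new_b[0] = f                              # b_0[n] = y[n]
        for i in range(order):
            delays[i][pos] = new_b[i]
        out.append(f)
    return out
```

For a single coefficient k = [0.5] and factor = 2, an impulse input produces 1, 0, -0.5, 0, 0.25, which is the factor = 1 response with a zero inserted between successive samples. The lowpass post-filter and gain compensation steps of the claim are not shown.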
26. The method according to claim 24 further comprising the steps of:
- decimating the low frequency portion;
  
  analyzing the low frequency portion and generating reflection coefficients k_i;
  
  sampling the excitation signal at a rate matching the decimated rate of the low frequency portion; and
  
  applying gain compensation.
27. The method according to claim 1 wherein said step of applying a spectral envelope derived from the voice of a source individual comprises the steps of splitting said vocal signal into a plurality of frequency bands, independently transforming the spectral envelopes corresponding to said bands and applying said transformed spectral envelopes to said bands.
28. The method according to claim 27 where the steps of transforming and applying the spectral envelope in any band comprises the following steps:
- resampling said signal in said band to create a resampled signal S_D(t) with a lower effective sampling rate;
  
  performing a low-order spectral analysis on S_D(t) and computing the direct-form filter coefficients a_D(i);
  
  modifying the coefficients a_D(i) using conformal-mapping to scale the spectrum in proportion to the ratio between the pitch of the target vocal signal and pitch of the source vocal signal; and
  
  applying the resulting filter to the target excitation signal.
29. The method according to claim 27 where the steps of transforming and applying the spectral envelope in any band comprises the following steps:
- resampling said signal in said band to create a resampled signal S_D(t) with a lower effective sampling rate;
  
  performing a temporal scaling of the said signal in said band;
  
  performing a low-order spectral analysis on S_D(t); and
  
  applying the resulting filter to the target excitation signal.
30. The method according to claim 1 further comprising the step of extracting and storing the excitation signal component from the voice of the target individual and wherein unvoiced regions of said excitation signal component are replaced with interpolated voiced data.
31. The method according to claim 30 further comprising the step of determining a pitch contour for the excitation signal.
32. The method according to claim 30 further comprising the steps of:
- segmenting the excitation signal component into analysis segments; and
  
  determining whether each of said analysis segments represents voiced or unvoiced signal by comparing parameters of the segments to thresholds selected from among the group of parameters comprising:
  
  average segment power, average low-band segment power, zero crossings per segment.
33. The method according to claim 30 wherein said step of replacing unvoiced regions with interpolated voiced data comprises using sinusoidal synthesis to morph between the edges of voiced signal portions adjacent unvoiced regions.
34. The method according to claim 33 further comprising the use of a random pitch component.
35. The method according to claim 33 further comprising the step of storing parameters characterizing said excitation signal component, said parameters being selected from among the group comprising pitch contour and location of unvoiced regions and using said parameters in performing said step of replacing with interpolated voiced data.

36. A method of transforming the voice of a source individual so as to adopt characteristics of a target individual, comprising:
- providing a vocal signal representative of the voice of a target individual;
  
  extracting an excitation signal component of said vocal signal;
  
  storing the excitation signal component of said vocal signal; and
  
  applying the excitation signal component of said vocal signal to a signal derived from the voice of the source individual.
37. The method according to claim 36 further comprising the step of storing said extracted excitation signal.

38. A method of transforming the voice of a source individual so as to adopt characteristics of the voices of at least two target individuals comprising:
- providing a spectral envelope derived from the voice of the source individual;
  
  providing a combined excitation signal derived from the voices of the at least two target individuals; and
  
  applying the spectral envelope from the source individual to the combined excitation signal from the at least two target individuals.
39. The method according to claim 38 further comprising the steps of:

Current Assignee: IVL Audio Inc.
Original Assignee: IVL Technologies, Ltd.
Inventors: Lupini, Peter Ronald; Gibson, Brian Charles; Shpak, Dale John
Primary Examiner(s): Korzuch, William
Assistant Examiner(s): Azad, Abul K.
Application Number: US08/848,050
Time in Patent Office: 1,709 Days
Field of Search: 704/276, 704/207, 704/209, 704/278, 704/265, 704/200, 704/203, 704/205, 704/208, 704/213, 704/214, 704/270, 704/266, 704/267, 704/268, 704/269, 846/02, 846/03
US Class Current: 704/268
CPC Class Codes:
G10H 1/366   with means for modifying or...
G10H 2210/331   Note pitch correction, i.e....
G10H 2250/065   Lattice filter, Zobel netwo...
G10H 2250/545   Aliasing, i.e. preventing, ...
G10L 13/033   Voice editing, e.g. manipul...
G10L 2021/0135   Voice conversion or morphing
