Targeted vocal transformation
Abstract
The invention is a method for transforming a source individual's voice so as to adopt the characteristics of a target individual's voice. The excitation signal component of the target individual's voice is extracted and the spectral envelope of the source individual's voice is extracted. The transformed voice is synthesized by applying the spectral envelope of the source individual to the excitation signal component of the voice of the target individual. A higher quality transformation is achieved using an enhanced excitation signal created by replacing unvoiced regions of the signal with interpolated data from adjacent voiced regions. Various methods of transforming the spectral characteristics of the source individual's voice are also disclosed.
39 Claims
-
1. A method of transforming the voice of a source individual so as to adopt characteristics of a target individual, comprising:
-
providing a spectral envelope derived from the voice of the source individual;
providing an excitation signal component derived from the voice of the target individual; and
applying the spectral envelope from the source individual to the excitation signal component from the target individual.
performing spectral analysis on the target vocal signal to determine the time-varying spectral envelope thereof;
using said spectral envelope to produce a time-varying filter; and
using said time-varying filter to flatten said spectral envelopes.
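The steps recited so far (deriving a spectral envelope, flattening the target signal to leave an excitation component, and applying the source envelope to that excitation) can be illustrated for a single frame. The following is a minimal sketch only. It assumes linear prediction (autocorrelation method with the Levinson-Durbin recursion) as the spectral analysis, which the claims do not require, and all function names and the default order are hypothetical.

```python
def autocorr(x, order):
    """Autocorrelation values r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns a = [1, a1, ..., aP] for the prediction-error filter
    A(z) = 1 + a1 z^-1 + ... + aP z^-P. The envelope is modelled as 1/A(z).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a


def flatten(x, a):
    """Inverse (FIR) filter A(z): removes the envelope, leaving the excitation."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def apply_envelope(e, a):
    """All-pole filter 1/A(z): imposes the envelope described by a on e."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y


def transform_frame(source, target, order=10):
    """Source envelope applied to the target excitation, for one frame."""
    a_src = levinson(autocorr(source, order), order)  # source spectral envelope
    a_tgt = levinson(autocorr(target, order), order)  # used only to flatten target
    excitation = flatten(target, a_tgt)
    return apply_envelope(excitation, a_src)
```

A time-varying version would repeat this per frame and carry filter state across frame boundaries, which this sketch omits.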
-
-
7. The method according to claim 6 further comprising the steps of identifying voiced and unvoiced signal segments in the excitation signal component and replacing unvoiced signal segments with interpolated data from the voiced signal segments.
-
8. The method according to claim 7 wherein unvoiced segments in the signal are identified by comparing the parameters of the segments to thresholds selected from among the group of parameters comprising:
- average segment power, average low-band segment power, zero crossings per segment.
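The threshold test named in claim 8 can be sketched as follows. This is an illustration under stated assumptions: the threshold values, the moving-average low-pass used to approximate the low band, and every function name are hypothetical and are not taken from the patent.

```python
def average_power(seg):
    """Mean squared amplitude of a segment."""
    return sum(s * s for s in seg) / len(seg)


def low_band(seg, taps=8):
    """Crude low-pass: running mean over `taps` samples (illustrative only)."""
    out = []
    acc = 0.0
    for n, s in enumerate(seg):
        acc += s
        if n >= taps:
            acc -= seg[n - taps]
        out.append(acc / taps)
    return out


def zero_crossings(seg):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0))


def is_unvoiced(seg, power_min=1e-3, low_power_min=5e-4, zc_max=None):
    """Flag a segment as unvoiced if any parameter fails its threshold.

    All thresholds are assumed values chosen for this example.
    """
    if zc_max is None:
        zc_max = len(seg) // 4
    return (average_power(seg) < power_min
            or average_power(low_band(seg)) < low_power_min
            or zero_crossings(seg) > zc_max)
```

In practice the thresholds would be tuned to the sampling rate and recording level; a noisy segment typically fails the zero-crossing test while a silent one fails the power test.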
-
9. The method according to claim 7 wherein said step of replacing with interpolated data comprises using sinusoidal synthesis to morph between the edges of the voiced signals adjacent said silence portions.
-
10. The method according to claim 1 further comprising the steps of storing said excitation signal; and
performing spectral analysis on a vocal signal representative of the voice of the source individual so as to determine the spectral envelope of said vocal signal.
-
11. The method according to claim 1 or 10 further comprising the step of transforming the spectral envelope of said vocal signal prior to applying said spectral envelope of said vocal signal to said excitation signal.
-
12. The method according to claim 10 further comprising the steps of:
-
obtaining a digital transfer function corresponding to the spectral envelope of said vocal signal;
decomposing said digital transfer function into a plurality of lower order sections; and
modifying the spectral characteristics of at least one of said lower-order sections.
-
-
13. The method according to claim 10 further comprising the step of transforming the spectral envelope by applying conformal mapping to the difference equation of the time-varying synthesis filter.
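To make the effect of a conformal mapping on a synthesis filter concrete, the sketch below evaluates an all-pole envelope 1/A(z) after the substitution z^-1 -> (z^-1 - alpha) / (1 - alpha z^-1). Choosing this first-order all-pass map is an assumption: the claim does not name a specific mapping, and the claim operates on the filter's difference equation whereas this example only evaluates the mapped frequency response. The names `envelope_gain`, `peak_frequency`, and `alpha` are hypothetical.

```python
import cmath
import math


def envelope_gain(a, omega, alpha=0.0):
    """Magnitude of 1/A(z) at frequency omega (radians/sample) after warping.

    a     : [1, a1, ..., aP], coefficients of A(z) in powers of z^-1
    alpha : warping parameter, |alpha| < 1; alpha = 0 leaves the filter unchanged
    """
    zinv = cmath.exp(-1j * omega)
    winv = (zinv - alpha) / (1.0 - alpha * zinv)  # mapped unit delay
    A = sum(coef * winv ** k for k, coef in enumerate(a))
    return 1.0 / abs(A)


def peak_frequency(a, alpha=0.0, points=2048):
    """Frequency in [0, pi] where the (warped) envelope is largest."""
    grid = [math.pi * i / (points - 1) for i in range(points)]
    return max(grid, key=lambda w: envelope_gain(a, w, alpha))


# Example: one resonance near 1.0 rad/sample (pole radius 0.95).
resonator = [1.0, -2 * 0.95 * math.cos(1.0), 0.95 ** 2]
unwarped_peak = peak_frequency(resonator)
lowered_peak = peak_frequency(resonator, alpha=0.2)    # resonance moves down
raised_peak = peak_frequency(resonator, alpha=-0.2)    # resonance moves up
```

With this sign convention a positive `alpha` moves resonances toward lower frequencies and a negative `alpha` moves them higher, so a single parameter scales the whole envelope along the frequency axis.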
-
14. The method according to claim 13 wherein said vocal signal represents singing.
-
15. The method according to any of claim 23 or 13 further comprising the steps of splitting said vocal signal into a plurality of frequency bands and independently transforming the spectral envelopes corresponding to said bands.
-
16. The method according to claim 10 wherein at least one of the source individual and the target individual is a singer and further comprising the step of applying conformal mapping to the difference equation of the time-varying synthesis filter.
-
17. The method according to claim 1 further comprising the step of determining the pitch of the vocal signal representative of the target individual.
-
18. The method according to claim 17 further comprising the step of transforming the pitch of the target excitation signal to match the pitch of the source vocal signal.
-
19. The method according to claim 18 further comprising the step of determining the average pitch of the vocal signal of the source individual over periods of at least 50 milliseconds.
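Claims 17 to 19 rely on a pitch estimate and on an average pitch taken over at least 50 milliseconds. A minimal sketch follows, assuming an autocorrelation peak picker as the pitch detector; the claims do not specify a detector, and the search range, frame length, and function names are hypothetical.

```python
def frame_pitch(frame, rate, fmin=80.0, fmax=500.0):
    """Pitch in Hz from the strongest autocorrelation peak in [fmin, fmax]."""
    lo = int(rate / fmax)
    hi = min(int(rate / fmin), len(frame) - 1)
    best_lag, best = lo, float("-inf")
    for lag in range(lo, hi + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best:
            best_lag, best = lag, r
    return rate / best_lag


def average_pitch(signal, rate, frame_s=0.025, min_span_s=0.05):
    """Mean of per-frame pitch estimates over a span of at least 50 ms."""
    if len(signal) < min_span_s * rate:
        raise ValueError("need at least 50 ms of signal")
    frame = round(frame_s * rate)
    pitches = [frame_pitch(signal[i:i + frame], rate)
               for i in range(0, len(signal) - frame + 1, frame)]
    return sum(pitches) / len(pitches)


def pitch_ratio(source_pitch, target_pitch):
    """Factor by which the target excitation pitch would need to be scaled."""
    return source_pitch / target_pitch
```

The ratio is what a pitch-shifting stage would consume; the shifting itself is outside the scope of this sketch.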
-
20. The method according to claim 1 further comprising the steps of:
-
segmenting a signal representative of the voice of said source individual into voiced and non-voiced regions;
if a given region represents voiced input, generating output by applying a spectral envelope derived from said region to said excitation signal component; and
if said given region represents unvoiced input, generating output based on said region without reference to said excitation signal component.
-
-
21. The method according to claim 1 further comprising the steps of:
-
transforming the spectral envelope of said second signal prior to applying said spectral envelope of said second signal to said excitation signal;
determining the amplitude envelope of the source vocal signal; and
applying said amplitude envelope to an output signal resulting from applying the spectral envelope of the voice of the source individual to an excitation signal derived from the voice of the target individual.
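The amplitude-envelope step of claim 21 can be illustrated as below. This is a sketch under assumptions: the envelope is taken to be a per-frame RMS value held constant over each frame, and the frame length, the `floor` guard, and the function names are hypothetical.

```python
import math


def amplitude_envelope(x, frame=80):
    """Per-frame RMS, repeated for every sample of the frame."""
    env = []
    for start in range(0, len(x), frame):
        seg = x[start:start + frame]
        rms = math.sqrt(sum(s * s for s in seg) / len(seg))
        env.extend([rms] * len(seg))
    return env


def impose_amplitude(output, source, frame=80, floor=1e-9):
    """Rescale `output` so its frame RMS follows that of `source`."""
    src_env = amplitude_envelope(source, frame)
    out_env = amplitude_envelope(output, frame)
    n = min(len(output), len(source))
    # Dividing by the output's own envelope first normalizes each frame.
    return [output[i] * src_env[i] / max(out_env[i], floor) for i in range(n)]
```

A smoother envelope (for example, interpolating between frame values) would avoid gain steps at frame boundaries; the stepwise version is kept here for brevity.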
-
-
22. The method according to claim 1 wherein said source individual and said target individual are singers.
-
23. The method according to claim 1 further comprising the step of transforming the spectral envelope of said second vocal signal prior to applying said spectral envelope of said vocal signal to said excitation signal and wherein said step of transforming comprises modifying the temporal extent of a block of samples of vocal signals representative of the voice of the source individual prior to the step of performing spectral analysis.
-
24. The method according to claim 1 further comprising the step of splitting the vocal signal representative of the voice of the source individual into a low frequency band and a high frequency band and processing only said low frequency band according to the method of claim 1.
-
25. The method according to claim 24 further comprising the steps of:
-
decimating the low frequency portion;
analyzing the low frequency portion and generating reflection coefficients ki;
sampling the excitation signal at the same rate as a rate at which the source vocal signal is sampled;
filtering the sampled excitation signal using an interpolated lattice filter;
post-filtering the excitation signal by a lowpass filter to remove the spectral image of the interpolated lattice filter; and
applying gain compensation.
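As an illustration of filtering an excitation signal with a lattice structure driven by reflection coefficients, the sketch below implements a plain all-pole lattice. It does not implement the interpolated lattice filter, the decimation, the post-filter, or the gain compensation recited in claim 25. The sign convention for the coefficients (each k equals the last coefficient of the prediction-error filter at that order) and the function names are assumptions.

```python
def lattice_synthesis(excitation, k):
    """All-pole lattice filter driven by reflection coefficients k[0..M-1]."""
    M = len(k)
    b = [0.0] * (M + 1)  # b[m] holds the backward signal b_m at time n-1
    y = []
    for sample in excitation:
        f = sample  # forward signal f_M[n]
        new_b = [0.0] * (M + 1)
        for m in range(M, 0, -1):
            f = f - k[m - 1] * b[m - 1]         # f_{m-1}[n]
            new_b[m] = b[m - 1] + k[m - 1] * f  # b_m[n]
        new_b[0] = f  # b_0[n] equals the output sample
        b = new_b
        y.append(f)
    return y


def step_up(k):
    """Convert reflection coefficients to direct-form A(z) = [1, a1, ..., aM]."""
    a = [1.0]
    for km in k:
        prev = a + [0.0]
        m = len(prev) - 1
        a = [prev[j] + km * prev[m - j] for j in range(m + 1)]
    return a
```

The lattice form stays stable whenever every coefficient has magnitude below one, which is one common reason for working with reflection coefficients rather than direct-form coefficients.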
-
-
26. The method according to claim 24 further comprising the steps of:
-
decimating the low frequency portion;
analyzing the low frequency portion and generating reflection coefficients ki;
sampling the excitation signal at a rate matching the decimated rate of the low frequency portion; and
applying gain compensation.
-
-
27. The method according to claim 1 wherein said step of applying a spectral envelope derived from the voice of a source individual comprises the steps of splitting said vocal signal into a plurality of frequency bands, independently transforming the spectral envelopes corresponding to said bands and applying said transformed spectral envelopes to said bands.
-
28. The method according to claim 27 where the steps of transforming and applying the spectral envelope in any band comprises the following steps:
-
resampling said signal in said band to create a resampled signal SD(t) with a lower effective sampling rate;
performing a low-order spectral analysis on SD(t) and computing the direct-form filter coefficients aD(i);
modifying the coefficients aD(i) using conformal-mapping to scale the spectrum in proportion to the ratio between the pitch of the target vocal signal and pitch of the source vocal signal; and
applying the resulting filter to the target excitation signal.
-
-
29. The method according to claim 27 where the steps of transforming and applying the spectral envelope in any band comprises the following steps:
-
resampling said signal in said band to create a resampled signal SD(t) with a lower effective sampling rate;
performing a temporal scaling of the said signal in said band;
performing a low-order spectral analysis on SD(t); and
applying the resulting filter to the target excitation signal.
-
-
30. The method according to claim 1 further comprising the step of extracting and storing the excitation signal component from the voice of the target individual and wherein unvoiced regions of said excitation signal component are replaced with interpolated voiced data.
-
31. The method according to claim 30 further comprising the step of determining a pitch contour for the excitation signal.
-
32. The method according to claim 30 further comprising the steps of:
-
segmenting the excitation signal component into analysis segments; and
determining whether each of said analysis segments represents voiced or unvoiced signal by comparing parameters of the segments to thresholds selected from among the group of parameters comprising:
average segment power, average low-band segment power, zero crossings per segment.
-
-
33. The method according to claim 30 wherein said step of replacing unvoiced regions with interpolated voiced data comprises using sinusoidal synthesis to morph between the edges of voiced signal portions adjacent unvoiced regions.
-
34. The method according to claim 33 further comprising the use of a random pitch component.
-
35. The method according to claim 33 further comprising the step of storing parameters characterizing said excitation signal component, said parameters being selected from among the group comprising pitch contour and location of unvoiced regions and using said parameters in performing said step of replacing with interpolated voiced data.
-
36. A method of transforming the voice of a source individual so as to adopt characteristics of a target individual, comprising:
-
providing a vocal signal representative of the voice of a target individual;
extracting an excitation signal component of said vocal signal;
storing the excitation signal component of said vocal signal; and
applying the excitation signal component of said vocal signal to a signal derived from the voice of the source individual.
-
-
38. A method of transforming the voice of a source individual so as to adopt characteristics of the voices of at least two target individuals comprising:
-
providing a spectral envelope derived from the voice of the source individual;
providing a combined excitation signal derived from the voices of the at least two target individuals; and
applying the spectral envelope from the source individual to the combined excitation signal from the at least two target individuals.
extracting the excitation signal components from the voices of each of the target individuals;
combining the extracted excitation signal components from the voices of each of the target individuals into a combined excitation signal; and
performing spectral analysis on a vocal signal representative of the voice of the source individual so as to determine the spectral envelope of said vocal signal.
-
Specification