Converting apparatus of voice signal by modulation of frequencies and amplitudes of sinusoidal wave components

US 7,117,154 B2
Filed: 10/27/1998
Issued: 10/03/2006
Est. Priority Date: 10/28/1997
Status: Expired due to Fees

Abstract

A voice converter synthesizes an output voice signal from an input voice signal and a reference voice signal. In the voice converter, an analyzer device analyzes a plurality of sinusoidal wave components contained in the input voice signal to derive a parameter set of an original frequency and an original amplitude representing each sinusoidal wave component. A source device provides reference information characteristic of the reference voice signal. A modulator device modulates the parameter set of each sinusoidal wave component according to the reference information. A regenerator device operates according to each of the parameter sets as modulated to regenerate each of the sinusoidal wave components so that at least one of the frequency and the amplitude of each sinusoidal wave component as regenerated varies from the original one, and mixes the regenerated sinusoidal wave components together to synthesize the output voice signal.
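
The regenerate-and-mix step described in the abstract can be illustrated with a minimal additive-synthesis sketch. It assumes each sinusoidal wave component is a (frequency in Hz, amplitude) pair with zero starting phase; the function names `modulate` and `regenerate`, the single frequency ratio, and the single amplitude gain are hypothetical simplifications, not the patent's own implementation.

```python
import math

def modulate(partials, freq_ratio, amp_gain):
    """Scale every (frequency, amplitude) pair; a stand-in for the modulator."""
    return [(f * freq_ratio, a * amp_gain) for f, a in partials]

def regenerate(partials, sample_rate, num_samples):
    """Regenerate each sinusoid and mix (sum) them into one output frame."""
    frame = []
    for i in range(num_samples):
        t = i / sample_rate
        frame.append(sum(a * math.sin(2.0 * math.pi * f * t) for f, a in partials))
    return frame

# Hypothetical parameter set: three partials of a 220 Hz voice.
original = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.25)]
modified = modulate(original, freq_ratio=1.5, amp_gain=0.8)
frame = regenerate(modified, sample_rate=8000, num_samples=256)
```

Here both the frequency and the amplitude of every regenerated component differ from the original, which is the broadest case the abstract allows ("at least one of the frequency and the amplitude").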

46 Claims

1. An apparatus for converting an input voice signal into an output voice signal according to a reference voice signal, the apparatus comprising:
- extracting means for extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
  
  separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates which are numbered sequentially in a manner the same as the sinusoidal wave components;
  
  memory means for storing reference pitch information representative of a pitch of the reference voice signal, the pitch information including primary pitch information representative of a discrete pitch matching a music scale and secondary pitch information representative of a fractional pitch fluctuating relative to the discrete pitch, and storing reference amplitude information representative of reference amplitude value coordinates, which are numbered sequentially, of the sinusoidal wave components contained in the reference voice signal;
  
  first modulating means for modulating the frequency value coordinates of the sinusoidal wave components of the input voice signal according to the primary reference pitch information retrieved from the memory means, to generate modulated frequency value coordinates, the first modulating means further modulating the modulated frequency value coordinates of the sinusoidal wave components of the input voice signal according to the secondary reference pitch information retrieved from the memory means, to generate further modulated frequency value coordinates;
  
  control means for setting control parameters effective to control degrees of the modulation of the frequency value coordinates by the primary reference pitch information and the secondary pitch information, respectively, so that a degree of influence of the pitch of the reference voice signal to a pitch of the output voice signal is determined according to the control parameters;
  
  second modulating means for modulating the amplitude value coordinates of the sinusoidal wave components of the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates which are numbered correspondingly to the amplitude value coordinates of the input voice signal, retrieved from the memory means, such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude value coordinate by a set ratio;
  
  combining means for combining each of the modulated frequency value coordinates and each of the further modulated amplitude value coordinates to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal; and
  
  mixing means for mixing the synthesized sinusoidal wave components having the modulated frequency value coordinates to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by the pitch of the reference voice signal.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The apparatus as claimed in claim 1, further comprising control means for setting a control parameter effective to control a degree of modulation of the frequency of each sinusoidal wave component by the modulating means so that a degree of influence of the pitch of the reference voice signal to the pitch of the output voice signal is determined according to the control parameter.
  - 3. The apparatus as claimed in claim 1, further comprising detecting means for detecting a pitch of the input voice signal based on results of extraction of the sinusoidal wave components, and switch means operative when the detecting means does not detect the pitch from the input voice signal for outputting an original of the input voice signal in place of the synthesized output voice signal.
  - 4. The apparatus as claimed in claim 1, wherein the mixing means mixes the plurality of the sinusoidal wave components having the modulated amplitudes to synthesize the output voice signal having a timbre different from that of the input voice signal and influenced by the timbre of the reference voice signal.
  - 5. The apparatus as claimed in claim 4, further comprising means for setting a control parameter effective to control a degree of modulation of the amplitude of each sinusoidal wave component by the modulating means so that a degree of influence of the timbre of the reference voice signal to the timbre of the output voice signal is determined according to the control parameter.
  - 6. The apparatus as claimed in claim 1, further comprising means for memorizing volume information representative of a volume variation of the reference voice signal, and means for varying a volume of the output voice signal according to the volume information so that the output voice signal emulates the volume variation of the reference voice signal.
  - 7. The apparatus as claimed in claim 1, further comprising means for separating a residual component from the input voice signal after extraction of the sinusoidal wave components, and means for adding the residual component to the output voice signal.
  - 8. The apparatus as claimed in claim 1, wherein the extracting means utilizes Fast Fourier Transform and a peak detecting means to extract the plurality of sinusoidal components from the input voice signal, the Fast Fourier Transform being carried in prescribed frame units to create a frequency spectrum successively for each frame of the input voice signal, the peak detecting means detecting peaks in the frequency spectrum to extract the frequency value coordinates.
  - 9. The apparatus according to claim 1, wherein the deterministic components include peak values of the input voice signal in a frequency spectrum.
  - 10. The apparatus according to claim 1, wherein the residual components include deviation components between a synthetic voice signal and the input voice signal.
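
Claim 1 recites a two-stage frequency modulation (first by the primary, discrete pitch, then by the secondary, fractional pitch) with a control parameter for each stage, but it gives no formula. The following is only one possible reading, under loudly stated assumptions: the primary pitch is stored as a MIDI note number, the secondary pitch as an offset in cents, the input pitch is already known, and the control parameters `alpha` and `beta` (hypothetical names, each in the range 0 to 1) interpolate in the log-frequency domain.

```python
def midi_to_hz(note):
    """Equal-tempered pitch of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def modulate_frequencies(freqs, input_pitch_hz, ref_note, ref_cents, alpha, beta):
    """Two-stage pitch modulation of sequentially numbered partial frequencies.

    alpha: degree of influence of the discrete (primary) reference pitch.
    beta:  degree of influence of the fractional (secondary) reference pitch.
    Both the formula and the parameter names are assumptions for illustration.
    """
    # Stage 1: move the input pitch toward the discrete reference pitch.
    ratio_primary = (midi_to_hz(ref_note) / input_pitch_hz) ** alpha
    modulated = [f * ratio_primary for f in freqs]
    # Stage 2: apply the fractional fluctuation (in cents) on top of stage 1.
    ratio_secondary = 2.0 ** (beta * ref_cents / 1200.0)
    return [f * ratio_secondary for f in modulated]

# Input voice at 220 Hz, reference note A4 (440 Hz) sung 30 cents sharp.
partials = [220.0, 440.0, 660.0]
full = modulate_frequencies(partials, 220.0, 69, 30.0, alpha=1.0, beta=1.0)
none = modulate_frequencies(partials, 220.0, 69, 30.0, alpha=0.0, beta=0.0)
```

With both parameters at zero the input frequencies pass through unchanged; with both at one the output follows the reference pitch, including its fluctuation.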

11. An apparatus for converting an input voice signal into an output voice signal according to a reference voice signal, the apparatus comprising:
- extracting means for extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
  
  separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, . . . );
  
  memory means for storing, as memorized amplitude value coordinates, reference amplitude information representative of reference amplitude value coordinates ATn (n =1, 2, 3, . . . ), which are numbered sequentially, of the sinusoidal wave components contained in the reference voice signal;
  
  modulating means for modulating the amplitude value coordinates ASn′ of the sinusoidal wave components of the input voice signal extracted from the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates ATn, which are numbered correspondingly to the amplitude value coordinates of the input voice signal, and retrieved from the memory means by the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, . . . ), where the parameter γ takes a value from zero to one and represents a degree of mixing; and
  
  mixing means for mixing the plurality of the sinusoidal wave components having the modulated amplitude value coordinates to synthesize the output voice signal having a timbre different from that of the input voice signal and influenced by the timbre of the reference voice signal,
  
  wherein the modulating means comprises
  
  normalizing means for normalizing the amplitude value coordinates of the sinusoidal wave components of the input voice signal by a mean amplitude of the input voice signal, to generate normalized amplitude value coordinates,
  
  a second mixing means for mixing the normalized amplitude value coordinates of the input voice signal and the memorized amplitude value coordinates of the reference voice signal with one another by a predetermined ratio to produce mixed amplitude value coordinates, and
  
  multiplying means for multiplying the normalized amplitude value coordinates of the sinusoidal wave components of the input voice signal with the mean amplitude of the input voice signal.
- View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
- - 12. The apparatus as claimed in claim 11, further comprising control means for setting a control parameter effective to control a degree of modulation of the amplitude of each sinusoidal wave component by the modulating means so that a degree of influence of the timbre of the reference voice signal to the timbre of the output voice signal is determined according to the control parameter.
  - 13. The apparatus as claimed in claim 11, wherein the memory means further stores pitch information representative of a pitch of the reference voice signal, and the modulating means further modulates a frequency of each sinusoidal wave component of the input voice signal according to the pitch information, so that the mixing means mixes the plurality of the sinusoidal wave components having the modulated frequencies to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by the pitch of the reference voice signal.
  - 14. The apparatus as claimed in claim 13, further comprising means for setting a control parameter effective to control a degree of modulation of the frequency of each sinusoidal wave component by the modulating means so that a degree of influence of the pitch of the reference voice signal to the pitch of the output voice signal is determined according to the control parameter.
  - 15. The apparatus as claimed in claim 11, further comprising detecting means for detecting a pitch of the input voice signal based on results of extraction of the sinusoidal wave components, and switch means operative when the detecting means does not detect the pitch from the input voice signal for outputting an original of the input voice signal in place of the synthesized output voice signal.
  - 16. The apparatus as claimed in claim 11, further comprising means for memorizing volume information representative of a volume variation of the reference voice signal, and means for varying a volume of the output voice signal according to the volume information so that the output voice signal emulates the volume variation of the reference voice signal.
  - 17. The apparatus as claimed in claim 11, further comprising means for separating a residual component from the input voice signal after extraction of the sinusoidal wave components, and means for adding the residual component to the output voice signal.
  - 18. The apparatus claimed in claim 11, wherein the extracting means utilizes Fast Fourier Transform and a peak detecting means to extract the plurality of sinusoidal components from the input voice signal, the Fast Fourier Transform being carried in prescribed frame units to create a frequency spectrum successively for each frame of the input voice signal, the peak detecting means detecting peaks in the frequency spectrum to extract the amplitude value coordinates.
  - 19. The apparatus according to claim 11, wherein the deterministic components include peak values of the input voice signal in a frequency spectrum.
  - 20. The apparatus according to claim 11, wherein the residual components include deviation components between a synthetic voice signal and the input voice signal.
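
The amplitude calculation recited in claim 11, (1 − γ) * ASn′ + γ * ATn, together with the normalizing and multiplying means, can be sketched as follows. The sketch assumes that the mean amplitude is the arithmetic mean of the input partial amplitudes, that the stored reference amplitudes ATn are already normalized, and that the mean amplitude is re-applied to the mixed values; these are interpretive assumptions, and `mix_amplitudes` is a hypothetical name.

```python
def mix_amplitudes(input_amps, ref_amps_normalized, gamma):
    """Mix input partial amplitudes with reference amplitudes, index by index.

    input_amps:           amplitudes of input partials n = 1, 2, 3, ...
    ref_amps_normalized:  stored reference amplitudes ATn (assumed normalized)
    gamma:                degree of mixing, from zero to one
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between zero and one")
    mean_amp = sum(input_amps) / len(input_amps)
    if mean_amp == 0.0:
        return list(input_amps)  # silent frame: nothing to normalize
    # Normalize by the mean amplitude to obtain ASn'.
    normalized = [a / mean_amp for a in input_amps]
    # (1 - gamma) * ASn' + gamma * ATn for correspondingly numbered partials.
    mixed = [(1.0 - gamma) * s + gamma * t
             for s, t in zip(normalized, ref_amps_normalized)]
    # Restore the input's overall level.
    return [m * mean_amp for m in mixed]

blended = mix_amplitudes([4.0, 2.0, 0.0], [1.0, 1.0, 1.0], gamma=0.5)
```

At γ = 0 the input amplitudes are returned unchanged, and at γ = 1 the output takes the reference spectral shape at the input's mean level.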

21. An apparatus for synthesizing an output voice signal from an input voice signal and a reference voice signal, the apparatus comprising:
- an analyzer device that analyzes only deterministic components contained in the input voice signal to derive a parameter set of an original frequency and an original amplitude, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
  
  a separating device to separate the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, . . . ), which are numbered sequentially in a manner the same as the sinusoidal wave components;
  
  a source device that provides reference information characteristic of the reference voice signal, the reference information being reference amplitude information representative of reference amplitude value coordinates ATn (n=1, 2, 3, . . . ), which are numbered sequentially;
  
  a modulator device that modulates the parameter set of the sinusoidal wave components according to the reference information;
  
  a regenerator device that operates according to each of the parameter sets as modulated to regenerate each of the sinusoidal wave components so that at least one of the frequency and the amplitude of each sinusoidal wave component as regenerated varies from the original one, and that mixes the regenerated sinusoidal wave components together to synthesize the output voice signal;
  
  a second modulator device to modulate the amplitude value coordinates ASn′ of the sinusoidal wave components of the input voice signal according to reference amplitude information, representative of amplitudes of the sinusoidal wave components contained in the reference voice signal ATn which are numbered correspondingly to the amplitude value coordinates of the input voice signal, to generate modulated amplitude value coordinates by utilizing the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, . . . ), where the parameter γ takes a value from zero to one and represents a degree of mixing;
  
  a combining device to combine the modulated frequency value coordinates and the modulated amplitude value coordinates to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal.
- View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30, 31)
- - 22. The apparatus as claimed in claim 21, wherein the source device provides the reference information characteristic of a pitch of the reference voice signal, and wherein the modulator device modulates the parameter set of each sinusoidal wave component according to the reference information so that the frequency of each sinusoidal wave component as regenerated varies from the original frequency, the pitch of the output voice signal being synthesized according to the pitch of the reference voice signal.
  - 23. The apparatus as claimed in claim 21, wherein the source device provides the reference information characteristic of a timbre of the reference voice signal, and wherein the modulator device modulates the parameter set of each sinusoidal wave component according to the reference information so that the amplitude of each sinusoidal wave component as regenerated varies from the original amplitude, the timbre of the output voice signal being synthesized according to the timbre of the reference voice signal.
  - 24. The apparatus as claimed in claim 21, further comprising a control device that provides a control parameter effective to control the modulator device so that a degree of modulation of the parameter set is variably determined according to the control parameter.
  - 25. The apparatus as claimed in claim 21, further comprising a detector device that detects a pitch of the input voice signal based on analysis of the sinusoidal wave components by the analyzer device, and a switch device operative when the detector device does not detect the pitch from the input voice signal for outputting an original of the input voice signal in place of the synthesized output voice signal.
  - 26. The apparatus as claimed in claim 21, further comprising a memory device that stores volume information representative of a volume variation of the reference voice signal, and a volume device that varies a volume of the output voice signal according to the volume information so that the output voice signal emulates the volume variation of the reference voice signal.
  - 27. The apparatus as claimed in claim 21, further comprising a separator device that separates a residual component other than the sinusoidal wave components from the input voice signal, and an adder device that adds the residual component to the output voice signal.
  - 28. The apparatus as claimed in claim 21, wherein the parameter set is in the form of a plurality of frequency value and amplitude value coordinates, the frequency value coordinates representing the original frequency and the amplitude value coordinates representing the original amplitude.
  - 29. The apparatus as claimed in claim 21, wherein the analyzer device utilizes Fast Fourier Transform and a peak detecting means to derive the parameter set representing the corresponding sinusoidal wave component, the Fast Fourier Transform being carried in prescribed frame units to create a frequency spectrum successively for each frame of the input voice signal, the peak detecting means detecting peaks in the frequency spectrum to extract the parameter set.
  - 30. The apparatus according to claim 21, wherein the deterministic components include peak values of the input voice signal in a frequency spectrum.
  - 31. The apparatus according to claim 21, wherein the residual components include deviation components between a synthetic voice signal and the input voice signal.
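
The frame-wise spectrum analysis and peak detection recited in the dependent claims can be sketched for a single frame as below. To stay self-contained, the sketch substitutes a naive discrete Fourier transform for the Fast Fourier Transform (same spectrum, slower), uses no analysis window, and uses a fixed magnitude threshold; the function names and the threshold are hypothetical, and a practical analyzer would likely add windowing and peak interpolation.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 for one frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def pick_peaks(mags, sample_rate, frame_len, threshold):
    """Return (frequency_hz, amplitude) pairs at local spectral maxima.

    The list order gives the sequential numbering n = 1, 2, 3, ...
    """
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > threshold and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            freq = k * sample_rate / frame_len
            amp = 2.0 * mags[k] / frame_len  # undo DFT scaling for a real sinusoid
            peaks.append((freq, amp))
    return peaks

# Test frame: two sinusoids falling exactly on bins 8 and 16.
SAMPLE_RATE = 64
FRAME_LEN = 64
frame = [math.sin(2 * math.pi * 8 * i / SAMPLE_RATE)
         + 0.5 * math.sin(2 * math.pi * 16 * i / SAMPLE_RATE)
         for i in range(FRAME_LEN)]
peaks = pick_peaks(magnitude_spectrum(frame), SAMPLE_RATE, FRAME_LEN, threshold=1.0)
```

Each detected peak supplies one frequency value coordinate and one amplitude value coordinate, which is the parameter set the later modulation stages operate on.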

32. A method of converting an input voice signal into an output voice signal according to a reference voice signal, the method comprising the steps of:
- extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
  
  separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates, which are numbered sequentially in a manner the same as the sinusoidal wave components;
  
  storing reference pitch information representative of a pitch of the reference voice signal, the pitch information including primary pitch information representative of a discrete pitch matching a music scale and secondary pitch information representative of a fractional pitch fluctuating relative to the discrete pitch, and storing reference amplitude information representative of reference amplitude value coordinates, which are numbered sequentially, of the sinusoidal wave components contained in the reference voice signal;
  
  modulating the frequency value coordinates of the sinusoidal wave components of the input voice signal according to the primary reference pitch information, to generate modulated amplitude value coordinates, and further modulating the modulated frequency value coordinates of the sinusoidal wave components of the input voice signal according to the secondary reference pitch information retrieved from the memory means, to generate further modulated frequency value coordinates;
  
  setting control parameters effective to control degrees of modulation of the frequency value coordinates by the primary reference pitch information and the secondary pitch information, respectively, so that a degree of influence of the pitch of the reference signal to a pitch of the output voice signal is determined according to the control parameters;
  
  mixing the plurality of the sinusoidal wave components having the modulated frequency value coordinates to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by that of the reference voice signal;
  
  modulating the amplitude value coordinates of the sinusoidal wave components of the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates which are numbered correspondingly to the amplitude value coordinates of the input voice signal, retrieved from the memory means such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude value coordinate by a set ratio; and
  
  combining the modulated frequency value coordinates and the modulated amplitude value coordinates to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal.
- View Dependent Claims (33, 34, 35)
- - 33. The method as claimed in claim 32, wherein the extracting step involves utilizing Fast Fourier Transform and peak detection to extract the plurality of sinusoidal components from the input voice signal, the Fast Fourier Transform being carried in prescribed frame units to create a frequency spectrum successively for each frame of the input voice signal, the peak detection detecting peaks in the frequency spectrum to extract the frequency value coordinates.
  - 34. The method according to claim 32, wherein the deterministic components include peak values of the input voice signal in a frequency spectrum.
  - 35. The method according to claim 32, wherein the residual components include deviation components between a synthetic voice signal and the input voice signal.

36. A method of converting an input voice signal into an output voice signal according to a reference voice signal, the method comprising the steps of:
- extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
  
  separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, . . . );
  
  storing, as stored amplitude value coordinates, reference amplitude information representative of reference amplitude value coordinates ATn (n=1, 2, 3, . . . ), which are numbered sequentially in a manner the same as the sinusoidal wave components, of the sinusoidal wave components contained in the reference voice signal;
  
  modulating the amplitude value coordinates ASn′ of the sinusoidal wave components of the input voice signal extracted from the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates ATn, by the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, . . . ), where the parameter γ takes a value from zero to one and represents a degree of mixing, which are numbered correspondingly to the amplitude value coordinates of the input voice signal such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude coordinate by a set ratio, retrieved from the memory means; and
  
  mixing the plurality of the sinusoidal wave components having the modulated amplitude value coordinates to synthesize the output voice signal having a timbre different from that of the input voice signal and influenced by the timbre of the reference voice signal;
  
  normalizing the amplitude value coordinates of the sinusoidal wave components of the input voice signal by a mean amplitude of the input voice signal, to generate normalized amplitude value coordinates;
  
  mixing the normalized amplitude value coordinates of the input voice signal and the stored amplitude value coordinates of the reference voice signal with one another by a predetermined ratio to produce mixed amplitude value coordinates; and
  
  multiplying the normalized amplitude value coordinates of the sinusoidal wave components of the input voice signal with the mean amplitude of the input voice signal.
- View Dependent Claims (37, 38, 39)
- - 37. The method as claimed in claim 36, wherein the extracting step involves utilizing Fast Fourier Transform and peak detection to extract the plurality of sinusoidal components from the input voice signal, the Fast Fourier Transform being carried in prescribed frame units to create a frequency spectrum successively for each frame of the input voice signal, the peak detection detecting peaks in the frequency spectrum to extract the amplitude value coordinates.
  - 38. The method according to claim 36, wherein the deterministic components include peak values of the input voice signal in a frequency spectrum.
  - 39. The method according to claim 36, wherein the residual components include deviation components between a synthetic voice signal and the input voice signal.
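
Several dependent claims describe the residual components as the deviation between a synthetic voice signal and the input voice signal, and others describe adding the residual back to the output. A minimal time-domain sketch of that bookkeeping follows; it assumes the deterministic part has already been resynthesized into a frame aligned sample-for-sample with the input, and the function names are hypothetical.

```python
def split_residual(input_frame, deterministic_frame):
    """Residual = input minus the resynthesized deterministic (sinusoidal) part."""
    return [x - d for x, d in zip(input_frame, deterministic_frame)]

def add_residual(converted_frame, residual):
    """Add the untouched residual back onto the converted sinusoidal output."""
    return [c + r for c, r in zip(converted_frame, residual)]

input_frame = [1.0, 0.5, -0.25]
deterministic = [0.75, 0.5, -0.5]   # resynthesis of the extracted sinusoids
converted = [0.5, 0.5, 0.5]         # sinusoids after pitch/timbre modulation

residual = split_residual(input_frame, deterministic)
output = add_residual(converted, residual)
```

If no modulation is applied, adding the residual to the deterministic frame reproduces the input exactly, which is a convenient sanity check for an analysis and resynthesis chain.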

40. A machine readable medium used in a computer machine having a CPU for synthesizing an output voice signal from an input voice signal, the medium containing program instructions executed by the CPU for causing the computer machine to perform the method comprising the steps of:
- analyzing only deterministic components contained in the input voice signal to derive a parameter set of an original frequency and an original amplitude, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
  
  separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, . . . );
  
  providing reference information characteristic of the reference voice signal, including reference amplitude information representative of amplitude value coordinates ATn (n = 1, 2, 3, . . . );
  
  modulating the amplitude value coordinates ASn′ according to the reference amplitude information representative of the amplitude value coordinates ATn by the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, . . . ), where the parameter γ takes a value from zero to one and represents a degree of mixing, to generate modulated amplitude value coordinates;
  
  regenerating each of the sinusoidal wave components according to each of the modulated parameter sets so that at least one of the frequency and the amplitude of each regenerated sinusoidal wave components varies from the original one, and mixing the regenerated sinusoidal wave components together to synthesize the output voice signal;
  
  separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates, which are numbered sequentially in a manner the same as the sinusoidal wave components;
  
  modulating the amplitude value coordinates of the sinusoidal wave components of the input voice signal according to the reference amplitude information, representative of reference amplitude value coordinates which are numbered correspondingly to the amplitude value coordinates of the input voice signal such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude value coordinate by a set ratio, of the sinusoidal wave components contained in the reference voice signal, to generate modulated amplitude value coordinates; and
  
  combining the modulated frequency value coordinates and the modulated amplitude value coordinates to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal.
- View Dependent Claims (41, 42, 43, 44)
- - 41. The machine readable medium as claimed in claim 40, wherein the parameter set is in the form of a plurality of frequency value and amplitude value coordinates, the frequency value coordinates representing the original frequency and the amplitude value coordinates representing the original amplitude. detecting peaks in the frequency spectrum to extract the frequency value coordinates.
  - 42. The machine readable medium as claimed in claim 40, wherein the analyzing step involves utilizing Fast Fourier Transform and peak detection to derive the parameter set representing the corresponding sinusoidal wave component, the Fast Fourier Transform being carried in prescribed frame units to create a frequency spectrum successively for each frame of the input voice signal, the peak detection detecting peaks in the frequency spectrum to extract the parameter set.
  - 43. The machine-readable medium according to claim 40, wherein the deterministic components include peak values of the input voice signal in a frequency spectrum.
  - 44. The machine-readable medium according to claim 40, wherein the residual components include deviation components between a synthetic voice signal and the input voice signal.
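
The analysis step recited in claim 42 (a per-frame transform followed by peak detection) can be illustrated with a minimal sketch. This is not the patentee's implementation: the function names `magnitude_spectrum` and `detect_peaks`, the `threshold` parameter, and the amplitude scaling are assumptions made for illustration, and a naive DFT stands in for a Fast Fourier Transform so that the example needs only the standard library.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 for one frame (naive stand-in for an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def detect_peaks(spectrum, sample_rate, frame_len, threshold):
    """Return (frequency_hz, amplitude) pairs at local maxima above `threshold`.

    The pairs are returned in ascending frequency order, which gives the
    sequential numbering n = 1, 2, 3, ... used in the claims.
    """
    peaks = []
    for k in range(1, len(spectrum) - 1):
        is_local_max = spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
        if is_local_max and spectrum[k] > threshold:
            freq = k * sample_rate / frame_len      # bin index -> Hz
            amp = 2.0 * spectrum[k] / frame_len     # assumed scaling for a real sinusoid
            peaks.append((freq, amp))
    return peaks


# Hypothetical usage: one 64-sample frame holding two sinusoids.
rate, n = 6400, 64
frame = [math.sin(2 * math.pi * 400 * t / rate) + 0.5 * math.sin(2 * math.pi * 800 * t / rate)
         for t in range(n)]
pairs = detect_peaks(magnitude_spectrum(frame), rate, n, threshold=1.0)
```

Each returned pair corresponds to one sinusoidal wave component, already separated into a frequency value coordinate and an amplitude value coordinate.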

45. An apparatus for converting an input voice signal into an output voice signal according to a reference voice signal, the apparatus comprising:
- extracting means for extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
  
  separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates which are numbered sequentially in a manner the same as the sinusoidal wave components;
  
  memory means for storing reference pitch information representative of a pitch of the reference voice signal, and reference amplitude information representative of reference amplitude value coordinates, which are numbered sequentially, of the sinusoidal wave components contained in the reference voice signal;
  
  first modulating means for modulating the frequency value coordinates of the sinusoidal wave components of the input voice signal according to the reference pitch information retrieved from the memory means, to generate modulated frequency value coordinates;
  
  second modulating means for modulating the amplitude value coordinates of the sinusoidal wave components of the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates which are numbered correspondingly to the amplitude value coordinates of the input voice signal, retrieved from the memory means, such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude value coordinate by a set ratio;
  
  combining means for combining each of the modulated frequency value coordinates and each of the modulated amplitude value coordinates, which are processed separately from each other and which are numbered correspondingly to each other, to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal; and
  
  mixing means for mixing the synthesized sinusoidal wave components having the modulated frequency value coordinates to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by the pitch of the reference voice signal.
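
The frequency modulation, combining, and mixing elements of claim 45 can be sketched as follows. The claim does not say how the frequency value coordinates are modulated, so the uniform scaling by the ratio of reference pitch to input pitch used here is an assumption, as are the function names `modulate_frequencies` and `synthesize_frame`. Phase handling between frames is omitted.

```python
import math


def modulate_frequencies(freqs_hz, input_pitch_hz, reference_pitch_hz):
    """Scale every frequency coordinate by reference/input pitch (assumed rule)."""
    if input_pitch_hz <= 0:
        raise ValueError("input pitch must be positive")
    ratio = reference_pitch_hz / input_pitch_hz
    return [f * ratio for f in freqs_hz]


def synthesize_frame(freqs_hz, amps, sample_rate, num_samples):
    """Pair coordinate n of each list, then sum the sinusoids sample by sample."""
    if len(freqs_hz) != len(amps):
        raise ValueError("frequency and amplitude coordinates must be numbered alike")
    frame = []
    for t in range(num_samples):
        sample = sum(a * math.sin(2.0 * math.pi * f * t / sample_rate)
                     for f, a in zip(freqs_hz, amps))
        frame.append(sample)
    return frame


# Hypothetical usage: move a 220 Hz voice with three partials to a 330 Hz reference pitch.
new_freqs = modulate_frequencies([220.0, 440.0, 660.0], 220.0, 330.0)
output = synthesize_frame(new_freqs, [1.0, 0.5, 0.25], sample_rate=8000, num_samples=16)
```

Keeping the two coordinate lists separate until `synthesize_frame` mirrors the claim language that the frequency and amplitude coordinates are processed separately and combined by matching index.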

46. An apparatus for converting an input voice signal into an output voice signal according to a reference voice signal, the apparatus comprising:
- extracting means for extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
  
  separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n=1, 2, 3, . . . );
  
  memory means for storing reference pitch information representative of a pitch of the reference voice signal, and reference amplitude information representative of reference amplitude value coordinates ATn (n=1, 2, 3, . . . ) of the sinusoidal wave components contained in the reference voice signal;
  
  first modulating means for modulating the frequency value coordinates of the sinusoidal wave components of the input voice signal according to the reference pitch information retrieved from the memory means, to generate modulated frequency value coordinates;
  
  second modulating means for modulating the amplitude value coordinates ASn′ of the sinusoidal wave components of the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates representative of the amplitude value coordinates ATn retrieved from the memory means by the following calculation (1 − γ) * ASn′ + γ * ATn (n=1, 2, 3, . . . ), where the parameter γ takes a value from zero to one and represents a degree of mixing;
  
  combining means for combining each of the modulated frequency value coordinates and each of the modulated amplitude value coordinates, which are processed separately from each other and which are numbered correspondingly to each other, to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal; and
  
  mixing means for mixing the synthesized sinusoidal wave components having the modulated frequency value coordinates to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by the pitch of the reference voice signal.
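
The amplitude calculation recited in claims 40 and 46, (1 − γ) * ASn′ + γ * ATn, is a per-index linear interpolation between the input and reference amplitude coordinates. A minimal sketch follows; the function name `mix_amplitudes` and the handling of unequal list lengths are assumptions, not part of the claims.

```python
def mix_amplitudes(input_amps, reference_amps, gamma):
    """Compute (1 - gamma) * ASn' + gamma * ATn for n = 1, 2, 3, ...

    gamma = 0 keeps the input amplitudes; gamma = 1 replaces them with the
    reference amplitudes. Coordinates without a counterpart in the other
    list are dropped (an assumption; the claims do not address this case).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between zero and one")
    return [(1.0 - gamma) * a_s + gamma * a_t
            for a_s, a_t in zip(input_amps, reference_amps)]


# Hypothetical usage: an even blend of two two-partial amplitude sets.
blended = mix_amplitudes([1.0, 0.5], [0.0, 1.0], 0.5)  # [0.5, 0.75]
```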

Current Assignee: Pompeu Fabra University, Yamaha Corporation
Original Assignee: Pompeu Fabra University, Yamaha Corporation
Inventors: Yoshioka, Yasuo; Serra, Xavier
Primary Examiner(s): ARMSTRONG, ANGELA A
Application Number: US09/181,021
Publication Number: US 20010044721A1
Time in Patent Office: 2,898 Days
Field of Search: 704/258, 704/268, 704/270, 704/272, 704/278, 704/207, 704/205, 846/10, 846/34, 846/04, 434/307.A
US Class Current: 704/258
CPC Class Codes: G10L 19/18 Vocoders using multiple modes
