Converting apparatus of voice signal by modulation of frequencies and amplitudes of sinusoidal wave components
Abstract
A voice converter synthesizes an output voice signal from an input voice signal and a reference voice signal. In the voice converter, an analyzer device analyzes a plurality of sinusoidal wave components contained in the input voice signal to derive a parameter set of an original frequency and an original amplitude representing each sinusoidal wave component. A source device provides reference information characteristic of the reference voice signal. A modulator device modulates the parameter set of each sinusoidal wave component according to the reference information. A regenerator device operates according to each of the parameter sets as modulated to regenerate each of the sinusoidal wave components so that at least one of the frequency and the amplitude of each sinusoidal wave component as regenerated varies from the original one, and mixes the regenerated sinusoidal wave components together to synthesize the output voice signal.
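The analyze, modulate, and regenerate flow described in the abstract can be illustrated with a minimal sketch. It assumes each sinusoidal wave component is held as a (frequency in Hz, amplitude) pair; the function names `modulate` and `regenerate`, the uniform `freq_ratio` and `amp_gain` parameters, and the sample rate are hypothetical choices made for illustration and are not taken from the patent.

```python
import math

def modulate(partials, freq_ratio=1.0, amp_gain=1.0):
    """Scale the frequency and amplitude of every (frequency, amplitude) pair.

    A uniform ratio and gain are assumptions made for this sketch only.
    """
    return [(f * freq_ratio, a * amp_gain) for f, a in partials]

def regenerate(partials, sample_rate=8000, n_samples=8):
    """Regenerate each sinusoid from its parameter set and mix them by summation."""
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        out.append(sum(a * math.sin(2.0 * math.pi * f * t) for f, a in partials))
    return out

# Hypothetical parameter sets for three sinusoidal wave components.
original = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.25)]
modulated = modulate(original, freq_ratio=1.5, amp_gain=0.8)
samples = regenerate(modulated)
```

Each output sample is the sum of the regenerated sinusoids, so changing either element of a parameter set changes the pitch or timbre of the mixed result.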
46 Claims
1. An apparatus for converting an input voice signal into an output voice signal according to a reference voice signal, the apparatus comprising:
extracting means for extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates which are numbered sequentially in a manner the same as the sinusoidal wave components;
memory means for storing reference pitch information representative of a pitch of the reference voice signal, the pitch information including primary pitch information representative of a discrete pitch matching a music scale and secondary pitch information representative of a fractional pitch fluctuating relative to the discrete pitch, and storing reference amplitude information representative of reference amplitude value coordinates, which are numbered sequentially, of the sinusoidal wave components contained in the reference voice signal;
first modulating means for modulating the frequency value coordinates of the sinusoidal wave components of the input voice signal according to the primary reference pitch information retrieved from the memory means, to generate modulated frequency value coordinates, the first modulating means further modulating the modulated frequency value coordinates of the sinusoidal wave components of the input voice signal according to the secondary reference pitch information retrieved from the memory means, to generate further modulated frequency value coordinates;
control means for setting control parameters effective to control degrees of the modulation of the frequency value coordinates by the primary reference pitch information and the secondary pitch information, respectively, so that a degree of influence of the pitch of the reference voice signal to a pitch of the output voice signal is determined according to the control parameters;
second modulating means for modulating the amplitude value coordinates of the sinusoidal wave components of the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates which are numbered correspondingly to the amplitude value coordinates of the input voice signal, retrieved from the memory means, such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude value coordinate by a set ratio;
combining means for combining each of the modulated frequency value coordinates and each of the further modulated amplitude value coordinates to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal; and
mixing means for mixing the synthesized sinusoidal wave components having the modulated frequency value coordinates to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by the pitch of the reference voice signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
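The two-stage frequency modulation of claim 1, with separate control parameters for the primary (discrete) and secondary (fractional) pitch, can be sketched as follows. The claim gives no formula, so everything here is an assumption: pitch is expressed in equal-tempered semitones (MIDI note numbers), `alpha` and `beta` are hypothetical control parameters in the range zero to one, and the function names are invented for illustration.

```python
import math

def split_pitch(f0_hz, a4_hz=440.0):
    """Split a fundamental into a discrete semitone and a fractional offset.

    Assumption: the 'music scale' is 12-tone equal temperament with A4 = 440 Hz.
    """
    midi = 69.0 + 12.0 * math.log2(f0_hz / a4_hz)
    discrete = round(midi)
    return discrete, midi - discrete

def modulate_frequencies(freqs, input_f0, ref_discrete, ref_fraction,
                         alpha=1.0, beta=1.0):
    """Shift all frequency coordinates toward the reference pitch in two stages."""
    in_discrete, in_fraction = split_pitch(input_f0)
    # Stage 1: move the discrete pitch toward the reference, weighted by alpha.
    shift1 = alpha * (ref_discrete - in_discrete)
    stage1 = [f * 2.0 ** (shift1 / 12.0) for f in freqs]
    # Stage 2: move the fractional fluctuation toward the reference, weighted by beta.
    shift2 = beta * (ref_fraction - in_fraction)
    return [f * 2.0 ** (shift2 / 12.0) for f in stage1]

# Input at A3 (220 Hz) with one harmonic; hypothetical reference at A4 with no fluctuation.
shifted = modulate_frequencies([220.0, 440.0], 220.0, ref_discrete=69, ref_fraction=0.0)
unchanged = modulate_frequencies([220.0, 440.0], 220.0, 69, 0.0, alpha=0.0, beta=0.0)
```

With both parameters at one the output takes on the reference pitch entirely; with both at zero the input pitch is preserved, which is one way to read the "degree of influence" language of the control means.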
11. An apparatus for converting an input voice signal into an output voice signal according to a reference voice signal, the apparatus comprising:
extracting means for extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, ...);
memory means for storing, as memorized amplitude value coordinates, reference amplitude information representative of reference amplitude value coordinates ATn (n = 1, 2, 3, ...), which are numbered sequentially, of the sinusoidal wave components contained in the reference voice signal;
modulating means for modulating the amplitude value coordinates ASn′ of the sinusoidal wave components of the input voice signal extracted from the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates ATn, which are numbered correspondingly to the amplitude value coordinates of the input voice signal, and retrieved from the memory means by the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, ...), where the parameter γ takes a value from zero to one and represents a degree of mixing; and
mixing means for mixing the plurality of the sinusoidal wave components having the modulated amplitude value coordinates to synthesize the output voice signal having a timbre different from that of the input voice signal and influenced by the timbre of the reference voice signal,
wherein the modulating means comprises normalizing means for normalizing the amplitude value coordinates of the sinusoidal wave components of the input voice signal by a mean amplitude of the input voice signal, to generate normalized amplitude value coordinates, a second mixing means for mixing the normalized amplitude value coordinates of the input voice signal and the memorized amplitude value coordinates of the reference voice signal with one another by a predetermined ratio to produce mixed amplitude value coordinates, and multiplying means for multiplying the normalized amplitude value coordinates of the sinusoidal wave components of the input voice signal with the mean amplitude of the input voice signal.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
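The normalize, mix, and rescale sequence of claim 11 can be sketched as below, using the claimed calculation (1 − γ) * ASn′ + γ * ATn. Two points are assumptions of this sketch rather than statements of the claim: the reference coordinates ATn are taken to be stored already normalized, and the multiplication by the mean amplitude is applied to the mixed coordinates. The function name `mix_amplitudes` is hypothetical.

```python
def mix_amplitudes(input_amps, ref_amps, gamma):
    """Mix input and reference amplitude coordinates index by index.

    input_amps: amplitude coordinates of the input voice (one per partial).
    ref_amps:   reference coordinates ATn, assumed already normalized.
    gamma:      degree of mixing, from zero (input only) to one (reference only).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between zero and one")
    if not input_amps:
        return []
    mean_amp = sum(input_amps) / len(input_amps)
    if mean_amp == 0.0:
        return [0.0] * len(input_amps)
    # Normalize by the mean amplitude of the input to obtain ASn'.
    normalized = [a / mean_amp for a in input_amps]
    # Claimed calculation; zip stops at the shorter list if lengths differ.
    mixed = [(1.0 - gamma) * as_n + gamma * at_n
             for as_n, at_n in zip(normalized, ref_amps)]
    # Restore the overall level of the input (assumption: applied after mixing).
    return [m * mean_amp for m in mixed]

half = mix_amplitudes([4.0, 2.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0], 0.5)
```

At γ = 0 the input amplitudes come back unchanged, and at γ = 1 the spectral shape is that of the reference while the overall level still follows the input.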
21. An apparatus for synthesizing an output voice signal from an input voice signal and a reference voice signal, the apparatus comprising:
an analyzer device that analyzes only deterministic components contained in the input voice signal to derive a parameter set of an original frequency and an original amplitude, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
a separating device to separate the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, ...), which are numbered sequentially in a manner the same as the sinusoidal wave components;
a source device that provides reference information characteristic of the reference voice signal, the reference information being reference amplitude information representative of reference amplitude value coordinates ATn (n = 1, 2, 3, ...), which are numbered sequentially;
a modulator device that modulates the parameter set of the sinusoidal wave components according to the reference information;
a regenerator device that operates according to each of the parameter sets as modulated to regenerate each of the sinusoidal wave components so that at least one of the frequency and the amplitude of each sinusoidal wave component as regenerated varies from the original one, and that mixes the regenerated sinusoidal wave components together to synthesize the output voice signal;
a second modulator device to modulate the amplitude value coordinates ASn′ of the sinusoidal wave components of the input voice signal according to reference amplitude information, representative of amplitudes of the sinusoidal wave components contained in the reference voice signal ATn which are numbered correspondingly to the amplitude value coordinates of the input voice signal, to generate modulated amplitude value coordinates by utilizing the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, ...), where the parameter γ takes a value from zero to one and represents a degree of mixing;
a combining device to combine the modulated frequency value coordinates and the modulated amplitude value coordinates to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
32. A method of converting an input voice signal into an output voice signal according to a reference voice signal, the method comprising the steps of:
extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates, which are numbered sequentially in a manner the same as the sinusoidal wave components;
storing reference pitch information representative of a pitch of the reference voice signal, the pitch information including primary pitch information representative of a discrete pitch matching a music scale and secondary pitch information representative of a fractional pitch fluctuating relative to the discrete pitch, and storing reference amplitude information representative of reference amplitude value coordinates, which are numbered sequentially, of the sinusoidal wave components contained in the reference voice signal;
modulating the frequency value coordinates of the sinusoidal wave components of the input voice signal according to the primary reference pitch information, to generate modulated amplitude value coordinates, and further modulating the modulated frequency value coordinates of the sinusoidal wave components of the input voice signal according to the secondary reference pitch information retrieved from the memory means, to generate further modulated frequency value coordinates;
setting control parameters effective to control degrees of modulation of the frequency value coordinates by the primary reference pitch information and the secondary pitch information, respectively, so that a degree of influence of the pitch of the reference signal to a pitch of the output voice signal is determined according to the control parameters;
mixing the plurality of the sinusoidal wave components having the modulated frequency value coordinates to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by that of the reference voice signal;
modulating the amplitude value coordinates of the sinusoidal wave components of the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates which are numbered correspondingly to the amplitude value coordinates of the input voice signal, retrieved from the memory means such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude value coordinate by a set ratio; and
combining the modulated frequency value coordinates and the modulated amplitude value coordinates to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal.
Dependent claims: 33, 34, 35
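The extracting and separating steps of the method claims do not say how the sinusoidal wave components are found. One common approach, shown here purely as an assumed illustration, is to pick local maxima of a frame's magnitude spectrum and number them in ascending frequency order; whatever the peaks do not account for would then be treated as the residual. The naive DFT, the `threshold` value, and the function names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0 .. n/2 - 1, scaled so a unit sine reads 1.0."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) * 2.0 / n
            for k in range(n // 2)]

def pick_partials(frame, sample_rate, threshold=0.1):
    """Return (frequency_hz, amplitude) pairs for spectral peaks above a threshold."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    partials = []
    for k in range(1, len(mags) - 1):
        is_peak = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_peak and mags[k] > threshold:
            partials.append((k * sample_rate / n, mags[k]))
    return partials  # ascending frequency, so the list index numbers the components

# Synthetic frame: two sinusoids that fall exactly on DFT bins 4 and 8.
rate, size = 64, 64
frame = [math.sin(2 * math.pi * 4 * i / size) + 0.5 * math.sin(2 * math.pi * 8 * i / size)
         for i in range(size)]
partials = pick_partials(frame, rate)
frequencies = [f for f, _ in partials]   # frequency value coordinates
amplitudes = [a for _, a in partials]    # amplitude value coordinates, same numbering
```

Splitting the result into two parallel lists mirrors the claimed separation into frequency and amplitude value coordinates that share one numbering.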
36. A method of converting an input voice signal into an output voice signal according to a reference voice signal, the method comprising the steps of:
extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, ...);
storing, as stored amplitude value coordinates, reference amplitude information representative of reference amplitude value coordinates ATn (n = 1, 2, 3, ...), which are numbered sequentially in a manner the same as the sinusoidal wave components, of the sinusoidal wave components contained in the reference voice signal;
modulating the amplitude value coordinates ASn′ of the sinusoidal wave components of the input voice signal extracted from the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates ATn, by the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, ...), where the parameter γ takes a value from zero to one and represents a degree of mixing, which are numbered correspondingly to the amplitude value coordinates of the input voice signal such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude coordinate by a set ratio, retrieved from the memory means; and
mixing the plurality of the sinusoidal wave components having the modulated amplitude value coordinates to synthesize the output voice signal having a timbre different from that of the input voice signal and influenced by the timbre of the reference voice signal;
normalizing the amplitude value coordinates of the sinusoidal wave components of the input voice signal by a mean amplitude of the input voice signal, to generate normalized amplitude value coordinates;
mixing the normalized amplitude value coordinates of the input voice signal and the stored amplitude value coordinates of the reference voice signal with one another by a predetermined ratio to produce mixed amplitude value coordinates; and
multiplying the normalized amplitude value coordinates of the sinusoidal wave components of the input voice signal with the mean amplitude of the input voice signal.
Dependent claims: 37, 38, 39
40. A machine readable medium used in a computer machine having a CPU for synthesizing an output voice signal from an input voice signal, the medium containing program instructions executed by the CPU for causing the computer machine to perform the method comprising the steps of:
analyzing only deterministic components contained in the input voice signal to derive a parameter set of an original frequency and an original amplitude, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, ...);
providing reference information characteristic of the reference voice signal, including reference amplitude information representative of amplitude value coordinates ATn (n = 1, 2, 3, ...);
modulating the amplitude value coordinates ASn′ according to the reference amplitude information representative of the amplitude value coordinates ATn by the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, ...), where the parameter γ takes a value from zero to one and represents a degree of mixing, to generate modulated amplitude value coordinates;
regenerating each of the sinusoidal wave components according to each of the modulated parameter sets so that at least one of the frequency and the amplitude of each regenerated sinusoidal wave components varies from the original one, and mixing the regenerated sinusoidal wave components together to synthesize the output voice signal;
separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates, which are numbered sequentially in a manner the same as the sinusoidal wave components;
modulating the amplitude value coordinates of the sinusoidal wave components of the input voice signal according to the reference amplitude information, representative of reference amplitude value coordinates which are numbered correspondingly to the amplitude value coordinates of the input voice signal such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude value coordinate by a set ratio, of the sinusoidal wave components contained in the reference voice signal, to generate modulated amplitude value coordinates; and
combining the modulated frequency value coordinates and the modulated amplitude value coordinates to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal.
Dependent claims: 41, 42, 43, 44
45. An apparatus for converting an input voice signal into an output voice signal according to a reference voice signal, the apparatus comprising:
extracting means for extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates which are numbered sequentially in a manner the same as the sinusoidal wave components;
memory means for storing reference pitch information representative of a pitch of the reference voice signal, and reference amplitude information representative of reference amplitude value coordinates, which are numbered sequentially, of the sinusoidal wave components contained in the reference voice signal;
first modulating means for modulating the frequency value coordinates of the sinusoidal wave components of the input voice signal according to the reference pitch information retrieved from the memory means, to generate modulated frequency value coordinates;
second modulating means for modulating the amplitude value coordinates of the sinusoidal wave components of the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates which are numbered correspondingly to the amplitude value coordinates of the input voice signal, retrieved from the memory means, such that each amplitude value coordinate of the input voice signal is mixed with the corresponding reference amplitude value coordinate by a set ratio;
combining means for combining each of the modulated frequency value coordinates and each of the modulated amplitude value coordinates, which are processed separately from each other and which are numbered correspondingly to each other, to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal; and
mixing means for mixing the synthesized sinusoidal wave components having the modulated frequency value coordinates to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by the pitch of the reference voice signal.
46. An apparatus for converting an input voice signal into an output voice signal according to a reference voice signal, the apparatus comprising:
extracting means for extracting only deterministic components from the input voice signal, the deterministic components including a plurality of sinusoidal wave components which are numbered sequentially, wherein the input voice signal includes the deterministic components and residual components;
separating means for separating the sinusoidal wave components into frequency value coordinates and amplitude value coordinates ASn′ (n = 1, 2, 3, ...);
memory means for storing reference pitch information representative of a pitch of the reference voice signal, and reference amplitude information representative of reference amplitude value coordinates ATn (n = 1, 2, 3, ...) of the sinusoidal wave components contained in the reference voice signal;
first modulating means for modulating the frequency value coordinates of the sinusoidal wave components of the input voice signal according to the reference pitch information retrieved from the memory means, to generate modulated frequency value coordinates;
second modulating means for modulating the amplitude value coordinates ASn′ of the sinusoidal wave components of the input voice signal according to the reference amplitude information representative of the reference amplitude value coordinates representative of the amplitude value coordinates ATn retrieved from the memory means by the following calculation (1 − γ) * ASn′ + γ * ATn (n = 1, 2, 3, ...), where the parameter γ takes a value from zero to one and represents a degree of mixing;
combining means for combining each of the modulated frequency value coordinates and each of the modulated amplitude value coordinates, which are processed separately from each other and which are numbered correspondingly to each other, to synthesize sinusoidal wave components of the output voice signal having an output pitch and an output timbre different from an input pitch and an input timbre of the input voice signal, and influenced by a reference pitch and a reference timbre of the reference voice signal; and
mixing means for mixing the synthesized sinusoidal wave components having the modulated frequency value coordinates to synthesize the output voice signal having a pitch different from that of the input voice signal and influenced by the pitch of the reference voice signal.
Specification