Voice converter with extraction and modification of attribute data
Abstract
An apparatus is constructed for converting an input voice signal into an output voice signal according to a target voice signal. In the apparatus, an input device provides the input voice signal composed of original sinusoidal components and original residual components other than the original sinusoidal components. An extracting device extracts original attribute data from at least the sinusoidal components of the input voice signal. The original attribute data is characteristic of the input voice signal. A synthesizing device synthesizes new attribute data based on both of the original attribute data derived from the input voice signal and target attribute data being characteristic of the target voice signal composed of target sinusoidal components and target residual components other than the sinusoidal components. The target attribute data is derived from at least the target sinusoidal components. An output device operates based on the new attribute data and either of the original residual component and the target residual component for producing the output voice signal.
59 Claims
-
1. An apparatus for converting an input voice signal into an output voice signal according to a target voice signal, the apparatus comprising:
-
an input device that provides the input voice signal composed of an original sinusoidal component and an original residual component other than the original sinusoidal component;
an extracting device that extracts original attribute data from at least the sinusoidal component of the input voice signal, the original attribute data being characteristic of the input voice signal;
a synthesizing device that synthesizes new attribute data based on both of the original attribute data derived from the input voice signal and target attribute data being characteristic of the target voice signal composed of a target sinusoidal component and a target residual component other than the sinusoidal component, the target attribute data being derived from at least the target sinusoidal component; and
an output device that operates based on the new attribute data and either of the original residual component and the target residual component for producing the output voice signal. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
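As an illustration of the synthesizing and output devices of claim 1, the following is a minimal sketch only. The claim does not say which attributes are used or how the original and target attribute data are combined, so the `Attributes` fields, the linear blend controlled by `weight`, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Attributes:
    """Hypothetical attribute record; the claim does not enumerate the fields."""
    pitch_hz: float
    amplitude: float


def synthesize_attributes(original: Attributes, target: Attributes,
                          weight: float) -> Attributes:
    """Form new attribute data from both the original and the target.

    Assumption: a linear blend, where weight=0 keeps the original
    attributes and weight=1 adopts the target attributes.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def mix(a: float, b: float) -> float:
        return (1.0 - weight) * a + weight * b

    return Attributes(
        pitch_hz=mix(original.pitch_hz, target.pitch_hz),
        amplitude=mix(original.amplitude, target.amplitude),
    )


def select_residual(original_residual, target_residual, use_target: bool):
    """Pick either residual component, as the output device is allowed to do."""
    return target_residual if use_target else original_residual
```

With `weight=0.5`, an original at 200 Hz and a target at 100 Hz would yield a new pitch of 150 Hz under this assumed blend.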
-
-
13. An apparatus for converting an input voice signal into an output voice signal according to a target voice signal, the apparatus comprising:
-
an input device that provides the input voice signal composed of original sinusoidal components and original residual components other than the original sinusoidal components;
a separating device that separates the original sinusoidal components and the original residual components from each other;
a first modifying device that modifies the original sinusoidal components based on target sinusoidal components contained in the target voice signal so as to form new sinusoidal components having a first pitch;
a second modifying device that modifies the original residual components based on target residual components contained in the target voice signal other than the target sinusoidal components so as to form new residual components having a second pitch;
a shaping device that shapes the new residual components by removing therefrom a fundamental tone corresponding to the second pitch and overtones of the fundamental tone; and
an output device that combines the new sinusoidal components and the shaped new residual components with each other for producing the output voice signal having the first pitch. - View Dependent Claims (14, 15, 16)
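One way to picture the shaping device of claim 13 is a feedforward comb filter, which places spectral nulls at a fundamental frequency and at its integer multiples. This is a sketch under assumptions: the claim does not name a filter type, and `remove_harmonics`, its parameters, and the rounding of the pitch period to whole samples are illustrative choices.

```python
def remove_harmonics(residual, sample_rate, pitch_hz):
    """Suppress a fundamental tone and its overtones in a residual signal.

    Assumption: a feedforward comb filter y[n] = (x[n] - x[n - T]) / 2,
    with T the pitch period in samples. Its nulls fall at multiples of
    sample_rate / T, which approximates pitch_hz after rounding.
    """
    if pitch_hz <= 0:
        raise ValueError("pitch_hz must be positive")
    period = round(sample_rate / pitch_hz)
    if period < 1:
        raise ValueError("pitch_hz is too high for this sample rate")
    shaped = []
    for n, x in enumerate(residual):
        # Samples before the start of the signal are treated as silence.
        delayed = residual[n - period] if n >= period else 0.0
        shaped.append(0.5 * (x - delayed))
    return shaped
```

Once the first pitch period has passed, a steady tone at the second pitch cancels, while a tone halfway between two harmonics passes at full amplitude.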
-
-
17. An apparatus for converting an input voice signal into an output voice signal according to a target voice signal, the apparatus comprising:
-
an input device that provides the input voice signal composed of original sinusoidal components and original residual components other than the original sinusoidal components;
a separating device that separates the original sinusoidal components and the original residual components from each other;
a first modifying device that modifies the original sinusoidal components based on target sinusoidal components contained in the target voice signal so as to form new sinusoidal components;
a second modifying device that modifies the original residual components based on target residual components contained in the target voice signal other than the target sinusoidal components so as to form new residual components;
a shaping device that shapes the new residual components by introducing thereinto a fundamental tone and overtones of the fundamental tone corresponding to a desired pitch; and
an output device that combines the new sinusoidal components and the shaped new residual components with each other for producing the output voice signal. - View Dependent Claims (18, 19, 20)
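The shaping device of claim 17 does the opposite of removal: it introduces a fundamental tone and overtones at a desired pitch. A hedged sketch, assuming a feedback comb filter (the claim does not specify the mechanism) whose resonances fall at multiples of the desired pitch; `add_harmonics` and the `feedback` gain are hypothetical names for this example.

```python
def add_harmonics(residual, sample_rate, pitch_hz, feedback=0.5):
    """Emphasize a fundamental tone and its overtones in a residual signal.

    Assumption: a feedback comb filter y[n] = x[n] + g * y[n - T], with T
    the pitch period in samples and 0 <= g < 1 so the filter stays stable.
    """
    if pitch_hz <= 0:
        raise ValueError("pitch_hz must be positive")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must lie in [0, 1)")
    period = round(sample_rate / pitch_hz)
    if period < 1:
        raise ValueError("pitch_hz is too high for this sample rate")
    shaped = []
    for n, x in enumerate(residual):
        # Recirculate the output from one pitch period earlier.
        recirculated = shaped[n - period] if n >= period else 0.0
        shaped.append(x + feedback * recirculated)
    return shaped
```

Feeding a single impulse through this filter produces a decaying pulse train spaced one pitch period apart, which is what gives the residual a pitched character.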
-
-
21. An apparatus for converting an input voice signal into an output voice signal by modifying a spectral shape, the apparatus comprising:
-
an input device that provides the input voice signal containing wave components;
a separating device that separates sinusoidal ones of the wave components from the input voice signal such that each sinusoidal wave component is identified by a pair of a frequency and an amplitude;
a computing device that computes a spectral shape of the input voice signal based on a set of the separated sinusoidal wave components such that the spectral shape represents an envelope having a series of break points corresponding to the pairs of the frequencies and the amplitudes of the sinusoidal wave components;
a modifying device that modifies the spectral shape to form a new spectral shape having a modified envelope;
a generating device that selects a series of points along the modified envelope of the new spectral shape and that generates a set of new sinusoidal wave components each identified by each pair of a frequency and an amplitude, which corresponds to each of the series of the selected points; and
an output device that produces the output voice signal based on the set of the new sinusoidal wave components. - View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30)
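The computing, modifying, and generating devices of claim 21 can be sketched as follows. The break-point envelope comes from the claim; the linear interpolation between break points, the choice of a frequency-axis stretch as the modification, and all function names are assumptions made for illustration.

```python
def envelope_at(break_points, freq):
    """Evaluate a spectral shape at freq by linear interpolation.

    break_points is a list of (frequency, amplitude) pairs with strictly
    increasing frequencies. Outside the covered range the nearest end
    value is held (an assumption of this sketch).
    """
    if not break_points:
        raise ValueError("at least one break point is required")
    if freq <= break_points[0][0]:
        return break_points[0][1]
    if freq >= break_points[-1][0]:
        return break_points[-1][1]
    for (f0, a0), (f1, a1) in zip(break_points, break_points[1:]):
        if f0 <= freq <= f1:
            t = (freq - f0) / (f1 - f0)
            return a0 + t * (a1 - a0)
    raise ValueError("break points must be sorted by frequency")


def stretch_envelope(break_points, factor):
    """One possible modification: scale the envelope along the frequency axis."""
    return [(f * factor, a) for f, a in break_points]


def generate_components(break_points, freqs):
    """Select points along the envelope and emit new (frequency, amplitude) pairs."""
    return [(f, envelope_at(break_points, f)) for f in freqs]
```

For example, stretching the shape `[(100.0, 1.0), (200.0, 0.5), (300.0, 0.25)]` by a factor of 2 and sampling it at 200, 300, and 400 Hz gives new components whose amplitudes follow the modified envelope instead of the original one.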
-
-
31. An apparatus for converting an input voice signal into an output voice signal dependently on a predetermined pitch of the output voice signal, the apparatus comprising:
-
an input device that provides the input voice signal containing wave components;
a separating device that separates sinusoidal ones of the wave components from the input voice signal such that each sinusoidal wave component is identified by a pair of a frequency and an amplitude;
a computing device that computes a modification amount of at least one of the frequency and the amplitude of the separated sinusoidal wave components according to the predetermined pitch of the output voice signal;
a modifying device that modifies at least one of the frequency and the amplitude of the separated sinusoidal wave components by the computed modification amount to thereby form new sinusoidal wave components; and
an output device that produces the output voice signal based on the new sinusoidal wave components.
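For claim 31, a simple reading of "a modification amount according to the predetermined pitch" is a frequency ratio applied to every component. The sketch below assumes exactly that; the claim also allows amplitude modification, which is left out here, and the function names and the explicit `original_pitch_hz` argument are hypothetical.

```python
def modification_ratio(original_pitch_hz, output_pitch_hz):
    """Compute the modification amount as a ratio between the two pitches."""
    if original_pitch_hz <= 0 or output_pitch_hz <= 0:
        raise ValueError("pitches must be positive")
    return output_pitch_hz / original_pitch_hz


def shift_components(components, ratio):
    """Scale the frequency of each (frequency, amplitude) pair by ratio.

    Amplitudes are left unchanged in this sketch.
    """
    return [(f * ratio, a) for f, a in components]
```

Moving a voice from 220 Hz to a predetermined 330 Hz gives a ratio of 1.5, so a component at 440 Hz would move to 660 Hz.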
-
-
32. An apparatus for discriminating between a voiced state and an unvoiced state at each frame of a voice signal having a waveform oscillating around a zero level with a variable energy, the apparatus comprising:
-
a zero-cross detecting device that detects a zero-cross point at which the waveform of the voice signal crosses the zero level and that counts a number of the zero-cross points detected within each frame;
an energy detecting device that detects the energy of the voice signal per each frame; and
an analyzing device operative at each frame to determine that the voice signal is placed in the unvoiced state, when the counted number of the zero-cross points is equal to or greater than a lower zero-cross threshold and is smaller than an upper zero-cross threshold, and when the detected energy of the voice signal is equal to or greater than a lower energy threshold and is smaller than an upper energy threshold. - View Dependent Claims (33, 34)
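The test recited in claim 32 can be sketched directly, since the claim states the comparison rule: both the zero-cross count and the energy must fall in a half-open range (lower bound inclusive, upper bound exclusive). The mean-square definition of energy, the treatment of a zero sample as non-negative, and the function names are assumptions of this example; the threshold values are left to the caller.

```python
def count_zero_crossings(frame):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def frame_energy(frame):
    """Mean-square energy of a frame (one possible energy measure)."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)


def is_unvoiced(frame, zc_low, zc_high, energy_low, energy_high):
    """Return True when both features fall inside their threshold ranges."""
    zc = count_zero_crossings(frame)
    energy = frame_energy(frame)
    return zc_low <= zc < zc_high and energy_low <= energy < energy_high
```

A frame that alternates sign on every sample has many zero crossings and would be flagged as unvoiced when its energy also lies in range, whereas a frame with no sign changes would not.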
-
-
35. An apparatus for discriminating between a voiced state and an unvoiced state at each frame of a voice signal, the apparatus comprising:
-
a wave detecting device that processes each frame of the voice signal to detect therefrom a plurality of sinusoidal wave components, each of which is identified by a pair of a frequency and an amplitude;
a separating device that separates the detected sinusoidal wave components into a higher frequency group and a lower frequency group at each frame by comparing the frequency of each sinusoidal wave component with a predetermined reference frequency; and
an analyzing device operative at each frame to determine whether the voice signal is placed in the voiced state or the unvoiced state based on an amplitude related to at least one sinusoidal wave component belonging to the higher frequency group. - View Dependent Claims (36, 37)
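Claim 35 fixes the grouping step but leaves the decision rule open, saying only that it is "based on an amplitude related to" the higher frequency group. The sketch below therefore assumes one plausible rule: a frame is treated as unvoiced when the mean amplitude of the higher group reaches a threshold. That rule, the threshold, and the function names are illustrative assumptions, not the patent's stated method.

```python
def split_by_frequency(components, reference_hz):
    """Separate (frequency, amplitude) pairs into lower and higher groups."""
    lower = [(f, a) for f, a in components if f < reference_hz]
    higher = [(f, a) for f, a in components if f >= reference_hz]
    return lower, higher


def classify_frame(components, reference_hz, amplitude_threshold):
    """Label a frame 'voiced' or 'unvoiced' from its higher frequency group.

    Assumption: strong high-frequency content suggests an unvoiced frame.
    A frame with no higher-group components is labeled voiced.
    """
    _, higher = split_by_frequency(components, reference_hz)
    if not higher:
        return "voiced"
    mean_amplitude = sum(a for _, a in higher) / len(higher)
    return "unvoiced" if mean_amplitude >= amplitude_threshold else "voiced"
```

In a two-stage arrangement such as the one in claim 38, a check like this would run only on frames that the zero-cross and energy test did not already mark as unvoiced.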
-
-
38. An apparatus for discriminating between a voiced state and an unvoiced state at each frame of a voice signal having a waveform composed of sinusoidal wave components and oscillating around a zero level with a variable energy, the apparatus comprising:
-
a zero-cross detecting device that detects a zero-cross point at which the waveform of the voice signal crosses the zero level and that counts a number of the zero-cross points detected within each frame;
an energy detecting device that detects the energy of the voice signal per each frame;
a first analyzing device operative at each frame to determine that the voice signal is placed in the unvoiced state, when the counted number of the zero-cross points is equal to or greater than a lower zero-cross threshold and is smaller than an upper zero-cross threshold, and when the detected energy of the voice signal is equal to or greater than a lower energy threshold and is smaller than an upper energy threshold;
a wave detecting device that processes each frame of the voice signal to detect therefrom a plurality of sinusoidal wave components, each of which is identified by a pair of a frequency and an amplitude;
a separating device that separates the detected sinusoidal wave components into a higher frequency group and a lower frequency group at each frame by comparing the frequency of each sinusoidal wave component with a predetermined reference frequency; and
a second analyzing device operative at each frame when the first analyzing device does not determine that the voice signal is placed in the unvoiced state for determining whether the voice signal is placed in the voiced state or the unvoiced state based on an amplitude related to at least one sinusoidal wave component belonging to the higher frequency group. - View Dependent Claims (39)
-
-
40. A method of converting an input voice signal into an output voice signal according to a target voice signal, the method comprising the steps of:
-
providing the input voice signal composed of an original sinusoidal component and an original residual component other than the original sinusoidal component;
extracting original attribute data from at least the sinusoidal component of the input voice signal, the original attribute data being characteristic of the input voice signal;
synthesizing new attribute data based on both of the original attribute data derived from the input voice signal and target attribute data being characteristic of the target voice signal composed of a target sinusoidal component and a target residual component other than the sinusoidal component, the target attribute data being derived from at least the target sinusoidal component; and
producing the output voice signal based on the new attribute data and either of the original residual component and the target residual component.
-
-
41. A method of converting an input voice signal into an output voice signal according to a target voice signal, the method comprising the steps of:
-
providing the input voice signal composed of original sinusoidal components and original residual components other than the original sinusoidal components;
separating the original sinusoidal components and the original residual components from each other;
modifying the original sinusoidal components based on target sinusoidal components contained in the target voice signal so as to form new sinusoidal components having a first pitch;
modifying the original residual components based on target residual components contained in the target voice signal other than the target sinusoidal components so as to form new residual components having a second pitch;
shaping the new residual components by removing therefrom a fundamental tone corresponding to the second pitch and overtones of the fundamental tone; and
combining the new sinusoidal components and the shaped new residual components with each other so as to produce the output voice signal having the first pitch. - View Dependent Claims (42)
-
-
43. A method of converting an input voice signal into an output voice signal according to a target voice signal, the method comprising the steps of:
-
providing the input voice signal composed of original sinusoidal components and original residual components other than the original sinusoidal components;
separating the original sinusoidal components and the original residual components from each other;
modifying the original sinusoidal components based on target sinusoidal components contained in the target voice signal so as to form new sinusoidal components;
modifying the original residual components based on target residual components contained in the target voice signal other than the target sinusoidal components so as to form new residual components;
shaping the new residual components by introducing thereinto a fundamental tone and overtones of the fundamental tone corresponding to a desired pitch; and
combining the new sinusoidal components and the shaped new residual components with each other so as to produce the output voice signal. - View Dependent Claims (44)
-
-
45. A method of converting an input voice signal into an output voice signal by modifying a spectral shape, the method comprising the steps of:
-
providing the input voice signal containing wave components;
separating sinusoidal ones of the wave components from the input voice signal such that each sinusoidal wave component is identified by a pair of a frequency and an amplitude;
computing a spectral shape of the input voice signal based on a set of the separated sinusoidal wave components such that the spectral shape represents an envelope having a series of break points corresponding to the pairs of the frequencies and the amplitudes of the sinusoidal wave components;
modifying the spectral shape to form a new spectral shape having a modified envelope;
selecting a series of points along the modified envelope of the new spectral shape;
generating a set of new sinusoidal wave components each identified by each pair of a frequency and an amplitude, which corresponds to each of the series of the selected points; and
producing the output voice signal based on the set of the new sinusoidal wave components. - View Dependent Claims (46)
-
-
47. A method of converting an input voice signal into an output voice signal dependently on a predetermined pitch of the output voice signal, the method comprising the steps of:
-
providing the input voice signal containing wave components;
separating sinusoidal ones of the wave components from the input voice signal such that each sinusoidal wave component is identified by a pair of a frequency and an amplitude;
computing a modification amount of at least one of the frequency and the amplitude of the separated sinusoidal wave components according to the predetermined pitch of the output voice signal;
modifying at least one of the frequency and the amplitude of the separated sinusoidal wave components by the computed modification amount to thereby form new sinusoidal wave components; and
producing the output voice signal based on the new sinusoidal wave components.
-
-
48. A method of discriminating between a voiced state and an unvoiced state at each frame of a voice signal having a waveform oscillating around a zero level with a variable energy, the method comprising the steps of:
-
detecting a zero-cross point at which the waveform of the voice signal crosses the zero level so as to count a number of the zero-cross points detected within each frame;
detecting the energy of the voice signal per each frame; and
determining at each frame that the voice signal is placed in the unvoiced state, when the counted number of the zero-cross points is equal to or greater than a lower zero-cross threshold and is smaller than an upper zero-cross threshold, and when the detected energy of the voice signal is equal to or greater than a lower energy threshold and is smaller than an upper energy threshold.
-
-
49. A method of discriminating between a voiced state and an unvoiced state at each frame of a voice signal, the method comprising the steps of:
-
processing each frame of the voice signal to detect therefrom a plurality of sinusoidal wave components, each of which is identified by a pair of a frequency and an amplitude;
separating the detected sinusoidal wave components into a higher frequency group and a lower frequency group at each frame by comparing the frequency of each sinusoidal wave component with a predetermined reference frequency; and
determining at each frame whether the voice signal is placed in the voiced state or the unvoiced state based on an amplitude related to at least one sinusoidal wave component belonging to the higher frequency group.
-
-
50. A machine readable medium used in a computer machine having a CPU, the medium containing program instructions executable by the CPU to cause the computer machine to perform a process of converting an input voice signal into an output voice signal according to a target voice signal, the process comprising the steps of:
-
providing the input voice signal composed of an original sinusoidal component and an original residual component other than the original sinusoidal component;
extracting original attribute data from at least the sinusoidal component of the input voice signal, the original attribute data being characteristic of the input voice signal;
synthesizing new attribute data based on both of the original attribute data derived from the input voice signal and target attribute data being characteristic of the target voice signal composed of a target sinusoidal component and a target residual component other than the sinusoidal component, the target attribute data being derived from at least the target sinusoidal component; and
producing the output voice signal based on the new attribute data and either of the original residual component and the target residual component.
-
-
51. A machine readable medium used in a computer machine having a CPU, the medium containing program instructions executable by the CPU to cause the computer machine to perform a process of converting an input voice signal into an output voice signal according to a target voice signal, the process comprising the steps of:
-
providing the input voice signal composed of original sinusoidal components and original residual components other than the original sinusoidal components;
separating the original sinusoidal components and the original residual components from each other;
modifying the original sinusoidal components based on target sinusoidal components contained in the target voice signal so as to form new sinusoidal components having a first pitch;
modifying the original residual components based on target residual components contained in the target voice signal other than the target sinusoidal components so as to form new residual components having a second pitch;
shaping the new residual components by removing therefrom a fundamental tone corresponding to the second pitch and overtones of the fundamental tone; and
combining the new sinusoidal components and the shaped new residual components with each other so as to produce the output voice signal having the first pitch. - View Dependent Claims (52)
-
-
53. A machine readable medium used in a computer machine having a CPU, the medium containing program instructions executable by the CPU to cause the computer machine to perform a process of converting an input voice signal into an output voice signal according to a target voice signal, the process comprising the steps of:
-
providing the input voice signal composed of original sinusoidal components and original residual components other than the original sinusoidal components;
separating the original sinusoidal components and the original residual components from each other;
modifying the original sinusoidal components based on target sinusoidal components contained in the target voice signal so as to form new sinusoidal components;
modifying the original residual components based on target residual components contained in the target voice signal other than the target sinusoidal components so as to form new residual components;
shaping the new residual components by introducing thereinto a fundamental tone and overtones of the fundamental tone corresponding to a desired pitch; and
combining the new sinusoidal components and the shaped new residual components with each other so as to produce the output voice signal. - View Dependent Claims (54)
-
-
55. A machine readable medium used in a computer machine having a CPU, the medium containing program instructions executable by the CPU to cause the computer machine to perform a process of converting an input voice signal into an output voice signal by modifying a spectral shape, the process comprising the steps of:
-
providing the input voice signal containing wave components;
separating sinusoidal ones of the wave components from the input voice signal such that each sinusoidal wave component is identified by a pair of a frequency and an amplitude;
computing a spectral shape of the input voice signal based on a set of the separated sinusoidal wave components such that the spectral shape represents an envelope having a series of break points corresponding to the pairs of the frequencies and the amplitudes of the sinusoidal wave components;
modifying the spectral shape to form a new spectral shape having a modified envelope;
selecting a series of points along the modified envelope of the new spectral shape;
generating a set of new sinusoidal wave components each identified by each pair of a frequency and an amplitude, which corresponds to each of the series of the selected points; and
producing the output voice signal based on the set of the new sinusoidal wave components. - View Dependent Claims (56)
-
-
57. A machine readable medium used in a computer machine having a CPU, the medium containing program instructions executable by the CPU to cause the computer machine to perform a process of converting an input voice signal into an output voice signal dependently on a predetermined pitch of the output voice signal, the process comprising the steps of:
-
providing the input voice signal containing wave components;
separating sinusoidal ones of the wave components from the input voice signal such that each sinusoidal wave component is identified by a pair of a frequency and an amplitude;
computing a modification amount of at least one of the frequency and the amplitude of the separated sinusoidal wave components according to the predetermined pitch of the output voice signal;
modifying at least one of the frequency and the amplitude of the separated sinusoidal wave components by the computed modification amount to thereby form new sinusoidal wave components; and
producing the output voice signal based on the new sinusoidal wave components.
-
-
58. A machine readable medium used in a computer machine having a CPU, the medium containing program instructions executable by the CPU to cause the computer machine to perform a process of discriminating between a voiced state and an unvoiced state at each frame of a voice signal having a waveform oscillating around a zero level with a variable energy, the process comprising the steps of:
-
detecting a zero-cross point at which the waveform of the voice signal crosses the zero level so as to count a number of the zero-cross points detected within each frame;
detecting the energy of the voice signal per each frame; and
determining at each frame that the voice signal is placed in the unvoiced state, when the counted number of the zero-cross points is equal to or greater than a lower zero-cross threshold and is smaller than an upper zero-cross threshold, and when the detected energy of the voice signal is equal to or greater than a lower energy threshold and is smaller than an upper energy threshold.
-
-
59. A machine readable medium used in a computer machine having a CPU, the medium containing program instructions executable by the CPU to cause the computer machine to perform a process of discriminating between a voiced state and an unvoiced state at each frame of a voice signal, the process comprising the steps of:
-
processing each frame of the voice signal to detect therefrom a plurality of sinusoidal wave components, each of which is identified by a pair of a frequency and an amplitude;
separating the detected sinusoidal wave components into a higher frequency group and a lower frequency group at each frame by comparing the frequency of each sinusoidal wave component with a predetermined reference frequency; and
determining at each frame whether the voice signal is placed in the voiced state or the unvoiced state based on an amplitude related to at least one sinusoidal wave component belonging to the higher frequency group.
-
Specification