Singing voice synthesizing method
Abstract
A frequency spectrum is detected by analyzing a frequency of a voice waveform corresponding to a voice synthesis unit formed of a phoneme or a phonemic chain. Local peaks are detected on the frequency spectrum, and spectrum distribution regions including the local peaks are designated. For each spectrum distribution region, amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis and phase spectrum data representing a phase spectrum distribution depending on the frequency axis are generated. The amplitude spectrum data is adjusted to move the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis based on an input note pitch, and the phase spectrum data is adjusted to correspond to that adjustment. Spectrum intensities are adjusted to follow a spectrum envelope corresponding to a desired tone color. The adjusted amplitude and phase spectrum data are converted into a synthesized voice signal.
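The tone-color step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each spectrum distribution region is held as a dictionary with hypothetical keys `lo`, `peak`, `hi`, `amp` and `phase`, that the desired envelope is supplied as a function of the bin index, and that a region is scaled uniformly so that its peak lands on the envelope. None of these choices is stated in the text; they are illustrative only.

```python
def follow_envelope(regions, envelope):
    """Scale every region so that its peak intensity equals envelope(peak bin).

    regions  : list of dicts with keys lo, peak, hi, amp, phase (hypothetical layout)
    envelope : callable mapping a bin index to the desired intensity (assumption)
    """
    adjusted = []
    for r in regions:
        peak_amp = r["amp"][r["peak"] - r["lo"]]
        # Uniform gain keeps the local shape around the peak unchanged.
        gain = envelope(r["peak"]) / peak_amp if peak_amp > 0 else 0.0
        adjusted.append({**r, "amp": [a * gain for a in r["amp"]]})
    return adjusted
```

Because the gain is applied per region, the shape of the spectrum around each local peak is preserved while the overall outline follows the requested envelope.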
17 Claims
1. A singing voice synthesizing method, comprising the steps of:
(a) detecting a frequency spectrum by analyzing a frequency of a voice waveform corresponding to a voice synthesis unit of a voice to be synthesized;
(b) detecting a plurality of local peaks of a spectrum intensity on the frequency spectrum;
(c) designating, for each of the plurality of the local peaks, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generating amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
(d) generating phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
(e) designating a pitch for the voice to be synthesized;
(f) adjusting, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
(g) adjusting, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
(h) converting the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
(Dependent claims: 3, 4, 5, 6, 7, 8)
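Steps (e) through (h) can be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: the pitch is expressed as a hypothetical `ratio` of target pitch to original pitch, each region is moved by a whole number of bins, the stronger value wins where moved regions overlap, and the phase of a moved region is advanced by 2π·Δf·t (a rule common in peak-based phase vocoders, assumed here because the claim does not fix one). The region layout (`lo`, `peak`, `hi`, `amp`, `phase`) is also hypothetical.

```python
import cmath
import math

def shift_regions(regions, n_bins, ratio, frame_time, bin_hz):
    """Steps (f) and (g): move each region along the frequency axis, adjust its phase."""
    amp = [0.0] * n_bins
    phase = [0.0] * n_bins
    for r in regions:
        shift = round(r["peak"] * ratio) - r["peak"]          # whole-bin move (assumption)
        dphi = 2 * math.pi * shift * bin_hz * frame_time      # assumed phase rule
        for i, (a, p) in enumerate(zip(r["amp"], r["phase"])):
            k = r["lo"] + i + shift
            if 0 <= k < n_bins and a > amp[k]:                # stronger value wins on overlap
                amp[k] = a
                phase[k] = p + dphi
    return amp, phase

def to_time(amp, phase):
    """Step (h): naive inverse DFT from bins 0..N/2 back to N real samples."""
    n = 2 * (len(amp) - 1)
    spec = [a * cmath.exp(1j * p) for a, p in zip(amp, phase)]
    samples = []
    for t in range(n):
        s = spec[0].real + spec[-1].real * math.cos(math.pi * t)
        for k in range(1, len(spec) - 1):
            s += 2 * (spec[k] * cmath.exp(2j * math.pi * k * t / n)).real
        samples.append(s / n)
    return samples
```

The naive inverse transform keeps the example dependency-free; a practical system would use an FFT library and overlap-add successive frames.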
2. A singing voice synthesizing method, comprising the steps of:
(a) obtaining amplitude spectrum data and phase spectrum data corresponding to a voice synthesis unit of a voice to be synthesized, wherein the amplitude spectrum data is data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region for each of a plurality of local peaks of a spectrum intensity including the local peak and spectrums therebefore and thereafter in a frequency spectrum obtained by a frequency analysis of a voice waveform of the voice synthesis unit, and the phase spectrum data is data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
(b) designating a pitch for the voice to be synthesized;
(c) adjusting, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
(d) adjusting, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
(e) converting the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
9. A singing voice synthesizing apparatus, comprising:
a designating device that designates a voice synthesis unit and a pitch for a voice to be synthesized;
a reading device that reads voice waveform data representing a waveform corresponding to the voice synthesis unit as voice synthesis unit data from a voice synthesis unit database;
a first detecting device that detects a frequency spectrum by analyzing a frequency of the voice waveform represented by the voice waveform data;
a second detecting device that detects a plurality of local peaks of a spectrum intensity on the frequency spectrum;
a first generating device that designates, for each of the plurality of the local peaks, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generates amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
a second generating device that generates phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
a first adjusting device that adjusts, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
a second adjusting device that adjusts, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
a converting device that converts the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
(Dependent claims: 11, 12)
10. A singing voice synthesizing apparatus, comprising:
a designating device that designates a voice synthesis unit and a pitch for a voice to be synthesized;
a reading device that reads amplitude spectrum data and phase spectrum data corresponding to the voice synthesis unit as voice synthesis unit data from a voice synthesis unit database, wherein the amplitude spectrum data is data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region for each of a plurality of local peaks of a spectrum intensity including the local peak and spectrums therebefore and thereafter in a frequency spectrum obtained by a frequency analysis of a voice waveform of the voice synthesis unit, and the phase spectrum data is data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
a first adjusting device that adjusts, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
a second adjusting device that adjusts, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
a converting device that converts the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
13. A singing voice synthesizing apparatus, comprising:
a designating device that designates a voice synthesis unit and a pitch for each of the voices to be sequentially synthesized;
a reading device that reads voice waveform data corresponding to each voice synthesis unit designated by the designating device from a voice synthesis unit database;
a first detecting device that detects a frequency spectrum by analyzing a frequency of the voice waveform corresponding to each voice waveform;
a second detecting device that detects a plurality of local peaks of a spectrum intensity on the frequency spectrum corresponding to each said voice waveform;
a first generating device that designates, for each of the plurality of the local peaks for each said voice synthesis unit, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generates amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
a second generating device that generates phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region of each said voice synthesis unit;
a first adjusting device that adjusts, for each said spectrum distribution region of each said voice synthesis unit, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
a second adjusting device that adjusts, for each said spectrum distribution region of each said voice synthesis unit, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data;
a first connecting device that connects the adjusted amplitude spectrum data to connect sequential voice synthesis units respectively corresponding to the voices to be sequentially synthesized in a pronunciation order, wherein the spectrum intensities are adjusted to be agreed or approximately agreed with each other at connection points of the sequential voice synthesis units;
a second connecting device that connects the adjusted phase spectrum data to connect the sequential voice synthesis units respectively corresponding to the voices to be sequentially synthesized in a pronunciation order, wherein the phases are adjusted to be agreed or approximately agreed with each other at connection points of the sequential voice synthesis units; and
a converting device that converts the connected amplitude spectrum data and the connected phase spectrum data into a synthesized voice signal of a time region.
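The two connecting devices can be illustrated with one small routine. The sketch below assumes each voice synthesis unit is a list of frames, each frame a list of per-bin values (intensities or phases), and that the mismatch at the connection point is split evenly between the two units and faded out linearly over a hypothetical `span` of frames. The claim only requires that the values agree or approximately agree at the connection point; the linear fade is an assumption.

```python
import math

def wrap(phi):
    """Wrap an angle into the range [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))

def connect(unit_a, unit_b, span=3, is_phase=False):
    """Concatenate two units so that their values meet at the connection point.

    unit_a, unit_b : lists of frames, each frame a list of per-bin values
    span           : number of frames on each side that absorb the correction
    is_phase       : treat values as angles and correct along the shortest arc
    """
    a = [frame[:] for frame in unit_a]   # copy so the inputs are not modified
    b = [frame[:] for frame in unit_b]
    for k in range(len(a[-1])):
        gap = b[0][k] - a[-1][k]
        if is_phase:
            gap = wrap(gap)
        for i in range(min(span, len(a))):
            a[-1 - i][k] += 0.5 * (1 - i / span) * gap   # pull unit A toward the midpoint
        for i in range(min(span, len(b))):
            b[i][k] -= 0.5 * (1 - i / span) * gap        # pull unit B toward the midpoint
    return a + b
```

For example, `connect([[1.0]] * 4, [[3.0]] * 4, span=2)` yields frames whose single bin runs 1, 1, 1.5, 2, 2, 2.5, 3, 3, so the two units meet at 2 without a step. With `is_phase=True` the two sides agree modulo 2π.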
14. A singing voice synthesizing apparatus, comprising:
a designating device that designates a voice synthesis unit and a pitch for each voice to be sequentially synthesized;
a reading device that reads voice waveform data corresponding to each voice synthesis unit designated by the designating device from a voice synthesis unit database, wherein the amplitude spectrum data is data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region for each of a plurality of local peaks of a spectrum intensity including the local peak and spectrums therebefore and thereafter in a frequency spectrum obtained by a frequency analysis of a voice waveform of each said voice synthesis unit, and the phase spectrum data is data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
a first adjusting device that adjusts, for each said spectrum distribution region of each said voice synthesis unit, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
a second adjusting device that adjusts, for each said spectrum distribution region of each said voice synthesis unit, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data;
a first connecting device that connects the adjusted amplitude spectrum data to connect sequential voice synthesis units respectively corresponding to the voices to be sequentially synthesized in a pronunciation order, wherein the spectrum intensities are adjusted to be agreed or approximately agreed with each other at connection points of the sequential voice synthesis units;
a second connecting device that connects the adjusted phase spectrum data to connect the sequential voice synthesis units respectively corresponding to the voices to be sequentially synthesized in a pronunciation order, wherein the phases are adjusted to be agreed or approximately agreed with each other at connection points of the sequential voice synthesis units; and
a converting device that converts the connected amplitude spectrum data and the connected phase spectrum data into a synthesized voice signal of a time region.
15. A storage medium storing a program for a singing voice synthesizing apparatus, the program when executed causes a computer to:
(a) detect a frequency spectrum by analyzing a frequency of a voice waveform corresponding to a voice synthesis unit of a voice to be synthesized;
(b) detect a plurality of local peaks of a spectrum intensity on the frequency spectrum;
(c) designate, for each of the plurality of the local peaks, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generate amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
(d) generate phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
(e) designate a pitch for the voice to be synthesized;
(f) adjust, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
(g) adjust, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
(h) convert the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
16. A storage medium storing a program for a singing voice synthesizing apparatus, the program when executed causes a computer to:
(a) obtain amplitude spectrum data and phase spectrum data corresponding to a voice synthesis unit of a voice to be synthesized, wherein the amplitude spectrum data is data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region for each of a plurality of local peaks of a spectrum intensity including the local peak and spectrums therebefore and thereafter in a frequency spectrum obtained by a frequency analysis of a voice waveform of the voice synthesis unit, and the phase spectrum data is data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
(b) designate a pitch for the voice to be synthesized;
(c) adjust, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
(d) adjust, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
(e) convert the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
17. A singing voice synthesizing apparatus, comprising:
a reading device that reads voice waveform data representing a waveform corresponding to a voice synthesis unit as voice synthesis unit data from a voice synthesis unit database;
a first detecting device that detects a frequency spectrum by analyzing a frequency of the voice waveform represented by the voice waveform data;
a second detecting device that detects a plurality of local peaks of a spectrum intensity on the frequency spectrum;
a first generating device that designates, for each of the plurality of the local peaks, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generates amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
a second generating device that generates phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region; and
a database for storing the amplitude spectrum data and the phase spectrum data corresponding to the voice synthesis unit of the voice to be synthesized.
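The analysis chain of this claim (frequency spectrum, local peaks, spectrum distribution regions, amplitude and phase data, storage) can be sketched as below. Several details are assumptions made for illustration: a naive DFT stands in for whatever frequency analysis is actually used, region boundaries are placed halfway between adjacent peaks, peaks below a small relative floor are ignored, and the database is a plain dictionary keyed by a hypothetical unit name.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) DFT returning bins 0..N/2 (an FFT library would be used in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def find_local_peaks(mag, rel_floor=1e-6):
    """Bins whose intensity exceeds the previous bin, is not below the next, and clears a floor."""
    floor = rel_floor * max(mag)
    return [k for k in range(1, len(mag) - 1)
            if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1] and mag[k] > floor]

def designate_regions(peaks, n_bins):
    """Assumed rule: each region runs to the midpoint between neighbouring peaks."""
    regions = []
    for i, p in enumerate(peaks):
        lo = 0 if i == 0 else (peaks[i - 1] + p) // 2 + 1
        hi = n_bins - 1 if i == len(peaks) - 1 else (p + peaks[i + 1]) // 2
        regions.append((lo, p, hi))
    return regions

def analyze(frame):
    """Return amplitude and phase spectrum data for every spectrum distribution region."""
    spec = dft(frame)
    mag = [abs(c) for c in spec]
    ph = [cmath.phase(c) for c in spec]
    return [{"lo": lo, "peak": p, "hi": hi,
             "amp": mag[lo:hi + 1], "phase": ph[lo:hi + 1]}
            for lo, p, hi in designate_regions(find_local_peaks(mag), len(spec))]

# Hypothetical database: unit name -> list of per-frame region data.
database = {}

def store_unit(name, frames):
    database[name] = [analyze(frame) for frame in frames]
```

Storing the per-region data ahead of time is what lets a synthesizer read amplitude and phase spectrum data directly instead of re-analyzing the waveform for every note.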