Singing voice synthesizing method

US 6,992,245 B2
Filed: 02/27/2003
Issued: 01/31/2006
Est. Priority Date: 02/27/2002
Status: Active Grant

Abstract

A frequency spectrum is detected by analyzing a frequency of a voice waveform corresponding to a voice synthesis unit formed of a phoneme or a phonemic chain. Local peaks are detected on the frequency spectrum, and spectrum distribution regions including the local peaks are designated. For each spectrum distribution region, amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis and phase spectrum data representing a phase spectrum distribution depending on the frequency axis are generated. The amplitude spectrum data is adjusted to move the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis based on an input note pitch, and the phase spectrum data is adjusted corresponding to the adjustment. Spectrum intensities are adjusted to be along with a spectrum envelope corresponding to a desired tone color. The adjusted amplitude and phase spectrum data are converted into a synthesized voice signal.
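The analysis stage described above (frequency analysis, local peak detection, designation of one spectrum distribution region per peak) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the naive DFT, the `min_level` threshold, and the choice of the midpoint between adjacent peaks as the region boundary are all assumptions made for the example.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of bins 0..n//2 of a real frame, using a naive O(n^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def local_peaks(mags, min_level=0.01):
    """Bins strictly larger than both neighbours.

    min_level (hypothetical parameter) discards peaks below a fraction of the
    largest magnitude, so that numerical noise is not reported as a peak.
    """
    floor = min_level * max(mags)
    return [
        k for k in range(1, len(mags) - 1)
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]
    ]


def peak_regions(peaks, n_bins):
    """One half-open (start, end) bin range per peak.

    Assumption: boundaries sit at the midpoint between adjacent peaks, so each
    region holds its peak plus the bins before and after it.
    """
    if not peaks:
        return []
    bounds = [0] + [(a + b) // 2 for a, b in zip(peaks, peaks[1:])] + [n_bins]
    return list(zip(bounds, bounds[1:]))


# Example frame: two sinusoids that fall exactly on bins 4 and 10 of a 64-point frame.
frame = [math.sin(2 * math.pi * 4 * t / 64) + 0.5 * math.sin(2 * math.pi * 10 * t / 64)
         for t in range(64)]
mags = dft_magnitudes(frame)
peaks = local_peaks(mags)
regions = peak_regions(peaks, len(mags))
```

A practical analyzer would normally apply a window function and use an FFT; both are left out here to keep the sketch short.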

17 Claims

1. A singing voice synthesizing method, comprising the steps of:
- (a) detecting a frequency spectrum by analyzing a frequency of a voice waveform corresponding to a voice synthesis unit of a voice to be synthesized;
  
  (b) detecting a plurality of local peaks of a spectrum intensity on the frequency spectrum;
  
  (c) designating, for each of the plurality of the local peaks, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generating amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
  
  (d) generating phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
  
  (e) designating a pitch for the voice to be synthesized;
  
  (f) adjusting, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
  
  (g) adjusting, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
  
  (h) converting the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
  - 3. A singing voice synthesizing method according to claim 1, wherein the pitch designating step (e) designates the pitch in accordance with pitch throb data representing a variation of the pitch in a time sequence.
  - 4. A singing voice synthesizing method according to claim 3, wherein the pitch throb data corresponds to a control parameter for controlling a musical expression of the voice to be synthesized.
  - 5. A singing voice synthesizing method according to claim 1, wherein the amplitude spectrum data adjusting step (f) adjusts the spectrum intensity of the local peak that is not along with a spectrum envelope corresponding to a line connecting each of the plurality of the local peaks before the adjustment to be along with the spectrum envelope.
  - 6. A singing voice synthesizing method according to claim 1, wherein the amplitude spectrum data adjusting step (f) adjusts intensity of the local peak that is not along with a predetermined spectrum envelope to be along with the predetermined spectrum envelope.
  - 7. A singing voice synthesizing method according to claim 5, wherein the amplitude spectrum data adjusting step (f) sets the spectrum envelope that varies in a time sequence by adjusting the intensity in accordance with spectrum envelope throb data representing a variation of the spectrum envelope for a time sequence for sequential time frames.
  - 8. A singing voice synthesizing method according to claim 7, wherein the spectrum envelope throb data corresponds to a control parameter for controlling a musical expression of the voice to be synthesized.
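The envelope adjustment of claim 5 can be illustrated with a short sketch. It assumes the spectrum envelope is a piecewise-linear line through the (frequency, amplitude) points of the local peaks before adjustment, that amplitudes are linear rather than in decibels, and that the envelope is held flat outside the first and last peak. The names `envelope_at` and `gain_to_envelope` are hypothetical.

```python
def envelope_at(freq, peaks):
    """Envelope value at freq, linearly interpolated through the peak points.

    peaks: list of (frequency_hz, amplitude) pairs sorted by frequency.
    Outside the covered range the nearest peak amplitude is used (assumption).
    """
    if freq <= peaks[0][0]:
        return peaks[0][1]
    for (f0, a0), (f1, a1) in zip(peaks, peaks[1:]):
        if freq <= f1:
            return a0 + (a1 - a0) * (freq - f0) / (f1 - f0)
    return peaks[-1][1]


def gain_to_envelope(shifted_freq, peak_amp, peaks):
    """Factor that scales a moved region so its peak lies on the envelope again."""
    return envelope_at(shifted_freq, peaks) / peak_amp


# Peaks before adjustment (made-up values for illustration).
original_peaks = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.25)]

# A region whose peak (amplitude 1.0) was moved from 100 Hz to 150 Hz.
gain = gain_to_envelope(150.0, 1.0, original_peaks)
```

Multiplying every amplitude in the moved region by `gain` keeps the tone color described by the envelope while the pitch changes. Replacing `original_peaks` with a predetermined envelope gives the variant of claim 6.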

2. A singing voice synthesizing method, comprising the steps of:
- (a) obtaining amplitude spectrum data and phase spectrum data corresponding to a voice synthesis unit of a voice to be synthesized, wherein the amplitude spectrum data is data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region for each of a plurality of local peaks of a spectrum intensity including the local peak and spectrums therebefore and thereafter in a frequency spectrum obtained by a frequency analysis of a voice waveform of the voice synthesis unit, and the phase spectrum data is data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
  
  (b) designating a pitch for the voice to be synthesized;
  
  (c) adjusting, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
  
  (d) adjusting, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
  
  (e) converting the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
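Steps (c) and (d) can be sketched as moving each region by a whole number of bins and adding a phase offset to the moved bins. The claims do not give a phase formula, so the offset used here (2π × frequency shift × hop time × frame index, a common phase-vocoder-style choice) is an assumption, as are all function and parameter names.

```python
import math


def wrap(phi):
    """Wrap a phase to the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def bin_shift(peak_bin, pitch_ratio):
    """Whole-bin displacement that moves a peak to pitch_ratio times its frequency."""
    return round(peak_bin * pitch_ratio) - peak_bin


def phase_offset(delta_freq_hz, hop_seconds, frame_index):
    """Assumed phase correction for a region moved by delta_freq_hz."""
    return 2 * math.pi * delta_freq_hz * hop_seconds * frame_index


def move_region(amps, phases, region, shift, offset, out_amps, out_phases):
    """Copy one region into the output arrays, displaced by `shift` bins.

    Bins that would land outside the spectrum are dropped. Where moved regions
    overlap, the later write wins; bins no region reaches stay at zero.
    """
    lo, hi = region
    for k in range(lo, hi):
        j = k + shift
        if 0 <= j < len(out_amps):
            out_amps[j] = amps[k]
            out_phases[j] = wrap(phases[k] + offset)


amps = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
phases = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
out_amps = [0.0] * 8
out_phases = [0.0] * 8

# Move the region around the peak at bin 2 up by 3 bins with a 0.25 rad offset.
move_region(amps, phases, (1, 4), 3, 0.25, out_amps, out_phases)
```

Shifting whole regions keeps the shape of the spectrum around each peak, which is the point of working per region rather than resampling the whole spectrum. Fractional-bin shifts would need interpolation, which this sketch omits.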

9. A singing voice synthesizing apparatus, comprising:
- a designating device that designates a voice synthesis unit and a pitch for a voice to be synthesized;
  
  a reading device that reads voice waveform data representing a waveform corresponding to the voice synthesis unit as voice synthesis unit data from a voice synthesis unit database;
  
  a first detecting device that detects a frequency spectrum by analyzing a frequency of the voice waveform represented by the voice waveform data;
  
  a second detecting device that detects a plurality of local peaks of a spectrum intensity on the frequency spectrum;
  
  a first generating device that designates, for each of the plurality of the local peaks, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generates amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
  
  a second generating device that generates phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
  
  a first adjusting device that adjusts, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
  
  a second adjusting device that adjusts, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
  
  a converting device that converts the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
  - 11. A singing voice synthesizing apparatus according to claim 9, wherein the designating device designates a control parameter for controlling a musical expression of the voice to be synthesized, and the reading device reads voice synthesis unit data corresponding to the voice synthesis unit and the control parameter.
  - 12. A singing voice synthesizing apparatus according to claim 9, wherein the designating device designates at least one of a note length or a tempo for the voice to be synthesized, and the reading device continues to read the voice synthesis unit data for a time corresponding to at least one of the note length or the tempo by omitting a part of or repeating a part or whole of the voice synthesis unit data.
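The reading behaviour of claim 12 (omitting or repeating part of the unit data to fill a note) might look like the sketch below. The frame-based representation, the `loop_start` parameter, and the choice to omit the tail when the unit is too long are assumptions for illustration.

```python
def frames_for_note(beats, tempo_bpm, hop_seconds):
    """Number of analysis frames needed for a note of `beats` beats at `tempo_bpm`."""
    return max(1, round(beats * 60.0 / tempo_bpm / hop_seconds))


def fit_frames(frames, target_count, loop_start=0):
    """Return exactly target_count frames taken from `frames`.

    Too long: the tail is omitted. Too short: the part from loop_start to the
    end is repeated until the target length is reached.
    """
    if target_count <= len(frames):
        return list(frames[:target_count])
    loop = frames[loop_start:]
    if not loop:
        raise ValueError("loop_start leaves nothing to repeat")
    out = list(frames)
    while len(out) < target_count:
        out.extend(loop[: target_count - len(out)])
    return out


unit = ["a", "b", "c", "d"]          # stand-ins for per-frame spectrum data
shortened = fit_frames(unit, 2)
stretched = fit_frames(unit, 7, loop_start=2)
```

Here `stretched` repeats the last two frames, which corresponds to repeating a part of the unit; `loop_start=0` repeats the whole unit.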

10. A singing voice synthesizing apparatus, comprising:
- a designating device that designates a voice synthesis unit and a pitch for a voice to be synthesized;
  
  a reading device that reads amplitude spectrum data and phase spectrum data corresponding to the voice synthesis unit as voice synthesis unit data from a voice synthesis unit database, wherein the amplitude spectrum data is data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region for each of a plurality of local peaks of a spectrum intensity including the local peak and spectrums therebefore and thereafter in a frequency spectrum obtained by a frequency analysis of a voice waveform of the voice synthesis unit, and the phase spectrum data is data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
  
  a first adjusting device that adjusts, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
  
  a second adjusting device that adjusts, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
  
  a converting device that converts the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.

13. A singing voice synthesizing apparatus, comprising:
- a designating device that designates a voice synthesis unit and a pitch for each of the voices to be sequentially synthesized;
  
  a reading device that reads voice waveform data corresponding to each voice synthesis unit designated by the designating device from a voice synthesis unit database;
  
  a first detecting device that detects a frequency spectrum by analyzing a frequency of the voice waveform corresponding to each voice waveform;
  
  a second detecting device that detects a plurality of local peaks of a spectrum intensity on the frequency spectrum corresponding to each said voice waveform;
  
  a first generating device that designates, for each of the plurality of the local peaks for each said voice synthesis unit, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generates amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
  
  a second generating device that generates phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region of each said voice synthesis unit;
  
  a first adjusting device that adjusts, for each said spectrum distribution region of each said voice synthesis unit, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
  
  a second adjusting device that adjusts, for each said spectrum distribution region of each said voice synthesis unit, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data;
  
  a first connecting device that connects the adjusted amplitude spectrum data to connect sequential voice synthesis units respectively corresponding to the voices to be sequentially synthesized in a pronunciation order, wherein the spectrum intensities are adjusted to be agreed or approximately agreed with each another at connection points of the sequential voice synthesis units;
  
  a second connecting device that connects the adjusted phase spectrum data to connect the sequential voice synthesis units respectively corresponding to the voices to be sequentially synthesized in a pronunciation order, wherein the phases are adjusted to be agreed or approximately agreed with each another at connection points of the sequential voice synthesis units; and
  
  a converting device that converts the connected amplitude spectrum data and the connected phase spectrum data into a synthesized voice signal of a time region.

14. A singing voice synthesizing apparatus, comprising:
- a designating device that designates a voice synthesis unit and a pitch for each voice to be sequentially synthesized;
  
  a reading device that reads voice waveform data corresponding to each voice synthesis unit designated by the designating device from a voice synthesis unit database, wherein the amplitude spectrum data is data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region for each of a plurality of local peaks of a spectrum intensity including the local peak and spectrums therebefore and thereafter in a frequency spectrum obtained by a frequency analysis of a voice waveform of each said voice synthesis unit, and the phase spectrum data is data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
  
  a first adjusting device that adjusts, for each said spectrum distribution region of each said voice synthesis unit, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
  
  a second adjusting device that adjusts, for each said spectrum distribution region of each said voice synthesis unit, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data;
  
  a first connecting device that connects the adjusted amplitude spectrum data to connect sequential voice synthesis units respectively corresponding to the voices to be sequentially synthesized in a pronunciation order, wherein the spectrum intensities are adjusted to be agreed or approximately agreed with each another at connection points of the sequential voice synthesis units;
  
  a second connecting device that connects the adjusted phase spectrum data to connect the sequential voice synthesis units respectively corresponding to the voices to be sequentially synthesized in a pronunciation order, wherein the phases are adjusted to be agreed or approximately agreed with each another at connection points of the sequential voice synthesis units; and
  
  a converting device that converts the connected amplitude spectrum data and the connected phase spectrum data into a synthesized voice signal of a time region.
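The connecting devices of claims 13 and 14 require spectrum intensities (and phases) to agree, or approximately agree, at the connection points. One possible way to do this, shown here for a single value tracked across frames, is to split the mismatch evenly between the two units and fade the correction linearly to zero. The function name, the `span` parameter, and the even split are assumptions.

```python
def smooth_join(left, right, span):
    """Adjust two frame sequences so that left[-1] and right[0] agree.

    left, right: per-frame values (for example, the intensity at one frequency)
    of two consecutive voice synthesis units. Both ends are pulled to their
    mean, and the correction fades linearly to zero over `span` frames.
    """
    gap = right[0] - left[-1]
    new_left = list(left)
    new_right = list(right)
    for i in range(min(span, len(left))):
        weight = 1.0 - i / span
        new_left[-1 - i] += 0.5 * gap * weight
    for i in range(min(span, len(right))):
        weight = 1.0 - i / span
        new_right[i] -= 0.5 * gap * weight
    return new_left, new_right


left_unit = [1.0, 1.0, 1.0, 1.0]
right_unit = [2.0, 2.0, 2.0, 2.0]
joined_left, joined_right = smooth_join(left_unit, right_unit, span=2)
```

For phases the same idea applies, but the gap should first be wrapped to the range of plus or minus π so that the correction takes the short way around.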

15. A storage medium storing a program for a singing voice synthesizing apparatus, the program when executed causes a computer to:
- (a) detect a frequency spectrum by analyzing a frequency of a voice waveform corresponding to a voice synthesis unit of a voice to be synthesized;
  
  (b) detect a plurality of local peaks of a spectrum intensity on the frequency spectrum;
  
  (c) designate, for each of the plurality of the local peaks, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generate amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
  
  (d) generate phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
  
  (e) designate a pitch for the voice to be synthesized;
  
  (f) adjust, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
  
  (g) adjust, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
  
  (h) convert the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.

16. A storage medium storing a program for a singing voice synthesizing apparatus, the program when executed causes a computer to:
- (a) obtain amplitude spectrum data and phase spectrum data corresponding to a voice synthesis unit of a voice to be synthesized, wherein the amplitude spectrum data is data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region for each of a plurality of local peaks of a spectrum intensity including the local peak and spectrums therebefore and thereafter in a frequency spectrum obtained by a frequency analysis of a voice waveform of the voice synthesis unit, and the phase spectrum data is data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region;
  
  (b) designate a pitch for the voice to be synthesized;
  
  (c) adjust, for each said spectrum distribution region, the amplitude spectrum data by moving the amplitude spectrum distribution represented by the amplitude spectrum data along the frequency axis in accordance with the pitch;
  
  (d) adjust, for each said spectrum distribution region, the phase spectrum distribution represented by the phase spectrum data in accordance with the adjustment of the amplitude spectrum data; and
  
  (e) convert the adjusted amplitude spectrum data and the adjusted phase spectrum data into a synthesized voice signal of a time region.
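The final conversion step, from adjusted amplitude and phase spectrum data back to a time-domain signal, amounts to an inverse Fourier transform. The sketch below uses a direct cosine sum for one frame. It assumes the data cover bins 0 to n/2 of an n-sample real frame; the name `synthesize_frame` is hypothetical, and overlap-adding successive frames is not shown.

```python
import math


def synthesize_frame(amps, phases, n):
    """Inverse real DFT of one frame from amplitude and phase data.

    amps, phases: values for bins 0..n//2. Every bin except DC (and Nyquist
    when n is even) has a mirrored conjugate bin, so it counts twice.
    """
    samples = []
    for t in range(n):
        total = 0.0
        for k, (a, p) in enumerate(zip(amps, phases)):
            is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
            weight = 1.0 if is_edge else 2.0
            total += weight * a * math.cos(2 * math.pi * k * t / n + p)
        samples.append(total / n)
    return samples


# Amplitude n/2 in bin 1 with zero phase gives one cycle of a unit cosine.
signal = synthesize_frame([0.0, 4.0, 0.0, 0.0, 0.0], [0.0] * 5, 8)
```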

17. A singing voice synthesizing apparatus, comprising:
- a reading device that reads voice waveform data representing a waveform corresponding to a voice synthesis unit as voice synthesis unit data from a voice synthesis unit database;
  
  a first detecting device that detects a frequency spectrum by analyzing a frequency of the voice waveform represented by the voice waveform data;
  
  a second detecting device that detects a plurality of local peaks of a spectrum intensity on the frequency spectrum;
  
  a first generating device that designates, for each of the plurality of the local peaks, a spectrum distribution region including the local peak and spectrums therebefore and thereafter on the frequency spectrum and generates amplitude spectrum data representing an amplitude spectrum distribution depending on a frequency axis for each spectrum distribution region;
  
  a second generating device that generates phase spectrum data representing a phase spectrum distribution depending on the frequency axis for each said spectrum distribution region; and
  
  a database for storing the amplitude spectrum data and the phase spectrum data corresponding to the voice synthesis unit of the voice to be synthesized.
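A database in the sense of claim 17 stores, for each voice synthesis unit, the amplitude and phase spectrum data of its spectrum distribution regions. The layout below is only one way such a store could be organized: the class names, the field names, and keying by a unit label such as a phoneme or phonemic chain are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RegionData:
    """Spectrum data of one spectrum distribution region in one frame."""
    start_bin: int
    peak_bin: int
    amplitudes: list
    phases: list


@dataclass
class UnitDatabase:
    """Maps a voice synthesis unit label to its frames of region data."""
    units: dict = field(default_factory=dict)

    def store(self, unit, frames):
        self.units[unit] = frames

    def read(self, unit):
        return self.units[unit]


db = UnitDatabase()
region = RegionData(start_bin=1, peak_bin=2,
                    amplitudes=[1.0, 3.0, 1.0], phases=[0.0, 0.5, 0.0])
db.store("a", [[region]])            # one frame holding one region
first_region = db.read("a")[0][0]
```

Storing the analyzed data rather than the raw waveform means the synthesis side can skip the frequency analysis and peak detection, which matches the split between claims 9 and 10.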

Current Assignee: Yamaha Corporation
Original Assignee: Yamaha Corporation
Inventors: Loscos, Alex; Bonada, Jordi; Kenmochi, Hideki
Primary Examiner(s): Fletcher, Marlon T.
Application Number: US10/375,420
Publication Number: US 20030221542A1
Time in Patent Office: 1,069 Days
Field of Search: 84602-606, 84609-612, 84622-627, 84649-652, 84659-663
US Class Current: 84/622

CPC Class Codes
- G10H 2240/056   MIDI or other note-oriented...
- G10H 2240/311   MIDI transmission G10H2240/...
- G10H 2250/235   Fourier transform; Discrete...
- G10H 2250/455   Gensound singing voices, i....
- G10H 7/002   using a common processing f...
- G10L 13/02   Methods for producing synth...
