Singing voice synthesizing apparatus, singing voice synthesizing method, and program for realizing singing voice synthesizing method

US 7,016,841 B2
Filed: 12/27/2001
Issued: 03/21/2006
Est. Priority Date: 12/28/2000
Status: Active Grant

First Claim

1. A singing voice synthesizing apparatus comprising:

a phoneme database that stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;

an input device that inputs lyrics;

a readout device that reads out from said phoneme database the voice fragment data corresponding to the inputted lyrics;

a duration time adjusting device that adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;

an adjusting device that adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said adjusting device being configured to adjust the stochastic component by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and

a synthesizing device that synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by said duration time adjusting device and said adjusting device.
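The synthesizing device joins the adjusted fragments end to end, and claim 8 adds smoothing or level adjustment at each joint. The Python sketch below is only an illustration under assumed conditions, not the patented implementation. Each frame is a hypothetical list of linear spectral amplitudes. The level correction is an assumed linear gain ramp: the first frame of each new fragment is scaled to the RMS level of the frame before it, and the gain eases back to 1.0 by the fragment's last frame.

```python
import math
from typing import List

Frame = List[float]  # hypothetical: one frame = linear spectral amplitudes


def frame_rms(frame: Frame) -> float:
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(a * a for a in frame) / len(frame))


def match_levels(prev: List[Frame], frag: List[Frame]) -> List[Frame]:
    """Return a gain-adjusted copy of `frag`.

    The gain starts at the value that makes frag[0] match the RMS of prev[-1]
    and ramps linearly back to 1.0 at the last frame of `frag`.
    """
    start = frame_rms(frag[0])
    g0 = frame_rms(prev[-1]) / start if start > 0 else 1.0
    n = len(frag)
    out = []
    for i, frame in enumerate(frag):
        t = i / (n - 1) if n > 1 else 0.0  # 0 at the joint, 1 at the fragment end
        gain = g0 + (1.0 - g0) * t
        out.append([a * gain for a in frame])
    return out


def concatenate(fragments: List[List[Frame]]) -> List[Frame]:
    """Sequentially join fragments, level-matching each one to what precedes it."""
    result = [list(f) for f in fragments[0]]
    for frag in fragments[1:]:
        result.extend(match_levels(result, frag))
    return result
```

For example, joining a one-frame fragment at RMS 1.0 to a fragment that starts at RMS 2.0 halves that fragment's first frame and leaves its last frame at the stored level. A real system would more likely smooth per frequency band rather than apply one broadband gain.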

1 Assignment

0 Petitions

Abstract

A singing voice synthesizing apparatus is provided, which enables achievement of a natural-sounding synthesized singing voice with a good level of comprehensibility. A phoneme database stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component. A readout device reads out from the phoneme database the voice fragment data corresponding to inputted lyrics. A duration time adjusting device adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing. An adjusting device adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch. A synthesizing device synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by the duration time adjusting device and the adjusting device.
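The readout device has to turn the lyrics into a sequence of database keys, and claim 4 lists the fragment types such a database holds (elongated single phonemes and consonant/vowel phoneme chains). A minimal sketch of one plausible lookup follows. It assumes the lyrics are already converted to phoneme symbols, uses an assumed five-vowel set, and adds a hypothetical `#` silence marker at phrase boundaries. Every adjacent phoneme pair becomes a two-phoneme chain key, and every vowel also gets an elongated-sound key.

```python
from typing import Dict, List, Sequence

VOWELS = {"a", "i", "u", "e", "o"}  # assumption: a five-vowel (Japanese-style) set
SIL = "#"  # hypothetical silence marker at phrase boundaries


def fragment_keys(phonemes: Sequence[str]) -> List[str]:
    """Map a phoneme string to database keys: one two-phoneme chain per
    adjacent pair, plus an elongated-sound key after each vowel onset."""
    seq = [SIL, *phonemes, SIL]
    keys: List[str] = []
    for prev, nxt in zip(seq, seq[1:]):
        keys.append(f"{prev}-{nxt}")  # CV, VC, CC or VV chain (or silence transition)
        if nxt in VOWELS:
            keys.append(nxt)  # elongated (sustained) vowel
    return keys


def read_out(db: Dict[str, object], phonemes: Sequence[str]) -> List[object]:
    """Fetch the voice fragment data for every key, failing loudly on gaps."""
    keys = fragment_keys(phonemes)
    missing = [k for k in keys if k not in db]
    if missing:
        raise KeyError(f"no voice fragment data for: {missing}")
    return [db[k] for k in keys]
```

With this scheme the phonemes `s a k a` yield the keys `#-s`, `s-a`, `a`, `a-k`, `k-a`, `a`, `a-#`, so each sustained vowel sits between the chains that enter and leave it.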

56 Citations

25 Claims

1. A singing voice synthesizing apparatus comprising:
- a phoneme database that stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
  
  an input device that inputs lyrics;
  
  a readout device that reads out from said phoneme database the voice fragment data corresponding to the inputted lyrics;
  
  a duration time adjusting device that adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
  
  an adjusting device that adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said adjusting device being configured to adjust the stochastic component by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and
  
  a synthesizing device that synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by said duration time adjusting device and said adjusting device.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. A singing voice synthesizing apparatus according to claim 1, wherein said phoneme database stores a plurality of voice fragment data having different musical expressions for a single phoneme or phoneme chain.
  - 3. A singing voice synthesizing apparatus according to claim 2, wherein said musical expressions include at least one parameter selected from the group consisting of pitch, dynamics and tempo.
  - 4. A singing voice synthesizing apparatus according to claim 1, wherein said phoneme database stores voice fragment data comprising elongated sounds that are each enunciated by elongating a single phoneme, voice fragment data comprising consonant-to-vowel phoneme chains and vowel-to-consonant phoneme chains, voice fragment data comprising consonant-to-consonant phoneme chains, and voice fragment data comprising vowel-to-vowel phoneme chains.
  - 5. A singing voice synthesizing apparatus according to claim 1, wherein each of said voice fragment data comprises a plurality of data corresponding respectively to a plurality of frames of a frame string formed by segmenting a corresponding one of the voice fragments, and wherein the data of the deterministic component and the data of the stochastic component of each of said voice fragment data each comprise a series of frequency domain data corresponding respectively to the plurality of frames of the frame string corresponding to each of the voice fragments.
  - 6. A singing voice synthesizing apparatus according to claim 5, wherein said duration time adjusting device generates a frame string of a desired time length by repeating at least one frame of the plurality of frames of the frame string corresponding to each of the voice fragments, or by thinning out a predetermined number of frames of the plurality of frames of the frame string corresponding to each of the voice fragments.
  - 7. A singing voice synthesizing apparatus according to claim 5, further comprising a deterministic component generating device that changes only pitch of the deterministic component to a desired pitch while preserving the spectral envelope shape of the deterministic component contained in each of the voice fragment data when the voice fragment data are sequentially concatenated by said synthesizing device.
  - 8. A singing voice synthesizing apparatus according to claim 1, further comprising a fragment level adjusting device that performs smoothing processing or level adjusting processing on the deterministic component and the stochastic component contained in each of the voice fragment data when the voice fragment data are sequentially concatenated by said synthesizing device.
  - 9. A singing voice synthesizing apparatus according to claim 1, wherein said adjusting device adjusts the stochastic component by using an original amplitude spectrum for a high frequency region of the amplitude spectrum of the stochastic component.
  - 10. A singing voice synthesizing apparatus according to claim 1, wherein said adjusting device varies the low frequency region of the amplitude spectrum by compressing or expanding a frequency axis for the low frequency region of the amplitude spectrum of the stochastic component according to the desired pitch, with a general shape of the amplitude spectrum preserved.
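Claims 1, 9 and 10 adjust the stochastic component by compressing or expanding the frequency axis of the low-frequency part of its amplitude spectrum according to the desired pitch, while the high-frequency part keeps its original amplitude spectrum. The sketch below shows one way to do that under stated assumptions:

- The spectrum is a list of bin amplitudes spaced `bin_hz` apart.
- The output bin at frequency f reads the input at f * orig_f0 / target_f0, with linear interpolation.
- A hypothetical transition band between `f_lo` and `f_hi` blends the warped and unwarped frequency maps, so the spectrum stays continuous.

The band edges are illustrative parameters, not values from the patent.

```python
import math
from typing import List


def warp_low_region(amp: List[float], bin_hz: float, orig_f0: float,
                    target_f0: float, f_lo: float = 1000.0,
                    f_hi: float = 1500.0) -> List[float]:
    """Compress/expand the frequency axis of the low band by target_f0/orig_f0.

    amp: stochastic amplitude spectrum, bin k centred at k * bin_hz.
    f_lo, f_hi: hypothetical band edges; below f_lo the full warp applies,
    at or above f_hi the original spectrum is kept, and in between the
    source frequency is blended linearly so the result has no jump.
    """
    ratio = target_f0 / orig_f0
    n = len(amp)
    out: List[float] = []
    for k in range(n):
        f = k * bin_hz
        if f >= f_hi:
            out.append(amp[k])  # high region: original amplitude spectrum
            continue
        if f <= f_lo:
            src = f / ratio  # a feature at orig_f0 lands at target_f0
        else:
            t = (f - f_lo) / (f_hi - f_lo)
            src = (1.0 - t) * (f_lo / ratio) + t * f_hi
        pos = src / bin_hz
        i0 = min(int(math.floor(pos)), n - 1)
        i1 = min(i0 + 1, n - 1)
        frac = pos - i0 if i1 > i0 else 0.0
        out.append((1.0 - frac) * amp[i0] + frac * amp[i1])  # linear interpolation
    return out
```

When `target_f0` equals `orig_f0` the frequency map reduces to the identity, so an unshifted fragment passes through unchanged. Raising the pitch by an octave stretches the low band by a factor of two.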

11. A singing voice synthesizing apparatus comprising:
- a phoneme database that stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
  
  an input device that inputs lyrics;
  
  a readout device that reads out from said phoneme database the voice fragment data corresponding to the inputted lyrics;
  
  a duration time adjusting device that adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
  
  an adjusting device that adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch; and
  
  a synthesizing device that synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by said duration time adjusting device and said adjusting device, wherein:
  
  each of said voice fragment data comprises a plurality of data corresponding respectively to a plurality of frames of a frame string formed by segmenting a corresponding one of the voice fragments;
  
  the data of the deterministic component and the data of the stochastic component of each of said voice fragment data each comprise a series of frequency domain data corresponding respectively to the plurality of frames of the frame string corresponding to each of the voice fragments; and
  
  said duration time adjusting device generates a frame string of a desired time length by repeating a plurality of frames of the frame string corresponding to each of the voice fragments, said duration time adjusting device repeating the plurality of frames in a first direction in which the frame string of a desired time length is generated and in a second direction opposite thereto.
- Dependent Claims (12)
  - 12. A singing voice synthesizing apparatus according to claim 11, wherein when repeating the plurality of frames of the frame string corresponding to the data of the stochastic component of each of the voice fragments in the first and second directions, said duration time adjusting device reverses a phase of a phase spectrum of the stochastic component.
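Claims 11 and 12 lengthen a fragment by reading its frames forward, then backward, and so on, and reverse the phase spectrum of stochastic frames when they are read backward. The sketch below assumes a hypothetical frame layout of (amplitudes, phases) lists. It interprets "reversing the phase" as negating each phase value, because time-reversing a real signal conjugates its spectrum (up to a linear-phase term from the frame position). It only lengthens: if `target_len` is shorter than the fragment it simply truncates, and the frame thinning of claim 6 is not shown.

```python
from typing import List, Tuple

StochFrame = Tuple[List[float], List[float]]  # hypothetical: (amplitudes, phases)


def pingpong_indices(n: int, target_len: int) -> List[Tuple[int, bool]]:
    """Indices 0..n-1, n-2..0, 1.. (forward, then backward, ...) with a flag
    that is True for frames read in the backward direction."""
    if n < 1:
        raise ValueError("fragment has no frames")
    if n == 1:
        return [(0, False)] * target_len
    out: List[Tuple[int, bool]] = []
    i, step = 0, 1
    while len(out) < target_len:
        out.append((i, step < 0))
        if not 0 <= i + step < n:  # reached an end of the frame string: turn round
            step = -step
        i += step
    return out


def stretch_stochastic(frames: List[StochFrame], target_len: int) -> List[StochFrame]:
    """Loop stochastic frames back and forth to `target_len` frames,
    negating the phase spectrum of every frame read backward."""
    out: List[StochFrame] = []
    for idx, backward in pingpong_indices(len(frames), target_len):
        amps, phases = frames[idx]
        if backward:
            phases = [-p for p in phases]
        out.append((list(amps), list(phases)))
    return out
```

Reflecting at the ends without repeating the end frame avoids a one-frame stall at each turning point. The deterministic frames would follow the same index sequence without the phase change.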

13. A singing voice synthesizing apparatus comprising:
- a phoneme database that stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
  
  an input device that inputs lyrics;
  
  a readout device that reads out from said phoneme database the voice fragment data corresponding to the inputted lyrics;
  
  a duration time adjusting device that adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
  
  an adjusting device that adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch; and
  
  a synthesizing device that synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by said duration time adjusting device and said adjusting device, wherein:
  
  each of said voice fragment data comprises a plurality of data corresponding respectively to a plurality of frames of a frame string formed by segmenting a corresponding one of the voice fragments;
  
  the data of the deterministic component and the data of the stochastic component of each of said voice fragment data each comprise a series of frequency domain data corresponding respectively to the plurality of frames of the frame string corresponding to each of the voice fragments; and
  
  said phoneme database stores voice fragment data comprising elongated sounds that are each enunciated by elongating a single phoneme, said phoneme database further storing a flat spectrum as an amplitude spectrum of the stochastic component of each of the voice fragment data comprising each of the elongated sounds, obtained by multiplying the amplitude spectrum thereof by an inverse of a typical spectrum within an interval of the elongated sound.
- Dependent Claims (14, 15, 16)
  - 14. A singing voice synthesizing apparatus according to claim 13, wherein the amplitude spectrum of the stochastic component of each of the voice fragment data comprising each of the elongated sounds is obtained by multiplying an amplitude spectrum of the stochastic component calculated based on an amplitude spectrum of the deterministic component of the voice fragment data of the elongated sound, by the flat spectrum.
  - 15. A singing voice synthesizing apparatus according to claim 14, wherein said phoneme database does not store amplitude spectra of stochastic components of voice fragment data comprising certain elongated sounds, and the flat spectrum stored as an amplitude spectrum of voice fragment data comprising at least one other elongated sound is used for synthesis of the certain sounds.
  - 16. A singing voice synthesizing apparatus according to claim 14, wherein the amplitude spectrum of the stochastic component calculated based on the amplitude spectrum of the deterministic component has a gain thereof at 0 Hz controlled according to a parameter for controlling a degree of huskiness.
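Claims 13, 14 and 16 describe how the stochastic component of an elongated sound is stored and rebuilt. A "flat" stochastic spectrum is stored, obtained by multiplying each frame by the inverse of a typical spectrum for the interval. The stochastic amplitude is later rebuilt from an envelope derived from the deterministic component, whose 0 Hz gain follows a huskiness parameter. The claims do not define the typical spectrum or the shape of the huskiness control, so the sketch below makes two labelled assumptions. First, the typical spectrum is the per-bin mean over the interval's frames. Second, the huskiness gain ramps linearly from its 0 Hz value to 1.0 at a hypothetical `knee_hz`.

```python
from typing import List

Spectrum = List[float]  # hypothetical: linear amplitudes per frequency bin


def typical_spectrum(frames: List[Spectrum]) -> Spectrum:
    """Assumed definition: per-bin mean over the elongated sound's frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def flatten(frames: List[Spectrum], eps: float = 1e-12) -> List[Spectrum]:
    """Flat spectra to store: each frame times the inverse of the typical spectrum."""
    inv = [1.0 / max(t, eps) for t in typical_spectrum(frames)]
    return [[a * g for a, g in zip(frame, inv)] for frame in frames]


def husky_envelope(det_env: Spectrum, bin_hz: float, huskiness: float,
                   knee_hz: float = 500.0) -> Spectrum:
    """Scale an envelope derived from the deterministic component so that its
    gain is `huskiness` at 0 Hz, moving linearly to 1.0 at knee_hz
    (a hypothetical shape for the huskiness control)."""
    out = []
    for k, a in enumerate(det_env):
        t = min(k * bin_hz / knee_hz, 1.0)
        out.append(a * (huskiness + (1.0 - huskiness) * t))
    return out


def rebuild_stochastic(det_env: Spectrum, flat_frames: List[Spectrum],
                       bin_hz: float, huskiness: float) -> List[Spectrum]:
    """Stochastic amplitude per frame = huskiness-shaped envelope x flat spectrum."""
    env = husky_envelope(det_env, bin_hz, huskiness)
    return [[e * s for e, s in zip(env, frame)] for frame in flat_frames]
```

The flat frames keep only the frame-to-frame variation, not the timbre. Because of that, the same flat data can be combined with the envelope of a different elongated sound, which is the reuse claim 15 describes.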

17. A singing voice synthesizing method comprising the steps of:
- storing in a phoneme database a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of said plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
  
  reading out from said phoneme database the voice fragment data corresponding to lyrics inputted by an input device;
  
  adjusting time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
  
  adjusting the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said stochastic component being adjusted by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and
  
  synthesizing a singing sound by sequentially concatenating the voice fragment data that have been adjusted in respect of the time duration and the deterministic component and the stochastic component thereof.
- Dependent Claims (18, 19)
  - 18. A singing voice synthesizing method according to claim 17, wherein, in said step of adjusting the deterministic and stochastic components, the stochastic component is adjusted by using an original amplitude spectrum for a high frequency region of the amplitude spectrum of the stochastic component.
  - 19. A singing voice synthesizing method according to claim 17, wherein, in said step of adjusting the deterministic and stochastic components, the low frequency region of the amplitude spectrum is varied by compressing or expanding a frequency axis for the low frequency region of the amplitude spectrum of the stochastic component according to the desired pitch, with a general shape of the amplitude spectrum preserved.
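The adjusting step of claim 17 also changes the pitch of the deterministic component. Claim 7 specifies doing this while preserving the shape of that component's spectral envelope. A minimal sketch of the idea follows, assuming the deterministic frame is a list of (frequency, amplitude) harmonic peaks sorted by frequency. The new harmonics sit at multiples of the target f0, and each takes its amplitude from a piecewise-linear envelope in dB drawn through the original peaks. Interpolating in dB and holding the envelope flat outside the measured range are choices made for this illustration, not details taken from the patent.

```python
import bisect
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # hypothetical: (frequency in Hz, linear amplitude)


def envelope_at(peaks: List[Peak], f: float) -> float:
    """Piecewise-linear envelope (in dB) through the original harmonic peaks,
    held constant outside the measured frequency range."""
    freqs = [p[0] for p in peaks]
    dbs = [20.0 * math.log10(max(p[1], 1e-12)) for p in peaks]
    if f <= freqs[0]:
        db = dbs[0]
    elif f >= freqs[-1]:
        db = dbs[-1]
    else:
        j = bisect.bisect_right(freqs, f)  # freqs[j-1] <= f < freqs[j]
        t = (f - freqs[j - 1]) / (freqs[j] - freqs[j - 1])
        db = dbs[j - 1] + t * (dbs[j] - dbs[j - 1])
    return 10.0 ** (db / 20.0)


def repitch(peaks: List[Peak], new_f0: float, max_hz: float) -> List[Peak]:
    """Place harmonics at multiples of new_f0 up to max_hz, sampling the
    original envelope so the spectral shape is preserved."""
    if new_f0 <= 0:
        raise ValueError("new_f0 must be positive")
    n = int(max_hz // new_f0)
    return [(h * new_f0, envelope_at(peaks, h * new_f0)) for h in range(1, n + 1)]
```

Moving from a 100 Hz to a 150 Hz fundamental, for example, puts the first new harmonic halfway (in dB) between the old first and second harmonics. Formant positions stay where they were while the harmonic spacing changes.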

20. A program for causing a computer to execute a singing voice synthesizing method comprising the steps of:
- storing in a phoneme database a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of said plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
  
  reading out from said phoneme database the voice fragment data corresponding to lyrics inputted by an input device;
  
  adjusting time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
  
  adjusting the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said stochastic component being adjusted by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and
  
  synthesizing a singing sound by sequentially concatenating the voice fragment data that have been adjusted in respect of the time duration and the deterministic component and the stochastic component thereof.
- Dependent Claims (21, 22)
  - 21. A program for causing a computer to execute a singing voice synthesizing method according to claim 20, wherein, in said step of adjusting the deterministic and stochastic components, the stochastic component is adjusted by using an original amplitude spectrum for a high frequency region of the amplitude spectrum of the stochastic component.
  - 22. A program for causing a computer to execute a singing voice synthesizing method according to claim 20, wherein, in said step of adjusting the deterministic and stochastic components, the low frequency region of the amplitude spectrum is varied by compressing or expanding a frequency axis for the low frequency region of the amplitude spectrum of the stochastic component according to the desired pitch, with a general shape of the amplitude spectrum preserved.

23. A mechanically readable storage medium storing instructions for causing a machine to execute a singing voice synthesizing method comprising the steps of:
- storing in a phoneme database a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of said plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
  
  reading out from said phoneme database the voice fragment data corresponding to lyrics inputted by an input device;
  
  adjusting time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
  
  adjusting the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said stochastic component being adjusted by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and
  
  synthesizing a singing sound by sequentially concatenating the voice fragment data that have been adjusted in respect of the time duration and the deterministic component and the stochastic component thereof.
- Dependent Claims (24, 25)
  - 24. A mechanically readable storage medium storing instructions for causing a machine to execute a singing voice synthesizing method according to claim 23, wherein, in said step of adjusting the deterministic and stochastic components, the stochastic component is adjusted by using an original amplitude spectrum for a high frequency region of the amplitude spectrum of the stochastic component.
  - 25. A mechanically readable storage medium storing instructions for causing a machine to execute a singing voice synthesizing method according to claim 23, wherein, in said step of adjusting the deterministic and stochastic components, the low frequency region of the amplitude spectrum is varied by compressing or expanding a frequency axis for the low frequency region of the amplitude spectrum of the stochastic component according to the desired pitch, with a general shape of the amplitude spectrum preserved.

Current Assignee
Yamaha Corporation
Original Assignee
Yamaha Corporation
Inventors
Bonada, Jordi; Serra, Xavier; Kenmochi, Hideki
Primary Examiner(s)
Lerner, Martin

Application Number
US10/034,359
Publication Number
US 20030009336A1
Time in Patent Office
1,545 Days
Field of Search
704/201, 704/203, 704/205, 704/206, 704/207, 704/211, 704/221, 704/258, 704/266, 704/267, 704/268, 704/269, 84/604, 84/609, 84/622, 84/623, 84/624, 84/627
US Class Current
704/258
CPC Class Codes
G10L 13/07 Concatenation rules