Singing voice synthesizing apparatus, singing voice synthesizing method, and program for realizing singing voice synthesizing method
Abstract
A singing voice synthesizing apparatus is provided that produces a natural-sounding synthesized singing voice with a good level of comprehensibility. A phoneme database stores a plurality of voice fragment data formed of voice fragments, each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component. A readout device reads out from the phoneme database the voice fragment data corresponding to inputted lyrics. A duration time adjusting device adjusts the time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing. An adjusting device adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch. A synthesizing device synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by the duration time adjusting device and the adjusting device.
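The data flow described in the abstract (read out, adjust duration, adjust pitch, concatenate) can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Frame` class, the string keys of the database, and the two adjustment callables that stand in for the duration time adjusting device and the adjusting device.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Frame:
    """One frame of a voice fragment (hypothetical layout)."""
    deterministic: List[float]  # amplitude data of the deterministic component
    stochastic: List[float]     # amplitude data of the stochastic component


Fragment = List[Frame]


def synthesize_frames(
    units: List[str],
    database: Dict[str, Fragment],
    adjust_duration: Callable[[Fragment], Fragment],
    adjust_pitch: Callable[[Frame], Frame],
) -> List[Frame]:
    """Read out each unit's fragment, adjust it, and concatenate in order."""
    output: List[Frame] = []
    for unit in units:
        if unit not in database:
            raise KeyError(f"no voice fragment stored for unit {unit!r}")
        fragment = adjust_duration(database[unit])          # tempo / manner of singing
        fragment = [adjust_pitch(frame) for frame in fragment]  # desired pitch
        output.extend(fragment)                              # sequential concatenation
    return output
```

Passing identity functions for both callables simply concatenates the stored fragments, which is a convenient way to check the readout step in isolation.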
25 Claims
1. A singing voice synthesizing apparatus comprising:
a phoneme database that stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
an input device that inputs lyrics;
a readout device that reads out from said phoneme database the voice fragment data corresponding to the inputted lyrics;
a duration time adjusting device that adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
an adjusting device that adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said adjusting device being configured to adjust the stochastic component by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and
a synthesizing device that synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by said duration time adjusting device and said adjusting device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
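Claim 1 states that the low frequency region of the stochastic component's amplitude spectrum is varied according to the desired pitch, but the claim itself does not give a formula. The sketch below shows one possible reading only. The cutoff frequency, the power-law gain, and the `sensitivity` parameter are all assumptions chosen for illustration and are not taken from the patent.

```python
from typing import List


def adjust_stochastic_low_band(
    amplitudes: List[float],
    bin_hz: float,
    source_pitch_hz: float,
    target_pitch_hz: float,
    cutoff_hz: float = 1000.0,   # assumed boundary of the "low frequency region"
    sensitivity: float = 0.5,    # assumed strength of the pitch dependence
) -> List[float]:
    """Scale only the bins below cutoff_hz by a pitch-dependent gain."""
    if source_pitch_hz <= 0 or target_pitch_hz <= 0:
        raise ValueError("pitches must be positive")
    # Hypothetical rule: the gain grows with the ratio of target to source pitch.
    gain = (target_pitch_hz / source_pitch_hz) ** sensitivity
    return [
        a * gain if k * bin_hz < cutoff_hz else a  # bins at or above the cutoff are untouched
        for k, a in enumerate(amplitudes)
    ]
```

With `sensitivity=0.0` the gain is 1 and the spectrum is returned unchanged, which gives a simple baseline for listening comparisons.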
11. A singing voice synthesizing apparatus comprising:
a phoneme database that stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
an input device that inputs lyrics;
a readout device that reads out from said phoneme database the voice fragment data corresponding to the inputted lyrics;
a duration time adjusting device that adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
an adjusting device that adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch; and
a synthesizing device that synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by said duration time adjusting device and said adjusting device, wherein:
each of said voice fragment data comprises a plurality of data corresponding respectively to a plurality of frames of a frame string formed by segmenting a corresponding one of the voice fragments;
the data of the deterministic component and the data of the stochastic component of each of said voice fragment data each comprise a series of frequency domain data corresponding respectively to the plurality of frames of the frame string corresponding to each of the voice fragments; and
said duration time adjusting device generates a frame string of a desired time length by repeating a plurality of frames of the frame string corresponding to each of the voice fragments, said duration time adjusting device repeating the plurality of frames in a first direction in which the frame string of a desired time length is generated and in a second direction opposite thereto.
Dependent claims: 12
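The duration adjustment recited in claim 11 repeats frames in a first direction and then in the opposite direction until the desired length is reached. A minimal sketch of that back-and-forth traversal follows. The function name is hypothetical, and the choice not to play the end frames twice at each turnaround is an assumption, since the claim does not specify it.

```python
from typing import List, TypeVar

T = TypeVar("T")


def stretch_frames(frames: List[T], target_len: int) -> List[T]:
    """Build a frame string of target_len by walking the frames forward, then backward, and so on."""
    if not frames:
        raise ValueError("frames must not be empty")
    out: List[T] = []
    index, step = 0, 1          # step = +1 is the first direction, -1 the opposite one
    n = len(frames)
    while len(out) < target_len:
        out.append(frames[index])
        if n == 1:
            continue            # a single frame can only be repeated in place
        if not 0 <= index + step < n:
            step = -step        # reverse direction at either end of the frame string
        index += step
    return out


print(stretch_frames(["f0", "f1", "f2"], 8))
# prints ['f0', 'f1', 'f2', 'f1', 'f0', 'f1', 'f2', 'f1']
```

Reversing at the ends avoids the jump from the last frame back to the first that plain looping would introduce, so consecutive output frames are always neighbours in the stored fragment.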
13. A singing voice synthesizing apparatus comprising:
a phoneme database that stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
an input device that inputs lyrics;
a readout device that reads out from said phoneme database the voice fragment data corresponding to the inputted lyrics;
a duration time adjusting device that adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
an adjusting device that adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch; and
a synthesizing device that synthesizes a singing sound by sequentially concatenating the voice fragment data that have been adjusted by said duration time adjusting device and said adjusting device, wherein:
each of said voice fragment data comprises a plurality of data corresponding respectively to a plurality of frames of a frame string formed by segmenting a corresponding one of the voice fragments;
the data of the deterministic component and the data of the stochastic component of each of said voice fragment data each comprise a series of frequency domain data corresponding respectively to the plurality of frames of the frame string corresponding to each of the voice fragments; and
said phoneme database stores voice fragment data comprising elongated sounds that are each enunciated by elongating a single phoneme, said phoneme database further storing a flat spectrum as an amplitude spectrum of the stochastic component of each of the voice fragment data comprising each of the elongated sounds, obtained by multiplying the amplitude spectrum thereof by an inverse of a typical spectrum within an interval of the elongated sound.
Dependent claims: 14, 15, 16
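The flat spectrum of claim 13 is obtained by multiplying each amplitude spectrum by the inverse of a typical spectrum within the interval of the elongated sound. The sketch below illustrates that operation under one assumption that the claim does not state: the typical spectrum is taken to be the per-bin mean over all frames in the interval. The function names and the `eps` guard against division by zero are likewise illustrative.

```python
from typing import List, Tuple


def typical_spectrum(frames: List[List[float]]) -> List[float]:
    """Per-bin mean amplitude over the interval (assumed definition of 'typical')."""
    if not frames:
        raise ValueError("frames must not be empty")
    n_bins = len(frames[0])
    return [sum(frame[k] for frame in frames) / len(frames) for k in range(n_bins)]


def flatten_spectra(
    frames: List[List[float]], eps: float = 1e-12
) -> Tuple[List[List[float]], List[float]]:
    """Multiply every frame by the inverse of the typical spectrum, bin by bin."""
    typical = typical_spectrum(frames)
    flat = [
        [amp * (1.0 / max(typ, eps)) for amp, typ in zip(frame, typical)]
        for frame in frames
    ]
    return flat, typical
```

After flattening, the spectral envelope shared by the frames is divided out, so what remains in each frame is its deviation from the typical spectrum; multiplying by the returned `typical` list restores the original amplitudes.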
17. A singing voice synthesizing method comprising the steps of:
storing in a phoneme database a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of said plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
reading out from said phoneme database the voice fragment data corresponding to lyrics inputted by an input device;
adjusting time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
adjusting the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said stochastic component being adjusted by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and
synthesizing a singing sound by sequentially concatenating the voice fragment data that have been adjusted in respect of the time duration and the deterministic component and the stochastic component thereof.
Dependent claims: 18, 19
20. A program for causing a computer to execute a singing voice synthesizing method comprising the steps of:
storing in a phoneme database a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of said plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
reading out from said phoneme database the voice fragment data corresponding to lyrics inputted by an input device;
adjusting time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
adjusting the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said stochastic component being adjusted by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and
synthesizing a singing sound by sequentially concatenating the voice fragment data that have been adjusted in respect of the time duration and the deterministic component and the stochastic component thereof.
Dependent claims: 21, 22
23. A mechanically readable storage medium storing instructions for causing a machine to execute a singing voice synthesizing method comprising the steps of:
storing in a phoneme database a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of said plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component;
reading out from said phoneme database the voice fragment data corresponding to lyrics inputted by an input device;
adjusting time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing;
adjusting the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch, said stochastic component being adjusted by varying a low frequency region of an amplitude spectrum of the stochastic component according to the desired pitch; and
synthesizing a singing sound by sequentially concatenating the voice fragment data that have been adjusted in respect of the time duration and the deterministic component and the stochastic component thereof.
Dependent claims: 24, 25
Specification