Apparatus and method for creating singing synthesizing database, and pitch curve generation apparatus and method
Abstract
Variation over time in fundamental frequency in singing voices is separated into a melody-dependent component and a phoneme-dependent component, each of which is modeled and stored into a singing synthesizing database. In execution of singing synthesis, a pitch curve indicative of variation over time in fundamental frequency of the melody is synthesized in accordance with an arrangement of notes represented by a singing synthesizing score and the melody-dependent component, and the pitch curve is corrected, for each of pitch curve sections corresponding to phonemes constituting lyrics, using a phoneme-dependent component model corresponding to the phoneme. Such arrangements can accurately model a singing expression, unique to a singing person and appearing in a melody singing style of the person, while taking into account phoneme-dependent pitch variation, and thereby permit synthesis of singing voices that sound more natural.
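As a rough illustration of the synthesis side described in the abstract, the following Python sketch builds a pitch curve from an arrangement of notes and then corrects each phoneme section with a phoneme-dependent offset. The `Note` class, the `PHONEME_OFFSETS` table, the step-curve melody model, and all numeric values are assumptions made for this example; the abstract does not specify the form of either model.

```python
from dataclasses import dataclass


@dataclass
class Note:
    midi_pitch: float  # note number taken from the score (assumed unit: semitones)
    n_frames: int      # duration of the note in analysis frames
    phoneme: str       # simplification: one phoneme section per note


# Hypothetical phoneme-dependent models: a short pitch offset curve (semitones)
# applied at the start of the section. The values are illustrative only.
PHONEME_OFFSETS = {
    "k": [-0.5, -0.25],  # assumed brief pitch dip at a consonant onset
    "a": [],             # assumed no correction for this vowel
}


def melody_curve(notes):
    """Melody-dependent component, reduced here to a step curve of note pitches."""
    curve = []
    for note in notes:
        curve.extend([float(note.midi_pitch)] * note.n_frames)
    return curve


def apply_phoneme_correction(curve, notes, offsets=PHONEME_OFFSETS):
    """Correct each pitch curve section using the model for its phoneme."""
    corrected = list(curve)
    start = 0
    for note in notes:
        section_offsets = offsets.get(note.phoneme, [])[: note.n_frames]
        for i, delta in enumerate(section_offsets):
            corrected[start + i] += delta
        start += note.n_frames
    return corrected


notes = [Note(60, 4, "k"), Note(62, 3, "a")]
pitch = apply_phoneme_correction(melody_curve(notes), notes)
print(pitch)
```

A trained melody component model would replace the step curve with learned transitions between notes; the step curve only keeps the example short.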
42 Citations
10 Claims
1. A singing synthesizing database creation apparatus comprising:
an input section to which are input learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
a pitch extraction section which analyzes the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
a separation section which analyzes the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separates the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
a first learning section which generates, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, and which stores, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
a second learning section which generates, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, and which stores, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.
9. A singing synthesizing database creation method comprising:
a step of inputting learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
a step of analyzing the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
a step of analyzing the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separating the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
a first learning step of generating, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, said first learning step storing, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
a second learning step of generating, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, said second learning step storing, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.
10. A non-transitory computer-readable storage medium containing a program for causing a computer to perform a singing synthesizing database creation method, said singing synthesizing database creation method comprising:
a step of inputting learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
a step of analyzing the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
a step of analyzing the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separating the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
a first learning step of generating, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, said first learning step storing, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
a second learning step of generating, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, said second learning step storing, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.
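The database creation steps recited in the claims (separation, first learning, second learning, storage keyed by identifiers) can be pictured with a small Python sketch. It is only a toy under stated assumptions: pitch data is given as one semitone value per frame, each phoneme section maps to exactly one note, the melody component is approximated by the section mean, and plain averaging stands in for the "predetermined machine learning", which the claims do not specify. The names `separate`, `build_database`, and the dictionary layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def separate(pitch, sections):
    """Split pitch data into melody and phoneme-dependent components per section.

    Assumption: the melody component of a section is its mean pitch, and the
    phoneme-dependent component is the frame-by-frame residual from that mean.
    """
    melody, phoneme_dep = [], []
    for sec in sections:
        frames = pitch[sec["start"]:sec["end"]]
        level = mean(frames)
        melody.append(level)
        phoneme_dep.append([f - level for f in frames])
    return melody, phoneme_dep


def build_database(pitch, sections):
    """Learn toy parameters and store them under note and phoneme identifiers."""
    melody, phoneme_dep = separate(pitch, sections)
    by_notes, by_phoneme = defaultdict(list), defaultdict(list)
    prev_note = None
    for sec, level, resid in zip(sections, melody, phoneme_dep):
        # Identifier for the combination of notes: (previous note, current note).
        note_id = (prev_note, sec["note"])
        # Stand-in melody parameter: how far the sung level sits from the score.
        by_notes[note_id].append(level - sec["note"])
        # Stand-in phoneme parameter: residual at the section onset only.
        by_phoneme[sec["phoneme"]].append(resid[0])
        prev_note = sec["note"]
    # "Learning" here is averaging; a real system would fit a statistical model.
    return {
        "melody": {k: mean(v) for k, v in by_notes.items()},
        "phoneme": {k: mean(v) for k, v in by_phoneme.items()},
    }


# Illustrative data: 8 frames of pitch and two phoneme sections from the score.
pitch = [59.0, 60.0, 60.5, 60.5, 62.5, 62.5, 62.5, 62.5]
sections = [
    {"start": 0, "end": 4, "note": 60, "phoneme": "k"},
    {"start": 4, "end": 8, "note": 62, "phoneme": "a"},
]
db = build_database(pitch, sections)
print(db)
```

Keeping the two tables separate mirrors the claims, where melody component parameters are stored with an identifier for the combination of notes and phoneme-dependent component parameters are stored with a phoneme identifier.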
Specification