Apparatus and Method for Creating Singing Synthesizing Database, and Pitch Curve Generation Apparatus and Method

US 20110004476A1
Filed: 07/01/2010
Published: 01/06/2011
Est. Priority Date: 07/02/2009
Status: Active Grant

Abstract

Variation over time in fundamental frequency in singing voices is separated into a melody-dependent component and a phoneme-dependent component; each component is modeled, and the resulting models are stored into a singing synthesizing database. In execution of singing synthesis, a pitch curve indicative of variation over time in fundamental frequency of the melody is synthesized in accordance with an arrangement of notes represented by a singing synthesizing score and the melody-dependent component, and the pitch curve is then corrected, for each of the pitch curve sections corresponding to phonemes constituting the lyrics, using the phoneme-dependent component model corresponding to the phoneme. Such arrangements can accurately model a singing expression that is unique to a singing person and appears in that person's melody singing style, while taking into account phoneme-dependent pitch variation, and thereby permit synthesis of singing voices that sound more natural.
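The two-stage synthesis flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the flat per-note contour stands in for the melody component model, the fixed per-phoneme offsets stand in for the phoneme-dependent component model, and the names `ScoreEntry`, `melody_curve`, `correct_for_phonemes`, and the offset values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical score entry: (MIDI note number, phoneme label, duration in frames).
ScoreEntry = Tuple[int, str, int]


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def melody_curve(score: List[ScoreEntry]) -> List[float]:
    """Stand-in for the melody component model: a flat contour per note."""
    curve: List[float] = []
    for note, _phoneme, frames in score:
        curve.extend([midi_to_hz(note)] * frames)
    return curve


def correct_for_phonemes(
    curve: List[float],
    score: List[ScoreEntry],
    offsets_hz: Dict[str, float],
) -> List[float]:
    """Apply a per-phoneme offset (assumed additive) to each pitch curve section."""
    corrected: List[float] = []
    start = 0
    for _note, phoneme, frames in score:
        offset = offsets_hz.get(phoneme, 0.0)  # unknown phonemes are left unchanged
        corrected.extend(f + offset for f in curve[start:start + frames])
        start += frames
    return corrected


score = [(69, "a", 3), (71, "k", 2)]
offsets = {"k": -5.0}  # illustrative value only
curve = correct_for_phonemes(melody_curve(score), score, offsets)
```

Here the section sung on the hypothetical phoneme "k" is lowered by 5 Hz while the section on "a" keeps the melody contour unchanged.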

43 Citations

14 Claims

1. A singing synthesizing database creation apparatus comprising:
- an input section to which are input learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
  
  a pitch extraction section which analyzes the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
  
  a separation section which analyzes the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separates the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
  
  a first learning section which generates, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, and which stores, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
  
  a second learning section which generates, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, and which stores, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.
  - 2. The singing synthesizing database creation apparatus as claimed in claim 1, wherein said second learning section segments the phoneme-dependent component data into data sections corresponding to individual ones of the phonemes of the lyrics included in the learning score data, executes, for each of the segmented data sections, a predetermined machine learning algorithm using individual phonemes included in the learning score data and the phoneme-dependent component, and as a result of the machine learning, generates, for each individual unique phoneme, phoneme-dependent component parameters defining a phoneme-dependent component model that represents, with a highest probability, pitch variation represented by the phoneme-dependent component data, and wherein the phoneme-dependent component parameters generated by said second learning section are associated with the phoneme identifier uniquely identifying the unique phoneme.
  - 3. The singing synthesizing database creation apparatus as claimed in claim 1, wherein said first learning section segments the melody component data into a plurality of data sections in such a manner that one or more notes are contained in each of the segmented data sections, executes, for each of the segmented data sections, a predetermined machine learning algorithm using the melody component data and the learning score data corresponding to the data section, and as a result of the machine learning, generates, in association with a combination of the notes in each individual one of the data sections, the melody component parameters that define a melody component model for the data section, and wherein the melody component parameters defining the melody component model are associated with one or more said identifiers each indicative of the combination of notes.
  - 4. The singing synthesizing database creation apparatus as claimed in claim 1, wherein the predetermined machine learning includes executing a Baum-Welch algorithm.
  - 5. The singing synthesizing database creation apparatus as claimed in claim 1, wherein said separation section extracts, from the pitch data, melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and extracts the phoneme-dependent component data on the basis of a difference between the pitch data and the extracted melody component data.
  - 6. The singing synthesizing database creation apparatus as claimed in claim 1, wherein said input section receives, as the learning waveform data, a plurality of sets of learning waveform data representative of sound waveforms of respective singing voices of a plurality of singing persons, and said first learning section classifies melody component parameters, generated on the basis of respective ones of the sets of learning waveform data, according to the singing persons and stores the classified melody component parameters into the singing synthesizing database.
  - 7. The singing synthesizing database creation apparatus as claimed in claim 6, wherein said second learning section classifies phoneme-dependent component parameters, generated on the basis of the respective sets of learning waveform data, according to the singing persons and stores the classified phoneme-dependent component parameters into the singing synthesizing database.
  - 8. The singing synthesizing database creation apparatus as claimed in claim 6, wherein said second learning section stores phoneme-dependent component parameters, generated on the basis of the set of learning waveform data of at least one of the singing persons, into the singing synthesizing database as common phoneme-dependent component parameters for individual ones of the singing persons.
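The difference-based separation recited in claim 5 can be sketched as follows. This is only an illustration: the claims do not say how the melody component is extracted, so a centered moving average is used here as an assumed stand-in, and the function names and sample values are hypothetical.

```python
from typing import List, Tuple


def moving_average(values: List[float], half_width: int) -> List[float]:
    """Centered moving average; the window shrinks at the edges of the data."""
    smoothed: List[float] = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def separate(pitch: List[float], half_width: int = 2) -> Tuple[List[float], List[float]]:
    """Split pitch data into a melody component and a phoneme-dependent component.

    The melody component is ASSUMED to be the smoothed pitch; the
    phoneme-dependent component is the difference between the pitch data and
    that melody component, as in claim 5.
    """
    melody = moving_average(pitch, half_width)
    phoneme_dependent = [p - m for p, m in zip(pitch, melody)]
    return melody, phoneme_dependent


# A short dip in the middle frame, as a consonant might produce (values are made up).
pitch = [440.0, 440.0, 432.0, 440.0, 440.0]
melody, residual = separate(pitch, half_width=1)
```

By construction the two components sum back to the original pitch data, so nothing is lost by the split; only the attribution of each variation to melody or phoneme depends on the chosen extraction method.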

9. A singing synthesizing database creation method comprising:
- a step of inputting learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
  
  a step of analyzing the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
  
  a step of analyzing the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separating the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
  
  a first learning step of generating, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, said first learning step storing, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
  
  a second learning step of generating, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, said second learning step storing, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.
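The two learning steps and the keyed storage they describe can be sketched as below. The claims leave the machine learning unspecified beyond "predetermined" (claim 4 names a Baum-Welch algorithm as one option); this sketch substitutes a much simpler maximum-likelihood Gaussian fit per identifier, which is an assumption made purely for illustration. The dictionary layout, the tuple-of-notes identifier, and all data values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Section = Tuple[Hashable, List[float]]  # (identifier, component data in that section)


def learn_parameters(sections: List[Section]) -> Dict[Hashable, Dict[str, float]]:
    """Fit one Gaussian (mean, variance) per identifier by maximum likelihood.

    This is a simplified stand-in for the 'predetermined machine learning';
    it is NOT the Baum-Welch training mentioned in claim 4.
    """
    grouped: Dict[Hashable, List[float]] = defaultdict(list)
    for identifier, values in sections:
        grouped[identifier].extend(values)

    parameters: Dict[Hashable, Dict[str, float]] = {}
    for identifier, values in grouped.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        parameters[identifier] = {"mean": mean, "variance": variance}
    return parameters


# Melody component sections keyed by a note combination (hypothetical MIDI pairs).
melody_sections = [((60, 62), [1.0, 3.0])]
# Phoneme-dependent component sections keyed by a phoneme identifier.
phoneme_sections = [("k", [-6.0, -4.0]), ("a", [0.5, -0.5]), ("k", [-5.0])]

# The 'singing synthesizing database': parameters stored with their identifiers.
database = {
    "melody": learn_parameters(melody_sections),
    "phoneme": learn_parameters(phoneme_sections),
}
```

Storing the parameters under their identifiers mirrors the "in association with each other" language of the first and second learning steps; a per-singer database, as in claims 6 to 8, could add one more level of keys.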

10. A computer-readable storage medium containing a program for causing a computer to perform a singing synthesizing database creation method, said singing synthesizing database creation method comprising:
- a step of inputting learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
  
  a step of analyzing the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
  
  a step of analyzing the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separating the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
  
  a first learning step of generating, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, said first learning step storing, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
  
  a second learning step of generating, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, said second learning step storing, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.

11. A pitch curve generation apparatus comprising:
- a singing synthesizing database storing therein, separately for each individual one of a plurality of singing persons,
  
  1) melody component parameters defining a melody component model that represents a variation component presumed to be representative of a melody among variation over time in fundamental frequency between notes in singing voices of the singing person, and
  
  2) an identifier indicative of a combination of one or more notes of which fundamental frequency component variation over time is represented by the melody component model, said singing synthesizing database storing therein sets of the melody component parameters and the identifiers in a form classified according to the singing persons, said singing synthesizing database also storing therein, in association with phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component dependent on a phoneme among variation over time in the fundamental frequency, an identifier indicative of the phoneme for which the variation component is represented by the phoneme-dependent component model;
  
  an input section to which are input singing synthesizing score data representative of a musical score of a singing music piece and information designating any one of the singing persons for which the melody component parameters are prestored in said singing synthesizing database;
  
  a pitch curve generation section which synthesizes a pitch curve of a melody of a singing music piece, represented by the singing synthesizing score data, on the basis of a melody component model defined by the melody component parameters, stored in said singing synthesizing database for the singing person designated by the information inputted via said input section, and a time series of notes represented by the singing synthesizing score data; and
  
  a phoneme-dependent component correction section which, for each of pitch curve sections corresponding to phonemes constituting lyrics represented by the singing synthesizing score data, corrects the pitch curve, in accordance with the phoneme-dependent component model defined by the phoneme-dependent component parameters stored for the phoneme in said singing synthesizing database, and outputs the corrected pitch curve.
  - 14. A singing synthesizing apparatus for synthesizing singing by use of the pitch curve generation apparatus recited in claim 11, said singing synthesizing apparatus comprises:
    - a sound source which generates a sound signal in accordance with a pitch curve of a melody of a singing music piece, represented by the singing synthesizing score data, generated by the pitch curve generation apparatus; and
      
      a filter section which performs a filter process, corresponding to phonemes constituting lyrics of the singing music piece, on the sound signal outputted from said sound source.
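The source-and-filter arrangement of claim 14 can be sketched in a few lines. This is an assumed, heavily simplified instance: the sound source is a sine oscillator driven by the pitch curve (one pitch value per output sample), and the filter section is a one-pole low-pass whose coefficient depends on the current phoneme. The function names, the coefficient table, and the sample rate are hypothetical and not taken from the patent.

```python
import math
from typing import Dict, List


def sound_source(pitch_curve_hz: List[float], sample_rate: int) -> List[float]:
    """Sine oscillator following the pitch curve, one pitch value per sample.

    Phase is accumulated so the waveform stays continuous when the pitch changes.
    """
    phase = 0.0
    samples: List[float] = []
    for f0 in pitch_curve_hz:
        samples.append(math.sin(phase))
        phase += 2.0 * math.pi * f0 / sample_rate
    return samples


def filter_section(
    signal: List[float],
    phoneme_per_sample: List[str],
    alpha_by_phoneme: Dict[str, float],
) -> List[float]:
    """One-pole low-pass y[n] = a*x[n] + (1 - a)*y[n-1], with a chosen per phoneme."""
    output: List[float] = []
    previous = 0.0
    for x, phoneme in zip(signal, phoneme_per_sample):
        alpha = alpha_by_phoneme[phoneme]
        previous = alpha * x + (1.0 - alpha) * previous
        output.append(previous)
    return output


pitch_curve = [440.0] * 100                      # corrected pitch curve (made up)
phonemes = ["s"] * 40 + ["a"] * 60               # phoneme active at each sample
alphas = {"s": 0.9, "a": 0.3}                    # illustrative coefficients only

signal = sound_source(pitch_curve, sample_rate=8000)
filtered = filter_section(signal, phonemes, alphas)
```

A practical synthesizer would use a richer excitation and phoneme-specific spectral filters; the sketch only shows how the pitch curve drives the source while the lyrics drive the filter.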

12. A method for generating a pitch curve by use of a singing synthesizing database storing therein, separately for each individual one of a plurality of singing persons, 1) melody component parameters defining a melody component model that represents a variation component presumed to be representative of a melody among variation over time in fundamental frequency between notes in singing voices of the singing person, and 2) an identifier indicative of a combination of one or more notes of which fundamental frequency component variation over time is represented by the melody component model, said singing synthesizing database storing therein sets of the melody component parameters and the identifiers in a form classified according to the singing persons, said singing synthesizing database also storing therein, in association with phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component dependent on a phoneme among variation over time in the fundamental frequency, an identifier indicative of the phoneme for which the variation component is represented by the phoneme-dependent component model, said method comprising:
- a step of inputting singing synthesizing score data representative of a musical score of a singing music piece and information designating any one of the singing persons for which the melody component parameters are prestored in said singing synthesizing database;
  
  a step of synthesizing a pitch curve of a melody of a singing music piece, represented by the singing synthesizing score data, on the basis of a melody component model defined by the melody component parameters, stored in said singing synthesizing database for the singing person designated by the information inputted via said input section, and a time series of notes represented by the singing synthesizing score data; and
  
  a step of, for each of pitch curve sections corresponding to phonemes constituting lyrics represented by the singing synthesizing score data, correcting the pitch curve, in accordance with the phoneme-dependent component model defined by the phoneme-dependent component parameters stored for the phoneme in said singing synthesizing database, and outputting the corrected pitch curve.

13. A computer-readable storage medium containing a program for causing a computer to perform a method for generating a pitch curve by use of a singing synthesizing database storing therein, separately for each individual one of a plurality of singing persons, 1) melody component parameters defining a melody component model that represents a variation component presumed to be representative of a melody among variation over time in fundamental frequency between notes in singing voices of the singing person, and 2) an identifier indicative of a combination of one or more notes of which fundamental frequency component variation over time is represented by the melody component model, said singing synthesizing database storing therein sets of the melody component parameters and the identifiers in a form classified according to the singing persons, said singing synthesizing database also storing therein, in association with phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component dependent on a phoneme among variation over time in the fundamental frequency, an identifier indicative of the phoneme for which the variation component is represented by the phoneme-dependent component model, said method comprising:
- a step of inputting singing synthesizing score data representative of a musical score of a singing music piece and information designating any one of the singing persons for which the melody component parameters are prestored in said singing synthesizing database;
  
  a step of synthesizing a pitch curve of a melody of a singing music piece, represented by the singing synthesizing score data, on the basis of a melody component model defined by the melody component parameters, stored in said singing synthesizing database for the singing person designated by the information inputted via said input section, and a time series of notes represented by the singing synthesizing score data; and
  
  a step of, for each of pitch curve sections corresponding to phonemes constituting lyrics represented by the singing synthesizing score data, correcting the pitch curve, in accordance with the phoneme-dependent component model defined by the phoneme-dependent component parameters stored for the phoneme in said singing synthesizing database, and outputting the corrected pitch curve.

Current Assignee: Yamaha Corporation
Original Assignee: Yamaha Corporation
Inventors: Bonada, Jordi; Saino, Keijiro
Granted Patent: US 8,423,367 B2
US Class Current: 704/267
CPC Class Codes

G10H 1/0008   Associated control or indic...

G10H 2210/066   for pitch analysis as part ...

G10H 2210/086   for transcription of raw au...

G10H 2240/155   Library update, i.e. making...

G10H 2250/015   Markov chains, e.g. hidden ...

G10H 2250/455   Gensound singing voices, i....

G10H 2250/481   Formant synthesis, i.e. sim...

G10L 13/10   Prosody rules derived from ...
