Apparatus and method for creating singing synthesizing database, and pitch curve generation apparatus and method

US 8,423,367 B2
Filed: 07/01/2010
Issued: 04/16/2013
Est. Priority Date: 07/02/2009
Status: Active Grant

Abstract

Variation over time in fundamental frequency in singing voices is separated into a melody-dependent component and a phoneme-dependent component, modeled for each of the components and stored into a singing synthesizing database. In execution of singing synthesis, a pitch curve indicative of variation over time in fundamental frequency of the melody is synthesized in accordance with an arrangement of notes represented by a singing synthesizing score and the melody-dependent component, and the pitch curve is corrected, for each of pitch curve sections corresponding to phonemes constituting lyrics, using a phoneme-dependent component model corresponding to the phoneme. Such arrangements can accurately model a singing expression, unique to a singing person and appearing in a melody singing style of the person, while taking into account phoneme-dependent pitch variation, and thereby permit synthesis of singing voices that sound more natural.
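
As a rough illustration of the synthesis flow the abstract describes, the following Python sketch builds a melody pitch curve from a note sequence and then corrects it, section by section, with per-phoneme offsets. It is a minimal sketch under stated assumptions, not the patented method: the linear glide and the constant per-phoneme offsets are hand-written stand-ins for the learned melody component model and phoneme-dependent component model, and the names `Note`, `melody_pitch_curve`, and `apply_phoneme_correction` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: float  # target pitch in semitones (MIDI-style note number)
    start: int    # first frame of the note (inclusive)
    end: int      # last frame of the note (exclusive)


def melody_pitch_curve(notes, n_frames, glide_frames=3):
    """Stand-in melody model: flat note pitches with a linear glide into each new note."""
    curve = [0.0] * n_frames
    for note in notes:
        for t in range(note.start, min(note.end, n_frames)):
            curve[t] = note.pitch
    # Replace the first few frames of each following note with a glide
    # from the previous note's pitch (assumed transition shape).
    for prev, nxt in zip(notes, notes[1:]):
        for k in range(glide_frames):
            t = nxt.start + k
            if t < min(nxt.end, n_frames):
                w = (k + 1) / (glide_frames + 1)
                curve[t] = (1 - w) * prev.pitch + w * nxt.pitch
    return curve


def apply_phoneme_correction(curve, phoneme_sections, phoneme_offsets):
    """Correct the curve per phoneme section; offsets stand in for a learned model."""
    corrected = list(curve)
    for phoneme, start, end in phoneme_sections:
        offset = phoneme_offsets.get(phoneme, 0.0)
        for t in range(start, min(end, len(corrected))):
            corrected[t] += offset
    return corrected


notes = [Note(60.0, 0, 4), Note(64.0, 4, 8)]
curve = melody_pitch_curve(notes, n_frames=8, glide_frames=1)
# Hypothetical correction: the consonant "s" pulls the pitch down slightly.
corrected = apply_phoneme_correction(curve, [("s", 4, 5)], {"s": -0.5})
```

Keeping the melody curve and the phoneme correction as separate passes mirrors the two-component split in the abstract: either model can be replaced without touching the other.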

42 Citations

10 Claims

1. A singing synthesizing database creation apparatus comprising:
- an input section to which are input learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
  
  a pitch extraction section which analyzes the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
  
  a separation section which analyzes the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separates the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
  
  a first learning section which generates, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, and which stores, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
  
  a second learning section which generates, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, and which stores, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.
  - 2. The singing synthesizing database creation apparatus as claimed in claim 1, wherein said second learning section segments the phoneme-dependent component data into data sections corresponding to individual ones of the phonemes of the lyrics included in the learning score data, executes, for each of the segmented data sections, a predetermined machine learning algorithm using individual phonemes included in the learning score data and the phoneme-dependent component, and as a result of the machine learning, generates, for each individual unique phoneme, phoneme-dependent component parameters defining a phoneme-dependent component model that represents, with a highest probability, pitch variation represented by the phoneme-dependent component data, and wherein the phoneme-dependent component parameters generated by said second learning section are associated with the phoneme identifier uniquely identifying the unique phoneme.
  - 3. The singing synthesizing database creation apparatus as claimed in claim 1, wherein said first learning section segments the melody component data into a plurality of data sections in such a manner that one or more notes are contained in each of the segmented data sections, executes, for each of the segmented data sections, a predetermined machine learning algorithm using the melody component data and the learning score data corresponding to the data section, and as a result of the machine learning, generates, in association with a combination of the notes in each individual one of the data sections, the melody component parameters that define a melody component model for the data section, and wherein the melody component parameters defining the melody component model are associated with one or more said identifiers each indicative of the combination of notes.
  - 4. The singing synthesizing database creation apparatus as claimed in claim 1, wherein the predetermined machine learning includes executing a Baum-Welch algorithm.
  - 5. The singing synthesizing database creation apparatus as claimed in claim 1, wherein said separation section extracts, from the pitch data, melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and extracts the phoneme-dependent component data on the basis of a difference between the pitch data and the extracted melody component data.
  - 6. The singing synthesizing database creation apparatus as claimed in claim 1, wherein said input section receives, as the learning waveform data, a plurality of sets of learning waveform data representative of sound waveforms of respective singing voices of a plurality of singing persons, and said first learning section classifies melody component parameters, generated on the basis of respective ones of the sets of learning waveform data, according to the singing persons and stores the classified melody component parameters into the singing synthesizing database.
  - 7. The singing synthesizing database creation apparatus as claimed in claim 6, wherein said second learning section classifies phoneme-dependent component parameters, generated on the basis of the respective sets of learning waveform data, according to the singing persons and stores the classified phoneme-dependent component parameters into the singing synthesizing database.
  - 8. The singing synthesizing database creation apparatus as claimed in claim 6, wherein said second learning section stores phoneme-dependent component parameters, generated on the basis of the set of learning waveform data of at least one of the singing persons, into the singing synthesizing database as common phoneme-dependent component parameters for individual ones of the singing persons.
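
Claim 5 describes the separation as extracting the melody component first and taking the phoneme-dependent component as the difference between the pitch data and that melody component. The Python sketch below shows one simple way this could look. The melody extraction used here (linear interpolation of pitch across consonant sections) is an assumption made for illustration, as are the function name `separate_pitch`, the `(phoneme, start, end)` section format, and the `consonants` set; only the final subtraction follows directly from the claim wording.

```python
def separate_pitch(pitch, phoneme_sections, consonants):
    """Split a frame-wise pitch track into melody and phoneme-dependent parts.

    pitch: list of per-frame pitch values (for example, in semitones).
    phoneme_sections: list of (phoneme, start, end) with end exclusive.
    consonants: phonemes assumed to perturb the pitch (illustrative choice).
    """
    melody = list(pitch)
    last = len(pitch) - 1
    for phoneme, start, end in phoneme_sections:
        if phoneme not in consonants:
            continue
        # Anchor values just outside the section; fall back at track edges.
        left = pitch[start - 1] if start > 0 else pitch[min(end, last)]
        right = pitch[end] if end <= last else left
        span = end - start + 1
        # Assumed melody estimate: bridge the section with a straight line.
        for k, t in enumerate(range(start, min(end, len(pitch))), start=1):
            melody[t] = left + (right - left) * k / span
    # Claim 5: phoneme-dependent component = pitch data - melody component.
    phoneme_dependent = [p - m for p, m in zip(pitch, melody)]
    return melody, phoneme_dependent


pitch = [60.0, 60.0, 58.0, 59.0, 60.0, 60.0]
sections = [("a", 0, 2), ("t", 2, 4), ("a", 4, 6)]
melody, phoneme_dependent = separate_pitch(pitch, sections, consonants={"t"})
```

Because the second component is defined as a difference, the two outputs always sum back to the original pitch data, whatever melody estimate is used.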

9. A singing synthesizing database creation method comprising:
- a step of inputting learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
  
  a step of analyzing the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
  
  a step of analyzing the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separating the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
  
  a first learning step of generating, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, said first learning step storing, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
  
  a second learning step of generating, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, said second learning step storing, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.
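
The two learning steps of claim 9 each end by storing parameters in the database under an identifier: a note-combination identifier for melody component parameters and a phoneme identifier for phoneme-dependent component parameters. The Python sketch below illustrates that storage layout only. The "learning" is deliberately replaced by a simple mean and variance fit, which is a stand-in and not the machine learning the claims refer to; the identifier format `"60>64"`, the dictionary layout, and the names `fit_gaussian` and `create_database` are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fit_gaussian(samples):
    """Stand-in for the 'predetermined machine learning': mean and variance only."""
    return {"mean": fmean(samples), "var": pvariance(samples)}


def create_database(melody, phoneme_dep, notes, phoneme_sections):
    """Build a toy singing synthesizing database.

    melody, phoneme_dep: per-frame component data from the separation step.
    notes: list of (pitch, start, end); phoneme_sections: (phoneme, start, end).
    """
    # First learning step: group melody data by note combination
    # (here, each pair of consecutive notes) relative to the target pitch.
    melody_samples = defaultdict(list)
    for (prev_pitch, _, _), (pitch, start, end) in zip(notes, notes[1:]):
        key = f"{prev_pitch}>{pitch}"  # hypothetical note-combination identifier
        melody_samples[key].extend(m - pitch for m in melody[start:end])

    # Second learning step: group phoneme-dependent data by phoneme identifier.
    phoneme_samples = defaultdict(list)
    for phoneme, start, end in phoneme_sections:
        phoneme_samples[phoneme].extend(phoneme_dep[start:end])

    return {
        "melody": {k: fit_gaussian(v) for k, v in melody_samples.items() if v},
        "phoneme": {k: fit_gaussian(v) for k, v in phoneme_samples.items() if v},
    }


database = create_database(
    melody=[60.0, 60.0, 62.0, 64.0],
    phoneme_dep=[0.0, 0.0, -2.0, -1.0],
    notes=[(60, 0, 2), (64, 2, 4)],
    phoneme_sections=[("a", 0, 2), ("t", 2, 4)],
)
```

Keying both tables by identifier is what lets a later synthesis stage look up parameters from a score alone, without access to the original recordings.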

10. A non-transitory computer-readable storage medium containing a program for causing a computer to perform a singing synthesizing database creation method, said singing synthesizing database creation method comprising:
- a step of inputting learning waveform data representative of sound waveforms of singing voices of a singing music piece and learning score data representative of a musical score of the singing music piece, the learning score data including note data representative of a melody and lyrics data representative of lyrics associated with individual ones of the notes;
  
  a step of analyzing the learning waveform data to generate pitch data indicative of variation over time in fundamental frequency in the singing voices;
  
  a step of analyzing the pitch data, for each of pitch data sections corresponding to phonemes constituting the lyrics of the singing music piece, by use of the learning score data and separating the pitch data into melody component data representative of a variation component of the fundamental frequency dependent on the melody of the singing music piece and phoneme-dependent component data representative of a variation component of the fundamental frequency dependent on the phoneme constituting the lyrics;
  
  a first learning step of generating, in association with a combination of notes constituting the melody of the singing music piece, melody component parameters by performing predetermined machine learning using the learning score data and the melody component data, said melody component parameters defining a melody component model that represents a variation component presumed to be representative of the melody among the variation over time in fundamental frequency between notes in the singing voices, said first learning step storing, into a singing synthesizing database, the generated melody component parameters and an identifier, indicative of the combination of notes to be associated with the melody component parameters, in association with each other; and
  
  a second learning step of generating, for each of the phonemes, phoneme-dependent component parameters by performing predetermined machine learning using the learning score data and the phoneme-dependent component data, said phoneme-dependent component parameters defining a phoneme-dependent component model that represents a variation component of the fundamental frequency dependent on the phoneme in the singing voices, said second learning step storing, into the singing synthesizing database, the generated phoneme-dependent component parameters and a phoneme identifier, indicative of the phoneme to be associated with the phoneme-dependent component parameters, in association with each other.
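
Claim 4 names the Baum-Welch algorithm as one form the predetermined machine learning may take. For readers unfamiliar with it, the following is a minimal pure-Python sketch of Baum-Welch re-estimation for a small discrete hidden Markov model. It is a textbook formulation under stated assumptions, not the claimed apparatus: treating pitch data as a sequence of quantized symbols (0 = below target, 1 = on target, 2 = above target) is a hypothetical simplification, and the function names and starting parameters are illustrative.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns (alpha, beta, scales)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    scales = [0.0] * T
    for t in range(T):
        for j in range(n):
            prior = pi[j] if t == 0 else sum(
                alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prior * B[j][obs[t]]
        scales[t] = sum(alpha[t])  # P(obs[t] | obs[0..t-1])
        alpha[t] = [a / scales[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            ) / scales[t + 1]
    return alpha, beta, scales


def baum_welch_step(obs, pi, A, B):
    """One EM update; returns (pi, A, B, log-likelihood of the old parameters)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales = forward_backward(obs, pi, A, B)
    # gamma[t][i]: posterior probability of being in state i at frame t.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_pi = list(gamma[0])
    new_A = []
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([
            sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                / scales[t + 1] for t in range(T - 1)) / denom
            for j in range(n)])
    new_B = []
    for j in range(n):
        denom = sum(gamma[t][j] for t in range(T))
        new_B.append([
            sum(gamma[t][j] for t in range(T) if obs[t] == k) / denom
            for k in range(m)])
    log_likelihood = sum(math.log(c) for c in scales)
    return new_pi, new_A, new_B, log_likelihood


def train(obs, pi, A, B, iterations=10):
    """Repeat the EM update and record the log-likelihood at each iteration."""
    history = []
    for _ in range(iterations):
        pi, A, B, ll = baum_welch_step(obs, pi, A, B)
        history.append(ll)
    return pi, A, B, history


# Hypothetical quantized pitch deviations and arbitrary starting parameters.
obs = [0, 1, 2, 2, 1, 0, 0, 1, 2, 1, 0, 2]
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]
pi, A, B, history = train(obs, pi0, A0, B0)
```

Each update is an expectation-maximization step, so the recorded log-likelihood should not decrease from one iteration to the next; the per-frame scaling keeps the forward and backward variables from underflowing on longer sequences.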

Current Assignee: Yamaha Corporation
Original Assignee: Yamaha Corporation
Inventors: Saino, Keijiro; Bonada, Jordi
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Nguyen, Timothy
Application Number: US12/828,409
Publication Number: US 20110004476A1
Time in Patent Office: 1,020 Days
Field of Search: None
US Class Current: 704/267
CPC Class Codes:
G10H 1/0008   Associated control or indic...
G10H 2210/066   for pitch analysis as part ...
G10H 2210/086   for transcription of raw au...
G10H 2240/155   Library update, i.e. making...
G10H 2250/015   Markov chains, e.g. hidden ...
G10H 2250/455   Gensound singing voices, i....
G10H 2250/481   Formant synthesis, i.e. sim...
G10L 13/10   Prosody rules derived from ...
