Prosody Generation Using Syllable-Centered Polynomial Representation of Pitch Contours
First Claim
1. A method for building databases for prosody generation in speech synthesis using one or more processors comprising:
A) compile a text corpus of sentences containing all the prosody phenomena of interest;
B) for each phrase in each said sentence, identify the phrase type;
C) segment each sentence into syllables, identify the property and context information of each said syllable;
D) read the sentences by a reference speaker to make a recording of voice signals with simultaneous electroglottograph signals if an electroglottograph instrument is available;
E) segment the voice signals and electroglottograph signals of each sentence into syllables, each said syllable is aligned with a syllable in the text;
F) identify the voiced section in each syllable of the voice recording;
G) calculate pitch values in the said voiced section;
H) generate a polynomial expansion of the pitch contour of each said voiced section in each syllable by least-squares fitting, comprising the use of Gegenbauer polynomials, which at least have a constant term representing the average pitch of the said syllable;
I) for all phrases of a given type, generate a polynomial expansion of the values of said average pitch of all syllables in the said phrases using least-squares fitting, to generate an average global pitch contour of the given phrase type;
J) form a set of syllable pitch parameters for each said syllable by subtracting the value of the global pitch profile at that point from the value of the average pitch of the said syllable together with the rest of polynomial expansion coefficients for the said syllable;
K) correlate the syllable pitch parameters with the property and context information of the said syllable from an analysis of the text to form a database of syllable pitch parameters;
L) correlate the intensity and duration parameters of a syllable to the property and context information of the said syllable from an analysis of the text to form a database of intensity and duration.
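Steps H) through J) can be sketched in a few lines. The following is a minimal illustration, not the patent's implementation: the claim does not fix the Gegenbauer parameter or the expansion order, so `alpha=0.5` (for which the Gegenbauer polynomials reduce to Legendre polynomials), the order, the mapping of sample times onto [-1, 1], and all function names are assumptions made for this example. The fit solves the normal equations directly, and the constant coefficient is taken as the syllable's average pitch, as the claim describes.

```python
def gegenbauer(n, alpha, x):
    """Gegenbauer polynomial C_n^(alpha)(x) via the three-term recurrence."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, 2.0 * alpha * x
    for k in range(2, n + 1):
        prev, curr = curr, (2.0 * x * (k + alpha - 1.0) * curr
                            - (k + 2.0 * alpha - 2.0) * prev) / k
    return curr

def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(xs, ys, order, alpha=0.5):
    """Least-squares coefficients c_0..c_order of sum_n c_n * C_n^(alpha)(x)."""
    basis = [[gegenbauer(n, alpha, x) for n in range(order + 1)] for x in xs]
    k = order + 1
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(k)]
    return solve(ata, aty)

def evaluate(coeffs, x, alpha=0.5):
    """Value of the expansion at x in [-1, 1]."""
    return sum(c * gegenbauer(n, alpha, x) for n, c in enumerate(coeffs))

def normalize(n_points):
    """Evenly spaced abscissae on [-1, 1] (assumed time normalization)."""
    return [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]

def syllable_pitch_parameters(coeffs, global_coeffs, position):
    """Step J: replace c_0 by its offset from the global contour, keep the rest."""
    return [coeffs[0] - evaluate(global_coeffs, position)] + list(coeffs[1:])

# Step H: fit one voiced section (synthetic pitch values in Hz, for illustration only).
xs = normalize(11)
contour = [120.0 + 10.0 * gegenbauer(1, 0.5, x) + 5.0 * gegenbauer(2, 0.5, x) for x in xs]
coeffs = fit(xs, contour, order=2)

# Step I: fit the average pitches of five syllables in a phrase (synthetic values).
positions = normalize(5)
global_coeffs = fit(positions, [130.0, 125.0, 120.0, 115.0, 110.0], order=1)

# Step J: parameters for a syllable with average pitch 118 Hz at phrase position 0.5.
params = syllable_pitch_parameters([118.0, 10.0, 5.0], global_coeffs, 0.5)
print(coeffs, global_coeffs, params)
```

The same `fit` routine serves both the per-syllable expansion and the global phrase contour, which keeps the two levels of the representation consistent. A production system would more likely use a library least-squares solver than hand-written normal equations.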
Abstract
The present invention discloses a parametric representation of prosody based on polynomial expansion coefficients of the pitch contour near the center of each syllable. The syllable pitch expansion coefficients are generated from a recorded speech database, in which a reference speaker reads a number of sentences. By correlating the stress level and context information of each syllable in the text with the polynomial expansion coefficients of the corresponding spoken syllable, a correlation database is formed. To generate prosody for an input text, the stress level and context information of each syllable in the text are identified. The prosody is generated by using the correlation database to find the best set of pitch parameters for each syllable. By adding these parameters to global pitch contours and using interpolation formulas, a complete pitch contour for the input text is generated. Duration and intensity profiles are generated using a similar procedure.
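The generation path described in the abstract (database lookup, addition to a global contour, interpolation) can be illustrated with a small sketch. Everything specific here is an assumption for the example: the feature keys, the "most matching fields" rule for choosing the best entry, the linear global contour, and the piecewise-linear interpolation, since the abstract does not state the matching criterion or the interpolation formulas.

```python
def best_match(database, features):
    """Return the parameters whose key shares the most fields with `features`.

    Assumed matching rule; ties go to the entry inserted first.
    """
    def score(key):
        return sum(1 for a, b in zip(key, features) if a == b)
    return database[max(database, key=score)]

def syllable_targets(syllables, database, global_contour):
    """Pitch target at each syllable center: global contour plus stored offset."""
    targets = []
    for time, features in syllables:
        offset = best_match(database, features)[0]
        targets.append((time, global_contour(time) + offset))
    return targets

def interpolate(targets, t):
    """Piecewise-linear pitch between neighbouring syllable centers (assumed formula)."""
    if t <= targets[0][0]:
        return targets[0][1]
    for (t0, p0), (t1, p1) in zip(targets, targets[1:]):
        if t <= t1:
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
    return targets[-1][1]

# Hypothetical correlation database: (stress, position in phrase) -> [pitch offset in Hz].
database = {
    ("stressed", "initial"): [8.0],
    ("unstressed", "medial"): [-3.0],
    ("stressed", "final"): [2.0],
}

def global_contour(t):
    """Hypothetical declining global pitch contour in Hz, with t in seconds."""
    return 130.0 - 20.0 * t

# Syllable centers (seconds) with features identified from the input text.
syllables = [
    (0.1, ("stressed", "initial")),
    (0.3, ("unstressed", "medial")),
    (0.5, ("stressed", "final")),
]
targets = syllable_targets(syllables, database, global_contour)
print(targets, interpolate(targets, 0.2))
```

Only the constant offset of each syllable is used here to keep the sketch short. The higher-order expansion coefficients stored per syllable would shape the contour around each center in a fuller version.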
20 Claims
1. A method for building databases for prosody generation in speech synthesis using one or more processors comprising:
Steps A) through L) as recited under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for building databases for prosody generation in speech synthesis using one or more processors comprising:
A) compile a text corpus of sentences containing all the prosody phenomena of interest;
B) for each phrase in each said sentence, identify the phrase type;
C) segment each sentence into syllables, identify the property and context information of each said syllable;
D) read the sentences by a reference speaker to make a recording of voice signals with simultaneous electroglottograph signals if an electroglottograph instrument is available;
E) segment the voice signals and electroglottograph signals of each sentence into syllables, each said syllable is aligned with a syllable in the text;
F) identify the voiced section in each syllable of the voice recording;
G) calculate pitch values in the said voiced section;
H) generate a polynomial expansion of the pitch contour of each said voiced section in each syllable by least-squares fitting, comprising the use of Gegenbauer polynomials, which at least have a constant term representing the average pitch of the said syllable;
I) for all phrases of a given type, generate a polynomial expansion of the values of said average pitch of all syllables in the said phrases using least-squares fitting, to generate an average global pitch contour of the given phrase type;
J) form a set of syllable pitch parameters for each said syllable by subtracting the value of the global pitch profile at that point from the value of the average pitch of the said syllable together with the rest of polynomial expansion coefficients for the said syllable;
K) correlate the syllable pitch parameters with the property and context information of the said syllable from an analysis of the text to form a database of syllable pitch parameters;
L) correlate the intensity and duration parameters of a syllable to the property and context information of the said syllable from an analysis of the text to form a database of intensity and duration.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
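Steps K) and L) of the claims pair each syllable's measured parameters with its property and context information. One simple reading, offered only as an assumption, is to group the observations by a feature key and average the parameter vectors within each group. The claims do not say how the correlation is computed, and the key layout and function name below are hypothetical.

```python
from collections import defaultdict

def build_database(observations):
    """Average the parameter vectors observed for each (property, context) key.

    `observations` is an iterable of (features, params) pairs, where `features`
    is a hashable tuple and `params` is a list of floats of fixed length.
    Grouping and averaging is an assumed interpretation of "correlate".
    """
    grouped = defaultdict(list)
    for features, params in observations:
        grouped[features].append(params)
    return {
        key: [sum(column) / len(column) for column in zip(*rows)]
        for key, rows in grouped.items()
    }

# Step K: syllable pitch parameters (offset from global contour, first-order term).
pitch_observations = [
    (("stressed", "initial"), [8.0, 10.0]),
    (("stressed", "initial"), [6.0, 12.0]),
    (("unstressed", "medial"), [-3.0, 1.0]),
]
pitch_db = build_database(pitch_observations)

# Step L: intensity (dB) and duration (seconds) per syllable, synthetic values.
intensity_duration_observations = [
    (("stressed", "initial"), [70.0, 0.25]),
    (("stressed", "initial"), [66.0, 0.75]),
]
intensity_duration_db = build_database(intensity_duration_observations)
print(pitch_db, intensity_duration_db)
```

Using one routine for both databases mirrors the parallel wording of steps K) and L). A statistical model conditioned on the same features would be another plausible reading.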
Specification