Prosodic databases holding fundamental frequency templates for use in speech synthesis
Abstract
Prosodic databases hold fundamental frequency templates for use in a speech synthesis system. Prosodic database templates may hold fundamental frequency values for syllables in a given sentence. These fundamental frequency values may be applied in synthesizing a sentence of speech. The templates are indexed by tonal pattern markings. A predicted tonal marking pattern is generated for each sentence of text that is to be synthesized, and this predicted pattern of tonal markings is used to locate a best-matching template. The templates are derived by calculating fundamental frequencies on a per-syllable basis for sentences that are spoken by a human trainer for a given unlabeled corpus.
38 Claims
1. In a system for synthesizing speech, a method comprising the computer-implemented steps of:
providing text for which speech is to be synthesized;
providing prosodic templates created from a corpus of words spoken in a prosodic manner where each template holds a sequence of fundamental frequency values for units of speech;
selecting one of the templates for use in establishing prosody for the synthesized speech for the text; and
synthesizing speech for the text using at least one of the fundamental frequencies from the selected template in establishing prosody for the speech.
Dependent claims: 2, 3, 4, 5, 6, 7
8. In a system for synthesizing speech, a computer-readable storage medium holding instructions for performing a method comprising the computer-implemented steps of:
providing text for which speech is to be synthesized;
providing prosodic templates created from a corpus of words spoken in a prosodic manner where each template holds a sequence of fundamental frequency values for units of speech;
selecting one of the templates for use in establishing prosody for the synthesized speech for the text; and
synthesizing speech for the text using at least one of the fundamental frequencies from the selected template in establishing prosody for the speech.
Dependent claims: 9, 10, 11, 12
13. In a system for synthesizing speech, a method comprising the computer-implemented steps of:
providing a prosodic database of fundamental frequencies for units of speech, each entry in said prosodic database being indexed by a pattern of tonal markings that correspond with a degree of emphasis for the units of speech for which fundamental frequencies are held;
performing a natural language parse on a given text;
based on results of the natural language parse, predicting a predicted pattern of tonal markings for the units of speech in the text;
identifying a best-matching index in the prosodic database by comparing the predicted pattern of tonal markings for the units of speech in the text with the indices of the entries of the prosodic database; and
using at least one of the fundamental frequency values in the entry in the prosodic database that is indexed by the best-matching index to establish prosody in synthesizing speech for the text.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
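The index comparison recited in claim 13 can be illustrated with a minimal Python sketch. The claim does not say how "best-matching" is scored, so the position-by-position mismatch count used here is an assumption, as are the marking alphabet (`H`, `L`, and `-` for an unmarked unit), the restriction to entries with the same number of units, and the names `ProsodicEntry`, `mismatches`, and `best_match`. The frequency values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProsodicEntry:
    pattern: str        # index: one tonal marking per unit of speech (hypothetical alphabet)
    f0_hz: List[float]  # one fundamental frequency value per unit of speech


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length marking patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(predicted: str, database: List[ProsodicEntry]) -> Optional[ProsodicEntry]:
    """Return the entry whose index differs least from the predicted pattern.

    Assumption: only entries with the same number of units are candidates.
    Returns None when no entry has a matching length.
    """
    candidates = [e for e in database if len(e.pattern) == len(predicted)]
    if not candidates:
        return None
    return min(candidates, key=lambda e: mismatches(predicted, e.pattern))


# Illustrative data only; not taken from the patent.
database = [
    ProsodicEntry("H-L", [210.0, 180.0, 150.0]),
    ProsodicEntry("-HL", [170.0, 220.0, 140.0]),
    ProsodicEntry("H--L", [200.0, 185.0, 175.0, 145.0]),
]

entry = best_match("LHL", database)
if entry is not None:
    print(entry.pattern, entry.f0_hz)
```

In this sketch the predicted pattern `LHL` differs from `-HL` in one position and from `H-L` in two, so the `-HL` entry supplies the fundamental frequency values. A real system would need its own policy for ties and for texts whose length matches no entry.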
21. In a system for synthesizing speech, a computer-readable storage medium holding instructions for performing a method comprising the computer-implemented steps of:
providing a prosodic database of fundamental frequencies for units of speech, each entry in said prosodic database being indexed by a pattern of tonal markings that correspond with a degree of emphasis for the units of speech for which fundamental frequencies are held;
performing a natural language parse on a given text;
based on results of the natural language parse, predicting a predicted pattern of tonal markings for the units of speech in the text;
identifying a best-matching index in the prosodic database by comparing the predicted pattern of tonal markings for the units of speech in the text with the indices of the entries of the prosodic database; and
using at least one of the fundamental frequency values in the entry in the prosodic database that is indexed by the best-matching index to establish prosody in synthesizing speech for the text.
Dependent claims: 22, 23, 24, 25, 26, 27, 28
29. In a computer system, a method of building a prosodic database, comprising the computer-implemented steps of:
obtaining an acoustical signal for each of multiple corresponding portions of spoken text that are spoken by a human trainer, each said acoustical signal being the signal that results when the human trainer speaks the corresponding portion of text;
obtaining a laryngograph signal for each portion of spoken text from a laryngograph worn by the human trainer when the portions of text are spoken;
segmenting the acoustical signal into segments representing syllables in the text where each syllable includes a vowel section;
segmenting the laryngograph signal into segments that match the segments of the acoustical signal;
calculating a weighted sum of instantaneous fundamental frequencies for the vowel section of each syllable in each portion of text wherein the fundamental frequencies are obtained from the laryngograph signal and weights are obtained from the acoustical signal;
for each portion of text, storing the weighted sum of instantaneous fundamental frequencies for each syllable of the portion of text in the prosodic database; and
using the weighted sums of instantaneous fundamental frequencies in the prosodic database to establish prosody of synthesized speech.
Dependent claims: 30, 31
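The weighted-sum step of claim 29 can be sketched as follows. The claim states only that the fundamental frequencies come from the laryngograph signal and the weights from the acoustical signal; treating the weights as per-sample acoustic magnitudes and normalizing by their total are assumptions made here for illustration, and the function name `weighted_f0` is hypothetical.

```python
from typing import Sequence


def weighted_f0(inst_f0_hz: Sequence[float], weights: Sequence[float]) -> float:
    """Combine instantaneous F0 values for one vowel section into a single value.

    inst_f0_hz: instantaneous fundamental frequencies (from the laryngograph signal).
    weights:    one non-negative weight per F0 value (assumed here to be a
                magnitude taken from the acoustical signal).
    The sum is divided by the total weight (an assumption) so the result
    stays in Hz.
    """
    if len(inst_f0_hz) != len(weights):
        raise ValueError("need exactly one weight per F0 value")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * f for w, f in zip(weights, inst_f0_hz)) / total


# Illustrative values only: three F0 samples in one vowel section.
vowel_f0 = [200.0, 210.0, 220.0]
vowel_weights = [1.0, 2.0, 1.0]
print(weighted_f0(vowel_f0, vowel_weights))  # 210.0
```

Weighting by the acoustic signal lets samples where the vowel is strongest dominate the stored value; with equal weights the function reduces to a plain average.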
32. In a computer system, a computer-readable storage medium holding instructions for performing a method of building a prosodic database comprising the steps of:
obtaining an acoustical signal for each of multiple corresponding portions of spoken text that are spoken by a human trainer, each said acoustical signal being the signal that results when the human trainer speaks the corresponding portion of text;
obtaining a laryngograph signal for each portion of spoken text from a laryngograph worn by the human trainer when the portions of text are spoken;
segmenting the acoustical signal into segments representing syllables in the text where each syllable includes a vowel section;
segmenting the laryngograph signal into segments that match the segments of the acoustical signal;
calculating a weighted sum of instantaneous fundamental frequencies for the vowel section of each syllable in each portion of text wherein the fundamental frequencies are obtained from the laryngograph signal and weights are obtained from the acoustical signal;
for each portion of text, storing the weighted sum of instantaneous fundamental frequencies for each syllable of the portion of text in the prosodic database; and
using the weighted sums of instantaneous fundamental frequencies in the prosodic database to establish prosody of synthesized speech.
33. A text to speech system, comprising:
a parser for parsing input text into units of speech;
a prosodic database holding prosodic templates created from a corpus of words spoken in a prosodic manner wherein each prosodic template holds a sequence of fundamental frequency values for units of speech; and
a speech synthesizer for generating speech corresponding to the input text by using a selected one of the templates in the prosodic database to obtain fundamental frequency values for units of speech in the input text.
Dependent claims: 34
35. In a system for generating speech, a method comprising the computer-implemented steps of:
providing a prosodic database holding prosodic templates for different styles of speech, each prosodic template comprising a sequence of fundamental frequencies associated with using the template's respective style of speech while speaking a sequence of language units;
determining what prosodic style to apply to a portion of speech to be generated; and
using at least one of the templates in the prosodic database for the determined prosodic style to generate the portion of speech with the determined prosodic style.
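One way to picture the style-keyed database of claim 35 is a mapping from a style name to that style's templates. This is only a sketch under stated assumptions: the style names, the fallback to a default style, the selection of a template by its number of language units, and the names `StyleDatabase` and `pick_template` are all hypothetical, and the claim does not specify how the style is determined.

```python
from typing import Dict, List, Optional

# Hypothetical layout: style name -> templates, each a sequence of F0 values in Hz.
StyleDatabase = Dict[str, List[List[float]]]


def pick_template(
    db: StyleDatabase,
    style: str,
    n_units: int,
    default_style: str = "neutral",
) -> Optional[List[float]]:
    """Return a template of the requested style with n_units values.

    Falls back to default_style when the requested style has no template of
    that length (an assumption); returns None if neither has one.
    """
    for candidate_style in (style, default_style):
        for template in db.get(candidate_style, []):
            if len(template) == n_units:
                return template
    return None


# Illustrative data only; style names and values are made up.
db: StyleDatabase = {
    "neutral": [[180.0, 170.0, 160.0]],
    "excited": [[240.0, 260.0, 230.0], [250.0, 270.0]],
}

print(pick_template(db, "excited", 3))  # [240.0, 260.0, 230.0]
print(pick_template(db, "whisper", 3))  # [180.0, 170.0, 160.0]
```

Keeping each style's templates in a separate bucket means the style decision and the template lookup stay independent, which mirrors the separate "determining" and "using" steps of the claim.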
36. In a system for generating speech, a computer-readable medium holding computer-executable instructions for performing a method comprising the computer-implemented steps of:
providing a prosodic database holding prosodic templates for different styles of speech, each prosodic template comprising a sequence of fundamental frequencies associated with using the template's respective style of speech while speaking a sequence of language units;
determining what prosodic style to apply to a portion of speech to be generated; and
using at least one of the templates in the prosodic database for the determined prosodic style to generate the portion of speech with the determined prosodic style.
37. In a system for generating speech, a method comprising the computer-implemented steps of:
providing a prosodic database holding prosodic templates of different prosodic styles for a single speaker, each prosodic template comprising a series of fundamental frequencies associated with the single speaker using the template's associated prosodic style while speaking a sequence of language units;
determining which of the prosodic styles is to be applied to a portion of speech that is to be generated; and
using at least one of the templates in the prosodic database for the determined prosodic style to generate the portion of speech with the determined prosodic style.
38. In a system for generating speech, a computer-readable medium holding computer-executable instructions for performing a method comprising the computer-implemented steps of:
providing a prosodic database holding prosodic templates of different prosodic styles for a single speaker, each prosodic template comprising a series of fundamental frequencies associated with the single speaker using the template's associated prosodic style while speaking a sequence of language units;
determining which of the prosodic styles is to be applied to a portion of speech that is to be generated; and
using at least one of the templates in the prosodic database for the determined prosodic style to generate the portion of speech with the determined prosodic style.
Specification