Method and apparatus for speech synthesis based on prosodic analysis
First Claim
1. A system for synthesizing a speech signal from strings of words, comprising:
- means for entering into the system strings of characters comprising words;
a first memory, wherein predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags;
parsing means, in communication with the entering means and the first memory, for grouping syntax tags of entered words into phrases according to a first set of predetermined grammatical rules relating the syntax tags to one another and for verifying the conformance of sequences of the phrases to a second set of predetermined grammatical rules relating the phrases to one another, wherein the sequences of the phrases correspond to the entered words;
first means, in communication with the parsing means, for retrieving from the first memory the phonetic transcriptions associated with the syntax tags grouped into phrases conforming to the second set of rules, for translating predetermined strings of entered characters into words, and for generating strings of phonetic transcriptions and prosody markers corresponding to respective strings of entered and translated words;
second means, in communication with the first means, for adding markers for rhythm and stress to the strings of phonetic transcriptions and prosody markers and for converting the strings of phonetic transcriptions and prosody markers into arrays having prosody information on a diphone-by-diphone basis;
a second memory, wherein predetermined diphone waveforms are stored; and
third means, in communication with the second means and the second memory, for retrieving diphone waveforms corresponding to the entered and translated words from the second memory, for adjusting the retrieved diphone waveforms based on the prosody information in the arrays, and for concatenating the adjusted diphone waveforms to form the speech signal.
1 Assignment
0 Petitions
Accused Products
Abstract
A system for synthesizing a speech signal from strings of words, which are themselves strings of characters, includes a memory in which predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags. A parser accesses the memory and groups the syntax tags of the entered words into phrases according to a first set of predetermined grammatical rules relating the syntax tags to one another. The parser also verifies the conformance of sequences of the phrases to a second set of predetermined grammatical rules relating the phrases to one another. The system retrieves the phonetic transcriptions associated with the syntax tags that were grouped into phrases conforming to the second set of rules, and also translates predetermined strings of characters into words. The system generates strings of phonetic transcriptions and prosody markers corresponding to respective strings of the words, and adds markers for rhythm and stress to the strings, which are then converted into data arrays having prosody information on a diphone-by-diphone basis. Predetermined diphone waveforms are retrieved from memory that correspond to the entered words, and these retrieved waveforms are adjusted based on the prosody information in the arrays. The adjusted diphone waveforms, which may also be adjusted for coarticulation, are then concatenated to form the speech signal. Methods in a digital computer are also disclosed.
-
Citations
11 Claims
-
1. A system for synthesizing a speech signal from strings of words, comprising:
-
means for entering into the system strings of characters comprising words; a first memory, wherein predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags; parsing means, in communication with the entering means and the first memory, for grouping syntax tags of entered words into phrases according to a first set of predetermined grammatical rules relating the syntax tags to one another and for verifying the conformance of sequences of the phrases to a second set of predetermined grammatical rules relating the phrases to one another, wherein the sequences of the phrases correspond to the entered words; first means, in communication with the parsing means, for retrieving from the first memory the phonetic transcriptions associated with the syntax tags grouped into phrases conforming to the second set of rules, for translating predetermined strings of entered characters into words, and for generating strings of phonetic transcriptions and prosody markers corresponding to respective strings of entered and translated words; second means, in communication with the first means, for adding markers for rhythm and stress to the strings of phonetic transcriptions and prosody markers and for converting the strings of phonetic transcriptions and prosody markers into arrays having prosody information on a diphone-by-diphone basis; a second memory, wherein predetermined diphone waveforms are stored; and third means, in communication with the second means and the second memory, for retrieving diphone waveforms corresponding to the entered and translated words from the second memory, for adjusting the retrieved diphone waveforms based on the prosody information in the arrays, and for concatenating the adjusted diphone waveforms to form the speech signal. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. In a computer, a method for synthesizing a speech signal by processing natural language sentences, each sentence having at least one word, comprising the steps of:
-
entering a sentence; storing the entered sentence; finding syntax tags associated with the words of the stored entered sentence in a word dictionary; finding in a phrase table non-terminals associated with the syntax tags associated with the entered words as each word is entered; tracking, in parallel as the words are entered, a plurality of possible sequences of the found non-terminals; verifying the conformance of sequences of the found non-terminals to rules associated with predetermined sequences of non-terminals; retrieving, from the word dictionary, phonetic transcriptions associated with the syntax tags of the entered words of one of the sequences conforming to the rules; generating a string of phonetic transcriptions and prosody markers corresponding to the entered words of said one sequence; adding markers for rhythm and stress to the string of phonetic transcriptions and prosody markers and converting said string into arrays having prosody information on a diphone-by-diphone basis; retrieving, from a memory wherein predetermined diphone waveforms are stored, diphone waveforms corresponding to said string and the entered words of said one sequence; adjusting the retrieved diphone waveforms based on the prosody information in the arrays; and concatenating the adjusted diphone waveforms to form the speech signal. - View Dependent Claims (8, 9, 10, 11)
-
Specification