Realistic Speech Synthesis System
Abstract
A system and method for realistic speech synthesis that converts text into synthetic human speech with qualities appropriate to the context, such as the language and dialect of the speaker, and that expands a speaker's phonetic inventory to produce more natural-sounding speech.
32 Claims
1. A method of synthesizing speech from text, comprising the steps of:
selecting one or more scenario parameters;
inputting text parsed into corresponding phonetic components;
merging said phonetic components with breathing and non-speech effects to produce a transcript of phoneme segment strings;
producing prosody contour data from said one or more scenario parameters and said transcript of phoneme segment strings;
producing stitched filter data from said one or more scenario parameters and said transcript of phoneme segment strings;
synthesizing speech from said stitched filter data and said prosody contour data; and
outputting said synthesized speech from a playback device.
Dependent claims: 2, 3, 4, 5.
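The data flow recited in claim 1 can be sketched roughly as follows. This is only an illustration of the order of the steps and of which inputs each step consumes, not the patentee's implementation. All names (`merge_effects`, `run_pipeline`, the `<breath>` token, the position-keyed `effects` mapping) are hypothetical, and the prosody, filter, and synthesis stages are passed in as callables because the claim does not say how they are computed.

```python
def merge_effects(phonemes, effects):
    """Merge non-speech effects into a phoneme list.

    `effects` maps a position in the ORIGINAL phoneme list to an effect
    token (assumed representation, e.g. "<breath>").
    """
    transcript = list(phonemes)
    # Insert from the highest position down so that earlier insertions
    # do not shift the positions of later ones.
    for index, effect in sorted(effects.items(), reverse=True):
        transcript.insert(index, effect)
    return transcript


def run_pipeline(scenario, phonemes, effects, make_prosody, make_filters, synthesize):
    """Wire the claimed steps together; the three stages are caller-supplied."""
    transcript = merge_effects(phonemes, effects)
    # Both intermediate products depend on the scenario AND the transcript.
    prosody = make_prosody(scenario, transcript)
    filters = make_filters(scenario, transcript)
    # Synthesis consumes the stitched filter data and the prosody contour data.
    return synthesize(filters, prosody)
```

Passing the stages in as parameters keeps the sketch honest about what the claim specifies (the dependencies between steps) and what it leaves open (the models behind each step).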
6. A method for synthesizing speech from text, comprising the steps of:
providing a computer having a first database and a second database stored in the memory thereof and in which data is stored, said data in said first database representing a set of signal feature candidates representative of a single speaker, and said data in said second database representing a second set of signal feature candidates;
receiving a target set of phonetic components representative of text;
analyzing said single speaker signal feature candidates from said first database to determine whether a corresponding single speaker signal feature candidate exists for each target phonetic component;
retrieving from said second database a replacement signal feature candidate from said second set of signal feature candidates for any target phonetic component that does not have a corresponding single speaker signal feature candidate;
synthesizing speech from at least one of said corresponding single signal feature candidates and said replacement signal feature candidates.
Dependent claims: 7, 8, 9, 10.
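The lookup-with-replacement logic of claim 6 might look something like the sketch below. It assumes, purely for illustration, that each database is a dictionary keyed by phonetic component; the function name `select_candidates` and the `"speaker"`/`"replacement"` labels are hypothetical and do not come from the patent.

```python
def select_candidates(targets, speaker_db, second_db):
    """Pick one signal feature candidate per target phonetic component.

    Uses the single speaker's candidate when one exists, otherwise a
    replacement from the second database.
    """
    selected = []
    for component in targets:
        if component in speaker_db:
            selected.append((component, speaker_db[component], "speaker"))
        else:
            # Raises KeyError if the second database also lacks the component;
            # the claim does not address that case.
            selected.append((component, second_db[component], "replacement"))
    return selected


# Example: the speaker's inventory lacks "ZH", so it is taken from the second set.
speaker_db = {"AH": "spk_AH", "S": "spk_S"}
second_db = {"AH": "alt_AH", "ZH": "alt_ZH"}
result = select_candidates(["AH", "ZH", "S"], speaker_db, second_db)
```

The source label on each tuple is an addition for readability; it makes it easy to see which components came from outside the speaker's own inventory.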
11. A method for synthesizing speech from text, comprising the steps of:
providing a computer having a first database and a second database stored in the memory thereof and in which data is stored, said data in said first database representing a set of signal feature candidates representative of a single speaker, and said data in said second database representing a second set of signal feature candidates;
receiving a target set of phonetic components representative of text;
analyzing said single speaker signal feature candidates from said first database to determine whether a corresponding single speaker signal feature candidate of sufficient quality exists for each target phonetic component;
retrieving from said second database a replacement signal feature candidate from said second set of signal feature candidates for any target phonetic component that does not have a corresponding single speaker signal feature candidate of sufficient quality; and
synthesizing speech from at least one of the corresponding single speaker signal feature candidates and the replacement signal feature candidates.
Dependent claims: 12, 13, 14, 15, 16.
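Claim 11 differs from claim 6 in that the speaker's candidate must also be "of sufficient quality". One way to picture this, assuming a numeric quality score and a fixed threshold (both are assumptions of this sketch, since the claim does not define how quality is measured), is:

```python
def select_with_quality(targets, speaker_db, second_db, min_quality=0.5):
    """Prefer the speaker's candidate only when its score meets the threshold.

    Each speaker_db entry is assumed to be a (features, quality) pair;
    `min_quality` is an arbitrary illustrative threshold.
    """
    selected = []
    for component in targets:
        entry = speaker_db.get(component)
        if entry is not None and entry[1] >= min_quality:
            selected.append((component, entry[0]))
        else:
            # Missing or low-quality speaker candidate: use the replacement.
            selected.append((component, second_db[component]))
    return selected


# Example: "S" exists for the speaker but scores below the threshold.
speaker_db = {"AH": ("spk_AH", 0.9), "S": ("spk_S", 0.2)}
second_db = {"AH": "alt_AH", "S": "alt_S", "ZH": "alt_ZH"}
chosen = select_with_quality(["AH", "S", "ZH"], speaker_db, second_db)
```

Here a missing candidate and a low-quality candidate are handled by the same branch, which mirrors how the claim folds both cases into the single "of sufficient quality" test.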
17. A non-transitory computer-readable storage medium containing program code comprising:
program code for selecting one or more scenario parameters;
program code for inputting text parsed into corresponding phonetic components;
program code for merging said phonetic components with breathing and non-speech effects to produce a transcript of phoneme segment strings;
program code for producing prosody contour data from said one or more scenario parameters and said transcript of phoneme segment strings;
program code for producing stitched filter data from said one or more scenario parameters and said transcript of phoneme segment strings;
program code for synthesizing speech from said stitched filter data and said prosody contour data;
program code for outputting said synthesized speech from a playback device.
Dependent claims: 18, 19, 20, 21.
22. A non-transitory computer-readable storage medium containing program code, comprising:
program code for receiving a target set of phonetic components representative of text;
program code for analyzing a single speaker's signal feature candidates, said single speaker's signal feature candidates stored in a database, to determine whether a corresponding single speaker signal feature candidate exists for each said target phonetic component;
program code for retrieving from a second set of signal feature candidates, said second set of signal feature candidates stored in database, a replacement signal feature candidate for any target phonetic component that does not have a corresponding single speaker signal feature candidate;
program code for synthesizing speech from at least one of said corresponding single speaker signal feature candidates and said replacement signal feature candidates.
Dependent claims: 23, 24, 25, 26.
27. A non-transitory computer-readable storage medium containing program code, comprising:
program code for receiving a target set of phonetic components representative of text;
program code for analyzing a single speaker's signal feature candidates, said single speaker's signal feature candidates stored in a database, to determine whether a corresponding single speaker signal feature candidate of sufficient quality exists for each said target phonetic component;
program code for retrieving from a second set of signal feature candidates, said second set of signal feature candidates stored in database, a replacement signal feature candidate for any target phonetic component that does not have a corresponding single speaker signal feature candidate of sufficient quality;
program code for synthesizing speech from at least one of said corresponding single speaker signal feature candidates and said replacement signal feature candidates.
Dependent claims: 28, 29, 30, 31, 32.
Specification