Method and system for generating synthesized speech based on human recording
Abstract
A method and system that incorporates human recording with a TTS system to generate synthesized speech with high quality by searching over a database of pre-recorded utterances to select an utterance best matching text content to be synthesized into speech; dividing the best-matched utterance into a plurality of segments to generate remaining segments that are the same as corresponding parts of the text content and difference segments that are different from corresponding parts of the text content; synthesizing speech for the parts of the text content corresponding to the difference segments; and splicing the synthesized speech segments with the remaining segments of the best-matched utterance.
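The search step can be illustrated with a minimal sketch. It assumes that "best matching" means the smallest word-level edit distance between the input text and each stored utterance's transcript; the text above does not fix a particular metric, so this choice, the function names `edit_distance` and `select_best_match`, and the sample database are all illustrative assumptions.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the tokens of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def select_best_match(text, utterances):
    """Return the transcript closest to `text` (assumed metric: edit distance)."""
    target = text.lower().split()
    return min(utterances,
               key=lambda u: edit_distance(target, u.lower().split()))


# Hypothetical transcripts standing in for the database of pre-recorded utterances.
database = [
    "the flight to boston departs at nine",
    "your account balance is fifty dollars",
    "the train to chicago departs at noon",
]
best = select_best_match("the flight to denver departs at nine", database)
```

Here `best` is the first transcript, which differs from the input by a single word. A production system would more likely index the database than scan it linearly.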
16 Claims
1. A computer-implemented method for generating synthesized speech, comprising the steps of:
searching over a database that contains pre-recorded utterances to select a best-matched pre-recorded utterance that best matches text content to be synthesized into speech;
dividing the best-matched pre-recorded utterance into a plurality of segments comprising remaining segments that are the same as corresponding parts of the text content and difference segments that are different from corresponding parts of the text content;
synthesizing speech for the parts of the text content corresponding to the difference segments to generate synthesized speech segments; and
splicing the synthesized speech segments of the parts of the text content corresponding to the difference segments with the remaining segments of the selected pre-recorded utterance.
Dependent claims: 2, 3, 4, 5
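The dividing and splicing steps of claim 1 can be sketched at the transcript level. This is an illustration under stated assumptions, not the claimed implementation: it aligns words with Python's `difflib.SequenceMatcher`, and it represents audio segments as labeled word lists rather than waveforms. The names `plan_segments` and `splice` are hypothetical.

```python
from difflib import SequenceMatcher


def plan_segments(text, utterance):
    """Split the target text into spans reused from the recording
    ("remaining" segments) and spans that need synthesis ("difference" segments)."""
    target = text.split()
    recorded = utterance.split()
    matcher = SequenceMatcher(a=recorded, b=target, autojunk=False)
    plan = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same words in both: reuse the human recording.
            plan.append(("recorded", recorded[i1:i2]))
        elif j2 > j1:
            # 'replace' or 'insert': target words with no matching recording.
            plan.append(("synthesized", target[j1:j2]))
        # 'delete': recorded words absent from the target are dropped.
    return plan


def splice(plan):
    """Join the planned segments in order; a real system would concatenate audio."""
    return " ".join(f"[{source}: {' '.join(words)}]" for source, words in plan)


plan = plan_segments("the flight to denver departs at nine",
                     "the flight to boston departs at nine")
output = splice(plan)
```

In this example only the word "denver" is marked for synthesis, and the words on either side of it are taken from the recording. Working on audio would also require time boundaries for each recorded word and smoothing at the joins, which this sketch omits.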
6. A system for generating synthesized speech, comprising:
a speech database for storing pre-recorded utterances;
a text input device for inputting text content to be synthesized into speech;
a searching means for searching over the speech database to select a best-matched pre-recorded utterance that best matches the inputted text content;
a speech splicing means for dividing the best-matched pre-recorded utterance into a plurality of segments to generate remaining segments that are the same as corresponding parts of the text content and difference segments that are different from corresponding parts of the text content;
synthesizing speech for parts of the inputted text content corresponding to the difference segments to generate synthesized speech segments; and
splicing the synthesized speech segments with the remaining segments to generate synthesized speech; and
a speech output device for outputting the synthesized speech corresponding to the inputted text content.
Dependent claims: 7, 8, 9, 10, 11
12. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine for implementing a method for generating synthesized speech, wherein the method comprises the steps of:
searching over a database that contains pre-recorded utterances to select a best-matched pre-recorded utterance that best matches text content to be synthesized into speech;
dividing the best-matched pre-recorded utterance into a plurality of segments comprising remaining segments that are the same as corresponding parts of the text content and difference segments that are different from corresponding parts of the text content;
synthesizing speech for the parts of the text content corresponding to the difference segments to generate synthesized speech segments; and
splicing the synthesized speech segments of the parts of the text content corresponding to the difference segments with the remaining segments of the selected pre-recorded utterance.
Dependent claims: 13, 14, 15, 16
Specification