Method and system for generating synthesized speech based on human recording
First Claim
1. A computer-implemented method for generating synthesized speech from input text, the method comprising:
selecting a best-matched pre-recorded utterance from a plurality of pre-recorded utterances, wherein the selecting is based, at least in part, on a degree of matching between the input text and texts associated with the plurality of pre-recorded utterances;
dividing the best-matched pre-recorded utterance into a plurality of segments comprising remaining segments that match corresponding parts of the input text and difference segments that do not match corresponding parts of the input text;
synthesizing speech for parts of the input text corresponding to the difference segments in the selected best-matched pre-recorded utterance to generate synthesized speech segments; and
splicing the synthesized speech segments of the parts of the input text corresponding to the difference segments with the remaining segments of the selected best-matched pre-recorded utterance to generate the synthesized speech for the input text.
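The selecting step depends on a "degree of matching" between the input text and the text of each pre-recorded utterance. The claim does not fix a particular measure, so the following is only a minimal sketch under stated assumptions: it scores word-level similarity with Python's standard `difflib.SequenceMatcher`, and the names `match_degree`, `select_best_match`, and the sample `database` are hypothetical, not taken from the patent.

```python
from difflib import SequenceMatcher


def match_degree(input_text, utterance_text):
    """Word-level similarity in [0, 1]; an assumed stand-in for the
    claim's unspecified 'degree of matching'."""
    a = input_text.lower().split()
    b = utterance_text.lower().split()
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def select_best_match(input_text, utterance_texts):
    """Return the utterance text with the highest match degree."""
    return max(utterance_texts, key=lambda t: match_degree(input_text, t))


# Hypothetical transcripts of pre-recorded utterances.
database = [
    "your flight to boston departs at nine",
    "the weather in boston is sunny",
    "your train to chicago departs at noon",
]

best = select_best_match("your flight to denver departs at nine", database)
print(best)
```

A real system would likely match against phonetic or prosodic annotations as well as raw words; word overlap is used here only because it is easy to inspect.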
Abstract
A method and system that combine human recordings with a text-to-speech (TTS) system to generate high-quality synthesized speech. The method searches a database of pre-recorded utterances to select the utterance that best matches the text content to be synthesized into speech; divides the best-matched utterance into a plurality of segments, yielding remaining segments that are the same as the corresponding parts of the text content and difference segments that differ from the corresponding parts of the text content; synthesizes speech for the parts of the text content corresponding to the difference segments; and splices the synthesized speech segments with the remaining segments of the best-matched utterance.
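The dividing and splicing steps can be illustrated with a word-level alignment. This is a hedged sketch, not the patented implementation: it assumes both texts are plain word sequences, uses `difflib.SequenceMatcher.get_opcodes` to find matching and non-matching spans, and returns a textual "splice plan" instead of audio. The function name `plan_splice` and the labels `"recorded"` and `"synthesized"` are hypothetical.

```python
from difflib import SequenceMatcher


def plan_splice(input_text, utterance_text):
    """Return ordered (source, words) steps that cover the input text.

    'recorded' steps reuse matching (remaining) segments of the
    pre-recorded utterance; 'synthesized' steps mark the parts of the
    input text that fall on difference segments and would be sent to
    a TTS engine.
    """
    target = input_text.split()
    recorded = utterance_text.split()
    matcher = SequenceMatcher(None, recorded, target, autojunk=False)
    plan = []
    for tag, r1, r2, t1, t2 in matcher.get_opcodes():
        if tag == "equal":
            plan.append(("recorded", recorded[r1:r2]))
        elif t2 > t1:
            # 'replace' or 'insert': input words with no recorded match.
            plan.append(("synthesized", target[t1:t2]))
        # 'delete': recorded words absent from the input are dropped.
    return plan


plan = plan_splice(
    "your flight to denver departs at nine",
    "your flight to boston departs at nine",
)
for source, words in plan:
    print(source, " ".join(words))
```

In an actual system each `"recorded"` step would map to a time-aligned slice of the recording, and the joins between recorded and synthesized audio would need smoothing; neither is modeled here.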
15 Claims
1. A computer-implemented method for generating synthesized speech from input text, the method comprising:
selecting a best-matched pre-recorded utterance from a plurality of pre-recorded utterances, wherein the selecting is based, at least in part, on a degree of matching between the input text and texts associated with the plurality of pre-recorded utterances;
dividing the best-matched pre-recorded utterance into a plurality of segments comprising remaining segments that match corresponding parts of the input text and difference segments that do not match corresponding parts of the input text;
synthesizing speech for parts of the input text corresponding to the difference segments in the selected best-matched pre-recorded utterance to generate synthesized speech segments; and
splicing the synthesized speech segments of the parts of the input text corresponding to the difference segments with the remaining segments of the selected best-matched pre-recorded utterance to generate the synthesized speech for the input text.
Dependent claims: 2, 3, 4, 5
6. A system for generating synthesized speech for input text, the system comprising:
at least one storage device comprising a plurality of pre-recorded utterances; and
at least one computer configured to:
select a best-matched pre-recorded utterance from a plurality of pre-recorded utterances, wherein the selecting is based, at least in part, on a degree of matching between the input text and texts associated with the plurality of pre-recorded utterances;
divide the best-matched pre-recorded utterance into a plurality of segments comprising remaining segments that match corresponding parts of the input text and difference segments that do not match corresponding parts of the input text;
synthesize speech for parts of the input text corresponding to the difference segments in the selected best-matched pre-recorded utterance to generate synthesized speech segments; and
splice the synthesized speech segments with the remaining segments to generate synthesized speech for the input text.
Dependent claims: 7, 8, 9, 10
11. A machine-readable program storage device tangibly embodying a program of instructions that, when executed by the machine, perform a method for generating synthesized speech from input text, the method comprising:
selecting a best-matched pre-recorded utterance from a plurality of pre-recorded utterances, wherein the selecting is based, at least in part, on a degree of matching between the input text and texts associated with the plurality of pre-recorded utterances;
dividing the best-matched pre-recorded utterance into a plurality of segments comprising remaining segments that match corresponding parts of the input text and difference segments that do not match corresponding parts of the input text;
synthesizing speech for parts of the input text corresponding to the difference segments in the selected best-matched pre-recorded utterance to generate synthesized speech segments; and
splicing the synthesized speech segments of the parts of the input text corresponding to the difference segments with the remaining segments of the selected best-matched pre-recorded utterance to generate the synthesized speech for the input text.
Dependent claims: 12, 13, 14, 15
Specification