Systems and methods for multi-style speech synthesis
First Claim
1. A speech synthesis method, comprising:
using at least one computer hardware processor to perform:
obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech;
identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style;
synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and
outputting the synthesized speech.
Abstract
Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech.
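The segment-identification step summarized above can be illustrated with a minimal sketch. It is not the patent's implementation: the similarity table, the `Segment` fields, the cost formula, and the `style_weight` parameter are all hypothetical choices made for illustration. The sketch only shows how a segment recorded in a different speaking style might be chosen when it is sufficiently similar to the desired style; concatenating the segments and outputting audio are omitted.

```python
from dataclasses import dataclass

# Hypothetical pairwise style similarities in [0, 1]; the source does not
# specify how the measure of similarity is computed.
STYLE_SIMILARITY = {
    frozenset(["neutral", "cheerful"]): 0.6,
    frozenset(["neutral", "sad"]): 0.5,
    frozenset(["cheerful", "sad"]): 0.1,
}


def style_similarity(a: str, b: str) -> float:
    """Return the assumed similarity between two speaking styles."""
    if a == b:
        return 1.0
    return STYLE_SIMILARITY.get(frozenset([a, b]), 0.0)


@dataclass(frozen=True)
class Segment:
    unit: str           # phonetic unit this segment realizes
    style: str          # style in which it was recorded or synthesized
    target_cost: float  # assumed non-style mismatch cost (lower is better)


def select_segments(units, desired_style, inventory, style_weight=1.0):
    """Pick one segment per unit, penalizing dissimilar speaking styles."""
    selected = []
    for unit in units:
        candidates = [s for s in inventory if s.unit == unit]
        if not candidates:
            raise LookupError(f"no segment available for unit {unit!r}")
        best = min(
            candidates,
            key=lambda s: s.target_cost
            + style_weight * (1.0 - style_similarity(desired_style, s.style)),
        )
        selected.append(best)
    return selected


if __name__ == "__main__":
    inventory = [
        Segment("a", "cheerful", 0.2),
        Segment("a", "neutral", 0.1),
        Segment("b", "neutral", 0.3),  # no cheerful recording exists for "b"
        Segment("b", "sad", 0.1),
    ]
    chosen = select_segments(["a", "b"], "cheerful", inventory)
    print([s.style for s in chosen])  # ['cheerful', 'neutral']
```

In this toy inventory, no "cheerful" segment exists for unit "b", so the neutral segment is selected over the sad one because neutral is assumed to be closer to the desired style, even though the sad segment has the lower target cost.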
20 Claims
1. A speech synthesis method, comprising:
using at least one computer hardware processor to perform:
obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech;
identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style;
synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and
outputting the synthesized speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system, comprising:
at least one computer hardware processor; and
at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform:
obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech;
identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style;
synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and
outputting the synthesized speech.
Dependent claims: 12, 13, 14, 15
16. At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform:
obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech;
identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style;
synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and
outputting the synthesized speech.
Dependent claims: 17, 18, 19, 20
Specification