Systems and methods for multi-style speech synthesis
Abstract
Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a first speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identified plurality of speech segments comprising a first speech segment having the first speaking style and a second speech segment having a second speaking style different from the first speaking style; and rendering the text as speech having the first speaking style, at least in part, by using the identified plurality of speech segments.
20 Claims
1. A speech synthesis method, comprising:
using at least one computer hardware processor to perform:
  obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech;
  identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising:
    identifying a first speech segment recorded and/or synthesized in a first speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; and
    identifying a second speech segment recorded and/or synthesized in a second speaking style different from the first speaking style based at least in part on a measure of similarity between the desired speaking style and the second speaking style;
  synthesizing speech from the text in the desired speaking style, at least in part, by using the first speech segment and the second speech segment; and
  outputting the synthesized speech via at least one physical device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system, comprising:
at least one computer hardware processor;
at least one physical device for outputting sound; and
at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform:
  obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech;
  identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising:
    identifying a first speech segment recorded and/or synthesized in a first speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; and
    identifying a second speech segment recorded and/or synthesized in a second speaking style different from the first speaking style based at least in part on a measure of similarity between the desired speaking style and the second speaking style;
  synthesizing speech from the text in the desired speaking style, at least in part, by using the first speech segment and the second speech segment; and
  outputting the synthesized speech via the at least one physical device.
Dependent claims: 10, 11, 12, 13, 14.
15. At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform:
  obtaining input comprising text and an identification of a desired speaking style to use in synthesizing the text as speech;
  identifying a plurality of speech segments for use in synthesizing the text as speech, the identifying comprising:
    identifying a first speech segment recorded and/or synthesized in a first speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; and
    identifying a second speech segment recorded and/or synthesized in a second speaking style different from the first speaking style based at least in part on a measure of similarity between the desired speaking style and the second speaking style;
  synthesizing speech from the text in the desired speaking style, at least in part, by using the first speech segment and the second speech segment; and
  outputting the synthesized speech via the at least one physical device.
Dependent claims: 16, 17, 18, 19, 20.
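The identifying step shared by the independent claims can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the claims require only "a measure of similarity" between speaking styles and do not say how it is computed, so the style names, the similarity table, the word-level units, and the names Segment, style_similarity, identify_segments and synthesize are all hypothetical. The sketch also omits the target and join costs that a practical concatenative synthesizer would normally weigh alongside style.

```python
from dataclasses import dataclass

# Hypothetical pairwise style similarities in [0, 1]; the values are made up.
STYLE_SIMILARITY = {
    ("neutral", "neutral"): 1.0,
    ("cheerful", "cheerful"): 1.0,
    ("apologetic", "apologetic"): 1.0,
    ("neutral", "cheerful"): 0.6,
    ("neutral", "apologetic"): 0.4,
    ("cheerful", "apologetic"): 0.2,
}


def style_similarity(a: str, b: str) -> float:
    """Symmetric lookup; unknown style pairs are treated as dissimilar (0.0)."""
    return STYLE_SIMILARITY.get((a, b), STYLE_SIMILARITY.get((b, a), 0.0))


@dataclass(frozen=True)
class Segment:
    unit: str      # text unit covered (a whole word here, for simplicity)
    style: str     # speaking style the segment was recorded or synthesized in
    audio_id: str  # stand-in for the actual waveform data


def identify_segments(text: str, desired_style: str, inventory: list) -> list:
    """For each unit, pick the candidate whose style is most similar to the desired style."""
    chosen = []
    for unit in text.lower().split():
        candidates = [s for s in inventory if s.unit == unit]
        if not candidates:
            raise LookupError(f"no segment covers unit {unit!r}")
        chosen.append(
            max(candidates, key=lambda s: style_similarity(desired_style, s.style))
        )
    return chosen


def synthesize(text: str, desired_style: str, inventory: list) -> list:
    """Return the ordered audio ids; real concatenation and playback are omitted."""
    return [s.audio_id for s in identify_segments(text, desired_style, inventory)]


# Hypothetical inventory: "morning" was never recorded in the cheerful style.
INVENTORY = [
    Segment("good", "neutral", "good_n"),
    Segment("good", "cheerful", "good_c"),
    Segment("morning", "neutral", "morning_n"),
    Segment("morning", "apologetic", "morning_a"),
]

result = synthesize("Good morning", "cheerful", INVENTORY)
```

With this inventory, a request for the cheerful style takes "good" from the cheerful recordings and "morning" from the neutral ones, because neutral is the closer of the two available styles under the assumed table. The output therefore mixes a first and a second speaking style, which is the situation the claims describe.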
Specification