METHOD AND APPARATUS FOR GENERATING SYNTHETIC SPEECH WITH CONTRASTIVE STRESS
Abstract
Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
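The second aspect of the abstract (a software module that receives text strings and identifies audio recordings, some carrying contrastive stress) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `RECORDINGS` inventory, the file names, the `select_recordings` function, and in particular the "slot" labels that the application is assumed to attach to each text string. The sketch assumes, for illustration only, that two text strings contrast when they share a slot label but differ in text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inventory: (text, carries_contrastive_stress) -> recording file.
RECORDINGS: Dict[Tuple[str, bool], str] = {
    ("Your flight departs at", False): "departs_at.wav",
    ("not", False): "not.wav",
    ("one forty-five", False): "0145_neutral.wav",
    ("one forty-five", True): "0145_stressed.wav",
    ("two forty-five", False): "0245_neutral.wav",
    ("two forty-five", True): "0245_stressed.wav",
}


def select_recordings(strings: List[Tuple[Optional[str], str]]) -> List[str]:
    """Identify one recording per text string.

    Each input item is (slot, text). The slot label is an assumption of this
    sketch: strings that share a slot but differ in text are treated as
    contrasting, and a stressed recording is chosen for each of them.
    """
    # Collect the distinct texts seen for every labeled slot.
    texts_by_slot: Dict[str, set] = {}
    for slot, text in strings:
        if slot is not None:
            texts_by_slot.setdefault(slot, set()).add(text)

    selected: List[str] = []
    for slot, text in strings:
        # Stress only when another string fills the same slot differently.
        stressed = slot is not None and len(texts_by_slot[slot]) > 1
        # Fall back to the neutral recording if no stressed one exists.
        path = RECORDINGS.get((text, stressed)) or RECORDINGS[(text, False)]
        selected.append(path)
    return selected


if __name__ == "__main__":
    prompt = [
        (None, "Your flight departs at"),
        ("time", "two forty-five"),
        (None, "not"),
        ("time", "one forty-five"),
    ]
    print(select_recordings(prompt))
```

In this sketch the application would then concatenate or play the returned recordings in order to produce the audio speech output. A real system could decide which strings contrast in other ways; the slot-label approach is only one simple possibility.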
6 Claims
1. A method for use with a speech-enabled application, the method comprising:
receiving input from the speech-enabled application comprising a plurality of text strings;
generating, using at least one computer system, speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
providing the speech synthesis output for the speech-enabled application.
2. Apparatus for use with a speech-enabled application, the apparatus comprising:
a memory storing a plurality of processor-executable instructions; and
at least one processor, operatively coupled to the memory, that executes the instructions to:
receive input from the speech-enabled application comprising a plurality of text strings;
generate speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
provide the speech synthesis output for the speech-enabled application.
3. At least one non-transitory computer-readable storage medium encoded with a plurality of computer-executable instructions that, when executed, perform a method for use with a speech-enabled application, the method comprising:
receiving input from the speech-enabled application comprising a plurality of text strings;
generating speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
providing the speech synthesis output for the speech-enabled application.
4. A method for generating speech output via a speech-enabled application, the method comprising:
generating, using at least one computer system executing the speech-enabled application, a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output;
inputting the plurality of text strings to at least one software module for rendering contrastive stress;
receiving output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
generating, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.
5. Apparatus for generating speech output via a speech-enabled application, the apparatus comprising:
a memory storing a plurality of processor-executable instructions; and
at least one processor, operatively coupled to the memory, that executes the instructions to:
generate a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output;
input the plurality of text strings to at least one software module for rendering contrastive stress;
receive output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
generate, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.
6. At least one non-transitory computer-readable storage medium encoded with a plurality of computer-executable instructions that, when executed, perform a method for generating speech output via a speech-enabled application, the method comprising:
generating a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output;
inputting the plurality of text strings to at least one software module for rendering contrastive stress;
receiving output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
generating, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.
Specification