Method and apparatus for generating synthetic speech with contrastive stress
Abstract
Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
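The second aspect described in the abstract, in which a software module picks audio recordings that render some portions with contrastive stress, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Portion` type, the `INVENTORY` mapping, the file names, and the assumption that each portion has a pre-recorded neutral rendition and, where needed, a stressed one.

```python
from typing import NamedTuple


class Portion(NamedTuple):
    text: str
    stressed: bool  # True if this portion should carry contrastive stress


# Hypothetical recording inventory: (text, stressed) -> audio file name.
INVENTORY = {
    ("Flight", False): "flight_neutral.wav",
    ("123", False): "123_neutral.wav",
    ("123", True): "123_stressed.wav",
    ("departs at 9 AM", False): "departs_9am_neutral.wav",
}


def select_recordings(portions, inventory):
    """Return one recording per portion, in order, matching its stress flag.

    Raises KeyError if the inventory has no recording for a portion.
    """
    return [inventory[(p.text, p.stressed)] for p in portions]


prompt = [
    Portion("Flight", False),
    Portion("123", True),  # the portion to be rendered with contrastive stress
    Portion("departs at 9 AM", False),
]
playlist = select_recordings(prompt, INVENTORY)
```

In this sketch the application would then concatenate or play the files in `playlist` to produce the audio speech output. How stressed recordings are actually stored and selected is not specified in this section.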
20 Claims
1. A method for use with a speech-enabled application, the method comprising:
receiving, from the speech-enabled application, input comprising a plurality of text strings;
identifying a first portion of a first text string of the plurality of text strings as differing from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string as not differing from a corresponding second portion of the second text string;
assigning contrastive stress to the first portion of the first text string and/or to the corresponding first portion of the second text string, but not to the second portion of the first text string, and not to the corresponding second portion of the second text string;
generating, using at least one computer system, speech synthesis output to render the plurality of text strings as speech having the assigned contrastive stress; and
providing the speech synthesis output for the speech-enabled application.
Dependent claims: 2, 3, 4, 5, 6, 7
8. At least one non-transitory computer-readable storage medium encoded with a plurality of computer-executable instructions that, when executed, perform a method for use with a speech-enabled application, the method comprising:
receiving, from the speech-enabled application, input comprising a plurality of text strings;
identifying a first portion of a first text string of the plurality of text strings as differing from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string as not differing from a corresponding second portion of the second text string;
assigning contrastive stress to the first portion of the first text string and/or to the corresponding first portion of the second text string, but not to the second portion of the first text string, and not to the corresponding second portion of the second text string;
generating speech synthesis output to render the plurality of text strings as speech having the assigned contrastive stress; and
providing the speech synthesis output for the speech-enabled application.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A method for generating speech output via a speech-enabled application, the method comprising:
generating, using at least one computer system executing the speech-enabled application, a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output;
inputting the plurality of text strings to at least one software module configured to identify a first portion of a first text string of the plurality of text strings as differing from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string as not differing from a corresponding second portion of the second text string;
receiving, from the at least one software module, speech synthesis output to render the plurality of text strings with contrastive stress assigned to the first portion of the first text string and/or to the corresponding first portion of the second text string, but not to the second portion of the first text string, and not to the corresponding second portion of the second text string; and
generating, using the speech synthesis output, an audio speech output corresponding to the desired speech output.
Dependent claims: 16, 17, 18, 19, 20
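The identifying and assigning steps shared by the independent claims can be sketched as a word-level comparison of two text strings, followed by marking only the differing portions for stress. This is an illustrative assumption, not the claimed implementation: the claims do not say how portions are aligned, and the use of Python's `difflib`, word-level tokens, and simplified SSML `<emphasis>` markup as the stress annotation are all choices made for this example. The function names `mark_contrast` and `to_ssml` are hypothetical.

```python
from difflib import SequenceMatcher
from xml.sax.saxutils import escape


def mark_contrast(first, second):
    """Flag each word of both strings: True if it differs from its counterpart.

    Alignment uses difflib (an assumption of this sketch); matching words are
    left unstressed, and all other words are marked for contrastive stress.
    """
    a, b = first.split(), second.split()
    marked_a, marked_b = [], []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        differs = tag != "equal"
        marked_a.extend((word, differs) for word in a[i1:i2])
        marked_b.extend((word, differs) for word in b[j1:j2])
    return marked_a, marked_b


def to_ssml(marked):
    """Render flagged words as simplified SSML, emphasizing the stressed ones."""
    parts = []
    for word, stressed in marked:
        text = escape(word)
        if stressed:
            text = f'<emphasis level="strong">{text}</emphasis>'
        parts.append(text)
    return "<speak>" + " ".join(parts) + "</speak>"


first_marked, second_marked = mark_contrast(
    "Flight 123 departs at 9 AM",
    "Flight 456 departs at 9 AM",
)
first_ssml = to_ssml(first_marked)
second_ssml = to_ssml(second_marked)
```

Here only "123" and "456" are wrapped in emphasis markup, while "Flight" and "departs at 9 AM" are left unmarked in both strings. A synthesizer that accepts SSML could then render the marked words with added prominence; whether the patented system uses markup of this kind is not stated in this section.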
Specification