STYLIZED PROSODY FOR SPEECH SYNTHESIS-BASED APPLICATIONS
First Claim
1. In a computing environment, a method comprising, outputting a visual representation including a set of one or more waveforms and corresponding text, and changing prosody of the speech based on interaction with the visual representation to change data corresponding to the prosody.
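Claim 1 pairs displayed waveforms with prosody data that the user can change. The sketch below is a minimal stand-in and assumes that each unit's prosody is a (duration, pitch, loudness) tuple. It renders each unit as a plain sine tone, so changing the prosody data produces a waveform that looks and sounds different. It is not a text-to-speech engine and not the claimed mechanism. `render_unit`, `render` and the 16 kHz sample rate are hypothetical.

```python
# Hypothetical sketch: turn prosody data into a displayable waveform.
import math
from typing import List, Tuple

SAMPLE_RATE = 16_000  # samples per second (assumed value)


def render_unit(duration_ms: float, pitch_hz: float, loudness_db: float,
                sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Render one prosodic unit as a sine tone (stand-in for a real synthesizer)."""
    n_samples = int(round(duration_ms * sample_rate / 1000.0))
    gain = 10.0 ** (loudness_db / 20.0)  # dB relative to full scale -> linear gain
    return [gain * math.sin(2.0 * math.pi * pitch_hz * i / sample_rate)
            for i in range(n_samples)]


def render(units: List[Tuple[float, float, float]],
           sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Concatenate (duration_ms, pitch_hz, loudness_db) units into one waveform.

    Phase restarts at each unit boundary; a real synthesizer would smooth these joins.
    """
    waveform: List[float] = []
    for duration_ms, pitch_hz, loudness_db in units:
        waveform.extend(render_unit(duration_ms, pitch_hz, loudness_db, sample_rate))
    return waveform


# Usage: two units; editing any tuple and re-rendering yields the changed waveform.
wave = render([(100.0, 120.0, -6.0), (50.0, 200.0, -12.0)])
print(len(wave))  # 2400 samples = 150 ms at 16 kHz
```

Duration sets the number of samples, pitch sets the tone frequency and loudness sets the amplitude. A change made through the interface therefore appears directly in the displayed waveform.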
Abstract
Described is a technology by which the prosody of synthesized speech may be changed by varying data associated with that speech. An interface displays a visual representation of synthesized speech as one or more waveforms, along with the corresponding text from which the speech was synthesized. The user may interact with the visual representation to change data corresponding to the prosody, e.g., to change duration, pitch and/or loudness data, with respect to a part (or all) of the speech. The part of the speech that may be varied may comprise a phoneme, a morpheme, a syllable, a word, a phrase, and/or a sentence. The changed speech can be played back to hear the change in prosody resulting from the interactive changes. The user can also change the text and hear/see newly synthesized speech, which may then be similarly edited to change data that corresponds to that speech's prosody.
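The abstract describes prosody as duration, pitch and loudness data attached to parts of the speech, such as phonemes or words. As an illustration only, the Python sketch below stores that data per phoneme and changes it for a whole word. The names (`PhonemeProsody`, `Word`, `edit_word`), the units and the sample values are hypothetical. The patent does not specify a data model.

```python
# Hypothetical sketch: per-phoneme prosody data edited at word level.
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PhonemeProsody:
    """Prosody data for one phoneme (hypothetical data model)."""
    phoneme: str
    duration_ms: float  # how long the phoneme lasts
    pitch_hz: float     # fundamental frequency (F0)
    loudness_db: float  # loudness relative to full scale


@dataclass
class Word:
    """A word is one editable part of the speech, made of phonemes."""
    text: str
    phonemes: List[PhonemeProsody]

    def total_duration_ms(self) -> float:
        return sum(p.duration_ms for p in self.phonemes)


def edit_word(word: Word, duration_scale: float = 1.0,
              pitch_scale: float = 1.0, loudness_offset_db: float = 0.0) -> Word:
    """Return a copy of `word` with the prosody of every phoneme changed.

    Duration and pitch are scaled multiplicatively; loudness is shifted in dB.
    The original `word` is left untouched so an edit can be undone.
    """
    changed = [
        replace(p,
                duration_ms=p.duration_ms * duration_scale,
                pitch_hz=p.pitch_hz * pitch_scale,
                loudness_db=p.loudness_db + loudness_offset_db)
        for p in word.phonemes
    ]
    return Word(word.text, changed)


# Usage: emphasize "hello" by making it slower, higher and louder.
hello = Word("hello", [
    PhonemeProsody("HH", 60.0, 120.0, -20.0),
    PhonemeProsody("AH", 90.0, 130.0, -18.0),
    PhonemeProsody("L", 70.0, 125.0, -19.0),
    PhonemeProsody("OW", 150.0, 110.0, -17.0),
])
emphasized = edit_word(hello, duration_scale=1.5, pitch_scale=1.2,
                       loudness_offset_db=3.0)
print(hello.total_duration_ms(), emphasized.total_duration_ms())  # 370.0 555.0
```

Keeping the unedited `Word` alongside the edited copy makes it straightforward to play back both versions and compare them. Applying the same kind of edit to a single phoneme, or to all the words of a phrase, only changes which `PhonemeProsody` entries are passed through `replace`.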
20 Claims
- 1. In a computing environment, a method comprising, outputting a visual representation including a set of one or more waveforms and corresponding text, and changing prosody of the speech based on interaction with the visual representation to change data corresponding to the prosody.
- 9. In a computing environment, a system comprising, a speech synthesis mechanism that outputs speech from text, and an interface coupled to the speech synthesis mechanism, the interface configured to output a visual representation including a set of one or more waveforms and corresponding text, and to receive input, including input that changes data corresponding to prosody of the speech.
- 17. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising: outputting a visible representation of speech and corresponding text; receiving user interaction corresponding to at least part of the speech; and changing data corresponding to prosody associated with the speech based on the user interaction.
Specification