SYSTEM FOR TUNING SYNTHESIZED SPEECH
Abstract
An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
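As an illustration of the SSML input mentioned in the abstract, the following minimal sketch builds a small SSML fragment that requests a pitch and rate change through the standard `prosody` element. The function name `build_ssml` and the chosen attribute values are hypothetical; the "extended SSML" referred to in the abstract is not defined in this section and is not shown here.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, pitch="+10%", rate="slow"):
    """Wrap plain text in a standard SSML <prosody> element (illustrative only)."""
    speak = ET.Element("speak", {"version": "1.0"})
    prosody = ET.SubElement(speak, "prosody", {"pitch": pitch, "rate": rate})
    prosody.text = text
    # Return the markup as a string, as it might be typed into a text field.
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello world")
print(ssml)
```

A string like this could be entered into the text field in place of plain text; how the described tool interprets it is not specified here.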
17 Claims
1. A method of tuning synthesized speech, said method comprising:
entering a plurality of user supplied text into a text field;
clicking a graphical user interface button to send said plurality of user supplied text to a text-to-speech engine;
synthesizing said plurality of user supplied text to produce a plurality of speech by way of said text-to-speech engine;
maintaining state information related to said plurality of speech;
allowing a user to modify a plurality of duration cost factors associated with said plurality of speech to change the duration of said plurality of speech;
allowing said user to modify a plurality of pitch cost factors associated with said plurality of speech to change the pitch of said plurality of speech;
allowing said user to indicate a plurality of speech units to skip during re-synthesis of said plurality of user supplied text; and
re-synthesizing said plurality of speech based on said plurality of user supplied text, said user modified said plurality of duration cost factors, said user modified said plurality of pitch cost factors, and said user effectuated modifications.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method of tuning synthesized speech, said method comprising:
entering a plurality of user supplied text into a text field, said plurality of user supplied text can be text, SSML, and or extended SSML;
synthesizing said plurality of user supplied text to produce a plurality of speech by way of a text-to-speech engine;
allowing a user to interact with said plurality of speech by viewing said plurality of speech, replaying said plurality of speech, and manipulating a waveform associated with said plurality of speech;
allowing said user to modify a plurality of duration cost factors of said plurality of speech to change the duration of said plurality of speech;
allowing said user to modify a plurality of pitch cost factors of said plurality of speech to change the pitch of said plurality of speech;
allowing said user to indicate a plurality of speech units to skip during re-synthesis of said plurality of speech;
allowing said user to indicate a plurality of speech units to retain during re-synthesis of said plurality of speech;
allowing said user to provide prosody by providing a sample recording; and
re-synthesizing said plurality of speech based on said plurality of user supplied text, said user modified said plurality of duration cost factors, said user modified said plurality of pitch cost factors, and said user effectuated modifications.
Dependent claims: 12, 13, 14, 15, 16, 17
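The re-synthesis steps recited in the claims can be pictured with a small sketch. It assumes, purely for illustration, that the duration and pitch cost factors act as weights on a per-unit target cost in a unit-selection synthesizer, and that skipped and retained units are tracked by identifier. The claims do not define the cost function; the names `Unit`, `TuningState`, `target_cost`, and `resynthesize` are hypothetical, and join costs between adjacent units are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Unit:
    unit_id: str
    duration_ms: float
    pitch_hz: float


@dataclass
class TuningState:
    """State kept between synthesis passes (all fields are assumptions)."""
    duration_weight: float = 1.0                 # duration cost factor
    pitch_weight: float = 1.0                    # pitch cost factor
    skip: set = field(default_factory=set)       # unit ids to avoid
    retain: dict = field(default_factory=dict)   # position -> pinned unit id


def target_cost(unit, target, state):
    """Weighted distance between a candidate unit and a (duration, pitch) target."""
    target_duration, target_pitch = target
    return (state.duration_weight * abs(unit.duration_ms - target_duration)
            + state.pitch_weight * abs(unit.pitch_hz - target_pitch))


def resynthesize(candidates, targets, state):
    """Pick one unit id per position, honoring retained and skipped units."""
    chosen = []
    for pos, (cands, target) in enumerate(zip(candidates, targets)):
        if pos in state.retain:
            # A retained unit is pinned regardless of its cost.
            unit = next(u for u in cands if u.unit_id == state.retain[pos])
        else:
            allowed = [u for u in cands if u.unit_id not in state.skip]
            unit = min(allowed, key=lambda u: target_cost(u, target, state))
        chosen.append(unit.unit_id)
    return chosen


candidates = [
    [Unit("a1", 80, 120), Unit("a2", 100, 200)],
    [Unit("b1", 60, 110), Unit("b2", 90, 130)],
]
targets = [(100, 120), (60, 130)]  # (duration in ms, pitch in Hz) per position

state = TuningState()
first_pass = resynthesize(candidates, targets, state)

state.duration_weight = 5.0        # user raises the duration cost factor
after_duration = resynthesize(candidates, targets, state)

state.skip.add("b1")               # user marks a unit to skip
after_skip = resynthesize(candidates, targets, state)

state.retain[0] = "a1"             # user marks a unit to retain
after_retain = resynthesize(candidates, targets, state)
```

In this toy data, raising the duration weight moves the first position from `a1` to `a2`, skipping `b1` forces `b2` at the second position, and retaining `a1` pins it again even though its cost is higher. A real engine would also weigh how well adjacent units join.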
Specification