System for tuning synthesized speech
8 Assignments
0 Petitions
Abstract
An embodiment of the invention is a software tool that converts text, Speech Synthesis Markup Language (SSML), and/or extended SSML to synthesized audio. The tool lets users create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be supplied by way of a sample recording. Users interact with the tool through a graphical user interface (GUI). The tool can output synthesized audio files in many file formats.
25 Citations
17 Claims
1. A method of tuning synthesized speech, said method comprising:
- synthesizing user supplied text to produce synthesized speech by a text-to-speech engine;
- maintaining state information related to said synthesized speech;
- receiving a user modification of duration cost factors associated with said synthesized speech to change the duration of said synthesized speech, including modifying a search of speech units when the text is re-synthesized to favor shorter speech units in response to user marking of any speech units in the synthesized speech as too long and modifying the search of speech units to favor longer speech units in response to user marking of any speech units in the synthesized speech as too short;
- receiving a user modification of pitch cost factors associated with said synthesized speech to change the pitch of said synthesized speech;
- receiving a user indication of segments of the user supplied text and/or the synthesized speech to skip during re-synthesis of said speech;
- displaying a waveform associated with said synthesized speech and receiving user manipulations of the waveform; and
- re-synthesizing said speech based on said user supplied text, said user modified duration cost factors, said user modified pitch cost factors, said user indicated segments to skip and said user manipulations of the waveform.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
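The duration step of claim 1 can be pictured with a minimal sketch. It assumes a simplified unit-selection search that picks, for each position, the candidate with the lowest total cost. The names `Unit`, `duration_cost`, and `select_units`, the `weight` value, and the `"too_long"` / `"too_short"` mark strings are all hypothetical and do not come from the patent; join costs, skipped segments, and waveform manipulations are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    unit_id: str
    duration_ms: float
    base_cost: float  # hypothetical stand-in for every other cost term


def duration_cost(unit, mark, weight=0.01):
    """Extra cost derived from a user mark (illustrative weighting only)."""
    if mark == "too_long":
        return weight * unit.duration_ms   # longer units become more expensive
    if mark == "too_short":
        return -weight * unit.duration_ms  # longer units become cheaper
    return 0.0


def select_units(candidates, marks):
    """Pick the cheapest candidate per position, given marks by position."""
    chosen = []
    for pos, options in enumerate(candidates):
        mark = marks.get(pos)
        best = min(options, key=lambda u: u.base_cost + duration_cost(u, mark))
        chosen.append(best.unit_id)
    return chosen


candidates = [[
    Unit("a1", 120.0, 1.0),
    Unit("a2", 80.0, 1.2),
    Unit("a3", 160.0, 1.3),
]]

print(select_units(candidates, {}))                # unmarked search
print(select_units(candidates, {0: "too_long"}))   # favors the shorter unit
print(select_units(candidates, {0: "too_short"}))  # favors the longer unit
```

In this toy setup the user mark does not edit the audio directly. It only shifts the cost the search sees, so the change takes effect when the text is re-synthesized.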
10. A method of tuning synthesized speech, said method comprising:
- synthesizing user supplied text to produce synthesized speech by a text-to-speech engine, said user supplied text including text, SSML or extended SSML;
- displaying a waveform associated with said synthesized speech and receiving user manipulations of the waveform;
- receiving a user modification of duration cost factors of said synthesized speech to change the duration of said synthesized speech;
- receiving a user modification of pitch cost factors of said synthesized speech to change the pitch of said synthesized speech, including modifying a search of speech units when the text is re-synthesized to favor lower pitched speech units in response to user marking of any speech units in the synthesized speech as too high pitched and modifying the search of speech units to favor higher pitched speech units in response to user marking of any speech units in the synthesized speech as too low pitched;
- receiving a user indication of segments of the user supplied text and/or the synthesized speech to skip during re-synthesis of said speech;
- receiving a user indication of speech units to retain during re-synthesis of said speech; and
- re-synthesizing said speech based on said user supplied text, said user modified duration cost factors, said user modified pitch cost factors, said user indicated segments to skip and said user manipulations of the waveform.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
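The pitch and retain steps of claim 10 can be sketched in the same spirit. This is a minimal illustration under stated assumptions, not the patented implementation: candidates are hypothetical `(unit_id, pitch_hz, base_cost)` tuples, the `"too_high"` / `"too_low"` marks and the `weight` value are invented for the example, and a retained unit is modeled simply as a position pinned to a previously chosen unit.

```python
def pitch_cost(pitch_hz, mark, weight=0.01):
    """Extra cost derived from a user pitch mark (illustrative weighting only)."""
    if mark == "too_high":
        return weight * pitch_hz   # higher pitched units become more expensive
    if mark == "too_low":
        return -weight * pitch_hz  # higher pitched units become cheaper
    return 0.0


def reselect(candidates, marks, retained):
    """Re-run a per-position search, keeping any units the user pinned.

    candidates: list (per position) of (unit_id, pitch_hz, base_cost) tuples
    marks:      dict mapping position -> "too_high" or "too_low"
    retained:   dict mapping position -> unit_id to keep unchanged
    """
    result = []
    for pos, options in enumerate(candidates):
        if pos in retained:
            result.append(retained[pos])  # user asked to retain this unit
            continue
        mark = marks.get(pos)
        best = min(options, key=lambda o: o[2] + pitch_cost(o[1], mark))
        result.append(best[0])
    return result


candidates = [
    [("b1", 220.0, 1.0), ("b2", 180.0, 1.1)],
    [("c1", 200.0, 1.0), ("c2", 240.0, 1.2)],
]
marks = {0: "too_high", 1: "too_low"}

print(reselect(candidates, {}, {}))            # unmarked search
print(reselect(candidates, marks, {}))         # marks shift both positions
print(reselect(candidates, marks, {1: "c1"}))  # position 1 is retained
```

Pinning a position before the search runs is one simple way to honor a retain request; a real engine with join costs would also need to account for the pinned unit when scoring its neighbors.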
Specification