Personalized voice playback for screen reader
Abstract
A method, system, and computer program product are disclosed for customizing a synthesized voice based upon audible input voice data. The input voice data is typically in the form of one or more predetermined paragraphs being read into a voice recorder. The input voice data is then analyzed for adjustable voice characteristics to determine basic voice qualities (e.g., pitch, breathiness, tone, speed; variability of any of these qualities, etc.) and to identify any “specialized speech patterns”. Based upon this analysis, the characteristics of the voice utilized to read text appearing on the screen are modified to resemble the input voice data. This allows a user of the system to easily and automatically create a voice that is familiar to the user.
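The analysis step in the abstract, which reduces recorded voice data to basic qualities and their variability, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the qualities are measured, so the per-frame measurement lists, the use of mean and standard deviation, and the names `Quality`, `summarize`, and `analyze_voice` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Quality:
    """One voice quality, summarized by its typical level and its variability."""
    level: float      # average of the measured values
    variation: float  # population standard deviation of the measured values


def summarize(values: list[float]) -> Quality:
    # Assumption: "variability" is modeled as standard deviation.
    return Quality(level=mean(values), variation=pstdev(values))


def analyze_voice(measurements: dict[str, list[float]]) -> dict[str, Quality]:
    """Map each quality name (e.g. "pitch", "speed") to its summary.

    `measurements` is a hypothetical input: per-frame values already
    extracted from the recording by some signal-processing front end.
    Qualities with no measurements are skipped.
    """
    return {name: summarize(vals) for name, vals in measurements.items() if vals}


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    sample = {"pitch": [100.0, 120.0, 140.0], "speed": [1.0, 1.1, 0.9]}
    for name, quality in analyze_voice(sample).items():
        print(name, quality)
```

Keeping the level and the variation side by side mirrors the abstract's pairing of each quality with the "variability of any of these qualities".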
15 Claims
1. A method for automatically customizing output synthesis in a text-to-speech system, the method comprising:
creating one or more speech samples using an input voice of a human, the speech samples resulting from the human uttering a first text;
analyzing said speech samples to extract a separate set of sound parameter values associated with personal speech characteristics of said input voice for each of first and second speech patterns, wherein the first and second speech patterns represent different manners of speaking in a same language, and wherein the sound parameter values relate to at least one quality selected from the group consisting of pitch, breathiness, tone, speed, volume, pitch variation, breathiness variation, tone variation, speed variation and volume variation;
determining a separate set of output synthesis parameter values for each of the first and second speech patterns based, at least in part, on the respective separate set of sound parameter values of said input voice;
storing each separate set of output synthesis parameter values as a separate speech profile element for the human;
analyzing at least a portion of at least one second text, the at least one second text being unrelated to the first text, to select, from among the separate speech profile elements of the first and second speech patterns, a selected separate speech profile element based on how well the selected separate speech profile element corresponds to the at least a portion of the at least one second text; and
applying the separate set of output synthesis parameter values of the selected separate speech profile element to synthesize the at least a portion of the at least one second text via the text-to-speech system to speech that includes at least some of the personal speech characteristics of the input voice.
Dependent claims: 2, 3, 4, 5.
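The storing, selecting, and applying steps of claim 1 might be sketched as follows. The claim does not state how correspondence between a profile element and the second text is measured, so the cue-counting score below is an assumption, as are the names `ProfileElement`, `score`, `select_element`, and `build_request`, the pattern labels, and every parameter value. No real text-to-speech engine is called; the sketch only assembles the values such an engine would receive.

```python
from dataclasses import dataclass


@dataclass
class ProfileElement:
    """One stored speech profile element for a single manner of speaking."""
    pattern: str                        # hypothetical label, e.g. "neutral"
    synthesis_params: dict[str, float]  # output synthesis parameter values
    cues: tuple[str, ...] = ()          # hypothetical text cues for this pattern


def score(element: ProfileElement, text: str) -> int:
    # Assumed measure of fit: how many times the element's cues occur in the text.
    lowered = text.lower()
    return sum(lowered.count(cue) for cue in element.cues)


def select_element(elements: list[ProfileElement], text: str) -> ProfileElement:
    # max() returns the first best match, so the element listed first
    # acts as the default when no cue is found in the text.
    return max(elements, key=lambda element: score(element, text))


def build_request(elements: list[ProfileElement], text: str) -> dict:
    """Pair the text with the parameters of the best-matching element."""
    chosen = select_element(elements, text)
    return {
        "text": text,
        "pattern": chosen.pattern,
        "params": dict(chosen.synthesis_params),
    }


if __name__ == "__main__":
    # Made-up profile for illustration only.
    profile = [
        ProfileElement("neutral", {"pitch": 110.0, "speed": 1.0}),
        ProfileElement("excited", {"pitch": 140.0, "speed": 1.2}, cues=("!",)),
    ]
    print(build_request(profile, "We won the game!"))
    print(build_request(profile, "The meeting is at noon."))
```

Storing one element per speech pattern keeps the selection step independent of how the parameters were derived, which matches the claim's separation of the "determining" and "analyzing" steps.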
6. A system for automatically customizing output synthesis in a text-to-speech system, the system comprising at least one processor programmed to:
create one or more speech samples using an input voice of a human, the speech samples resulting from the human uttering a first text;
analyze said speech samples to extract a separate set of sound parameter values associated with personal speech characteristics of said input voice for each of first and second speech patterns, wherein the first and second speech patterns represent different manners of speaking in a same language, and wherein the sound parameter values relate to at least one quality selected from the group consisting of pitch, breathiness, tone, speed, volume, pitch variation, breathiness variation, tone variation, speed variation and volume variation;
determine a separate set of output synthesis parameter values for each of the first and second speech patterns based, at least in part, on the respective separate set of sound parameter values of said input voice;
store each separate set of output synthesis parameter values as a separate speech profile element for the human;
analyze at least a portion of at least one second text, the at least one second text being unrelated to the first text, to select, from among the separate speech profile elements of the first and second speech patterns, a selected separate speech profile element based on how well the selected separate speech profile element corresponds to the at least a portion of the at least one second text; and
apply the separate set of output synthesis parameter values of the selected separate speech profile element to synthesize the at least a portion of the at least one second text to speech that includes at least some of the personal speech characteristics of the input voice.
Dependent claims: 7, 8, 9, 10.
11. A computer program product for automatically customizing output synthesis in a text-to-speech system, the computer program product comprising a computer-readable storage medium having computer-readable program code embodied in the medium, the computer-readable program code comprising:
computer-readable program code that creates one or more speech samples using an input voice of a human, the speech samples resulting from the human uttering a first text;
computer-readable program code that analyzes said speech samples to extract a separate set of sound parameter values associated with personal speech characteristics of said input voice for each of first and second speech patterns, wherein the first and second speech patterns represent different manners of speaking in a same language, and wherein the sound parameter values relate to at least one quality selected from the group consisting of pitch, breathiness, tone, speed, volume, pitch variation, breathiness variation, tone variation, speed variation and volume variation;
computer-readable program code that determines a separate set of output synthesis parameter values for each of the first and second speech patterns based, at least in part, on the respective separate set of sound parameter values of said input voice;
computer-readable program code that stores each separate set of output synthesis parameter values as a separate speech profile element for the human;
computer-readable program code that analyzes at least a portion of at least one second text, the at least one second text being unrelated to the first text, to select, from among the separate speech profile elements of the first and second speech patterns, a selected separate speech profile element based on how well the selected separate speech profile element corresponds to the at least a portion of the at least one second text; and
computer-readable program code that applies the separate set of output synthesis parameter values of the selected separate speech profile element to synthesize the at least a portion of the at least one second text to speech that includes at least some of the personal speech characteristics of the input voice.
Dependent claims: 12, 13, 14, 15.
Specification