Personalized voice playback for screen reader

US 7,865,365 B2
Filed: 08/05/2004
Issued: 01/04/2011
Est. Priority Date: 08/05/2004
Status: Active Grant

8 Assignments

0 Petitions

Abstract

A method, system, and computer program product are disclosed for customizing a synthesized voice based upon audible input voice data. The input voice data is typically in the form of one or more predetermined paragraphs being read into a voice recorder. The input voice data is then analyzed for adjustable voice characteristics to determine basic voice qualities (e.g., pitch, breathiness, tone, speed, and the variability of any of these qualities) and to identify any “specialized speech patterns”. Based upon this analysis, the characteristics of the voice used to read text appearing on the screen are modified to resemble the input voice data. This allows a user of the system to easily and automatically create a voice that is familiar to the user.
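
As a rough illustration of the analysis step the abstract describes, the sketch below summarizes per-frame measurements of a voice sample into basic voice qualities and their variability. It assumes that pitch, volume, and speaking rate have already been measured by some upstream audio front end. The names `Frame` and `summarize_sample`, and the choice of standard deviation as the measure of variability, are illustrative assumptions and are not taken from the patent.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Frame:
    """One hypothetical measurement window of the recorded input voice."""
    pitch_hz: float
    volume_db: float
    words_per_min: float


def summarize_sample(frames):
    """Return the mean and variation of each measured quality.

    Variation is taken to be the population standard deviation,
    which is an assumption made for this sketch.
    """
    if not frames:
        raise ValueError("need at least one frame")
    summary = {}
    for quality in ("pitch_hz", "volume_db", "words_per_min"):
        values = [getattr(f, quality) for f in frames]
        summary[quality] = mean(values)
        summary[quality + "_variation"] = pstdev(values)
    return summary


# Made-up measurements, for illustration only.
frames = [Frame(110.0, 60.0, 150.0), Frame(130.0, 64.0, 170.0)]
summary = summarize_sample(frames)
print(summary)
```

A summary of this kind would be one plausible input to choosing output synthesis parameter values; how the patented system actually derives those values is not specified in this record.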

15 Claims

1. A method for automatically customizing output synthesis in a text-to-speech system, the method comprising:
- creating one or more speech samples using an input voice of a human, the speech samples resulting from the human uttering a first text;
  
  analyzing said speech samples to extract a separate set of sound parameter values associated with personal speech characteristics of said input voice for each of first and second speech patterns, wherein the first and second speech patterns represent different manners of speaking in a same language, and wherein the sound parameter values relate to at least one quality selected from the group consisting of pitch, breathiness, tone, speed, volume, pitch variation, breathiness variation, tone variation, speed variation and volume variation;
  
  determining a separate set of output synthesis parameter values for each of the first and second speech patterns based, at least in part, on the respective separate set of sound parameter values of said input voice;
  
  storing each separate set of output synthesis parameter values as a separate speech profile element for the human;
  
  analyzing at least a portion of at least one second text, the at least one second text being unrelated to the first text, to select, from among the separate speech profile elements of the first and second speech patterns, a selected separate speech profile element based on how well the selected separate speech profile element corresponds to the at least a portion of the at least one second text; and
  
  applying the separate set of output synthesis parameter values of the selected separate speech profile element to synthesize the at least a portion of the at least one second text via the text-to-speech system to speech that includes at least some of the personal speech characteristics of the input voice.
- Dependent claims (2, 3, 4, 5)
  - 2. The method of claim 1, wherein the creating the one or more speech samples using the input voice includes eliciting a plurality of speech patterns of said input voice, and wherein the analyzing said speech samples includes associating sound parameters with speech patterns of the plurality of speech patterns of said input voice.
  - 3. The method of claim 1, wherein the analyzing the at least a portion of the at least one second text comprises determining that the at least a portion of the at least one second text contains one or more triggers associated with the speech pattern corresponding to the selected separate speech profile element.
  - 4. The method of claim 1, wherein at least one of the first and second speech patterns represents an emotional state.
  - 5. The method of claim 2, wherein the step of eliciting the plurality of speech patterns of said input voice comprises constructing the first text to elicit the plurality of speech patterns when the human utters the first text.
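
The selection and application steps of claim 1, together with the trigger idea of claim 3, can be sketched as follows. This is a minimal sketch under stated assumptions: the class `ProfileElement`, the functions `select_profile_element` and `synthesize`, the trigger-counting score, the fallback to a "neutral" pattern, and all parameter values are hypothetical and are not taken from the patent. The `synthesize` function only returns the parameters that would be handed to a text-to-speech engine; it does not produce audio.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileElement:
    """One stored speech profile element for a single speech pattern."""
    pattern: str
    synthesis_params: dict
    triggers: set = field(default_factory=set)


def select_profile_element(text, elements, default_pattern="neutral"):
    """Pick the element whose triggers occur most often in the text.

    Counting trigger occurrences is an assumed stand-in for "how well
    the element corresponds to the text".
    """
    lowered = text.lower()
    best, best_score = None, 0
    for element in elements:
        score = sum(lowered.count(t.lower()) for t in element.triggers)
        if score > best_score:
            best, best_score = element, score
    if best is not None:
        return best
    # No trigger matched: fall back to the default pattern (an assumption).
    for element in elements:
        if element.pattern == default_pattern:
            return element
    raise LookupError("no matching or default profile element")


def synthesize(text, elements):
    """Return the settings a TTS engine would be given (placeholder only)."""
    element = select_profile_element(text, elements)
    return {"text": text, "pattern": element.pattern, **element.synthesis_params}


# Hypothetical profile with two speech patterns for one speaker.
profile = [
    ProfileElement("neutral", {"pitch_hz": 120, "rate_wpm": 160}),
    ProfileElement("excited", {"pitch_hz": 150, "rate_wpm": 190}, {"!", "wow"}),
]
print(synthesize("Wow, we won!", profile)["pattern"])
print(synthesize("The meeting is at noon.", profile)["pattern"])
```

Keeping each pattern's parameters in its own element mirrors the claim's "separate speech profile element" wording, so the text being read only influences which element is chosen, not the stored values themselves.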

6. A system for automatically customizing output synthesis in a text-to-speech system, the system comprising at least one processor programmed to:
- create one or more speech samples using an input voice of a human, the speech samples resulting from the human uttering a first text;
  
  analyze said speech samples to extract a separate set of sound parameter values associated with personal speech characteristics of said input voice for each of first and second speech patterns, wherein the first and second speech patterns represent different manners of speaking in a same language, and wherein the sound parameter values relate to at least one quality selected from the group consisting of pitch, breathiness, tone, speed, volume, pitch variation, breathiness variation, tone variation, speed variation and volume variation;
  
  determine a separate set of output synthesis parameter values for each of the first and second speech patterns based, at least in part, on the respective separate set of sound parameter values of said input voice;
  
  store each separate set of output synthesis parameter values as a separate speech profile element for the human;
  
  analyze at least a portion of at least one second text, the at least one second text being unrelated to the first text, to select, from among the separate speech profile elements of the first and second speech patterns, a selected separate speech profile element based on how well the selected separate speech profile element corresponds to the at least a portion of the at least one second text; and
  
  apply the separate set of output synthesis parameter values of the selected separate speech profile element to synthesize the at least a portion of the at least one second text to speech that includes at least some of the personal speech characteristics of the input voice.
- Dependent claims (7, 8, 9, 10)
  - 7. The system of claim 6, wherein creating the one or more speech samples using the input voice comprises eliciting a plurality of speech patterns of said input voice, and wherein analyzing said speech samples comprises associating sound parameters with speech patterns of the plurality of speech patterns of said input voice.
  - 8. The system of claim 6, wherein analyzing the at least a portion of the at least one second text comprises determining that the at least a portion of the at least one second text contains one or more triggers associated with the speech pattern corresponding to the selected separate speech profile element.
  - 9. The system of claim 6, wherein at least one of the first and second speech patterns represents an emotional state.
  - 10. The system of claim 7, wherein eliciting the plurality of speech patterns of said input voice comprises constructing the first text to elicit the plurality of speech patterns when the human utters the first text.

11. A computer program product for automatically customizing output synthesis in a text-to-speech system, the computer program product comprising a computer-readable storage medium having computer-readable program code embodied in the medium, the computer-readable program code comprising:
- computer-readable program code that creates one or more speech samples using an input voice of a human, the speech samples resulting from the human uttering a first text;
  
  computer-readable program code that analyzes said speech samples to extract a separate set of sound parameter values associated with personal speech characteristics of said input voice for each of first and second speech patterns, wherein the first and second speech patterns represent different manners of speaking in a same language, and wherein the sound parameter values relate to at least one quality selected from the group consisting of pitch, breathiness, tone, speed, volume, pitch variation, breathiness variation, tone variation, speed variation and volume variation;
  
  computer-readable program code that determines a separate set of output synthesis parameter values for each of the first and second speech patterns based, at least in part, on the respective separate set of sound parameter values of said input voice;
  
  computer-readable program code that stores each separate set of output synthesis parameter values as a separate speech profile element for the human;
  
  computer-readable program code that analyzes at least a portion of at least one second text, the at least one second text being unrelated to the first text, to select, from among the separate speech profile elements of the first and second speech patterns, a selected separate speech profile element based on how well the selected separate speech profile element corresponds to the at least a portion of the at least one second text; and
  
  computer-readable program code that applies the separate set of output synthesis parameter values of the selected separate speech profile element to synthesize the at least a portion of the at least one second text to speech that includes at least some of the personal speech characteristics of the input voice.
- Dependent claims (12, 13, 14, 15)
  - 12. The computer program product of claim 11, wherein said computer-readable program code creating the one or more speech samples using the input voice includes computer-readable program code that elicits a plurality of speech patterns of said input voice, and wherein said computer-readable program code analyzing said speech samples includes computer-readable program code that associates sound parameters with speech patterns of the plurality of speech patterns of said input voice.
  - 13. The computer program product of claim 11, wherein said computer-readable program code analyzing the at least a portion of the at least one second text comprises computer-readable program code that determines that the at least a portion of the at least one second text contains one or more triggers associated with the speech pattern corresponding to the selected separate speech profile element.
  - 14. The computer program product of claim 11, wherein at least one of the first and second speech patterns represents an emotional state.
  - 15. The computer program product of claim 12, wherein the computer-readable program code that elicits the plurality of speech patterns comprises computer-readable program code that constructs the first text to elicit the plurality of speech patterns when the human utters the first text.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Anglin, Debbie Ann; Kline, Nyralin Novella; Anglin, Howard Neil
Primary Examiner(s): ALBERTALLI, BRIAN LOUIS
Application Number: US10/912,496
Publication Number: US 20060031073A1
Time in Patent Office: 2,343 Days
Field of Search: 704/270
US Class Current: 704/258
CPC Class Codes: G10L 17/26 Recognition of special voic...
