System and method for blending synthetic voices

US 7,966,186 B2
Filed: 11/04/2008
Issued: 06/21/2011
Est. Priority Date: 01/08/2004
Status: Active Grant

First Claim

1. A tangible computer-readable medium storing instructions for controlling a computing device to generate a synthetic voice, the instructions comprising:

receiving a user selection of a first text-to-speech voice and a selected voice characteristic for modifying the first text-to-speech voice;

selecting the first text-to-speech voice from a plurality of text-to-speech voices;

selecting a second text-to-speech voice exhibiting the selected voice characteristic; and

presenting the user with a new text-to-speech voice comprising the first text-to-speech voice modified with at least the selected voice characteristic from the second text-to-speech voice.
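The selection steps of claim 1 can be pictured as a small routine over a voice catalog. The following is a purely illustrative sketch. It assumes an in-memory catalog in which each voice is tagged with the characteristics it exhibits. The names `TTSVoice`, `BlendedVoice`, `create_blended_voice` and the voice names are hypothetical and do not come from the patent. The sketch only records which voice supplies which characteristic. The actual modification of the first voice is not shown; one candidate is interpolating segment parameters.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TTSVoice:
    """Hypothetical catalog entry: a voice and the characteristics it exhibits."""
    name: str
    characteristics: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class BlendedVoice:
    """Records which voice is modified and which voice supplies the characteristic."""
    base: TTSVoice
    donor: TTSVoice
    characteristic: str


def create_blended_voice(catalog, first_name, characteristic):
    """Walk through the selection steps of claim 1 over an in-memory catalog."""
    # Select the first text-to-speech voice from the plurality of voices.
    first = next((v for v in catalog if v.name == first_name), None)
    if first is None:
        raise KeyError(f"unknown voice: {first_name}")
    # Select a second voice that exhibits the selected characteristic.
    donor = next(
        (v for v in catalog if v is not first and characteristic in v.characteristics),
        None,
    )
    if donor is None:
        raise LookupError(f"no other voice exhibits {characteristic!r}")
    # The new voice is the first voice modified with the donor's characteristic;
    # the signal-level modification itself is out of scope for this sketch.
    return BlendedVoice(base=first, donor=donor, characteristic=characteristic)
```

In the claim's terms, `catalog` stands in for the plurality of text-to-speech voices. The returned `BlendedVoice` corresponds to the new voice presented to the user.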

17 Assignments

0 Petitions

Abstract

A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mispronunciations and emotion.
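As a rough illustration of the interpolation the abstract describes, the following sketch blends the segment parameters of several voices using normalized weights. It makes three assumptions. Each voice's parameters are already segmented and aligned phone by phone for the same utterance. Only three of the listed parameters are covered: pitch, phone duration and volume. A plain weighted average is used. `SegmentParams` and `blend_segments` are hypothetical names, and the patent does not prescribe this weighting scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentParams:
    """Hypothetical per-phone prosodic parameters of one voice."""
    pitch_hz: float
    duration_ms: float
    volume_db: float


def blend_segments(voices, weights):
    """Interpolate aligned segment parameters of several voices.

    voices:  one list of SegmentParams per voice, assumed aligned phone by phone.
    weights: non-negative blend weights, one per voice; normalized to sum to 1.
    """
    if not voices or len(voices) != len(weights):
        raise ValueError("need one weight per voice")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    norm = [w / total for w in weights]
    n = len(voices[0])
    if any(len(v) != n for v in voices):
        raise ValueError("voices must have the same number of segments")
    blended = []
    for i in range(n):
        segs = [v[i] for v in voices]
        # Weighted average of each parameter across all voices for segment i.
        blended.append(SegmentParams(
            pitch_hz=sum(w * s.pitch_hz for w, s in zip(norm, segs)),
            duration_ms=sum(w * s.duration_ms for w, s in zip(norm, segs)),
            volume_db=sum(w * s.volume_db for w, s in zip(norm, segs)),
        ))
    return blended
```

With two voices and weights `[1, 1]`, each blended value is the midpoint of the two inputs. Linear interpolation of pitch in hertz is the simplest choice. Interpolating on a logarithmic frequency scale is a common alternative.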

29 Citations

21 Claims

1. A tangible computer-readable medium storing instructions for controlling a computing device to generate a synthetic voice, the instructions comprising:
- receiving a user selection of a first text-to-speech voice and a selected voice characteristic for modifying the first text-to-speech voice;
  
  selecting the first text-to-speech voice from a plurality of text-to-speech voices;
  
  selecting a second text-to-speech voice exhibiting the selected voice characteristic; and
  
  presenting the user with a new text-to-speech voice comprising the first text-to-speech voice modified with at least the selected voice characteristic from the second text-to-speech voice.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The tangible computer-readable medium of claim 1, the instructions further comprising:
    - presenting the new text-to-speech voice to the user for preview;
      
      receiving user-selected adjustments; and
      
      presenting a revised text-to-speech voice to the user for preview according to the user-selected adjustments.
  - 3. The tangible computer-readable medium of claim 2, wherein the segment parameters relate to prosodic characteristics.
  - 4. The tangible computer-readable medium of claim 3, wherein the prosodic characteristics are selected from a group comprising pitch contour, spectral envelope, volume contour and phone durations.
  - 5. The tangible computer-readable medium of claim 4, wherein the prosodic characteristics are further selected from a group comprising:
    - syllable accent, language accent and emotion.
  - 6. The tangible computer-readable medium of claim 1, wherein generating the new text-to-speech voice further comprises interpolating between corresponding segment parameters of the first text-to-speech voice and the second text-to-speech voice.
  - 7. The tangible computer-readable medium of claim 1, wherein the new text-to-speech voice is generated by extracting a prosodic characteristic from a Linear-Predictive Coding residual of the first text-to-speech voice and the Linear-Predictive Coding residual of the second text-to-speech voice and interpolating between the extracted prosodic characteristics.
  - 8. The tangible computer-readable medium of claim 7, wherein the prosodic characteristic is pitch and wherein the interpolation of the extracted pitches from the first text-to-speech voice and the second text-to-speech voice generates a new blended pitch.
  - 9. The tangible computer-readable medium of claim 1, wherein the first text-to-speech voice is blended with a plurality of other text-to-speech voices to generate the new text-to-speech voice.
  - 10. The tangible computer-readable medium of claim 1, wherein the voice characteristic relates to mis-pronunciations.
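Claims 7 and 8 describe extracting pitch from the Linear-Predictive Coding (LPC) residual of each voice and interpolating the extracted pitches. The following is a minimal sketch of that idea, built on these assumptions:

- one analysis frame per voice;
- autocorrelation-method LPC solved with the Levinson-Durbin recursion;
- a plain autocorrelation peak picker applied to the residual.

The function names, LPC order, sample rate and the 60 to 400 Hz search range are illustrative assumptions, not details taken from the patent. A practical system would analyze short frames and blend pitch contours frame by frame.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc_coefficients(x, order):
    """Levinson-Durbin: return [1, a1, ..., a_order] with A(z) = 1 + sum_j a_j z^-j."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction-error energy, shrinks at every order
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for order i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_residual(x, a):
    """Inverse-filter x with A(z): e[n] = x[n] + sum_j a_j * x[n - j]."""
    order = len(a) - 1
    return [
        x[n] + sum(a[j] * x[n - j] for j in range(1, order + 1) if n - j >= 0)
        for n in range(len(x))
    ]


def pitch_from_residual(residual, fs, fmin=60.0, fmax=400.0):
    """Return the pitch in Hz at the residual's autocorrelation peak in [fmin, fmax]."""
    lo = max(1, int(fs / fmax))
    hi = min(int(fs / fmin), len(residual) - 1)
    r = autocorrelation(residual, hi)
    best_lag = max(range(lo, hi + 1), key=lambda lag: r[lag])
    return fs / best_lag


def blended_pitch(x1, x2, fs, alpha=0.5, order=10):
    """Pitch of each voice from its LPC residual, then linear interpolation.

    alpha = 0 keeps the first voice's pitch; alpha = 1 takes the second voice's.
    """
    f1 = pitch_from_residual(lpc_residual(x1, lpc_coefficients(x1, order)), fs)
    f2 = pitch_from_residual(lpc_residual(x2, lpc_coefficients(x2, order)), fs)
    return (1.0 - alpha) * f1 + alpha * f2
```

Inverse filtering with the LPC polynomial removes most of the vocal-tract resonance. The residual is therefore close to the excitation, and its autocorrelation peaks at the pitch period.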

11. A method of generating a synthetic voice, the method comprising:
- receiving a user selection of a first text-to-speech voice and a selected voice characteristic for modifying the first text-to-speech voice;
  
  selecting the first text-to-speech voice from a plurality of text-to-speech voices;
  
  selecting a second text-to-speech voice exhibiting the selected voice characteristic; and
  
  presenting the user with a new text-to-speech voice comprising the first text-to-speech voice modified with at least the selected voice characteristic from the second text-to-speech voice.
- Dependent Claims (12, 13, 14, 15, 16)
- - 12. The method of claim 11, wherein the first text-to-speech voice exhibiting the selected voice characteristic is generated by blending the first text-to-speech voice with the second text-to-speech voice.
  - 13. The method of claim 12, wherein the second text-to-speech voice includes the selected voice characteristic.
  - 14. The method of claim 13, wherein the new text-to-speech voice is generated to exhibit the selected voice characteristic by blending the first text-to-speech voice with at least the second text-to-speech voice.
  - 15. The method of claim 11, further comprising:
    - presenting the new text-to-speech voice to the user for preview;
      
      receiving user-selected adjustments associated with the selected voice characteristic; and
      
      presenting a revised text-to-speech voice to the user for preview according to the user-selected adjustments to the selected voice characteristic.
  - 16. The method of claim 11, wherein the voice characteristic relates to mispronunciations.
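The preview-and-adjust steps of claim 15 amount to a feedback loop over how strongly the selected characteristic is applied. The following sketch rests on two assumptions. First, the characteristic can be summarized by one numeric parameter, for instance a mean pitch. Second, each user-selected adjustment changes a single blend weight. `BlendSession`, its fields and the clamping of the weight to [0, 1] are hypothetical. `preview` returns the blended value rather than synthesizing audio.

```python
from dataclasses import dataclass


@dataclass
class BlendSession:
    """Hypothetical preview/adjust session for one selected voice characteristic."""
    base_value: float   # parameter value of the first voice, e.g. mean pitch in Hz
    donor_value: float  # same parameter measured on the second voice
    weight: float = 0.5  # share of the donor characteristic in the new voice

    def preview(self):
        # Stand-in for synthesizing a preview: report the blended parameter value.
        return (1.0 - self.weight) * self.base_value + self.weight * self.donor_value

    def adjust(self, delta):
        # Apply a user-selected adjustment, clamped so the result stays a blend,
        # then return the revised preview.
        self.weight = min(1.0, max(0.0, self.weight + delta))
        return self.preview()
```

Clamping keeps every revised voice between the two source voices. An unclamped weight would extrapolate beyond either voice, which the claim neither requires nor rules out.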

17. A system for generating a synthetic voice, the system comprising:
- a first module configured to control a processor to receive a user selection of a first text-to-speech voice and a selected voice characteristic for modifying the first text-to-speech voice;
  
  a second module configured to control the processor to select the first text-to-speech voice from a plurality of text-to-speech voices;
  
  a third module configured to control the processor to select a second text-to-speech voice exhibiting the selected voice characteristic;
  
  a fourth module configured to control the processor to present the user with a new text-to-speech voice comprising the first text-to-speech voice modified with the selected voice characteristic from the second text-to-speech voice.
- Dependent Claims (18, 19, 20, 21)
- - 18. The system of claim 17, the system further comprising:
    - a fifth module configured to control the processor to present the new text-to-speech voice to the user for preview;
      
      a sixth module configured to control the processor to receive user-selected adjustments associated with a selected voice characteristic; and
      
      a seventh module configured to control the processor to present a second new text-to-speech voice to the user for preview according to the user-selected adjustments of the selected voice characteristic.
  - 19. The system of claim 18, wherein each voice of the plurality of text-to-speech voices has speaker-specific parameters.
  - 20. The system of claim 19, wherein the speaker-specific parameters comprise at least prosodic parameters associated with each text-to-speech voice.
  - 21. The system of claim 20, wherein the speaker-specific parameters further comprise speaker-specific pronunciations.
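Claims 19 to 21 attach speaker-specific parameters, including pronunciations, to each voice. Claims 10 and 16 name mispronunciations as a selectable characteristic. One way to picture this is a per-voice pronunciation lexicon whose entries can be copied from a donor voice onto the first voice. This sketch uses assumed data structures. `VoiceProfile`, `transfer_pronunciations` and the ARPAbet-style phone strings are illustrative, not the patent's representation.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceProfile:
    """Hypothetical voice record with speaker-specific parameters."""
    name: str
    prosody: dict = field(default_factory=dict)         # e.g. {"pitch_hz": 120.0}
    pronunciations: dict = field(default_factory=dict)  # word -> phone string


def transfer_pronunciations(base, donor, words=None):
    """Return a new profile: the base voice with the donor's pronunciations.

    If words is given, only those donor entries are copied; otherwise all are.
    Neither input profile is modified.
    """
    if words is None:
        selected = dict(donor.pronunciations)
    else:
        selected = {w: donor.pronunciations[w] for w in words if w in donor.pronunciations}
    # Donor entries override the base lexicon; other base entries are kept.
    merged = {**base.pronunciations, **selected}
    return VoiceProfile(
        name=f"{base.name}+{donor.name}",
        prosody=dict(base.prosody),
        pronunciations=merged,
    )
```

The base voice keeps its own prosodic parameters here. Combining this with a prosody blend would cover both kinds of speaker-specific parameters named in claims 20 and 21.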

Current Assignee
Interactions, LLC
Original Assignee
AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors
Schroeter, Juergen, Kapilow, David A., Rosen, Kenneth H.
Primary Examiner(s)
Chawan; Vijay B

Application Number

US12/264,622
Publication Number

US 20090063153A1
Time in Patent Office

959 Days
Field of Search

704/258, 704/260, 704/270-275, 704/261, 704/267, 704/268, 704/269
US Class Current

704/269
CPC Class Codes

G10L 13/033 Voice editing, e.g. manipul...
