SYSTEM FOR TUNING SYNTHESIZED SPEECH

US 20080167875A1
Filed: 01/09/2007
Published: 07/10/2008
Est. Priority Date: 01/09/2007
Status: Active Grant

8 Assignments

0 Petitions

Abstract

An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. The tool provides facilities to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be supplied by way of a sample recording. Users interact with the software tool through a graphical user interface (GUI). The software tool can write the synthesized audio to files in many formats.
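The abstract says the tool accepts text, SSML, or extended SSML, and the claims add paralinguistics and speaking styles "as SSML codes". As a minimal sketch, assuming the W3C SSML 1.0 namespace, pitch and speaking-rate edits can be expressed with the standard `<prosody>` attributes (`pitch`, `rate`), while speaking style and paralinguistic events need extension markup. The `ext:` elements, their namespace URN, the event names, and `to_ssml` itself are hypothetical placeholders, because the patent does not publish its extended SSML vocabulary.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Hypothetical namespace and element names for "extended SSML"; the patent
# does not define the extension vocabulary.
EXT_NS = "urn:example:extended-ssml"
PARALINGUISTIC_EVENTS = {"breath", "cough", "laugh", "sigh", "throat-clear", "sniffle"}


def to_ssml(text, pitch=None, rate=None, style=None, event=None, lang="en-US"):
    """Wrap plain text in an SSML 1.0 <speak> document.

    pitch and rate map onto the standard <prosody> attributes (e.g. "-10%", "slow");
    style and event use the hypothetical ext: elements declared above.
    """
    body = escape(text)  # escape &, <, > in the user-supplied text
    prosody = ""
    if pitch is not None:
        prosody += f" pitch={quoteattr(pitch)}"
    if rate is not None:
        prosody += f" rate={quoteattr(rate)}"
    if prosody:
        body = f"<prosody{prosody}>{body}</prosody>"
    if event is not None:
        if event not in PARALINGUISTIC_EVENTS:
            raise ValueError(f"unknown paralinguistic event: {event!r}")
        # Insert the (hypothetical) paralinguistic event before the text.
        body = f"<ext:paralinguistic type={quoteattr(event)}/>{body}"
    if style is not None:
        # Wrap everything in the (hypothetical) speaking-style element.
        body = f"<ext:style type={quoteattr(style)}>{body}</ext:style>"
    return (f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:ext="{EXT_NS}" '
            f"xml:lang={quoteattr(lang)}>{body}</speak>")


print(to_ssml("I am sorry for the delay.", pitch="-10%", rate="slow",
              style="apologetic", event="sigh"))
```

The call at the end prints one `<speak>` document in which an apologetic style element wraps a sigh event followed by the lowered, slowed text; a real engine would need to recognise whatever extension elements it actually supports.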

17 Claims

1. A method of tuning synthesized speech, said method comprising:
   - entering a plurality of user supplied text into a text field;
   - clicking a graphical user interface button to send said plurality of user supplied text to a text-to-speech engine;
   - synthesizing said plurality of user supplied text to produce a plurality of speech by way of said text-to-speech engine;
   - maintaining state information related to said plurality of speech;
   - allowing a user to modify a plurality of duration cost factors associated with said plurality of speech to change the duration of said plurality of speech;
   - allowing said user to modify a plurality of pitch cost factors associated with said plurality of speech to change the pitch of said plurality of speech;
   - allowing said user to indicate a plurality of speech units to skip during re-synthesis of said plurality of user supplied text; and
   - re-synthesizing said plurality of speech based on said plurality of user supplied text, said user modified said plurality of duration cost factors, said user modified said plurality of pitch cost factors, and said user effectuated modifications.

2. The method in accordance with claim 1, further comprising:
   - allowing said user to interact with said plurality of speech by viewing said plurality of speech, replaying said plurality of speech, and manipulating a waveform associated with said plurality of speech.

3. The method in accordance with claim 1, further comprising:
   - allowing said user to highlight a portion of a graphical representation of said plurality of speech.

4. The method in accordance with claim 3, wherein allowing said user to highlight in claim 3 further includes allowing said user to click on the highlighted portion to convert said plurality of speech to a SSML representation.

5. The method in accordance with claim 4, further comprising:
   - adding a paralinguistic as SSML codes to said plurality of user supplied text.

6. The method in accordance with claim 5, wherein said paralinguistic is at least one of the following:
   i) a breath;
   ii) a cough;
   iii) a laugh;
   iv) a sigh;
   v) a throat clear;
   or vi) a sniffle.

7. The method in accordance with claim 4, further comprising:
   - adding a speaking style as SSML codes to said plurality of user supplied text.

8. The method in accordance with claim 5, further comprising:
   - adding a speaking style as SSML codes to said plurality of user supplied text.

9. The method in accordance with claim 8, wherein said speaking style is apologetic.

10. The method in accordance with claim 8, further comprising:
   - allowing said user to provide prosody by providing a sample recording.
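Claim 1 does not say how the user-modified duration and pitch cost factors enter re-synthesis. A minimal sketch, assuming a concatenative unit-selection synthesizer that picks one recorded unit per target position with a Viterbi search, treats them as weights on the duration and pitch terms of the target cost and drops any unit the user marked to skip. The optional `keep` argument sketches the "retain" marking that claim 11 adds. `Unit`, `Target`, `join_cost`, `select_units`, the scalar weights, and the relative-difference cost terms are all hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    uid: str         # hypothetical id of a recorded unit in the voice database
    duration: float  # seconds
    pitch: float     # mean F0 in Hz


@dataclass(frozen=True)
class Target:
    duration: float  # desired duration in seconds
    pitch: float     # desired mean F0 in Hz


def join_cost(a, b):
    # Simplified concatenation cost: relative jump in mean F0 between neighbours.
    return abs(a.pitch - b.pitch) / max(a.pitch, b.pitch)


def select_units(targets, candidates, w_dur=1.0, w_pitch=1.0, w_join=1.0,
                 skip=frozenset(), keep=None):
    """Viterbi search for the cheapest unit sequence (one unit per target).

    w_dur, w_pitch: user-adjustable weights ("cost factors") on the target cost.
    skip: unit ids the user marked to exclude during re-synthesis.
    keep: optional {position: unit id} the user marked to retain.
    """
    keep = keep or {}
    layers = []  # per position: list of (accumulated cost, back-pointer, unit)
    for i, (tgt, cands) in enumerate(zip(targets, candidates)):
        if i in keep:
            allowed = [u for u in cands if u.uid == keep[i]]
        else:
            allowed = [u for u in cands if u.uid not in skip]
        if not allowed:
            raise ValueError(f"no usable candidate at position {i}")
        layer = []
        for u in allowed:
            target_cost = (w_dur * abs(u.duration - tgt.duration) / tgt.duration
                           + w_pitch * abs(u.pitch - tgt.pitch) / tgt.pitch)
            if not layers:
                layer.append((target_cost, None, u))
            else:
                # Best predecessor: its accumulated cost plus the weighted join cost.
                best, back = min((c + w_join * join_cost(p, u), idx)
                                 for idx, (c, _, p) in enumerate(layers[-1]))
                layer.append((best + target_cost, back, u))
        layers.append(layer)
    if not layers:
        return []
    # Backtrack from the cheapest state at the final position.
    j = min(range(len(layers[-1])), key=lambda k: layers[-1][k][0])
    path = []
    for layer in reversed(layers):
        _, back, u = layer[j]
        path.append(u)
        j = back
    return path[::-1]
```

Raising `w_pitch` relative to `w_dur` makes the search prefer units whose pitch is close to the target even when their duration is further off; a per-segment list of weights could replace the scalars if each stretch of speech needs its own cost factors.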

11. A method of tuning synthesized speech, said method comprising:
   - entering a plurality of user supplied text into a text field, said plurality of user supplied text can be text, SSML, and or extended SSML;
   - synthesizing said plurality of user supplied text to produce a plurality of speech by way of a text-to-speech engine;
   - allowing a user to interact with said plurality of speech by viewing said plurality of speech, replaying said plurality of speech, and manipulating a waveform associated with said plurality of speech;
   - allowing said user to modify a plurality of duration cost factors of said plurality of speech to change the duration of said plurality of speech;
   - allowing said user to modify a plurality of pitch cost factors of said plurality of speech to change the pitch of said plurality of speech;
   - allowing said user to indicate a plurality of speech units to skip during re-synthesis of said plurality of speech;
   - allowing said user to indicate a plurality of speech units to retain during re-synthesis of said plurality of speech;
   - allowing said user to provide prosody by providing a sample recording; and
   - re-synthesizing said plurality of speech based on said plurality of user supplied text, said user modified said plurality of duration cost factors, said user modified said plurality of pitch cost factors, and said user effectuated modifications.

12. The method in accordance with claim 11, further comprising:
   - allowing said user to highlight a portion of a graphical representation of said plurality of speech.

13. The method in accordance with claim 11, wherein allowing said user to highlight in claim 12 further includes allowing said user to click on the highlighted portion to convert said plurality of speech to a SSML representation.

14. The method in accordance with claim 13, further comprising:
   - adding a paralinguistic as SSML codes to said plurality of user supplied text.

15. The method in accordance with claim 14, further comprising:
   - adding a speaking style as SSML codes to said plurality of user supplied text.

16. The method in accordance with claim 15, further comprising:
   - allowing said user to provide prosody by providing a sample recording.

17. The method in accordance with claim 16, wherein said waveform is a pitch contour of said plurality of speech.
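Claims 10, 11, and 16 let the user supply prosody through a sample recording, and claim 17 identifies the manipulated waveform as a pitch contour. The patent does not say which pitch tracker is used; one common approach is frame-wise autocorrelation, sketched below under these assumptions: mono float samples, 40 ms frames with a 10 ms hop, a 60 to 400 Hz search range, and a 0.3 voicing threshold. `frame_pitch` and `pitch_contour` are hypothetical names, and the defaults are illustrative rather than tuned.

```python
import math


def frame_pitch(frame, sample_rate, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Estimate F0 (Hz) of one frame from its normalized autocorrelation.

    Returns None when no lag in the search range exceeds the voicing threshold.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None  # silent frame
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = None, voicing_threshold
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag is not None else None


def pitch_contour(samples, sample_rate, frame_sec=0.04, hop_sec=0.01, **kwargs):
    """Return one F0 estimate (or None for unvoiced/silent) per analysis frame."""
    n = int(round(frame_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))
    return [frame_pitch(samples[s:s + n], sample_rate, **kwargs)
            for s in range(0, len(samples) - n + 1, hop)]


if __name__ == "__main__":
    sr = 8000
    # A quarter second of a 200 Hz tone stands in for a sample recording.
    tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(2000)]
    print(pitch_contour(tone, sr)[:3])  # estimates close to 200 Hz
```

A contour extracted this way could serve as the pitch targets for re-synthesis; production trackers usually add octave-error checks and smoothing that this sketch omits.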

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: International Business Machines Corporation
Inventors: Pieraccini, Roberto; Eide, Ellen M.; Bakis, Raimo; Smith, Maria E.; Zeng, Jie

Granted Patent: US 8,438,032 B2
US Class Current: 704/258
CPC Class Codes:
- G10L 13/033 Voice editing, e.g. manipul...
- G10L 13/08 Text analysis or generation...
