System for tuning synthesized speech

US 8,438,032 B2
Filed: 01/09/2007
Issued: 05/07/2013
Est. Priority Date: 01/09/2007
Status: Active Grant

Abstract

An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
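
As an illustration of the markup the abstract refers to, the following minimal sketch builds an SSML string with Python's standard `xml.etree.ElementTree`. The `speak` and `prosody` elements and the `pitch` and `rate` attributes come from the W3C SSML specification. The `paralinguistic` element and the `build_ssml` helper are hypothetical stand-ins for the "extended SSML" named in the abstract; the listing does not define that syntax.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, pitch=None, rate=None, paralinguistic=None):
    """Wrap text in SSML, optionally with prosody hints and a trailing event."""
    speak = ET.Element("speak", {"version": "1.0"})
    parent = speak
    # Standard SSML: <prosody> carries pitch and rate hints for its content.
    attrs = {}
    if pitch is not None:
        attrs["pitch"] = pitch
    if rate is not None:
        attrs["rate"] = rate
    if attrs:
        parent = ET.SubElement(speak, "prosody", attrs)
    parent.text = text
    # Hypothetical extension element; not part of standard SSML.
    if paralinguistic is not None:
        ET.SubElement(parent, "paralinguistic", {"type": paralinguistic})
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("I am sorry.", pitch="+10%", rate="slow", paralinguistic="sigh")
print(ssml)
```

A real engine would define its own extension vocabulary, so the element name here should be read only as a placeholder.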

17 Claims

1. A method of tuning synthesized speech, said method comprising:
- synthesizing user supplied text to produce synthesized speech by a text-to-speech engine;
- maintaining state information related to said synthesized speech;
- receiving a user modification of duration cost factors associated with said synthesized speech to change the duration of said synthesized speech, including modifying a search of speech units when the text is re-synthesized to favor shorter speech units in response to user marking of any speech units in the synthesized speech as too long and modifying the search of speech units to favor longer speech units in response to user marking of any speech units in the synthesized speech as too short;
- receiving a user modification of pitch cost factors associated with said synthesized speech to change the pitch of said synthesized speech;
- receiving a user indication of segments of the user supplied text and/or the synthesized speech to skip during re-synthesis of said speech;
- displaying a waveform associated with said synthesized speech and receiving user manipulations of the waveform; and
- re-synthesizing said speech based on said user supplied text, said user modified duration cost factors, said user modified pitch cost factors, said user indicated segments to skip and said user manipulations of the waveform.
  - 2. The method in accordance with claim 1, further comprising:
    - highlighting, in response to a user input, a portion of a graphical representation of said synthesized speech.
  - 3. The method in accordance with claim 2, wherein highlighting further includes receiving a user selection of the highlighted portion to convert said synthesized speech to a SSML representation.
  - 4. The method in accordance with claim 3, further comprising:
    - adding a paralinguistic as SSML codes to said user supplied text.
  - 5. The method in accordance with claim 4, wherein said paralinguistic is at least one of the following:
    - i) a breath;
    - ii) a cough;
    - iii) a laugh;
    - iv) a sigh;
    - v) a throat clear; or
    - vi) a sniffle.
  - 6. The method in accordance with claim 3, further comprising:
    - adding a speaking style as SSML codes to said user supplied text.
  - 7. The method in accordance with claim 6, wherein said speaking style is apologetic.
  - 8. The method in accordance with claim 6, further comprising:
    - receiving a sample recording from said user to provide prosody.
  - 9. The method in accordance with claim 1, further comprising receiving a user indication of segments of the text that are to be used during re-synthesis of said speech.
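
The duration element of claim 1 (favoring shorter units when a unit is marked too long, longer units when marked too short) can be pictured as a bias term added to a unit-selection cost. The sketch below is only an illustration under stated assumptions: the `Unit` type, the `base_cost` field, the `weight` value, and the mark strings are all hypothetical, and the join costs and lattice search a real concatenative engine would use are omitted. The listing does not disclose the actual cost function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    unit_id: str
    duration_ms: float
    base_cost: float  # hypothetical cost the engine assigned before user tuning


def duration_bias(mark):
    """Map a user mark to the sign of the duration penalty."""
    # +1 penalizes long units, -1 penalizes short units, 0 leaves the search unchanged.
    return {"too_long": 1.0, "too_short": -1.0}.get(mark, 0.0)


def pick_unit(candidates, mark=None, weight=0.01):
    """Choose the lowest-cost candidate after applying the duration bias."""
    bias = duration_bias(mark)
    return min(candidates, key=lambda u: u.base_cost + weight * bias * u.duration_ms)


candidates = [
    Unit("a1", 80.0, 1.0),
    Unit("a2", 120.0, 0.9),
    Unit("a3", 160.0, 1.1),
]
print(pick_unit(candidates).unit_id)               # unbiased choice
print(pick_unit(candidates, "too_long").unit_id)   # shorter unit favored
print(pick_unit(candidates, "too_short").unit_id)  # longer unit favored
```

Keeping the bias as an additive term means the user's mark shifts the ranking without discarding the engine's original costs, which is one plausible reading of "modifying a search" rather than replacing it.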

10. A method of tuning synthesized speech, said method comprising:
- synthesizing user supplied text to produce synthesized speech by a text-to-speech engine, said user supplied text including text, SSML or extended SSML;
- displaying a waveform associated with said synthesized speech and receiving user manipulations of the waveform;
- receiving a user modification of duration cost factors of said synthesized speech to change the duration of said synthesized speech;
- receiving a user modification of pitch cost factors of said synthesized speech to change the pitch of said synthesized speech, including modifying a search of speech units when the text is re-synthesized to favor lower pitched speech units in response to user marking of any speech units in the synthesized speech as too high pitched and modifying the search of speech units to favor higher pitched speech units in response to user marking of any speech units in the synthesized speech as too low pitched;
- receiving a user indication of segments of the user supplied text and/or the synthesized speech to skip during re-synthesis of said speech;
- receiving a user indication of speech units to retain during re-synthesis of said speech; and
- re-synthesizing said speech based on said user supplied text, said user modified duration cost factors, said user modified pitch cost factors, said user indicated segments to skip and said user manipulations of the waveform.
  - 11. The method in accordance with claim 10, further comprising:
    - highlighting, in response to a user input, a portion of a graphical representation of said synthesized speech.
  - 12. The method in accordance with claim 11, wherein highlighting further includes receiving a user selection of the highlighted portion to convert said synthesized speech to a SSML representation.
  - 13. The method in accordance with claim 12, further comprising:
    - adding a paralinguistic as SSML codes to said user supplied text.
  - 14. The method in accordance with claim 13, further comprising:
    - adding a speaking style as SSML codes to said user supplied text.
  - 15. The method in accordance with claim 14, further comprising:
    - receiving a sample recording from said user to provide prosody.
  - 16. The method in accordance with claim 15, wherein said waveform is a pitch contour of said synthesized speech.
  - 17. The method in accordance with claim 10, further comprising receiving a user indication of segments of the text, SSML or extended SSML that are to be used during re-synthesis of said speech.
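
Claim 10 combines three kinds of user input at re-synthesis time: pitch marks that bias the unit search, segments to skip, and units to retain. A minimal sketch of how those inputs might interact is given below. Everything named here (`Candidate`, `resynthesize`, the mark strings, the `weight` value, per-slot independent selection) is an assumption made for illustration, not the patented implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    unit_id: str
    pitch_hz: float
    base_cost: float  # hypothetical cost before user tuning


def resynthesize(slots, marks=None, retained=None, skipped=None, weight=0.01):
    """Return the chosen unit id per slot, honoring skip, retain, and pitch marks."""
    marks = marks or {}
    retained = retained or {}
    skipped = skipped or set()
    chosen = []
    for index, candidates in enumerate(slots):
        if index in skipped:
            continue  # user asked to leave this segment out
        if index in retained:
            chosen.append(retained[index])  # user asked to keep this exact unit
            continue
        # "too_high" penalizes high pitch, "too_low" penalizes low pitch.
        bias = {"too_high": 1.0, "too_low": -1.0}.get(marks.get(index), 0.0)
        best = min(candidates, key=lambda c: c.base_cost + weight * bias * c.pitch_hz)
        chosen.append(best.unit_id)
    return chosen


slots = [
    [Candidate("h1", 100.0, 1.0), Candidate("h2", 140.0, 0.9)],
    [Candidate("e1", 110.0, 0.5), Candidate("e2", 150.0, 0.6)],
    [Candidate("l1", 120.0, 0.7), Candidate("l2", 90.0, 0.8)],
]
print(resynthesize(slots))
print(resynthesize(slots, marks={0: "too_high"}, retained={1: "e2"}, skipped={2}))
```

In this sketch a retained unit bypasses the search entirely, while a pitch mark only reweights it; a real engine would also need to account for join costs between neighboring units.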

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Bakis, Raimo; Eide, Ellen M.; Pieraccini, Roberto; Smith, Maria E.; Zeng, Jie
Primary Examiner(s): Godbold, Douglas
Application Number: US11/621,347
Publication Number: US 20080167875A1
Time in Patent Office: 2,310 Days
Field of Search: 704/258, 704/260
US Class Current: 704/266
CPC Class Codes: G10L 13/033 Voice editing, e.g. manipul...; G10L 13/08 Text analysis or generation...
