Interactive debugging and tuning method for CTTS voice building

US 7,487,092 B2
Filed: 10/17/2003
Issued: 02/03/2009
Est. Priority Date: 10/17/2003
Status: Active Grant

Abstract

A method, a system, and an apparatus for identifying and correcting sources of problems in synthesized speech which is generated using a concatenative text-to-speech (CTTS) technique. The method can include the step of displaying a waveform corresponding to synthesized speech generated from concatenated phonetic units. The synthesized speech can be generated from text input received from a user. The method further can include the step of displaying parameters corresponding to at least one of the phonetic units. The method can include the step of displaying the original recordings containing selected phonetic units. An editing input can be received from the user and the parameters can be adjusted in accordance with the editing input.
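The link between the displayed waveform and the per-unit parameters described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the `PlacedUnit` model, its field names, the recording file names, and the millisecond timings are hypothetical and do not come from the patent. The sketch lays concatenated units end to end and looks up which units fall under a user-selected span of the synthesized waveform, so that their parameters and source recordings could then be shown.

```python
from dataclasses import dataclass


@dataclass
class PlacedUnit:
    """One concatenated phonetic unit (hypothetical model, not from the patent)."""
    label: str            # phonetic unit label, e.g. "EH"
    recording: str        # original recording the unit was cut from
    src_start_ms: int     # unit boundaries inside the original recording
    src_end_ms: int
    out_start_ms: int = 0  # offset inside the synthesized waveform

    @property
    def duration_ms(self) -> int:
        return self.src_end_ms - self.src_start_ms


def place(units):
    """Lay units end to end and record each unit's offset in the output."""
    offset = 0
    for unit in units:
        unit.out_start_ms = offset
        offset += unit.duration_ms
    return units


def units_in_selection(units, sel_start_ms, sel_end_ms):
    """Return the units whose output span overlaps the selected time range."""
    return [
        u for u in units
        if u.out_start_ms < sel_end_ms
        and u.out_start_ms + u.duration_ms > sel_start_ms
    ]


# Invented example data: four units drawn from two recordings.
utterance = place([
    PlacedUnit("HH", "rec_012.wav", 1200, 1280),
    PlacedUnit("EH", "rec_012.wav", 1280, 1400),
    PlacedUnit("L", "rec_047.wav", 550, 630),
    PlacedUnit("OW", "rec_047.wav", 630, 850),
])

# The user drags a selection from 100 ms to 250 ms on the synthesized waveform.
selected = units_in_selection(utterance, 100, 250)
for u in selected:
    print(u.label, u.recording, u.src_start_ms, u.src_end_ms)
```

Because each unit keeps its source recording and boundaries, the same lookup result could drive both the parameter display and the highlighting of the unit in the original recording.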

7 Claims

1. A computer-implemented method for debugging and tuning synthesized audio, comprising the steps of:
  (a) receiving a user-supplied text with a visual user interface;
  
  (b) generating synthesized audio generated from concatenated phonetic units, the synthesized audio being a voice rendering of the user-supplied text;
  
  (c) displaying a waveform corresponding to the synthesized audio generated from concatenated phonetic units;
  
  (d) displaying parameters corresponding to at least one of the phonetic units, the parameters including configuration parameters comprising at least one weight for adjusting at least one search cost function, the at least one weight comprising at least one of a pitch cost weight and a duration cost weight;
  
  (e) displaying an original recording containing a selected phonetic unit;
  
  (f) receiving an editing input from the user;
  
  (g) adjusting at least one configuration parameter in accordance with the editing input and storing the at least one configuration parameter in a text-to-speech engine configuration file, wherein adjusting includes repositioning a phonetic alignment marker;
  
  (h) highlighting in the display of the original recording at least one user-selected phonetic unit;
  
  (i) correcting elements of a text-to-speech segment dataset of parameters corresponding to a segment of the synthesized audio identified as being problematic;
  
  (j) generating a new synthesized waveform corresponding to one or more adjusted parameters; and
  
  (k) repeating steps (b)-(j) until a desired synthesized output is generated.

2. The method of claim 1, wherein said displaying parameters step further comprises displaying the parameters responsive to a user selection of at least a portion of the waveform, the displayed parameters correlating to the selected portion of the waveform.

3. The method of claim 1, wherein said displaying parameters step further comprises identifying a portion of the waveform responsive to a user selection of at least one of the parameters, the identified portion of the waveform correlating to the selected parameters.

4. The method of claim 1, wherein said adjusting step comprises at least one action selected from the group consisting of deleting a pitch mark, inserting a pitch mark, and repositioning a pitch mark by deleting a phonetic unit label, adding a phonetic unit label, modifying the phonetic unit label, and repositioning the phonetic unit boundaries.

5. The method of claim 1, wherein said displaying parameters step further comprises the step of displaying a waveform from the original recording along with the phonetic unit.

6. The method of claim 5, wherein edits to the waveform adjust parameters in the segment dataset.

7. The method of claim 1 wherein the parameter updates and segment dataset corrections are applied in regenerating the synthesized audio.
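Steps (d), (g), and (j) of claim 1 refer to weights on a search cost function, a text-to-speech engine configuration file, and regeneration with adjusted parameters. The sketch below shows one way such a loop might look. Everything specific in it is an assumption made for illustration: the relative-error cost form, the JSON file format, the key names `pitch_cost_weight` and `duration_cost_weight`, and the candidate data are hypothetical, since the claims do not specify them.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate phonetic unit from the voice database (invented values)."""
    unit_id: str
    pitch_hz: float
    duration_ms: float


def search_cost(c, target_pitch_hz, target_duration_ms, weights):
    """Weighted sum of relative pitch and duration mismatch (assumed cost form)."""
    pitch_cost = abs(c.pitch_hz - target_pitch_hz) / target_pitch_hz
    duration_cost = abs(c.duration_ms - target_duration_ms) / target_duration_ms
    return (weights["pitch_cost_weight"] * pitch_cost
            + weights["duration_cost_weight"] * duration_cost)


def select_unit(candidates, target_pitch_hz, target_duration_ms, weights):
    """Pick the candidate with the lowest weighted search cost."""
    return min(
        candidates,
        key=lambda c: search_cost(c, target_pitch_hz, target_duration_ms, weights),
    )


def save_weights(path, weights):
    """Store the weights in a configuration file (JSON is an assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(weights, f, indent=2)


def load_weights(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


candidates = [
    Candidate("ae_017", pitch_hz=118.0, duration_ms=140.0),  # good pitch, too long
    Candidate("ae_203", pitch_hz=150.0, duration_ms=92.0),   # off pitch, good length
]

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "tts_engine_config.json")  # hypothetical name
    weights = {"pitch_cost_weight": 1.0, "duration_cost_weight": 1.0}
    before = select_unit(candidates, 120.0, 90.0, weights)

    # Editing input: the user hears a pitch problem and raises the pitch weight.
    weights["pitch_cost_weight"] = 5.0
    save_weights(config_path, weights)

    # Regeneration reads the stored weights and repeats the unit search.
    reloaded = load_weights(config_path)
    after = select_unit(candidates, 120.0, 90.0, reloaded)

print(before.unit_id, after.unit_id)  # prints: ae_203 ae_017
```

With equal weights the duration mismatch dominates and the off-pitch unit wins. After the pitch weight is raised, the search prefers the unit whose pitch is closer to the target, which is the kind of change a user would then check by listening to the regenerated audio.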

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: International Business Machines Corporation
Inventors: Viswanathan, Mahesh; Zeng, Jie Z.; Gleason, Philip; Smith, Maria E.
Primary Examiner(s): Chawan, Vijay B
Application Number: US10/688,041
Publication Number: US 20050086060A1
Time in Patent Office: 1,936 Days
Field of Search: 704/260, 704/258, 704/270.1
US Class Current: 704/260
CPC Class Codes: G10L 13/033 Voice editing, e.g. manipul...
