System and method for correcting errors when generating a TTS voice

US 7,742,921 B1
Filed: 09/27/2005
Issued: 06/22/2010
Est. Priority Date: 09/27/2005
Status: Expired due to Fees

10 Assignments

0 Petitions

Abstract

Disclosed herein are various innovations associated with a toolkit used for generating a TTS voice for use in a spoken dialog system. The inventions in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method of enabling human workers to find errors when developing a text-to-speech (TTS) voice. The method comprises presenting a graphical user interface wherein after a first pass of automatic speech recognition (ASR) of a speech corpus is complete, the interface presents to a worker a graphical representation of an alignment of the ASR results, associated words and phonemes and the audio, receiving a graphical input from the worker associated with a selection of a word or phoneme and presenting the audio associated with the selected word or phoneme.
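
The alignment described in the abstract can be pictured as a small data structure in which each word holds the phonemes aligned beneath it, and each phoneme spans a time interval of the recording. The following is a minimal Python sketch under that assumption. The class names (`Phoneme`, `Word`), the field names, the `audio_slice` helper, and the toy signal are all hypothetical; the patent record does not specify a data model.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Phoneme:
    label: str         # phoneme symbol, e.g. "HH"
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # ASR confidence, assumed here to lie in [0, 1]


@dataclass
class Word:
    text: str
    phonemes: List[Phoneme] = field(default_factory=list)

    @property
    def start(self) -> float:
        # A word starts where its first phoneme starts.
        return self.phonemes[0].start

    @property
    def end(self) -> float:
        # A word ends where its last phoneme ends.
        return self.phonemes[-1].end


def audio_slice(samples: Sequence[float], sample_rate: int,
                start: float, end: float) -> Sequence[float]:
    """Return the samples that fall between start and end (in seconds)."""
    first = int(round(start * sample_rate))
    last = int(round(end * sample_rate))
    return samples[first:last]


# Hypothetical one-word alignment over a 1-second toy signal sampled at 100 Hz.
samples = list(range(100))
hello = Word("hello", [
    Phoneme("HH", 0.10, 0.20, 0.95),
    Phoneme("AH", 0.20, 0.30, 0.40),
    Phoneme("L", 0.30, 0.40, 0.90),
    Phoneme("OW", 0.40, 0.50, 0.85),
])

# Selecting the word yields the audio under all of its phonemes;
# selecting a single phoneme yields only that phoneme's span.
word_audio = audio_slice(samples, 100, hello.start, hello.end)
phoneme_audio = audio_slice(samples, 100,
                            hello.phonemes[1].start, hello.phonemes[1].end)
```

In a real interface the returned slice would be handed to an audio player when the worker clicks the word or phoneme; playback is left out here because the record does not describe it.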

42 Citations

18 Claims

1. A method of enabling human workers to find errors when developing a text-to-speech (TTS) voice, the method comprising:
- presenting via a processor a graphical user interface, wherein after a first pass of automatic speech recognition (ASR) of a speech corpus is complete, the interface presents to a worker a graphical representation of an alignment of the ASR results, associated words and phonemes and the audio;
- color-coding via the processor each word based on a composition of the color-coding associated with each phoneme;
- receiving via the processor a graphical input from the worker associated with a selection of a word or phoneme; and
- presenting via the processor the audio associated with the selected word or phoneme.
- - 2. The method of claim 1, further comprising:
    - color-coding each phoneme according to a confidence score.
  - 3. The method of claim 1, further comprising:
    - color-coding each word according to a confidence score.
  - 4. The method of claim 1, further comprising presenting only words and phonemes to the worker that have confidence scores below a certain threshold.
  - 5. The method of claim 4, further comprising:
    - receiving a selection of at least one word or phoneme from the worker; and
    - presenting a text transcription and corresponding audio to the worker for the selected word or phoneme.
  - 6. The method of claim 4, further comprising presenting a listing of transcriptions to the worker associated with the presented words and phonemes.
  - 7. The method of claim 1, further comprising presenting a spectrogram associated with the selected word or phoneme.
  - 8. The method of claim 1, further comprising:
    - receiving an indication of an ASR mistake from the worker;
    - correcting speaker dependent entries associated with the mistake; and
    - rerunning ASR on all utterances containing the word or phoneme associated with the mistake.
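
Claims 1 through 4 describe color-coding and threshold filtering without fixing a color scale or a rule for composing a word's color from its phonemes. As one possible reading, the sketch below maps each phoneme's confidence to a red-to-green RGB color, takes the channel-wise mean of the phoneme colors as the word color, and keeps only words whose mean confidence falls below a threshold. The linear red-green scale, the averaging rule, the use of mean phoneme confidence as the word score, the 0.6 threshold, and every function name are assumptions for illustration, not the claimed method.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def confidence_to_color(confidence: float) -> RGB:
    """Map a confidence in [0, 1] to a color from red (0.0) to green (1.0)."""
    c = min(max(confidence, 0.0), 1.0)  # clamp out-of-range scores
    return (round(255 * (1.0 - c)), round(255 * c), 0)


def compose_word_color(phoneme_colors: List[RGB]) -> RGB:
    """Compose a word color as the channel-wise mean of its phoneme colors."""
    if not phoneme_colors:
        raise ValueError("a word needs at least one phoneme")
    n = len(phoneme_colors)
    red = round(sum(color[0] for color in phoneme_colors) / n)
    green = round(sum(color[1] for color in phoneme_colors) / n)
    blue = round(sum(color[2] for color in phoneme_colors) / n)
    return (red, green, blue)


def low_confidence_words(words: Dict[str, List[float]],
                         threshold: float) -> List[str]:
    """Return the words whose mean phoneme confidence is below the threshold."""
    return [word for word, scores in words.items()
            if sum(scores) / len(scores) < threshold]


# Hypothetical per-phoneme confidences for three words.
words = {
    "hello": [1.0, 0.0, 1.0, 1.0],
    "world": [1.0, 1.0, 1.0, 1.0],
    "tts": [0.0, 1.0],
}

colors = {word: compose_word_color([confidence_to_color(s) for s in scores])
          for word, scores in words.items()}
flagged = low_confidence_words(words, threshold=0.6)
```

With this rule a single badly recognized phoneme pulls the whole word toward red, so a worker scanning the word tier can spot it without expanding the phoneme tier. Other compositions, such as taking the worst phoneme color, would fit the claim language equally well.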

9. A tangible computer-readable storage medium storing instructions for controlling a computing device to enable human workers to find errors when developing a text-to-speech (TTS) voice, the instructions comprising:
- presenting a graphical user interface wherein after a first pass of automatic speech recognition (ASR) of a speech corpus is complete, the interface presents to a worker a graphical representation of an alignment of the ASR results, associated words and phonemes and the audio;
- color-coding each word based on a composition of the color-coding associated with each phoneme;
- receiving a graphical input from the worker associated with a selection of a word or phoneme; and
- presenting the audio associated with the selected word or phoneme.
- - 10. The tangible computer-readable storage medium of claim 9, the instructions further comprising:
    - color-coding each phoneme according to a confidence score.
  - 11. The tangible computer-readable storage medium of claim 9, the instructions further comprising:
    - color-coding each word according to a confidence score.
  - 12. The tangible computer-readable storage medium of claim 9, the instructions further comprising presenting only words and phonemes to the worker that have confidence scores below a certain threshold.
  - 13. The tangible computer-readable storage medium of claim 12, the instructions further comprising:
    - receiving a selection of at least one word or phoneme from the worker; and
    - presenting a text transcription and corresponding audio to the worker for the selected word or phoneme.
  - 14. The tangible computer-readable storage medium of claim 12, the instructions further comprising presenting a listing of transcriptions to the worker associated with the presented words and phonemes.
  - 15. The tangible computer-readable storage medium of claim 9, the instructions further comprising presenting a spectrogram associated with the selected word or phoneme.
  - 16. The tangible computer-readable storage medium of claim 9, the instructions further comprising:
    - receiving an indication of an ASR mistake from the worker;
    - correcting speaker dependent entries associated with the mistake; and
    - rerunning ASR on all utterances containing the word or phoneme associated with the mistake.
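
Claims 8 and 16 describe a correction loop: the worker flags an ASR mistake, the speaker-dependent entry is corrected, and recognition is rerun on every utterance containing the affected word or phoneme. The sketch below covers only the bookkeeping for the word case. It assumes the speaker-dependent entries are a pronunciation dictionary keyed by word, which the claims do not state; the `correct_and_select` function, the utterance IDs, and the sample data are hypothetical, and the ASR rerun itself is omitted.

```python
from typing import Dict, List


def correct_and_select(lexicon: Dict[str, List[str]],
                       utterances: Dict[str, List[str]],
                       word: str,
                       corrected_phonemes: List[str]) -> List[str]:
    """Update the entry for `word` and list the utterances to re-recognize."""
    # Overwrite the mistaken speaker-dependent entry in place.
    lexicon[word] = list(corrected_phonemes)
    # Every utterance whose transcript contains the word needs a new ASR pass.
    return sorted(utt_id for utt_id, tokens in utterances.items()
                  if word in tokens)


# Hypothetical speaker-dependent pronunciations and corpus transcripts.
lexicon = {"either": ["IY", "DH", "ER"], "way": ["W", "EY"]}
utterances = {
    "utt001": ["either", "way"],
    "utt002": ["no", "way"],
    "utt003": ["either", "one"],
}

# The worker reports that this speaker pronounces "either" with an initial AY.
to_rerun = correct_and_select(lexicon, utterances, "either", ["AY", "DH", "ER"])
```

Limiting the rerun to the affected utterances, rather than the whole corpus, is what the claim language implies; the returned IDs would be passed to whatever recognizer produced the first-pass alignment.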

17. A computing device for enabling human workers to find errors when developing a text-to-speech (TTS) voice, the computing device comprising:
- a processor;
- a module configured to control the processor to present a graphical user interface wherein after a first pass of automatic speech recognition (ASR) of a speech corpus is complete, the interface presents to a worker a graphical representation of an alignment of the ASR results, associated words and phonemes and the audio;
- a module configured to control the processor to color-code each word based on a composition of the color-coding associated with each phoneme;
- a module configured to control the processor to receive a graphical input from the worker associated with a selection of a word or phoneme; and
- a module configured to control the processor to present the audio associated with the selected word or phoneme.
- - 18. The computing device of claim 17, further comprising:
    - a module configured to control the processor to color-code each phoneme according to a confidence score.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors: Davis, Steven Lawrence; Schulz, David Eugene; Loney, Louise; Gustafson, Beverly; Fetters, Shane
Primary Examiner(s): Hudspeth, David R
Assistant Examiner(s): Albertalli, Brian Louis
Application Number: US11/235,821
Time in Patent Office: 1,729 Days
Field of Search: None
US Class Current: 704/270
CPC Class Codes:
- G10L 13/00 Speech synthesis; Text to s...
- G10L 21/06 Transformation of speech in...
