TTS and prosody based authoring system

US 6,081,780 A
Filed: 04/28/1998
Issued: 06/27/2000
Est. Priority Date: 04/28/1998
Status: Expired due to Fees

Abstract

An information signal content authoring system is provided. The authoring system includes a speech analyzer, responsive to a spoken utterance signal provided by a narrator. The spoken utterance signal is representative of information available to the narrator. The speech analyzer generates a speech signal representative of one or more prosodic parameters associated with the narrator. A text-to-speech converter, responsive to a text signal representative of the information available to the narrator, generates a phonetic representation signal from the text signal and synthesizes a speech signal from the text signal. The text-to-speech converter also generates one or more prosodic parameters from the text signal. A spectrum comparator, operatively coupled to the speech analyzer and the text-to-speech converter, compares the spectral parameters of the speech signal generated by the speech analyzer to the speech signal synthesized by the converter and generates a variance signal indicative of a spectral distance between the two speech signals. The variance signal is provided to the text-to-speech converter to adjust the prosodic parameters. An output portion, operatively coupled to the text-to-speech converter, outputs the phonetic representation signal and the prosodic parameters from the converter as a composite encoded signal representative of information content available to the narrator. The output portion further preferably includes an editor, responsive to editing commands issued by the narrator, for editing at least a portion of the composite encoded signal.
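
The feedback path described in the abstract (comparator produces a variance signal, the converter adjusts its prosodic parameters in response) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `Prosody` fields, the Euclidean distance used as a stand-in for the spectral distance, and the proportional update rule with a fixed `gain` do not come from the patent text above, which does not say how the adjustment is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class Prosody:
    """Hypothetical prosodic parameter set (field names are assumptions)."""
    pitch_hz: float
    duration_s: float
    energy: float

    def as_vector(self):
        return [self.pitch_hz, self.duration_s, self.energy]


def variance_signal(narrator, synthetic):
    """Per-parameter difference: narrator value minus synthesized value."""
    return [n - s for n, s in zip(narrator.as_vector(), synthetic.as_vector())]


def distance(variance):
    """Euclidean norm of the variance; a crude stand-in for a spectral distance."""
    return math.sqrt(sum(v * v for v in variance))


def adjust(synthetic, variance, gain=0.5):
    """Move the synthesized parameters a fraction of the way toward the narrator."""
    pitch, duration, energy = (
        s + gain * v for s, v in zip(synthetic.as_vector(), variance)
    )
    return Prosody(pitch, duration, energy)


def converge(narrator, synthetic, tolerance=0.01, max_steps=50):
    """Repeat compare-and-adjust until the distance falls within tolerance."""
    for step in range(max_steps):
        variance = variance_signal(narrator, synthetic)
        if distance(variance) <= tolerance:
            return synthetic, step
        synthetic = adjust(synthetic, variance)
    return synthetic, max_steps


narrator = Prosody(pitch_hz=180.0, duration_s=0.30, energy=0.8)
tts_default = Prosody(pitch_hz=120.0, duration_s=0.25, energy=0.5)
tuned, steps = converge(narrator, tts_default)
print(steps, tuned)
```

Mixing hertz, seconds, and energy in one unweighted norm is only acceptable in a toy; a real comparator would work on spectral features and would normalize each dimension.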

487 Citations

12 Claims

1. An information signal content authoring system, comprising:
- a speech analyzer, responsive to a spoken utterance signal provided by a narrator, the spoken utterance signal being representative of information available to the narrator, the speech analyzer generating a speech signal representative of at least one prosodic parameter associated with the narrator;
  
  a text-to-speech converter, responsive to a text signal representative of the information available to the narrator, the converter generating a phonetic representation signal from the text signal and synthesizing a speech signal from the text signal, the text-to-speech converter also generating at least one prosodic parameter from the text signal;
  
  a spectrum comparator, operatively coupled to the speech analyzer and the text-to-speech converter, for comparing the at least one prosodic parameter of the speech signal generated by the speech analyzer to the speech signal synthesized by the converter and generating a variance signal indicative of a spectral distance between the two speech signals, the variance signal being provided to the text-to-speech converter to adjust the at least one prosodic parameter; and
  
  an output portion, operatively coupled to the text-to-speech converter, for outputting the phonetic representation signal and the at least one prosodic parameter from the converter as a composite encoded signal representative of information content available to the narrator.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The system of claim 1, wherein the output portion further comprises editing means, responsive to editing commands issued by the narrator, for editing at least a portion of the composite encoded signal.
  - 3. The system of claim 1, wherein the text-to-speech converter further comprises:
    - a text-to-phonemes converter for converting the text signal to the phonetic representation signal;
      
      a phonemes-to-allophones converter, operatively coupled to the text-to-speech converter, for converting phonemes associated with the phonetic representation signal to allophones;
      
      storage means, operatively coupled to the phonemes-to-allophones converter, for storing previously recorded allophones characteristic of at least one narration voice and for selecting the previously recorded allophones which substantially match the allophones provided by the allophone converter; and
      
      speech synthesizing means, operatively coupled to the storage means, for generating the synthesized speech signal representative of the text signal in response to the previously recorded allophones from the storage means.
  - 4. The system of claim 3, wherein the storage means of the text-to-speech conversion means further comprises a plurality of dictionaries of previously recorded allophones, each dictionary corresponding to a different narration voice, and means for switching between the dictionaries, at the request of the narrator, such that allophones from the selected dictionary are provided to the synthesizing means.
  - 5. The system of claim 1, further comprising audio feedback means, operatively coupled to the text-to-speech converter, for providing the narrator with an audio feedback of the synthesized speech signal.
  - 6. The system of claim 5, further comprising means for mixing a background audio signal with the synthesized speech signal feedback to the narrator.
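
The converter structure recited in claims 3 and 4 (text to phonemes, phonemes to allophones, selection of previously recorded allophones from a per-voice dictionary, then synthesis) can be sketched as a small pipeline. This is an illustrative toy under stated assumptions: the `LEXICON`, the position-based allophone rule, the voice names, the integer "samples", and the fallback used to model "substantially match" are all hypothetical and are not taken from the claims.

```python
# Hypothetical toy lexicon (word -> phonemes); a real system needs full coverage.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_phonemes(text):
    """Text-to-phonemes converter: look each word up in the lexicon."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def phonemes_to_allophones(phonemes):
    """Phonemes-to-allophones converter: tag each phoneme with its position.

    The initial/medial/final rule is an assumption chosen for brevity.
    """
    last = len(phonemes) - 1
    allophones = []
    for i, phoneme in enumerate(phonemes):
        position = "initial" if i == 0 else "final" if i == last else "medial"
        allophones.append((phoneme, position))
    return allophones


class AllophoneStore:
    """Storage means: one dictionary of recorded allophones per narration voice."""

    def __init__(self, dictionaries, voice):
        self.dictionaries = dictionaries
        self.voice = voice

    def switch_voice(self, voice):
        """Switch dictionaries at the narrator's request (claim 4)."""
        if voice not in self.dictionaries:
            raise KeyError(voice)
        self.voice = voice

    def select(self, allophone):
        """Return the exact recording, else any recording of the same phoneme."""
        recorded = self.dictionaries[self.voice]
        if allophone in recorded:
            return recorded[allophone]
        for (phoneme, _position), unit in recorded.items():
            if phoneme == allophone[0]:
                return unit
        raise LookupError(allophone)


def synthesize(text, store):
    """Speech synthesizing means: concatenate the selected recorded units."""
    allophones = phonemes_to_allophones(text_to_phonemes(text))
    return [sample for a in allophones for sample in store.select(a)]


def make_dictionary(offset):
    """Build placeholder 'recordings': two integer samples per phoneme."""
    phonemes = sorted({p for ps in LEXICON.values() for p in ps})
    return {(p, "medial"): [offset, offset + 1] for p in phonemes}


store = AllophoneStore(
    {"voice_a": make_dictionary(0), "voice_b": make_dictionary(100)},
    voice="voice_a",
)
first = synthesize("hello world", store)
store.switch_voice("voice_b")
second = synthesize("hello world", store)
print(len(first), first[:4], second[:4])
```

Keeping the dictionaries behind a single `select` call means a voice switch changes the output without touching the phoneme or allophone stages, which matches the separation the claims draw between conversion and storage.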

7. An information signal content authoring system, comprising:
- speech analysis means, responsive to a spoken utterance signal provided by a narrator, the spoken utterance signal being representative of information available to the narrator, the speech analysis means generating a speech signal representative of at least one prosodic parameter associated with the narrator;
  
  text-to-speech conversion means, responsive to a text signal representative of the information available to the narrator, the conversion means generating a phonetic representation signal from the text signal and synthesizing a speech signal from the text signal, the text-to-speech conversion means also generating at least one prosodic parameter from the text signal;
  
  comparing means, operatively coupled to the speech analysis means and the conversion means, for comparing the at least one prosodic parameter of the speech signal generated by the speech analysis means to the speech signal synthesized by the conversion means and generating a variance signal indicative of a spectral distance between the two speech signals, the variance signal being provided to the text-to-speech conversion means to adjust the at least one prosodic parameter; and
  
  output means, operatively coupled to the speech analysis means, the conversion means and the comparing means, for outputting the phonetic representation signal and the at least one prosodic parameter from the conversion means as a composite encoded signal representative of information content available to the narrator.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein the output means further comprises editing means, responsive to editing commands issued by the narrator, for editing at least a portion of the composite encoded signal.
  - 9. The system of claim 7, wherein the text-to-speech conversion means further comprises:
    - a text-to-phonemes converter for converting the text signal to the phonetic representation signal;
      
      a phonemes-to-allophones converter, operatively coupled to the text-to-speech converter, for converting phonemes associated with the phonetic representation signal to allophones;
      
      storage means, operatively coupled to the phonemes-to-allophones converter, for storing previously recorded allophones characteristic of at least one narration voice and for selecting the previously recorded allophones which substantially match the allophones provided by the allophone converter; and
      
      speech synthesizing means, operatively coupled to the storage means, for generating the synthesized speech signal representative of the text signal in response to the previously recorded allophones from the storage means.
  - 10. The system of claim 9, wherein the storage means of the text-to-speech conversion means further comprises a plurality of dictionaries of previously recorded allophones, each dictionary corresponding to a different narration voice, and means for switching between the dictionaries, at the request of the narrator, such that allophones from the selected dictionary are provided to the synthesizing means.
  - 11. The system of claim 7, further comprising audio feedback means, operatively coupled to the text-to-speech conversion means, for providing the narrator with an audio feedback of the synthesized speech signal.
  - 12. The system of claim 11, further comprising means for mixing a background audio signal with the synthesized speech signal feedback to the narrator.
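
The output and editing elements (claims 1 and 7 for the composite encoded signal, claims 2 and 8 for editing a portion of it) can be pictured as a record that pairs each phonetic symbol with its prosodic parameters and allows one segment to be changed later. The JSON layout, the field names, and the helper functions below are assumptions made for this sketch; the claims do not define an encoding format.

```python
import json


def encode_composite(phonemes, prosody):
    """Pair each phoneme with its prosodic parameters in one encoded string.

    phonemes: list of phoneme symbols.
    prosody:  list of dicts, one per phoneme (keys are hypothetical).
    """
    if len(phonemes) != len(prosody):
        raise ValueError("need exactly one prosody entry per phoneme")
    segments = [{"phoneme": p, **params} for p, params in zip(phonemes, prosody)]
    return json.dumps({"segments": segments})


def edit_composite(encoded, index, **changes):
    """Return a new encoded signal with one segment's fields updated."""
    document = json.loads(encoded)
    document["segments"][index].update(changes)
    return json.dumps(document)


encoded = encode_composite(
    ["HH", "AH"],
    [
        {"pitch_hz": 120.0, "duration_s": 0.08},
        {"pitch_hz": 130.0, "duration_s": 0.12},
    ],
)
# Narrator edit: raise the pitch of the second segment only.
edited = edit_composite(encoded, 1, pitch_hz=150.0)
print(edited)
```

Because the phonetic and prosodic data travel together as text rather than as audio, an edit touches a few fields instead of re-recording the utterance.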

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Lumelsky, Leon
Primary Examiner(s): Dorvil, Richemond
Application Number: US09/067,526
Time in Patent Office: 791 Days
Field of Search: 704/260, 704/270, 704/235, 704/272, 704/275
US Class Current: 704/260

CPC Class Codes
G10L 13/04   Details of speech synthesis...
G10L 13/08   Text analysis or generation...
G10L 19/00   Speech or audio signals ana...
H04M 2201/60   Medium conversion
H04M 3/487   Arrangements for providing ...
H04M 7/12   for working between exchang...
