TTS and prosody based authoring system
First Claim
1. An information signal content authoring system, comprising:
a speech analyzer, responsive to a spoken utterance signal provided by a narrator, the spoken utterance signal being representative of information available to the narrator, the speech analyzer generating a speech signal representative of at least one prosodic parameter associated with the narrator;
a text-to-speech converter, responsive to a text signal representative of the information available to the narrator, the converter generating a phonetic representation signal from the text signal and synthesizing a speech signal from the text signal, the text-to-speech converter also generating at least one prosodic parameter from the text signal;
a spectrum comparator, operatively coupled to the speech analyzer and the text-to-speech converter, for comparing the at least one prosodic parameter of the speech signal generated by the speech analyzer to the speech signal synthesized by the converter and generating a variance signal indicative of a spectral distance between the two speech signals, the variance signal being provided to the text-to-speech converter to adjust the at least one prosodic parameter; and
an output portion, operatively coupled to the text-to-speech converter, for outputting the phonetic representation signal and the at least one prosodic parameter from the converter as a composite encoded signal representative of information content available to the narrator.
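The claim does not specify how the composite encoded signal is laid out. As a minimal sketch only, assuming a JSON encoding and hypothetical field names (`symbol`, `duration_ms`, `pitch_hz`) that do not come from the patent, the output portion could pair each phonetic symbol with its prosodic parameters like this:

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class PhonemeEvent:
    """One phonetic unit plus its prosody (field names are assumptions)."""
    symbol: str        # phonetic symbol, e.g. an ARPAbet-style label
    duration_ms: int   # how long the unit is held
    pitch_hz: float    # target fundamental frequency for the unit


def encode(events: List[PhonemeEvent]) -> str:
    """Pack phonetic and prosodic data into a single composite string."""
    return json.dumps([asdict(e) for e in events], separators=(",", ":"))


def decode(payload: str) -> List[PhonemeEvent]:
    """Recover the phoneme events from a composite string."""
    return [PhonemeEvent(**item) for item in json.loads(payload)]
```

The point of the sketch is that the phonetic representation and the prosodic parameters travel together in one signal, so a receiver can resynthesize speech without the original audio. Any compact serialization would serve equally well.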
Abstract
An information signal content authoring system is provided. The authoring system includes a speech analyzer, responsive to a spoken utterance signal provided by a narrator. The spoken utterance signal is representative of information available to the narrator. The speech analyzer generates a speech signal representative of one or more prosodic parameters associated with the narrator. A text-to-speech converter, responsive to a text signal representative of the information available to the narrator, generates a phonetic representation signal from the text signal and synthesizes a speech signal from the text signal. The text-to-speech converter also generates one or more prosodic parameters from the text signal. A spectrum comparator, operatively coupled to the speech analyzer and the text-to-speech converter, compares the spectral parameters of the speech signal generated by the speech analyzer to the speech signal synthesized by the converter and generates a variance signal indicative of a spectral distance between the two speech signals. The variance signal is provided to the text-to-speech converter to adjust the prosodic parameters. An output portion, operatively coupled to the text-to-speech converter, outputs the phonetic representation signal and the prosodic parameters from the converter as a composite encoded signal representative of information content available to the narrator. The output portion further preferably includes an editor, responsive to editing commands issued by the narrator, for editing at least a portion of the composite encoded signal.
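The comparator-and-adjust loop described in the abstract can be illustrated with a small sketch. This is not the patented method: the parameter set (`pitch_hz`, `duration_s`, `energy`), the relative Euclidean distance used in place of a true spectral distance, and the `step` and `tolerance` values are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    """Hypothetical prosodic parameters; the source does not enumerate them."""
    pitch_hz: float    # mean fundamental frequency
    duration_s: float  # utterance duration
    energy: float      # mean linear energy, assumed positive


def variance(narrator: Prosody, synthetic: Prosody) -> Prosody:
    """Per-parameter difference, standing in for the variance signal."""
    return Prosody(
        narrator.pitch_hz - synthetic.pitch_hz,
        narrator.duration_s - synthetic.duration_s,
        narrator.energy - synthetic.energy,
    )


def distance(narrator: Prosody, synthetic: Prosody) -> float:
    """Relative Euclidean distance (a stand-in for the spectral distance)."""
    v = variance(narrator, synthetic)
    return math.sqrt(
        (v.pitch_hz / narrator.pitch_hz) ** 2
        + (v.duration_s / narrator.duration_s) ** 2
        + (v.energy / narrator.energy) ** 2
    )


def adjust(narrator, synthetic, step=0.5, tolerance=0.01, max_rounds=50):
    """Feed the variance back until the synthetic prosody is close enough."""
    rounds = 0
    while distance(narrator, synthetic) > tolerance and rounds < max_rounds:
        v = variance(narrator, synthetic)
        # Move each synthetic parameter a fraction of the way to the narrator's.
        synthetic = Prosody(
            synthetic.pitch_hz + step * v.pitch_hz,
            synthetic.duration_s + step * v.duration_s,
            synthetic.energy + step * v.energy,
        )
        rounds += 1
    return synthetic, rounds
```

Calling `adjust(Prosody(180.0, 2.4, 0.30), Prosody(120.0, 2.0, 0.20))` returns the adjusted synthetic parameters and the number of feedback rounds used. A real system would derive the distance from the spectra of the two speech signals rather than from three scalar parameters.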
487 Citations
12 Claims
1. An information signal content authoring system, comprising:
a speech analyzer, responsive to a spoken utterance signal provided by a narrator, the spoken utterance signal being representative of information available to the narrator, the speech analyzer generating a speech signal representative of at least one prosodic parameter associated with the narrator;
a text-to-speech converter, responsive to a text signal representative of the information available to the narrator, the converter generating a phonetic representation signal from the text signal and synthesizing a speech signal from the text signal, the text-to-speech converter also generating at least one prosodic parameter from the text signal;
a spectrum comparator, operatively coupled to the speech analyzer and the text-to-speech converter, for comparing the at least one prosodic parameter of the speech signal generated by the speech analyzer to the speech signal synthesized by the converter and generating a variance signal indicative of a spectral distance between the two speech signals, the variance signal being provided to the text-to-speech converter to adjust the at least one prosodic parameter; and
an output portion, operatively coupled to the text-to-speech converter, for outputting the phonetic representation signal and the at least one prosodic parameter from the converter as a composite encoded signal representative of information content available to the narrator.
Dependent claims: 2, 3, 4, 5, 6
7. An information signal content authoring system, comprising:
speech analysis means, responsive to a spoken utterance signal provided by a narrator, the spoken utterance signal being representative of information available to the narrator, the speech analysis means generating a speech signal representative of at least one prosodic parameter associated with the narrator;
text-to-speech conversion means, responsive to a text signal representative of the information available to the narrator, the conversion means generating a phonetic representation signal from the text signal and synthesizing a speech signal from the text signal, the text-to-speech conversion means also generating at least one prosodic parameter from the text signal;
comparing means, operatively coupled to the speech analysis means and the conversion means, for comparing the at least one prosodic parameter of the speech signal generated by the speech analysis means to the speech signal synthesized by the conversion means and generating a variance signal indicative of a spectral distance between the two speech signals, the variance signal being provided to the text-to-speech conversion means to adjust the at least one prosodic parameter; and
output means, operatively coupled to the speech analysis means, the conversion means and the comparing means, for outputting the phonetic representation signal and the at least one prosodic parameter from the conversion means as a composite encoded signal representative of information content available to the narrator.
Dependent claims: 8, 9, 10, 11, 12
Specification