SYSTEM-EFFECTED TEXT ANNOTATION FOR EXPRESSIVE PROSODY IN SPEECH SYNTHESIS AND RECOGNITION
First Claim
1. A computer-implemented rule-based method of synthesizing speech from text, the method comprising:
- inputting text to be machine spoken to a computerized system;
- system identifying of segmental units and suprasegmental units in the text;
- system annotating of the text to indicate the system-identified segmental units; and
- system generation of synthesized speech modulated according to the annotations in the text;
- or system comparison of uttered speech with acoustic units corresponding with the annotated text to facilitate identification of text units corresponding with the uttered speech and outputting text recognized as corresponding with the uttered speech.
Abstract
The inventive system can automatically annotate the relationship of text and acoustic units for the purposes of: (a) predicting how the text is to be pronounced as expressively synthesized speech, and (b) improving the proportion of expressively uttered speech as correctly identified text representing the speaker's message. The system can automatically annotate text corpora for relationships of uttered speech for a particular speaking style and for acoustic units in terms of context and content of the text to the utterances. The inventive system can use kinesthetically defined expressive speech production phonetics that are recognizable and controllable according to kinesensic feedback principles. In speech synthesis embodiments of the invention, the text annotations can specify how the text is to be expressively pronounced as synthesized speech. Also, acoustically-identifying features for dialects or mispronunciations can be identified so as to expressively synthesize alternative dialects or stylistic mispronunciations for a speaker from a given text. In speech recognition embodiments of the invention, each text annotation can be uniquely identified from the corresponding acoustic features of a unit of uttered speech to correctly identify the corresponding text. By employing a method of rules-based text annotation, the invention enables expressiveness to be altered to reflect syntactic, semantic, and/or discourse circumstances found in text to be synthesized or in an uttered message.
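As an illustration only, the rules-based annotation described in the abstract might be sketched as follows. This is a minimal toy under stated assumptions, not the patented method: the rule table `CONTOUR_RULES`, the `AnnotatedPhrase` type, and the functions `annotate` and `render` are hypothetical names, words stand in crudely for segmental units, and a punctuation-driven contour label stands in for a suprasegmental unit.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical rule table: sentence-final punctuation -> intonation label.
# These labels are illustrative assumptions, not taken from the patent.
CONTOUR_RULES = {"?": "rise", "!": "emphatic", ".": "fall"}


@dataclass
class AnnotatedPhrase:
    words: List[str]  # crude stand-in for segmental units
    contour: str      # stand-in for a suprasegmental annotation


def annotate(text: str) -> List[AnnotatedPhrase]:
    """Split text into phrases and attach a contour label to each one."""
    phrases = []
    # Each match is a run of non-terminal characters plus an optional terminator.
    for body, end in re.findall(r"([^.?!]+)([.?!]?)", text):
        words = re.findall(r"[A-Za-z']+", body)
        if not words:
            continue
        # Phrases without a recognized terminator fall back to "level".
        phrases.append(AnnotatedPhrase(words, CONTOUR_RULES.get(end, "level")))
    return phrases


def render(phrases: List[AnnotatedPhrase]) -> str:
    """Serialize the annotations as inline bracketed markup."""
    return " ".join(f"[{p.contour}: {' '.join(p.words)}]" for p in phrases)


if __name__ == "__main__":
    print(render(annotate("Is it late? Go home.")))
```

A synthesizer front end could, in principle, read such markup to choose a pitch contour per phrase; the bracket notation here is an assumption chosen for readability.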
20 Claims
1. A computer-implemented rule-based method of synthesizing speech from text, the method comprising:
- inputting text to be machine spoken to a computerized system;
- system identifying of segmental units and suprasegmental units in the text;
- system annotating of the text to indicate the system-identified segmental units; and
- system generation of synthesized speech modulated according to the annotations in the text;
- or system comparison of uttered speech with acoustic units corresponding with the annotated text to facilitate identification of text units corresponding with the uttered speech and outputting text recognized as corresponding with the uttered speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17
13. A computer-implemented rule-based method of recognizing speech, the method comprising:
- inputting uttered speech to be recognized to a computerized system;
- system comparison of the uttered speech with acoustic units corresponding with annotated text to facilitate identification of text units corresponding with the uttered speech wherein the annotated text comprises the product of system identifying of segmental units and suprasegmental units in the text and system annotating of the text to indicate the system-identified segmental units; and
- outputting text recognized as corresponding with the uttered speech.
Dependent claims: 14, 15, 16
18. A computerized system for synthesizing speech from text received into the system, the system comprising:
- a discourse engine configured to identify segmental units and suprasegmental units in the text and to annotate the text to indicate the system-identified segmental units; and
- a speech recognition module configured to compare uttered speech with acoustic units corresponding with the annotated text and configured to identify text units corresponding with the uttered speech and to output text recognized as corresponding with the uttered speech.
Dependent claims: 19, 20
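For illustration, the comparison step recited in the recognition claims (matching uttered speech against acoustic units that correspond with annotated text) might be sketched as a nearest-neighbor lookup. Everything here is an assumption made for the example: the inventory `ANNOTATED_UNITS`, its feature values (mean pitch in Hz, duration in seconds), the bracketed unit labels, and the function `recognize` are hypothetical, and Euclidean distance is a swapped-in stand-in for whatever comparison the specification actually describes.

```python
import math

# Hypothetical inventory: annotated text unit -> toy acoustic feature vector
# (mean pitch in Hz, duration in seconds). Values are invented for illustration.
ANNOTATED_UNITS = {
    "late[rise]": (220.0, 0.40),
    "late[fall]": (140.0, 0.35),
    "home[fall]": (130.0, 0.45),
}


def recognize(uttered, inventory=ANNOTATED_UNITS):
    """Return the annotated text unit whose features lie closest to the utterance."""
    return min(inventory, key=lambda unit: math.dist(uttered, inventory[unit]))


if __name__ == "__main__":
    # A high-pitched utterance lands nearest the rising-contour annotation.
    print(recognize((210.0, 0.38)))
```

The point of the sketch is only that distinct annotations of the same word ("late" with a rising versus falling contour) can map to distinct acoustic targets, so the matched unit carries both the text and its expressive annotation.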
Specification