System-effected text annotation for expressive prosody in speech synthesis and recognition

US 8,175,879 B2
Filed: 08/08/2008
Issued: 05/08/2012
Est. Priority Date: 08/08/2007
Status: Active Grant

Abstract

The inventive system can automatically annotate the relationship of text and acoustic units for the purposes of: (a) predicting how the text is to be pronounced as expressively synthesized speech, and (b) improving the proportion of expressively uttered speech as correctly identified text representing the speaker's message. The system can automatically annotate text corpora for relationships of uttered speech for a particular speaking style and for acoustic units in terms of context and content of the text to the utterances. The inventive system can use kinesthetically defined expressive speech production phonetics that are recognizable and controllable according to kinesensic feedback principles. In speech synthesis embodiments of the invention, the text annotations can specify how the text is to be expressively pronounced as synthesized speech. Also, acoustically-identifying features for dialects or mispronunciations can be identified so as to expressively synthesize alternative dialects or stylistic mispronunciations for a speaker from a given text. In speech recognition embodiments of the invention, each text annotation can be uniquely identified from the corresponding acoustic features of a unit of uttered speech to correctly identify the corresponding text. By employing a method of rules-based text annotation, the invention enables expressiveness to be altered to reflect syntactic, semantic, and/or discourse circumstances found in text to be synthesized or in an uttered message.

20 Claims

1. A computer-implemented rule-based method of synthesizing speech from text, the method comprising:
- inputting text to be machine spoken to a computerized system;
  
  system identifying of segmental units and suprasegmental units in the text;
  
  system annotating of the text to indicate the system-identified segmental units; and
  
  system generation of synthesized speech modulated according to the annotations in the text;
  
  wherein the system annotating of the text comprises identifying of discourse-givenness, contrastiveness, and/or cue phrase lookups to identify and annotate text with discourse prominence or discourse non-prominence.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. A method according to claim 1 wherein the system annotating of the text comprises identifying of consonants as sustainable or percussive, providing linking information, and adding connectives for vowels that are followed by another vowel.
  - 3. A method according to claim 1 wherein the system annotating of the text comprises identifying inflections or pitch movements of syllables based on their positions within a phrase.
  - 4. A method according to claim 1 comprising system analysis of text to be spoken to select syllables, words or other text units to receive an expressive prosody and applying the expressive prosody to the selected syllables or words.
  - 5. A method according to claim 4 wherein the system analysis of the text to be spoken comprises a linguistic analysis operable with or without semantic considerations and wherein the text to be spoken comprises one or more discourses.
  - 6. A method according to claim 5 wherein the linguistic analysis comprises identifying and annotating operative words in a discourse for acoustic modulation of the lexical pronunciations of the operative words by applying a suitable intonation or intonations to give the operative words prominence.
  - 7. A method according to claim 1 wherein system annotating of the text employs grapheme-phoneme pairs to identify relationships between text units and corresponding acoustic units wherein each grapheme-phoneme pair comprises a visible prosodic-indicating grapheme corresponding to the text to be pronounced and a corresponding phoneme or phonemes, the phoneme or phonemes being functional in the digital domain.
  - 8. A method according to claim 1 wherein the system annotating of the text comprises dividing of a text sentence into groups of meanings and indicating the locations of long and short pauses, employing punctuations, phrase length, syntactic constituency, and balance.
  - 9. A method according to claim 8 wherein the system annotating of the text comprises identifying, for each phrase in the text, an operative word introducing a new idea to carry the argument forward as the sentences progress, the method employing discourse properties, semantic properties, and/or syntactic properties of relevant words.
  - 10. A method according to claim 9 wherein the system annotating of the text comprises representing intonation contours on a pitch change scale encompassing the pitch range of the speech to be synthesized or recognized.
  - 11. A method according to claim 10 wherein the system analysis of the text to be spoken comprises a linguistic analysis and the linguistic analysis comprises identifying and annotating operative words in a discourse for acoustic modulation of the lexical pronunciations of the operative words by applying a suitable intonation or intonations to give the operative words prominence.
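
The discourse-givenness annotation recited in claim 1 can be illustrated with a minimal sketch. It is not the patent's rule set: it assumes, purely for illustration, that a content word counts as "new" (discourse-prominent) on first mention and "given" (non-prominent) on later mentions. The function name `annotate_givenness` and the small `FUNCTION_WORDS` list are hypothetical.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption for this
# sketch); a real system would use a tagger or a fuller lexicon.
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in",
                  "on", "is", "was", "it", "that", "this", "not"}


def annotate_givenness(text):
    """Label each word 'new', 'given', or 'function'.

    'new'   -> first mention of a content word (candidate for prominence)
    'given' -> repeated content word (candidate for non-prominence)
    """
    seen = set()       # content words already mentioned in this discourse
    annotated = []
    for token in re.findall(r"[A-Za-z']+", text):
        key = token.lower()
        if key in FUNCTION_WORDS:
            label = "function"
        elif key in seen:
            label = "given"
        else:
            label = "new"
            seen.add(key)
        annotated.append((token, label))
    return annotated


if __name__ == "__main__":
    for word, label in annotate_givenness("The dog barked. The dog ran."):
        print(f"{word}\t{label}")
```

Exact string repetition is a crude stand-in for givenness; synonyms, pronouns, and the contrastiveness and cue-phrase lookups that the claim also names are outside this sketch.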

12. A computer-implemented rule-based method of synthesizing speech from text, the method comprising:
- inputting text to be machine spoken to a computerized system;
  
  system identifying of segmental units and suprasegmental units in the text;
  
  system annotating of the text to indicate the system-identified segmental units; and
  
  system generation of synthesized speech modulated according to the annotations in the text;
  
  wherein the system annotating of the text comprises dividing of a text sentence into groups of meanings and indicating the locations of long and short pauses, employing syntactic constituency, and balance.
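
The pause-placement step of claims 8 and 12 can be sketched in a simplified form. The example below assumes punctuation is the only cue (claim 8 lists punctuation among several cues) and leaves out phrase length, syntactic constituency, and balance entirely. The markers `|` and `||` and the function name `mark_pauses` are hypothetical choices for this illustration.

```python
import re

SHORT_PAUSE = "|"    # hypothetical marker for a short pause
LONG_PAUSE = "||"    # hypothetical marker for a long pause


def mark_pauses(text):
    """Insert pause markers after punctuation-delimited word groups."""
    # Each match is a run of non-punctuation followed by at most one
    # punctuation mark that closes the group.
    groups = re.findall(r"[^,;:.!?]+[,;:.!?]?", text)
    out = []
    for group in groups:
        group = group.strip()
        if not group:
            continue
        if group[-1] in ",;:":
            out.append(group + " " + SHORT_PAUSE)   # clause-internal break
        elif group[-1] in ".!?":
            out.append(group + " " + LONG_PAUSE)    # sentence-final break
        else:
            out.append(group)                       # no closing punctuation
    return " ".join(out)


if __name__ == "__main__":
    print(mark_pauses("When it rains, we stay in. Otherwise we walk."))
```

A rule set in the spirit of the claims would also break long unpunctuated phrases and balance group lengths, which needs a syntactic parse that this sketch does not attempt.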

13. A computer-implemented rule-based method of synthesizing speech from text, the method comprising:
- inputting text to be machine spoken to a computerized system;
  
  system identifying of segmental units and suprasegmental units in the text;
  
  system annotating of the text to indicate the system-identified segmental units; and
  
  system generation of synthesized speech modulated according to the annotations in the text;
  
  wherein the system annotating of the text comprises identifying, for each phrase in the text an operative word introducing a new idea to carry the argument forward as the sentences progress, the method employing discourse properties, semantic properties, and/or syntactic properties of relevant words.
- Dependent claims (14):
  - 14. A method according to claim 13 wherein system annotating of the text employs grapheme-phoneme pairs to identify relationships between text units and corresponding acoustic units wherein each grapheme-phoneme pair comprises a visible prosodic-indicating grapheme corresponding to the text to be pronounced and a corresponding phoneme or phonemes, the phoneme or phonemes being functional in the digital domain.

15. A computer-implemented rule-based method of synthesizing speech from text, the method comprising:
- inputting text to be machine spoken to a computerized system;
  
  system identifying of segmental units and suprasegmental units in the text;
  
  system annotating of the text to indicate the system-identified segmental units; and
  
  system generation of synthesized speech modulated according to the annotations in the text;
  
  wherein the system annotating of the text comprises representing intonation contours on a pitch change scale encompassing the pitch range of the speech to be synthesized or recognized.
- Dependent claims (16):
  - 16. A method according to claim 15 wherein the system annotating of the text comprises identifying or applying intonational prominence according to whether a word is an operative or a non-operative word, a content or a function word, a monosyllabic or a polysyllabic word, or whether the syllable carries a primary, a secondary, or no lexical stress, and whether a syllable precedes a syllable with higher pitch.
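
One way to picture the "pitch change scale encompassing the pitch range" of claim 15 is as a small set of discrete levels spread across a speaker's range. The sketch below is an assumption, not the patent's scale: it spaces the levels evenly in log-frequency, and the level count and the 100 to 400 Hz range in the usage example are arbitrary illustrative values. The function names are hypothetical.

```python
def level_to_hz(level, levels, low_hz, high_hz):
    """Map scale step `level` (0 .. levels-1) to a frequency in Hz.

    Steps are evenly spaced in log-frequency, so each step multiplies
    the frequency by the same ratio.
    """
    if levels < 2 or not 0 <= level < levels:
        raise ValueError("level must lie within a scale of at least 2 steps")
    if low_hz <= 0 or high_hz <= low_hz:
        raise ValueError("pitch range must satisfy 0 < low_hz < high_hz")
    ratio = high_hz / low_hz
    return low_hz * ratio ** (level / (levels - 1))


def contour_to_hz(contour, levels, low_hz, high_hz):
    """Convert a contour given as a list of scale steps into Hz targets."""
    return [level_to_hz(step, levels, low_hz, high_hz) for step in contour]


if __name__ == "__main__":
    # A rise-fall contour on a hypothetical 5-step scale spanning 100-400 Hz.
    print(contour_to_hz([1, 3, 4, 2, 0], levels=5, low_hz=100.0, high_hz=400.0))
```

Log spacing is chosen here because perceived pitch intervals track frequency ratios rather than differences; a linear scale would be an equally valid assumption for a sketch.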

17. A computer-implemented rule-based method of recognizing speech, the method comprising:
- inputting uttered speech to be recognized to a computerized system;
  
  system comparison of the uttered speech with acoustic units corresponding with annotated text to facilitate identification of text units corresponding with the uttered speech wherein the annotated text comprises the product of system identifying of segmental units and suprasegmental units in the text and system annotating of the text to indicate the system-identified segmental units; and
  
  outputting text recognized as corresponding with the uttered speech.
- Dependent claims (18, 19, 20):
  - 18. A method according to claim 17 comprising identifying prosody-related acoustic features of a speech unit in the uttered speech input and employing the prosody-related acoustic features to facilitate identification of a corresponding text unit.
  - 19. A method according to claim 17 comprising system annotating each text unit, with coding or markings, to show the identified prosody-related acoustic feature visually in the output text, employing a rules-based method of annotation to relate the annotations to the prosody-related acoustic features and/or to facilitate identification of a corresponding text unit.
  - 20. A method according to claim 17 wherein the rules comprise rules to indicate an appropriate prosodic modulation of the lexical pronunciation of a syllable, word or other speech unit according to the context of the speech unit in a discourse containing the speech unit.

Current Assignee: Lessac Technologies, Inc.
Original Assignee: Lessac Technologies, Inc.
Inventors: Nitisaroj, Rattima; Marple, Gary; Chandra, Nishant
Primary Examiner(s): Abebe, Daniel D
Application Number: US12/188,763
Publication Number: US 20090048843A1
Time in Patent Office: 1,369 Days
Field of Search: 704/251, 704/258, 704/260
US Class Current: 704/260
CPC Class Codes: G10L 13/10 Prosody rules derived from ...; G10L 15/1807 using prosody or stress