SYSTEM-EFFECTED TEXT ANNOTATION FOR EXPRESSIVE PROSODY IN SPEECH SYNTHESIS AND RECOGNITION

US 20090048843A1
Filed: 08/08/2008
Published: 02/19/2009
Est. Priority Date: 08/08/2007
Status: Active Grant

Abstract

The inventive system can automatically annotate the relationship of text and acoustic units for the purposes of: (a) predicting how the text is to be pronounced as expressively synthesized speech, and (b) improving the proportion of expressively uttered speech as correctly identified text representing the speaker's message. The system can automatically annotate text corpora for relationships of uttered speech for a particular speaking style and for acoustic units in terms of context and content of the text to the utterances. The inventive system can use kinesthetically defined expressive speech production phonetics that are recognizable and controllable according to kinesensic feedback principles. In speech synthesis embodiments of the invention, the text annotations can specify how the text is to be expressively pronounced as synthesized speech. Also, acoustically-identifying features for dialects or mispronunciations can be identified so as to expressively synthesize alternative dialects or stylistic mispronunciations for a speaker from a given text. In speech recognition embodiments of the invention, each text annotation can be uniquely identified from the corresponding acoustic features of a unit of uttered speech to correctly identify the corresponding text. By employing a method of rules-based text annotation, the invention enables expressiveness to be altered to reflect syntactic, semantic, and/or discourse circumstances found in text to be synthesized or in an uttered message.

20 Claims

1. A computer-implemented rule-based method of synthesizing speech from text, the method comprising:
- inputting text to be machine spoken to a computerized system;
  
  system identifying of segmental units and suprasegmental units in the text;
  
  system annotating of the text to indicate the system-identified segmental units; and
  
  system generation of synthesized speech modulated according to the annotations in the text;
  
  or system comparison of uttered speech with acoustic units corresponding with the annotated text to facilitate identification of text units corresponding with the uttered speech and outputting text recognized as corresponding with the uttered speech.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17):
  - 2. A method according to claim 1 wherein the system annotating of the text comprises identifying of discourse-givenness, contrastiveness, and/or cue phrase lookups to identify and annotate text with discourse prominence or discourse non-prominence.
  - 3. A method according to claim 1 wherein the system annotating of the text comprises dividing of a text sentence into groups of meanings and indicating the locations of long and short pauses, employing, optionally, from first to last priority, punctuations, phrase length, high-level syntactic constituency, and balance.
  - 4. A method according to claim 1 wherein the system annotating of the text comprises identifying, for each phrase in the text, a word, optionally an operative word, that introduces a new idea that carries the argument forward as the sentences progress, the method employing discourse, semantic, and/or syntactic properties of relevant words.
  - 5. A method according to claim 1 wherein the system annotating of the text comprises identifying of consonants as sustainable or percussive, providing linking information, optionally direct linking, play and link, and/or prepare and link information, and adding connectives for vowels that are followed by another vowel.
  - 6. A method according to claim 1 wherein the system annotating of the text comprises representing intonation contours on a multi-level scale encompassing the pitch range of the speech to be synthesized or recognized.
  - 7. A method according to claim 1 wherein the system annotating of the text comprises identifying or applying intonational prominence according to whether a word is an operative or a non-operative word, a content or a function word, a monosyllabic or a polysyllabic word, or whether the syllable carries a primary, a secondary, or no lexical stress, and whether a syllable precedes a syllable with higher pitch.
  - 8. A method according to claim 1 wherein the system annotating of the text comprises identifying inflections or pitch movements of syllables based on their positions within a phrase.
  - 9. A method according to claim 1 comprising system analysis of text to be spoken to select syllables, words or other text units to receive an expressive prosody and applying the expressive prosody to the selected syllables or words.
  - 10. A method according to claim 9 wherein the system analysis of the text to be spoken comprises a linguistic analysis operable with or without semantic considerations and, optionally, wherein the text to be spoken comprises one or more discourses.
  - 11. A method according to claim 10 comprising system linguistic analysis identifying and annotating operative words in a discourse for acoustic modulation of the lexical pronunciations of the operative words, optionally by applying a suitable intonation or intonations to give the operative words prominence.
  - 12. A method according to claim 1 wherein system annotating of the text comprises employing grapheme-phoneme pairs to identify relationships between text units and corresponding acoustic units wherein each grapheme-phoneme pair comprises a visible prosodic-indicating grapheme corresponding to the text to be pronounced and corresponding phonemes, the phonemes being functional in the digital domain wherein, optionally, each grapheme pair comprises a single grapheme and a single phoneme.
  - 17. A computerized system comprising software to implement the method of claim 1.
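
As an illustration of the phrasing step recited in claim 3, the following is a minimal sketch and not the patented implementation. It applies only the first two priorities named in the claim (punctuation, then phrase length) and ignores syntactic constituency and balance. The function name `annotate_pauses`, the `|` and `||` markers, and the `max_words` threshold are all hypothetical choices made for this example.

```python
# Hypothetical pause markers; the patent does not specify a notation.
SHORT_PAUSE = "|"
LONG_PAUSE = "||"


def annotate_pauses(sentence: str, max_words: int = 6) -> str:
    """Insert pause markers into a sentence using two prioritized rules.

    Rule 1 (higher priority): punctuation. Sentence-final marks get a long
    pause; commas, semicolons, and colons get a short pause.
    Rule 2 (lower priority): phrase length. If max_words words pass without
    a pause, a short pause is inserted (never after the final word).
    """
    tokens = sentence.split()
    annotated = []
    words_since_pause = 0
    for index, token in enumerate(tokens):
        annotated.append(token)
        words_since_pause += 1
        is_last = index == len(tokens) - 1
        if token[-1] in ".?!":
            annotated.append(LONG_PAUSE)
            words_since_pause = 0
        elif token[-1] in ",;:":
            annotated.append(SHORT_PAUSE)
            words_since_pause = 0
        elif words_since_pause >= max_words and not is_last:
            annotated.append(SHORT_PAUSE)
            words_since_pause = 0
    return " ".join(annotated)


print(annotate_pauses("When the rain stopped, we walked home."))
```

A fuller treatment would need abbreviation handling (a token such as "Mr." would wrongly trigger a long pause here) and the syntactic and balance criteria that the claim lists after phrase length.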

13. A computer-implemented rule-based method of recognizing speech, the method comprising:
- inputting uttered speech to be recognized to a computerized system;
  
  system comparison of the uttered speech with acoustic units corresponding with annotated text to facilitate identification of text units corresponding with the uttered speech wherein the annotated text comprises the product of system identifying of segmental units and suprasegmental units in the text and system annotating of the text to indicate the system-identified segmental units; and
  
  outputting text recognized as corresponding with the uttered speech.
- Dependent claims (14, 15, 16):
  - 14. A method according to claim 13 comprising identifying prosody-related acoustic features of a speech unit in the uttered speech input and employing the prosody-related acoustic features to facilitate identification of a corresponding text unit.
  - 15. A method according to claim 13 comprising system annotating each text unit, optionally with coding or markings, to show the identified prosody-related acoustic feature visually in the output text, optionally employing a rules-based method of annotation to relate the annotations to the prosody-related acoustic features and/or to facilitate identification of a corresponding text unit.
  - 16. A method according to claim 13 wherein the rules comprise rules to indicate an appropriate prosodic modulation of the lexical pronunciation of a syllable, word or other speech unit according to the context of the speech unit in a discourse containing the speech unit.

18. A computerized system for synthesizing speech from text received into the system, the system comprising:
- a discourse engine configured to identify segmental units and suprasegmental units in the text and to annotate the text to indicate the system-identified segmental units; and
  
  a speech recognition module configured to compare uttered speech with acoustic units corresponding with the annotated text and configured to identify text units corresponding with the uttered speech and to output text recognized as corresponding with the uttered speech.
- Dependent claims (19, 20):
  - 19. A computerized system according to claim 18 wherein the discourse engine is configured to identify discourse-givenness, contrastiveness, and/or cue phrase lookups and to identify and annotate text with discourse prominence or discourse non-prominence.
  - 20. A computerized system according to claim 18 comprising one or more components selected from the group consisting of:
    - a pronunciation dictionary comprising consonant and vowel units and lexical stress;
      
      a phrasing module configured to divide a text sentence into groups of meanings and indicating the locations of long and short pauses, employing, optionally, from first to last priority, punctuations, phrase length, high-level syntactic constituency, and balance;
      
      an operative word engine configured to identify, for each phrase in the text, a word, optionally an operative word, that introduces a new idea that carries the argument forward as the sentences progress, by reference to discourse, semantic, and/or syntactic properties of relevant words;
      
      a link-and-play engine configured to identify consonants as sustainable or percussive, to provide linking information, optionally direct linking, play and link, and/or prepare and link information, and to add connectives for vowels that are followed by another vowel;
      
      an intonation engine configured to represent intonation contours on a multi-level scale encompassing the pitch range of the speech to be synthesized or recognized and optionally to identify or apply intonational prominence according to whether a word is an operative or a non-operative word, a content or a function word, a monosyllabic or a polysyllabic word, or whether the syllable carries a primary, a secondary, or no lexical stress, and whether a syllable precedes a syllable with higher pitch; and
      
      an inflection module configured to identify inflections or pitch movements of syllables based on their positions within a phrase.
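
The intonation engine in claim 20 represents contours "on a multi-level scale encompassing the pitch range" of the speech. One way to picture such a scale is sketched below, under stated assumptions: the number of levels (five), the logarithmic spacing, and the function name `pitch_to_level` are illustrative choices and are not taken from the patent.

```python
import math


def pitch_to_level(f0_hz: float, low_hz: float, high_hz: float, levels: int = 5) -> int:
    """Map a fundamental frequency to an integer level from 1 to `levels`.

    The scale spans the speaker's pitch range [low_hz, high_hz]. Spacing is
    logarithmic in frequency (an assumption of this sketch), and values
    outside the range are clamped to the nearest end level.
    """
    if not 0 < low_hz < high_hz:
        raise ValueError("require 0 < low_hz < high_hz")
    if levels < 2:
        raise ValueError("require at least two levels")
    clamped = min(max(f0_hz, low_hz), high_hz)
    # Relative position within the range, from 0.0 (bottom) to 1.0 (top).
    position = math.log(clamped / low_hz) / math.log(high_hz / low_hz)
    return 1 + round(position * (levels - 1))


# A rising contour expressed as levels for a hypothetical 100-400 Hz range.
contour = [pitch_to_level(f0, 100.0, 400.0) for f0 in (100.0, 200.0, 400.0)]
print(contour)
```

Expressing each syllable's pitch as a level rather than a raw frequency makes the annotation independent of any one speaker's range, which is the practical appeal of a range-relative scale.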

Current Assignee
Lessac Technologies, Inc.
Original Assignee
Lessac Technologies, Inc.
Inventors
Chandra, Nishant; Nitisaroj, Rattima; Marple, Gary

Granted Patent

US 8,175,879 B2
US Class Current

704/260
CPC Class Codes

G10L 13/10 Prosody rules derived from ...

G10L 15/1807 using prosody or stress
