Method and apparatus for speech synthesis based on prosodic analysis

US 5,384,893 A
Filed: 09/23/1992
Issued: 01/24/1995
Est. Priority Date: 09/23/1992
Status: Expired due to Fees

First Claim

Patent Images

1. A system for synthesizing a speech signal from strings of words, comprising:

means for entering into the system strings of characters comprising words;

a first memory, wherein predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags;

parsing means, in communication with the entering means and the first memory, for grouping syntax tags of entered words into phrases according to a first set of predetermined grammatical rules relating the syntax tags to one another and for verifying the conformance of sequences of the phrases to a second set of predetermined grammatical rules relating the phrases to one another, wherein the sequences of the phrases correspond to the entered words;

first means, in communication with the parsing means, for retrieving from the first memory the phonetic transcriptions associated with the syntax tags grouped into phrases conforming to the second set of rules, for translating predetermined strings of entered characters into words, and for generating strings of phonetic transcriptions and prosody markers corresponding to respective strings of entered and translated words;

second means, in communication with the first means, for adding markers for rhythm and stress to the strings of phonetic transcriptions and prosody markers and for converting the strings of phonetic transcriptions and prosody markers into arrays having prosody information on a diphone-by-diphone basis;

a second memory, wherein predetermined diphone waveforms are stored; and

third means, in communication with the second means and the second memory, for retrieving diphone waveforms corresponding to the entered and translated words from the second memory, for adjusting the retrieved diphone waveforms based on the prosody information in the arrays, and for concatenating the adjusted diphone waveforms to form the speech signal.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A system for synthesizing a speech signal from strings of words, which are themselves strings of characters, includes a memory in which predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags. A parser accesses the memory and groups the syntax tags of the entered words into phrases according to a first set of predetermined grammatical rules relating the syntax tags to one another. The parser also verifies the conformance of sequences of the phrases to a second set of predetermined grammatical rules relating the phrases to one another. The system retrieves the phonetic transcriptions associated with the syntax tags that were grouped into phrases conforming to the second set of rules, and also translates predetermined strings of characters into words. The system generates strings of phonetic transcriptions and prosody markers corresponding to respective strings of the words, and adds markers for rhythm and stress to the strings, which are then converted into data arrays having prosody information on a diphone-by-diphone basis. Predetermined diphone waveforms are retrieved from memory that correspond to the entered words, and these retrieved waveforms are adjusted based on the prosody information in the arrays. The adjusted diphone waveforms, which may also be adjusted for coarticulation, are then concatenated to form the speech signal. Methods in a digital computer are also disclosed.

Citations

11 Claims

1. A system for synthesizing a speech signal from strings of words, comprising:
- means for entering into the system strings of characters comprising words;
  
  a first memory, wherein predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags;
  
  parsing means, in communication with the entering means and the first memory, for grouping syntax tags of entered words into phrases according to a first set of predetermined grammatical rules relating the syntax tags to one another and for verifying the conformance of sequences of the phrases to a second set of predetermined grammatical rules relating the phrases to one another, wherein the sequences of the phrases correspond to the entered words;
  
  first means, in communication with the parsing means, for retrieving from the first memory the phonetic transcriptions associated with the syntax tags grouped into phrases conforming to the second set of rules, for translating predetermined strings of entered characters into words, and for generating strings of phonetic transcriptions and prosody markers corresponding to respective strings of entered and translated words;
  
  second means, in communication with the first means, for adding markers for rhythm and stress to the strings of phonetic transcriptions and prosody markers and for converting the strings of phonetic transcriptions and prosody markers into arrays having prosody information on a diphone-by-diphone basis;
  
  a second memory, wherein predetermined diphone waveforms are stored; and
  
  third means, in communication with the second means and the second memory, for retrieving diphone waveforms corresponding to the entered and translated words from the second memory, for adjusting the retrieved diphone waveforms based on the prosody information in the arrays, and for concatenating the adjusted diphone waveforms to form the speech signal.
- View Dependent Claims (2, 3, 4, 5, 6)
- - 2. The synthesizing system of claim 1, wherein the first means interprets punctuation characters in the entered character strings as requiring various amounts of pausing, deduces differences between entered character strings having declarative, exclamatory, and interrogative punctuation characters, and places the deduced differences in the strings of phonetic transcriptions and prosody markers.
  - 3. The synthesizing system of claim 2, wherein the first means generates and places markers for starting and ending predetermined types of clauses in a synth log.
  - 4. The synthesizing system of claim 1, wherein the second means adds extra pauses after highly stressed entered words, adjusts duration before and stress following predetermined punctuation characters in the entered character strings, and adjusts rhythm by adding marks for more or less duration onto phonetic transcriptions corresponding to selected syllables of the entered words based on a stress pattern of the selected syllables.
  - 5. The synthesizing system of claim 1, wherein the third means adjusts the retrieved diphone waveforms for coarticulation.
  - 6. The synthesizing system of claim 1, wherein the parsing means verifies the conformance of a plurality of parallel sequences of phrases and phrase combinations to the second set of grammatical rules, each of the plurality of parallel sequences comprising a respective one of the sequences possible for the entered words.

7. In a computer, a method for synthesizing a speech signal by processing natural language sentences, each sentence having at least one word, comprising the steps of:
- entering a sentence;
  
  storing the entered sentence;
  
  finding syntax tags associated with the words of the stored entered sentence in a word dictionary;
  
  finding in a phrase table non-terminals associated with the syntax tags associated with the entered words as each word is entered;
  
  tracking, in parallel as the words are entered, a plurality of possible sequences of the found non-terminals;
  
  verifying the conformance of sequences of the found non-terminals to rules associated with predetermined sequences of non-terminals;
  
  retrieving, from the word dictionary, phonetic transcriptions associated with the syntax tags of the entered words of one of the sequences conforming to the rules;
  
  generating a string of phonetic transcriptions and prosody markers corresponding to the entered words of said one sequence;
  
  adding markers for rhythm and stress to the string of phonetic transcriptions and prosody markers and converting said string into arrays having prosody information on a diphone-by-diphone basis;
  
  retrieving, from a memory wherein predetermined diphone waveforms are stored, diphone waveforms corresponding to said string and the entered words of said one sequence;
  
  adjusting the retrieved diphone waveforms based on the prosody information in the arrays; and
  
  concatenating the adjusted diphone waveforms to form the speech signal.
- View Dependent Claims (8, 9, 10, 11)
- - 8. The synthesizing method of claim 7, wherein the generating step comprises the steps of interpreting punctuation characters in the entered sentence as requiring corresponding amounts of pausing, deducing differences between declarative, exclamatory, and interrogative sentences, and placing the deduced differences in the string of phonetic transcriptions and prosody markers.
  - 9. The synthesizing method of claim 8, wherein the generating step includes placing markers for starting and ending predetermined types of clauses in a synth log.
  - 10. The synthesizing method of claim 7, wherein the adding step comprises the steps of adding extra pauses after highly stressed entered words, adjusting duration before and stress following predetermined punctuation characters in the entered sentence, and adjusting rhythm by adding marks for more or less duration onto phonetic transcriptions corresponding to selected syllables of the entered words based on a stress pattern of the selected syllables.
  - 11. The synthesizing method of claim 7, wherein the adjusting step comprises adjusting the retrieved diphone waveforms for coarticulation.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Emerson & Stern Associates, Inc.
Original Assignee
Emerson & Stern Associates, Inc.
Inventors
Hutchins, Sandra E.
Primary Examiner(s)
MacDonald, Allen R.
Assistant Examiner(s)
Doerrler, Michelle

Application Number

US07/949,208
Time in Patent Office

853 Days
Field of Search

381/51-53, 395/2.67-2.78
US Class Current

704/267
CPC Class Codes

G10L 13/08 Text analysis or generation...

G10L 13/10 Prosody rules derived from ...

Method and apparatus for speech synthesis based on prosodic analysis

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

11 Claims

Specification

Solutions

Use Cases

Quick Links

Method and apparatus for speech synthesis based on prosodic analysis

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

11 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links