Automated voice synthesis employing enhanced prosodic treatment of text, spelling of text and rate of annunciation

US 5,652,828 A
Filed: 03/01/1996
Issued: 07/29/1997
Est. Priority Date: 03/19/1993
Status: Expired due to Term



Abstract

Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.


29 Claims

1. A method of synthesizing human audible speech from restricted text having a predetermined information content and predetermined format characteristics, the method comprising the steps of:
- generating prosody indica for the restricted text as a function of the predetermined information content and predetermined format characteristics by performing the steps of;
  
  a) identifying major prosodic groupings within the restricted text by utilizing major demarcation features which are a function of the predetermined format characteristics to define the beginning and end of the major prosodic groupings;
  
  b) identifying prosodic subgroupings within the major prosodic groupings according to prosodic rules for analyzing the restricted text as a function of the predetermined information content for predetermined textual markers indicative of prosodically isolatible subgroupings not delineated by the major demarcations dividing the prosodic major groupings;
  
  c) identifying within the prosodic subgroupings prosodically separable subgroup components;
  
  d) generating prosodic indica which include salience signifiers, the salience signifiers controlling the salience of segments of the synthesized speech, the step of generating the prosodic indica including the steps of;
  
  (i) generating salience signifiers within the prosodic subgroupings in accordance with predetermined salience placement rules relating to the components of the subgroupings themselves;
  
  (ii) modifying the salience at the beginning and end of each prosodic subgroup; and
  
  (iii) modifying the salience at the beginning and end of each major prosodic grouping; and
  
  generating and outputting audible speech from the restricted text and prosodic indica.
- - 2. The method of claim 1, wherein the predetermined information content includes a carrier phrase including word strings that have a structuring purpose and information words;
    - wherein the step of identifying major prosodic groupings includes the step of identifying the carrier phrase.
  - 3. The method of claim 2, wherein the information words include names with prefixed titles and wherein the method further comprises the steps of:
    - increasing a speaking rate of the word strings that have a structuring purpose relative to a speaking rate of the information words.
  - 4. The method of claim 3, wherein the information words include names which include prefixed titles followed by a word of the name, the method further comprising the step of:
    - modifying the generated salience indicators to assign less salience to the prefixed title than the word following the prefixed title.
  - 5. The method of claim 4, wherein a first time speech is generated from a word it is assigned greater salience than when speech is subsequently generated from the same word.
  - 6. The method of claim 5, further comprising the steps of:
    - repeatedly outputting the audible speech corresponding to a first segment of text;
      
      decreasing a rate of annunciation of the first segment of text after a first number of successive repeats of the audible speech corresponding to the first segment of text.
  - 7. The method of claim 6, wherein the step of modifying the salience at the beginning and end of each prosodic subgroup includes the steps of:
    - modifying the generated salience signifiers to increase the salience at the beginning of each prosodic subgroup; and
      
      modifying the generated salience signifiers to decrease the salience at the end of each prosodic subgroup; and
      
      wherein the step of modifying the salience at the beginning and end of each major prosodic grouping includes the steps of;
      
      modifying the generated salience signifiers to increase the salience at the beginning of each major prosodic grouping; and
      
      modifying the generated salience signifiers to decrease the salience at the end of each prosodic subgroup.
  - 8. The method of claim 6, wherein each word of a name includes a plurality of letters, the method further comprising the steps of:
    - arranging the letters of a word of a name into groups; and
      
      generating indica of prosodic boundaries between the groups of letters to insert a slight pause between the groups of letters when audible speech is generated therefrom.
  - 9. The method of claim 8, further comprising the step of:
    - generating audible speech representing the spelling of the name following the generation of audible speech from the groups of letters.
  - 10. The method of claim 9, further comprising the steps of:
    - allowing users to obtain repeats of audible speech segments generated from text segments;
      
      changing the rate of annunciation of a first audible speech segment after a first number of successive repeats of the first audible speech segment for the first user;
      
      decreasing the rate of annunciation of a second audible speech segment generated from a second text segment for the first user after the first number of successive repeats of the first audible speech segment; and
      
      increasing the rate of annunciation for a third audible speech segment generated from a third text segment if the first user does not obtain repeats of the second audible speech segment.
  - 11. The method of claim 10, further comprising the step of:
    - adjusting the initial annunciation rate for subsequent users as a function of the number of consecutive prior users for whom the rate of annunciation has been altered.
  - 12. The method of claim 1, wherein the step identifying within the prosodic subgroupings prosodically separable subgroup components includes the steps of:
    - a) identifying predetermined textual indicators which mark divisions of text groupings around them;
      
      b) utilizing the predetermined textual indicators to separate the text within the prosodic subgrouping into units of nominal text which do not include said predetermined textual indicators; and
      
      c) identifying within the units of nominal text other indicators of textual groupings that are not predetermined textual indicators.
  - 13. The method of claim 12, further comprising the steps of:
    - repeatedly outputting the audible speech corresponding to a first segment of text;
      
      decreasing a rate of annunciation of the first segment of text after a first number of successive repeats of the audible speech corresponding to the first segment of text.
  - 14. The method of claim 13, wherein the prosodic indica are generated by a set of prosody rules with predetermined discourse constraints which are a function of the context of the synthesis of the restricted text;
    - and wherein the restricted text includes name and address information.
  - 15. The method of claim 14, wherein a major prosodic grouping is a sentence, a prosodic subgrouping is a name including a plurality of words, and a subgroup component is a word in a name.
  - 16. The method of claim 15, wherein the salience signifiers are indica of pitch.
  - 17. The method of claim 16, further comprising the step of:
    - arranging letters of a name into groups;
      
      generating indica of prosodic boundaries between the groups of letters.
  - 18. The method of claim 17, wherein the generated indica of prosodic boundaries between groups of letters results in the insertion of a slight pause between the groups of letters when audible speech is generated therefrom.
  - 19. The method of claim 18, further comprising the step of:
    - generating audible speech representing the spelling of the name following the generation of audible speech from the groups of letters.
  - 20. The method of claim 16, further comprising the step of:
    - generating audible speech representing the spelling of a name.
  - 21. The method of claim 1, wherein the audible speech is generated for a plurality of users, the method further comprising the steps of:
    - outputting at a first annunciation rate and to a first user, a first segment of audible speech corresponding to a first segment of text;
      
      repeatedly outputting to the first user the first segment of audible speech; and
      
      decreasing a rate of annunciation of the first segment of audible speech after a first number of successive repeats of the first segment of audible speech.
  - 22. The method of claim 21, further comprising the step of:
    - outputting the first segment of audible speech corresponding to the first segment of text to a second user at a second annunciation rate which is determined as a function of the number of times the first segment of audible speech was output to the first user.
  - 23. The method of claim 22, wherein the second annunciation rate is lower than the first annunciation rate.
  - 24. The method of claim 1, further comprising the steps of:
    - allowing users to obtain repeats of audible speech segments generated from text segments;
      
      changing the rate of annunciation of a first audible speech segment after a first number of successive repeats of the first audible speech segment for the first user;
      
      decreasing the rate of annunciation of a second audible speech segment generated from a second text segment for the first user after the first number of successive repeats of the first audible speech segment; and
      
      increasing the rate of annunciation for a third audible speech segment generated from a third text segment if the first user does not obtain repeats of the second audible speech segment.
  - 25. The method of claim 24, further comprising the step of:
    - adjusting the initial annunciation rate for subsequent users as a function of the number of consecutive prior users for whom the rate of annunciation has been altered.
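
The grouping and salience steps of claim 1 (with the increase-at-start, decrease-at-end directions of claim 7) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: lines of text stand in for major prosodic groupings, commas for the textual markers that delimit subgroupings, whitespace-separated words for subgroup components, and the names `annotate`, `Word`, `TITLES`, `SUBGROUP_STEP`, and `MAJOR_STEP` and all numeric values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical constants: the claims give no numeric salience values.
SUBGROUP_STEP = 1                   # change applied at subgroup edges, step d)(ii)
MAJOR_STEP = 2                      # change applied at major-grouping edges, step d)(iii)
TITLES = {"mr", "mrs", "ms", "dr"}  # assumed list of prefixed titles (cf. claim 4)


@dataclass
class Word:
    text: str
    salience: int = 0


def annotate(text: str) -> list[list[list[Word]]]:
    """Return major groupings -> subgroupings -> words, each word with a salience."""
    result = []
    for line in text.splitlines():                    # a) major groupings (assumed: one per line)
        major = []
        for chunk in line.split(","):                 # b) subgroupings (assumed marker: comma)
            words = [Word(w) for w in chunk.split()]  # c) subgroup components (words)
            if not words:
                continue
            # d)(i) placement rule that looks only at the components themselves:
            # a prefixed title gets less salience than an ordinary word.
            for w in words:
                is_title = w.text.lower().rstrip(".") in TITLES
                w.salience = 1 if is_title else 2
            # d)(ii) raise salience at the start of the subgroup, lower it at the end.
            words[0].salience += SUBGROUP_STEP
            words[-1].salience -= SUBGROUP_STEP
            major.append(words)
        if major:
            # d)(iii) raise salience at the start of the major grouping, lower it at the end.
            major[0][0].salience += MAJOR_STEP
            major[-1][-1].salience -= MAJOR_STEP
            result.append(major)
    return result


if __name__ == "__main__":
    for major in annotate("Dr. Jane Smith, 12 Main Street"):
        for subgroup in major:
            print([(w.text, w.salience) for w in subgroup])
```

For the input shown, the first subgroup comes out as `Dr.` 4, `Jane` 2, `Smith` 1 and the second as `12` 3, `Main` 2, `Street` -1. In a real system the salience values would be mapped to whatever the synthesizer accepts, for example pitch indicators as in claim 16.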

26. A method of synthesizing human audible speech from text including a predetermined information content and having predetermined format characteristics, the method comprising the steps of:
- generating prosody indica for the text as a function of the predetermined information content and predetermined format characteristics of the text by performing the steps of;
  
  a) identifying major prosodic groupings within the restricted text by utilizing major demarcation features which are a function of the predetermined format characteristics to define the beginning and end of the major prosodic groupings;
  
  b) identifying prosodic subgroupings within the major prosodic groupings according to prosodic rules for analyzing the restricted text as a function of the predetermined information content for predetermined textual markers indicative of prosodically isolatible subgroupings not delineated by the major demarcations dividing the prosodic major groupings;
  
  c) identifying within the prosodic subgroupings prosodically separable subgroup components, at least one subgroup component being a word in the name;
  
  d) generating prosodic indica which include salience signifiers, the salience signifiers controlling the salience of segments of the synthesized speech, the step of generating the prosodic indica including the steps of;
  
  (i) generating salience signifiers within the prosodic subgroupings in accordance with salience placement rules solely relating to the components of the subgroupings themselves;
  
  (ii) modifying the generated salience signifiers to increase the salience at the start of each prosodic subgroup and to further signify the salience at the end of each prosodic subgroup; and
  
  (iii) further modifying the salience signifiers to further increase the salience of the beginning of the major prosodic grouping and further signify the salience of the end of the major prosodic grouping.
- - 27. The method of claim 26, further comprising the steps of:
    - arranging letters of the name into groups;
      
      generating indica of prosodic boundaries between the groups of letters, the generated indica of prosodic boundaries between groups of letters resulting in the insertion of a slight pause between the groups of letters when audible speech is generated therefrom.
  - 28. The method of claim 27, wherein the audible speech is generated for a plurality of users, the method further comprising the steps of:
    - outputting to a first user at a first annunciation rate a first segment of audible speech corresponding to a first segment of text;
      
      repeatedly outputting to the first user the first segment of audible speech; and
      
      decreasing the rate of annunciation of the first segment of audible speech after a first number of successive repeats of the first segment of audible speech.
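
Claims 27 and 28 describe two simple behaviors that can be sketched together: spelling a name in letter groups separated by a slight pause, and slowing the rate of annunciation once a listener has asked for a set number of successive repeats. The sketch below is illustrative only. The group size, the `<pause>` token, the words-per-minute figures, and the names `spell_in_groups` and `RepeatRateController` are assumptions, not values or identifiers from the patent.

```python
def spell_in_groups(name: str, group_size: int = 3, pause: str = "<pause>") -> list[str]:
    """Spell a name letter by letter, inserting a pause marker between letter groups.

    The fixed group size is an assumption; the claim only requires that the
    letters be arranged into groups with a prosodic boundary between groups.
    """
    letters = [c.upper() for c in name if c.isalpha()]
    tokens: list[str] = []
    for i, letter in enumerate(letters):
        if i and i % group_size == 0:
            tokens.append(pause)  # prosodic boundary between two letter groups
        tokens.append(letter)
    return tokens


class RepeatRateController:
    """Lower the speaking rate after a number of successive repeat requests."""

    def __init__(self, rate_wpm: int = 180, repeats_before_slowdown: int = 2,
                 step_wpm: int = 30, min_rate_wpm: int = 100) -> None:
        # All numeric defaults are hypothetical.
        self.rate_wpm = rate_wpm
        self.repeats_before_slowdown = repeats_before_slowdown
        self.step_wpm = step_wpm
        self.min_rate_wpm = min_rate_wpm
        self._successive_repeats = 0

    def next_rate(self, is_repeat: bool) -> int:
        """Return the rate to use for the next output of a segment."""
        if is_repeat:
            self._successive_repeats += 1
            if self._successive_repeats >= self.repeats_before_slowdown:
                # Assumption: keep slowing on every further repeat, down to a floor.
                self.rate_wpm = max(self.min_rate_wpm, self.rate_wpm - self.step_wpm)
        else:
            # A fresh segment resets the repeat count but keeps the current rate.
            self._successive_repeats = 0
        return self.rate_wpm


if __name__ == "__main__":
    print(" ".join(spell_in_groups("Johnson")))
    controller = RepeatRateController()
    print([controller.next_rate(r) for r in (False, True, True, True)])
```

The controller deliberately keeps the slowed rate when the next segment starts, which matches the idea in claims 10 and 24 that a later segment is also spoken more slowly. Speeding back up when no repeats are requested is left out of this sketch.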

29. An apparatus for synthesizing human audible speech from a machine readable representation of restricted text having a predetermined information content and predetermined format characteristics, comprising:
- prosody preprocessor means for receiving the restricted text and for generating prosody indica by assigning the prosody indica on the basis of the predetermined informational content of the restricted text, means for;
  
  a) identifying major prosodic groupings by utilizing major demarcation features to define the beginning and end of the major prosodic groupings;
  
  b) identifying prosodic subgroupings within the major prosodic groupings according to prosodic rules for analyzing the text for predetermined textual markers indicative of prosodically isolatible subgroupings not delineated by the major demarcations dividing the prosodic major groupings;
  
  c) identifying within the prosodic subgroupings prosodically separable subgroup components; and
  
  d) generating prosodic indicia which include salience signifiers utilizable by the speech synthesizer means to vary the salience of segments of the synthesized speech such that;
  
  (i) the salience signifiers within the prosodic subgroupings are first generated in accordance with predetermined salience placement rules solely relating to the components themselves,
  
  (ii) thereafter the first generated salience signifiers are modified to increase the salience at the start of the prosodic subgroup and further signify the salience at the end of the prosodic subgroup, and
  
  (iii) the salience signifiers are subsequently further modified to further increase the salience of the beginning of the major prosodic grouping and further signify the salience of the end of the major prosodic grouping; and
  
  speech synthesizer means for synthesizing human audible speech from text, the speech synthesizer means including means for generating prosody indica on unrestricted text and for interpreting and executing prosody indica received from the prosody preprocessor means, the prosody indica from the prosody preprocessor means being used to override and supplement the prosody indica generated by the internal prosody indica generating means.
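
The last element of claim 29 has the preprocessor's prosody indicia "override and supplement" those the synthesizer generates internally for unrestricted text. One way to picture that relationship is a keyed merge, sketched below in Python. The representation (a mapping from word position to a prosody label), the toy internal rule, and the names `default_prosody` and `merge_prosody` are all assumptions for illustration; the claim does not specify any data structure.

```python
def default_prosody(words: list[str]) -> dict[int, str]:
    """Stand-in for the synthesizer's internal rules (hypothetical):
    accent every word longer than three letters."""
    return {i: "accent" for i, w in enumerate(words) if len(w) > 3}


def merge_prosody(internal: dict[int, str], preprocessor: dict[int, str]) -> dict[int, str]:
    """Combine the two sets of marks, letting the preprocessor win.

    Where both sources mark the same position, the preprocessor's mark
    overrides the internal one; where only the preprocessor has a mark,
    it supplements the internal set.
    """
    merged = dict(internal)
    merged.update(preprocessor)
    return merged


if __name__ == "__main__":
    words = ["the", "number", "is", "five", "five", "five"]
    internal = default_prosody(words)               # marks positions 1, 3, 4, 5
    preprocessor = {1: "deaccent", 2: "boundary"}   # override at 1, supplement at 2
    print(sorted(merge_prosody(internal, preprocessor).items()))
```

Here position 1 shows the override case and position 2 the supplement case, while the internal marks at positions 3 to 5 pass through unchanged.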


Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: NYNEX Science & Technology, Inc. (Verizon Communications Inc.)
Inventors: Silverman, Kim Ernest Alexander
Primary Examiner(s): HAFIZ, TARIQ R
Application Number: US08/641,480
Time in Patent Office: 515 Days
Field of Search: 395/2.1, 395/2.67, 395/2.69, 395/2.76
US Class Current: 704/260
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 13/08   Text analysis or generation...
G10L 13/10   Prosody rules derived from ...
