Text to speech

US 20030163316A1
Filed: 12/31/2002
Published: 08/28/2003
Est. Priority Date: 04/21/2000
Status: Active Grant

Abstract

A preferred embodiment of a method for converting text to speech using a computing device having a memory is disclosed. The inventive method comprises examining a text to be spoken to an audience for a specific communications purpose, followed by marking up the text according to a phonetic markup system such as the Lessac System pronunciation rules notations. A set of rules is implemented to control a speech to text generator based on speech principles, such as Lessac principles. Such rules are of the type normally implemented on prior art text-to-speech engines, and control the operation of the software and the characteristics of the speech generated by a computer using the software. A computer is used to speak the marked-up text expressively. The step of using a computer to speak the marked-up text expressively is repeated using alternative pronunciations of the selected style of expression, in which the tonal, structural, and consonant energies each have a different balance in the speech; these alternatives are also spoken to trained speech practitioners, who listen to the spoken speech generated by the computer. The spoken speech generated by the computer is then evaluated for consistency with style criteria and/or expressiveness. An audience is then assembled and the spoken speech generated by the computer is played back to the audience. Audience comprehension of spoken speech generated by the computer is evaluated and correlated to a particular implemented rule or rules, and those rules which resulted in relatively high audience comprehension are selected.

264 Citations

20 Claims

1. A method for converting text to speech using a computing device having memory, comprising:
- (a) receiving text into said memory of said computing device;
  
  (b) applying a set of the lexical parsing rules to parse said text into a plurality of components;
  
  (c) associating pronunciation, and meaning information with said components;
  
  (d) applying a set of phrase parsing rules to generate marked up text;
  
  (e) phonetically parsing said marked up text using phonetic parsing rules;
  
  (f) parsing said marked up text using Lessac expressive parsing rules; and
  
  (g) storing a plurality of sounds in memory, each of said sounds being associated with said pronunciation information; and
  
  (h) recalling the sounds associated with said text to generate a raw speech signal from said marked up text after said parsing using phonetic and expressive parsing rules.
- View Dependent Claims (2)
- - 2. A method as in claim 1, further comprising:
    - (h) filtering said raw speech signal to generate an output speech signal.
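
The staged flow recited in claim 1 can be pictured as a chain of small functions. The following is only a minimal sketch under stated assumptions: the lexicon, the sound bank, all function names, and the phrase-final lengthening rule are hypothetical placeholders invented for illustration. In particular, the "expressive" stage is a stand-in and does not reproduce any actual Lessac rule.

```python
import re

# Hypothetical toy data; none of these tables come from the patent.
LEXICON = {
    "hello": {"pron": ["HH", "EH", "L", "OW"], "meaning": "greeting"},
    "world": {"pron": ["W", "ER", "L", "D"], "meaning": "noun"},
}
# Step (g): stored "sounds", here just labelled placeholders keyed by phoneme.
SOUND_BANK = {p: f"<{p}>" for entry in LEXICON.values() for p in entry["pron"]}


def lexical_parse(text):
    """Steps (b)-(c): split text into components, attach pronunciation and meaning."""
    words = re.findall(r"[a-z']+", text.lower())
    return [{"word": w, **LEXICON.get(w, {"pron": [], "meaning": None})} for w in words]


def phrase_parse(components):
    """Step (d): produce marked-up text; here one phrase with a final-word flag."""
    last = len(components) - 1
    return [dict(c, phrase_final=(i == last)) for i, c in enumerate(components)]


def phonetic_parse(marked):
    """Step (e): flatten to a phoneme sequence, carrying the markup along."""
    return [{"phoneme": p, "phrase_final": c["phrase_final"]}
            for c in marked for p in c["pron"]]


def expressive_parse(phonemes):
    """Step (f): stand-in expressive rule; lengthen phonemes of the final word."""
    return [dict(p, length=1.5 if p["phrase_final"] else 1.0) for p in phonemes]


def synthesize(text):
    """Steps (a)-(h): run every stage, then recall the stored sounds in order."""
    units = expressive_parse(phonetic_parse(phrase_parse(lexical_parse(text))))
    return [(SOUND_BANK[u["phoneme"]], u["length"]) for u in units]
```

Calling `synthesize("Hello world")` in this sketch yields eight (sound, length) pairs; words missing from the toy lexicon contribute no sounds, which a real system would have to handle with letter-to-sound rules.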

3. A method for converting text to speech using a computing device having a memory, comprising:
- (a) receiving a text comprising a plurality of words into said memory of said computing device;
  
  (b) deriving a plurality of phonemes from said text;
  
  (c) associating with each of said phonemes a prosody record based on a database of prosody records associated with a plurality of words;
  
  (d) applying a first set of the artificial intelligence rules to determine context information associated with said text;
  
  (e) for each of said phonemes:
  
  (i) determining context influenced prosody changes;
  
  (ii) applying a second set of rules based on Lessac theory to determine Lessac derived prosody changes;
  
  (iii) amending the prosody record in response to said context influenced prosody changes and said Lessac derived prosody changes;
  
  (iv) reading from said memory sound information associated with said phonemes;
  
  (v) amending said sound information based on the prosody record as amended in response to said context influenced prosody changes and said Lessac derived prosody changes to generate amended sound information; and
  
  (f) outputting said sound information to generate a speech signal.
- View Dependent Claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
- - 4. A method for converting text to speech as in claim 3, wherein the prosody of said speech signal is varied whereby increased realism is achieved in said speech signal.
  - 5. A method for converting text to speech as in claim 3, wherein the prosody of said speech signal is varied in a manner which is random or which appears to be random, whereby increased realism is achieved in said speech signal.
  - 6. A method for converting text to speech as in claim 3, wherein said sound information is associated with different speakers, and a set of artificial intelligence rules are used to determine the identity of the speaker associated with the sound information that is to be output.
  - 7. A method of converting text to speech as in claim 3, wherein said amending of the prosody record in response to said context influenced prosody changes is based on the words in said text and their sequence.
  - 8. A method of converting text to speech as in claim 3, wherein said amending of the prosody record in response to said context influenced prosody changes is based on the emotional context of words in said text.
  - 9. A method for converting text to speech as in claim 8, wherein the prosody of said speech signal is varied whereby increased realism is achieved in said speech signal.
  - 10. A method for converting text to speech as in claim 9, wherein the prosody of said speech signal is varied in a manner which is random or which appears to be random, whereby increased realism is achieved in said speech signal.
  - 11. A method for converting text to speech as in claim 10, wherein said sound information is associated with different speakers, and a set of artificial intelligence rules are used to determine the identity of the speaker associated with the sound information that is to be output.
  - 12. A method of converting text to speech as in claim 11, wherein said amending of the prosody record in response to said context influenced prosody changes is based on the words in said text and their sequence.
  - 13. A method as in claim 12, further comprising filtering said speech signal to obtain a filtered amended sound information signal, said filtered amended sound information signal being output to generate a speech signal.
  - 14. A method as in claim 13, wherein said filtering of said amended sound information comprises introducing echo.
  - 15. A method as in claim 13, wherein said filtering of said speech signal comprises passing said amended sound information through an analog or digital resonant circuit wherein the resonance characteristics are keyed to vowel information.
  - 16. A method as in claim 13, wherein said filtering of said speech signal comprises damping said amended sound information.
  - 17. A method as in claim 12, further comprising filtering said speech signal by introducing echo, passing said amended sound information through an analog or digital resonant circuit wherein the resonance characteristics are keyed to vowel information, and damping said amended sound information.
  - 18. A method as in claim 3, further comprising filtering said speech signal by introducing echo, passing said amended sound information through an analog or digital resonant circuit wherein the resonance characteristics are keyed to vowel information, and damping said amended sound information.
  - 19. A method as in claim 3, further comprising adding background sound logically consistent with the context of said text in response to artificial intelligence rules operating on said text and/or in response to a human input.
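
Claim 3 step (e)(iii) and the filtering claims (13 to 18) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ProsodyRecord` fields, the additive way the two sets of changes are combined, the multiplicative pitch jitter standing in for the random variation of claim 5, and the simple feed-forward echo and exponential damping. The resonant circuit keyed to vowel information is not modelled.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProsodyRecord:
    # Hypothetical fields; the claims do not enumerate what a record holds.
    pitch_hz: float
    duration_ms: float
    amplitude: float


def amend_prosody(record, context_delta, lessac_delta, jitter=0.0, rng=None):
    """Combine context-influenced and Lessac-derived changes additively.

    jitter > 0 scales pitch by a random factor in [1 - jitter, 1 + jitter].
    """
    rng = rng or random.Random()
    scale = 1.0 + rng.uniform(-jitter, jitter)
    pitch = (record.pitch_hz
             + context_delta.get("pitch_hz", 0.0)
             + lessac_delta.get("pitch_hz", 0.0))
    duration = (record.duration_ms
                + context_delta.get("duration_ms", 0.0)
                + lessac_delta.get("duration_ms", 0.0))
    return replace(record, pitch_hz=pitch * scale, duration_ms=duration)


def add_echo(samples, delay, gain):
    """Feed-forward echo: y[n] = x[n] + gain * x[n - delay]."""
    return [x + (gain * samples[n - delay] if n >= delay else 0.0)
            for n, x in enumerate(samples)]


def damp(samples, decay):
    """Exponential damping: y[n] = x[n] * decay**n."""
    return [x * decay ** n for n, x in enumerate(samples)]
```

With `jitter=0.0` the amendment is deterministic, which keeps the sketch easy to check; passing a seeded `random.Random` would make the varied case reproducible as well.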

20. A method, comprising:
- (a) examining a text to be spoken to an audience for a specific communications purpose;
  
  (b) marking-up said text according to a phonetic markup system such as the Lessac System pronunciation rules notations;
  
  (c) implementing a set of rules to control a speech to text generator based on speech principles, such as Lessac principles;
  
  (d) using a computer to speak said marked-up text expressively;
  
  (e) repeating the step of using a computer to speak said marked-up text expressively using alternative pronunciations of the selected style of expression where each of the tonal, structural, and consonant energies, have a different balance in the speech, are also spoken;
  
  (f) listening to said spoken speech generated by said computer;
  
  (g) evaluating said spoken speech generated by said computer for consistency with style criteria and/or expressiveness;
  
  (h) assembling an audience;
  
  (i) playing back said spoken speech generated by said computer to said audience;
  
  (j) evaluating comprehension by said audience of spoken speech generated by said computer correlated to a particular implemented rule or rules; and
  
  (k) selecting out those rules which resulted in relatively high audience comprehension.
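
Steps (j) and (k) of claim 20 amount to correlating comprehension scores with the rules that were active during each playback and keeping the rules that score well. A minimal sketch follows, assuming each playback is recorded as a hypothetical pair of rule identifiers and a comprehension score between 0 and 1, and assuming a simple per-rule average with a fixed threshold. The claim itself does not specify how the correlation is computed.

```python
from collections import defaultdict
from statistics import mean


def select_rules(trials, threshold):
    """Return the rules whose mean comprehension score meets the threshold.

    trials: iterable of (rule_ids, comprehension_score) pairs, one per playback.
    """
    scores = defaultdict(list)
    for rule_ids, score in trials:
        for rule in rule_ids:
            scores[rule].append(score)
    averages = {rule: mean(vals) for rule, vals in scores.items()}
    return sorted(rule for rule, avg in averages.items() if avg >= threshold)


# Example: three playbacks, each tagged with the rules that were active.
trials = [({"r1", "r2"}, 1.0), ({"r2"}, 0.5), ({"r3"}, 0.25)]
selected = select_rules(trials, threshold=0.6)
```

Averaging per rule is the simplest possible attribution; when several rules are always active together, it cannot tell which of them is responsible for a high score.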

Current Assignee: Lessac Technologies Incorporated
Original Assignee: Lessac Technologies Incorporated
Inventors: Krebs, Nancy; Wilson, H. Donald; Marple, Gary; Handal, Anthony H.; Addison, Edwin R.

Granted Patent: US 6,865,533 B2
US Class Current: 704/260
CPC Class Codes:
G09B 19/04   Speaking with audible prese...
G09B 5/04   with audible presentation o...
G10L 13/10   Prosody rules derived from ...
G10L 15/063   Training
G10L 2015/0638   Interactive procedures
