Text to speech
Abstract
A preferred embodiment of the method for converting text to speech using a computing device having a memory is disclosed. The inventive method comprises examining a text to be spoken to an audience for a specific communications purpose, followed by marking up the text according to a phonetic markup system such as the Lessac System pronunciation rules notations. A set of rules to control a text-to-speech generator based on speech principles, such as Lessac principles, is then implemented. Such rules are of the type normally implemented on prior art text-to-speech engines, and they control the operation of the software and the characteristics of the speech generated by a computer using the software. A computer is used to speak the marked-up text expressively. This step is repeated using alternative pronunciations of the selected style of expression, in which the tonal, structural, and consonant energies each have a different balance in the speech. Trained speech practitioners listen to the speech generated by the computer, which is then evaluated for consistency with style criteria and/or expressiveness. An audience is then assembled, and the speech generated by the computer is played back to the audience. Audience comprehension of the speech generated by the computer is evaluated and correlated to a particular implemented rule or rules, and those rules that resulted in relatively high audience comprehension are selected.
264 Citations
20 Claims
1. A method for converting text to speech using a computing device having memory, comprising:
(a) receiving text into said memory of said computing device;
(b) applying a set of lexical parsing rules to parse said text into a plurality of components;
(c) associating pronunciation and meaning information with said components;
(d) applying a set of phrase parsing rules to generate marked up text;
(e) phonetically parsing said marked up text using phonetic parsing rules;
(f) parsing said marked up text using Lessac expressive parsing rules;
(g) storing a plurality of sounds in memory, each of said sounds being associated with said pronunciation information; and
(h) recalling the sounds associated with said text to generate a raw speech signal from said marked up text after said parsing using phonetic and expressive parsing rules.
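The pipeline in claim 1 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration: the two-word `LEXICON`, the numeric `SOUND_STORE`, and the rule in `expressive_parse` (which simply stresses phrase-final words) are hypothetical stand-ins, not the Lessac rules the claim refers to. Each function is labeled with the claim step it loosely corresponds to.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexicon: word -> (phonemes, coarse "meaning" tag).
LEXICON = {
    "hello": (["HH", "AH", "L", "OW"], "interjection"),
    "world": (["W", "ER", "L", "D"], "noun"),
}

# Hypothetical stored sounds (step g): phoneme -> short list of samples.
SOUND_STORE = {p: [float(i)] * 2
               for i, p in enumerate(["HH", "AH", "L", "OW", "W", "ER", "D"])}

PUNCT = {".", ",", "!", "?"}


@dataclass
class Component:
    text: str
    phonemes: list = field(default_factory=list)
    meaning: str = "unknown"
    phrase_final: bool = False
    stressed: bool = False


def lexical_parse(text):
    """(b) Split raw text into word and punctuation components."""
    return [Component(t) for t in re.findall(r"[A-Za-z']+|[.,!?]", text)]


def associate(components):
    """(c) Attach pronunciation and meaning information from the lexicon."""
    for c in components:
        phonemes, meaning = LEXICON.get(c.text.lower(), ([], "unknown"))
        c.phonemes, c.meaning = list(phonemes), meaning
    return components


def phrase_parse(components):
    """(d) Mark up words that end a phrase; punctuation is consumed here."""
    words = []
    for c in components:
        if c.text in PUNCT:
            if words:
                words[-1].phrase_final = True
        else:
            words.append(c)
    if words:
        words[-1].phrase_final = True
    return words


def phonetic_parse(words):
    """(e) Keep only phonemes for which a stored sound exists."""
    for w in words:
        w.phonemes = [p for p in w.phonemes if p in SOUND_STORE]
    return words


def expressive_parse(words):
    """(f) Placeholder for expressive rules: stress phrase-final words."""
    for w in words:
        w.stressed = w.phrase_final
    return words


def synthesize(words, stress_gain=1.5):
    """(h) Recall stored sounds and concatenate them into a raw signal."""
    signal = []
    for w in words:
        gain = stress_gain if w.stressed else 1.0
        for p in w.phonemes:
            signal.extend(s * gain for s in SOUND_STORE[p])
    return signal


def text_to_speech(text):
    words = phrase_parse(associate(lexical_parse(text)))
    return synthesize(expressive_parse(phonetic_parse(words)))


if __name__ == "__main__":
    print(text_to_speech("hello world"))
```

Calling `text_to_speech("hello world")` returns the concatenated samples for both words, with the phrase-final word scaled by the hypothetical `stress_gain`. A real system would replace the sample lists with recorded or generated audio units.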
3. A method for converting text to speech using a computing device having a memory, comprising:
(a) receiving a text comprising a plurality of words into said memory of said computing device;
(b) deriving a plurality of phonemes from said text;
(c) associating with each of said phonemes a prosody record based on a database of prosody records associated with a plurality of words;
(d) applying a first set of artificial intelligence rules to determine context information associated with said text;
(e) for each of said phonemes:
(i) determining context influenced prosody changes;
(ii) applying a second set of rules based on Lessac theory to determine Lessac derived prosody changes;
(iii) amending the prosody record in response to said context influenced prosody changes and said Lessac derived prosody changes;
(iv) reading from said memory sound information associated with said phonemes;
(v) amending said sound information based on the prosody record as amended in response to said context influenced prosody changes and said Lessac derived prosody changes to generate amended sound information; and
(f) outputting said sound information to generate a speech signal.
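Claim 3 centers on a per-phoneme prosody record that is amended twice, once for context and once by Lessac-derived rules, before the stored sound is reshaped. The sketch below assumes a toy `PROSODY_DB`, single-frame `SOUNDS`, a question-intonation rule standing in for the claimed artificial intelligence rules, and a vowel-sustaining rule standing in for the Lessac rules; none of these specifics come from the patent.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prosody:
    pitch_hz: float
    duration_ms: float
    amplitude: float


# Hypothetical database (step c): word -> (phonemes, per-phoneme prosody).
PROSODY_DB = {
    "ready": (["R", "EH", "D", "IY"],
              [Prosody(120, 60, 1.0), Prosody(120, 80, 1.0),
               Prosody(115, 50, 1.0), Prosody(110, 80, 1.0)]),
}
VOWELS = {"EH", "IY"}
# Hypothetical stored sound information: phoneme -> 10 ms frame amplitudes.
SOUNDS = {"R": [0.2], "EH": [0.8], "D": [0.3], "IY": [0.7]}
FRAME_MS = 10


def context_change(is_last, is_question):
    """(e)(i) Stand-in context rule: a question rises in pitch at the end."""
    return {"pitch_scale": 1.2} if (is_last and is_question) else {}


def lessac_change(phoneme):
    """(e)(ii) Placeholder for a Lessac-derived rule: sustain vowels."""
    return {"duration_scale": 1.25} if phoneme in VOWELS else {}


def amend(rec, *changes):
    """(e)(iii) Apply every requested scale factor to the prosody record."""
    for ch in changes:
        rec = replace(rec,
                      pitch_hz=rec.pitch_hz * ch.get("pitch_scale", 1.0),
                      duration_ms=rec.duration_ms * ch.get("duration_scale", 1.0))
    return rec


def amend_sound(frames, rec):
    """(e)(v) Stretch the stored frames to the new duration and scale them."""
    n = max(1, round(rec.duration_ms / FRAME_MS))
    return {"pitch_hz": rec.pitch_hz,
            "frames": [frames[i % len(frames)] * rec.amplitude for i in range(n)]}


def speak(text):
    words = [w.strip("?.!,").lower() for w in text.split()]
    is_question = text.rstrip().endswith("?")            # (d) context
    units = [(p, r) for w in words for p, r in zip(*PROSODY_DB[w])]  # (b), (c)
    out = []
    for i, (phoneme, rec) in enumerate(units):            # (e)
        rec = amend(rec,
                    context_change(i == len(units) - 1, is_question),
                    lessac_change(phoneme))
        out.append(amend_sound(SOUNDS[phoneme], rec))     # (iv), (v)
    return out                                            # (f)
```

Keeping the record immutable and applying changes through `amend` keeps the context and Lessac adjustments independent, so either rule set can be swapped without touching the sound-shaping step.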
20. A method, comprising:
(a) examining a text to be spoken to an audience for a specific communications purpose;
(b) marking up said text according to a phonetic markup system such as the Lessac System pronunciation rules notations;
(c) implementing a set of rules to control a text-to-speech generator based on speech principles, such as Lessac principles;
(d) using a computer to speak said marked-up text expressively;
(e) repeating the step of using a computer to speak said marked-up text expressively, using alternative pronunciations of the selected style of expression in which the tonal, structural, and consonant energies each have a different balance in the speech;
(f) listening to said spoken speech generated by said computer;
(g) evaluating said spoken speech generated by said computer for consistency with style criteria and/or expressiveness;
(h) assembling an audience;
(i) playing back said spoken speech generated by said computer to said audience;
(j) evaluating comprehension by said audience of spoken speech generated by said computer correlated to a particular implemented rule or rules; and
(k) selecting out those rules which resulted in relatively high audience comprehension.
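The selection step at the end of claim 20 can be sketched as simple bookkeeping. This assumes each playback records which rules were active and a comprehension score per listener on a 0 to 1 scale, and it treats "relatively high" as "above the mean of all scores plus an optional margin"; the claim defines neither the scoring scheme nor the threshold, so both, along with the rule names in the example, are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def select_rules(trials, margin=0.0):
    """Return rules whose mean comprehension beats the overall mean by margin.

    trials: list of (rules_used, scores), where rules_used is the set of
    hypothetical rule names active in one playback and scores are
    per-listener comprehension scores in [0, 1].
    """
    per_rule = defaultdict(list)
    all_scores = []
    for rules_used, scores in trials:
        trial_mean = mean(scores)
        all_scores.extend(scores)
        for rule in rules_used:
            # Credit the whole playback's comprehension to each active rule.
            per_rule[rule].append(trial_mean)
    baseline = mean(all_scores)
    return sorted(r for r, s in per_rule.items() if mean(s) > baseline + margin)


if __name__ == "__main__":
    trials = [
        ({"sustain_vowels"}, [0.9, 0.8]),
        ({"clip_consonants"}, [0.4, 0.5]),
        ({"sustain_vowels", "rising_questions"}, [0.85, 0.75]),
    ]
    print(select_rules(trials))
```

Because rules that always appear together cannot be separated this way, a real evaluation would vary one rule at a time or use a factorial design.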
Specification