Text to speech
Abstract
A preferred embodiment of the method for converting text to speech using a computing device having a memory is disclosed. The inventive method comprises examining a text to be spoken to an audience for a specific communications purpose, followed by marking up the text according to a phonetic markup system such as the Lessac System pronunciation rules notations. A set of rules to control a speech to text generator based on speech principles, such as Lessac principles, is implemented. Such rules are of the type normally implemented on prior art text-to-speech engines, and they control the operation of the software and the characteristics of the speech generated by a computer using the software. A computer is used to speak the marked-up text expressively. This step is repeated using alternative pronunciations of the selected style of expression, in which the tonal, structural, and consonant energies each have a different balance in the speech. Trained speech practitioners listen to the spoken speech generated by the computer, which is then evaluated for consistency with style criteria and/or expressiveness. An audience is then assembled, and the spoken speech generated by the computer is played back to the audience. Audience comprehension of the spoken speech generated by the computer is evaluated and correlated to a particular implemented rule or rules, and those rules which resulted in relatively high audience comprehension are selected.
48 Claims
1. A method for converting text to speech using a computing device having memory, comprising:
(a) receiving text into said memory of said computing device;
(b) applying a set of lexical parsing rules to parse said text into a plurality of components;
(c) associating pronunciation and meaning information with said components;
(d) applying a set of phrase parsing rules to generate marked up text;
(e) phonetically parsing said marked up text using phonetic parsing rules;
(f) parsing said marked up text using Lessac expressive parsing rules;
(g) storing a plurality of sounds in memory, each of said sounds being associated with said pronunciation information; and
(h) recalling the sounds associated with said text to generate a raw speech signal from said marked up text after said parsing using phonetic and expressive parsing rules.
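The steps of claim 1 can be pictured as a linear pipeline from text to concatenated sound units. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the toy lexicon, the phrase-boundary mark, the phoneme-count "phonetic parse", the lengthening rule standing in for the Lessac expressive rules, and the 3-sample placeholder sound units are all hypothetical, because the claim does not specify any of them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexicon (word -> phonemes, coarse "meaning" tag).
# A real system would use a full pronunciation dictionary.
LEXICON = {
    "hello": (["HH", "AH", "L", "OW"], "greeting"),
    "world": (["W", "ER", "L", "D"], "noun"),
}

# Step (g): one stored sound unit per phoneme. Each unit here is a
# 3-sample placeholder rather than recorded audio.
PHONEMES = sorted({p for phones, _ in LEXICON.values() for p in phones})
SOUNDS = {p: [float(i)] * 3 for i, p in enumerate(PHONEMES)}


@dataclass
class Component:
    text: str
    phonemes: list
    meaning: str
    marks: dict = field(default_factory=dict)


def lexical_parse(text):
    """Step (b): split the text into word components."""
    return re.findall(r"[a-z']+", text.lower())


def annotate(words):
    """Step (c): attach pronunciation and meaning information."""
    return [Component(w, *LEXICON.get(w, ([], "unknown"))) for w in words]


def phrase_parse(components):
    """Step (d): mark up the text; here only the phrase-final word is marked."""
    if components:
        components[-1].marks["phrase_end"] = True
    return components


def phonetic_parse(components):
    """Step (e): record a phonetic property (phoneme count) per component."""
    for c in components:
        c.marks["n_phonemes"] = len(c.phonemes)
    return components


def expressive_parse(components):
    """Step (f): hypothetical stand-in for the Lessac expressive rules,
    which the claim does not spell out: lengthen phrase-final words."""
    for c in components:
        if c.marks.get("phrase_end") and c.phonemes:
            c.marks["lengthen_last"] = True
    return components


def synthesize(text):
    """Step (h): recall stored sounds and concatenate them into a raw signal."""
    components = expressive_parse(
        phonetic_parse(phrase_parse(annotate(lexical_parse(text)))))
    signal = []
    for c in components:
        for p in c.phonemes:
            signal.extend(SOUNDS[p])
        if c.marks.get("lengthen_last"):
            signal.extend(SOUNDS[c.phonemes[-1]])  # repeat the final unit once
    return signal


print(len(synthesize("Hello world")))  # 27
```

Keeping each step a separate function mirrors the claim's ordering and makes it easy to swap a placeholder rule for a real one without touching the rest of the pipeline.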
3. A method for converting text to speech using a computing device having a memory, comprising:
(a) receiving a text comprising a plurality of words into said memory of said computing device;
(b) deriving a plurality of phonemes from said text;
(c) associating with each of said phonemes a prosody record based on a database of prosody records associated with a plurality of words;
(d) applying a first set of artificial intelligence rules to determine context information associated with said text;
(e) for each of said phonemes:
(i) determining context influenced prosody changes;
(ii) applying a second set of rules based on Lessac theory to determine Lessac derived prosody changes;
(iii) amending the prosody record in response to said context influenced prosody changes and said Lessac derived prosody changes;
(iv) reading from said memory sound information associated with said phonemes;
(v) amending said sound information based on the prosody record as amended in response to said context influenced prosody changes and said Lessac derived prosody changes to generate amended sound information; and
(f) outputting said sound information to generate a speech signal.
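Claim 3 centers on a per-phoneme prosody record that is amended twice, once for context and once by Lessac-derived rules, before the stored sound is reshaped to match. A minimal sketch follows, assuming a prosody record of pitch, duration, and amplitude. The prosody database, the question-raises-pitch context rule (a crude stand-in for the claimed artificial intelligence rules), and the vowel-sustaining rule (a hypothetical stand-in for the unspecified Lessac rules) are all illustrative assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prosody:
    pitch_hz: float
    duration_ms: float
    amplitude: float


# Hypothetical word-level prosody database and pronunciations.
PROSODY_DB = {"ready": Prosody(120.0, 80.0, 1.0)}
PRONUNCIATIONS = {"ready": ["R", "EH", "D", "IY"]}
VOWELS = {"EH", "IY"}
# Placeholder stored sound per phoneme: 4 samples each.
SOUNDS = {p: [0.5, -0.5, 0.5, -0.5] for p in ["R", "EH", "D", "IY"]}


def context_changes(text):
    """Steps (d) and (e)(i): hypothetical context rule; a question
    raises pitch by 20 percent."""
    return {"pitch_scale": 1.2 if text.strip().endswith("?") else 1.0}


def lessac_changes(phoneme):
    """Step (e)(ii): hypothetical Lessac-style rule; sustain vowels by 50 percent."""
    return {"duration_scale": 1.5 if phoneme in VOWELS else 1.0}


def amend_record(record, ctx, lessac):
    """Step (e)(iii): apply both sets of changes to the prosody record."""
    return replace(record,
                   pitch_hz=record.pitch_hz * ctx["pitch_scale"],
                   duration_ms=record.duration_ms * lessac["duration_scale"])


def amend_sound(samples, original, amended):
    """Step (e)(v): stretch duration by nearest-neighbour resampling and
    apply gain. Pitch is only carried in the record; no pitch shift here."""
    n = max(1, round(len(samples) * amended.duration_ms / original.duration_ms))
    gain = amended.amplitude / original.amplitude
    return [samples[i * len(samples) // n] * gain for i in range(n)]


def speak(text):
    ctx = context_changes(text)
    signal, records = [], []
    for word in text.lower().strip("?.! ").split():
        base = PROSODY_DB[word]                  # step (c)
        for ph in PRONUNCIATIONS[word]:          # step (b)
            amended = amend_record(base, ctx, lessac_changes(ph))
            samples = SOUNDS[ph]                 # step (e)(iv)
            signal.extend(amend_sound(samples, base, amended))
            records.append(amended)
    return signal, records                       # step (f)


signal, records = speak("Ready?")
print(len(signal))  # 20
```

Freezing the record and using `dataclasses.replace` keeps the original database entry intact, so each amendment produces a new record rather than silently altering the shared baseline.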
20. A method, comprising:
(a) examining a text to be spoken to an audience for a specific communications purpose;
(b) marking-up said text according to a phonetic markup system, wherein the phonetic markup system comprises the Lessac System pronunciation rules notations;
(c) implementing a set of rules to control a speech to text generator based on speech principles, wherein the speech principles are Lessac principles;
(d) using a computer to speak said marked-up text expressively;
(e) repeating the step of using a computer to speak said marked-up text expressively, using alternative pronunciations of the selected style of expression in which the tonal, structural, and consonant energies each have a different balance in the speech;
(f) listening to said spoken speech generated by said computer;
(g) evaluating said spoken speech generated by said computer for consistency with style criteria and/or expressiveness;
(h) assembling an audience;
(i) playing back said spoken speech generated by said computer to said audience;
(j) evaluating comprehension by said audience of spoken speech generated by said computer correlated to a particular implemented rule or rules; and
(k) selecting out those rules providing high audience comprehension.
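Steps (j) and (k) of claim 20 amount to attributing comprehension results to the rules that were active during each playback and keeping the rules that score well. A minimal sketch, assuming comprehension is measured as a score in [0, 1] per playback and that "high" means a mean of at least a hypothetical threshold of 0.8; the rule identifiers and scores are illustrative, not data from the source.

```python
from collections import defaultdict
from statistics import mean


def select_rules(trials, threshold=0.8):
    """trials: iterable of (rules_used, comprehension_score) pairs, one per
    playback; rules_used is the set of rule ids active for that playback.
    Returns the sorted ids whose mean score meets the threshold, plus all means."""
    scores = defaultdict(list)
    for rules_used, score in trials:
        for rule in rules_used:
            scores[rule].append(score)  # credit every active rule with the score
    averages = {rule: mean(vals) for rule, vals in scores.items()}
    selected = sorted(rule for rule, avg in averages.items() if avg >= threshold)
    return selected, averages


# Hypothetical playback results.
trials = [({"R1", "R2"}, 0.9), ({"R1"}, 0.85), ({"R2", "R3"}, 0.6), ({"R3"}, 0.5)]
selected, averages = select_rules(trials)
print(selected)  # ['R1']
```

Crediting every active rule with a playback's full score is the simplest possible correlation. When rules always co-occur their effects cannot be separated this way, and a design that varies one rule at a time, or a regression over rule indicators, would be needed.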
21. A method for converting input text to a synthesized speech output using a computing device having memory, the method comprising:
(a) receiving the input text into the computing device memory;
(b) applying a set of lexical parsing rules to parse said text into a plurality of components;
(c) associating pronunciation and meaning information with said components;
(d) applying a set of phrase parsing rules to generate marked up text;
(e) phonetically parsing said marked up text using phonetic parsing rules and parsing said marked up text using Lessac expressive parsing rules;
(f) applying artificial intelligence to recognize the meaning of the text and to identify the emotional nature of the message to be communicated;
(g) employing a grapheme-to-phoneme database to instruct the computing device in appropriate pronunciation to reflect the identified emotion;
(h) storing a plurality of sounds in memory, each of said sounds being associated with said pronunciation information; and
(i) recalling the sounds associated with said text to generate a raw speech signal from said marked up text after said parsing using phonetic and expressive parsing rules.
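Claim 21 adds recognition of the emotional nature of the message and an emotion-dependent grapheme-to-phoneme lookup. A minimal sketch follows. The keyword-cue classifier is a deliberately crude stand-in for the claimed "artificial intelligence", and the database entries, cue words, and the lengthened-vowel "angry" pronunciation are all hypothetical.

```python
import re

# Hypothetical emotion-keyed grapheme-to-phoneme database: the same word may
# map to a different phoneme sequence depending on the identified emotion.
G2P_DB = {
    ("no", "neutral"): ["N", "OW1"],
    ("no", "angry"): ["N", "OW1", "OW1"],  # hypothetical lengthened vowel
}
# Hypothetical cue words; two or more cues trigger the emotion.
EMOTION_CUES = {"angry": {"never", "stop", "no"}}


def words_of(text):
    return re.findall(r"[a-z']+", text.lower())


def identify_emotion(text):
    """Crude stand-in for AI-based emotion recognition."""
    words = set(words_of(text))
    for emotion, cues in EMOTION_CUES.items():
        if len(words & cues) >= 2:
            return emotion
    return "neutral"


def pronounce(text):
    """Look up each word under the identified emotion, falling back to the
    neutral entry, and to no phonemes if the word is unknown."""
    emotion = identify_emotion(text)
    phonemes = []
    for w in words_of(text):
        entry = G2P_DB.get((w, emotion)) or G2P_DB.get((w, "neutral"), [])
        phonemes.extend(entry)
    return emotion, phonemes


print(pronounce("No, stop!"))  # ('angry', ['N', 'OW1', 'OW1'])
```

Falling back to the neutral entry means the database only needs emotion-specific rows for words whose pronunciation actually changes.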
Specification