Expressive parsing in computerized conversion of text to speech
Abstract
A preferred embodiment of a method for converting text to speech using a computing device having a memory is disclosed. Text, made up of a plurality of words, is received into the memory of the computing device. A plurality of phonemes is derived from the text. Each of the phonemes is associated with a prosody record based on a database of prosody records associated with a plurality of words. A first set of artificial intelligence rules is applied to determine context information associated with the text, and the context-influenced prosody changes for each of the phonemes are determined. A second set of rules, based on Lessac theory, is then applied to determine Lessac-derived prosody changes for each of the phonemes. The prosody record for each of the phonemes is amended in response to the context-influenced prosody changes and the Lessac-derived prosody changes. Sound information associated with the phonemes is then read from the memory and amended, based on the amended prosody record, to generate amended sound information for each of the phonemes. The sound information is then output to generate a speech signal.
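The prosody-amendment flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `ProsodyRecord` fields, the multiplicative form of the changes, and both rules are hypothetical. In particular, `lessac_rule` is a placeholder and does not reproduce any actual rule from Lessac theory, and the records are keyed by phoneme for brevity, whereas the abstract describes a database associated with words.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProsodyRecord:
    """Hypothetical prosody record: relative multipliers per phoneme."""
    pitch: float = 1.0
    duration: float = 1.0
    amplitude: float = 1.0


def context_rule(phoneme, index, phonemes, text):
    # Toy context rule (assumption): raise pitch on the last phoneme of a question.
    if text.strip().endswith("?") and index == len(phonemes) - 1:
        return {"pitch": 1.2}
    return {}


def lessac_rule(phoneme, index, phonemes, text):
    # Placeholder for a Lessac-derived rule (assumption): lengthen the phoneme "N".
    if phoneme == "N":
        return {"duration": 1.5}
    return {}


def amend(record, changes):
    # Apply multiplicative prosody changes and return a new, amended record.
    return replace(
        record,
        pitch=record.pitch * changes.get("pitch", 1.0),
        duration=record.duration * changes.get("duration", 1.0),
        amplitude=record.amplitude * changes.get("amplitude", 1.0),
    )


def prosody_for(text, phonemes, base_records):
    # Look up a base record for each phoneme, then amend it with the
    # context rule followed by the Lessac-style rule.
    amended = []
    for i, ph in enumerate(phonemes):
        record = base_records.get(ph, ProsodyRecord())
        for rule in (context_rule, lessac_rule):
            record = amend(record, rule(ph, i, phonemes, text))
        amended.append((ph, record))
    return amended
```

A synthesizer built along these lines would then scale the stored sound for each phoneme by its amended record before output; that step is omitted here because the abstract does not specify how the sound information is amended.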
40 Claims
1. A method for converting text to speech using a computing device having memory, the method comprising:
(a) receiving text into said memory of said computing device;
(b) applying a set of lexical parsing rules to parse said text into a plurality of components;
(c) associating pronunciation and meaning information with said components;
(d) applying a set of phrase parsing rules to generate marked up text;
(e) phonetically parsing said marked up text using phonetic parsing rules;
(f) parsing said phonetically parsed marked up text using expressive parsing rules;
(g) storing a plurality of sounds in memory, each of said sounds being associated with said pronunciation information; and
(h) recalling the sounds associated with said text to generate a raw speech signal from said marked up text after said parsing using phonetic and expressive parsing rules.
40. A computerized system for converting text to speech comprising:
(a) a memory to receive text to be converted;
(b) a digital audio module to output a speech signal or audible speech; and
(c) text to speech software comprising one or more software modules for;
(i) applying a set of lexical parsing rules to parse said text into a plurality of components;
(ii) associating pronunciation and meaning information with said components;
(iii) applying a set of phrase parsing rules to generate marked up text;
(iv) phonetically parsing said marked up text using phonetic parsing rules;
(v) parsing said phonetically parsed marked up text using expressive parsing rules;
(vi) storing a plurality of sounds in memory, each of said sounds being associated with said pronunciation information; and
(vii) recalling the sounds associated with said text to generate a raw speech signal from said marked up text after said parsing using phonetic and expressive parsing rules.
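The ordering of the claimed steps can be sketched as a chain of small functions, one per parsing stage. This is only an illustration under stated assumptions: the `LEXICON` entries, the phoneme symbols, the data shapes, and every rule below are hypothetical and far simpler than any real rule set, and strings stand in for stored audio.

```python
import re

# Hypothetical lexicon: word -> (pronunciation, meaning information).
LEXICON = {
    "hello": (["HH", "AH", "L", "OW"], "interjection"),
    "world": (["W", "ER", "L", "D"], "noun"),
}
# Stored sounds keyed by pronunciation unit; strings stand in for audio samples.
SOUNDS = {ph: f"[{ph}]" for phones, _ in LEXICON.values() for ph in phones}
SOUNDS["PAUSE"] = "[_]"


def lexical_parse(text):
    # Lexical parsing: split text into word and punctuation components.
    return re.findall(r"[A-Za-z']+|[.,!?;]", text)


def associate(components):
    # Attach pronunciation and meaning information to each component.
    items = []
    for c in components:
        if re.fullmatch(r"[.,!?;]", c):
            phones, meaning = [], "punctuation"
        else:
            phones, meaning = LEXICON.get(c.lower(), ([], "unknown"))
        items.append({"text": c, "phones": phones, "meaning": meaning})
    return items


def phrase_parse(items):
    # Phrase parsing: close a phrase at each punctuation mark (marked-up text).
    phrases, current = [], []
    for item in items:
        if item["meaning"] == "punctuation":
            phrases.append({"words": current, "end": item["text"]})
            current = []
        else:
            current.append(item)
    if current:
        phrases.append({"words": current, "end": ""})
    return phrases


def phonetic_parse(phrases):
    # Phonetic parsing: flatten each phrase into its phoneme sequence.
    return [{"phones": [p for w in ph["words"] for p in w["phones"]],
             "end": ph["end"]} for ph in phrases]


def expressive_parse(phrases):
    # Expressive parsing (toy rule): emphasize phrases ending in "!".
    return [dict(ph, emphasis=(ph["end"] == "!")) for ph in phrases]


def synthesize(phrases):
    # Recall stored sounds in order to form a raw "signal", pausing after each phrase.
    signal = []
    for ph in phrases:
        sounds = [SOUNDS[p] for p in ph["phones"]]
        if ph["emphasis"]:
            sounds = [s + "!" for s in sounds]
        signal.extend(sounds + [SOUNDS["PAUSE"]])
    return signal


def text_to_speech(text):
    return synthesize(expressive_parse(phonetic_parse(
        phrase_parse(associate(lexical_parse(text))))))
```

Calling `text_to_speech("Hello, world!")` yields the sound tokens for "hello", a pause, the emphasized tokens for "world", and a final pause. The point of the sketch is the stage order: expressive rules run on text that has already been phrase-marked and phonetically parsed, and sounds are recalled only after both passes.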
Specification