Hyper text mark up language document to speech converter
Abstract
A system for converting a hyper text markup language (HTML) document to speech includes an HTML parser, an HTML to speech (HTS) control parser, a tag converter, a text normalizer and a TTS converter. The HTML parser receives data of an HTML formatted document and parses out content text, HTML text tags that structure the content text and control rules used only for translating the received data into sound. The HTS control parser parses control rules for converting the received data into sound. The HTS control parser modifies entries in one or more of a tag mapping table, an audio data table, a parameter set table, an enunciation modification table and a terminology translation table depending on each of the parsed control rules. The text normalizer modifies enunciation of each text string of the content text of the HTML document for which the enunciation modification table has an entry, according to an enunciation modification indicated in the respective enunciation table entry. The text normalizer also translates each text string of the content text of the HTML document for which the terminology translation table has an entry, according to a translation indicated in the respective terminology translation table entry. The tag converter modifies an intonation and a speed of audio generated from the content text of the HTML document encapsulated by each text tag for which the tag mapping table has an entry, as specified in corresponding entries of the parameter set table pointed to by pointers in the tag mapping table. The tag converter also inserts audio for each text tag for which the tag mapping table has an entry, as specified in corresponding entries of the audio data table pointed to by entries of the tag mapping table. The TTS converter converts the content text of the HTML document, as modified, translated and appended by the text normalizer and the tag converter, to speech audio.
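As a rough illustration of the first stage described above, the sketch below separates a document into content text, text tags, and control rules using Python's standard `html.parser` module. The text does not say how control rules are embedded in the document, so carrying them in comments that begin with `HTS:` is an assumption made only for this example, as are the names `HtsHtmlParser` and `parse_document` and the rule string shown.

```python
from html.parser import HTMLParser


class HtsHtmlParser(HTMLParser):
    """Splits an HTML document into content text, text tags, and control rules.

    Assumption: control rules are carried in comments that start with "HTS:".
    The source does not specify this embedding; it is purely illustrative.
    """

    def __init__(self):
        super().__init__()
        self.events = []         # ordered ("open" | "close" | "text", value) pairs
        self.control_rules = []  # raw rule strings, used only for speech output

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.events.append(("text", data.strip()))

    def handle_comment(self, data):
        body = data.strip()
        if body.startswith("HTS:"):
            self.control_rules.append(body[len("HTS:"):].strip())


def parse_document(html):
    """Return (events, control_rules) for one HTML string."""
    parser = HtsHtmlParser()
    parser.feed(html)
    parser.close()
    return parser.events, parser.control_rules


if __name__ == "__main__":
    doc = "<!-- HTS: param h1 pitch=high speed=slow --><h1>News</h1><p>Hello</p>"
    events, rules = parse_document(doc)
    print(events)
    print(rules)
```

Keeping the control rules in a separate list mirrors the statement that they are "used only for translating the received data into sound": they never enter the stream of content text that is eventually spoken.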
322 Citations
15 Claims
1. A computer system for converting a hyper text markup language (HTML) document into audio signals comprising:
- an HTML parser receiving data of an HTML formatted document for parsing out content text, HTML text tags that structure said content text and control rules used only for translating said received data into sound,
- an HTML to speech (HTS) control parser for parsing out of said control rules for converting said received data into sound, said HTS control parser modifying entries in one or more of a tag mapping table, an audio data table, a parameter set table, an enunciation modification table and a terminology translation table depending on each of said parsed control rules,
- a text normalizer for modifying enunciation of each text string of said content text for which said enunciation modification table has an entry, according to an enunciation modification indicated in said respective enunciation table entry, and for translating each text string of said content text for which said terminology translation table has an entry, according to a translation indicated in said respective terminology translation table entry,
- a tag converter for modifying an intonation and a speed of audio generated from said content text encapsulated by, and for inserting audio data at, each text tag for which said tag mapping table has an entry, as specified in corresponding entries of said parameter set table and said audio data table pointed to by pointers in entries of said tag mapping table indexed by each of said text tags, respectively, and
- a text to speech converter for converting said content text, as modified, translated and appended by said text normalizer and said tag converter, to speech audio.
2. In a hyper text markup language (HTML) text to speech (HTS) control parser, a method for converting data of an HTML document to speech comprising the steps of:
- parsing one or more intonation/speed modification rules that specify intonation and speed modification parameters for generating speech encapsulated by particular text tags of an HTML document and one or more rules that specify audio data to be inserted for particular text tags of an HTML document, and generating a tag mapping table mapping said text tags to corresponding tag identifiers, a parameter set table of entries containing parameter sets pointed to by pointers in corresponding tagged entries of said tag mapping table, and an audio data table of entries containing audio data pointed to by pointers in corresponding tagged entries of said tag mapping table, according to said parsed intonation/speed modification and audio data rules, respectively, and
- parsing one or more rules for modifying enunciation of particular strings of content text of an HTML document and one or more rules for translating particular strings of said content text of an HTML document to terms that can be converted to speech by a text to speech converter, and generating an enunciation modification table mapping particular ones of said particular strings to replacement enunciation strings and a terminology translation table mapping particular ones of said particular strings to replacement terminology strings, according to said parsed enunciation modification and terminology translation rules, respectively.
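One way to picture the five tables this claim generates is sketched below. The claim fixes what each table holds and which entries point where, but not a rule syntax, so the rule formats used here (`param`, `audio`, `enun`, `term`) and every name in the code (`TagEntry`, `HtsTables`, `apply_rule`, `build_tables`) are hypothetical, and audio data is stood in for by a clip name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TagEntry:
    tag_id: int                      # identifier unique to the HTML text tag
    param_ptr: Optional[int] = None  # index into parameter_sets, if any
    audio_ptr: Optional[int] = None  # index into audio_data, if any


@dataclass
class HtsTables:
    tag_map: dict = field(default_factory=dict)         # tag name -> TagEntry
    parameter_sets: list = field(default_factory=list)  # intonation/speed settings
    audio_data: list = field(default_factory=list)      # clip names (placeholder)
    enunciation: dict = field(default_factory=dict)     # string -> replacement
    terminology: dict = field(default_factory=dict)     # string -> replacement


def _tag_entry(tables, tag):
    """Fetch the tag mapping entry for `tag`, creating it on first use."""
    if tag not in tables.tag_map:
        tables.tag_map[tag] = TagEntry(tag_id=len(tables.tag_map))
    return tables.tag_map[tag]


def apply_rule(tables, rule):
    """Update the tables for one rule string (hypothetical syntax)."""
    kind, _, rest = rule.partition(" ")
    if kind == "param":    # e.g. "param h1 pitch=high speed=slow"
        tag, *pairs = rest.split()
        entry = _tag_entry(tables, tag)
        tables.parameter_sets.append(dict(p.split("=", 1) for p in pairs))
        entry.param_ptr = len(tables.parameter_sets) - 1
    elif kind == "audio":  # e.g. "audio li chime.wav"
        tag, clip = rest.split()
        entry = _tag_entry(tables, tag)
        tables.audio_data.append(clip)
        entry.audio_ptr = len(tables.audio_data) - 1
    elif kind in ("enun", "term"):  # e.g. "enun read => red"
        source, _, target = rest.partition("=>")
        table = tables.enunciation if kind == "enun" else tables.terminology
        table[source.strip()] = target.strip()
    else:
        raise ValueError(f"unknown rule kind: {kind!r}")


def build_tables(rules):
    """Build all five tables from an iterable of rule strings."""
    tables = HtsTables()
    for rule in rules:
        apply_rule(tables, rule)
    return tables
```

The tag mapping table stores indices rather than the parameter sets or audio data themselves, which follows the claim's wording that those entries are "pointed to by pointers" in the tag mapping table.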
3. In a parser and text normalizer, a method for converting data of a hyper text markup language (HTML) document to speech audio comprising the steps of:
- parsing one or more HTML to speech (HTS) control rules, including generating a tag mapping table entry indexed by an HTML text tag specified in an audio data rule and containing a tag identifier unique to said HTML text tag, and generating an audio data table entry, pointed to by said entry of said tag mapping table indexed by said tag specified in said audio data rule, and containing audio data indicated by said audio data rule,
- replacing each instance of a string of one or more content text characters of an HTML document, for which an enunciation modification table has an entry, with an enunciation replacement string of text characters indicated in said entry, said enunciation replacement string being converted to speech audio of a particular one of multiple permissible enunciations of said replaced string of content text characters, and
- replacing each instance of a second string of content text characters of an HTML document, for which a terminology translation table has an entry, with a translation string of text characters in said entry, said translation string of text characters being convertible to speech audio, and at least part of said second replaced string of content text characters being unconvertible to speech audio, by a predetermined text to speech converter.
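The two replacement steps of this claim can be sketched as a single substitution pass over the content text. This is a minimal illustration only: the function name `normalize`, the sample table entries, and the tie-break between the two tables are assumptions, and the matching is plain substring matching, which the claim neither requires nor rules out.

```python
import re


def normalize(text, enunciation, terminology):
    """Replace every table-indexed string in `text` with its table entry."""
    # Terminology entries are merged last, so they win if both tables share
    # a key (a tie-break chosen for this sketch, not taken from the source).
    replacements = {**enunciation, **terminology}
    if not replacements:
        return text
    # Try longer keys first so a short key cannot pre-empt a longer one
    # that starts at the same position.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)


if __name__ == "__main__":
    enunciation = {"Dr.": "Doctor"}                            # hypothetical entry
    terminology = {"WWW": "World Wide Web", ":-)": "smiley"}   # hypothetical entries
    print(normalize("Dr. Lee uses WWW :-)", enunciation, terminology))
    # prints: Doctor Lee uses World Wide Web smiley
```

Because the pattern matches substrings, a key such as `WWW` would also be replaced inside a longer token; a production normalizer would likely add token-boundary checks.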
4. In a tag converter for intonation modification and audio data insertion, a method for converting data of a hyper text markup language (HTML) document comprising the steps of:
- modifying the intonation and speed of speech audio generated for content text encapsulated by, and inserting audio data at, each instance of an HTML text tag for which a tag mapping table has an entry, an indication to access a parameter set table, and a first pointer to a particular entry of said parameter set table, according to intonation and speed parameters specified in said entry of said parameter set table pointed to by said first pointer, and
- generating a particular audio sound for each instance of an HTML text tag, for which said tag mapping table has an entry, an indication to access an audio data table, and a second pointer to a particular entry of said audio data table, from audio data specified in said entry of said audio table pointed to by said second pointer.
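The tag converter's two lookups can be illustrated with the sketch below, which walks a parsed document and emits either an audio insertion or a piece of text paired with the parameters in force. The event tuples, the dictionary layout of the tag mapping entries, and the choice to let nested tags inherit and override outer parameters are all assumptions of this example; the claim itself only specifies the pointer lookups.

```python
def convert_tags(events, tag_map, parameter_sets, audio_data):
    """Turn parsed events into ("audio", clip) and ("speak", text, params) items.

    events:  ("open" | "close" | "text", value) pairs in document order.
    tag_map: tag name -> {"param_ptr": int or None, "audio_ptr": int or None}.
    """
    output = []
    param_stack = [{}]  # default (empty) parameter set at the bottom
    for kind, value in events:
        entry = tag_map.get(value) if kind in ("open", "close") else None
        if kind == "open":
            # Second pointer: insert the audio data at the tag.
            if entry and entry.get("audio_ptr") is not None:
                output.append(("audio", audio_data[entry["audio_ptr"]]))
            # First pointer: apply intonation/speed to the enclosed text.
            if entry and entry.get("param_ptr") is not None:
                merged = {**param_stack[-1], **parameter_sets[entry["param_ptr"]]}
                param_stack.append(merged)
        elif kind == "close":
            if entry and entry.get("param_ptr") is not None and len(param_stack) > 1:
                param_stack.pop()
        elif kind == "text":
            output.append(("speak", value, dict(param_stack[-1])))
    return output


if __name__ == "__main__":
    events = [("open", "h1"), ("text", "News"), ("close", "h1"),
              ("open", "p"), ("text", "Hello"), ("close", "p")]
    tag_map = {"h1": {"param_ptr": 0, "audio_ptr": 0}}
    print(convert_tags(events, tag_map,
                       [{"pitch": "high", "speed": "slow"}], ["chime.wav"]))
```

Tags without a tag mapping entry, such as `p` in the usage example, pass through unchanged, so their text is spoken with whatever parameters the enclosing tags established.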
5. A method for converting data of a hyper text markup language (HTML) document to speech comprising the steps of:
- parsing one or more HTML to speech (HTS) control rules, said step of parsing comprising the steps of:
  - in response to an intonation/speed rule, generating a tag mapping table entry indexed by an HTML text tag specified in said intonation/speed rule and containing a tag identifier unique to said HTML text tag, and generating a parameter set table entry, pointed to by said entry of said tag mapping table indexed by said tag specified in said intonation/speed rule, and containing a set of intonation and speed parameters indicated by said intonation/speed rule,
  - in response to an audio data rule, generating a tag mapping table entry indexed by an HTML text tag specified in said audio data rule and containing a tag identifier unique to said HTML text tag, and generating an audio data table entry, pointed to by said entry of said tag mapping table indexed by said tag specified in said audio data rule, and containing audio data indicated by said audio data rule,
  - in response to an enunciation rule, generating an enunciation table entry indexed by a text string in an HTML document and containing at least a replacement text string, that is converted to a particular audio sound of one of plural enunciations of said index text string, indicated by said enunciation rule, and
  - in response to a terminology translation rule, generating a terminology translation table entry indexed by a text string in an HTML document that cannot be converted to an audio sound by a predetermined text to speech converter and containing a replacement text string that can be converted to an audio sound by said predetermined text to speech converter.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14
15. In a text normalizer and tag converter, a method for converting data of a hyper text markup language (HTML) document to speech audio comprising the steps of:
- replacing each instance of a string of one or more content text characters of an HTML document, for which an enunciation modification table has an entry, with an enunciation replacement string of text characters indicated in said entry, said enunciation replacement string being converted to speech audio of a particular one of multiple permissible enunciations of said replaced string of content text characters,
- replacing each instance of a second string of content text characters of an HTML document, for which a terminology translation table has an entry, with a translation string of text characters in said entry, said translation string of text characters being convertible to speech audio, and at least part of said second replaced string of content text characters being unconvertible to speech audio, by a predetermined text to speech converter, and
- inserting audio data at each text tag for which a tag mapping table has an entry, as specified in corresponding entries of an audio data table.
Specification