High quality concatenative reading system
First Claim
1. A high quality concatenative reading system for converting an input string into a sequence for audible synthesis, comprising:
a dictionary of complete word speech samples corresponding to entire words stored in a computer-readable medium;
a word list generator receptive of said input string for building and storing word list tokens in a word list, the word list generator building said word list from words stored in said dictionary that correspond to the input string;
said word list generator further having a list of prosodic environment tokens representing a plurality of intonation types, said word list generator assigning at least one of said prosodic environment tokens to at least some of the word list tokens;
a phonological feature analyzer that analyzes said word list tokens and said assigned prosodic environment tokens and selects said complete word speech samples from said dictionary to build a sample list based on (a) the word list tokens, (b) the prosodic environment tokens and (c) the phonological features of adjacent words; and
an output for concatenatively supplying said sample list to an analog conversion unit to produce an audible text-to-speech signal.
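The word list generator and its prosodic environment tokens can be illustrated with a minimal Python sketch. The four intonation types, the punctuation rules that assign them, and the names `Prosody`, `WordToken`, `PUNCTUATION_PROSODY` and `build_word_list` are hypothetical choices made for illustration; the claim itself only requires a plurality of intonation types, at least one of which is assigned to at least some word list tokens.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Prosody(Enum):
    # Hypothetical intonation types; the claim only requires "a plurality".
    NEUTRAL = "neutral"
    CONTINUATION = "continuation"  # e.g. a rise before a comma
    FINAL = "final"                # e.g. a fall at the end of a statement
    QUESTION = "question"          # e.g. a rise at the end of a question


@dataclass
class WordToken:
    word: str
    prosody: Prosody = Prosody.NEUTRAL


# Hypothetical rule table: punctuation -> prosodic token for the word before it.
PUNCTUATION_PROSODY = {
    ",": Prosody.CONTINUATION,
    ".": Prosody.FINAL,
    "!": Prosody.FINAL,
    "?": Prosody.QUESTION,
}


def build_word_list(text: str, dictionary: set) -> list:
    """Build word list tokens for words present in the sample dictionary
    and attach a prosodic environment token to each one."""
    word_list = []
    for tok in re.findall(r"[A-Za-z0-9']+|[.,!?]", text):
        if tok in PUNCTUATION_PROSODY:
            # Punctuation changes the intonation of the preceding word.
            if word_list:
                word_list[-1].prosody = PUNCTUATION_PROSODY[tok]
        elif tok.lower() in dictionary:
            word_list.append(WordToken(tok.lower()))
        # Words without a recorded sample are simply skipped in this sketch.
    return word_list
```

For example, `build_word_list("Your balance is twelve dollars.", {"your", "balance", "is", "twelve", "dollars"})` tags the first four words `NEUTRAL` and "dollars" `FINAL`, so a falling-intonation recording of "dollars" can later be chosen.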
1 Assignment
0 Petitions
Abstract
Computer-stored text, such as numerical information, is processed by a word list generator to develop a word list corresponding to those words that are to be spoken by the system. The word list generator assigns a prosodic environment state or token to each entry in the list. The prosodic environment identifies how the word functions in its current prosodic context, and different intonations are applied based on that environment. Next, the preceding and following words are examined to determine whether each word must be pronounced differently, based on the ending phoneme of the preceding word and the beginning phoneme of the following word. Using this phonological information along with the prosodic information, a sample list is generated by accessing a dictionary of stored samples. The sample list is then played serially through suitable digital-to-analog conversion circuitry to generate the text-to-speech output. The result is a natural, human-like reading, with intonation changes suited to the context of the text material.
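The phonological step described above, choosing a word's recorded variant from the ending phoneme of the preceding word and the beginning phoneme of the following word, can be sketched as a keyed dictionary lookup. This is an illustration under stated assumptions, not the patented method: the ARPAbet-style transcriptions, the three context classes (`vowel`, `consonant`, `pause`), the sample file names, the fallback to a context-free `any` entry, and the helper names `context_class` and `select_samples` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# ARPAbet-style vowel symbols (hypothetical phoneme inventory for this sketch).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

SampleKey = Tuple[str, str, str, str]  # (word, prosody, left context, right context)


def context_class(phoneme: Optional[str]) -> str:
    """Reduce a neighboring phoneme to a coarse class; None means silence."""
    if phoneme is None:
        return "pause"
    return "vowel" if phoneme in VOWELS else "consonant"


def select_samples(
    words: List[Tuple[str, str]],
    pronunciations: Dict[str, List[str]],
    samples: Dict[SampleKey, str],
) -> List[str]:
    """Pick one stored sample per (word, prosody) pair, using the last phoneme
    of the previous word and the first phoneme of the next word."""
    sample_list = []
    for i, (word, prosody) in enumerate(words):
        prev_end = pronunciations[words[i - 1][0]][-1] if i > 0 else None
        next_start = pronunciations[words[i + 1][0]][0] if i + 1 < len(words) else None
        key = (word, prosody, context_class(prev_end), context_class(next_start))
        # Fall back to a context-independent recording when no variant exists.
        fallback = (word, prosody, "any", "any")
        sample_list.append(samples[key] if key in samples else samples[fallback])
    return sample_list


# Hypothetical example data: "the" is recorded twice, as "thee" and "thuh".
PRONUNCIATIONS = {"the": ["DH", "AH"], "apple": ["AE", "P", "AH", "L"], "pear": ["P", "EH", "R"]}
SAMPLES = {
    ("the", "neutral", "pause", "vowel"): "the_DH_IY.wav",      # "thee"
    ("the", "neutral", "pause", "consonant"): "the_DH_AH.wav",  # "thuh"
    ("apple", "final", "any", "any"): "apple_final.wav",
    ("pear", "final", "any", "any"): "pear_final.wav",
}
```

With this table, "the" followed by "apple" selects the "thee" recording, while "the" followed by "pear" selects the "thuh" recording; the prosody value carried with each word keeps intonation and phonological context in the same lookup key.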
199 Citations
12 Claims
7. A method of text-to-speech conversion, comprising:
receiving an input string representing text to be converted into audible synthesized speech;
constructing a word list of word tokens corresponding to the input string by accessing a dictionary of complete word speech samples corresponding to entire words stored in a computer-readable medium;
supplementing said word list with prosodic environment tokens that represent different intonation types, such that at least some of the word tokens in said word list are associated with a corresponding prosodic environment token;
analyzing the phonological attributes associated with the word tokens in said word list by examining the phonological features of adjacent words in said list;
selecting complete word speech samples from said predetermined dictionary of complete word speech samples corresponding to entire words based on (a) said word list tokens, (b) said corresponding prosodic environment tokens, and (c) said phonological attributes; and
building a sample list of said selected complete word speech samples and supplying said sample list for concatenative output to an analog conversion unit to produce an audible text-to-speech signal.
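The final step of the method, building the sample list and supplying it for concatenative output, amounts to playing the selected word recordings back to back. The sketch below assumes every stored sample is mono 16-bit PCM at a shared, hypothetical 16 kHz rate, and it writes the result to an in-memory WAV stream with Python's standard `wave` module as a stand-in for the analog conversion unit. The names `SAMPLE_RATE`, `concatenate` and `to_wav_bytes` are hypothetical.

```python
import array
import io
import sys
import wave
from typing import List

SAMPLE_RATE = 16_000  # hypothetical rate shared by every stored word sample


def concatenate(sample_list: List[array.array]) -> array.array:
    """Join 16-bit PCM word samples end to end, in sample-list order."""
    out = array.array("h")
    for samples in sample_list:
        out.extend(samples)
    return out


def to_wav_bytes(pcm: array.array, rate: int = SAMPLE_RATE) -> bytes:
    """Wrap the concatenated PCM in a mono WAV stream, standing in for the
    buffer that would be handed to a digital-to-analog converter."""
    data = array.array("h", pcm)  # copy so the caller's buffer is untouched
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores PCM samples little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(data.tobytes())
    return buf.getvalue()
```

In this sketch no pitch or duration modification happens at output time: because whole words are recorded, the intonation and adjacent-word choices were already made when each sample was selected, so output reduces to a plain append.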
Specification