Method for producing a speech rendition of text from diphone sounds
Abstract
A text-to-speech system utilizes a method for producing a speech rendition of text based on dividing some or all words of a sentence into component diphones. A phonetic dictionary is aligned so that each letter within each word has a single corresponding phoneme. The aligned dictionary is analyzed to determine the most common phoneme representation of each letter in the context of the string of letters before and after it. The results for each letter are stored in a phoneme rule matrix. A diphone database is created using a wav editor to cut 2,000 distinct diphones out of specially selected words. A computer algorithm selects a phoneme for each letter. Then, two phonemes are used to create a diphone. Words are then read aloud by concatenating sounds from the diphone database. In one embodiment, diphones are used only when a word is not one of a list of pre-recorded words.
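The step in which two phonemes are used to create a diphone can be illustrated with a minimal sketch. The phoneme symbols, the silence marker `_`, the padding at word boundaries, and the function name are all assumptions made for illustration; the abstract does not specify them.

```python
def phonemes_to_diphones(phonemes, silence="_"):
    """Pair each phoneme with its successor to form diphone names.

    Assumption: the word is padded with a silence marker at both ends so
    that the first and last phonemes each get a boundary diphone.
    """
    padded = [silence] + list(phonemes) + [silence]
    # zip pairs element i with element i + 1, giving one diphone per transition.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


print(phonemes_to_diphones(["k", "ae", "t"]))
```

Each resulting name would then be looked up in the diphone database and the matching sounds concatenated.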
17 Claims
-
1. A method for producing a speech rendition of text comprising:
-
parsing a sentence into punctuation and a plurality of words;
comparing at least one word of the plurality of words to a list of pre-recorded words;
in the event that the compared word is not on the list of pre-recorded words, determining whether the compared word includes at least one number, and audibly spelling the compared word out in the event that the compared word includes at least one number;
in the event that the compared word is not on the list of pre-recorded words and does not include at least one number, dividing the compared word into a plurality of diphones, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
in the event that the compared word is on the list of pre-recorded words, playing a sound file corresponding to the compared word, the sound file being independent of the sound files corresponding to the plurality of diphones. - View Dependent Claims (2, 3, 4)
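The branching recited in claim 1 can be sketched roughly as follows. This is only an illustration: the tokenizing regular expression, the contents of `PRERECORDED`, the file names, and the function names are hypothetical, and the sketch labels each token with the branch it would take rather than playing any audio.

```python
import re

# Hypothetical list of pre-recorded words mapped to illustrative file names.
PRERECORDED = {"hello": "hello.wav", "is": "is.wav"}


def parse_sentence(sentence):
    """Split a sentence into word tokens and punctuation tokens."""
    return re.findall(r"[A-Za-z0-9']+|[^\w\s]", sentence)


def route_token(token, prerecorded=PRERECORDED):
    """Return the name of the branch a token would follow."""
    if not any(ch.isalnum() for ch in token):
        return "punctuation"
    if token.lower() in prerecorded:
        return "prerecorded"  # play the word's own sound file
    if any(ch.isdigit() for ch in token):
        return "spell"  # audibly spell the word out
    return "diphones"  # divide into diphones and concatenate


def plan_sentence(sentence):
    return [(tok, route_token(tok)) for tok in parse_sentence(sentence)]


for token, branch in plan_sentence("Hello, room B12 is open."):
    print(token, branch)
```

The pre-recorded check comes first, matching the order of the claim: the number test and the diphone fallback apply only to words not on the list.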
-
-
5. A method for producing a speech rendition of text comprising:
-
providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on letters that precede and succeed the particular letter, at least one word of the predetermined group of words including two or more letters that collectively have a single phonetic representation, wherein a first letter of the two or more letters is represented by a phoneme that corresponds to the single phonetic representation and wherein remaining letters of the two or more letters are represented by blank phonemes;
parsing a sentence into punctuation and a plurality of words;
dividing each word of the plurality of words into a plurality of diphones based on combinations of letters in the letter to phoneme rules database;
combining sound files corresponding to the plurality of diphones; and
playing the combined sound files. - View Dependent Claims (6, 7)
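The blank-phoneme alignment recited in claim 5 might look like the sketch below. The example words, the phoneme symbols, the use of an empty string as the blank phoneme, and the helper names are assumptions for illustration only.

```python
BLANK = ""  # assumed representation of a blank phoneme

# Hypothetical aligned entries: exactly one phoneme slot per letter.
# In "phone", the letters "ph" share one sound, so "p" carries it and
# "h" is blank; the silent final "e" is blank as well.
ALIGNED = {
    "phone": ["f", BLANK, "ow", "n", BLANK],
    "ship": ["sh", BLANK, "ih", "p"],
}


def is_aligned(entries):
    """Check that every word has one phoneme slot per letter."""
    return all(len(word) == len(slots) for word, slots in entries.items())


def spoken_phonemes(word, entries=ALIGNED):
    """Drop blank slots to recover the phonemes actually pronounced."""
    return [p for p in entries[word] if p != BLANK]


print(is_aligned(ALIGNED))
print(spoken_phonemes("phone"))
```

Keeping one slot per letter is what lets a per-letter rule be learned from the dictionary, since every letter position has a definite phoneme (possibly blank) to count.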
-
-
8. A method for producing a speech rendition of text comprising:
-
providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on letters that precede and succeed the particular letter, at least one word of the predetermined group of words including two or more letters that collectively have a single phonetic representation, wherein a first letter of the two or more letters is represented by a phoneme that corresponds to the single phonetic representation and wherein remaining letters of the two or more letters are represented by blank phonemes;
parsing a sentence into punctuation and a plurality of words;
comparing at least one word of the plurality of words to a list of pre-recorded words;
in the event that the compared word is not on the list of pre-recorded words, dividing the compared word into a plurality of diphones based on combinations of letters in the letter to phoneme rules database, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
in the event that the compared word is on the list of pre-recorded words, playing a sound file corresponding to the compared word, the sound file being independent of the sound files corresponding to the plurality of diphones.
-
-
9. A method for producing a speech rendition of text comprising:
-
providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on three letters that precede and three letters that succeed the particular letter;
parsing a sentence into punctuation and a plurality of words;
comparing at least one word of the plurality of words to a list of pre-recorded words;
in the event that the compared word is not on the list of pre-recorded words, dividing the compared word into a plurality of diphones based on combinations of letters in the letter to phoneme rules database, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
in the event that the compared word is on the list of pre-recorded words, playing a sound file corresponding to the compared word, the sound file being independent of the sound files corresponding to the plurality of diphones.
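A context-based letter to phoneme lookup of the kind recited here, with three preceding and three succeeding letters, can be sketched as below. The padding character, the tiny aligned dictionary, the phoneme symbols, and the function names are hypothetical. The most-common-phoneme tally follows the description in the abstract, and the behavior for unseen contexts (returning `None`) is an assumption, since the claims do not address it.

```python
from collections import Counter, defaultdict

PAD = "#"  # hypothetical marker for positions beyond the word boundary

# Hypothetical aligned dictionary: one phoneme per letter.
ALIGNED = {"cat": ["k", "ae", "t"], "city": ["s", "ih", "t", "iy"]}


def context(word, i, before=3, after=3):
    """Return letter i together with its preceding and succeeding letters."""
    padded = PAD * before + word + PAD * after
    return padded[i : i + before + 1 + after]


def build_rules(aligned, before=3, after=3):
    """Map each observed context to its most common phoneme."""
    counts = defaultdict(Counter)
    for word, phonemes in aligned.items():
        for i, phoneme in enumerate(phonemes):
            counts[context(word, i, before, after)][phoneme] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def predict(word, rules, before=3, after=3):
    """Look up a phoneme for each letter; unseen contexts yield None."""
    return [rules.get(context(word, i, before, after)) for i in range(len(word))]


rules = build_rules(ALIGNED)
print(context("cat", 0))
print(predict("cat", rules))
```

Passing `before=1, after=2` to the same functions gives the narrower window of one preceding and two succeeding letters that other claims recite.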
-
-
10. A method for producing a speech rendition of text comprising:
-
providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on one letter that precedes and two letters that succeed the particular letter;
parsing a sentence into punctuation and a plurality of words;
comparing at least one word of the plurality of words to a list of pre-recorded words;
in the event that the compared word is not on the list of pre-recorded words, dividing the compared word into a plurality of diphones based on combinations of letters in the letter to phoneme rules database, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
in the event that the compared word is on the list of pre-recorded words, playing a sound file corresponding to the compared word, the sound file being independent of the sound files corresponding to the plurality of diphones.
-
-
11. A method for producing a speech rendition of text comprising:
-
providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on three letters that precede and three letters that succeed the particular letter;
parsing a sentence into punctuation and a plurality of words;
dividing each word of the plurality of words into a plurality of diphones based on combinations of letters in the letter to phoneme rules database;
combining sound files corresponding to the plurality of diphones; and
playing the combined sound files.
-
-
12. A method for producing a speech rendition of text comprising:
-
providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on one letter that precedes and two letters that succeed the particular letter;
parsing a sentence into punctuation and a plurality of words;
dividing each word of the plurality of words into a plurality of diphones based on combinations of letters in the letter to phoneme rules database;
combining sound files corresponding to the plurality of diphones; and
playing the combined sound files.
-
-
13. A method for producing a speech rendition of text comprising:
-
parsing a sentence into a plurality of words;
comparing a first word of the plurality of words to a list of homographs;
in the event that the first word is on the list of homographs, determining parts of speech for words adjacent the first word;
selecting a sound file for the first word based on the parts of speech of the adjacent words, the sound file being independent of sound files corresponding to diphones associated with the first word, and playing the selected sound file;
in the event that the first word is not on the list of homographs, comparing the first word to a list of pre-recorded words;
in the event that the first word is not on the list of homographs and is not on the list of pre-recorded words, dividing the first word into a plurality of diphones, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
in the event that the first word is not on the list of homographs and is on the list of pre-recorded words, playing a sound file corresponding to the first word, the sound file being independent of the sound files corresponding to the plurality of diphones. - View Dependent Claims (14, 15, 16, 17)
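The homograph branch of claim 13 can be illustrated with a small sketch. Everything specific here is hypothetical: the part-of-speech lexicon, the tag names, the homograph list, the file names, and the selection rules are invented for illustration, since the claim states only that the sound file is selected based on the parts of speech of the adjacent words.

```python
# Hypothetical part-of-speech lexicon for a few function words.
POS = {"the": "DET", "a": "DET", "to": "PART", "will": "AUX", "i": "PRON"}

# Hypothetical homograph list: one pre-recorded file per pronunciation.
HOMOGRAPHS = {"record": {"noun": "record_noun.wav", "verb": "record_verb.wav"}}


def select_homograph_file(words, i):
    """Pick a sound file for words[i] from its neighbors' parts of speech.

    Returns None when the word is not a homograph, in which case the
    caller would fall through to the pre-recorded list or to diphones.
    """
    variants = HOMOGRAPHS.get(words[i].lower())
    if variants is None:
        return None
    prev_pos = POS.get(words[i - 1].lower()) if i > 0 else None
    next_pos = POS.get(words[i + 1].lower()) if i + 1 < len(words) else None
    if prev_pos == "DET":
        return variants["noun"]  # e.g. "the record"
    if prev_pos in {"PART", "AUX", "PRON"} or next_pos == "DET":
        return variants["verb"]  # e.g. "will record the"
    return variants["noun"]  # arbitrary default chosen for this sketch


print(select_homograph_file(["play", "the", "record"], 2))
print(select_homograph_file(["I", "will", "record", "the", "show"], 2))
```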
-
Specification