Method for producing a speech rendition of text from diphone sounds

US 6,879,957 B1
Filed: 09/01/2000
Issued: 04/12/2005
Est. Priority Date: 10/04/1999
Status: Active Grant

Abstract

A text-to-speech system utilizes a method for producing a speech rendition of text based on dividing some or all words of a sentence into component diphones. A phonetic dictionary is aligned so that each letter within each word has a single corresponding phoneme. The aligned dictionary is analyzed to determine the most common phoneme representation of the letter in the context of a string of letters before and after it. The results for each letter are stored in a phoneme rule matrix. A diphone database is created using a wave editor to cut 2,000 distinct diphones out of specially selected words. A computer algorithm selects a phoneme for each letter. Then, two phonemes are used to create a diphone. Words are then read aloud by concatenating sounds from the diphone database. In one embodiment, diphones are used only when a word is not one of a list of pre-recorded words.
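
The analysis step described in the abstract (finding the most common phoneme for a letter in its surrounding context) might look roughly like the sketch below. It is an illustration only, not the patented implementation: the toy dictionary, the `#` padding symbol, the `_` blank-phoneme marker, the phoneme spellings, and the function names are all assumptions made for this example.

```python
from collections import Counter, defaultdict

PAD = "#"  # hypothetical padding symbol for positions beyond the word edges

def context_key(word, i, before, after):
    """Return (letters before, letter, letters after) for position i, padded at the edges."""
    left = word[max(0, i - before):i].rjust(before, PAD)
    right = word[i + 1:i + 1 + after].ljust(after, PAD)
    return (left, word[i], right)

def build_rule_matrix(aligned, before=1, after=2):
    """aligned maps each word to a list holding exactly one phoneme per letter."""
    counts = defaultdict(Counter)
    for word, phonemes in aligned.items():
        if len(word) != len(phonemes):
            raise ValueError(f"{word!r} is not aligned one phoneme per letter")
        for i, phoneme in enumerate(phonemes):
            counts[context_key(word, i, before, after)][phoneme] += 1
    # Keep only the most frequent phoneme observed for each context.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

# Toy aligned dictionary (assumed data); "_" marks a blank phoneme.
ALIGNED = {
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
    "ace": ["ey", "s", "_"],
}

rules = build_rule_matrix(ALIGNED)
print(rules[("#", "c", "at")])  # phoneme chosen for a word-initial "c" followed by "at"
```

Setting `before=0, after=0` collapses every context to the bare letter, which makes the "most common" vote easy to see: in the toy data the letter "c" is "k" twice and "s" once, so "k" wins.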

17 Claims

1. A method for producing a speech rendition of text comprising:
- parsing a sentence into punctuation and a plurality of words;
  
  comparing at least one word of the plurality of words to a list of pre-recorded words;
  
  in the event that the compared word is not on the list of pre-recorded words, determining whether the compared word includes at least one number, and audibly spelling the compared word out in the event that the compared word includes at least one number, in the event that the compared word is not on the list of pre-recorded words and does not include at least one number, dividing the compared word into a plurality of diphones, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
  
  in the event that the compared word is on the list of pre-recorded words, playing a sound file corresponding to the compared word, the sound file being independent of the sound files corresponding to the plurality of diphones.
- Dependent claims 2, 3, 4:
  - 2. The method of claim 1, further comprising:
    - adding inflection to at least one word of the plurality of words in accordance with the punctuation of the sentence.
  - 3. The method of claim 1, wherein the step of dividing the compared word into a plurality of diphones comprises comparing combinations of letters in the compared word to a database of diphones.
  - 4. The method of claim 1, further comprising:
    - comparing at least a second word of the plurality of words to a list of homographs;
      
      in the event that the second word of the plurality of words is on the list of homographs, determining parts of speech for words adjacent the second word, selecting a sound file for the second word based on the parts of speech of the adjacent words, and playing the selected sound file.
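
The branching order in claim 1 (pre-recorded word first, then spelling out a word that contains a number, then falling back to diphones) can be sketched as a small dispatcher. This is a hedged illustration: the `PRERECORDED` table, the file path, and the `letter_pairs` placeholder splitter are hypothetical, and the function returns a plan rather than playing audio.

```python
import re

# Hypothetical list of pre-recorded words and their sound file paths.
PRERECORDED = {"hello": "words/hello.wav"}

def parse_sentence(sentence):
    """Split a sentence into its words and its punctuation marks."""
    words = re.findall(r"[A-Za-z0-9']+", sentence)
    punctuation = re.findall(r"[.,;:!?]", sentence)
    return words, punctuation

def letter_pairs(word):
    """Placeholder splitter: adjacent letter pairs stand in for real diphones."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def plan_word(word, split_into_diphones=letter_pairs):
    """Decide how one word would be rendered, in the order the claim recites."""
    key = word.lower()
    if key in PRERECORDED:
        # On the list: use the whole-word recording.
        return ("prerecorded", [PRERECORDED[key]])
    if any(ch.isdigit() for ch in key):
        # Not on the list and contains a number: spell it out character by character.
        return ("spelled", list(key))
    # Otherwise: divide the word into diphones.
    return ("diphones", split_into_diphones(key))

words, punctuation = parse_sentence("Hello B2B world.")
plans = [plan_word(w) for w in words]
for word, plan in zip(words, plans):
    print(word, plan)
```

Returning a plan instead of calling an audio API keeps the sketch testable; a real system would hand each plan to a playback routine.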

5. A method for producing a speech rendition of text comprising:
- providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on letters that precede and succeed the particular letter, at least one word of the predetermined group of words including two or more letters that collectively have a single phonetic representation, wherein a first letter of the two or more letters is represented by a phoneme that corresponds to the single phonetic representation and wherein remaining letters of the two or more letters are represented by blank phonemes;
  
  parsing a sentence into punctuation and a plurality of words;
  
  dividing each word of the plurality of words into a plurality of diphones based on combinations of letters in the letter to phoneme rules database;
  
  combining sound files corresponding to the plurality of diphones; and
  
  playing the combined sound files.
- Dependent claims 6, 7:
  - 6. The method of claim 5, further comprising:
    - adding inflection to at least one word of the plurality of words in accordance with the punctuation of the sentence.
  - 7. The method of claim 5, wherein the step of dividing each word of the plurality of words into a plurality of diphones comprises comparing combinations of letters in each word of the plurality of words to the combinations of letters in the letter to phoneme rules database.
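
Claim 5 keeps one phoneme per letter by giving the first letter of a multi-letter sound the real phoneme and the remaining letters blank phonemes. The sketch below shows how such an aligned sequence could be turned into diphones. The `_` blank marker, the `sil` boundary symbol, the phoneme spellings, and the `a-b` naming are assumptions for illustration; the claim text does not specify any of them.

```python
BLANK = "_"      # hypothetical marker for a blank phoneme
SILENCE = "sil"  # hypothetical boundary symbol at the start and end of a word

def to_diphones(per_letter_phonemes):
    """Drop blank phonemes, then pair each phoneme with its neighbor."""
    phonemes = [p for p in per_letter_phonemes if p != BLANK]
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

# "phone": p -> f, h -> blank ("ph" is one sound), o -> ow, n -> n, e -> blank
word = "phone"
aligned = ["f", "_", "ow", "n", "_"]
diphones = to_diphones(aligned)
print(diphones)  # ['sil-f', 'f-ow', 'ow-n', 'n-sil']
```

Because the blanks are removed before pairing, the five letters of "phone" yield three phonemes and four diphones, including the two word-boundary transitions.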

8. A method for producing a speech rendition of text comprising:
- providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on letters that precede and succeed the particular letter, at least one word of the predetermined group of words including two or more letters that collectively have a single phonetic representation, wherein a first letter of the two or more letters is represented by a phoneme that corresponds to the single phonetic representation and wherein remaining letters of the two or more letters are represented by blank phonemes;
  
  parsing a sentence into punctuation and a plurality of words;
  
  comparing at least one word of the plurality of words to a list of pre-recorded words;
  
  in the event that the compared word is not on the list of pre-recorded words, dividing the compared word into a plurality of diphones based on combinations of letters in the letter to phoneme rules database, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
  
  in the event that the compared word is on the list of pre-recorded words, playing a sound file corresponding to the compared word, the sound file being independent of the sound files corresponding to the plurality of diphones.
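
The step of "combining sound files corresponding to the plurality of diphones" could be done by concatenating the audio frames of each clip. The sketch below uses Python's standard `wave` module on in-memory WAV data. It assumes every clip is mono, 16-bit, and at the same sample rate, and the byte patterns stand in for real recordings; the claims do not name a file format, so WAV is an assumption here.

```python
import io
import wave

def make_clip(frames: bytes, rate: int = 8000) -> bytes:
    """Build a tiny mono 16-bit WAV file in memory (stand-in for a recorded diphone)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()

def combine_clips(clips):
    """Concatenate WAV clips that share channel count, sample width, and rate."""
    if not clips:
        raise ValueError("need at least one clip")
    out = io.BytesIO()
    fmt = None
    with wave.open(out, "wb") as w:
        for clip in clips:
            with wave.open(io.BytesIO(clip), "rb") as r:
                this_fmt = (r.getnchannels(), r.getsampwidth(), r.getframerate())
                if fmt is None:
                    # The first clip fixes the output format.
                    fmt = this_fmt
                    w.setnchannels(fmt[0])
                    w.setsampwidth(fmt[1])
                    w.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError("clips must share one audio format")
                w.writeframes(r.readframes(r.getnframes()))
    return out.getvalue()

clip_a = make_clip(b"\x01\x00" * 4)  # 4 frames
clip_b = make_clip(b"\x02\x00" * 6)  # 6 frames
combined = combine_clips([clip_a, clip_b])
with wave.open(io.BytesIO(combined), "rb") as r:
    print(r.getnframes())  # 10
```

Plain concatenation leaves audible seams where clips meet; smoothing the joins is outside what this sketch attempts.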

9. A method for producing a speech rendition of text comprising:
- providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on three letters that precede and three letters that succeed the particular letter;
  
  parsing a sentence into punctuation and a plurality of words;
  
  comparing at least one word of the plurality of words to a list of pre-recorded words, in the event that the compared word is not on the list of pre-recorded words, dividing the compared word into a plurality of diphones based on combinations of letters in the letter to phoneme rules database, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
  
  in the event that the compared word is on the list of pre-recorded words, playing a sound file corresponding to the compared word, the sound file being independent of the sound files corresponding to the plurality of diphones.

10. A method for producing a speech rendition of text comprising:
- providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on one letter that precedes and two letters that succeed the particular letter;
  
  parsing a sentence into punctuation and a plurality of words;
  
  comparing at least one word of the plurality of words to a list of pre-recorded words;
  
  in the event that the compared word is not on the list of pre-recorded words, dividing the compared word into a plurality of diphones based on combinations of letters in the letter to phoneme rules database, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
  
  in the event that the compared word is on the list of pre-recorded words, playing a sound file corresponding to the compared word, the sound file being independent of the sound files corresponding to the plurality of diphones.

11. A method for producing a speech rendition of text comprising:
- providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on three letters that precede and three letters that succeed the particular letter;
  
  parsing a sentence into punctuation and a plurality of words;
  
  dividing each word of the plurality of words into a plurality of diphones based on combinations of letters in the letter to phoneme rules database;
  
  combining sound files corresponding to the plurality of diphones; and
  
  playing the combined sound files.

12. A method for producing a speech rendition of text comprising:
- providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on one letter that precedes and two letters that succeed the particular letter;
  
  parsing a sentence into punctuation and a plurality of words;
  
  dividing each word of the plurality of words into a plurality of diphones based on combinations of letters in the letter to phoneme rules database;
  
  combining sound files corresponding to the plurality of diphones; and
  
  playing the combined sound files.
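
Claims 9 and 11 recite a context of three letters before and three after; claims 10 and 12 recite one letter before and two after. The sketch below shows what those two windows look like for a given letter and how a lookup might use them. Trying the wide window first and backing off to the narrow one is an assumption made for this example, not something the claims state, and the rule tables, the `#` padding, and the phoneme spellings are invented toy data.

```python
PAD = "#"  # hypothetical padding for positions beyond the word edges

def window(word, i, before, after):
    """Render the letter at i with its context, e.g. '#ph[o]ne#'."""
    left = (PAD * before + word[:i])[-before:] if before else ""
    right = (word[i + 1:] + PAD * after)[:after]
    return f"{left}[{word[i]}]{right}"

def phoneme_for(word, i, wide_rules, narrow_rules, default="?"):
    """Look up the 3-before/3-after context, else the 1-before/2-after context."""
    wide = window(word, i, 3, 3)
    if wide in wide_rules:
        return wide_rules[wide]
    return narrow_rules.get(window(word, i, 1, 2), default)

# Toy rule tables (assumed data); "_" marks a blank phoneme.
WIDE_RULES = {"###[p]hon": "f"}
NARROW_RULES = {"p[h]on": "_", "h[o]ne": "ow", "o[n]e#": "n", "n[e]##": "_"}

word = "phone"
phonemes = [phoneme_for(word, i, WIDE_RULES, NARROW_RULES) for i in range(len(word))]
print(window(word, 2, 3, 3), window(word, 2, 1, 2))  # #ph[o]ne# h[o]ne
print(phonemes)  # ['f', '_', 'ow', 'n', '_']
```

The wider window distinguishes more spellings but matches fewer unseen words, which is the trade-off the two claim variants appear to reflect.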

13. A method for producing a speech rendition of text comprising:
- parsing a sentence into a plurality of words;
  
  comparing a first word of the plurality of words to a list of homographs;
  
  in the event that the first word is on the list of homographs, determining parts of speech for words adjacent the first word;
  
  selecting a sound file for the first word based on the parts of speech of the adjacent words, the sound file being independent of sound files corresponding to diphones associated with the first word, and playing the selected sound file;
  
  in the event that the first word is not on the list of homographs, comparing the first word to a list of pre-recorded words;
  
  in the event that the first word is not on the list of homographs and is not on the list of pre-recorded words, dividing the first word into a plurality of diphones, combining sound files corresponding to the plurality of diphones, and playing the combined sound files;
  
  in the event that the first word is not on the list of homographs and is on the list of pre-recorded words, playing a sound file corresponding to the first word, the sound file being independent of the sound files corresponding to the plurality of diphones.
- Dependent claims 14, 15, 16, 17:
  - 14. The method of claim 13, further comprising:
    - in the event that the first word is not on the list of pre-recorded words and prior to dividing the first word into a plurality of diphones, determining whether the first word includes at least one number, and in the event that the first word includes at least one number, audibly spelling the first word out instead of dividing the first word into a plurality of diphones, combining sound files, and playing the combined sound files.
  - 15. The method of claim 13, further comprising:
    - providing a letter to phoneme rules database containing phonetic representations of a predetermined group of words, each letter of each word in the predetermined group of words being represented by a corresponding phoneme, the phoneme for a particular letter being determined based on letters that precede and succeed the particular letter;
      
      wherein the step of dividing the first word into a plurality of diphones comprises dividing the first word into a plurality of diphones based on combinations of letters in the letter to phoneme rules database.
  - 16. The method of claim 15, wherein at least one word of the predetermined group of words includes two or more letters that collectively have a single phonetic representation, wherein a first letter of the two or more letters is represented by a phoneme that corresponds to the single phonetic representation, and wherein remaining letters of the two or more letters are represented by blank phonemes.
  - 17. The method of claim 15, wherein the corresponding phoneme for a particular letter is determined based on three letters that precede and three letters that succeed the particular letter.
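
The homograph handling in claim 13 (choosing a recording for a word such as "record" from the parts of speech of its neighbors) could be approximated as follows. Everything concrete here is an assumption for illustration: the tiny part-of-speech lexicon, the file paths, the specific selection rules, and the noun default are not taken from the patent.

```python
# Hypothetical part-of-speech lexicon for neighboring words.
POS = {"the": "DET", "a": "DET", "to": "TO", "i": "PRON", "they": "PRON", "will": "MODAL"}

# Hypothetical homograph list: one whole-word recording per reading.
HOMOGRAPHS = {
    "record": {
        "noun": "homographs/record_noun.wav",
        "verb": "homographs/record_verb.wav",
    },
}

def select_homograph_file(words, index):
    """Return the sound file for a homograph, or None if the word is not one."""
    variants = HOMOGRAPHS.get(words[index].lower())
    if variants is None:
        return None  # caller falls through to the pre-recorded and diphone paths
    prev_pos = POS.get(words[index - 1].lower()) if index > 0 else None
    next_pos = POS.get(words[index + 1].lower()) if index + 1 < len(words) else None
    if prev_pos == "DET":
        return variants["noun"]  # "the record"
    if prev_pos in {"TO", "PRON", "MODAL"} or next_pos == "DET":
        return variants["verb"]  # "they record", "record the ..."
    return variants["noun"]      # arbitrary default chosen for this sketch

print(select_homograph_file(["play", "the", "record"], 2))
print(select_homograph_file(["they", "record", "the", "song"], 1))
```

A `None` result signals that the word is not a homograph, which corresponds to the claim's fall-through to the pre-recorded word list and then to diphones.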

Current Assignee: Asapp, Inc.
Original Assignee: Joseph E. Pechter, William H. Pechter
Inventors: Pechter, Joseph E.; Pechter, William H.
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Storm, Donald L.
Application Number: US09/653,382
Time in Patent Office: 1,684 Days
Field of Search: 704/267, 704/260
US Class Current: 704/267
CPC Class Codes: G10L 13/08 Text analysis or generation...
