Method and system for improving pronunciation in a voice control system

US 5,787,231 A
Filed: 02/02/1995
Issued: 07/28/1998
Est. Priority Date: 02/02/1995
Status: Expired due to Fees


1 Assignment

0 Petitions

Abstract

A voice enunciation system and method provides a user with the capability to sound out text files. As the files are audibly played, if the user is not satisfied with the pronunciation of a particular word, the system provides the user with the means of replacing the word with his own particular pronunciation. The preferred pronunciation is also stored in an override dictionary so that any subsequent encounter with that particular word is pronounced correctly.

52 Citations

26 Claims

1. A voice enunciation system in a data processing system comprising:
- a. a processor comprising a central processing unit and memory;
  
  b. an audio signal output device;
  
  c. the processor memory further comprising
  
    i. a work queue for receiving text words for processing;
  
    ii. a playback queue for receiving text words from the work queue for audibly pronouncing the text words on the audio signal output device, and
  
    iii. a dictionary for storing preferred pronunciations of words; and
  
  d. the processor further providing means for
  
    i. storing text words in a memory;
  
    ii. sequentially extracting text words from the memory;
  
    iii. attempting to look up each of the sequentially extracted words in a dictionary and if a word is found in the dictionary, placing that word on a work queue as a wave file entry, and if the word is not found in the dictionary, placing that word on the work queue as a word string entry;
  
    iv. continuing to place words on the work queue until a predetermined threshold number of words have been placed on the work queue;
  
    v. when the predetermined threshold number of words have been placed on the work queue, starting an asynchronous play thread, the asynchronous play thread comprising
  
      (a) extracting an entry from the work queue;
  
      (b) determining if the entry is a wave file entry or a word string entry;
  
      (c) if the entry is a wave file entry, audibly playing the wave file, and
  
      (d) if the entry is a word string audibly playing the word string phonetically;
  
    vi. once an entry has been audibly played, placing that entry on a playback queue until the playback queue is full; and
  
    vii. once the playback queue is full, deleting the oldest entry from the playback queue.
- Dependent claims (2, 3, 4):
- - 2. The voice enunciation system of claim 1 wherein the receipt of text data for processing by the work queue is asynchronous with the receipt of text data by the playback queue.
  - 3. The voice enunciation system of claim 2 further comprising means for providing uninterrupted receipt of text data by the playback queue from the work queue.
  - 4. The voice enunciation system of claim 1 further comprising means for selectively storing preferred pronunciations in the dictionary.
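
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `enunciate`, the constants `THRESHOLD` and `PLAYBACK_CAPACITY`, the entry tuples, and the `play_wave` / `speak_text` callbacks are all assumptions made for illustration, and a plain `dict` stands in for the dictionary of preferred pronunciations.

```python
import threading
from collections import deque
from queue import Queue

THRESHOLD = 3          # assumed value; the claim only says "predetermined"
PLAYBACK_CAPACITY = 4  # assumed size of the playback queue


def enunciate(words, dictionary, play_wave, speak_text):
    """Queue words, start a play thread at the threshold, keep a bounded history."""
    work_queue = Queue()
    # A deque with maxlen discards its oldest entry when a new one is appended
    # to a full deque, which mirrors "deleting the oldest entry".
    playback_queue = deque(maxlen=PLAYBACK_CAPACITY)
    done = object()  # sentinel marking the end of the input (an assumption)
    player = None

    def play_thread():
        while True:
            entry = work_queue.get()
            if entry is done:
                break
            kind, payload = entry
            if kind == "wave":
                play_wave(payload)    # preferred pronunciation found in the dictionary
            else:
                speak_text(payload)   # word string, pronounced phonetically
            playback_queue.append(entry)

    queued = 0
    for word in words:
        if word in dictionary:
            work_queue.put(("wave", dictionary[word]))
        else:
            work_queue.put(("text", word))
        queued += 1
        if queued == THRESHOLD and player is None:
            player = threading.Thread(target=play_thread)
            player.start()

    if player is None:  # fewer words than the threshold: play what was queued
        player = threading.Thread(target=play_thread)
        player.start()
    work_queue.put(done)
    player.join()
    return list(playback_queue)
```

Filling the work queue on the calling thread while a separate thread drains it is one way to model the asynchronous relationship between the two queues described in claim 2.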

5. A voice enunciation method comprising the steps of:
- a. storing text words in a memory;
  
  b. sequentially extracting text words from the memory;
  
  c. attempting to look up each of the sequentially extracted words in a dictionary and if a word is found in the dictionary, placing that word on a work queue as a wave file entry, and if the word is not found in the dictionary, placing that word on the work queue as a word string entry;
  
  d. continuing to place words on the work queue until a predetermined threshold number of words have been placed on the work queue;
  
  e. when the predetermined threshold number of words have been placed on the work queue, starting an asynchronous play thread, the asynchronous play thread comprising
  
    i. extracting an entry from the work queue;
  
    ii. determining if the entry is a wave file entry or a word string entry;
  
    iii. if the entry is a wave file entry, audibly playing the wave file; and
  
    iv. if the entry is a word string audibly playing the word string phonetically;
  
  f. once an entry has been audibly played, placing that entry on a playback queue until the playback queue is full; and
  
  g. once the playback queue is full, deleting the oldest entry from the playback queue.
- Dependent claims (6, 7, 8, 9, 10):
- - 6. The method of claim 5, further comprising the steps of:
    - a. continuing to place words on the work queue until the work queue is full; and
      
      b. when the work queue is full, waiting until memory space is available on the work queue.
  - 7. The method of claim 5 further comprising the step of interrupting the audible playing of words from the work queue.
  - 8. The method of claim 7 further comprising the step of audibly playing words from the playback queue in last-in-first out order.
  - 9. The method of claim 8 further comprising the step of replacing an entry in the playback queue.
  - 10. The method of claim 8 further comprising the step of updating the dictionary with a user selectable wave file.
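
Dependent claims 6 and 8 through 10 add waiting on a full work queue, last-in-first-out replay, entry replacement, and dictionary updates. The Python sketch below shows one way these steps might look. The helper names, the `(kind, payload, text)` entry layout, and the timeout value are hypothetical and do not come from the patent.

```python
from collections import deque
from queue import Full, Queue


def try_enqueue(work_queue, entry, timeout=0.01):
    """Claim 6 idea: wait for space when the work queue is full.

    Returns False if no space became available within the assumed timeout.
    """
    try:
        work_queue.put(entry, timeout=timeout)
        return True
    except Full:
        return False


def replay_order(playback_queue):
    """Claim 8 idea: most recently played entry first (last-in, first-out)."""
    return list(reversed(playback_queue))


def correct_entry(playback_queue, dictionary, word, wave_file):
    """Claims 9 and 10 idea: replace the entry for `word` and update the dictionary."""
    for i in range(len(playback_queue)):
        kind, payload, text = playback_queue[i]
        if text == word:
            playback_queue[i] = ("wave", wave_file, text)
    dictionary[word] = wave_file  # later encounters use the selected wave file
```

Keeping the original text in each entry is a design choice of this sketch; it lets a correction be matched to a word even after the entry has been played phonetically.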

11. A method in a data processing system for enhancing voice pronunciation of a textual input stream comprising the steps of:
- receiving text from the textual input stream;
  
  customizing a customizable pronunciation dictionary by a user immediately upon recognition by the user that one or more textual portions from the textual input stream was mispronounced, the customizing step further comprising
  
    invoking a process interruption by a user during processing of the textual input stream,
  
    automatically suspending the process before completing processing of the textual input stream, and
  
    presenting an appropriate interface for selecting and editing the textual portions for proper pronunciations;
  
  comparing the text with the customizable pronunciation dictionary;
  
  determining a sound interface input in accordance with one of a plurality of playing methods for playing sound associated with the text; and
  
  routing the sound interface input to an appropriate device interface in accordance with the one of a plurality of playing methods.
- Dependent claims (12, 13, 14, 15, 16, 17, 18):
- - 12. The method of claim 11, wherein the step of determining a sound interface input further comprises the steps of:
    - receiving a found status or a not found status upon search of the text with the customizable pronunciation dictionary;
      
      preparing the text for a first interface which will play sound according to the text provided as input to the first interface when the status is a not found status; and
      
      preparing a wave file associated with the text for a second interface which will play sound according to the wave file provided as input to the second interface and which corresponds to the text matched in the customizable pronunciation dictionary when the status is a found status.
  - 13. The method of claim 11 wherein routing the sound interface input to an appropriate device interface comprises routing the input to a text-to-speech process.
  - 14. The method of claim 11 wherein routing the sound interface input to an appropriate device interface comprises routing the input to a wave file play process.
  - 15. The method of claim 14 wherein the step of invoking an interruption is carried out through a voice command.
  - 16. The method of claim 14 wherein proper pronunciations are saved into the customizable pronunciation dictionary.
  - 17. The method of claim 14 wherein the customizable pronunciation dictionary comprises one or more records, each record containing at least two fields, the at least two fields comprising a textual string field and an associated wave file field for sound associated with the textual string.
  - 18. The method of claim 11 wherein the step of presenting an appropriate interface permits playback of a previously defined number of entries.
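
The found / not found routing of claims 12 through 14, together with the two-field record of claim 17, can be sketched as follows. This is an illustrative Python sketch only: `DictionaryRecord`, `determine_sound_input`, `route`, and the interface names `"wave_player"` and `"text_to_speech"` are assumed names, and a linear search over a list stands in for whatever lookup the actual system uses.

```python
from dataclasses import dataclass


@dataclass
class DictionaryRecord:
    text: str       # textual string field
    wave_file: str  # associated wave file field


def determine_sound_input(text, records):
    """Return (status, interface name, payload) for a piece of text."""
    for record in records:
        if record.text == text:
            # Found: prepare the wave file for the wave file play process.
            return ("found", "wave_player", record.wave_file)
    # Not found: prepare the text itself for the text-to-speech process.
    return ("not_found", "text_to_speech", text)


def route(sound_input, interfaces):
    """Hand the payload to the device interface selected by the playing method."""
    _status, interface_name, payload = sound_input
    return interfaces[interface_name](payload)
```

A caller would supply `interfaces` as a mapping from the two assumed interface names to callables that actually produce sound.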

19. Apparatus for enhancing voice pronunciation of a textual input stream in a data processing system comprising:
- means for receiving text from the textual input stream;
  
  means for comparing the text with a customizable pronunciation dictionary, the customizable pronunciation dictionary including means for customizing the pronunciation dictionary by a user immediately upon recognition by the user that one or more textual portions from the textual input stream was mispronounced, wherein the means for customizing further comprises
  
    means for invoking a process interruption by a user during processing of the textual input stream,
  
    means for automatically suspending the process before completing processing of the textual input stream, and
  
    means for presenting an appropriate interface for selecting and editing the textual portions for proper pronunciations;
  
  means for determining a sound interface input in accordance with one of a plurality of playing methods for playing sound associated with the text; and
  
  means for routing the sound interface input to an appropriate device interface in accordance with the one of a plurality of playing methods.
- Dependent claims (20, 21, 22, 23, 24, 25, 26):
- - 20. The apparatus of claim 19, wherein the means for determining a sound interface input further comprises:
    - means for receiving a found status or a not found status upon search of the text with the customizable dictionary;
      
      means for preparing the text for a first interface which will play sound according to the text provided as input to the first interface when the status is a not found status; and
      
      means for preparing a wave file associated with the text for a second interface which will play sound according to the wave file provided as input to the second interface and which corresponds to the text matched in the customizable dictionary when the status is a found status.
  - 21. The apparatus of claim 19 wherein the means for routing the sound interface input to an appropriate device interface comprises a means for routing the input to a text-to-speech process.
  - 22. The apparatus of claim 19 wherein the means for routing the sound interface input to an appropriate device interface comprises a means for routing the input to a wave file play process.
  - 23. The apparatus of claim 19 wherein the means for invoking an interruption is actuated through a voice command.
  - 24. The apparatus of claim 19 further comprising means for saving proper pronunciations into the customizable dictionary.
  - 25. The apparatus of claim 19 wherein the customizable pronunciation dictionary comprises one or more records, each record containing at least two fields, the at least two fields comprising a textual string field and an associated wave file field for sound associated with the textual string.
  - 26. The apparatus of claim 19 wherein the means for presenting an appropriate interface permits playback of a previously defined number of entries.
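
The interruption and suspension elements shared by claims 11 and 19, and the bounded playback of claims 18 and 26, might be modeled as in the Python sketch below. It is a simplification under stated assumptions: `process_stream`, the `play` callback, and `history_size` are hypothetical names, and a `threading.Event` stands in for whatever mechanism (a key press or voice command, for example) signals the interruption.

```python
import threading
from collections import deque


def process_stream(words, play, interrupt, history_size=3):
    """Play words until `interrupt` is set.

    Returns the most recent entries (what an editing interface could offer
    for selection) and the words still waiting to be processed.
    """
    recent = deque(maxlen=history_size)  # "previously defined number of entries"
    pending = deque(words)
    while pending and not interrupt.is_set():
        word = pending.popleft()
        play(word)
        recent.append(word)
    # Processing is suspended here, before the input stream is exhausted.
    return list(recent), list(pending)
```

Returning the unprocessed words lets a caller resume from the same point once the user has finished editing pronunciations.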

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Johnson, William; Weber, Owen
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Mattson, Robert
Application Number: US08/382,737
Time in Patent Office: 1,272 Days
Field of Search: 395/2.69, 395/2.84
US Class Current: 704/260
CPC Class Codes:
G10L 13/033   Voice editing, e.g. manipul...
G10L 13/04   Details of speech synthesis...
G10L 13/047   Architecture of speech synt...
