Text to speech system and method having interactive spelling capabilities

US 7,490,039 B1
Filed: 12/13/2001
Issued: 02/10/2009
Est. Priority Date: 12/13/2001
Status: Active Grant

Abstract

A method for audibly spelling a word in an audio file includes playing an audio file to a user, receiving a command to spell a word in the audio file from the user, identifying a textual word in a text file corresponding to the word, and audibly spelling the textual word. A text-to-speech system includes a memory and a text-to-speech module. The text-to-speech module generates an audio file from a text file, and stores in the memory locations for words in the audio file corresponding to locations of words in the text file.

245 Citations

32 Claims

1. A text-to-speech (TTS) system, comprising:
- a memory operable to store a text file and an audio file; and
  
  a TTS module operable to:
  
  convert a plurality of textual words in the text file to a plurality of audible words;
  
  store the audible words in an audio file, the audio file including a plurality of electronic markers embedded in the audio file; and
  
  store for each audible word:
  
  a first location locating the audible word in the audio file; and
  
  a second location locating the corresponding textual word in the text file; and
  
  transmit the audible words to a telecommunication device operable to play the audio file to a user;
  
  an output device operable to play the audio file to a user;
  
  an interface operable to receive a voice command to spell one of the audible words during the playing of the audio file; and
  
  a processor operable to:
  
  remove the electronic markers from the audio file during playback;
  
  track the number of words played by counting the number of electronic markers removed;
  
  determine the textual word corresponding to the audible word to be spelled; and
  
  audibly spell the textual word.

2. A method for relating words in an audio file to words in a text file, comprising:
- retrieving a text file comprising a textual word;
  
  converting the textual word to an audible word;
  
  storing the audible word in an audio file, the audio file including a plurality of electronic markers embedded in the audio file;
  
  storing a file map, the file map comprising:
  
  a first location locating the audible word within the audio file; and
  
  a second location locating the textual word within the text file; and
  
  transmitting the audio file to a telecommunication device operable to play the audio file to a user;
  
  removing the electronic markers from the audio file during playback;
  
  tracking the number of words played by counting the number of electronic markers removed;
  
  receiving a voice command from a user to spell the audible word;
  
  determining that the textual word corresponds to the audible word; and
  
  audibly spelling the textual word.
- - 3. The method of claim 2, further comprising repeating the steps of the method for a plurality of textual words in the text file.
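The file map recited in claim 2 pairs a location in the audio file with a location in the text file for each word. A minimal Python sketch of one way such a map could be built and queried follows. It is not the patented implementation: `FileMapEntry`, `fake_tts`, `build_file_map`, and `word_at_audio_offset` are hypothetical names, the TTS engine is replaced by a placeholder that emits silence, and byte and character offsets are assumed as the "locations".

```python
import re
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileMapEntry:
    word: str
    audio_offset: int  # "first location": byte offset of the word in the audio
    text_offset: int   # "second location": character offset in the text


def fake_tts(word: str) -> bytes:
    """Hypothetical stand-in for a TTS engine: 100 silent bytes per character."""
    return b"\x00" * (100 * len(word))


def build_file_map(text: str) -> Tuple[bytes, List[FileMapEntry]]:
    """Convert each whitespace-delimited word and record both locations."""
    audio = bytearray()
    entries: List[FileMapEntry] = []
    for match in re.finditer(r"\S+", text):
        entries.append(FileMapEntry(match.group(), len(audio), match.start()))
        audio += fake_tts(match.group())
    return bytes(audio), entries


def word_at_audio_offset(entries: List[FileMapEntry], offset: int) -> FileMapEntry:
    """Return the entry whose audio span contains the given byte offset."""
    starts = [entry.audio_offset for entry in entries]
    return entries[max(bisect_right(starts, offset) - 1, 0)]
```

With this sketch, a playback position inside the audio can be translated back to the textual word that should be spelled by looking up the entry and reading the text at `text_offset`.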

4. A method for relating words in an audio file to words in a text file, comprising:
- retrieving a text file comprising a plurality of textual words;
  
  converting the plurality of textual words to a plurality of audible words, each audible word comprising media stream packets;
  
  storing information relating each audible word to a corresponding textual word, wherein the information comprises a plurality of electronic markers embedded in the audio file;
  
  transmitting the audible words to a telecommunication device associated with a user in real time as the audible words are generated;
  
  removing the electronic markers from the audio file during playback;
  
  tracking the number of words played by counting the number of electronic markers removed;
  
  during the playing of the audible words, determining a current textual word corresponding to the audible word currently being played.
- - 5. The method of claim 4, wherein the textual words comprise ASCII text.
  - 6. The method of claim 4, further comprising:
    - after each audible word is played, storing information about the audible word, the information comprising:
      
      an identifier for the textual word corresponding to the audible word; and
      
      a time at which the audible word was played.
  - 7. The method of claim 4, wherein the steps of the method are performed by logic embodied in a computer readable medium.
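Claims 4 and 6 describe removing embedded markers during playback, counting them to track the current word, and logging an identifier and play time for each word. The Python sketch below illustrates that idea under stated assumptions: one marker is assumed to follow each word's audio, and `MARKER`, `embed_markers`, and `MarkerCountingPlayer` are hypothetical names, not taken from the patent.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

MARKER = object()  # hypothetical sentinel standing in for an electronic marker


def embed_markers(word_audio: Iterable[bytes]) -> Iterator[object]:
    """Yield each word's audio followed by one marker (an assumed layout)."""
    for chunk in word_audio:
        yield chunk
        yield MARKER


class MarkerCountingPlayer:
    def __init__(self, words: List[str], clock: Callable[[], float] = time.monotonic):
        self.words = words
        self.clock = clock
        self.markers_removed = 0
        self.play_log: List[Tuple[int, float]] = []  # (word identifier, time played)

    def play(self, stream: Iterable[object], sink: Callable[[bytes], None]) -> None:
        """Forward audio to the sink; strip and count markers instead of playing them."""
        for item in stream:
            if item is MARKER:
                self.play_log.append((self.markers_removed, self.clock()))
                self.markers_removed += 1
            else:
                sink(item)

    def current_word(self) -> Optional[str]:
        """The last fully played word, derived only from the marker count."""
        if self.markers_removed == 0:
            return None
        return self.words[self.markers_removed - 1]
```

Because the markers never reach the sink, the listener hears only audio, while the count alone is enough to index into the list of textual words.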

8. A method for relating words in an audio file to words in a text file, comprising:
- retrieving a text file comprising a textual word;
  
  converting the textual word to an audible word, the audible word comprising media stream packets;
  
  storing an identifier for the textual word;
  
  repeating the steps of the method for a plurality of textual words in the text file to generate an audio file of a plurality of audible words;
  
  storing information relating each audible word to a corresponding textual word, wherein the information comprises a plurality of electronic markers embedded in the audio file;
  
  transmitting the audio file to a telecommunication device operable to play the audio word to a user;
  
  removing the electronic markers from the audio file during playback; and
  
  tracking the number of words played by counting the number of electronic markers removed.
- - 9. The method of claim 8, further comprising repeating the steps of the method for a plurality of textual words in the text file.
  - 10. The method of claim 8, further comprising:
    - receiving a command from a user to spell the audible word;
      
      determining that the textual word corresponds to the audible word; and
      
      audibly spelling the textual word.

11. A method for audibly spelling a word in an audio file, comprising:
- retrieving a text file comprising a textual word;
  
  converting the textual word to an audible word, the audible word comprising media stream packets;
  
  storing the audible word in an audio file, the audio file comprising a plurality of audible words converted from a plurality of textual words and a plurality of electronic markers embedded in the audio file;
  
  playing the audio file to a user;
  
  removing the electronic markers from the audio file during playback;
  
  tracking the number of words played by counting the number of electronic markers removed;
  
  receiving from the user a voice command to spell an audible word in the audio file;
  
  in response to the voice command, using the number of electronic markers removed to identify in a text file a textual word corresponding to the audible word; and
  
  audibly spelling the textual word.
- - 12. The method of claim 11, wherein receiving the command comprises receiving a barge-in command during the playing of the audio file, and the method further comprises:
    - stopping the playback of the audio file;
      
      identifying the last word played before the barge-in command was received; and
      
      selecting the last word played as the audible word to be spelled.
  - 13. The method of claim 12, further comprising:
    - receiving a command from the user to resume playing the audio file; and
      
      playing the audio file from the point at which playback was stopped.
  - 14. The method of claim 11, further comprising:
    - receiving a command from the user to select a new textual word from the text file; and
      
      audibly spelling the new textual word.
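The barge-in flow of claims 11 through 14 (stop playback, select the last word played, spell it, optionally select a different word, then resume) can be sketched as a small state machine. This is an illustration only: `SpellingSession` and `spell` are hypothetical names, playback is modeled as advancing through a word list, and dropping punctuation when spelling is an assumption the claims do not state.

```python
from typing import List, Optional


def spell(word: str) -> List[str]:
    """Letters to speak one at a time (punctuation is dropped by assumption)."""
    return [ch.upper() for ch in word if ch.isalnum()]


class SpellingSession:
    def __init__(self, words: List[str]):
        self.words = list(words)
        self.position = 0                    # words played so far (markers removed)
        self.selected: Optional[int] = None  # index of the word chosen for spelling

    def play(self, max_words: Optional[int] = None) -> List[str]:
        """Play from the current position; stopping early models a barge-in."""
        end = len(self.words) if max_words is None else min(
            len(self.words), self.position + max_words)
        played = self.words[self.position:end]
        self.position = end
        return played

    def barge_in_spell(self) -> List[str]:
        """Select the last word played before the interruption and spell it."""
        if self.position == 0:
            return []
        self.selected = self.position - 1
        return spell(self.words[self.selected])

    def spell_new_word(self, step: int) -> List[str]:
        """Move the selection by `step` words (for example -1) and spell that word."""
        if self.selected is None or not 0 <= self.selected + step < len(self.words):
            return []
        self.selected += step
        return spell(self.words[self.selected])

    def resume(self) -> List[str]:
        """Continue playback from the point at which it was stopped."""
        return self.play()
```

Keeping `position` separate from `selected` lets the user browse to other words for spelling without losing the point from which playback should resume.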

15. An interactive voice response server (IVR), comprising:
- an interface operable to:
  
  play an audio file to a user, the audio file comprising a plurality of audible words converted from a plurality of textual words and a plurality of electronic markers embedded in the audio file; and
  
  receive a voice command to spell an audible word in the audio file from the user; and
  
  a processor operable to:
  
  remove the electronic markers from the audio file during playback;
  
  track the number of words played by counting the number of electronic markers removed;
  
  identify an audible word to be spelled in response to the voice command to spell;
  
  in response to the voice command, identify a textual word in a text file corresponding to the audible word to be spelled; and
  
  audibly spell the textual word.
- - 16. The IVR of claim 15, further comprising an adaptive speech recognition (ASR) module operable to:
    - receive speech from the user; and
      
      parse the speech into recognizable grammar, words or vocabulary.
  - 17. The IVR of claim 15, wherein:
    - the interface is further operable to receive a command from the user to resume playing the audio file; and
      
      the processor is further operable to resume playing the audio file in response to the command.
  - 18. The IVR of claim 15, wherein:
    - the interface is further operable to receive a command to select a new textual word from the text file; and
      
      the processor is further operable to select and to audibly spell the new textual word.
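Claims 15 through 18 have the IVR receive spell, resume, and new-word commands, with an ASR module parsing speech into a recognizable grammar. The sketch below assumes the recognizer has already produced a text transcript and shows one way that transcript might be mapped to commands. The `Command` values, the phrase patterns, and `parse_command` are all illustrative assumptions, not vocabulary taken from the patent.

```python
import re
from enum import Enum
from typing import List, Optional, Pattern, Tuple


class Command(Enum):
    SPELL = "spell"
    RESUME = "resume"
    PREVIOUS_WORD = "previous_word"
    NEXT_WORD = "next_word"


# Ordered grammar: more specific phrases are checked before the bare "spell".
GRAMMAR: List[Tuple[Pattern[str], Command]] = [
    (re.compile(r"\bprevious word\b"), Command.PREVIOUS_WORD),
    (re.compile(r"\bnext word\b"), Command.NEXT_WORD),
    (re.compile(r"\b(resume|continue)\b"), Command.RESUME),
    (re.compile(r"\bspell\b"), Command.SPELL),
]


def parse_command(transcript: str) -> Optional[Command]:
    """Map a recognized utterance to a command, or None if nothing matches."""
    text = transcript.lower()
    for pattern, command in GRAMMAR:
        if pattern.search(text):
            return command
    return None
```

Returning `None` for unrecognized speech lets the caller ignore background talk during playback rather than treating every sound as a barge-in.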

19. A computer readable medium encoded with logic capable of being executed by a processor to perform the steps of:
- retrieving a text file comprising a textual word;
  
  converting the textual word to an audible word, the audible word comprising media stream packets;
  
  playing an audio file to a user, the audio file comprising a plurality of audible words converted from a plurality of textual words and a plurality of electronic markers embedded in the audio file;
  
  removing the electronic markers from the audio file during playback;
  
  tracking the number of words played by counting the number of electronic markers removed;
  
  receiving from the user a voice command to spell an audible word in the audio file;
  
  in response to the voice command, identifying in a text file a textual word corresponding to the audible word; and
  
  audibly spelling the textual word.
- - 20. The logic of claim 19, wherein receiving the command comprises receiving a barge-in command during the playing of the audio file, and the logic is further operable to perform the steps of:
    - stopping the playback of the audio file;
      
      identifying the last audible word played before the barge-in command was received; and
      
      selecting the last audible word played as the audible word to be spelled.
  - 21. The logic of claim 19, wherein the logic is further operable to perform the steps of:
    - receiving a command from the user to resume playing the audio file; and
      
      playing the audio file approximately from a point at which playback was stopped.
  - 22. The logic of claim 19, wherein the logic is further operable to perform the steps of:
    - receiving a command from the user to select a new textual word from the text file; and
      
      audibly spelling the new textual word.

23. A computer readable medium encoded with logic capable of being executed by a processor to perform the steps of:
- selecting a textual word in a text file;
  
  converting the textual word to an audible word;
  
  storing the audible word in an audio file;
  
  storing a file map, the file map comprising:
  
  a first location locating the audible word within the audio file; and
  
  a second location locating the textual word within the text file; and
  
  transmitting the audio file to a telecommunication device operable to play the audio file to a user;
  
  removing the electronic markers from the audio file during playback;
  
  tracking the number of words played by counting the number of electronic markers removed;
  
  receiving a voice command from a user to spell the audible word;
  
  determining that the textual word corresponds to the audible word; and
  
  audibly spelling the textual word.
- - 24. The logic of claim 23, further operable to repeat the steps for a plurality of textual words in the text file.

25. A method for synchronizing audible words with textual words in a text file, comprising:
- retrieving a text file comprising a plurality of textual words;
  
  generating a plurality of audio files by converting the plurality of textual words to a plurality of audible words, each audio file comprising an audible word corresponding to one of the textual words;
  
  for each audio file, storing information relating the audio file to the corresponding textual word, the information comprising an electronic marker within the audio file that indicates the position of the audible word within the text file;
  
  removing the electronic markers from the audio file during playback; and
  
  tracking the number of words played by counting the number of electronic markers removed.
- - 26. The method of claim 25, wherein the steps are performed by logic embodied in a computer readable medium.

27. A system for spelling words in an audio file, comprising:
- means for playing an audio file to a user, the audio file comprising a plurality of audible words converted from a plurality of textual words;
  
  means for removing the electronic markers from the audio file during playback;
  
  means for tracking the number of words played by counting the number of electronic markers removed;
  
  means for receiving from the user a voice command to spell an audible word in the audio file;
  
  means for identifying in a text file a textual word corresponding to the audible word in response to the voice command; and
  
  means for audibly spelling the textual word.

28. A method for relating words in an audio file to words in a text file, comprising:
- retrieving a text file comprising a plurality of textual words;
  
  generating an audio file by converting the plurality of textual words to a plurality of audible words;
  
  storing information relating each audible word to a corresponding textual word, wherein the information comprises a plurality of electronic markers embedded in the audio file;
  
  transmitting the audio file to a telecommunication device operable to play the audio file to a user;
  
  removing the electronic markers from the audio file during playback; and
  
  tracking the number of words played by counting the number of electronic markers removed.
- - 29. The method of claim 28, wherein the textual words comprise ASCII text.
  - 30. The method of claim 28, wherein the audio file is stored in the form of a WAV file.
  - 31. The method of claim 28, wherein the information comprises a file map relating a location of each textual word within the text file to a location of the corresponding audible word in the audio file.
  - 32. The method of claim 28, wherein the steps of the method are performed by logic embodied in a computer readable medium.
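Claims 29 through 31 combine ASCII text, a WAV audio file, and a file map relating word locations in the two. As a rough sketch, Python's standard `wave` module can write such a file while frame positions are recorded per word. The assumptions are loud ones: `synth_placeholder` is a hypothetical stand-in that emits silence instead of speech, the 8 kHz mono 16-bit format is chosen only for illustration, and the map is kept as a separate list rather than embedded as markers.

```python
import io
import re
import wave
from typing import List, Tuple

SAMPLE_RATE = 8000                  # assumed telephony-style sample rate
SAMPLE_WIDTH = 2                    # bytes per sample (16-bit PCM)
FRAMES_PER_CHAR = SAMPLE_RATE // 20  # 50 ms of placeholder audio per character


def synth_placeholder(word: str) -> bytes:
    """Hypothetical TTS stand-in: silent PCM whose length scales with the word."""
    return b"\x00" * (FRAMES_PER_CHAR * len(word) * SAMPLE_WIDTH)


def write_wav_with_map(text: str) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Return WAV bytes plus a map of (text offset, first audio frame) per word."""
    buffer = io.BytesIO()
    file_map: List[Tuple[int, int]] = []
    frame_pos = 0
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for match in re.finditer(r"\S+", text):
            pcm = synth_placeholder(match.group())
            file_map.append((match.start(), frame_pos))
            wav.writeframes(pcm)
            frame_pos += len(pcm) // SAMPLE_WIDTH
    return buffer.getvalue(), file_map
```

Dividing a frame position by `SAMPLE_RATE` gives the word's start time in seconds, which is one way a player could relate elapsed playback time to a textual word.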

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors: Patel, Labhesh; Sarkar, Shantanu; Shaffer, Shmuel
Primary Examiner(s): Armstrong; Angela A
Application Number: US10/020,102
Time in Patent Office: 2,616 Days
Field of Search: 704/235, 704/260, 704/270, 704/270.1, 704/275
US Class Current: 704/260
CPC Class Codes:
G10L 13/08 Text analysis or generation...
G10L 15/26 Speech to text systems G10L...