Method and system for providing audio playback of a multi-source document

US 6,446,041 B1
Filed: 10/27/1999
Issued: 09/03/2002
Est. Priority Date: 10/27/1999
Status: Expired due to Term

Abstract

A multi-source input and playback utility that accepts inputs from various sources, transcribes the inputs as text, and plays aloud user-selected portions of the text is disclosed. The user may select a portion of the text and request audio playback thereof. The utility examines each transcribed word in the selected text. If stored audio data is associated with a given word, that audio data is retrieved and played. If no audio data is associated, then a text-to-speech entry or series of entries is retrieved and played instead.

139 Citations

30 Claims

1. A method for linking audio to text in a multi-source input and playback system, said method comprising the steps of:
- dictating one or more words;
  
  transcribing the one or more words to form a first text set within a document;
  
  storing the first text set on a storage medium;
  
  comparing each audio element of a stored audio version of the one or more words with each corresponding text element of the first text set;
  
  inserting second text into the document, wherein the second text is non-audio text;
  
  associating a text-to-speech entry with the second text; and
  
  forming a continuous stream of audio from (1) stored audio data corresponding to the first text set and (2) the text-to-speech entry corresponding to the second text.
  - 2. The method of claim 1, further comprising the step of playing back the continuous stream of audio in an order corresponding to a placement of said first text and said second text within the document.
  - 3. The method of claim 2, wherein said second text is inserted between a first dictated word and a second dictated word.
  - 4. The method of claim 2, wherein a first portion of the second text precedes a first dictated word and a second portion of the second text follows the first dictated word.
  - 5. The method of claim 2, wherein a first portion of the first text set and a second portion of the first text set are separated from one another by a portion of the second text, and wherein a first portion of the second text and a second portion of the second text are separated from one another by a portion of the first text set.
  - 6. A computer configured for performing the method of claim 2.
  - 7. The method of claim 2, wherein the non-audio text is inserted into the document by one or more of (a) typing the non-audio text into the document using a keyboard, (b) copying the non-audio text into the document using a mouse, and (c) handwriting text which is converted to non-audio text using a handwriting recognition program module.
  - 8. A computer configured for performing the method of claim 1.
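
The stream-forming step of claims 1 and 2 can be illustrated with a minimal sketch. It is not the patented implementation: the `TextSpan` record, the `synthesize` stand-in, and the placeholder byte strings are all hypothetical names introduced here, and a real system would emit audio samples rather than tagged bytes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextSpan:
    """Hypothetical record for one run of text in the document."""
    position: int                  # placement of the span within the document
    text: str
    audio: Optional[bytes] = None  # stored dictation audio; None for non-audio text


def synthesize(text: str) -> bytes:
    """Stand-in for a text-to-speech engine (assumption): returns tagged bytes."""
    return b"<tts:" + text.encode("utf-8") + b">"


def build_stream(spans: List[TextSpan]) -> bytes:
    """Concatenate stored audio and text-to-speech output in document order."""
    chunks = []
    for span in sorted(spans, key=lambda s: s.position):
        if span.audio is not None:
            chunks.append(span.audio)             # dictated text: replay the recording
        else:
            chunks.append(synthesize(span.text))  # typed text: fall back to synthesis
    return b"".join(chunks)


# Typed text ("big") inserted between two dictated words, as in claim 3.
spans = [
    TextSpan(position=2, text="world", audio=b"<rec:world>"),
    TextSpan(position=1, text="big"),
    TextSpan(position=0, text="hello", audio=b"<rec:hello>"),
]
stream = build_stream(spans)
```

Sorting by `position` is what keeps playback order tied to placement in the document rather than to the order in which the text was entered.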

9. A computer-implemented method for creating and vocalizing a document, comprising the steps of:
- speaking one or more words into an input device;
  
  transcribing the one or more words as a first text entry within a document;
  
  storing the one or more words on a storage-medium;
  
  comparing each word of the one or more words with each word of said first text entry;
  
  inputting a second text entry within the document, wherein the step of inputting the second text entry does not comprise speaking;
  
  assigning a text-to-speech entry to said second text entry; and
  
  playing back the one or more words and the text-to-speech entry in an order corresponding to a placement of the first and second entries within said document.
  - 10. The method of claim 9, further comprising analyzing one or more vocal characteristics of the one or more words and adjusting one or more properties of the text-to-speech entry to match the one or more vocal characteristics of the one or more words.
  - 11. The method of claim 10, wherein a prosody element of the text-to-speech entry is adjusted.
  - 12. The method of claim 11, wherein the prosody element comprises pitch, speed, volume, or a combination thereof.
  - 13. The method of claim 9, further comprising:
  - 14. The method of claim 9, wherein a cessation of the first text entry and a beginning of the second text entry is signaled by a non-alphanumeric character.
  - 15. The method of claim 9, wherein the first and second text entries comprise pictographic characters.
  - 16. The method of claim 15, wherein the pictographic characters are Kanji characters.
  - 17. A computer configured for performing the method of claim 9.
  - 18. The method of claim 9, wherein a shape of a letter within a word of the first text entry, the second text entry, or both, varies depending on a location of the letter within the word.
  - 19. The method of claim 9, wherein the first and second text entries are read from right to left.
  - 20. The method of claim 9, wherein the second text entry is inputted by one or more of (a) typing the second text entry into the document using a keyboard, (b) copying the second text entry into the document using a mouse, and (c) handwriting text which is converted to the second text entry using a handwriting recognition program module.
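
Claims 10 through 12 describe adjusting the prosody of a text-to-speech entry to match the dictated speech. The sketch below shows one possible reading, in which the adjustment simply takes the average pitch, speed, and volume of the dictated words. The claims do not specify how the characteristics are measured or combined, so the averaging rule, the `Prosody` and `TtsEntry` types, and the units are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Prosody:
    """Hypothetical prosody measurements; the units are illustrative only."""
    pitch_hz: float
    words_per_minute: float
    volume_db: float


@dataclass
class TtsEntry:
    """Hypothetical text-to-speech entry with adjustable prosody."""
    text: str
    prosody: Prosody


def average_prosody(samples: List[Prosody]) -> Prosody:
    """Average each prosody element over the dictated words.

    statistics.mean raises StatisticsError if `samples` is empty.
    """
    return Prosody(
        pitch_hz=mean(s.pitch_hz for s in samples),
        words_per_minute=mean(s.words_per_minute for s in samples),
        volume_db=mean(s.volume_db for s in samples),
    )


def match_prosody(entry: TtsEntry, dictated: List[Prosody]) -> TtsEntry:
    """Return a copy of the entry whose prosody follows the dictated speech."""
    return TtsEntry(text=entry.text, prosody=average_prosody(dictated))


dictated = [Prosody(200.0, 140.0, -12.0), Prosody(220.0, 160.0, -8.0)]
default_entry = TtsEntry("typed", Prosody(120.0, 180.0, 0.0))
adjusted = match_prosody(default_entry, dictated)
```

Returning a new entry instead of mutating the stored one leaves the default synthesis settings available for documents that contain no dictation.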

21. A computer-implemented method for providing audio playback of a text document, comprising the steps of:
- selecting a text set comprising at least one word, wherein each word comprises at least one phoneme;
  
  determining whether a user-dictated audio input corresponds to a first word of the text set;
  
  in the event that a user-dictated audio input corresponds to the first word, playing the user-dictated audio input through an audio output device;
  
  otherwise, determining whether one of a plurality of text-to-speech entries corresponds to the first word;
  
  in the event that a text-to-speech entry corresponds to the first word, playing the text-to-speech entry through an audio output device;
  
  otherwise, determining which of the plurality of text-to-speech entries corresponds to a phoneme of the first word; and
  
  in response to determining which of the plurality of text-to-speech entries corresponds to the phoneme of the first word, playing the corresponding text-to-speech entry through an audio output device.
  - 22. The method of claim 21, wherein:
  - 23. The method of claim 22, further comprising:
    - playing back the user-dictated audio input and the text-to-speech entry in an order corresponding to a placement of the first and second words in the text set.
  - 24. The method of claim 21, further comprising:
    - determining a plurality of words for which no corresponding user dictated audio input exists;
    - passing the plurality of words to a text-to-speech module; and
    - retrieving a text-to-speech entry for each of the plurality of words.
  - 25. A computer configured for performing the method of claim 21.
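
The selection order recited in claim 21 (dictated audio first, then a whole-word text-to-speech entry, then per-phoneme entries) can be sketched as a lookup cascade. This is a simplified illustration under stated assumptions: audio clips are represented by strings, lookups are keyed by the word's spelling rather than by its position in the document, and the dictionaries and the `audio_for_word` function are hypothetical names.

```python
from typing import Dict, List, Tuple

Clip = Tuple[str, str]  # (source label, placeholder clip identifier)


def audio_for_word(
    word: str,
    dictated: Dict[str, str],
    tts_words: Dict[str, str],
    tts_phonemes: Dict[str, str],
    phonemes_of: Dict[str, List[str]],
) -> List[Clip]:
    """Pick the clips to play for one word, following the claim 21 order."""
    if word in dictated:
        # A user-dictated recording corresponds to the word: play it.
        return [("dictated", dictated[word])]
    if word in tts_words:
        # Otherwise use a whole-word text-to-speech entry if one exists.
        return [("tts-word", tts_words[word])]
    # Otherwise assemble the word from one text-to-speech entry per phoneme.
    return [("tts-phoneme", tts_phonemes[p]) for p in phonemes_of[word]]


dictated = {"hello": "rec_hello"}
tts_words = {"world": "tts_world"}
tts_phonemes = {"k": "ph_k", "ae": "ph_ae", "t": "ph_t"}
phonemes_of = {"cat": ["k", "ae", "t"]}

plan = [
    audio_for_word(w, dictated, tts_words, tts_phonemes, phonemes_of)
    for w in ["hello", "world", "cat"]
]
```

In this sketch a word with no recording, no whole-word entry, and no phoneme breakdown raises `KeyError`; the claim does not say how such a case is handled.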

26. A method for compiling and evaluating text within a document, said method comprising the steps of:
- inputting dictated words into a document;
  
  converting said dictated words into a first text set within said document by use of a voice recognition process;
  
  storing said dictated words separately but linked to said first text set for later audio playback;
  
  inputting non-audio text into said document as a second text set within said document, wherein said non-audio text is inputted by one or more of (a) typing the non-audio text into the document using a keyboard, (b) copying the non-audio text into the document using a mouse, and (c) handwriting text which is converted to the non-audio text using a handwriting recognition program module; and
  
  playing back audio corresponding to said first and second text sets in an order corresponding to a placement of said first and second text sets within said document, wherein a first portion of said audio corresponding to said first text set is provided by playback of said stored dictated words, and a second portion of said audio corresponding to said second text set is provided by playback of a text-to-speech process.
  - 27. The method of claim 26, wherein said non-audio text is supplied by typing a text entry into said document using a keyboard.
  - 28. The method of claim 26, wherein said non-audio text is supplied by copying a text entry into said document using a mouse.
  - 29. The method of claim 26, wherein said non-audio text is supplied by handwriting text onto a writing tablet, wherein the handwritten text is converted to a text entry for said document using a handwriting recognition program module.
  - 30. The method of claim 26, wherein a visual cue corresponding to the audio playback is displayed on a display screen.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Kim, Paul Kyong Hwan; Reynar, Jeffrey C.; Rucker, Erik
Primary Examiner(s)
Dorvil, Richemond
Assistant Examiner(s)
Nolan, Daniel A

Application Number
US09/428,259
Time in Patent Office
1,042 Days
Field of Search
704/260, 704/258, 704/270, 704/235, 704/278, 704/275
US Class Current
704/260
CPC Class Codes
G06F 3/167   Audio in a user interface, ...
G10L 13/04   Details of speech synthesis...
G10L 2015/221   Announcement of recognition...
